This document describes the analysis performed using minSNPs to analyse Nanopore sequence data. The analyses were performed on an HPC cluster using the Slurm scheduler, and the scripts may need to be adjusted when run elsewhere.
Steps:
- Use SNIPPY to call SNPs for the new sample (see Scripts/SNIPPY_RUN.sh).
- Extract all the SNPs defined in the megaalignment from the SNIPPY result (see Scripts/extract_megaalignment_snps.R). The final alignment, mega_with_newdata.fasta, is on figshare (see here).
- Assign CC metadata to the new samples based on the most similar sample in the megaalignment (see Scripts/assign_cc_meta.R). See Results/new_sample_most_similar_mega.csv for the most similar sample in the megaalignment and the CC to be assigned, and Results/disagreement.csv for the samples whose CC metadata from PubMLST differs from the CC to be assigned.
- Create a neighbour-joining tree in MEGA with mega_with_newdata.fasta as the input and all default parameters. Output: Result/mega_with_newdata_NJT.nwk. See here for the interactive tree.
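The CC assignment step above can be illustrated with a minimal sketch. The repository's own implementation is the R script Scripts/assign_cc_meta.R; the Python below is only an illustration and assumes that "most similar" means the smallest number of mismatching SNP positions (Hamming distance). The function names, sample names, and CC labels are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned SNP strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_cc(new_seq, reference_seqs, reference_cc):
    """Return (closest reference sample, its CC, distance to it).

    Assumption: similarity is measured by Hamming distance; ties are
    resolved in favour of the first reference sample encountered.
    """
    closest = min(reference_seqs, key=lambda name: hamming(new_seq, reference_seqs[name]))
    return closest, reference_cc[closest], hamming(new_seq, reference_seqs[closest])


# Hypothetical toy data: three reference isolates with known CCs.
reference_seqs = {"iso_A": "ACGTAC", "iso_B": "ACGTTT", "iso_C": "TTTTTT"}
reference_cc = {"iso_A": "CC1", "iso_B": "CC5", "iso_C": "CC8"}

result = assign_cc("ACGTAA", reference_seqs, reference_cc)
print(result)
```

A sample whose assigned CC differs from its PubMLST CC would then be the kind of record listed in Results/disagreement.csv.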
All the SNPs are shuffled and the first 400 are taken; see Scripts/random_snps_selection.R.
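As a rough illustration of this selection, the sketch below shuffles a list of SNP positions and keeps the first 400. It is not the repository's R script; the function name, the seed parameter, and the toy position list are assumptions made for the example.

```python
import random


def select_random_snps(positions, n=400, seed=None):
    """Shuffle a copy of the SNP positions and return the first n of them."""
    rng = random.Random(seed)   # a seed makes the selection reproducible
    shuffled = list(positions)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n]


# Hypothetical input: 5000 SNP positions numbered from 1.
all_positions = list(range(1, 5001))
selected = select_random_snps(all_positions, n=400, seed=42)
print(len(selected))
```

Fixing the seed is one way to make the random set repeatable between runs; whether the original script does so is not stated here.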
For:
- SNPs, see Scripts/generate_snp_search_sequence.R.
- gene sequences, see Scripts/generate_gene_search_sequence.R.
Steps:
- Scan the lab-generated Nanopore data: Scripts/scan_lab_nanopore.R.
- Transform the most similar isolate to the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_lab_result.R.
Steps:
- Simulate long reads with pbsim2 (see Scripts/simulated_long_read_generation.R). See Data/BIGSdb_3343897_1178826571_39602.csv for the list of data downloaded from PubMLST and Results/sim_n_reads.csv for the number of reads generated.
- Assign the most likely CC to the tested samples based on SNP distance: Scripts/assign_cc_meta.R.
- Scan the simulated Nanopore data: Scripts/scan_simulated_read.R.
- Transform the most similar isolate to the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_simulation_result.R.
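The transform-and-aggregate step can be pictured with the following sketch. It is an illustration only, not the repository's R code: it assumes each scan result records the most similar isolate, that an isolate-to-CC lookup table exists, and that the aggregate of interest is the fraction of samples assigned their expected CC for each number of SNPs. All field names and values are hypothetical.

```python
from collections import defaultdict


def aggregate_accuracy(scan_results, isolate_cc):
    """Map each most-similar isolate to its CC, then compute the fraction
    of correct CC calls for every number of SNPs used."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in scan_results:
        predicted_cc = isolate_cc.get(row["most_similar"])  # None if isolate unknown
        totals[row["n_snps"]] += 1
        if predicted_cc == row["expected_cc"]:
            correct[row["n_snps"]] += 1
    return {n: correct[n] / totals[n] for n in sorted(totals)}


# Hypothetical lookup table and scan results.
isolate_cc = {"iso_A": "CC1", "iso_B": "CC5"}
scan_results = [
    {"sample": "s1", "n_snps": 100, "most_similar": "iso_A", "expected_cc": "CC1"},
    {"sample": "s2", "n_snps": 100, "most_similar": "iso_B", "expected_cc": "CC1"},
    {"sample": "s1", "n_snps": 400, "most_similar": "iso_A", "expected_cc": "CC1"},
    {"sample": "s2", "n_snps": 400, "most_similar": "iso_A", "expected_cc": "CC1"},
]

summary = aggregate_accuracy(scan_results, isolate_cc)
print(summary)
```

The same grouping could be keyed on the number of reads instead of the number of SNPs by swapping the field used as the dictionary key.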
- Scan the lab-generated Nanopore data: Scripts/scan_lab_gene.R.
- Transform the most similar isolate to the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_lab_gene.R.
- Scan the simulated Nanopore data: Scripts/scan_simulated_gene.R.
- Transform the most similar isolate to the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_simulation_gene.R.
- Simulated data:
  - Script: Scripts/krocus_comparison.R
  - Result: Results/krocus_full_result.csv and Results/krocus_summary.csv
- Lab-generated data:
  - Script: Scripts/lab_krocus_comparison.R
  - Result: Results/lab_krocus_full_result.csv and Results/lab_krocus_summary.csv
- Simulated data:
  - Script: Scripts/sketchy_comparison.R
  - Result: Results/sketchy_gene_summary.csv and Results/sketchy_summary.csv
- Lab-generated data:
  - Script: Scripts/lab_sketchy_comparison.R
  - Result: Results/lab_sketchy_gene_summary.csv and Results/lab_sketchy_summary.csv