minSNPs_nanopore

This repository documents the analysis performed using minSNPs to analyse Nanopore sequence data. The analyses were run on an HPC cluster using the Slurm scheduler, so the scripts may need to be adjusted when run elsewhere.

Testing with minSNPs

Identifying the SNPs for the new PubMLST samples

Steps:

Use SNIPPY to call SNPs for the new samples (see Scripts/SNIPPY_RUN.sh).
Extract all the SNPs defined in the megaalignment from the SNIPPY results (see Scripts/extract_megaalignment_snps.R). The final alignment, mega_with_newdata.fasta, is available on figshare (see here).
Assign CC metadata to the new samples based on the most similar sample in the megaalignment (see Scripts/assign_cc_meta.R). Results/new_sample_most_similar_mega.csv lists, for each new sample, the most similar sample in the megaalignment and the CC to be assigned; Results/disagreement.csv lists the samples whose CC metadata from PubMLST differs from the CC to be assigned.
Create a Neighbour Joining tree with MEGA, using mega_with_newdata.fasta as the input and all default parameters. Output: Results/mega_with_newdata_NJT.nwk. See here for the interactive tree.
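
The CC assignment step above can be illustrated with a minimal Python sketch. It is not the code in Scripts/assign_cc_meta.R; it assumes that "most similar" means the smallest Hamming distance between aligned SNP strings, and the names `hamming`, `assign_cc`, `reference` and `cc_of` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned SNP strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_cc(new_seq: str, reference: dict, cc_of: dict) -> tuple:
    """Return (closest reference sample, its CC, distance).

    Assumption: similarity is plain Hamming distance; on ties, the
    first reference sample in dictionary order is kept.
    """
    best = min(reference, key=lambda name: hamming(new_seq, reference[name]))
    return best, cc_of[best], hamming(new_seq, reference[best])


# Toy data, invented for illustration only.
reference = {"refA": "ACGTAC", "refB": "ACGTTT", "refC": "TTTTTT"}
cc_of = {"refA": "CC1", "refB": "CC5", "refC": "CC8"}

result = assign_cc("ACGAAC", reference, cc_of)
print(result)
```

A new sample whose assigned CC differs from its PubMLST CC would then be a candidate for a file such as Results/disagreement.csv.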

Sampling 400 random SNPs

All the SNPs are shuffled and the first 400 are taken; see Scripts/random_snps_selection.R.
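
As a rough illustration of this shuffle-then-take approach (not the R script itself), the Python sketch below shuffles a list of SNP positions and keeps the first 400. The function name `sample_snps`, the seed and the position range are assumptions made for the example.

```python
import random


def sample_snps(positions, n=400, seed=None):
    """Shuffle all SNP positions and return the first n of them."""
    rng = random.Random(seed)   # a fixed seed makes the draw reproducible
    shuffled = list(positions)  # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    return shuffled[:n]


# Hypothetical input: 10,000 SNP positions numbered from 1.
all_positions = range(1, 10001)
selected = sample_snps(all_positions, n=400, seed=42)
print(len(selected))
```

Shuffling and slicing is equivalent to sampling without replacement, so no position can be selected twice.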

Search string generation

For:

- SNPs, see Scripts/generate_snp_search_sequence.R.
- gene sequences, see Scripts/generate_gene_search_sequence.R.
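
The scripts themselves define how search strings are built. Purely as an illustration of the general idea, the Python sketch below assumes that a SNP search string is the reference sequence flanking a SNP position, with each candidate allele substituted at that position. The function `snp_search_strings`, the `flank` parameter and the toy reference are hypothetical and may not match what the R scripts do.

```python
def snp_search_strings(reference: str, position: int, alleles: str, flank: int = 7) -> dict:
    """Build one search string per allele around a 0-based SNP position.

    Assumption: a search string is `flank` bases on each side of the SNP
    (fewer at the sequence ends) with the allele placed in the middle.
    """
    if not 0 <= position < len(reference):
        raise IndexError("SNP position is outside the reference")
    start = max(0, position - flank)
    end = min(len(reference), position + flank + 1)
    left = reference[start:position]
    right = reference[position + 1:end]
    return {allele: left + allele + right for allele in alleles}


# Toy reference, invented for illustration only.
reference = "ACGTACGTACGTACGT"
strings = snp_search_strings(reference, position=8, alleles="AC", flank=3)
print(strings)
```

Reads containing one of these strings would support the corresponding allele at that SNP.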

Major lineage assignment

Testing with lab generated Nanopore data

Steps:

Scan lab generated Nanopore data: Scripts/scan_lab_nanopore.R.
Transform the most similar isolate into the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_lab_result.R.
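
The transform-and-aggregate step can be sketched as follows. This is a hedged Python illustration, not the logic of Scripts/aggregate_lab_result.R: it assumes each scan result is a row of (number of SNPs used, sample, most similar isolate), and that aggregation means the fraction of samples assigned their expected CC per number of SNPs. All names and data below are hypothetical.

```python
from collections import defaultdict


def aggregate_accuracy(rows, isolate_to_cc, expected_cc):
    """Map each most similar isolate to its CC, then summarise per SNP count.

    rows: iterable of (n_snps, sample, most_similar_isolate).
    Returns {n_snps: fraction of samples whose predicted CC is as expected}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_snps, sample, isolate in rows:
        predicted = isolate_to_cc.get(isolate)  # None if the isolate is unknown
        totals[n_snps] += 1
        hits[n_snps] += int(predicted == expected_cc[sample])
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy data, invented for illustration only.
isolate_to_cc = {"iso1": "CC1", "iso2": "CC1", "iso3": "CC5"}
expected_cc = {"s1": "CC1", "s2": "CC5"}
rows = [
    (100, "s1", "iso2"),
    (100, "s2", "iso1"),
    (400, "s1", "iso1"),
    (400, "s2", "iso3"),
]

summary = aggregate_accuracy(rows, isolate_to_cc, expected_cc)
print(summary)
```

The same shape of summary could be keyed on the number of reads used instead of the number of SNPs.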

Testing with simulated long-read data

Steps:

Simulate long reads with pbsim2 (see Scripts/simulated_long_read_generation.R). See Data/BIGSdb_3343897_1178826571_39602.csv for the list of data downloaded from PubMLST and Results/sim_n_reads.csv for the number of reads generated.
Assign the most likely CC to the tested samples based on SNP distance: Scripts/assign_cc_meta.R.
Scan simulated Nanopore data: Scripts/scan_simulated_read.R.
Transform the most similar isolate into the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_simulation_result.R.

Gene detection

Testing with lab generated Nanopore data

Scan lab generated Nanopore data: Scripts/scan_lab_gene.R.
Transform the most similar isolate into the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_lab_gene.R.

Testing with simulated data

Scan simulated Nanopore data: Scripts/scan_simulated_gene.R.
Transform the most similar isolate into the most likely CC and aggregate all results for the different numbers of SNPs or reads used: Scripts/aggregate_simulation_gene.R.

Comparison

Comparison with Krocus

Simulated data:
- Script: Scripts/krocus_comparison.R
- Result: Results/krocus_full_result.csv and Results/krocus_summary.csv
Lab generated data:
- Script: Scripts/lab_krocus_comparison.R
- Result: Results/lab_krocus_full_result.csv and Results/lab_krocus_summary.csv

Comparison with Sketchy

Simulated data:
- Script: Scripts/sketchy_comparison.R
- Result: Results/sketchy_gene_summary.csv and Results/sketchy_summary.csv
Lab generated data:
- Script: Scripts/lab_sketchy_comparison.R
- Result: Results/lab_sketchy_gene_summary.csv and Results/lab_sketchy_summary.csv
