Repository to hold code and data for the How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies manuscript

gbouras13/depth_vs_polishing_analysis

How Low Can You Go? Short-read polishing of Oxford Nanopore bacterial genome assemblies Code Repository

This repository holds code and data for this manuscript:
Bouras G, Judd LM, Edwards RA, Vreugde S, Stinear TP, Wick RR. How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies. Microbial Genomics. 2024. doi:10.1099/mgen.0.001254.

Contents:

  • figures: contains all of the manuscript's main and supplementary figures along with their captions.
  • supp_tables.xlsx: contains the paper's supplementary tables.
  • ont_assemblies: contains the ONT-only Trycycler assemblies used as a starting point for polishing.
  • reference_assemblies: contains the polished and manually curated assemblies used as a ground truth.
  • pypolca_example_plot: contains code to simulate reads and errors and to make Figure 1.
  • main_analysis: contains the read subsampling, polishing and plotting commands for the main analysis (Figures 2, S2, S3 and S7).
  • errors_in_repeats: contains the details of the errors-in-repeats analysis (Figure S1).
  • long_homopolymer: contains the details of the long-homopolymer analysis (Figure S4).
  • error_characterisation: contains the detailed error characterisation of the 37 existing errors and all polisher-introduced errors (Table S2 and Figures S5 and S6).
  • hybracter_analysis: contains the read subsampling, assembly and plotting commands for the Hybracter analysis in the paper (Figures S8 and S9).
  • reference_chromosome_assemblies_hybracter: contains the polished and manually curated assemblies used as a ground truth, chromosomes only. Used for the Hybracter analysis.
  • low_quality_draft: contains the details of the polishing analysis using low-quality draft assemblies (Figure S10).
  • parameter_sweep: contains the details of the low-depth parameter sweep analysis (Table S6).
  • compare_assemblies.py: assembly comparison script used for counting/characterising errors.
  • hapog: contains additional figure panels for Hapo-G results (produced after the manuscript was published).

ONT and Illumina reads are not included in this repository due to size, but they can be found on SRA:

| Genome | ONT reads | Illumina reads |
|---|---|---|
| Campylobacter jejuni (ATCC-33560) | SRR27638397 | SRR26899120 |
| Campylobacter lari (ATCC-35221) | SRR27638396 | SRR26899115 |
| Escherichia coli (ATCC-25922) | SRR27638398 | SRR26899128 |
| Listeria ivanovii (ATCC-19119) | SRR27638399 | SRR26899136 |
| Listeria monocytogenes (ATCC-BAA-679) | SRR27638394 | SRR26899101 |
| Listeria welshimeri (ATCC-35897) | SRR27638395 | SRR26899109 |
| Salmonella enterica (ATCC-10708) | SRR27638402 | SRR26899135 |
| Vibrio cholerae (ATCC-14035) | SRR27638401 | SRR26899095 |
| Vibrio parahaemolyticus (ATCC-17802) | SRR27638400 | SRR26899141 |

These can be downloaded with the fastq-dl program. For example, to fetch the ONT reads (the Illumina accessions can be downloaded the same way):

CPUS=16
fastq-dl --accession SRR27638397 --cpus $CPUS
fastq-dl --accession SRR27638396 --cpus $CPUS
fastq-dl --accession SRR27638398 --cpus $CPUS
fastq-dl --accession SRR27638399 --cpus $CPUS
fastq-dl --accession SRR27638394 --cpus $CPUS
fastq-dl --accession SRR27638395 --cpus $CPUS
fastq-dl --accession SRR27638402 --cpus $CPUS
fastq-dl --accession SRR27638401 --cpus $CPUS
fastq-dl --accession SRR27638400 --cpus $CPUS
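
The commands above cover only the ONT accessions. As a rough sketch (not part of the original repository), the same fastq-dl call can be wrapped in a loop that also covers the Illumina accessions from the table. The function name `download_all` and the `DRY_RUN` switch are hypothetical conveniences introduced here; only the `--accession` and `--cpus` options shown above are used. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than run.

```shell
#!/bin/sh
set -eu

# Number of CPUs passed to fastq-dl (same default as the commands above).
CPUS="${CPUS:-16}"
# Hypothetical switch: 1 prints the commands, 0 actually runs fastq-dl.
DRY_RUN="${DRY_RUN:-1}"

# Accessions copied from the table above.
ONT_ACCESSIONS="SRR27638397 SRR27638396 SRR27638398 SRR27638399 SRR27638394 SRR27638395 SRR27638402 SRR27638401 SRR27638400"
ILLUMINA_ACCESSIONS="SRR26899120 SRR26899115 SRR26899128 SRR26899136 SRR26899101 SRR26899109 SRR26899135 SRR26899095 SRR26899141"

# Download (or, in dry-run mode, print the command for) each accession given.
download_all() {
    for acc in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "fastq-dl --accession $acc --cpus $CPUS"
        else
            fastq-dl --accession "$acc" --cpus "$CPUS"
        fi
    done
}

# Word splitting of the accession lists is intentional here.
download_all $ONT_ACCESSIONS $ILLUMINA_ACCESSIONS
```

To perform the downloads, run the script with `DRY_RUN=0` in an environment where fastq-dl is installed.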
