This is a tutorial on genome assembly prepared for an EMBO course taking place in Florence (Italy) from September 7th to 13th. The tutorial covers every step of a eukaryotic genome assembly pipeline using high-accuracy PacBio HiFi and Nanopore R10.4 long reads, together with Hi-C data. These steps include analyzing long reads prior to assembly, generating a first draft phased assembly, evaluating the quality of the assembly, purging haplotigs to generate collapsed assemblies, polishing to improve accuracy, removing contaminant contigs, and Hi-C scaffolding to obtain chromosome-level scaffolds.
Datasets for this tutorial are available in Galaxy. They are a subset of the datasets published in the European Nucleotide Archive for the nematodes Plectus sambesii (project PRJEB74285) and Propanagrolaimus sp. JU765 (project PRJEB87118).
For a deeper look at genome assembly, see the following resources:
- Vertebrate Genome Project Galaxy Pipeline: a tutorial for the VGP Galaxy pipeline to easily run your assemblies
- Nanopore and PacBio HiFi assembly scripts for the paper "Revisiting genomes of non-model species with long reads yields new insights into their biology and evolution", which include extra command lines for genome assembly
- A review on genome assembly for non-vertebrate animals that also includes an exhaustive list of tools for genome assembly
