Victoria Cepeda edited this page Aug 6, 2018 · 17 revisions

Reference-based Assembly Using MetaCompass

Intro to MetaCompass

MetaCompass, developed at the University of Maryland, represents the first effective approach for reference-guided metagenomic assembly of low-abundance bacterial genomes that can complement and improve upon de novo metagenomic assembly methods. Given a set of reference genomes, a set of shotgun reads, and the alignments between the reads and the reference genomes, MetaCompass applies ideas from AMOScmp to build contigs from metagenomic samples. Briefly, MetaCompass selects reference genomes based on taxonomic profiles, and the metagenomic reads are then quickly mapped to these genomes. When building contigs, MetaCompass employs a greedy solution to the minimum set-cover problem to produce longer contigs.
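The greedy set-cover idea can be illustrated with a short Python sketch. It treats each aligned read as an element to be covered and each reference genome as the set of reads that align to it, then repeatedly picks the genome that explains the most still-unexplained reads. The function name `greedy_genome_cover` and its dictionary input format are hypothetical and are not part of the MetaCompass code base; the actual implementation may score alignments differently.

```python
from collections import defaultdict


def greedy_genome_cover(read_hits):
    """Greedy set cover over read alignments (hypothetical helper).

    read_hits maps a read ID to the set of reference genome IDs it aligns to.
    Returns the chosen genomes in selection order and a read -> genome map.
    """
    # Invert the mapping: for each genome, the set of reads it could explain.
    genome_reads = defaultdict(set)
    for read, genomes in read_hits.items():
        for genome in genomes:
            genome_reads[genome].add(read)

    # Only reads with at least one alignment need to be explained.
    uncovered = {read for read, genomes in read_hits.items() if genomes}
    chosen = []
    assignment = {}
    while uncovered:
        # Pick the genome explaining the most still-unexplained reads;
        # max() keeps the first maximum, so ties go to the smallest genome ID.
        best = max(sorted(genome_reads),
                   key=lambda g: len(genome_reads[g] & uncovered))
        newly_covered = genome_reads[best] & uncovered
        chosen.append(best)
        for read in newly_covered:
            assignment[read] = best
        uncovered -= newly_covered
    return chosen, assignment


# Usage: gB explains four reads, so it is picked first; gA then covers r1.
hits = {"r1": {"gA"}, "r2": {"gA", "gB"}, "r3": {"gB"},
        "r4": {"gB", "gC"}, "r5": {"gB"}}
chosen, assignment = greedy_genome_cover(hits)
print(chosen)  # ['gB', 'gA']
```

Because each round removes at least one read from the uncovered set, the loop always terminates, and genomes such as gC that add no new reads are never selected.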

Overview of the MetaCompass pipeline. Each color represents a different genome present in the environment. Short colored lines represent reads, and long lines represent genomes. First, we use a marker gene approach to identify the reference genomes most closely related to the data in the input sample. We use the NCBI RefSeq genome database (June 2016) as the standard reference collection for MetaCompass. We retain for further consideration only the genomes estimated to be represented at sufficient depth of coverage. The reads are then aligned to these genomes using Bowtie2. The resulting read alignments are used to identify a minimal set of genomes that best explains all read alignments, and then to construct contigs. We developed the tool buildcontig to generate a consensus sequence for the contigs, and then use Pilon to correct the contigs so that they reflect the genome being assembled rather than being biased toward the reference sequence. Contigs may be broken at this stage if the metagenomic sequence diverges from the reference sequence. Finally, the reads that were not included in the reference-guided process outlined above are assembled using MEGAHIT to reconstruct the metagenomic segments not represented in the reference collection.
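The step of turning read alignments into contig consensus sequences can be sketched as a simple pileup with a majority vote at each reference position, starting a new contig wherever read coverage drops below a threshold. This is only an illustration under strong assumptions: alignments are ungapped `(start, sequence)` pairs with 0-based starts, the function `consensus_contigs` and its `min_depth` parameter are hypothetical, and it does not reproduce the actual algorithm of buildcontig or the Pilon correction step.

```python
from collections import Counter


def consensus_contigs(ref_len, alignments, min_depth=1):
    """Majority-vote consensus contigs from ungapped alignments (hypothetical).

    alignments: iterable of (start, read_sequence) pairs, where start is the
    0-based reference position of the read's first base (no indels assumed).
    Returns (contig_start, contig_sequence) for each maximal run of reference
    positions covered by at least min_depth reads.
    """
    if min_depth < 1:
        raise ValueError("min_depth must be at least 1")

    # Pileup: one base counter per reference position.
    pileup = [Counter() for _ in range(ref_len)]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:  # ignore bases hanging off the reference
                pileup[pos][base] += 1

    contigs = []
    current_start, current_bases = None, []
    for pos, counts in enumerate(pileup):
        if sum(counts.values()) >= min_depth:
            if current_start is None:
                current_start = pos
            # Majority base; ties are resolved alphabetically for determinism.
            current_bases.append(min(counts, key=lambda b: (-counts[b], b)))
        elif current_start is not None:
            # Coverage gap: close the current contig.
            contigs.append((current_start, "".join(current_bases)))
            current_start, current_bases = None, []
    if current_start is not None:
        contigs.append((current_start, "".join(current_bases)))
    return contigs


alns = [(0, "ACGTA"), (2, "GTAC"), (3, "TACG"), (3, "CACG"), (9, "GG")]
print(consensus_contigs(12, alns))  # [(0, 'ACGTACG'), (9, 'GG')]
```

In the usage line, the single read reporting `C` at position 3 is outvoted by three reads reporting `T`, and the uncovered positions 7 and 8 split the output into two contigs.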

MetaCompass Publications

Victoria Cepeda, Bo Liu, Mathieu Almeida, Christopher M. Hill, Sergey Koren, Todd J. Treangen, Mihai Pop. bioRxiv 212506; doi: https://doi.org/10.1101/212506

MetaCompass Software downloads

The MetaCompass software package can be downloaded from GitHub: https://github.com/marbl/MetaCompass

Contact

vcepeda@cs.umd.edu