Guanliang MENG edited this page Jul 6, 2023 · 20 revisions

Fragmented results or few genes recovered

Try a different assembler or kmer strategy (via the --assembler option):

  1. try different kmers for MitoAssemble (--kmers option)
  2. try the Megahit assembler
  3. try the SPAdes assembler
  4. use more data (also check the --data_size_for_mt_assembly option)

See https://github.com/linzhi2013/MitoZ/wiki/New-Features#2-two-new-assemblers.

How good are my results?

There are several aspects you should consider:

1. The sequencing depth along the mitochondrial sequence(s)

Check the circos.png or circos.svg files, and circos.depth.txt for the exact coverage of each site. Regions with extremely low (e.g. 0X) or extremely high coverage can be dubious.
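The depth check can be scripted. The following is a minimal sketch, not part of MitoZ: it assumes each line of circos.depth.txt holds four whitespace-separated fields (sequence ID, start, end, depth), and both the function name flag_depth_outliers and the thresholds are hypothetical. Verify the column layout against your own file before relying on it.

```python
import statistics


def flag_depth_outliers(lines, low=1.0, high_factor=5.0):
    """Return (low_sites, high_sites) as lists of (seqid, start, depth).

    Assumed line layout (check your own file): seqid start end depth.
    A site is "low" if depth < low, and "high" if depth exceeds
    high_factor times the median depth of all parsed sites.
    """
    records = []
    for line in lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 4:
            continue  # skip comments and malformed lines
        records.append((fields[0], int(fields[1]), float(fields[3])))
    if not records:
        return [], []
    median = statistics.median(depth for _, _, depth in records)
    low_sites = [r for r in records if r[2] < low]
    high_sites = [r for r in records if median > 0 and r[2] > high_factor * median]
    return low_sites, high_sites


if __name__ == "__main__":
    # Hypothetical usage: the path is whatever your run produced.
    with open("circos.depth.txt") as handle:
        low_sites, high_sites = flag_depth_outliers(handle)
    print(len(low_sites), "low-coverage sites;", len(high_sites), "high-coverage sites")
```

Using the median rather than the mean keeps a few extreme sites from shifting the baseline against which "high" is judged.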

2. Internal stop codons within PCGs

Could be due to:

  • Ns in the sequences.

If the region around the internal stop codons has high sequencing coverage,

  • check whether you are using the correct genetic code (--genetic_code)
  • the sequence could be a NUMT
  • the sequence could come from a contaminant (i.e. belong to another clade instead of your target species)

You can align the nucleotide and/or protein sequences from MitoZ (in https://github.com/linzhi2013/MitoZ/wiki/Tutorial#5-the-final-resulting-files) with publicly known genes of closely related species from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/) using Ugene (http://ugene.net/) or MEGA (https://www.megasoftware.net/), and check if there are indels, Ns, internal stop codons, etc.
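Before aligning, a quick scan of each PCG sequence can show where the Ns and internal stop codons are. The sketch below is an illustration only and is not how MitoZ annotates genes. The function name scan_cds is hypothetical, and only two NCBI translation tables are included (2, vertebrate mitochondrial, and 5, invertebrate mitochondrial); add the stop codons of your own table if it differs. It expects the coding sequence in its 5'-to-3' reading direction, starting at the first codon position.

```python
# Stop codons of two NCBI mitochondrial translation tables.
STOP_CODONS = {
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial
    5: {"TAA", "TAG"},                # invertebrate mitochondrial
}


def scan_cds(cds, genetic_code):
    """Return (internal_stops, ambiguous) as 1-based codon positions.

    internal_stops: codons that are stop codons under the given table,
                    excluding the final codon of a complete CDS.
    ambiguous:      codons containing at least one N.
    """
    stops = STOP_CODONS[genetic_code]
    cds = cds.upper()
    n_codons = len(cds) // 3
    # If the length is a multiple of 3, the last codon may be the real stop.
    # Otherwise the gene ends in an incomplete stop (T or TA), so every
    # full codon is internal.
    last_checked = n_codons - 1 if len(cds) % 3 == 0 else n_codons
    internal_stops, ambiguous = [], []
    for i in range(last_checked):
        codon = cds[3 * i:3 * i + 3]
        if "N" in codon:
            ambiguous.append(i + 1)
        elif codon in stops:
            internal_stops.append(i + 1)
    return internal_stops, ambiguous
```

For example, the toy sequence ATGAGATTTTAA has an internal stop at codon 2 under table 2 (AGA is a stop codon there) but none under table 5. This is the kind of discrepancy a wrong --genetic_code value produces.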

3. The circularity of your mitochondrial sequence

  • Check the summary.txt file

  • Check the *.overlap_information file. How long is the overlapping region? Is it a simple repeat (e.g. 'AAAAAA')? A simple repeat is not a reliable indicator of circularity.

    If you are using your own mitogenome file, make sure the sequence header line has topology=circular or topology=linear.
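Both checks are easy to automate once you have the overlapping sequence and the header line as strings. The following is a rough sketch with hypothetical helper names (is_simple_repeat, header_topology). It does not parse the *.overlap_information file itself, whose layout is not described here, and "simple repeat" is taken to mean a repeated unit of at most max_unit bases, which is an assumption.

```python
import re


def is_simple_repeat(seq, max_unit=3):
    """True if seq is a repeated unit of 1..max_unit bases (e.g. 'AAAAAA')."""
    seq = seq.upper()
    for unit in range(1, max_unit + 1):
        # Require at least two copies of the unit, then check periodicity.
        if len(seq) >= 2 * unit and all(
            seq[i] == seq[i - unit] for i in range(unit, len(seq))
        ):
            return True
    return False


def header_topology(header):
    """Return 'circular' or 'linear' from a FASTA header, or None if absent."""
    match = re.search(r"topology=(circular|linear)\b", header)
    return match.group(1) if match else None
```

An overlap such as 'AAAAAA' or 'ATATATAT' is flagged, while a longer non-repetitive overlap is not. A header like '>scaffold1 topology=circular' (a made-up example) yields 'circular'.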

4. Is there any gene missing?

Sometimes, even when the resulting mitogenome is complete ('circular'), some genes can still be missing because of limitations of the annotation programs, e.g. MiTFi for tRNA genes.

See also https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database.

5. Can my result sequences be contaminations or NUMTs?

BLAST your resulting sequences against the NCBI NT database.

Can I use MitoZ for mitogenome assembly based on UCE/Target-enrichment/Transcriptome data?

MitoZ was designed to assemble mitogenomes from WGS data of a single species.

However, some of my runs show that MitoZ is also able to assemble mitogenomes from UCE/Target-enrichment data, so it is worth giving MitoZ a try, especially now that MitoZ has three de novo assemblers, i.e., MitoAssemble, Megahit, and SPAdes (--assembler option). Megahit and SPAdes should handle these cases better, since they both use multiple kmers for assembly. You can also manually try different kmers (--kmers option) for assembly with MitoAssemble.

I haven't officially tested MitoZ on transcriptome data yet, although we did try it for some specimens already:

  • Song, Hojun, et al. "Phylogenomic analysis sheds light on the evolutionary pathways towards acoustic communication in Orthoptera." Nature communications 11.1 (2020): 1-16. https://doi.org/10.1038/s41467-020-18739-4

Can I apply MitoZ to metagenomic (multiple species) dataset?

MitoZ performs best when the WGS data consist of a single species. However, MitoZ can output any scaffold with five or more protein-coding genes (PCGs), and in theory, all three de novo assemblers (Megahit, SPAdes, MitoAssemble) should work for datasets with multiple species. For example, we used SOAPdenovo-Trans (MitoAssemble was modified from SOAPdenovo-Trans) to assemble mitogenomes from a single dataset containing 49 species (other assemblers were used as well):

  • Tang, Min, et al. "Multiplex sequencing of pooled mitochondrial genomes—a crucial step toward biodiversity analysis using mito-metagenomics." Nucleic acids research 42.22 (2014): e166-e166. https://doi.org/10.1093/nar/gku917

If your mixed-sample dataset yielded only one mitogenome (in the mt_assembly/megahit/DM01.megahit.mitogenome.fa file), you can check the intermediate files of the assembly step for other mitogenomes:

[image: the intermediate files of the assembly step]

(see https://github.com/linzhi2013/MitoZ/wiki/Tutorial#2-the-mitogenome-assembly-step)

So, if MitoZ outputs multiple long scaffolds with many PCGs, you can annotate each sequence separately, since they might correspond to different species.

Can I apply MitoZ to other clades?

In the --clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges} option, only a few animal clades are defined. What if my sample doesn't belong to any of these clades?

The answer is that you can still use MitoZ.

You can choose one of the existing clades for the mitochondrial sequence search (the --clade option) and, at the same time, specify a loose --requiring_taxa value that covers your target species (e.g. Metazoa), so that your target species won't be filtered out. Even if you choose a rank that does not contain your target species, MitoZ should still work if the mitogenome is long, with >= 5 PCGs.

Also set a proper --genetic_code for your species.

In case the existing database doesn't work well for the gene annotation, you can check out https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ-s-database

Please also refer to https://github.com/linzhi2013/MitoZ/wiki/Known-issues#10-valueerror-can-not-find-taxid-for-nemertea-ribbon-worms-maybe-its-a-misspelling.

Meanings of the different tracks on the Circos plot

They are explained in the help message of the mitoz visualize command: see https://github.com/linzhi2013/MitoZ/wiki/The-%27visualize%27-subcommand or run $ mitoz visualize -h.
