FAQ
Try multiple-kmer strategies and other assemblers (via the --assembler option):
- try different kmers for MitoAssemble (the --kmers option)
- try the Megahit assembler
- try the SPAdes assembler
- use more data (also check the --data_size_for_mt_assembly option)
See https://github.com/linzhi2013/MitoZ/wiki/New-Features#2-two-new-assemblers.
There are several aspects you should consider:
1. The sequencing depth along the mitochondrial sequence(s)
Check the circos.png or circos.svg files, and the circos.depth.txt file for the exact coverage of each site. Regions with extremely low (e.g. 0X) or extremely high coverage can be dubious.
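A minimal Python sketch for screening the depth file for suspicious sites. It assumes, without confirmation from MitoZ's documentation, that each line of circos.depth.txt uses the circos track layout `seqid start end depth`; the function names and the `low`/`high_factor` thresholds are hypothetical and should be tuned to your data.

```python
import statistics


def read_depth(lines):
    """Parse depth lines, ASSUMING the circos layout 'seqid start end depth'."""
    records = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and lines with too few columns.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        records.append((fields[0], int(fields[1]), int(fields[2]), float(fields[3])))
    return records


def flag_dubious(records, low=1.0, high_factor=3.0):
    """Return records with depth < low or depth > high_factor * median depth."""
    if not records:
        return []
    median = statistics.median(r[3] for r in records)
    return [r for r in records if r[3] < low or r[3] > high_factor * median]
```

For example, `flag_dubious(read_depth(open("circos.depth.txt")))` lists the sites worth inspecting in circos.png; using the median rather than the mean keeps a few very deep sites from inflating the "high" threshold.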
2. Internal stop codons within PCGs
This could be due to:
- Ns in the sequences.
If the region around the internal stop codons has high sequencing coverage:
- check whether you are using the correct genetic code (the --genetic_code option);
- the sequences could be NUMTs;
- the sequences could be contaminants (i.e. belonging to other clades instead of your target species).
You can align the nucleotide and/or protein sequences from MitoZ (see https://github.com/linzhi2013/MitoZ/wiki/Tutorial#5-the-final-resulting-files) with published genes of closely related species from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/) using Ugene (http://ugene.net/) or MEGA (https://www.megasoftware.net/), and check for indels, Ns, internal stop codons, etc.
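To illustrate why the genetic code matters, here is a minimal Python sketch that reports internal stop codons and N-containing codons in one in-frame CDS. The stop-codon sets are those of NCBI translation tables 1 (Standard), 2 (Vertebrate Mitochondrial) and 5 (Invertebrate Mitochondrial); the function name `scan_cds` is hypothetical, and the sketch assumes the CDS starts in frame 1.

```python
# Stop codons of three NCBI translation tables.
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # Standard
    2: {"TAA", "TAG", "AGA", "AGG"},  # Vertebrate Mitochondrial
    5: {"TAA", "TAG"},                # Invertebrate Mitochondrial
}


def scan_cds(cds, table):
    """Return (internal_stops, ambiguous) as 1-based codon indices.

    The CDS is assumed to start in frame 1; a trailing partial codon is
    ignored and the final complete codon is not checked. Codons containing
    anything other than A/C/G/T (e.g. N) are reported as ambiguous instead
    of being tested as stops.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    internal, ambiguous = [], []
    for idx, codon in enumerate(codons[:-1], start=1):
        if set(codon) - set("ACGT"):
            ambiguous.append(idx)
        elif codon in STOP_CODONS[table]:
            internal.append(idx)
    return internal, ambiguous
```

For example, TGA is a stop under table 1 but encodes tryptophan under tables 2 and 5, so a correct mitochondrial CDS can look broken when the wrong --genetic_code is used.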
3. The circularity of your mitochondrial sequence
- Check the summary.txt file.
- Check the *.overlap_information file. How long is the overlapping region? Is the overlapping region a simple repeat (e.g. 'AAAAAA', which is not reliable as an indicator of circularity)?
If you are using your own mitogenome file, make sure the sequence header line contains topology=circular or topology=linear.
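The checks above can be approximated with the following Python sketch (MitoZ itself ships a circle_check tool under mitoz tools; this is not its logic). `end_overlap` finds the longest identical prefix/suffix of a sequence, `is_simple_repeat` tests whether that overlap is only a short tandem repeat, and `set_topology` writes the topology tag into a FASTA header. All three names, the `min_len`/`max_len`/`max_period` defaults, and the assumed `>ID key=value` header layout are illustrative assumptions.

```python
import re


def end_overlap(seq, min_len=10, max_len=2000):
    """Length of the longest prefix equal to the suffix (0 if shorter than min_len)."""
    seq = seq.upper()
    for k in range(min(len(seq) // 2, max_len), min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def is_simple_repeat(s, max_period=3):
    """True if s is a tandem repeat of a unit of at most max_period bases."""
    s = s.upper()
    if not s:
        return False
    for period in range(1, max_period + 1):
        unit = s[:period]
        if (unit * (len(s) // period + 1))[:len(s)] == s:
            return True
    return False


def set_topology(header, circular):
    """Replace or add a 'topology=...' tag in a FASTA header line (ASSUMED layout)."""
    header = re.sub(r"\s*topology=\S+", "", header.rstrip())
    return f"{header} topology={'circular' if circular else 'linear'}"
```

An overlap that is long and not a simple repeat is better evidence of circularity than a short poly-A stretch.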
4. Is there any gene missing?
Sometimes, even when the resulting mitogenome is complete ('circular'), some tRNA genes can still be missing due to the limitations of the annotation programs (e.g. MiTFi for tRNAs).
See also https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database.
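Most bilaterian mitogenomes carry 37 genes (13 PCGs, 2 rRNAs, 22 tRNAs). A minimal Python sketch to list which of them are absent from an annotation; the gene names below are an assumed naming convention, so map them to the names in your own annotation files, and keep in mind that some lineages genuinely lack genes (e.g. ATP8 in many nematodes).

```python
# Typical bilaterian mitochondrial gene set (37 genes). The names follow an
# ASSUMED convention; adapt them to the names used in your annotation.
PCGS = {"ATP6", "ATP8", "COX1", "COX2", "COX3", "CYTB",
        "NAD1", "NAD2", "NAD3", "NAD4", "NAD4L", "NAD5", "NAD6"}
RRNAS = {"rrnL", "rrnS"}
TRNAS = {"trn" + aa for aa in "ACDEFGHIKMNPQRTVWY"} | {"trnL1", "trnL2", "trnS1", "trnS2"}
EXPECTED = PCGS | RRNAS | TRNAS


def missing_genes(annotated, expected=EXPECTED):
    """Return the expected gene names absent from `annotated`, sorted."""
    return sorted(set(expected) - set(annotated))
```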
5. Can my resulting sequences be contaminants or NUMTs?
BLAST your resulting sequences against the NCBI nt database.
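If you run BLAST with tabular output (for example `blastn -query my_mito.fa -db nt -outfmt 6 -out hits.tsv`, with hypothetical file names), a short Python sketch like the following keeps the best hit (highest bitscore) per query, which makes it easier to spot scaffolds whose top hit lies outside your target clade. It assumes the default 12-column `-outfmt 6` layout; `best_hits` is a hypothetical name.

```python
def best_hits(lines):
    """Keep the highest-bitscore hit per query from BLAST -outfmt 6 lines.

    Assumes the default 12 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hit = {"sseqid": f[1], "pident": float(f[2]), "length": int(f[3]),
               "evalue": float(f[10]), "bitscore": float(f[11])}
        if f[0] not in best or hit["bitscore"] > best[f[0]]["bitscore"]:
            best[f[0]] = hit
    return best
```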
MitoZ was designed to assemble mitogenomes from WGS data of a single species.
However, some of my runs also show that MitoZ is able to assemble mitogenomes from UCE/target-enrichment data, so it is worth giving MitoZ a try, especially now that MitoZ has three de novo assemblers, i.e., MitoAssemble, Megahit, and SPAdes (the --assembler option). Megahit and SPAdes should be better at handling these cases, since they both use multiple kmers for assembly. Meanwhile, you can also manually try different kmers (the --kmers option) for assembly with MitoAssemble.
I haven't officially tested MitoZ on transcriptome data yet, although we did try it for some specimens already:
- Song, Hojun, et al. "Phylogenomic analysis sheds light on the evolutionary pathways towards acoustic communication in Orthoptera." Nature Communications 11.1 (2020): 1-16. https://doi.org/10.1038/s41467-020-18739-4
MitoZ performs best when the WGS data consist of a single species. However, MitoZ can output any scaffold with five or more protein-coding genes (PCGs), and in theory, all three de novo assemblers (Megahit, SPAdes, MitoAssemble) should work for datasets with multiple species. For example, we used SOAPdenovo-Trans (MitoAssemble was modified from SOAPdenovo-Trans), along with other assemblers, to assemble mitogenomes from a single dataset containing 49 species:
- Tang, Min, et al. "Multiplex sequencing of pooled mitochondrial genomes—a crucial step toward biodiversity analysis using mito-metagenomics." Nucleic Acids Research 42.22 (2014): e166-e166. https://doi.org/10.1093/nar/gku917
If only one mitogenome was obtained from your mixed-sample dataset (in the mt_assembly/megahit/DM01.megahit.mitogenome.fa file), you can check these intermediate files for other mitogenomes:
(see https://github.com/linzhi2013/MitoZ/wiki/Tutorial#2-the-mitogenome-assembly-step)
So, if MitoZ outputs multiple long scaffolds with many PCGs, you can annotate each sequence separately, as they might correspond to different species.
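A minimal Python sketch for writing each scaffold of a multi-sequence FASTA file into its own file, so that each can be annotated separately. `read_fasta`, `safe_name` and `split_fasta` are hypothetical helpers; output files are named after the first word of each header, with characters unsafe in file names replaced by underscores.

```python
import re
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def safe_name(header):
    """File-name-safe version of the first word of a FASTA header."""
    words = header.split()
    return re.sub(r"[^\w.-]", "_", words[0]) if words else "unnamed"


def split_fasta(in_path, out_dir):
    """Write every record of in_path to out_dir/<id>.fa; return the new paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for header, seq in read_fasta(Path(in_path).read_text()):
        path = out / f"{safe_name(header)}.fa"
        path.write_text(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each resulting file can then be passed to the annotation step on its own.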
In the --clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges} option, only a few animal clades are defined. What if my sample doesn't belong to these clades?
The answer is that you can still use MitoZ.
Choose one of the clades used for mitochondrial sequence searching (the --clade option), and at the same time, specify a loose --requiring_taxa value that covers your target species (e.g. Metazoa), so that your target species won't be filtered out. MitoZ should still work even if you choose a clade that does not contain your target species, as long as the mitogenome is long and has >= 5 PCGs.
Also set a proper --genetic_code for your species.
In case the existing database doesn't work well for the gene annotation, you can check out https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ-s-database
Please also refer to https://github.com/linzhi2013/MitoZ/wiki/Known-issues#10-valueerror-can-not-find-taxid-for-nemertea-ribbon-worms-maybe-its-a-misspelling.
They are explained in the help message of the mitoz visualize command; see https://github.com/linzhi2013/MitoZ/wiki/The-%27visualize%27-subcommand or run $ mitoz visualize -h.