New Features

1. Faster raw data filtering

We now use fastp (https://github.com/OpenGene/fastp) to filter the raw data, which is much faster.

By default, MitoZ uses only a subset (5 Gbp) of the raw fastq data for mitogenome assembly. You can change the amount of data used via the --data_size_for_mt_assembly option, or set --data_size_for_mt_assembly 0 to make MitoZ use all of the raw fastq data for assembly.
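For scale, 1 Gbp is 10^9 bases, so the default 5 Gbp budget covers roughly 33 million 150 bp reads. The Python sketch below shows one simple way to cap input by a base budget: keep reads from the start of a FASTQ file until the budget is met. It assumes head-style subsampling purely for illustration; the helper names (`gbp_to_bases`, `read_fastq`, `take_bases`) are hypothetical, and MitoZ's own subsampling may select reads differently.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str, str]
GBP = 1_000_000_000  # 1 Gbp = 10^9 bases


def gbp_to_bases(size_gbp: float) -> int:
    """Convert a size in Gbp (as given on the command line) to a base count."""
    return round(size_gbp * GBP)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def take_bases(handle: TextIO, max_bases: int) -> List[Record]:
    """Keep reads from the start of the file until max_bases bases are collected.

    Hypothetical helper (head-style subsampling is an assumption, not MitoZ's method).
    max_bases == 0 keeps every read, analogous to --data_size_for_mt_assembly 0.
    The last kept read may push the total slightly past the budget.
    """
    kept: List[Record] = []
    total = 0
    for record in read_fastq(handle):
        if max_bases and total >= max_bases:
            break
        kept.append(record)
        total += len(record[1])
    return kept


# Toy usage: a 12-base budget keeps two 8 bp reads.
toy = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGTACGT\n+\nIIIIIIII\n"
print([h for h, _, _, _ in take_bases(io.StringIO(toy), 12)])  # prints ['@r1', '@r2']
```

Taking reads from the head of the file is the simplest budget, but it can be biased if read quality drifts along a sequencing run; random subsampling avoids that at the cost of reading the whole file.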

In MitoZ >= 3.5, the option takes the form --data_size_for_mt_assembly <float1>,<float2>, which subsamples the raw data (but not clean data!). float1 is the size (Gbp) of raw data to subsample, and float2 is the minimum size (Gbp) of clean data required: if the clean data are smaller than float2 Gbp, MitoZ will STOP running! When only float1 is given, float2 is assumed to be 0.

(1) Set float1 to 0 if you want to use ALL raw data;

(2) Set 0,0 if you want to use ALL raw data and NOT interrupt MitoZ even if very little clean data remain.
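The rules above can be summarized as a small parser. This is a hypothetical sketch, not MitoZ's code: `parse_data_size` and `check_clean_size` are illustrative names, and the sketch assumes that "STOP running" behaves like exiting with an error.

```python
from typing import Tuple


def parse_data_size(value: str) -> Tuple[float, float]:
    """Split '<float1>[,<float2>]' into (raw_gbp, min_clean_gbp).

    raw_gbp == 0 means "use ALL raw data"; float2 defaults to 0 when omitted.
    """
    parts = value.split(",")
    if len(parts) not in (1, 2):
        raise ValueError(f"expected <float1>[,<float2>], got {value!r}")
    raw_gbp = float(parts[0])
    min_clean_gbp = float(parts[1]) if len(parts) == 2 else 0.0
    if raw_gbp < 0 or min_clean_gbp < 0:
        raise ValueError("sizes must be non-negative")
    return raw_gbp, min_clean_gbp


def check_clean_size(clean_gbp: float, min_clean_gbp: float) -> None:
    """Exit when clean data fall below float2 Gbp; float2 == 0 never triggers."""
    if clean_gbp < min_clean_gbp:
        raise SystemExit(
            f"only {clean_gbp} Gbp of clean data, need >= {min_clean_gbp} Gbp"
        )


print(parse_data_size("5"))    # (5.0, 0.0): subsample 5 Gbp raw, no clean-data minimum
print(parse_data_size("0,0"))  # (0.0, 0.0): all raw data, never stop
```

Because any non-negative clean-data size satisfies `>= 0`, the `0,0` setting can never trigger the stop, which matches case (2).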

Don't forget to cite fastp when you use this feature:

Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560

In MitoZ 3.5, if you want to subsample your input clean data, use --skip_filter together with --data_size_for_mt_assembly <float1>,<float2>. For example, --skip_filter --data_size_for_mt_assembly 0,5 will extract 5 Gbp of the input clean data.

2. Two new assemblers

MitoZ now includes Megahit (https://github.com/voutcn/megahit) and SPAdes (https://github.com/ablab/spades) for mitogenome assembly. Both programs try multiple kmers within a single assembly run, so they may sometimes succeed where MitoAssemble does not, and vice versa. See the warnings here: https://github.com/linzhi2013/MitoZ/wiki/Known-issues#8-megahit-gets-very-long-sequences.

With the two new de novo assemblers and the multi-kmer mode for MitoAssemble (see below), better mitogenome assemblies are also more likely to be obtained from UCE/target-enrichment/hybrid-enrichment/transcriptome/etc. data (https://github.com/linzhi2013/MitoZ/wiki/FAQ#can-i-use-mitoz-for-mitogenome-assembly-based-on-ucetarget-enrichmenttranscriptome-data).
For pooled multiple-species datasets, see https://github.com/linzhi2013/MitoZ/wiki/FAQ#can-i-apply-mitoz-to-metagenomic-multiple-species-dataset
MitoZ now also accepts larger input fastq files than previous versions did (in case you want to use a larger dataset for mitogenome assembly).

Both programs also support limiting RAM usage via the --memory option, which is useful when your server does not have enough memory. Note, however, that if this value is too small (especially with a large --thread_number, e.g. 24), the two programs may fail to run.

To choose an assembler, use the --assembler option.

===Warning: --assembler spades only accepts paired-end data, which means that you need to provide both --fq1 and --fq2, and they must be paired!===

If your fq1 and fq2 are not properly paired, you may get errors like those reported in https://github.com/linzhi2013/MitoZ/issues/193, https://github.com/ablab/spades/issues/420, https://www.biostars.org/p/311603/ and https://www.biostars.org/p/9514582/.

One solution is to use the fastq-pair tool (https://github.com/linsalrob/fastq-pair) to re-pair your fastq files.
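Before running SPAdes, a quick way to see whether two files are paired is to compare read IDs record by record. The sketch below makes several assumptions: the helper names are hypothetical, records are plain 4-line FASTQ, and mates differ only by a trailing /1 or /2 or by the text after the first space in the header. It only reports the first mismatch; re-pairing the files is still a job for a tool such as fastq-pair.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO


def read_ids(handle: TextIO) -> Iterator[str]:
    """Yield each record's ID: header minus '@', minus description, minus /1 or /2."""
    for i, line in enumerate(handle):
        if i % 4 != 0:  # only every 4th line is a header
            continue
        fields = line[1:].split()
        if not fields:  # blank line, e.g. a stray newline at end of file
            break
        read_id = fields[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id


def first_mismatch(fq1: TextIO, fq2: TextIO) -> int:
    """Return the 0-based index of the first unpaired record, or -1 if all pair up.

    zip_longest pads the shorter file with None, so a length difference is
    also reported as a mismatch.
    """
    for index, (id1, id2) in enumerate(zip_longest(read_ids(fq1), read_ids(fq2))):
        if id1 != id2:
            return index
    return -1


r1 = "@a/1\nAC\n+\nII\n@b/1\nAC\n+\nII\n"
r2 = "@b/2\nGT\n+\nII\n@a/2\nGT\n+\nII\n"  # same reads, wrong order
print(first_mismatch(io.StringIO(r1), io.StringIO(r2)))  # prints 0
```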

Don't forget to cite them when you use them for mitogenome assembly:

Li, Dinghua, et al. "MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph." Bioinformatics 31.10 (2015): 1674-1676. https://doi.org/10.1093/bioinformatics/btv033
Nurk, Sergey, et al. "metaSPAdes: a new versatile metagenomic assembler." Genome Research 27.5 (2017): 824-834. https://doi.org/10.1101/gr.213959.116

See also https://github.com/linzhi2013/MitoZ/wiki/Tutorial.

3. Multi-kmers for MitoAssemble

You can now set multiple kmers for MitoAssemble, for example, --kmers 51 71 91. Depending on your data, some kmers may give better results than others. MitoZ simply performs an independent MitoAssemble run for each kmer.
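Because the runs share no state, the per-kmer loop can be sketched as below. `assemble` is a hypothetical stand-in for one MitoAssemble run and `run_per_kmer` is an illustrative name; this shows the idea of independent runs and is not how MitoZ actually schedules its jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_per_kmer(
    kmers: List[int], assemble: Callable[[int], str], workers: int = 1
) -> Dict[int, str]:
    """Run one independent assembly per kmer and return results keyed by kmer.

    The runs do not depend on each other, so they may execute in any order
    or in parallel; pool.map still returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(assemble, kmers)
        return dict(zip(kmers, results))


# Toy usage with a fake assembler that just names its output.
print(run_per_kmer([51, 71, 91], lambda k: f"assembly_k{k}"))
# prints {51: 'assembly_k51', 71: 'assembly_k71', 91: 'assembly_k91'}
```

Keeping each kmer's output separate leaves the choice of the best assembly to the user, since which kmer works best depends on the data.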

4. Customizing the annotation database

Sometimes MitoZ fails to annotate some protein-coding genes (e.g. ATP8, which is very divergent in some clades), mainly because the annotation database lacks closely related sequences. In such cases, users can now easily customize the annotation database for their own samples. Please refer to https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database for more details.

5. Customizing Sqn template

--template_sbt <file>
                        The sqn template to generate the resulting genbank file. Go to
                        https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/#Template to generate your own template
                        file if you like. ['/home/gmeng/dev/MitoZ_private/mitoz/annotate/script/template.sbt']
