
New Features

Guanliang MENG edited this page May 19, 2023 · 32 revisions

1. Faster raw data filter

Now we use the Fastp (https://github.com/OpenGene/fastp) program for raw data filtering, which is much faster.

By default, MitoZ only uses a subset (5 Gbp) of the raw fastq data for mitogenome assembly. You can change the amount of data to be used via the --data_size_for_mt_assembly option, or set --data_size_for_mt_assembly 0 to tell MitoZ to use all raw fastq data for assembly.
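To get a feel for what a data budget such as 5 Gbp means in terms of reads, the following sketch converts Gbp into a number of reads (or read pairs). It is back-of-the-envelope arithmetic only, not MitoZ code; the function name `reads_for_gbp` and the 150 bp read length in the example are illustrative assumptions.

```python
def reads_for_gbp(gbp: float, read_length: int, paired: bool = True) -> int:
    """Smallest number of reads (or read pairs, if paired) whose total length reaches gbp Gbp.

    Hypothetical helper for illustration only; MitoZ does not expose this function.
    """
    if gbp < 0 or read_length <= 0:
        raise ValueError("gbp must be >= 0 and read_length must be > 0")
    target_bases = round(gbp * 1_000_000_000)
    bases_per_unit = read_length * (2 if paired else 1)  # a pair contributes R1 + R2
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_bases + bases_per_unit - 1) // bases_per_unit


print(reads_for_gbp(5, 150))                # 150 bp paired-end reads -> 16666667
print(reads_for_gbp(5, 150, paired=False))  # 150 bp single-end reads -> 33333334
```

So, under these assumptions, the default 5 Gbp corresponds to roughly 16.7 million pairs of 150 bp reads.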

In MitoZ >= 3.5, the option takes the form --data_size_for_mt_assembly <float1>,<float2>, which subsamples the raw data (but not the clean data!). float1 is the amount (Gbp) of raw data to subsample, while float2 is the minimum amount (Gbp) of clean data required: if less than float2 Gbp of clean data remain after filtering, MitoZ will STOP running! When only float1 is set, float2 is assumed to be 0.

(1) Set float1 to 0 if you want to use ALL raw data;

(2) Set 0,0 if you want to use ALL raw data and NOT interrupt MitoZ even if you get very little clean data.
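The float1,float2 rules above can be summarised in a few lines of Python. This is a minimal sketch under our own assumptions, not MitoZ's actual implementation: the function names are hypothetical, and we assume that all raw data are used when the input holds less than float1 Gbp.

```python
from typing import Tuple


def parse_data_size(value: str) -> Tuple[float, float]:
    """Parse a hypothetical '--data_size_for_mt_assembly' value into (float1, float2).

    float1: Gbp of raw data to subsample (0 means use ALL raw data).
    float2: minimum Gbp of clean data required (0 when omitted).
    """
    parts = value.split(",")
    if len(parts) > 2:
        raise ValueError(f"expected '<float1>[,<float2>]', got {value!r}")
    float1 = float(parts[0])
    float2 = float(parts[1]) if len(parts) == 2 else 0.0
    if float1 < 0 or float2 < 0:
        raise ValueError("sizes must be non-negative")
    return float1, float2


def raw_gbp_to_use(float1: float, raw_gbp: float) -> float:
    # float1 == 0 means "use ALL raw data"; otherwise subsample up to float1 Gbp.
    # Assumption: if the input has less raw data than float1, all of it is used.
    return raw_gbp if float1 == 0 else min(float1, raw_gbp)


def should_stop(float2: float, clean_gbp: float) -> bool:
    # Stop when the clean data left after filtering is smaller than float2 Gbp.
    # With float2 == 0 this can never happen, which is why "0,0" never interrupts.
    return clean_gbp < float2
```

For example, parse_data_size("3,1.5") gives (3.0, 1.5): subsample 3 Gbp of raw data, and stop if fewer than 1.5 Gbp of clean data survive filtering.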

Don't forget to cite Fastp when you use this function.

In MitoZ 3.5, if you want to subsample your input clean data, use --skip_filter and --data_size_for_mt_assembly <float1>,<float2> together. For example, --skip_filter --data_size_for_mt_assembly 0,5 will extract 5 Gbp of the input clean data.

2. Two new assemblers

Now we include Megahit (https://github.com/voutcn/megahit) and SPAdes (https://github.com/ablab/spades) for mitogenome assembly. These two programs try multiple kmers in a single assembly run, which may sometimes give better results where MitoAssemble does not, or vice versa. See the warnings here: https://github.com/linzhi2013/MitoZ/wiki/Known-issues#8-megahit-gets-very-long-sequences.

Both programs also support limiting RAM usage via the --memory option, which is useful when your server does not have enough memory. Remember, however, that if this value is too small (especially with a large --thread_number, e.g. 24), the two programs may fail to run.

To choose a specific assembler, use the --assembler option.

===Warning: --assembler spades only accepts paired-end data, which means that you need to provide both --fq1 and --fq2, and they must be paired!===

If your fq1 and fq2 are not properly paired, you may get errors like those reported in https://github.com/linzhi2013/MitoZ/issues/193, https://github.com/ablab/spades/issues/420, https://www.biostars.org/p/311603/ and https://www.biostars.org/p/9514582/.

One solution is to use the https://github.com/linsalrob/fastq-pair tool to re-pair your fastq files.
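Before running --assembler spades, you may want to check whether fq1 and fq2 are in sync. The sketch below compares read IDs record by record and reports the first position where they differ. It is only a detector, not fastq-pair's algorithm (it does not repair anything), and it assumes unwrapped 4-line FASTQ records with IDs that differ between mates at most by a trailing /1 or /2; the function names are hypothetical.

```python
from itertools import islice, zip_longest
from typing import Iterator, Optional, TextIO


def read_ids(handle: TextIO) -> Iterator[str]:
    """Yield the read ID of each 4-line FASTQ record (no line wrapping assumed)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        # The ID is the header up to the first whitespace; a trailing /1 or /2 is dropped.
        fields = record[0][1:].split()
        read_id = fields[0] if fields else ""
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id


def first_unpaired(fq1: TextIO, fq2: TextIO) -> Optional[int]:
    """Return the 0-based index of the first record whose IDs differ, or None if all match.

    zip_longest pads the shorter file with None, so files of different length are caught too.
    """
    for i, (id1, id2) in enumerate(zip_longest(read_ids(fq1), read_ids(fq2))):
        if id1 != id2:
            return i
    return None


# Usage (hypothetical file names):
# with open("reads_1.fq") as f1, open("reads_2.fq") as f2:
#     print(first_unpaired(f1, f2))
```

A result of None means the two files look properly paired under these assumptions; any index points to where re-pairing (e.g. with fastq-pair) is needed.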

Don't forget to cite Megahit and SPAdes when you use them for mitogenome assembly.

See also https://github.com/linzhi2013/MitoZ/wiki/Tutorial.

3. Multi-kmers for MitoAssemble

You can now set multiple kmers for the MitoAssemble program, for example, --kmers 51 71 91. Depending on your data, some kmers may give better results than others. MitoZ simply performs an independent MitoAssemble run for each kmer.
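The "one independent run per kmer" behaviour can be pictured with the following sketch. It is not MitoZ's code: the `assemble` callable is a hypothetical stand-in for a single MitoAssemble run, and skipping duplicated kmers is our own assumption.

```python
from typing import Callable, Dict, Iterable, TypeVar

Result = TypeVar("Result")


def run_per_kmer(kmers: Iterable[int], assemble: Callable[[int], Result]) -> Dict[int, Result]:
    """Run one independent assembly per kmer and collect the results keyed by kmer."""
    results: Dict[int, Result] = {}
    for k in kmers:
        if k in results:
            continue  # assumption: a kmer given twice is only assembled once
        # Each run receives only its own k; nothing is shared between runs.
        results[k] = assemble(k)
    return results


# Usage with a stub in place of a real MitoAssemble run:
# run_per_kmer([51, 71, 91], lambda k: f"contigs_k{k}")
```

Because the runs are independent, their outputs can be compared side by side to see which kmer works best for a given dataset.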

4. Customizing the annotation database

Sometimes MitoZ fails to annotate some protein-coding genes (e.g. ATP8, which is very divergent in some clades), mainly because the annotation database lacks closely related sequences. In such cases, users can now easily customize the annotation database for their own samples. Please refer to https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database for more details.

5. Customizing the Sqn template

Use the --template_sbt option to supply your own Sqn template. Its help text reads:

--template_sbt <file>
                        The sqn template to generate the resulting genbank file. Go to
                        https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/#Template to generate your own template
                        file if you like. ['/home/gmeng/dev/MitoZ_private/mitoz/annotate/script/template.sbt']