
The assemble subcommand

Guanliang MENG edited this page Jun 22, 2023 · 1 revision

Use this subcommand to assemble mitogenomes from your FASTQ files. The assembled sequences are not yet annotated.

$ mitoz assemble -h
usage: mitoz assemble [-h] [--workdir <STR>] --outprefix <STR> [--thread_number <INT>] --fq1 <file>
                      [--fq2 <file>] [--insert_size <INT>] [--fastq_read_length <INT>]
                      [--assembler {mitoassemble,spades,megahit}] [--tmp_dir <STR>] [--kmers <INT> [<INT> ...]]
                      [--kmers_megahit <INT> [<INT> ...]] [--kmers_spades <INT> [<INT> ...]] [--memory <INT>]
                      [--resume_assembly] [--profiles_dir <STR>] [--slow_search] [--filter_by_taxa]
                      --requiring_taxa <STR> [--requiring_relax {0,1,2,3,4,5,6}] [--min_abundance <float>]
                      [--abundance_pattern <STR>] [--genetic_code <INT>]
                      [--clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}]

Mitochondrial genome assembly from input fastq files.

optional arguments:
  -h, --help            show this help message and exit

Common arguments:
  --workdir <STR>       workdir [./]
  --outprefix <STR>     output prefix
  --thread_number <INT>
                        thread number. Caution: For spades, --thread_number 32 can take 150 GB RAM! Setting this
                        to 8 to 16 is typically good. [8]

Input fastq files:
  --fq1 <file>          fastq 1 file. Setting only this option, without --fq2, means SE data. [required]
  --fq2 <file>          fastq 2 file (optional for mitoassemble and megahit, required for spades)
  --insert_size <INT>   insert size of input fastq files [250]
  --fastq_read_length <INT>
                        read length of fastq reads, used by mitoAssemble. [150]

Assembly arguments:
  --assembler {mitoassemble,spades,megahit}
                        Assembler to be used. [megahit]
  --tmp_dir <STR>       Set temp directory for megahit if necessary (See
                        https://github.com/linzhi2013/MitoZ/issues/176)
  --kmers <INT> [<INT> ...]
                        kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for
                        mitoassemble [71]
  --kmers_megahit <INT> [<INT> ...]
                        kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for
                        megahit [21 29 39 59 79 99 119 141]
  --kmers_spades <INT> [<INT> ...]
                        kmer size(s) to be used. Multiple kmers can be used, separated by space. Only for spades
                        ['auto']
  --memory <INT>        memory size limit for spades/megahit; insufficient memory will make the two programs halt
                        or exit [50]
  --resume_assembly     to resume previous assembly running [False]

Search mitochondrial sequences arguments:
  --profiles_dir <STR>  Directory containing 'CDS_HMM/', 'MT_database/' and 'rRNA_CM/'.
                        [/home/gmeng/.conda/envs/mybase/envs/mitozEnv.test3.6/lib/python3.8/site-
                        packages/mitoz/profiles]
  --slow_search         By default, we first use tiara to perform quick sequence classification (100 times
                        faster than usual!). However, it is valid only when your mitochondrial sequences are >=
                        3000 bp. If you have missing genes, set '--slow_search' to use the traditional search
                        mode. [False]
  --filter_by_taxa      filter out non-requiring_taxa sequences by mito-PCGs annotation to do taxa
                        assignment.[True]
  --requiring_taxa <STR>
                        filtering out non-requiring taxa sequences which may be contamination [required]
  --requiring_relax {0,1,2,3,4,5,6}
                        The relaxing threshold for filtering non-target-requiring_taxa. A larger number means
                        more relaxed filtering. [0]
  --min_abundance <float>
                        the minimum abundance of sequence required. Set this to any value <= 0 if you do NOT
                        want to filter sequences by abundance [10]
  --abundance_pattern <STR>
                        the regular expression pattern to capture the abundance information in the header of
                        sequence ['abun\=([0-9]+\.*[0-9]*)']
  --genetic_code <INT>  which genetic code table to use? 'auto' means determined by '--clade' option. [auto]
  --clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}
                        which clade does your species belong to? [Arthropoda]
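The help text above warns that spades with --thread_number 32 can take 150 GB RAM and suggests 8 to 16 threads. A minimal sketch of applying that advice in a wrapper script follows. The `cap_threads` helper and the limit of 16 are assumptions made for illustration, not part of MitoZ, and the command is printed rather than executed so it can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MitoZ): cap a requested thread count at 16,
# following the help text's note that 8 to 16 threads is typically good.
cap_threads() {
    local requested=$1 max=16
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

threads=$(cap_threads 32)

# Print the command instead of running it; file names and prefix are placeholders.
echo mitoz assemble --outprefix test --fq1 test.1.fq.gz --fq2 test.2.fq.gz \
    --assembler spades --thread_number "$threads" --memory 50 \
    --requiring_taxa Arthropoda --clade Arthropoda
```

Remove the leading `echo` to run the command once the printed line looks right.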

Single-end (SE) data: set --fq1 only.

Paired-end (PE) data: set both --fq1 and --fq2.
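The SE/PE rule can be sketched as a small wrapper that adds --fq2 only when a second file is given. The function name `build_mitoz_cmd`, the sample names, and the file names are hypothetical. The check for spades follows the help text, which lists --fq2 as required for that assembler. The function prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MitoZ): print a mitoz command for SE or PE input.
# Usage: build_mitoz_cmd <outprefix> <assembler> <fq1> [fq2]
build_mitoz_cmd() {
    local outprefix=$1 assembler=$2 fq1=$3 fq2=${4:-}

    # Per the help text, --fq2 is required for spades.
    if [ "$assembler" = "spades" ] && [ -z "$fq2" ]; then
        echo "error: spades needs paired-end data (--fq2)" >&2
        return 1
    fi

    local cmd=(mitoz assemble --outprefix "$outprefix" --fq1 "$fq1")

    # Add --fq2 only for paired-end data.
    if [ -n "$fq2" ]; then
        cmd+=(--fq2 "$fq2")
    fi

    cmd+=(--assembler "$assembler" --requiring_taxa Arthropoda --clade Arthropoda)

    # Joined with spaces for display only; paths containing spaces would need quoting.
    echo "${cmd[*]}"
}

build_mitoz_cmd se_sample megahit reads.fq.gz                    # single-end
build_mitoz_cmd pe_sample megahit reads.1.fq.gz reads.2.fq.gz    # paired-end
```

Printing first makes it easy to confirm that a single-end run carries no --fq2 before anything is launched.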

Example:

mitoz assemble \
--outprefix test \
--fq1 test.1.fq.gz \
--fq2 test.2.fq.gz \
--assembler megahit \
--requiring_taxa Arthropoda \
--genetic_code 5 \
--clade Arthropoda
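
If a run like the one above leaves genes missing, the help text points to --slow_search, and it also documents --min_abundance and --requiring_relax for loosening the filters. The following is a sketch of such a rerun, not a recommended recipe: the output prefix `test_relaxed` and the chosen values are assumptions, and whether relaxing all three settings at once suits your data is something to check case by case.

```shell
#!/usr/bin/env bash
# Sketch of a rerun with more permissive search and filtering settings.
# The file names and the output prefix are placeholders.
cmd=(
    mitoz assemble
    --outprefix test_relaxed
    --fq1 test.1.fq.gz
    --fq2 test.2.fq.gz
    --assembler megahit
    --requiring_taxa Arthropoda
    --clade Arthropoda
    --slow_search          # use the traditional search mode instead of tiara
    --min_abundance 0      # a value <= 0 disables the abundance filter
    --requiring_relax 1    # 0 is the default; larger values relax the taxa filter
)

# Print the command for review; to run it, replace the printf lines with: "${cmd[@]}"
printf '%s ' "${cmd[@]}"
printf '\n'
```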