Parameter combination recommendations #215

kiran-lee · 2024-07-03T09:31:48Z

on which platform/server? (Windows? Windows Sublinux? MacOS? Ubuntu? etc.)

Linux

MitoZ version?

3.6

How did you install MitoZ? (e.g. Docker, Udocker, Singularity, Conda-Pack, Conda, or source code)

Conda

Did you run a test after your installation, and was the test run okay?

Yes. OK.

How much data (roughly) did you use for mitogenome assembly? e.g. 5Gbp?

25 Gbp.

The command you used?

mitoz all \
  --outprefix sw \
  --thread_number 20 \
  --clade Chordata \
  --requiring_taxa Chordata \
  --genetic_code 2 \
  --species_name "Seychelles warbler" \
  --fq1 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R1.fastq.gz \
  --fq2 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R2.fastq.gz \
  --fastq_read_length 151 \
  --data_size_for_mt_assembly 25,0 \
  --assembler megahit \
  --kmers_megahit 39 59 79 99 119 141 \
  --memory 100 \
  --requiring_taxa Chordata \
  --min_abundance 0

Problem description

From your experience, do you have suggestions for parameter combinations to use on a sample of raw paired-end reads with a mean read depth of 15x?

I have tried 13 combinations that vary in 1) the sample used (either a ~17x or a 10x coverage sample), 2) the assembler used (megahit or spades), 3) the data size used for assembly (5, 25, 50 and 80 Gbp), 4) the k-mers ("Large": 39 59 79 99 119 141, or "Small": 21 31 41 51 61 71 81 91) and 5) whether the reads were trimmed or not. I attach a table summarising the combinations I have tried (MitoZ_combinations.xlsx).

The command that works best (shown above) finds all genes, but the assembly is non-circular and is split across two seq_ids (combo5summary.txt). The read depth across the genome looks OK apart from the beginning (combo5circos.depth.txt). This is the command:

mitoz all \
  --outprefix sw \
  --thread_number 20 \
  --clade Chordata \
  --requiring_taxa Chordata \
  --genetic_code 2 \
  --species_name "Seychelles warbler" \
  --fq1 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R1.fastq.gz \
  --fq2 102_ACTTAGATCG-CGGAATTCTT_L002__trimmed_paired_R2.fastq.gz \
  --fastq_read_length 151 \
  --data_size_for_mt_assembly 25,0 \
  --assembler megahit \
  --kmers_megahit 39 59 79 99 119 141 \
  --memory 100 \
  --requiring_taxa Chordata \
  --min_abundance 0

The raw paired-end reads can be found here:
102: https://cgr.liv.ac.uk/illum/LIMS26629_51a15827930a0b65/Raw/Sample_102/
53: https://cgr.liv.ac.uk/illum/LIMS25133_4f8b5ec41474a239/Raw/Sample_53-11998DH0147L01_4879/

Log messages from MitoZ (stdout and stderr, e.g., both `m.log` and `m.err` files)

Attached as combo5.log and combo5errorsummaryval.txt.


linzhi2013 · 2024-07-04T13:29:58Z

Hi @kiran-lee ,

Thanks for your detailed explanation!

Based on my experience with mammals (your samples are birds), 2-5 Gbp or 8 Gbp of data is usually enough to assemble a circular mitogenome.

I have no better recommendations for now. But maybe you could map all the raw data to the mitogenomes of some closely related species, using a loose cutoff so that as many alignable reads as possible are kept, and then assemble the mitogenome from the mapped reads with MitoZ?

Best
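
The read-recruitment idea suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested MitoZ workflow: it assumes `bwa` and a reasonably recent `samtools` are on the PATH, the function name `recruit_mito_reads` and all file names are hypothetical, and the loosened `bwa mem` thresholds (`-k 15 -T 20` instead of the defaults 19 and 30) are arbitrary starting points rather than recommended values.

```shell
#!/usr/bin/env bash
# Sketch only: recruit mitochondrial-like read pairs by mapping against the
# mitogenome of a closely related species. Assumes bwa and samtools are installed.
set -euo pipefail

# recruit_mito_reads REF FQ1 FQ2 OUT_PREFIX [THREADS]   (hypothetical helper)
#   REF: FASTA of one or more related mitogenomes (chosen by the user)
recruit_mito_reads() {
    local ref=$1 fq1=$2 fq2=$3 prefix=$4 threads=${5:-4}

    bwa index "$ref"

    # Loose mapping: a shorter minimum seed (-k) and a lower minimum output
    # score (-T) than the bwa mem defaults, so diverged reads can still align.
    # -G 12 discards only pairs in which BOTH mates are unmapped, so a pair is
    # kept as soon as one mate aligns. collate regroups mates before export.
    bwa mem -t "$threads" -k 15 -T 20 "$ref" "$fq1" "$fq2" \
        | samtools view -b -G 12 - \
        | samtools collate -u -O - \
        | samtools fastq -n \
            -1 "${prefix}_R1.fastq.gz" -2 "${prefix}_R2.fastq.gz" \
            -0 /dev/null -s /dev/null -
}

# Example call (file names are placeholders):
#   recruit_mito_reads related_mito.fa sample_R1.fastq.gz sample_R2.fastq.gz recruited 20
```

The resulting `recruited_R1.fastq.gz` and `recruited_R2.fastq.gz` could then be passed to `--fq1` and `--fq2` in the `mitoz all` command from the original post. Because the recruited set is much smaller than the raw data, the value given to `--data_size_for_mt_assembly` would probably need revisiting.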
