Any additional hints to reduce memory usage? #27

quokkamole · 2019-12-21T12:57:21Z

GetOrganelle v1.6.2e

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.6.8 (default, Oct 7 2019, 12:59:55) [GCC 8.3.0]
Python libs: numpy 1.17.4; sympy 1.4; scipy 1.3.2; psutil 5.4.2
Dependencies: Bowtie2 2.3.5.1; SPAdes 3.13.0; Blast 2.2.30; Bandage 0.8.1
./get_organelle_from_reads.py -1 /home/xub/host/opt/sra_processor/output/SRR7002309_pass_1.fastq.gz -2 /home/xub/host/opt/sra_processor/output/SRR7002309_pass_2.fastq.gz -o /home/xub/get0rganelle_output/ta_mitochondria_output -k 21,45,65,85,105 --disentangle-time-limit=1200 -R 500 -P 1000000 -F embplant_mt --memory-save

2019-12-21 12:38:13,652 - WARNING: removing duplicates was inactive, so that the pre-grouping was disabled.
2019-12-21 12:38:13,653 - INFO: Pre-reading fastq ...
2019-12-21 12:38:13,654 - INFO: Estimating reads to use ... (to skip, set '--reduce-reads-for-coverage inf')
2019-12-21 12:38:15,334 - INFO: Tasting 100000+100000 reads ...
2019-12-21 12:38:37,949 - INFO: Tasting 500000+500000 reads ...
2019-12-21 12:39:26,174 - INFO: Tasting 2500000+2500000 reads ...
2019-12-21 12:43:41,578 - INFO: Estimating reads to use finished.
2019-12-21 12:43:41,580 - INFO: Unzipping reads file: /home/xub/host/opt/sra_processor/output/SRR7002309_pass_1.fastq.gz (7112302855 bytes)
2019-12-21 12:50:01,188 - INFO: Unzipping reads file: /home/xub/host/opt/sra_processor/output/SRR7002309_pass_2.fastq.gz (7314668784 bytes)
2019-12-21 12:56:30,094 - INFO: Counting read qualities ...
2019-12-21 12:56:30,798 - INFO: Identified quality encoding format = Sanger
2019-12-21 12:56:30,800 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2019-12-21 12:56:30,934 - INFO: Mean error rate = 0.0047
2019-12-21 12:56:30,935 - INFO: Counting read lengths ...
2019-12-21 13:07:44,830 - INFO: Mean = 158.4 bp, maximum = 160 bp.
2019-12-21 13:07:44,831 - INFO: Reads used = 75000000+75000000
2019-12-21 13:07:44,831 - INFO: Pre-reading fastq finished.

2019-12-21 13:07:44,831 - INFO: Making seed reads ...
2019-12-21 13:07:44,832 - INFO: Seed bowtie2 index existed!
2019-12-21 13:07:44,832 - INFO: Mapping reads to seed bowtie2 index ...
2019-12-21 14:19:30,064 - INFO: Mapping finished.
2019-12-21 14:19:30,065 - INFO: Seed reads made: /home/xub/get0rganelle_output/ta_mitochondria_output/seed/embplant_mt.initial.fq (42323924 bytes)
2019-12-21 14:19:30,065 - INFO: Making seed reads finished.

2019-12-21 14:19:30,066 - INFO: Checking seed reads and parameters ...
2019-12-21 14:19:30,066 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2019-12-21 14:19:30,066 - INFO: If the result graph is not a circular organelle genome,
2019-12-21 14:19:30,066 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2019-12-21 14:19:50,995 - INFO: Pre-assembling mapped reads ...
2019-12-21 14:20:27,164 - INFO: Pre-assembling mapped reads finished.
2019-12-21 14:20:27,164 - INFO: Estimated embplant_mt-hitting base-coverage = 185.92
2019-12-21 14:20:27,165 - INFO: Estimated word size(s): 98
2019-12-21 14:20:27,165 - INFO: Setting '-w 98'
2019-12-21 14:20:27,165 - INFO: Setting '--max-extending-len inf'
2019-12-21 14:20:27,443 - INFO: Checking seed reads and parameters finished.

2019-12-21 14:20:27,443 - INFO: Making read index ...
2019-12-21 15:03:48,887 - INFO: Mem 17.049 G, 149411106 reads
2019-12-21 15:04:03,093 - INFO: Making read index finished.

2019-12-21 15:04:03,093 - INFO: Extending ...
2019-12-21 15:04:03,093 - INFO: Adding initial words ...
2019-12-21 15:04:16,654 - INFO: AW 3772354
2019-12-21 16:04:24,226 - INFO: Round 1: 149411106/149411106 AI 2996481 AW 28419298 Mem 5.646
2019-12-21 17:03:34,772 - INFO: Round 2: 149411106/149411106 AI 6364512 AW 73504374 Mem 13.613
2019-12-21 18:06:20,553 - INFO: Round 3: 149411106/149411106 AI 12032426 AW 149532000 Mem 20.882
2019-12-21 18:33:23,886 - ERROR:
Traceback (most recent call last):
  File "./get_organelle_from_reads.py", line 3705, in main
    echo_step=echo_step, log_handler=log_handler)
  File "./get_organelle_from_reads.py", line 2362, in extending_no_lim
    accepted_words.add(this_seq[i:i + word_size])
MemoryError

Total cost 21311.26 s
Please email jinjianjun@mail.kib.ac.cn or phylojin@163.com if you find bugs!
Please provide me with the get_org.log.txt file!

Kinggerm · 2019-12-22T04:32:25Z

Hi @quokkamole ,

Assembling a mitogenome usually costs a lot of memory.

You can try reducing the amount of data used by setting a smaller "--max-reads" value, such as 40000000 or less. You can also try increasing the "-w" value, to 103 or larger. However, both adjustments risk an incomplete result. They are still worth trying if you don't have enough physical memory.
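To illustrate why a larger "-w" can help, here is a minimal sketch based only on the traceback line `accepted_words.add(this_seq[i:i + word_size])`, which suggests that every window of length `word_size` in an accepted read is added to a set. The function names `sliding_words` and `max_words_per_read` are hypothetical and are not part of GetOrganelle; the read length of 158 is the rounded mean from the log above. The numbers are an upper bound per read and strand, not a prediction of actual memory use.

```python
def sliding_words(seq, word_size):
    """Return the set of distinct substrings of length word_size in seq.

    Hypothetical helper: mimics the sliding window seen in the traceback,
    not the actual GetOrganelle implementation.
    """
    return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}


def max_words_per_read(read_len, word_size):
    """Upper bound on the number of windows one read can contribute."""
    return max(read_len - word_size + 1, 0)


# Rounded mean read length taken from the log (158.4 bp); an assumption
# for illustration, since real reads vary in length.
read_len = 158

for w in (98, 103, 110):
    print("word size", w, "-> at most", max_words_per_read(read_len, w), "words per read")
```

Under these assumptions, a larger word size yields fewer windows per read, so fewer strings can end up in the accepted-words set. The saving per read is modest, which is why lowering "--max-reads" is probably the stronger lever of the two.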

Best,
Jianjun

quokkamole · 2019-12-22T09:14:08Z

Ok. Thank you. Nice pipeline BTW. I like it.

quokkamole closed this as completed Dec 22, 2019
