Any additional hints to reduce memory usage? #27

Closed
quokkamole opened this issue Dec 21, 2019 · 2 comments

@quokkamole

GetOrganelle v1.6.2e

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.6.8 (default, Oct 7 2019, 12:59:55) [GCC 8.3.0]
Python libs: numpy 1.17.4; sympy 1.4; scipy 1.3.2; psutil 5.4.2
Dependencies: Bowtie2 2.3.5.1; SPAdes 3.13.0; Blast 2.2.30; Bandage 0.8.1
./get_organelle_from_reads.py -1 /home/xub/host/opt/sra_processor/output/SRR7002309_pass_1.fastq.gz -2 /home/xub/host/opt/sra_processor/output/SRR7002309_pass_2.fastq.gz -o /home/xub/get0rganelle_output/ta_mitochondria_output -k 21,45,65,85,105 --disentangle-time-limit=1200 -R 500 -P 1000000 -F embplant_mt --memory-save

2019-12-21 12:38:13,652 - WARNING: removing duplicates was inactive, so that the pre-grouping was disabled.
2019-12-21 12:38:13,653 - INFO: Pre-reading fastq ...
2019-12-21 12:38:13,654 - INFO: Estimating reads to use ... (to skip, set '--reduce-reads-for-coverage inf')
2019-12-21 12:38:15,334 - INFO: Tasting 100000+100000 reads ...
2019-12-21 12:38:37,949 - INFO: Tasting 500000+500000 reads ...
2019-12-21 12:39:26,174 - INFO: Tasting 2500000+2500000 reads ...
2019-12-21 12:43:41,578 - INFO: Estimating reads to use finished.
2019-12-21 12:43:41,580 - INFO: Unzipping reads file: /home/xub/host/opt/sra_processor/output/SRR7002309_pass_1.fastq.gz (7112302855 bytes)
2019-12-21 12:50:01,188 - INFO: Unzipping reads file: /home/xub/host/opt/sra_processor/output/SRR7002309_pass_2.fastq.gz (7314668784 bytes)
2019-12-21 12:56:30,094 - INFO: Counting read qualities ...
2019-12-21 12:56:30,798 - INFO: Identified quality encoding format = Sanger
2019-12-21 12:56:30,800 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2019-12-21 12:56:30,934 - INFO: Mean error rate = 0.0047
2019-12-21 12:56:30,935 - INFO: Counting read lengths ...
2019-12-21 13:07:44,830 - INFO: Mean = 158.4 bp, maximum = 160 bp.
2019-12-21 13:07:44,831 - INFO: Reads used = 75000000+75000000
2019-12-21 13:07:44,831 - INFO: Pre-reading fastq finished.

2019-12-21 13:07:44,831 - INFO: Making seed reads ...
2019-12-21 13:07:44,832 - INFO: Seed bowtie2 index existed!
2019-12-21 13:07:44,832 - INFO: Mapping reads to seed bowtie2 index ...
2019-12-21 14:19:30,064 - INFO: Mapping finished.
2019-12-21 14:19:30,065 - INFO: Seed reads made: /home/xub/get0rganelle_output/ta_mitochondria_output/seed/embplant_mt.initial.fq (42323924 bytes)
2019-12-21 14:19:30,065 - INFO: Making seed reads finished.

2019-12-21 14:19:30,066 - INFO: Checking seed reads and parameters ...
2019-12-21 14:19:30,066 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2019-12-21 14:19:30,066 - INFO: If the result graph is not a circular organelle genome,
2019-12-21 14:19:30,066 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2019-12-21 14:19:50,995 - INFO: Pre-assembling mapped reads ...
2019-12-21 14:20:27,164 - INFO: Pre-assembling mapped reads finished.
2019-12-21 14:20:27,164 - INFO: Estimated embplant_mt-hitting base-coverage = 185.92
2019-12-21 14:20:27,165 - INFO: Estimated word size(s): 98
2019-12-21 14:20:27,165 - INFO: Setting '-w 98'
2019-12-21 14:20:27,165 - INFO: Setting '--max-extending-len inf'
2019-12-21 14:20:27,443 - INFO: Checking seed reads and parameters finished.

2019-12-21 14:20:27,443 - INFO: Making read index ...
2019-12-21 15:03:48,887 - INFO: Mem 17.049 G, 149411106 reads
2019-12-21 15:04:03,093 - INFO: Making read index finished.

2019-12-21 15:04:03,093 - INFO: Extending ...
2019-12-21 15:04:03,093 - INFO: Adding initial words ...
2019-12-21 15:04:16,654 - INFO: AW 3772354
2019-12-21 16:04:24,226 - INFO: Round 1: 149411106/149411106 AI 2996481 AW 28419298 Mem 5.646
2019-12-21 17:03:34,772 - INFO: Round 2: 149411106/149411106 AI 6364512 AW 73504374 Mem 13.613
2019-12-21 18:06:20,553 - INFO: Round 3: 149411106/149411106 AI 12032426 AW 149532000 Mem 20.882
2019-12-21 18:33:23,886 - ERROR:
Traceback (most recent call last):
File "./get_organelle_from_reads.py", line 3705, in main
echo_step=echo_step, log_handler=log_handler)
File "./get_organelle_from_reads.py", line 2362, in extending_no_lim
accepted_words.add(this_seq[i:i + word_size])
MemoryError

Total cost 21311.26 s
Please email jinjianjun@mail.kib.ac.cn or phylojin@163.com if you find bugs!
Please provide me with the get_org.log.txt file!

@Kinggerm
Owner

Hi @quokkamole,

Mitogenome assembly usually costs a lot of memory.

You can try to reduce the amount of data used by setting a smaller "--max-reads" value, such as 40000000 or less. You can also try increasing the "-w" value to 103 or larger. Both adjustments risk an incomplete result, but they are worth trying if you don't have enough physical memory.
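As a rough illustration of why a larger "-w" can help: the traceback above fails while adding `this_seq[i:i + word_size]` to a set, which means every accepted read contributes one word per sliding-window position. The Python sketch below is a simplified model, not GetOrganelle's actual code. The function names `words_in_read` and `max_words_per_read` are hypothetical, the 160 bp read length is taken from the maximum reported in the log, and the model ignores reverse complements and any other bookkeeping the tool may do.

```python
def words_in_read(seq, word_size):
    """Return the distinct substrings of length word_size found in seq.

    Hypothetical helper: mirrors the slicing pattern seen in the traceback
    (this_seq[i:i + word_size]) but is not taken from GetOrganelle.
    """
    return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}


def max_words_per_read(read_len, word_size):
    """Upper bound on the number of words a single read can contribute."""
    return max(read_len - word_size + 1, 0)


# Compare the auto-estimated word size from the log (98) with larger values.
for w in (98, 103, 110):
    print(f"-w {w}: at most {max_words_per_read(160, w)} words per 160 bp read")
```

Under this model each accepted read adds fewer words as "-w" grows, so the word set grows more slowly. A smaller "--max-reads" value works on the other factor, the number of reads that can be accepted at all. Neither effect is guaranteed to be enough on a given machine.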

Best,
Jianjun

@quokkamole
Author

Ok. Thank you. Nice pipeline BTW. I like it.
