strange segfault #6
I should note that this machine has 256 GB of RAM, and disk space does not appear to be a limitation either. |
Hi Erik! Curious, and that's definitely not tied to the number of fastq files, as at this stage minia only considers the counted k-mers and no longer the input files. This is the first time I've seen this stage fail; it could be due to some special unhandled case in the graph structure. I'd be happy to assist with debugging.
|
It does complete with a shorter k-mer size (41) and the same abundance. I would love to share the data, but I will need to ask my collaborators; I will send you an email if I can share. I definitely understand that's the only way to really resolve this problem. Interestingly, the k=41 assembly has very few contigs (on the order of a few hundred kB) while the unitig set is several hundred MB. Also, this data is a little unusual: by library design (GBS/RadSeq), it consists of many small fragments. |
I see. Well, I'll keep an eye out for that email; otherwise, just let me know if you encounter this bug again in another dataset. I'd like to get a sense of whether this is a one-of-a-kind thing. |
Hello,
(2018-08-13 22:18:53) GATB-pipeline starting
(2018-08-13 22:18:53) Setting maximum kmer length to: 151 bp
(2018-08-13 22:18:53) Minia assembling at k=21 min_abundance=2
(2018-08-14 03:36:01) Minia assembling at k=41 min_abundance=2 |
Hi Mozart, thanks for reporting it. I'm assuming you cannot share the data either, so I'd be curious to see whether the problem occurs with different k-mer combinations. Can you please try the following command line? |
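For reference, the suggested multi-k invocation can be sketched as follows; the read paths and output prefix are placeholders standing in for the bracketed arguments quoted later in the thread, and the `echo` just prints the command rather than running it:

```shell
# Print the suggested gatb-pipeline command with placeholder arguments.
# reads_R1.fastq.gz / reads_R2.fastq.gz and assembly_multi_k are hypothetical.
echo ./gatb --kmer-sizes 31,51,71 \
  -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz -o assembly_multi_k
```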
Hi Ryan,
Thanks for your response. Actually, I would be OK with sharing the data, if it could be useful for you.
Let me know the best way to get it to you.
All the best!
Pavel
2018-08-26 7:41 GMT-04:00 Rayan Chikhi <notifications@github.com>:
… Hi Mozart, thanks for reporting it. I'm assuming you cannot share the data either, therefore I'd be curious to see if the problem occurs with different k-mer combinations. Can you please try the following command line? ./gatb --kmer-sizes 31,51,71 -1 [..] -2 [..] -o [..]
|
Hi Pavel, |
Hi Rayan,
Minia 3, git commit 4b32fec. The command line that I used was the following: [..] and the sequences.txt file contains the following reads: [..] (the reads are from GIAB). Let me know if you need more information to reproduce the error.
All the best |
Hi Alex, |
Hi Rayan:
Thanks for the answer. I was running the command on a cluster and it crashed on different nodes.
The node hardware is the following:
Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 47 GB of RAM, Lustre filesystem.
To me the problem is the memory the program uses, because it ran without problems when the coverage is less than 30X for a human dataset.
I mean the BCALM step uses less than 35 GB of RAM on such datasets.
I observed that the same error occurs in the following situations:
1. Coverage is more than 30X for a human genome.
2. The short-read length is 250 bp; I got the same error on Arabidopsis and human datasets.
Of course a solution is to increase the RAM to something like 100 GB, but unfortunately we don't have that kind of machine.
Do you think I should update to the current master?
Thanks in advance,
Best,
Alex
… On Oct 29, 2018, at 1:09 PM, Rayan Chikhi wrote:
Hi Alex,
Thanks for the very detailed bug report and sorry for the answer delay. Can you reproduce this problem on another machine? Unfortunately I cannot, the pipeline finished without crashing on my server using the command line you provided.
|
After an offline discussion with Alex, it seems that the problem he reported was caused by the program using more memory than was available on the system. |
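To make this failure mode easier to diagnose, one option is to cap the process's address space so an over-allocation fails fast with a clear error instead of thrashing or getting OOM-killed, and to give the assembler an explicit memory budget. A minimal sketch: the 40 GB cap is an arbitrary example value, and `-max-memory` (in MB) as the Minia flag name is an assumption worth checking against `./minia --help`:

```shell
# Cap virtual memory inside a subshell (value in kB; ~40 GB here) so any
# allocation beyond the cap fails immediately and visibly.
( ulimit -v 41943040; echo "address-space cap: $(ulimit -v) kB" )

# Hypothetical re-run with an explicit memory budget; the flag name and its
# MB unit are assumptions, so verify them against the tool's help output.
echo "./minia -in sequences.txt -kmer-size 41 -abundance-min 2 -max-memory 35000"
```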
I'm testing minia with ~400 input fastq.gz files. I observed a strange segfault and was immediately curious whether it might depend on the number of input files. The input is ~60 GB or so, around what might be normal for a lower-coverage human assembly, but derived from a reduced representation of the genome (this is for Capsicum, and we're using "genotyping-by-sequencing" data).
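With hundreds of compressed fastq inputs like this, the usual pattern is to pass the assembler a file of filenames, one path per line. A minimal sketch; the directory and file names are hypothetical, and the one-path-per-line list format for `-in` should be double-checked against the Minia documentation:

```shell
# Create two placeholder inputs just to illustrate (names hypothetical).
mkdir -p reads
touch reads/gbs_lane1.fastq.gz reads/gbs_lane2.fastq.gz

# Build a list file with one input path per line, then hand it to minia
# via -in instead of enumerating ~400 files on the command line.
ls reads/*.fastq.gz > sequences.txt
wc -l sequences.txt   # prints: 2 sequences.txt
```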
Here's the error log: