out of memory #67
Hi guys, as a follow-up: using only half the reads in the SE threading step helped reduce the memory footprint, and the program is currently writing the SE.thread.ctp.gz file as we speak. I'm not sure how well memory is going to be managed for the PE threading. I also saw it was possible to use PacBio reads for SE threading; would these be uncorrected/raw PB reads, or do they need to be corrected before using them? Regards, |
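A minimal sketch of one way to subsample reads for an experiment like this, using seqtk (the tool choice and file names are assumptions, not something stated in this thread):

```bash
# Keep ~50% of read pairs; using the same seed (-s42) on both files keeps
# R1/R2 in sync. File names are placeholders.
seqtk sample -s42 R1.fq.gz 0.5 | gzip > R1.half.fq.gz
seqtk sample -s42 R2.fq.gz 0.5 | gzip > R2.half.fq.gz
```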
You could try using cortexjdk for this. Cortexjdk uses binary searches on the cortex graph for all operations and so does not need to load the graph into memory. You might have to contact Kiran to get him to expose that functionality to the command line, but I'm pretty sure the functionality is already there... |
Are your reads very long? Did you clean the graph (.ctx file) before threading the reads? Unfortunately the graph threading step does require a large amount of memory. Writing links to disk instead of storing them in memory would fix this issue - it sounds like cortexjdk might do that. Cleaning the de Bruijn graph, or cleaning more aggressively, may also fix these issues. |
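For reference, a minimal sketch of cleaning before threading (command names recalled from the McCortex docs; exact flags and file names are assumptions, so verify them against your version's usage output):

```bash
# Sketch: clean the raw de Bruijn graph, then thread reads against the
# cleaned graph. -m sets the memory limit; file names are placeholders.
mccortex63 clean -m 100G --out genome.clean.ctx genome.raw.ctx
mccortex63 thread -m 100G --seq reads.fq.gz --out SE.thread.ctp.gz genome.clean.ctx
```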
@jdmontenegro, to answer your question about threading of pacbio reads: My understanding is that you should not have to correct the pacbio reads before threading. In essence, threading is a way of correcting pacbio reads against illumina data. My understanding is also that mccortex will fill in gaps in the pacbio reads based on the graph if there are no junctions in that region of the graph. Do make sure you pick a k-mer size that is appropriate for your type of pacbio sequencing. For example, k=47 with pacbio reads is only going to work with CCS reads. |
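Back-of-the-envelope arithmetic for why that is (my numbers, not from the thread): at a raw PacBio error rate of roughly 12% per base, the chance that a 47-mer contains no errors is about 0.88^47 ≈ 0.25%, so essentially none of the k-mers from raw reads would match the Illumina graph. At a CCS-like error rate of ~1%, 0.99^47 ≈ 62% of 47-mers are error-free, which is workable.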
Hi everyone,
After extensive correspondence with Kiran Garimella, who has been extremely helpful, a few things are clear:
1) It is possible to divide the dataset into smaller chunks to reduce memory usage during threading.
2) Threading with PB reads requires corrected reads. This step is not intended for error correction but for cleaning up the de Bruijn graph and improving the assembly.
BTW, I'm definitely going to try cortexjdk; it seems it will be very helpful in this case.
Thank you for all your answers.
Regards,
Juan D. Montenegro
|
Hi Juan, yes, you can thread a few reads at a time (using …).
We don't need to store all links in memory as we thread, but unfortunately this implementation does. So you can safely do the threading step in stages and merge the results. |
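A sketch of what staged threading and merging might look like (everything here is an assumption: placeholder file names, flags recalled from the McCortex docs, and pjoin as the link-file merge command; verify against your McCortex build):

```bash
# Split the FASTQ into two chunks by alternating 4-line FASTQ records.
zcat reads.fq.gz | awk 'int((NR-1)/4)%2==0' | gzip > chunk1.fq.gz
zcat reads.fq.gz | awk 'int((NR-1)/4)%2==1' | gzip > chunk2.fq.gz

# Thread each chunk separately so peak memory only covers one chunk's links.
mccortex63 thread -m 500G --seq chunk1.fq.gz --out chunk1.ctp.gz genome.clean.ctx
mccortex63 thread -m 500G --seq chunk2.fq.gz --out chunk2.ctp.gz genome.clean.ctx

# Merge the per-chunk link files into one.
mccortex63 pjoin --out SE.thread.ctp.gz chunk1.ctp.gz chunk2.ctp.gz
```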
Hi guys,
First of all, thank you for this great software; it looks really impressive.
I would like to know if there is any workaround for an out-of-memory fatal error while threading the DBG.
I have a 1.7 Gbp genome sequenced to a coverage of ~60X. I am using the largest node I have available, which has 1.5 TB of RAM, but I still get the following error (I've removed full paths to make it simpler to read):
From the log, it looks like it should fit in memory, but it doesn't. Would it help if I reduced the number of reads by half (~30X) to reduce memory usage? Or if I split the fastq into two chunks and did a two-step threading of SE information to reduce peak memory? Or is the only solution to ask for a larger node (more memory)?
I look forward to any suggestion.
Kind regards,