How to interpret this meryl count warning message #1565
Comments
You can leave it as is. There is only a small penalty for using too little memory: meryl will compute counts in batches, then merge the batches at the end. Here, it's warning that there might be 2 or 3 batches (perfectly reasonable; the merge is quite fast), but in fact it fit everything into one batch. For this specific step in canu, there is a benefit to using even smaller memory sizes. Canu will then be able to run multiple meryl jobs at once and merge the results at the end. So, instead of running one big 30 GB job, you can run 6 jobs at 6 GB each (so roughly 1/6th the wall time), then do a quick merge at the end.
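To make the "count in batches, then merge" idea concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not meryl's actual implementation: the function names and the `max_distinct` threshold (a stand-in for a memory limit) are assumptions made up for this example.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def count_in_batches(reads, k, max_distinct):
    """Count k-mers read by read, closing the current batch once its table
    holds more than max_distinct distinct k-mers (hypothetical memory limit)."""
    batches = []
    current = Counter()
    for read in reads:
        current.update(kmers(read, k))
        if len(current) > max_distinct:
            batches.append(current)
            current = Counter()
    if current:
        batches.append(current)
    return batches

def merge_batches(batches):
    """Sum the per-batch counts into one table."""
    total = Counter()
    for batch in batches:
        total.update(batch)
    return total

reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]

# A tight limit forces several batches; a huge limit gives a single batch.
batches = count_in_batches(reads, k=4, max_distinct=3)
merged = merge_batches(batches)
single = merge_batches(count_in_batches(reads, k=4, max_distinct=10**9))

# The merged result is the same no matter how many batches were used.
assert merged == single
```

The final counts do not depend on the number of batches; only the time spent counting and merging changes.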
Hmm... Interesting. canu repartitioned my parental short reads into approximately 4000 (4k) files.
And one follow-up question: my understanding is that the meryl steps for trio binning the child reads are configured to use the same amount of memory, i.e.
That's correct. I'm sure there's some sweet spot where the benefit of multiple parallel jobs balances the overhead of merging those results. I have no idea where it is. Running one gigantic batch will be very slow to count, but fast to merge. Running 4000 batches will be pretty quick (assuming 4000 CPUs) but will need enormous I/O bandwidth. Also, recall that this step doesn't listen to merylMemory. It's picking a partitioning based on the number of files - https://github.com/marbl/canu/blob/master/src/pipelines/canu/HaplotypeReads.pm#L368 (and about 100 lines before there too).
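As a rough illustration of partitioning by file count, the sketch below splits a list of input files into a fixed number of jobs of near-equal size. This is not the logic in HaplotypeReads.pm; the function name, the file names, and the choice of 100 jobs are all hypothetical.

```python
def partition_files(files, n_jobs):
    """Split files into at most n_jobs contiguous groups of near-equal size."""
    n_jobs = max(1, min(n_jobs, len(files)))
    base, extra = divmod(len(files), n_jobs)
    groups, start = [], 0
    for j in range(n_jobs):
        # The first `extra` groups take one additional file each.
        size = base + (1 if j < extra else 0)
        groups.append(files[start:start + size])
        start += size
    return groups

# Hypothetical file names standing in for ~4000 repartitioned read files.
files = [f"part-{i:04d}.fasta.gz" for i in range(4000)]
groups = partition_files(files, 100)
```

Fewer, larger groups mean fewer results to merge but longer individual jobs; more, smaller groups shift the cost toward merging and I/O.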
Thanks for the answers, Brian! I remember that now. I think I do have a bit of luxury with better I/O for our use case (scattering the count operations across one hundred Google VMs seems to be acceptable to us).
Closing as Brian answered the question perfectly.
Hi,
My meryl-count out files have the following warning messages. I don't know how to interpret them. Should I allocate more memory for meryl? Thanks!