
Speeding up dnmtools states #26

Closed
hchetia opened this issue Oct 31, 2022 · 6 comments
hchetia commented Oct 31, 2022

Hi,
Is there a way to make dnmtools states use more memory/cores?

Best
H

andrewdavidsmith (Collaborator) commented

Possibly. I'll keep this open, probably rename the issue, and add detail so we have a roadmap for how to do that. However, there's no simple switch we can flip to make this happen right away.

hchetia (Author) commented Nov 1, 2022

Hi @andrewdavidsmith
Thanks for your response.
I ran dnmtools states on a 44 GB SAM file. It has been running for more than 100 hours and has generated only 3 MB of epireads.
A snippet from `top`:
[screenshot of `top` output]

It seems like `states` could be made capable of using more memory and cores. The input SAM could be split into temporary chunks ("samlets"), and those samlets could be read in parallel.

Thanks.
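The splitting idea above can be sketched in Python. The function and SAM records below are hypothetical illustrations for discussion, not part of dnmtools:

```python
# Hypothetical sketch: partition SAM records into per-chromosome chunks
# ("samlets") that could then be processed in parallel. Each samlet keeps
# a copy of the header so it remains a valid SAM fragment.

def split_by_chromosome(sam_lines):
    header, chunks = [], {}
    for line in sam_lines:
        if line.startswith("@"):          # header lines start with '@'
            header.append(line)
            continue
        chrom = line.split("\t")[2]       # RNAME is the 3rd SAM field
        chunks.setdefault(chrom, []).append(line)
    return {c: header + reads for c, reads in chunks.items()}

# Tiny fabricated example (only the first four SAM fields shown):
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t*",
    "r2\t0\tchr2\t50\t*",
    "r3\t0\tchr1\t200\t*",
]
samlets = split_by_chromosome(sam)
```

In a real implementation the samlets would be written to temporary files rather than held in memory, but the grouping logic would be the same.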

@hchetia hchetia closed this as completed Nov 1, 2022
@hchetia hchetia reopened this Nov 1, 2022
andrewdavidsmith (Collaborator) commented

That's not the expected behavior. If the reads are not sorted in the expected order, there's a chance the computation turns from linear time into quadratic time. If you can find a way to share the data with me, I can check. I know you might not want to share all of it, but the problem might not be reproducible on just a small part. Let me know, and feel free to email me.

andrewdavidsmith (Collaborator) commented Nov 1, 2022

The right thing for us to do is have the code verify that the reads are sorted; at the moment, the code sometimes attempts to just proceed and compensate when it gets unexpected input.

I also notice from your screen capture that the program is using 17.2g of vmem, but only 3.2g of pmem, which suggests something else is going on and the program is likely thrashing. Are you sure you have sufficient available physical memory for that process?
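The sort verification described in the first paragraph could look roughly like the sketch below. This is an illustrative Python version, not dnmtools' actual (C++) code:

```python
# Illustrative check: within each chromosome block of a SAM file, mapped
# read positions must be nondecreasing. Returns False on the first
# out-of-order record.

def reads_sorted(sam_lines):
    prev_chrom, prev_pos = None, -1
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.split("\t")
        chrom, pos = fields[2], int(fields[3])
        if chrom == "*":                  # unmapped read, no position
            continue
        if chrom == prev_chrom and pos < prev_pos:
            return False                  # position went backwards
        prev_chrom, prev_pos = chrom, pos
    return True

# Fabricated records for demonstration (first four SAM fields only):
sorted_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t*",
    "r2\t0\tchr1\t250\t*",
    "r3\t0\tchr2\t50\t*",
]
unsorted_sam = sorted_sam + ["r4\t0\tchr2\t10\t*"]
```

A check like this is a single linear pass, so it costs far less than the quadratic slowdown it guards against.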

hchetia (Author) commented Dec 6, 2022

@andrewdavidsmith You were right: the algorithm was compensating for unsorted input. With sorted input, the conversion to epireads now runs successfully and quickly.
In terms of accelerating the program, I agree that the code should verify the sorting first.
Run details: ~15 minutes to generate 1.5 GB of epireads from a 35 GB deduplicated, sorted SAM input (hg38).
Adding CPU info and meminfo here in case it's helpful to you-
CPU(s): 96; Threads per core: 2; Cores per socket: 24; Sockets: 2; NUMA nodes: 2
Vendor ID: GenuineIntel; CPU family: 6; Model: 85; Model name: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
Memory: 790 GB
I don't understand the thrashing part. Sharing another snippet here:
[screenshot of `top` output]

andrewdavidsmith (Collaborator) commented

@hchetia I'm closing this because I think the issue has been solved. The program should not continue if the input is unsorted in a way that would cause a slowdown. Specifically, all reads from the same chromosome need to be consecutive.
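The specific requirement stated here (each chromosome's reads forming one consecutive block, with no particular ordering required between chromosomes) is weaker than full coordinate sorting, and can be checked in one pass. A hypothetical sketch, not dnmtools' actual code:

```python
# Illustrative check: every chromosome's reads must form a single
# consecutive block in the SAM file. A chromosome that re-appears after
# another chromosome has intervened violates the requirement.

def chroms_consecutive(sam_lines):
    seen, prev = set(), None
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        chrom = line.split("\t")[2]       # RNAME is the 3rd SAM field
        if chrom != prev:
            if chrom in seen:             # chromosome seen earlier: not grouped
                return False
            seen.add(chrom)
            prev = chrom
    return True

# Fabricated records (first four SAM fields only):
grouped = ["r1\t0\tchr1\t1", "r2\t0\tchr1\t2", "r3\t0\tchr2\t1"]
interleaved = ["r1\t0\tchr1\t1", "r2\t0\tchr2\t1", "r3\t0\tchr1\t2"]
```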
