
Memory usage, tidyverse copy-on-modify and mclapply #34

Closed
whtns opened this issue May 25, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@whtns
Contributor

whtns commented May 25, 2022

Thanks for this package. I have run the development release successfully on a few tumor samples. Lately, though, I have been running into persistent memory issues, especially in the phylogeny steps. I am running on our lab's local server with 64 GB of RAM and 8 cores. I have begun to suspect that the issue is the use of tidyverse inside mclapply routines. I recently converted every mclapply step into a basic for loop that assigns to a pre-allocated list, giving up any speed benefit from parallelization. That has helped a little. I have also considered trying dtplyr to ease the memory pressure.

Do you think this suspicion is reasonable? Have you already made efforts to reduce the memory footprint?
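A minimal sketch of the mclapply-to-for-loop conversion described above. The per-item function process_chunk() and the toy inputs are hypothetical stand-ins, not the package's actual phylogeny code. The assumed rationale: with mclapply, each forked worker tends to copy any memory pages it touches (R's garbage collector alone can trigger this) and holds its result until it is sent back to the parent, so peak memory roughly grows with mc.cores; the sequential loop keeps roughly one worker's footprint at the cost of speed.

```r
library(parallel)

# Hypothetical stand-in for one per-item step (not the package's real code)
process_chunk <- function(x) sum(x^2)

# Toy input: four named chunks of 25 integers each
inputs <- split(1:100, rep(1:4, each = 25))

# Parallel version: forks up to mc.cores workers. Forking is not available
# on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- mclapply(inputs, process_chunk, mc.cores = n_cores)

# Sequential replacement: pre-allocate the result list, then fill each slot
res_seq <- vector("list", length(inputs))
names(res_seq) <- names(inputs)
for (i in seq_along(inputs)) {
  res_seq[[i]] <- process_chunk(inputs[[i]])
}
```

Because each iteration writes into a slot that already exists, the loop avoids growing the list one element at a time, and only one intermediate result is alive beyond the accumulated output.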

@evanbiederstedt
Contributor

Hi @whtns

I was thinking about trying something like this, but I haven't had the time.

The right approach would be for you to open a pull request with those changes for us to review. If you could, please also provide statistics on the memory-usage improvements.

After forking the repo, please make these changes on the develop branch and open a PR against that branch.

Thanks, Evan

@teng-gao added the enhancement (New feature or request) label on Jun 3, 2022
@teng-gao
Collaborator

teng-gao commented Jul 2, 2022

Hello @whtns,

In the most recent release (0.1.3), we have completely replaced mclapply in the phylogeny step with RcppParallel. This led to a 10-20x speedup, and memory usage should now stay constant regardless of the number of threads. Please let us know if that helps with your memory issue!

Thanks,
Teng

Development

No branches or pull requests

3 participants