
Parallelism and co-expression inference speed #36

Open · TheAustinator opened this issue Oct 1, 2023 · 4 comments

@TheAustinator

Hi Liang, I'm attempting to run gene network inference with 100 seed genes on 92 cores by passing ncore=92 to the nebula function and running the 100 differential expression models in sequence. Presumably, each seed gene is being tested against all genes in the counts matrix in parallel. However, I'm finding that the model's utilization of parallelism is very low, and I'm wondering whether there's a way to improve it. Here are my ideas, in order of increasing optimization:

  1. Of course, I could parallelize across seed genes by kicking off many R jobs, but each job would have to load the entire counts matrix, which would quickly exhaust the machine's memory.
  2. For now, I'll chunk the counts matrix into 92 subsets along the gene axis and parallelize across the chunks rather than relying on the ncore parameter (see the sketch after the parameters below).
  3. Is there a bottleneck that could be removed in nebula's built-in parallelization?
  4. Could nebula be parallelized both across the counts-matrix genes (assuming that's how it's parallelized now) and simultaneously across seed genes for the specific case of co-expression/GRN inference, so that each (seed gene, counts-matrix gene) pair gets its own thread?

Here are my parameters; please let me know if there's a better set for high-fidelity co-expression inference that would run faster.
Data: Analyzing 10827 genes with 4 subjects and 8003 cells.
Params: kappa=200, ncore=64, model="NBLMM", method="LN"
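
A minimal sketch of the chunking idea from point 2, assuming a genes x cells matrix `counts`, a per-cell subject-ID vector `sid`, and a design matrix `df` (all hypothetical placeholder names), with the parameters above:

```r
library(nebula)
library(parallel)

# Split the gene rows into roughly equal chunks.
n_chunks <- 92
gene_idx <- seq_len(nrow(counts))
chunks   <- split(gene_idx, cut(gene_idx, breaks = n_chunks, labels = FALSE))

# One single-threaded nebula fit per chunk; parallelism comes from mclapply.
results <- mclapply(chunks, function(idx) {
  nebula(count = counts[idx, , drop = FALSE], id = sid, pred = df,
         model = "NBLMM", method = "LN", kappa = 200, ncore = 1)
}, mc.cores = n_chunks)
```

Because mclapply forks the parent process, the counts matrix is shared copy-on-write rather than reloaded per job, which sidesteps the memory blow-up from point 1.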

Low parallelism utilization:
[Screenshot: htop showing mostly single-core usage, 2023-09-30]

@TheAustinator (Author)

Update: I parallelized over gene-chunks of the counts matrix and efficiency skyrocketed. I'm a Python guy, so I wrote a wrapper with joblib, which I'll share once I've cleaned it up. But I'm sure many users would love a solution built into the nebula package, so even though I've solved the problem for myself, I'm happy to help brainstorm further.

[Screenshot: htop showing near-full multi-core utilization, 2023-10-01]
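
For anyone stitching the per-chunk outputs back together, a minimal sketch, assuming `results` is the list of per-chunk nebula returns from a run like the one sketched above and that each element carries nebula's usual `summary` data frame and `convergence` vector:

```r
# Recombine per-chunk nebula outputs into genome-wide tables.
summary_all     <- do.call(rbind, lapply(results, `[[`, "summary"))
convergence_all <- unlist(lapply(results, `[[`, "convergence"))
```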

@lhe17 (Owner)

lhe17 commented Oct 2, 2023 via email

@TheAustinator (Author)

I'm just running on a large Ubuntu AWS EC2 instance (a single machine with 192 cores). My Python wrapper may have obscured parallelism-related error messages from R: I pipe stdout and stderr to log files, which showed no sign of errors, but if R routes its logging elsewhere, I could have missed them. However, when using nebula's standard parallelization and watching htop (the application in the screenshots) for a while, I occasionally saw all cores fire up for a split second before returning to single-core usage.

Is there a chance that a single-threaded bottleneck is taking most of the time, with the parallel parts finishing quickly? Are you parallelizing over genes, or over something else? If your CPU utilization is near 100% (or the CPU load is roughly equal to the number of cores) when you run it, then it could just be something about my system.
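
One quick way to check for such a serial bottleneck is to time the same nebula call at different core counts on a small gene subset; a rough sketch, with `counts_sub`, `sid`, and `df` as hypothetical placeholder inputs. If elapsed time barely improves with more cores, the run is dominated by a single-threaded section:

```r
library(nebula)

# `counts_sub`: a few hundred genes so each run finishes quickly.
t1  <- system.time(nebula(count = counts_sub, id = sid, pred = df,
                          model = "NBLMM", method = "LN", ncore = 1))
t64 <- system.time(nebula(count = counts_sub, id = sid, pred = df,
                          model = "NBLMM", method = "LN", ncore = 64))
cat("speedup:", t1["elapsed"] / t64["elapsed"], "\n")
```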

Let me know if you're interested in the Python wrapper, although if this isn't just an issue with my system, I imagine you'd be more interested in an R fix. Happy to help if there's anything else I can do!

Cheers,
Austin

@lhe17 (Owner)

lhe17 commented Oct 4, 2023 via email
