Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speeding up test_clusters_approx #5

Closed
aguang opened this issue Mar 21, 2022 · 1 comment
Closed

speeding up test_clusters_approx #5

aguang opened this issue Mar 21, 2022 · 1 comment

Comments

@aguang
Copy link

aguang commented Mar 21, 2022

Like others who have opened issues, I'm interested in running this neat method to look at the difference of means in my clusters from single cell data. (Well, really I would like to adjust p-values for differential expression analysis, similar to #3 and #4, but as I understand it that extension is not possible right now). However running test_clusters_approx is painfully slow, as in #2. I took a look at the source code and it seems like the function could easily be sped up by running the piece where it runs cl_fun in a for loop in parallel (using future or any other parallellization package):

From

for(j in 1:ndraws) {
    if(phi[j] < 0) next
    
    # Compute perturbed data set
    Xphi <- X
    Xphi[cl == k1, ] <- t(orig_k1 + (phi[j] - stat)*k1_constant)
    Xphi[cl == k2, ] <- t(orig_k2 + (phi[j] - stat)*k2_constant)
    
    # Recluster the perturbed data set
    cl_Xphi <- cl_fun(Xphi)
    if(preserve_cl(cl, cl_Xphi, k1, k2)) {
      log_survives[j] <- -(phi[j]/scale_factor)^2/2 + (q-1)*log(phi[j]/scale_factor) - (q/2 - 1)*log(2) - log(gamma(q/2)) - log(scale_factor) -
        stats::dnorm(phi[j], mean=stat, sd=scale_factor, log=TRUE)
    }
  }

to

future_lapply(X = 1:ndraws, FUN = function(j) {
    if(phi[j] < 0) next
    
    # Compute perturbed data set
    Xphi <- X
    Xphi[cl == k1, ] <- t(orig_k1 + (phi[j] - stat)*k1_constant)
    Xphi[cl == k2, ] <- t(orig_k2 + (phi[j] - stat)*k2_constant)
    
    # Recluster the perturbed data set
    cl_Xphi <- cl_fun(Xphi)
    if(preserve_cl(cl, cl_Xphi, k1, k2)) {
      log_survives[j] <- -(phi[j]/scale_factor)^2/2 + (q-1)*log(phi[j]/scale_factor) - (q/2 - 1)*log(2) - log(gamma(q/2)) - log(scale_factor) -
        stats::dnorm(phi[j], mean=stat, sd=scale_factor, log=TRUE)
    }
  }

Would you be willing to implement this? I am also happy to make the modifications and submit a pull request for it.

@lucylgao
Copy link
Owner

Thank you very much for the suggestion! I totally agree that it would be helpful to make the code less embarrassingly parallel.

I just pushed a commit that replaces the for loop in the test_clusters_approx function with future_lapply, as per your suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants