Like others who have opened issues, I'm interested in running this neat method to test the difference in means between my clusters from single-cell data. (Well, really I would like to adjust p-values for differential expression analysis, similar to #3 and #4, but as I understand it that extension is not possible right now.) However, running test_clusters_approx is painfully slow, as in #2. I took a look at the source code, and it seems the function could easily be sped up by running the for loop that calls cl_fun in parallel (using future or any other parallelization package):
From
for(j in 1:ndraws) {
  if(phi[j] < 0) next
  # Compute perturbed data set
  Xphi <- X
  Xphi[cl == k1, ] <- t(orig_k1 + (phi[j] - stat)*k1_constant)
  Xphi[cl == k2, ] <- t(orig_k2 + (phi[j] - stat)*k2_constant)
  # Recluster the perturbed data set
  cl_Xphi <- cl_fun(Xphi)
  if(preserve_cl(cl, cl_Xphi, k1, k2)) {
    log_survives[j] <- -(phi[j]/scale_factor)^2/2 + (q-1)*log(phi[j]/scale_factor) - (q/2 - 1)*log(2) - log(gamma(q/2)) - log(scale_factor) -
      stats::dnorm(phi[j], mean=stat, sd=scale_factor, log=TRUE)
  }
}
to
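# NA marks draws that were skipped or did not preserve the clustering
# (assumes log_survives started out as all NA in the loop version)
log_survives <- unlist(future_lapply(X = 1:ndraws, FUN = function(j) {
  # `next` is not valid inside a function, so return NA to skip this draw
  if(phi[j] < 0) return(NA_real_)
  # Compute perturbed data set
  Xphi <- X
  Xphi[cl == k1, ] <- t(orig_k1 + (phi[j] - stat)*k1_constant)
  Xphi[cl == k2, ] <- t(orig_k2 + (phi[j] - stat)*k2_constant)
  # Recluster the perturbed data set
  cl_Xphi <- cl_fun(Xphi)
  if(!preserve_cl(cl, cl_Xphi, k1, k2)) return(NA_real_)
  # Return the value; assigning to log_survives[j] here would only change a local copy
  -(phi[j]/scale_factor)^2/2 + (q-1)*log(phi[j]/scale_factor) - (q/2 - 1)*log(2) - log(gamma(q/2)) - log(scale_factor) -
    stats::dnorm(phi[j], mean=stat, sd=scale_factor, log=TRUE)
}, future.seed = TRUE))  # future.seed = TRUE in case cl_fun uses random numbers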
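One thing to watch when moving from the for loop to future_lapply: the loop body becomes a function, so each draw has to return its value (with NA standing in for a skipped draw) rather than use `next` or assign into log_survives[j]. Here is a minimal toy sketch of that pattern, assuming the future and future.apply packages are installed. toy_survives and toy_log_weight are hypothetical stand-ins for the recluster-and-check step and the log-weight formula, not functions from the package.

```r
library(future)
library(future.apply)

# Run on two background R sessions; plan(sequential) would run the same code serially
plan(multisession, workers = 2)

# Hypothetical stand-ins for the quantities used in the loop
phi <- c(-0.5, 1.2, 0.3, 2.0, -1.0, 0.8)
ndraws <- length(phi)
toy_survives <- function(x) x > 0.5       # stand-in for cl_fun() + preserve_cl()
toy_log_weight <- function(x) -x^2 / 2    # stand-in for the log-weight expression

res <- future_lapply(seq_len(ndraws), function(j) {
  if (phi[j] < 0) return(NA_real_)             # replaces `next`
  if (!toy_survives(phi[j])) return(NA_real_)  # clustering not preserved
  toy_log_weight(phi[j])                       # returned, not assigned
})

# Collect one value per draw into a numeric vector, in the original order
log_survives <- unlist(res)

# Go back to sequential processing and shut down the workers
plan(sequential)
```

Because each draw is independent of the others, the results do not depend on how the draws are split across workers, and future_lapply returns them in the same order as the input.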
Would you be willing to implement this? I am also happy to make the modifications and submit a pull request for it.