Deal with Seurat object #2

Closed
YuelinYao opened this issue Apr 27, 2021 · 2 comments

Comments

@YuelinYao

Hi Lucy,
Thank you so much for developing this tool.
I currently identify clusters with Seurat, and I would like to use your tool to test for a difference in means between two clusters. I think I can use the test_clusters_approx function for this:

X1 <- as.matrix(combined@assays[["RNA"]]@counts)
cluster <- combined@meta.data[["seurat_clusters"]]

# Re-runs the whole Seurat pipeline on a count matrix and returns the cluster labels
function_cluster <- function(X) {
  combined <- CreateSeuratObject(X)
  combined <- NormalizeData(combined, normalization.method = "LogNormalize", scale.factor = 10000)
  combined <- FindVariableFeatures(combined, selection.method = "vst", nfeatures = 2000)
  all.genes <- rownames(combined)
  variable_gene <- combined@assays[["RNA"]]@var.features
  combined <- ScaleData(combined, features = all.genes)
  combined <- RunPCA(combined, features = all.genes)
  combined <- RunUMAP(combined, features = all.genes)
  combined <- RunTSNE(combined, features = all.genes)
  combined <- FindNeighbors(combined, features = all.genes)
  combined <- FindClusters(combined, resolution = 0.5)
  return(combined@meta.data[["seurat_clusters"]])
}

cluster <- function_cluster(X1)
test_clusters_approx(X1, k1 = 1, k2 = 2, cl = cluster, cl_fun = function_cluster, ndraws = 10000)

Here, combined is the Seurat object. When I get to the final step, test_clusters_approx keeps running the clustering function over and over again...

Do you have any solutions for that?

Thank you!

@lucylgao
Owner

lucylgao commented May 6, 2021

I'm understanding your question as: how do I stop test_clusters_approx from calling function_cluster() 10000 times?

Unfortunately, that is an unavoidable step of approximating the p-value via Monte Carlo. You can find the details in Section 4.1 of the paper. The high-level idea is as follows: we simulate data sets in which the difference in means between the two estimated clusters being tested varies, keep only the data sets where rerunning the clustering algorithm still yields the two original estimated clusters, and then calculate the proportion of those data sets where the difference in means is larger than what was observed.
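
To make the "one clustering run per simulated data set" point concrete, here is a toy base-R sketch of the idea described above. It is not the clusterpval implementation: the data, the hierarchical clustering stand-in, the way the two clusters are shifted, and the direct sampling of the mean difference are all simplifying assumptions chosen for illustration, and every name in it (toy_cluster, perturb, same_clusters) is hypothetical.

```r
# Toy illustration only; none of these names come from clusterpval or Seurat.
set.seed(1)

# Small data set: 30 observations (rows), 2 features, no true clusters.
n <- 30; q <- 2; sigma <- 1
X <- matrix(rnorm(n * q, sd = sigma), n, q)

# Stand-in clustering function. In the thread this role is played by the
# Seurat pipeline, which is far more expensive per call.
toy_cluster <- function(X) cutree(hclust(dist(X), method = "average"), k = 3)

cl <- toy_cluster(X)
k1 <- 1; k2 <- 2
in1 <- cl == k1; in2 <- cl == k2
n1 <- sum(in1); n2 <- sum(in2)

# Observed distance between the two cluster means, and its direction.
diff_means <- colMeans(X[in1, , drop = FALSE]) - colMeans(X[in2, , drop = FALSE])
stat <- sqrt(sum(diff_means^2))
dir <- diff_means / stat

# Shift the two clusters along dir so that their means end up phi apart.
perturb <- function(phi) {
  Xp <- X
  shift <- (phi - stat) * dir
  Xp[in1, ] <- sweep(X[in1, , drop = FALSE], 2, (n2 / (n1 + n2)) * shift, "+")
  Xp[in2, ] <- sweep(X[in2, , drop = FALSE], 2, (n1 / (n1 + n2)) * shift, "-")
  Xp
}

# TRUE if both original clusters reappear exactly (labels may be permuted).
same_clusters <- function(cl_new) {
  preserved <- function(members) {
    labels <- unique(cl_new[members])
    length(labels) == 1 && sum(cl_new == labels) == sum(members)
  }
  preserved(in1) && preserved(in2)
}

# Assumption: draw the mean difference directly from a scaled chi distribution.
ndraws <- 200
phi <- sigma * sqrt(1 / n1 + 1 / n2) * sqrt(rchisq(ndraws, df = q))

n_calls <- 0
keep <- logical(ndraws)
for (j in seq_len(ndraws)) {
  n_calls <- n_calls + 1  # one full clustering run per simulated data set
  keep[j] <- same_clusters(toy_cluster(perturb(phi[j])))
}

# Proportion of retained data sets with a difference at least as large as observed.
# This is NaN if no simulated data set reproduced the original clusters.
p_hat <- sum(keep & phi >= stat) / sum(keep)
```

The loop is the part that matters for runtime: n_calls always ends up equal to ndraws, so the total cost is roughly ndraws times the cost of one clustering run.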

I imagine that this is a slow process if you are using Seurat as your clustering algorithm, so if runtime is the main concern, you could try reducing the ndraws argument. That will simulate fewer data sets, and therefore cluster fewer data sets.
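
One way to pick a smaller ndraws is to time a single clustering run and multiply. The sketch below does this in base R with a cheap stand-in clustering function; time_one_call and projected_seconds are hypothetical helpers written for this example, not part of clusterpval or Seurat, and the projection ignores any overhead outside the clustering calls.

```r
# Hypothetical helpers for budgeting ndraws; not part of clusterpval or Seurat.
time_one_call <- function(cl_fun, X) {
  # Elapsed wall-clock seconds for a single clustering run.
  unname(system.time(cl_fun(X))["elapsed"])
}
projected_seconds <- function(secs_per_call, ndraws) secs_per_call * ndraws

# Cheap stand-in for an expensive clustering function such as function_cluster.
set.seed(1)
X_toy <- matrix(rnorm(200), 100, 2)
toy_cluster <- function(X) kmeans(X, centers = 2)$cluster

secs <- time_one_call(toy_cluster, X_toy)
for (nd in c(10000L, 2000L, 500L)) {
  cat(sprintf("ndraws = %5d -> roughly %.1f seconds\n", nd, projected_seconds(secs, nd)))
}
```

With the objects from the question, the equivalent call would be time_one_call(function_cluster, X1). Keep in mind that fewer draws also means a noisier Monte Carlo approximation of the p-value, so the choice is a trade-off between runtime and precision.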

@YuelinYao
Author

Thanks for your answer!
