Deal with Seurat object #2

YuelinYao · 2021-04-27T15:38:09Z

Hi Lucy,
Thank you so much for developing this tool.
I currently identify clusters with Seurat, and I would like to use your tool to test for a difference in means between two clusters. I think I should use the test_clusters_approx function for this:

X1 <- as.matrix(combined@assays[["RNA"]]@counts)
cluster <- combined@meta.data[["seurat_clusters"]]

# Clustering function passed to test_clusters_approx: reruns the whole Seurat pipeline on X
function_cluster <- function(X) {
  combined <- CreateSeuratObject(X)
  combined <- NormalizeData(combined, normalization.method = "LogNormalize", scale.factor = 10000)
  combined <- FindVariableFeatures(combined, selection.method = "vst", nfeatures = 2000)
  all.genes <- rownames(combined)
  variable_gene <- combined@assays[["RNA"]]@var.features
  combined <- ScaleData(combined, features = all.genes)
  combined <- RunPCA(combined, features = all.genes)
  combined <- RunUMAP(combined, features = all.genes)
  combined <- RunTSNE(combined, features = all.genes)
  combined <- FindNeighbors(combined, features = all.genes)
  combined <- FindClusters(combined, resolution = 0.5)
  return(combined@meta.data[["seurat_clusters"]])
}

cluster <- function_cluster(X1)
test_clusters_approx(X1, k1 = 1, k2 = 2, cl = cluster, cl_fun = function_cluster, ndraws = 10000)

Here, combined is the Seurat object. When I get to the final step, test_clusters_approx keeps running the clustering function over and over again...

Do you have any solutions for that?

Thank you!

lucylgao · 2021-05-06T19:53:03Z

I'm understanding your question as: how do I stop test_clusters_approx from calling function_cluster() 10000 times?

Unfortunately, that is an unavoidable step of approximating the p-value via Monte Carlo. You can find the details in Section 4.1 of the paper. The high-level idea is as follows: we simulate data sets in which the difference in means between the two estimated clusters being tested varies, subset to only the data sets where re-running the clustering algorithm still yields the two original estimated clusters, then calculate the proportion of those data sets where the difference in means is larger than what was observed.
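To make the "one clustering call per simulated data set" point concrete, here is a rough base-R sketch of a naive version of that idea. It is only an illustration under stated assumptions, not the clusterpval code: it skips the importance sampling described in the paper, it assumes a known sigma, and toy_cl_fun, same_cluster, and the toy data are hypothetical stand-ins for the real clustering pipeline.

```r
# Naive Monte Carlo illustration in base R. NOT the clusterpval implementation.
set.seed(1)
n <- 30; q <- 2; sigma <- 1                      # assumption: sigma is known
x <- matrix(rnorm(n * q, sd = sigma), n, q)      # toy data, rows = observations

# Hypothetical stand-in for the clustering function (average linkage, 3 clusters)
toy_cl_fun <- function(x) cutree(hclust(dist(x), method = "average"), k = 3)

cl <- toy_cl_fun(x)
k1 <- 1; k2 <- 2

# Contrast vector and observed difference in cluster means
nu <- (cl == k1) / sum(cl == k1) - (cl == k2) / sum(cl == k2)
diff_means <- as.vector(crossprod(x, nu))
stat <- sqrt(sum(diff_means^2))
dir_vec <- diff_means / stat
nu_norm2 <- sum(nu^2)

# TRUE if the members of old cluster k form exactly one cluster in cl_new
same_cluster <- function(cl_new, cl_old, k) {
  members <- cl_old == k
  labs <- unique(cl_new[members])
  length(labs) == 1 && all((cl_new == labs) == members)
}

ndraws <- 200
# Null draws of the difference-in-means magnitude: sigma * ||nu|| * chi_q
phi <- sigma * sqrt(nu_norm2) * sqrt(rchisq(ndraws, df = q))
kept <- logical(ndraws)
for (j in seq_len(ndraws)) {
  # Move the two clusters apart or together so the difference in means equals phi[j]
  x_phi <- x + ((phi[j] - stat) / nu_norm2) * outer(nu, dir_vec)
  cl_phi <- toy_cl_fun(x_phi)                    # one clustering call per draw
  kept[j] <- same_cluster(cl_phi, cl, k1) && same_cluster(cl_phi, cl, k2)
}

# Proportion of retained draws at least as extreme as the observed statistic
p_naive <- if (any(kept)) mean(phi[kept] >= stat) else NA_real_
```

The loop is where the time goes: every draw needs its own call to the clustering function, which is why an expensive pipeline such as Seurat gets rerun ndraws times.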

I imagine that this is a slow process if you are using Seurat as your clustering algorithm, so if runtime is the main issue of concern, you could try reducing the ndraws argument. That will simulate fewer data sets, and therefore cluster fewer data sets.
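One way to choose a smaller ndraws is to time a single clustering call and extrapolate, since each draw costs roughly one call. The snippet below is a minimal sketch of that idea; estimate_runtime is a hypothetical helper (not part of clusterpval), the k-means function and toy matrix stand in for the Seurat pipeline, and 500 is an arbitrary example value.

```r
# Hypothetical helper: rough runtime estimate (seconds) for ndraws clustering calls
estimate_runtime <- function(cl_fun, X, ndraws) {
  one_call <- system.time(cl_fun(X))[["elapsed"]]  # time one clustering run
  one_call * ndraws                                # each draw reruns cl_fun once
}

set.seed(1)
toy_X <- matrix(rnorm(200 * 5), nrow = 200)              # toy data
toy_kmeans <- function(X) kmeans(X, centers = 3)$cluster  # stand-in for function_cluster

est_full  <- estimate_runtime(toy_kmeans, toy_X, ndraws = 10000)
est_small <- estimate_runtime(toy_kmeans, toy_X, ndraws = 500)

# With the objects from the original post, the corresponding call would be:
# test_clusters_approx(X1, k1 = 1, k2 = 2, cl = cluster,
#                      cl_fun = function_cluster, ndraws = 500)
```

The trade-off is that fewer draws give a noisier Monte Carlo estimate of the p-value, so it may be worth checking how much the result changes across a couple of values of ndraws.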

YuelinYao · 2021-05-14T16:43:53Z

Thanks for your answer.

lucylgao closed this as completed May 14, 2021

aguang mentioned this issue Mar 21, 2022: speeding up test_clusters_approx #5 (Closed)
