SCTransform not regressing out variables - Seurat v5.0.1 #8148
Comments
I am running the analysis on a Mac (macOS Ventura 13.3.1). Code for cell cycle regression and mitochondrial gene percentage was changed from the pbmc code to reflect differences in gene name formatting for my mouse datasets, but no other changes were made between running it on the pbmc data and my own.
Thanks for catching this @ChristopherStephens21, I will push a fix.
I have pushed a fix here: 64a6495. Install it with remotes::install_github("satijalab/seurat", ref="develop"). Hope this helps!
I saw the same behavior as in the original post. However, after installing the development version, SCTransform no longer seems able to read the values from meta.data, as I get the following message:
Second step: Get residuals using fitted parameters for 19037 genes
Hi, I'm having the same issue as @Greysun109. Regression of mitochondrial gene content by SCTransform using Unfortunately, installing I'm also getting the same error message as @Greysun109 (Error: None of the requested variables to regress are present in the object.) although the variable is included in the metadata slot of the Seurat object. I'm on macOS Ventura 13.6.1.
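The error quoted above says the requested variables were not found in the object. One quick sanity check, independent of SCTransform itself, is to compare the requested names against the metadata column names. The following is a minimal base R sketch: `missing_regress_vars` is a hypothetical helper name, and the toy data frame stands in for the metadata of a real Seurat object (for example `obj@meta.data`).

```r
# Hypothetical helper: report which requested regression variables are
# absent from a metadata data frame (e.g. obj@meta.data for a Seurat object).
missing_regress_vars <- function(meta, vars) {
  setdiff(vars, colnames(meta))
}

# Toy metadata standing in for a real object's meta.data.
meta <- data.frame(
  nCount_RNA = c(1200, 3400, 560),
  percent.mt = c(2.1, 7.8, 4.3)
)

# "S.Score" is not a column of the toy metadata, so it is reported as missing.
missing <- missing_regress_vars(meta, c("percent.mt", "S.Score"))
print(missing)
```

An empty result means every requested name exists as a metadata column, in which case an error like the one above would point at the function rather than at the object.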
Hello, I am having the same issue as @Bildungsluecke. I have installed your version using
but I have this error at the end of SCTransform() calculations:
Before installing your version, I had the same problem as @ChristopherStephens21: the UMAP was the same whether or not variables were regressed out when running SCTransform(). Here is my sessionInfo(), in case it is useful:
EDIT (I have found clues about how to solve the problem!): In SCTransform(), the default option ncell = 5000 seems to split my dataset in two at some point, because with verbose = TRUE I can see:
Indeed, my dataset has 5640 cells, so the default option ncell = 5000 might produce some odd calculations. I therefore tried ncell = 5640, in addition to vars.to.regress = "mtRNA.percent", and it works fine (I get a different UMAP, indicating that regressing out mitochondrial RNA worked). In addition, I no longer see the following two lines:
indicating that the calculations are split in two at some point, I guess.
Remark 1: I have since reverted Seurat to the current release (not the version installed with remotes::install_github("satijalab/seurat", ref="develop")), and the problem can be solved there using ncell = your_number_of_cells_in_your_dataset.
Remark 2: Without any regression, ncell can be set to less than 5000 (I tried 5640/2) and the output is OK. So the problem with ncell seems to occur only when using vars.to.regress.
I hope this helps with troubleshooting the function. In any case, @saketkc, I think you should update how SCTransform handles this ncell option, either by defaulting it to the number of cells in the sample or by updating the underlying calculations, which I know nothing about. You should also mention ncell in your tutorial vignette, because I didn't see it anywhere. Otherwise, many people will run into the same issue...
Thanks again for your amazing work,
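The workaround described in this comment can be sketched as a small wrapper. This is only an illustration of the commenter's suggestion, not a confirmed fix: `sct_all_cells` is a hypothetical name, the Seurat package must be installed for the call to run, and the argument is spelled `ncells` here because that is the name in the Seurat documentation (the comment writes `ncell`).

```r
# Sketch of the workaround: fit the model on every cell by setting ncells
# to the number of cells in the object. `sct_all_cells` is a hypothetical
# wrapper; Seurat is only needed when the function is actually called.
sct_all_cells <- function(obj, vars.to.regress = NULL, ...) {
  Seurat::SCTransform(
    obj,
    vars.to.regress = vars.to.regress,
    ncells = ncol(obj),  # columns of a Seurat object are cells
    ...
  )
}

# Usage (assumes `obj` is a Seurat object with an "mtRNA.percent" column):
# obj <- sct_all_cells(obj, vars.to.regress = "mtRNA.percent")
```

Passing `ncol(obj)` avoids hard-coding a cell count such as 5640, so the same call works for datasets of any size.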
To add to this: I was seeing the same original issue of vars.to.regress not changing anything with Seurat_5.0.0. Can confirm that (for whatever reason) adding Never installed from
# Example 1: with regression
library(Seurat)
library(stringr)    # str_to_title()
library(ggplot2)    # ggtitle()
library(patchwork)  # combining plots with `+`

# FilePath: directory containing the 10x output (not given in the original post)
pbmc_data <- Read10X(data.dir = FilePath)
pbmc <- CreateSeuratObject(counts = pbmc_data)
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst")

# Cell cycle markers shipped with Seurat, split into S and G2M sets and
# converted to title case to match mouse gene names
s.genes <- str_to_title(cc.genes.updated.2019$s.genes)
g2m.genes <- str_to_title(cc.genes.updated.2019$g2m.genes)
pbmc <- CellCycleScoring(pbmc, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
pbmc <- PercentageFeatureSet(pbmc, pattern = "^mt-", col.name = "percent.mt")

# Normalize and regress out mitochondrial percentage and cell cycle scores
pbmc <- SCTransform(pbmc, vars.to.regress = c("percent.mt", "S.Score", "G2M.Score"), vst.flavor = "v2", method = "glmGamPoi", verbose = TRUE)
pbmc <- RunPCA(pbmc, verbose = TRUE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = TRUE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = TRUE)
pbmc <- FindClusters(pbmc, verbose = TRUE)
# Example 2: without regression (same pipeline, no vars.to.regress)
pbmc2 <- CreateSeuratObject(counts = pbmc_data)
pbmc2 <- NormalizeData(pbmc2)
pbmc2 <- FindVariableFeatures(pbmc2, selection.method = "vst")
pbmc2 <- CellCycleScoring(pbmc2, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
pbmc2 <- PercentageFeatureSet(pbmc2, pattern = "^mt-", col.name = "percent.mt")
pbmc2 <- SCTransform(pbmc2, vst.flavor = "v2", method = "glmGamPoi", verbose = TRUE)
pbmc2 <- RunPCA(pbmc2, verbose = TRUE)
pbmc2 <- RunUMAP(pbmc2, dims = 1:30, verbose = TRUE)
pbmc2 <- FindNeighbors(pbmc2, dims = 1:30, verbose = TRUE)
pbmc2 <- FindClusters(pbmc2, verbose = TRUE)
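# Compare the two UMAPs side by side
Plot1 <- DimPlot(pbmc, label = TRUE) + ggtitle("With MT and CC Regression")
Plot2 <- DimPlot(pbmc2, label = TRUE) + ggtitle("Without MT and CC Regression")
Plot1 + Plot2
# Check whether the per-cell metadata (including cluster assignments) is the same in both runs
identical(pbmc@meta.data, pbmc2@meta.data)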
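The `identical()` call compares only per-cell metadata. A more direct check of whether regression changed anything would compare the scaled residual matrices of the two objects. The sketch below is base R with toy matrices: `max_abs_diff` is a hypothetical helper, and the commented usage assumes the matrices are pulled with `GetAssayData()` from the SCT assay of each object.

```r
# Hypothetical helper: largest absolute difference between two numeric
# matrices, restricted to the rows and columns they share.
max_abs_diff <- function(a, b) {
  rows <- intersect(rownames(a), rownames(b))
  cols <- intersect(colnames(a), colnames(b))
  max(abs(a[rows, cols, drop = FALSE] - b[rows, cols, drop = FALSE]))
}

# Toy matrices standing in for the two scaled-residual matrices.
a <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("g1", "g2"), c("c1", "c2")))
b <- a
b["g2", "c1"] <- 2.5  # introduce a single difference

print(max_abs_diff(a, a))
print(max_abs_diff(a, b))

# Assumed usage with the objects from the examples (requires Seurat):
# m1 <- GetAssayData(pbmc,  assay = "SCT", layer = "scale.data")
# m2 <- GetAssayData(pbmc2, assay = "SCT", layer = "scale.data")
# max_abs_diff(m1, m2)
```

A result of zero on the real matrices would be consistent with the regression having had no effect; a nonzero value would show that the residuals differ between the two runs.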
Second step: Get residuals using fitted parameters for 22452 genes
Computing corrected count matrix for 22452 genes
Calculating gene attributes
Wall clock passed: Time difference of 23.15094 secs
Determine variable features
Regressing out percent.mt, S.Score, G2M.Score
|======================================================================| 100%
Centering data matrix
|======================================================================| 100%
Getting residuals for block 1(of 2) for counts dataset
Getting residuals for block 2(of 2) for counts dataset
Centering data matrix
|======================================================================| 100%
Finished calculating residuals for counts
Set default assay to SCT
Number of nodes: 8924
Number of edges: 274391
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9218
Number of communities: 25
Elapsed time: 0 seconds
Second step: Get residuals using fitted parameters for 22452 genes
Computing corrected count matrix for 22452 genes
Calculating gene attributes
Wall clock passed: Time difference of 22.24379 secs
Determine variable features
Centering data matrix
|======================================================================| 100%
Getting residuals for block 1(of 2) for counts dataset
Getting residuals for block 2(of 2) for counts dataset
Centering data matrix
|======================================================================| 100%
Finished calculating residuals for counts
Set default assay to SCT
Number of nodes: 8924
Number of edges: 274391
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9218
Number of communities: 25
Elapsed time: 0 seconds