How to merge clusters and what steps needed after merging in SCTransform workflow? #4128

Closed
denvercal1234GitHub opened this issue Feb 22, 2021 · 5 comments

Comments

@denvercal1234GitHub

denvercal1234GitHub commented Feb 22, 2021

Hi there,

In the tutorial, it states "# note that if you wish to perform additional rounds of clustering after subsetting we recommend re-running FindVariableFeatures() and ScaleData()." #1883 seems to suggest re-running everything on cells subsetted from a cluster.

But here I don't subset cells. I just merge cells from clusters or split a cluster:

  1. Suppose I processed my data with SCTransform() and got my clusters as usual. If I then merge some clusters, or split a cluster into smaller clusters, I do not need to re-run SCTransform() with percent.mt regressed (as I did with my original Seurat object), then ScaleData() to regress cell cycle genes, then RunPCA(), RunUMAP(), and FindClusters() all over again on my Seurat object, right? And if I do need to, do I set the assay argument to RNA?

  2. Assigning cell type identity to clusters #3239 and Combine clusters #3202 suggest that giving 2 clusters the same name will merge them, but wouldn't that mean I need to re-run FindAllMarkers() with the renamed clusters? After running RenameIdents(), only active.ident changes while seurat_clusters in meta.data remains the same, and I believe FindAllMarkers() uses active.ident and not seurat_clusters.

  3. Would anyone mind confirming that RenameIdents() really merges the 2 clusters and does not simply change their names?

Thank you for your help!

@denvercal1234GitHub denvercal1234GitHub changed the title How to FindVariableFeatures and ScaleData() after merging clusters in SCTransform workflow? What steps are needed after merging clusters in SCTransform workflow? Feb 22, 2021
@denvercal1234GitHub denvercal1234GitHub changed the title What steps are needed after merging clusters in SCTransform workflow? How to merge clusters and what steps needed after merging in SCTransform workflow? Feb 22, 2021
@samuel-marsh
Collaborator

Hi,

Not a member of the dev team, but hopefully I can be helpful.

  1. If you simply want to change the clusters for visualization or marker detection, then no, you don't need to re-run anything after merging clusters. If you haven't changed which cells are present in the object and have simply created a new active.ident, re-running the processing should just give the same original clustering anyhow.

  2. Yes, if you are looking to identify markers that are different between the clusters then you should re-run FindMarkers or FindAllMarkers after merging clusters because the results will be different.

  3. Merging clusters and changing their names in the active.ident slot are the same thing. The active.ident slot is a factor, so if two cells have the same value/name for active.ident they are considered the same (hence merged). As you state, the original clustering can be found in meta.data, and you can even stash multiple clustering results in meta.data if you like. As you point out, what matters for the outputs of things like FindMarkers or FindAllMarkers is the current active.ident. So as long as active.ident contains your new cluster names with merged values, then yes, the clusters are merged as far as downstream analyses are concerned; if you switched back to the original clustering from meta.data, they would be unmerged again.
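
The three points above can be sketched roughly as follows. This is a minimal illustration, not an official recipe: it assumes the bundled pbmc_small demo object (log-normalized RNA assay, clusters "0", "1", "2") as a stand-in for your own SCTransform-processed object, and the names clusters_original, clusters_merged, and merged_1_2 are made up for the example.

```r
library(Seurat)

# Assumption: pbmc_small, the small demo object shipped with Seurat,
# stands in for your own object.
data("pbmc_small")
seu <- pbmc_small

# Stash the current identities in meta.data so the merge can be undone later.
# (clusters_original is a hypothetical column name.)
seu$clusters_original <- Idents(seu)

# Give clusters "1" and "2" the same new name. Cells from both clusters
# now share a single level of the active identity factor, i.e. they are merged.
seu <- RenameIdents(seu, "1" = "merged_1_2", "2" = "merged_1_2")

# Cross-tabulate new vs. original identities to check the merge.
print(table(merged = Idents(seu), original = seu$clusters_original))

# Stash the merged identities as well, so both groupings live in meta.data.
seu$clusters_merged <- Idents(seu)

# Marker detection uses the active identities, so re-run it after merging.
markers_merged <- FindAllMarkers(seu, only.pos = TRUE)

# Switching the active identities back to the stashed column "unmerges" the clusters.
Idents(seu) <- "clusters_original"
```

No normalization, PCA, or clustering step is repeated here; only the identity labels and the marker test change.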

Hope that helps!
Best,
Sam

@denvercal1234GitHub
Author

Thank you very much, Sam!

@denvercal1234GitHub
Author

Hi @samuel-marsh - If I subset out certain cells from the Seurat object after running the whole pipeline, should I now rerun SCTransform() to normalize and scale in order to calculate/visualize the expression levels of certain genes of the selected cells in this new object? Thank you again for your response!

@saketkc
Collaborator

saketkc commented Aug 10, 2021

Hi @denvercal1234GitHub, SCTransform need not be run twice - the data has already been normalized.

@saketkc saketkc closed this as completed Aug 10, 2021
@samuel-marsh
Collaborator

Hi @denvercal1234GitHub, if you just want to visualize gene expression, then no. But if you want to analyze that subset further, you should basically run the whole pipeline again, as the variable genes, PCA, SNN, Louvain, UMAP, etc. are all based on the whole set of cells instead of just the subset of interest.
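
A rough sketch of the two cases, again assuming the bundled pbmc_small demo object (log-normalized RNA assay) in place of a real SCTransform-processed object. The parameter values (nfeatures, npcs, dims, resolution) are illustrative choices for this tiny dataset, not recommendations, and whether SCTransform() itself should be repeated in an SCT workflow is left open here since the replies above differ on that point.

```r
library(Seurat)

# Assumption: pbmc_small stands in for your own processed object.
data("pbmc_small")
seu <- pbmc_small

# Keep only the cells from the clusters of interest.
sub <- subset(seu, idents = c("0", "1"))

# Case 1: just looking at expression. The existing normalized values are
# reused as-is, so nothing is re-run before plotting.
p <- VlnPlot(sub, features = "MS4A1")

# Case 2: analyzing the subset further. Variable features, PCA, the SNN graph,
# clusters, and UMAP were all computed on the full object, so recompute them
# on the subset alone.
sub <- FindVariableFeatures(sub, nfeatures = 50)
sub <- ScaleData(sub)
sub <- RunPCA(sub, npcs = 10, verbose = FALSE)
sub <- FindNeighbors(sub, dims = 1:10)
sub <- FindClusters(sub, resolution = 0.8)
sub <- RunUMAP(sub, dims = 1:10)
```

After this, the identities of sub are the clusters found within the subset, and DimPlot(sub, reduction = "umap") would show the subset-specific embedding.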
