Subsetting integrated data #3465
What discussion are you referring to? I don't see any reason why you shouldn't rescale after subsetting and, as you point out, rescaling would generally be preferred.
Someone states here that rescaling a subset of the integrated assay is not supported in Seurat v3. I am using v3.

Someone mentions here not to rescale a subset of the integrated assay (though they are talking about the SCTransform method).

In this case I notice the poster does not rescale their subset before re-clustering.

Here they discourage running FindVariableFeatures() on a subset of the integrated assay and recommend switching to the RNA assay, and someone mentions it as "still matter of debate" whether to work with a subset of the integrated assay.

Reading these has left me uneasy about the way I'm handling my sub-clustering approach. I guess I'm just looking for confirmation on whether there's a strong technical reason to discourage running ScaleData() after subsetting the integrated assay, at least for the standard v3 integration method: https://satijalab.org/seurat/v3.1/integration.html. Perhaps I'm just getting confused between best practice for SCTransform and for the standard approach.
I read that issue but couldn't see where anyone said not to rescale. I made a comment that you shouldn't repeat the integration using a subset of cells which is a separate issue.
When using SCTransform you can't run ScaleData after integration, as the integrated data is stored in the [...]

To be clear: you can run ScaleData on a subset of the integrated assay when using log-normalized data, but not when using SCTransform-normalized data.
I have integrated data, computed using the standard workflow (not SCTransform). I wish to subset the data for sub-clustering, using an iterative hierarchical clustering approach. I understand from the discussion I've been able to find that it's not recommended to rescale the subsetted integrated assay. The alternative options I've seen are to use the RNA assay, or to use the scaled data from the original object prior to subsetting.
The issue is that my RNA assay is too strongly affected by batch effects to use, and using the original scaled matrix seems strange for hierarchical clustering. I compute correlation distance on the scaled data to get the input for hierarchical clustering. Using genes scaled relative to a different set of cells seems like it may distort my correlation computation.
I've tried proceeding with a rescaled subset, which gives clusters that look sensible in the embedding and have clear DE genes (first dendrogram). Proceeding without rescaling, by contrast, gives a dendrogram that suggests a lack of well-defined subclusters and an overall failure to identify distinctions, even though we're confident the subgroup contains notable heterogeneity (second dendrogram). I worry that the globally scaled data doesn't show enough subgroup-specific contrast. What is the motivation behind discouraging scaling subsets of the integrated assay, and are there situations where it might be acceptable?
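To make the concern about scaling relative to a different set of cells concrete, here is a minimal toy sketch in plain Python (not Seurat code). It assumes scaling means per-gene z-scoring, which is only a simplified stand-in for ScaleData(); clipping and regression of covariates are not modelled. The expression values, gene labels, and helper names (`zscore_rows`, `correlation_distance`, `column`) are all made up for illustration.

```python
import math

def zscore_rows(matrix):
    """Scale each gene (row) to mean 0 and sd 1 across the given cells (columns)."""
    scaled = []
    for row in matrix:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / (len(row) - 1)
        sd = math.sqrt(var)
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled

def correlation_distance(u, v):
    """Return 1 - Pearson correlation between two cells' expression vectors."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    nu = math.sqrt(sum((a - mu) ** 2 for a in u))
    nv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # correlation undefined; treat as uncorrelated
    return 1.0 - cov / (nu * nv)

def column(matrix, j):
    """Extract one cell (column j) as a vector over genes."""
    return [row[j] for row in matrix]

# Hypothetical expression matrix: rows are genes, columns are cells.
# Cells 0-3 form the subgroup of interest (two sub-states: {0, 1} and {2, 3});
# cells 4-7 are a very different population that dominates global variance.
expr = [
    [1.0, 1.2, 2.0, 2.2, 9.0, 9.5, 9.2, 9.8],  # gene A
    [2.0, 2.1, 1.0, 1.1, 0.1, 0.2, 0.1, 0.3],  # gene B
    [5.0, 5.1, 5.0, 5.2, 5.1, 5.0, 5.2, 5.1],  # gene C (nearly flat)
    [0.5, 0.4, 1.5, 1.6, 7.0, 7.2, 6.9, 7.1],  # gene D
]
subset_cells = [0, 1, 2, 3]

# Option 1: scale across all cells, then take the subset (no rescaling).
global_scaled = zscore_rows(expr)
d_global = correlation_distance(column(global_scaled, 0), column(global_scaled, 2))

# Option 2: take the subset first, then rescale within it.
subset_expr = [[row[j] for j in subset_cells] for row in expr]
subset_scaled = zscore_rows(subset_expr)
d_subset = correlation_distance(column(subset_scaled, 0), column(subset_scaled, 2))

print(f"distance(cell 0, cell 2), globally scaled: {d_global:.3f}")
print(f"distance(cell 0, cell 2), subset rescaled: {d_subset:.3f}")
```

In this toy case the two sub-states look almost identical under global scaling, because every gene's centre and spread are set by the contrast with the other population. Rescaling within the subset pulls them apart. Whether real integrated data behaves the same way depends on how much of each gene's variance comes from cells outside the subset.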