Subsetting integrated data #3465

fisherj-2212 · 2020-09-02T15:23:36Z

I have integrated data, computed using the standard workflow (not SCTransform). I wish to subset the data for sub-clustering, using an iterative hierarchical clustering approach. I understand from the discussion I've been able to find that it's not recommended to re-scale the subsetted integrated assay. The alternative options I've seen are to use the RNA assay, or to use the scaled data from the original object prior to subsetting.

The issue is that my RNA assay is too strongly affected by batch effects to use, and using the original scaled matrix seems strange for hierarchical clustering. I compute correlation distance on the scaled data to get my input for hierarchical clustering. Using genes scaled relative to a different set of cells seems like it may distort my correlation computation in an undesirable way.
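To make the concern concrete, here is a toy sketch in plain Python rather than Seurat code. The expression values are invented, and `zscore_rows` is only a hypothetical stand-in for per-gene centering and scaling. It compares the correlation between two cells from different putative subclusters when genes are scaled across all cells versus across the subset only.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Center and scale each gene (row) across whichever cells (columns) are supplied."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        scaled.append([(x - mu) / sd for x in row])
    return scaled


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def column(matrix, j):
    """Extract one cell (column) as a vector over genes."""
    return [row[j] for row in matrix]


# Invented toy data: rows are genes, columns are cells.
# Columns 0-3 are the subset of interest (subclusters A: 0,1 and B: 2,3);
# columns 4-7 are unrelated cells from the rest of the dataset.
expr = [
    [1.0, 1.2, 2.0, 2.2, 10.0, 11.0, 12.0, 10.0],
    [2.0, 2.2, 1.0, 1.2, 10.0, 12.0, 11.0, 10.0],
    [8.0, 8.2, 9.0, 9.2, 0.0, 1.0, 0.0, 1.0],
    [9.0, 9.2, 8.0, 8.2, 1.0, 0.0, 1.0, 0.0],
]
subset_cols = [0, 1, 2, 3]

# Scale each gene across all cells, then look only at the subset cells.
global_scaled = zscore_rows(expr)
# Scale each gene across the subset cells only.
subset_scaled = zscore_rows([[row[j] for j in subset_cols] for row in expr])

# Correlation between an A cell (column 0) and a B cell (column 2).
corr_global = pearson(column(global_scaled, 0), column(global_scaled, 2))
corr_subset = pearson(column(subset_scaled, 0), column(subset_scaled, 2))

print("correlation distance, global scaling:", 1 - corr_global)
print("correlation distance, subset scaling:", 1 - corr_subset)
```

In this contrived case, global scaling is dominated by the difference between the subset and everything else, so the two cells look nearly identical. Scaling within the subset exposes the contrast between A and B. Real data will be far less clear-cut.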

I've tried proceeding with a rescaled subset, which gives clusters that look sensible in the embedding and have clear DE genes (first dendrogram). Proceeding without rescaling, by contrast, gives a dendrogram that suggests a lack of well-defined subclusters and an overall failure to identify distinctions, even though we're confident the subgroup contains notable heterogeneity (second dendrogram). I worry that the globally scaled data doesn't show enough subgroup-specific contrast. What is the motivation behind discouraging scaling subsets of the integrated assay, and are there situations where it might be acceptable?

timoast · 2020-09-04T18:22:58Z

> I understand from the discussion I've been able to find that it's not recommended to re-scale the subsetted integrated assay

What discussion are you referring to? I don't see any reason why you shouldn't rescale after subsetting, and, as you point out, rescaling would generally be preferred.

fisherj-2212 · 2020-09-07T08:48:16Z

Someone states here that it is not supported to rescale a subset of the integrated assay in Seurat v3. I am using v3.
#1547

Someone mentions here not to rescale a subset of the integrated assay (though they are talking about the SCTransform method).
#1883

In this case, I notice the poster does not rescale their subset before re-clustering.
#2340

Here they discourage running FindVariableFeatures() on a subset of the integrated assay and recommend switching to the RNA assay, and someone mentions that it is "still matter of debate" whether to work with a subset of the integrated assay.
#1528

Reading these has left me uneasy about the way I'm handling my sub-clustering approach. I guess I'm just looking for confirmation on whether there's a strong technical reason to discourage running ScaleData() after subsetting the integrated assay, at least for the standard v3 integration method: https://satijalab.org/seurat/v3.1/integration.html.

Perhaps I'm just getting confused between best practice for SCTransform vs. the standard approach.

timoast · 2020-09-07T15:53:01Z

> Someone states here that it is not supported to rescale a subset of the integrated assay in Seurat v3. I am using v3.
> #1547

I read that issue but couldn't see where anyone said not to rescale. I made a comment that you shouldn't repeat the integration using a subset of cells, which is a separate issue.

> Perhaps I'm just getting confused between best practice for SCTransform vs. the standard approach.

When using SCTransform you can't run ScaleData after integration as the integrated data is stored in the scale.data slot (and so the integration results would be overwritten by re-running ScaleData), and I suspect this is the source of confusion around this issue.

To be clear: you can run ScaleData on a subset of the integrated assay when using log-normalized data, but not when using SCTransform-normalized data.

timoast added the more-information-needed We need more information before this can be addressed label Sep 4, 2020

no-response bot removed the more-information-needed We need more information before this can be addressed label Sep 7, 2020

timoast closed this as completed Sep 7, 2020

anastasiiaNG mentioned this issue Feb 10, 2022

How to perform subclustering and DE analysis on a subset of an integrated object #1883

Closed
