Why using unnormalized data for FindVariableFeatures ? #2641

nebetbastet · 2020-02-20T09:54:13Z

Hi,

I switched to Seurat3 quite recently (a few months ago), having been used to Seurat2.

I use a "personal" normalization (scran) that I integrated into a Seurat object. By mistake, the normalized data were added to two slots:
object@assays$RNA@data AND object@assays$RNA@counts
In other words, I lost the raw data.

Until yesterday, I thought this was OK because I assumed the raw counts were not used again in the workflow.
It was only yesterday, when I took a closer look at your paper on integration, that I noticed that the "FindVariableFeatures" procedure is performed on raw data (at least with the default method, "vst").

Unfortunately, the results are very different depending on whether the "counts" slot holds raw counts or normalized counts: only 1/3 of the genes are consistent...

I have two questions:

1. Why did you choose un-normalized data to identify variable genes? Maybe I am mistaken, but I don't think this is explained in the paper or in your vignette.
2. If you use normalized data in this procedure by mistake (as I did), what bias can you expect? In other words, why is it "better" to use unnormalized data?

Thank you in advance for your answers!

timoast · 2020-02-26T19:55:48Z

This is explained in the section "Feature selection for individual datasets" in Stuart and Butler et al. 2019.

The section is shown here: [image not available]

It is not valid to run the same procedure (selection.method="vst") on normalized count data.
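To make the point above more concrete: the "vst" method fits a trend of variance against mean and scores each gene against that trend, so it depends on the mean-variance relationship of count data. The sketch below is plain Python, not Seurat code, and it does not implement vst itself (there is no loess fit, standardization, or clipping). It only shows, on two made-up genes, that a log transform with a pseudocount can reverse which gene looks more variable. The gene names and counts are hypothetical.

```python
import math
import statistics

# Toy counts for two hypothetical genes across 8 cells (made-up numbers).
counts = {
    "steady_high": [100, 110, 90, 105, 95, 100, 108, 92],
    "bursty_low": [0, 0, 0, 0, 0, 0, 0, 8],
}

# log1p stands in for a normalized "data" slot; real normalization
# (scran, LogNormalize) also rescales by a per-cell size factor.
log_values = {gene: [math.log1p(v) for v in vals] for gene, vals in counts.items()}


def summarize(expr):
    """Return {gene: (mean, sample variance)} for a dict of gene -> values."""
    return {
        gene: (statistics.mean(vals), statistics.variance(vals))
        for gene, vals in expr.items()
    }


def most_variable(stats):
    """Name of the gene with the largest variance."""
    return max(stats, key=lambda gene: stats[gene][1])


raw_stats = summarize(counts)
log_stats = summarize(log_values)

for label, stats in (("raw counts", raw_stats), ("log1p values", log_stats)):
    for gene, (mu, var) in stats.items():
        print(f"{label:12s} {gene:12s} mean={mu:8.3f} variance={var:8.3f}")
    print(f"{label}: highest variance -> {most_variable(stats)}")
```

On raw counts the highly expressed gene has the larger variance, while after the log transform the low, bursty gene does. A trend fitted under the assumption of count-like input is therefore being asked to describe a different relationship when the "counts" slot actually holds normalized values, which is consistent with the poor overlap reported above.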

duocang · 2021-07-02T09:52:27Z

@timoast

So FindVariableFeatures expects unnormalized data, which we assume is in the RNA assay.

Can I run SCTransform first and FindVariableFeatures later?

I get different variable features once SCTransform has been run. The unnormalized data in the SCT assay are the corrected read counts; do the corrected read counts work better than the counts from the RNA assay?

timoast closed this as completed Feb 26, 2020

zhewa mentioned this issue May 10, 2021

useAssay should be raw counts in seuratFindHVG compbiomed/singleCellTK#467

Closed
