
Why use unnormalized data for FindVariableFeatures? #2641

Closed
nebetbastet opened this issue Feb 20, 2020 · 2 comments


@nebetbastet

Hi,

I switched to Seurat 3 fairly recently (a few months ago), having previously used Seurat 2.

I use my own normalization (scran), which I imported into a Seurat object. By mistake, the normalized data were stored in two slots:
object@assays$RNA@data AND object@assays$RNA@counts
In other words, I lost the raw counts.

Until yesterday, I thought this was fine, as I assumed the raw counts were not used again later in the workflow.
Then, taking a closer look at your integration paper, I noticed that the "FindVariableFeatures" procedure is performed on raw data (at least with the default method, "vst").

Unfortunately, the results differ greatly depending on whether the "counts" slot holds raw or normalized counts: only about 1/3 of the genes are consistent between the two.
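One simple way to put a number on "only 1/3 of genes are consistent" is the fraction of one top-N variable-gene list that also appears in the other. The helper `shared_fraction` below is a hypothetical illustration, not part of Seurat, and the gene names in the usage lines are made up.

```python
def shared_fraction(genes_a, genes_b):
    """Return the fraction of genes in genes_a that also appear in genes_b."""
    set_a = set(genes_a)
    if not set_a:
        # An empty reference list has nothing to compare against.
        return 0.0
    return len(set_a & set(genes_b)) / len(set_a)


# Hypothetical top-3 variable-gene lists from a raw-count run and a normalized-count run.
raw_top = ["GeneA", "GeneB", "GeneC"]
norm_top = ["GeneA", "GeneD", "GeneE"]
print(shared_fraction(raw_top, norm_top))  # prints 0.3333333333333333
```

In practice the two lists would be the variable features returned by two separate runs with the same number of selected features, so the fraction is symmetric.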

I have two questions:

  • Why use unnormalized data to identify variable genes? I may be mistaken, but I don't think this is explained in the paper or in your vignette.
  • If normalized data are used in this procedure by mistake (as I did), what bias should one expect? In other words, why is it "better" to use unnormalized data?

Thank you in advance for your answers!

@timoast
Collaborator

timoast commented Feb 26, 2020

This is explained in the section "Feature selection for individual datasets" of Stuart and Butler et al. 2019.

The section shown here:

[Screenshot of the "Feature selection for individual datasets" section of the paper]

It is not valid to run the same procedure (selection.method = "vst") on normalized count data.
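To illustrate why the input matters, here is a minimal Python sketch of a vst-style score following the steps described in the paper: fit log10(variance) against log10(mean) across genes, standardize each gene using its observed mean and the fitted (expected) standard deviation, clip standardized values at sqrt(N) where N is the number of cells, and rank genes by the variance of the standardized values. This is not Seurat's code: the function name `vst_like_scores` is hypothetical, and a quadratic polynomial fit (`numpy.polyfit`) stands in for the loess fit Seurat uses. Because the fitted curve describes how variance scales with mean for whatever values it is given, feeding it normalized values instead of counts can change both the fit and the resulting ranking.

```python
import numpy as np


def vst_like_scores(counts, degree=2):
    """Score genes by a vst-style standardized variance (illustrative sketch).

    counts: genes x cells array of raw, nonnegative counts.
    Assumption: a polynomial fit of the given degree replaces Seurat's loess fit.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)

    # Zero-variance genes cannot be placed on a log scale; they keep a score of 0.
    keep = var > 0
    log_mean = np.log10(mean[keep])
    log_var = np.log10(var[keep])

    # Fit log10(variance) as a smooth function of log10(mean) across genes.
    coef = np.polyfit(log_mean, log_var, deg=degree)
    expected_sd = np.sqrt(10 ** np.polyval(coef, log_mean))

    # Standardize with the observed mean and the expected (fitted) sd,
    # then clip standardized values at sqrt(number of cells).
    z = (counts[keep] - mean[keep][:, None]) / expected_sd[:, None]
    z = np.minimum(z, np.sqrt(n_cells))

    scores = np.zeros(counts.shape[0])
    scores[keep] = z.var(axis=1, ddof=1)
    return scores
```

Sorting genes by these scores and keeping the top 2,000 would mirror Seurat's default of nfeatures = 2000. A gene whose counts vary far more than genes of similar mean (for example, zero in half the cells and high in the rest) gets a score well above 1, while genes that follow the fitted mean-variance trend score close to 1.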

@duocang

duocang commented Jul 2, 2021

@timoast

So FindVariableFeatures works on unnormalized data, which we assume is stored in the RNA assay.

Can I run SCTransform first and FindVariableFeatures later?

I get different variable features if SCTransform has been run. The unnormalized data in the SCT assay are the corrected read counts; do the corrected read counts work better than the counts from the RNA assay?
