Order of pre-processing steps #48

Closed
parashardhapola opened this issue Nov 15, 2017 · 1 comment

parashardhapola commented Nov 15, 2017

Hi,

In the example notebook, seurat.ipynb, the function sc.pp.normalize_per_cell() is run before sc.pp.regress_out(). Would it not be better to regress out the effect of n_counts before normalization? I do not completely understand this, and it would be great if the authors could explain this order of pre-processing. Also, are there certain orders of steps that should always be avoided?

Thank you.

Best,
Parashar

@falexwolf
Member

I'm very sorry for having forgotten about this issue...

Note that sc.pp.normalize_per_cell() stores the total counts per cell prior to normalization as n_counts. See the examples here: https://scanpy.readthedocs.io/en/latest/api/scanpy.api.pp.normalize_per_cell.html

Performing the normalization removes the effect of having different total counts per cell by scaling each gene's count in a cell by that cell's total counts.
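As a minimal sketch of that scaling, in plain Python on a made-up toy matrix: this only illustrates the idea and is not scanpy's implementation. The helper `normalize_per_cell_sketch` and its `target_sum` parameter are hypothetical names chosen for this example.

```python
# Toy count matrix: rows are cells, columns are genes (made-up numbers).
counts = [
    [10, 30, 60],      # cell 0, total 100
    [100, 300, 600],   # cell 1, total 1000 (same composition, 10x deeper)
]


def normalize_per_cell_sketch(matrix, target_sum=1.0):
    """Scale every cell (row) so that its counts sum to target_sum.

    Hypothetical helper for illustration only. Assumes every cell has a
    nonzero total; returns the normalized matrix and the raw totals.
    """
    # Totals are taken BEFORE scaling, so they remain available afterwards.
    n_counts = [sum(row) for row in matrix]
    normalized = [
        [value * target_sum / total for value in row]
        for row, total in zip(matrix, n_counts)
    ]
    return normalized, n_counts


normalized, n_counts = normalize_per_cell_sketch(counts)
print(n_counts)     # raw totals per cell
print(normalized)   # both cells now share the same profile
```

The two toy cells differ only in sequencing depth, so after scaling their expression profiles coincide, while the raw totals are kept for later use.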

But one might want more: if there is still some correlation of a gene with n_counts after normalization, one concludes that the simple scaling done in normalization has not fully removed the effect of n_counts on that particular gene. Hence, using sc.pp.regress_out, one performs an additional gene-wise correction.
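The gene-wise correction can be sketched as follows, assuming an ordinary least-squares fit with an intercept for a single gene. This is an assumption about the general idea, not the actual code behind sc.pp.regress_out; the helper names and the toy numbers are invented for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    a_mean, b_mean = mean(a), mean(b)
    return sum((p - a_mean) * (q - b_mean) for p, q in zip(a, b)) / len(a)


def regress_out_sketch(expression, covariate):
    """Return the residuals of a least-squares fit of expression on covariate.

    Hypothetical helper for illustration only. Assumes the covariate is not
    constant across cells (otherwise the slope is undefined).
    """
    x_mean, y_mean = mean(covariate), mean(expression)
    sxx = sum((x - x_mean) ** 2 for x in covariate)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in zip(covariate, expression)]


# Made-up data: one gene whose normalized expression still rises with depth.
n_counts = [100.0, 200.0, 400.0, 800.0]
gene = [0.10, 0.12, 0.17, 0.24]

residuals = regress_out_sketch(gene, n_counts)
print(covariance(gene, n_counts))        # positive before the correction
print(covariance(residuals, n_counts))   # approximately zero afterwards
```

The residuals are, by construction, linearly uncorrelated with n_counts, which is the sense in which the remaining effect on that gene is removed.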

I have to admit that I have not investigated how necessary this is. As you know, this is adapted from the Seurat tutorial; I guess the authors of Seurat found it useful in some cases to fully remove the effect of n_counts on each single gene.
