Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate differential expression between user defined groups of clusters #397

Closed
aditisk opened this issue Dec 13, 2018 · 20 comments
Closed

Comments

@aditisk
Copy link

aditisk commented Dec 13, 2018

Hello,

I want to be able to use sc.tl.rank_genes_groups() to calculate differential expression between two groups of my choice. For example, if I have 16 clusters in my UMAP plot and I want to compare group 1 (all cells in clusters 1 to 8) to group 2 (all cells in clusters 9 to 16) how can I do this ?

Thank you in advance for any help.

@LuckyMD
Copy link
Contributor

LuckyMD commented Dec 13, 2018

Hi,

You could just create a new .obs variable with the two groups and the perform sc.tl.rank_genes_groups() over this variable. For example, you could do something like this:

adata.obs['groups'] = ['group 1' if int(i) < 9 else 'group 2' for i in adata.obs['louvain']]
sc.tl.rank_genes_groups(adata, groupby='groups', key_added='group_DE_results')

as there are only two groups the top-ranked genes for either groups will be the up-regulated genes in that group (and down-regulated in the other group) that are most differentially expressed between the groups.

You should however note that rank_genes_groups is not a particularly sensitive test for differential gene expression. While it is good for a quick exploratory analysis, other tools like limma or MAST may give you more DEG results.

@aditisk
Copy link
Author

aditisk commented Dec 14, 2018

Hi,

Thank you so much for the prompt response. I was able to make the comparisons following your method. As you suggested, I am going to try using MAST or limma for DEG testing in the future.

Thanks.

@LuckyMD LuckyMD closed this as completed Dec 15, 2018
@falexwolf
Copy link
Member

Thank you so much for fielding so many of the issues, @LuckyMD! 😄

Can we elaborate a bit further on this one, though? For simple two-group comparisons, rank_genes_groups with method='wilcoxon' (Wilcoxon-Rank-Sum/Mann-Whitney U test) should be a legit choice, shouldn't it? It's used in many of this year's Nature, Cell and Science single-cell papers, it's the default test of Seurat (https://satijalab.org/seurat/de_vignette.html) and several people reported that it performs well in Sonison & Robinson, Nat Meth (2018). So, I don't think one needs to encourage people to immediately go to the great and powerful MAST, limma and DESeq2.

Can you point me to a reference that shows that a Wilcoxon-Rank-Sum test is less sensitive? How is this even a useful statement if you don't talk about the false positives you buy in? We should look at an AUC that scans different p-values, right? A bit more than a year ago, @tcallies and I had a full paper draft discussing AUCs for marker gene detection formulated as a classification problem, but we never finished it. In the general setting, it's not at all straightforward to make the evaluation a well-defined problem and other people will for sure have done a better job. Unfortunately, I have never fully caught up with the literature, I fear...

@falexwolf
Copy link
Member

And, I think, we'd want to start advertising diffxpy in addition to the usual suspects, shouldn't we? Any comments from your side, @davidsebfischer?

@LuckyMD
Copy link
Contributor

LuckyMD commented Dec 17, 2018

@falexwolf I try to answer where I can.

I should probably have clarified a bit above. I would argue that most real data DE tests benefit from accounting for technical covariates. For example, you should probably not perform batch correction on your data and then do a wilcoxon rank sum test, but instead take the normalized (and log transformed) data or the raw counts and include a batch covariate in the test. This also holds for technical covariates that describe the complexity of the data (such as size factors or n_genes). Often these factors are not sufficiently accounted for by simple normalization techniques (especially for plate-based data), and are thus included in the DE testing framework. This is done in MAST (and MAST performs better with this detRate covariate in the Soneson & Robinson paper you cite above), and it is also done in a recent negative binomial DE test from Mayer et al, Nature 2018. When you are not able to fit the background variability in your model, you will have a lower sensitivity. Accounting for covariates is obviously not possible with t-tests or wilcoxon rank sum tests. Hence my statement about lower sensitivity. They did perform comparatively well in the DE method comparison, which is why I'd argue that they're useful for first pass exploratory applications (and marker gene detection when you don't want to use more fancy approaches like this). However, if you can account for technical covariates, that's probably a good approach to use.

Also, according to the comparison paper you mention, there are not more false positives when using MAST or limma compared to t-tests or Wilcoxon rank sum tests.

@davidsebfischer
Copy link

I agree with @LuckyMD about the points regarding covariates.

With respect to two group comparisons without confounding, rank-sum tests have less statitical power than t-tests (https://stats.stackexchange.com/questions/130562/why-is-the-asymptotic-relative-efficiency-of-the-wilcoxon-test-3-pi-compared), disclaimer I haven't checked this proof, I think this is a standard statistics result though, this is also discussed here https://stats.stackexchange.com/questions/121852/how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sampl. I havent run simulations to check how big the influence of the difference in power is on the kind of data we encounter. However, as also pointed out by the second link, violations of the distributional assumptions for t-test impact these results and these violations will be major on scRNAseq. Intuitively I would therefore tend to rank-sum tests.

With respect to diffxpy: We can account for other noise models in the two-group comparisons by performing model fitting, tutorial here. The bioarxiv will hopefully be up in the next few weeks.

@LuckyMD
Copy link
Contributor

LuckyMD commented Dec 21, 2018

Also, I just found time to read the links you posted @davidsebfischer. Shouldn't we always use a welch t-test instead of a t-test in marker gene detection according to your second stackexchange link? They state that If you don't have a good reason to assume equal variances in the groups, then use the Welch correction... if we have a group vs rest type of setup as we do in rank_genes_groups() at the moment, then we would definitely not expect a single cluster to have an equal variance to the combination of all other cells in other clusters.

I think the default is currently t-test-overestimate-var... being oblivious to exactly how that works, might it not be better to adapt that to a welch-t-test-overestimate-var or something like that @falexwolf?

@falexwolf
Copy link
Member

@LuckyMD

Thank you for the whole in-depth discussion. It makes a lot of sense! 😄

To your question: Scanpy has used Welch's adaption of Student's t-test from the very beginning.

@davidsebfischer

Thank you! I guess it would be nice to have a single-cell tutorial in diffxpy that shows higher sensitivity by accounting for technical covariates.

@LuckyMD
Copy link
Contributor

LuckyMD commented Dec 26, 2018

It's always a nice feeling when your understanding catches up to someone else's decision making a long time ago... ;)

@sophiamaedler
Copy link

Any news when diffxpy will be up on biorxiv? The package looks super interesting and I'd like to learn more about the background to the methods you implemented.

@wangjiawen2013
Copy link

Now seurat performs DE analysis using alternative tests including MAST and DESeq2 in a convinent way, such as FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "MAST"). So I hope that Scanpy could interated more methods too, such as diffxpy in this way:
sc.tl.rank_gene_groups(adata, method='diffxpy' or 'MAST'). Here is the hyperlink of DE analysis in Seurat:

https://satijalab.org/seurat/v3.0/de_vignette.html

@DingWB
Copy link

DingWB commented Nov 8, 2019

I agree with wangjiawen2013, Scanpy should add MAST and DESeq2 into sc.tl.rank_gene_groups too.

@wangjiawen2013
Copy link

This field is developing very fast, more and more advanced DE test methos are emerging, it's better to adopt these powerful methods.

@wangjiawen2013
Copy link

wangjiawen2013 commented Nov 8, 2019

One of the shortcomings of scanpy's default DE testing is that p-values (or FDR) of a few genes are very significant (equal 0 or approximately 0 in some datasets), then it's impossible to execute -log transformation, even there is only one 0. The volcano plot will be not beautiful because of the high significance.
@falexwolf

@flying-sheep
Copy link
Member

flying-sheep commented Nov 8, 2019

@davidsebfischer is developing https://github.com/theislab/diffxpy, which is a great way to do differential expression in a scanpy workflow. We should document this!

Do you think the p-values are inflated? If so, how could we propagate uncertainty to them?

@wangjiawen2013
Copy link

please refer to this:
theislab/diffxpy#127
theislab/diffxpy#128

@chris-rands
Copy link
Contributor

Thanks all for the interesting discussion- did a consensus emerge on the 'best' way to do differential expression testing in scanpy?

@LuckyMD
Copy link
Contributor

LuckyMD commented May 28, 2020

We typically use diffxpy for this. I would check out that repo. It's fast, and quite versatile. Documentation is also getting to a pretty good state I think.

@chris-rands
Copy link
Contributor

Thank you @LuckyMD. Naive question, but what is the advantage of diffxpy over sc.tl.rank_genes_groups? I read comments above about noise models and technical covariates, but I don't fully understand the model fitting aspect and both methods seem to offer similar tests like T-tests and Wilcoxon rank-sum.

@LuckyMD
Copy link
Contributor

LuckyMD commented May 29, 2020

sc.tl.rank_genes_groups offers only the T-test (including a second version of this) and Wilcoxon rank-sum test. These tests are also in diffxpy, but there are fare more sophisticated parametric models which you can (and probably should) use to test for differential expression in different setups. Ideally, to run a DE test you would want to model raw count data as a negative binomial and then add covariates to the model (like size factors, sample, condition, etc.). This is only possible in diffxpy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants