Split rank_genes_groups #1081

Koncopd · 2020-03-03T16:35:30Z

Splits rank_genes_groups into helper functions.
Related to 1), 2) of this.
I don't see any point in using a DataFrame internally (the second point); a dict is more convenient here.

falexwolf · 2020-03-06T11:19:47Z

Thank you! This looks great!

What do you think, @ivirshup?

Now, @Koncopd, can you also add a logg.warn("default of method has been changed to 't-test' from 't-test_overestim_var'")?

And actually change the value?
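A minimal sketch of how such a notice could be emitted when a caller relies on the default. It uses the standard library `warnings` module instead of scanpy's `logg`, and the helper name `resolve_method` and the `None` sentinel are assumptions made for illustration, not code from this PR.

```python
import warnings

OLD_DEFAULT = "t-test_overestim_var"
NEW_DEFAULT = "t-test"


def resolve_method(method=None):
    """Return the test method, warning only when the default is used."""
    if method is None:
        warnings.warn(
            f"default of method has been changed to {NEW_DEFAULT!r} "
            f"from {OLD_DEFAULT!r}",
            FutureWarning,
            stacklevel=2,
        )
        return NEW_DEFAULT
    # An explicitly chosen method is passed through without a warning.
    return method
```

Callers who pass `method` explicitly never see the warning, so only code that depended on the old default is notified.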

Finally, can you test this on the Scanpy default tutorial and add a test based on the numerical values in addition to the image-based tests? https://github.com/theislab/scanpy/blob/7e058a1a6a082e34a101d65fc7ac5e9cb6563220/scanpy/tests/notebooks/test_pbmc3k.py#L109-L111
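As a rough illustration of what a value-based test could look like, the sketch below checks a Welch t-statistic on toy data against a number worked out by hand. The function `welch_t_scores` and the toy arrays are hypothetical stand-ins; a real test would call `rank_genes_groups` on the tutorial data and compare against stored reference values.

```python
import numpy as np


def welch_t_scores(group, rest):
    """Welch t-statistic per gene (column) for group vs. rest."""
    mean_g, mean_r = group.mean(axis=0), rest.mean(axis=0)
    var_g = group.var(axis=0, ddof=1)
    var_r = rest.var(axis=0, ddof=1)
    return (mean_g - mean_r) / np.sqrt(var_g / len(group) + var_r / len(rest))


def test_scores_match_reference():
    group = np.array([[1.0], [2.0], [3.0]])
    rest = np.array([[2.0], [4.0], [6.0], [8.0]])
    # Worked by hand: (2 - 5) / sqrt(1/3 + (20/3)/4) = -3 / sqrt(2)
    expected = np.array([-3.0 / np.sqrt(2.0)])
    np.testing.assert_allclose(welch_t_scores(group, rest), expected, rtol=1e-10)
```

Unlike an image comparison, a failure here reports which value drifted and by how much.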

Thank you very much!

Otherwise, this is good to be merged, IMO.

Koncopd · 2020-03-06T12:45:17Z

@falexwolf, I'll do that.

Koncopd · 2020-03-06T12:48:26Z

Also there are 2 numerical tests here
https://github.com/theislab/scanpy/blob/master/scanpy/tests/test_rank_genes_groups.py
https://github.com/theislab/scanpy/blob/master/scanpy/tests/test_rank_genes_groups_logreg.py

gokceneraslan · 2020-03-06T13:36:42Z

Guys, now that you're dissecting rank_genes_groups, do you mind adding additional info to the results: the fraction of cells expressing the genes (similar to what we have in dotplots)? People calculate it manually every time, and it can be painful for those who are not familiar with pandas.

Koncopd · 2020-03-06T14:10:41Z

@gokceneraslan could you point me to code or an example?

Koncopd · 2020-03-06T14:11:35Z

The test failures seem unrelated to this PR.

gokceneraslan · 2020-03-06T14:23:36Z

From Seurat tutorial:

pct.1 and pct.2 are the ones I mentioned.
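For readers unfamiliar with these columns, here is a small sketch of the quantity being requested: the per-gene fraction of cells with nonzero expression inside a group and in the remaining cells. The function name `fraction_expressing` and the "expressed means > 0" rule are assumptions for illustration; a dense NumPy array stands in for the expression matrix.

```python
import numpy as np


def fraction_expressing(X, group_mask):
    """Per-gene fraction of expressing cells in the group and in the rest."""
    expressed = X > 0  # assumption: a cell expresses a gene if its value is > 0
    pts = expressed[group_mask].mean(axis=0)
    pts_rest = expressed[~group_mask].mean(axis=0)
    return pts, pts_rest


# Four cells (rows), two genes (columns), two groups.
X = np.array([[0, 1], [2, 0], [0, 3], [4, 5]])
labels = np.array(["a", "a", "b", "b"])
pts, pts_rest = fraction_expressing(X, labels == "a")
```

Here `pts` plays the role of pct.1 and `pts_rest` the role of pct.2 for group "a".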

ivirshup

This really needed doing, thanks for diving into it!

@Koncopd, how clean would you like to get this? I've noted a few initial things which I think would make this a bit simpler, but I'd like to hear your goals here. Are you interested in doing more with the differential expression tooling?

ivirshup · 2020-03-07T07:34:44Z

scanpy/tools/_rank_genes_groups.py

+        if corr_method == 'benjamini-hochberg':
+            _, pvals_adj, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
+        elif corr_method == 'bonferroni':
+            pvals_adj = np.minimum(pvals * n_genes, 1.0)


Instead of doing this in the testing function (i.e. _t_test, _wilcoxon, etc.), could this be done afterwards since it's the same regardless of test type?

This would also let us remove the corr_method argument from these functions.
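One way this suggestion could look is a single helper applied to the p-values after any test has run. The sketch below is a NumPy-only stand-in: the Benjamini-Hochberg branch reimplements the adjustment rather than calling `multipletests` as the diff does, it does not handle NaN p-values, and the name `adjust_pvals` is hypothetical.

```python
import numpy as np


def adjust_pvals(pvals, corr_method="benjamini-hochberg"):
    """Adjust p-values for multiple testing, independent of the test used."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    if corr_method == "bonferroni":
        return np.minimum(pvals * n, 1.0)
    if corr_method == "benjamini-hochberg":
        order = np.argsort(pvals)
        # Scale the i-th smallest p-value by n / i.
        scaled = pvals[order] * n / np.arange(1, n + 1)
        # Enforce monotonicity, working down from the largest p-value.
        scaled = np.minimum.accumulate(scaled[::-1])[::-1]
        adjusted = np.empty(n)
        adjusted[order] = np.minimum(scaled, 1.0)
        return adjusted
    raise ValueError(f"unknown corr_method: {corr_method!r}")
```

With a helper like this, the test functions would only need to return scores and raw p-values.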

ivirshup · 2020-03-07T07:36:51Z

scanpy/tools/_rank_genes_groups.py

+        foldchanges = (expm1_func(mean_group) + 1e-9) / (
+            expm1_func(mean_rest) + 1e-9
+        )  # add small value to remove 0's


Does this change for each testing function? If not, could it be separated?
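If the computation is indeed the same for every test, it could be pulled into a standalone function along these lines. This is only a sketch: the name `log2_fold_changes` and the `eps` parameter are assumptions, and `expm1_func` defaults to `np.expm1` on the assumption that the means come from log1p-transformed data.

```python
import numpy as np


def log2_fold_changes(mean_group, mean_rest, expm1_func=np.expm1, eps=1e-9):
    """log2 fold change of group vs. rest from per-gene means of logged data."""
    # eps keeps the ratio finite when a gene has zero mean expression.
    foldchanges = (expm1_func(mean_group) + eps) / (expm1_func(mean_rest) + eps)
    return np.log2(foldchanges)


mean_group = np.log1p(np.array([3.0, 0.0]))
mean_rest = np.log1p(np.array([1.0, 0.0]))
lfc = log2_fold_changes(mean_group, mean_rest)
```

The first gene has a threefold higher mean in the group, and the second is absent from both, so its fold change is 1 and its log2 value is 0.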

ivirshup · 2020-03-07T07:53:05Z

scanpy/tools/_rank_genes_groups.py

+        scores_sort = np.abs(scores) if rankby_abs else scores
+        global_indices = _select_top_n(scores_sort, n_genes_user, n_genes)


Similarly, can these be split out from these functions?
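A sketch of the selection step as a separate function, applied to the scores returned by any test. The name `top_n_indices` and its signature are hypothetical and differ from the `_select_top_n` call in the diff; it uses `np.argpartition` to find the top entries and then sorts only those.

```python
import numpy as np


def top_n_indices(scores, n_top, rankby_abs=False):
    """Indices of the n_top highest scores, ordered from best to worst."""
    scores = np.asarray(scores)
    key = np.abs(scores) if rankby_abs else scores
    # Partial sort: the last n_top positions hold the largest keys, unordered.
    part = np.argpartition(key, -n_top)[-n_top:]
    # Fully sort just those n_top candidates in descending order.
    return part[np.argsort(key[part])[::-1]]


scores = np.array([0.5, -3.0, 2.0, 1.0])
by_value = top_n_indices(scores, 2)
by_abs = top_n_indices(scores, 2, rankby_abs=True)
```

With `rankby_abs=True`, the strongly negative score at index 1 ranks first.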

ivirshup · 2020-03-07T07:53:10Z

scanpy/tools/_rank_genes_groups.py

+
+    if issparse(X):
+        merge = lambda tpl: vstack(tpl).todense()
+        adapt = lambda X: X.todense()


Ideally we avoid using np.matrix. Could these be X.toarray()?
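To illustrate the difference, the following sketch compares the two conversions on a small SciPy sparse matrix. The variable names are illustrative only and assume SciPy is installed.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

X = csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))

as_matrix = X.todense()  # numpy.matrix, which always stays 2-D
as_array = X.toarray()   # plain numpy.ndarray

matrix_mean_shape = as_matrix.mean(axis=0).shape  # reduction keeps 2 dimensions
array_mean_shape = as_array.mean(axis=0).shape    # reduction drops to 1 dimension

# The merge step from the diff, written with toarray() instead of todense().
merged = vstack((X, X)).toarray()
```

Because reductions on `np.matrix` keep two dimensions, downstream code that expects 1-D per-gene vectors can behave differently depending on which conversion was used.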

Koncopd · 2020-03-09T10:51:20Z

@ivirshup thanks for catching these things, I'll fix them.
In addition to some splitting and cleaning, I also want to improve the wilcoxon implementation (at least avoid densification; I have some code for it). But this should be another step, I think.

Koncopd · 2020-03-09T12:01:10Z

@gokceneraslan I'll add these things.

gokceneraslan · 2020-03-16T19:04:31Z

scanpy/tools/_rank_genes_groups.py

+            )  # add small value to remove 0's
+            if 'logfoldchanges' not in d:
+                d['logfoldchanges'] = {}
+            d['logfoldchanges'][group_name] = np.log2(foldchanges[global_indices])


Sweet. Does this also produce logfoldchanges for logreg? sc.get.rank_genes_groups_df throws an exception now:

Koncopd · 2020-03-16

Thank you for catching this. I'll check why it happens.

gokceneraslan · 2020-03-16

Oh, sorry for the misunderstanding. The error I posted is from stable scanpy, not from the PR. I was asking whether it's fixed now.

Koncopd · 2020-03-16

This is a problem with sc.get: logreg only has the names and scores keys, but sc.get tries to query the nonexistent 'logfoldchanges' and so on.

Koncopd · 2020-03-16

Should I add logfoldchanges for logreg?

gokceneraslan · 2020-03-17

I don't want to interrupt the PR, but I think logfoldchange and percentages should be available for every method.

Koncopd · 2020-03-17

Should the percentage for the reference group also be calculated and stored?

gokceneraslan · 2020-03-17

I think so. It might look unnecessary for one-vs-rest kind of tests, but for clusterA-vs-clusterB kind of tests it'd be useful.
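The `sc.get` failure discussed in this thread comes from querying keys that a method did not store. Until every method writes the same keys, a reader of the results could guard against that as in the sketch below. The plain dict is a simplified stand-in for the stored results, the list of wanted keys is an assumption, and `available_columns` is a hypothetical helper, not part of scanpy.

```python
def available_columns(
    result,
    wanted=("names", "scores", "logfoldchanges", "pvals", "pvals_adj"),
):
    """Return the wanted result keys that were actually stored."""
    return [key for key in wanted if key in result]


# logreg-like result: only names and scores are present.
logreg_result = {"names": ["g1", "g2"], "scores": [0.8, 0.3]}
columns = available_columns(logreg_result)
```

Storing the same keys for every method, as suggested above, would make such a guard unnecessary.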

Koncopd · 2020-03-18T00:26:13Z

This is still very messy and very inefficient with this separate calculation of pts. I should work on it more to restructure completely...

Koncopd added 8 commits February 26, 2020 11:45

split rank_genes_groups

59de70b

fix names rgg

624636f

del outer ns

712b6a0

rm vars

7005bc7

Merge branch 'master' into rgg_refactor

497a4c7

rm another var

ad68650

rm method

359308b

add generator for chunks

75f084b

default to t-test

59b51a7

fix warning

f78cefb

Koncopd force-pushed the rgg_refactor branch from 86c1cd1 to f78cefb Compare March 6, 2020 13:28

ivirshup requested changes Mar 7, 2020

toarray

dce52b8

Koncopd force-pushed the rgg_refactor branch from f77f6a6 to dce52b8 Compare March 16, 2020 13:27

Koncopd added 2 commits March 16, 2020 14:29

Merge branch 'master' into rgg_refactor

8e4f4d5

simplify further

e7eada0

gokceneraslan reviewed Mar 16, 2020

calculate pts

11d27bb

gokceneraslan mentioned this pull request Apr 7, 2020

Making scores parameterized #1152

Closed

Koncopd closed this Apr 8, 2020

Koncopd mentioned this pull request Apr 8, 2020

rank_genes_groups refactoring 2nd try #1156

Merged

gokceneraslan mentioned this pull request Oct 10, 2020

Add pts and pts_rest to rank_genes_groups_df and allow multiple groups #1388

Merged

flying-sheep deleted the rgg_refactor branch October 30, 2023 13:23
