add n_top_genes argument to rank_genes_groups_df #2145
This PR addresses https://scanpy.discourse.group/t/workflow-for-selecting-number-of-marker-genes-in-sc-queries-enrich/286
I wanted a simple interface for getting the top n marker genes. Right now, `rank_genes_groups_df` only allows thresholding on logfc and pval, but especially for marker genes the pval computation might not be statistically meaningful. This PR adds the following kind of functionality:
The output is just the top 2 genes of the list.
It also works for multiple groups:
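A minimal sketch of the top-n selection described here, written in plain pandas and not taken from the PR's code. The helper name `top_n_per_group` and the toy values are hypothetical; the column names (`group`, `names`, `scores`) are assumed to match the table that `rank_genes_groups_df` returns.

```python
import pandas as pd

# Toy table shaped like a ranked-genes result (values are made up for illustration).
df = pd.DataFrame({
    "group": ["0", "0", "0", "1", "1", "1"],
    "names": ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"],
    "scores": [9.1, 7.4, 3.2, 8.8, 6.0, 1.5],
})


def top_n_per_group(df, n_top_genes):
    """Hypothetical helper: keep the n highest-scoring genes of each group."""
    # Sort by group, then by descending score, so head() picks the best genes.
    ranked = df.sort_values(["group", "scores"], ascending=[True, False])
    return ranked.groupby("group", sort=False).head(n_top_genes).reset_index(drop=True)


top2 = top_n_per_group(df, 2)
print(top2)
```

Selecting by rank instead of by p-value keeps the number of genes per group fixed, which is the point of an `n_top_genes` style argument when the p-values themselves are hard to interpret.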
This also extends to enrichment queries (this is what I wanted originally):
For enrichment queries, I added a note to the docstring that a pval threshold of 0.05 is used. Previously, this was not obvious to me (and for cluster marker genes, this might not always be sensible).
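To illustrate how a p-value cutoff and a top-n limit interact when building gene lists for an enrichment query, here is a small pandas sketch. It is not the PR's implementation: `marker_lists` is a hypothetical helper, the data are invented, and filtering on a `pvals_adj` column is an assumption about how the 0.05 threshold is applied.

```python
import pandas as pd

# Toy ranked-genes table; all values are invented for illustration.
df = pd.DataFrame({
    "group": ["0", "0", "0", "1", "1", "1"],
    "names": ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"],
    "scores": [9.1, 7.4, 3.2, 8.8, 6.0, 1.5],
    "pvals_adj": [0.001, 0.20, 0.01, 0.03, 0.04, 0.60],
})


def marker_lists(df, pval_cutoff=0.05, n_top_genes=None):
    """Hypothetical helper: per-group gene lists after a p-value filter and optional top-n cut."""
    # Assumption: the cutoff is applied to adjusted p-values.
    kept = df[df["pvals_adj"] < pval_cutoff]
    kept = kept.sort_values("scores", ascending=False)
    if n_top_genes is not None:
        kept = kept.groupby("group").head(n_top_genes)
    return {group: sub["names"].tolist() for group, sub in kept.groupby("group")}


all_significant = marker_lists(df)
top1 = marker_lists(df, n_top_genes=1)
print(all_significant)
print(top1)
```

In this sketch the cutoff is applied first, so a gene with a high score but a p-value above the threshold (GeneB here) never reaches the top-n step.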
I haven't added anything to `docs/release-notes/` yet. I first wanted to get your opinion: is this useful, and what is still needed here?