
highly deviant genes implementation #1765

Open: wants to merge 1 commit into base: main


ktpolanski (Contributor)

An implementation of highly deviant gene identification from the 2019 GLMPCA paper. I'm rather fond of the method: it's a straightforward statistical measure, and it comes with significance testing that serves as a data-driven cutoff.
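For readers unfamiliar with the measure, here is a minimal sketch of per-gene binomial deviance in the spirit of the GLMPCA paper. It is not the code in this PR. Assumptions: a dense cells x genes count matrix; a null model in which every cell shares one proportion pi_j per gene, so the expected count is mu_ij = n_i * pi_j with n_i the cell's total; and deviance D_j = 2 * sum_i [y_ij log(y_ij / mu_ij) + (n_i - y_ij) log((n_i - y_ij) / (n_i - mu_ij))]. The p-values use an assumed chi-square reference with n_cells - 1 degrees of freedom; the PR's actual test and any multiple-testing correction may differ, and the chi-square approximation can be poor for very sparse counts. The names `binomial_deviance` and `deviance_pvalues` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def _xlog_ratio(x, m):
    """Elementwise x * log(x / m), taking the term to be 0 wherever x == 0."""
    out = np.zeros_like(x)
    mask = x > 0
    out[mask] = x[mask] * np.log(x[mask] / m[mask])
    return out


def binomial_deviance(counts):
    """Hypothetical helper: per-gene binomial deviance of a dense (cells x genes) matrix.

    Assumed null model: one shared proportion pi_j per gene, so the expected
    count in cell i is mu_ij = n_i * pi_j, where n_i is that cell's total count.
    """
    y = np.asarray(counts, dtype=float)
    n = y.sum(axis=1, keepdims=True)  # per-cell totals, shape (cells, 1)
    pi = y.sum(axis=0) / n.sum()      # pooled proportion per gene, shape (genes,)
    mu = n * pi                       # expected counts, shape (cells, genes)
    # Wherever y > 0 we have mu > 0, and wherever n - y > 0 we have n - mu > 0,
    # so the masked logs never divide by zero.
    dev = 2.0 * (_xlog_ratio(y, mu) + _xlog_ratio(n - y, n - mu))
    return dev.sum(axis=0)            # one deviance value per gene


def deviance_pvalues(deviance, n_cells):
    """Hypothetical helper: upper-tail p-values under an assumed chi-square(n_cells - 1) reference."""
    return chi2.sf(deviance, df=n_cells - 1)
```

With `counts = np.array([[5, 0, 5], [10, 10, 0]])`, gene 0 is exactly proportional to the cell totals, so its deviance is 0 and its p-value is 1, while genes 1 and 2 receive positive deviance. Genes whose counts depart from the shared-proportion null score higher, which is what makes a significance cutoff possible.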

I put it in a new highly_deviant_genes() function, as:

  • it comes with a number of unique parameters, and there are only so many different algorithms highly_variable_genes() can house
  • the paper argues that highly deviant genes are a different concept from highly variable genes

I acknowledge that there are no tests yet; I'm hoping to get some help with that if possible.


codecov bot commented Mar 26, 2021

Codecov Report

Merging #1765 (3569f57) into master (560bd5d) will decrease coverage by 0.38%.
The diff coverage is 19.27%.

@@            Coverage Diff             @@
##           master    #1765      +/-   ##
==========================================
- Coverage   71.18%   70.80%   -0.39%     
==========================================
  Files          92       93       +1     
  Lines       11190    11273      +83     
==========================================
+ Hits         7966     7982      +16     
- Misses       3224     3291      +67     
Impacted Files                                   Coverage Δ
scanpy/preprocessing/_highly_deviant_genes.py    18.29% <18.29%> (ø)
scanpy/preprocessing/__init__.py                 100.00% <100.00%> (ø)

ivirshup added the Enhancement ✨ and Needs info❔ More information needed labels on Mar 29, 2021
ivirshup (Member)

I like that this method is fairly simple and could have a meaningful cutoff, but I'd like more evidence of its usefulness before considering including it.

I have two main points of concern:

  • Are there examples of this method being used outside of the glmPCA paper? I would at least like to see that downstream analyses on these genes give reasonable results.
  • In the glmPCA paper, the identified genes are highly correlated (1) with highly expressed genes, and only weakly correlated (0.3) with highly variable gene selection. While I'm not sure which highly variable gene method they compared against, should the low correlation with common practice give us pause?
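The correlation concern could also be checked on one's own data. The sketch below is an assumption about how one might do that, not the comparison the paper performed: it takes two per-gene scores (for example deviance and mean expression, or deviance and the `dispersions_norm` column that `sc.pp.highly_variable_genes` writes for the Seurat flavor) and reports their Spearman rank correlation across all genes, plus the Jaccard overlap of the top-k genes each score would select. `compare_gene_rankings` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import spearmanr


def compare_gene_rankings(score_a, score_b, top_k):
    """Hypothetical helper comparing two per-gene selection scores.

    Returns (Spearman rank correlation over all genes,
             Jaccard overlap of the top_k genes ranked by each score).
    """
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    rho, _ = spearmanr(score_a, score_b)            # rank agreement across every gene
    top_a = set(np.argsort(score_a)[::-1][:top_k])  # indices of the top_k genes by score_a
    top_b = set(np.argsort(score_b)[::-1][:top_k])
    jaccard = len(top_a & top_b) / len(top_a | top_b)
    return float(rho), jaccard
```

Reporting both numbers is deliberate: two scores can correlate strongly over all genes yet still pick different top genes, and for feature selection it is the selected set that matters downstream.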


@giovp

2 participants