Normalization and gene selection by analytical Pearson residuals #1715
Conversation
cool, looking forward to your feedback! |
Sounds good to me! |
Force-pushed b9f0424 to 76073cc
I finished the edits to conform with #1667! Now looking forward to your thoughts :) One note on the coding style checks (which I'm not very experienced with): when I activate pre-commit locally, it finds quite a number of style violations in parts of the code that I did not touch and automatically fixes them. This causes many changes that are unrelated to the code I wrote. That's why I disabled pre-commit again (so you don't have to go over all these changes) and tried to follow the style guide manually as well as possible. Hope that is OK for now. Do you have any advice on how to handle this? Should I just do one "style" commit (fixing all these issues throughout the files I work on here) once you've checked the new parts of the code I wrote? Or should the style already be OK in all "old" parts of the code, implying that I set up pre-commit wrong? I'm new to it, so that could very well be the case as well. |
Could you share the output you're getting here? The output I'm seeing on the pre-commit CI looks pretty normal, so it'd be helpful to know more exactly what was happening locally. We've also just activated pre-commit, so it's a bit new to us too! |
Hey @ivirshup, hope this helps, let me know if you need anything else.
|
Just a quick note as I just read up on this... I think we have set up black with a line length of 88 chars, and not 79 at the moment. |
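For reference, a black line length of 88 characters is usually configured in `pyproject.toml`. A hypothetical snippet showing what such a section could look like (scanpy's actual configuration may differ):

```toml
[tool.black]
line-length = 88
skip-string-normalization = true
```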
Ah I think I see the issue! Feature branches should be based off I think this should largely be manageable by rebasing onto master (e.g. Side note: We're considering separating the |
…ation and hvg selection
…rson_residuals() as well
Force-pushed 5b7dd99 to fc49c25
Thanks a lot, I rebased and changed the PR target to
Sounds good! |
Codecov Report
@@ Coverage Diff @@
## master #1715 +/- ##
==========================================
+ Coverage 71.46% 71.95% +0.48%
==========================================
Files 92 98 +6
Lines 11305 11531 +226
==========================================
+ Hits 8079 8297 +218
- Misses 3226 3234 +8
|
Thank you @jlause for the PR! This is really exciting and super useful!
This is a first round of review; most comments are re types, args, and function behaviour. I think it looks really good overall, and maybe it's time to start writing tests?
Please let me know if anything is unclear, and thanks in advance for the code explanations!
def normalize_pearson_residuals(
    adata: AnnData,
    theta: float = 100,
    clip: Union[Literal['auto', 'none'], float] = 'auto',
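A hedged sketch of how the three `clip` modes in this signature could behave. `apply_clip` is an illustrative helper, not scanpy's code; the `'auto'` bound of sqrt(n_cells) follows the analytical Pearson residuals paper discussed in this PR:

```python
import numpy as np

def apply_clip(residuals, clip='auto', n_cells=None):
    """Illustrative helper (not scanpy's API) for the clip argument above.

    'auto'  -> clip residuals at +/- sqrt(n_cells)
    'none'  -> return residuals unchanged
    float   -> clip at +/- that value
    """
    if clip == 'none':
        return residuals
    bound = np.sqrt(n_cells) if clip == 'auto' else float(clip)
    return np.clip(residuals, -bound, bound)
```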
same as above re clipping
done
def normalize_pearson_residuals_pca(
    adata: AnnData,
    theta: float = 100,
    clip: Union[Literal['auto', 'none'], float] = 'auto',
same
done
Hey @giovp ,
Any advice/ideas on what else should be tested? |
Yes, both make sense. It would also be useful to come up with a dummy example that the actual output could be tested against. This is done in seurat_v3 for instance, but in that case it's kind of straightforward because the "expected" is the output computed with the original implementation (and as you caught in #1732, even that might not be enough 😄). On another note, I was wondering whether it also makes sense to release a short tutorial together with the PR (that would be on theislab/scanpy_tutorials)? I think that for a lot of people the term "pearson residuals" could be alienating, and so they'd rather stick to |
I think it should stay. For the rest, I think we are at a good stage; I'd ask @jlause to build the docs locally.
If this gets approval, todo before merging to master:
P.S. The docs are failing for reasons I haven't figured out yet |
Hi!
I agree.
I started doing that and will finish up tomorrow - a few Qs in advance, if you happen to look at this before:
I've looked at that and commented in the respective GitHub issue :) Very happy we are getting this wrapped up now :) |
I'm not sure; I think with math it's nicer, but I'm not aware of any convention. @ivirshup?
Must say I missed those, sorry. Feel free to add them and I'll take a look again tomorrow and wrap it up. |
Hi, I did a full pass over the docs but still need to do a few small things (release note, final spell check). I have to leave my desk now and will do the final bits either later today or tomorrow! |
Hi @giovp , Also, it seems that building the docs is failing again on GitHub (locally it works, with some warnings). Again, I'm not sure why, or whether it is even related to my changes 🤔 Let me know if I can help with fixing that or if anything else comes up! Best, Jan. |
OK, the tutorial is merged; you can have a look at how it renders here: https://scanpy-tutorials.readthedocs.io/en/latest/tutorial_pearson_residuals.html I've fixed the tutorial.rst page and the release note. To me it looks good; I'd like to get @ivirshup's approval on this before merging.
Really clear and concise btw, great job |
LGTM Thanks for pushing and getting this through! And thanks for picking up the review on this @giovp! |
thank you @ivirshup ! |
Hi @jlause, there's an issue when using |
Hi everyone,
a while back, @giovp asked me to create a pull request that integrates analytical Pearson residuals in scanpy. We already discussed a bit with @LuckyMD and @ivirshup over at berenslab/umi-normalization#1 how to structure it, and now I made a version that should be ready for review.
As discussed earlier, this pull request implements two core methods:

- `sc.pp.normalize_pearson_residuals()`, which applies the method to `adata.X`. Overall, the function is very similar in structure to `sc.pp.normalize_total()` (support for layers, inplace operation etc).
- `sc.pp.highly_variable_genes(flavor='pearson_residuals')`, which selects genes based on Pearson residual variance. The "inner" function `_highly_variable_pearson_residuals()` is structured similarly to `_highly_variable_seurat_v3()` (support for multiple batches, median ranks for tie breaking). It includes the `chunksize` argument to allow for memory-efficient computation of the residual variance.

We discussed quite a lot how to implement a third function that would bundle gene selection, normalization by analytical residuals, and PCA. This PR includes the two options that emerged at the end of that discussion, so now we have to choose ;)

- `sc.pp.recipe_pearson_residuals()`, which does HVG selection and normalization both via Pearson residuals prior to PCA
- `sc.pp.normalize_pearson_residuals_pca()`, which applies any HVG selection the user previously added to the `adata` object, and then normalizes via Pearson residuals and does PCA.

Both functions retain the raw input counts as `adata.X` and add fields for PCA/normalization/HVG selection results (or return them) as applicable, most importantly the `X_pca` in `adata.obsm['pearson_residuals_X_pca']`. I hope this addresses some of the issues we discussed over at the other repo in a scanpy-y way.

Let me know what you think and where you think improvements are needed!
Cheers, Jan.
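For readers new to the method, here is a minimal NumPy sketch of analytical Pearson residuals under a negative binomial null model with fixed overdispersion `theta`, including the sqrt(n)-style clipping discussed in this PR. Function and argument names are illustrative, not the scanpy API:

```python
import numpy as np

def pearson_residuals(counts, theta=100.0, clip='auto'):
    """Analytical Pearson residuals for raw counts (illustrative sketch).

    counts: (n_cells, n_genes) array of raw UMI counts.
    clip:   'auto' clips at +/- sqrt(n_cells); 'none' disables clipping;
            a float clips at that value.
    """
    X = np.asarray(counts, dtype=float)
    # Null-model mean: outer product of cell totals and gene totals,
    # scaled by the grand total.
    mu = X.sum(axis=1, keepdims=True) * X.sum(axis=0, keepdims=True) / X.sum()
    # Pearson residuals for an NB model with fixed overdispersion theta.
    res = (X - mu) / np.sqrt(mu + mu**2 / theta)
    if clip == 'none':
        return res
    bound = np.sqrt(X.shape[0]) if clip == 'auto' else float(clip)
    return np.clip(res, -bound, bound)
```

Highly variable genes would then correspond to the columns of the residual matrix with the largest variance, which is what the `flavor='pearson_residuals'` selection described above ranks genes by.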