BUGFIX: changed raise ValueError in align_track_with_cooler to just a… #360
… warning Previously, the align_track_with_cooler() function raised a ValueError if the track values of a given region in the viewframe were all NaN. However, this check is not necessary because NaN values are already masked anyway.
Ratified 2022-06-01.
Hey, @gfudenberg. Is there a reason why empty regions in align_track_with_cooler raise an error rather than a warning? (Authorship deduced from git blame.) I'm thinking about Deepti's case: why would one chromosome having all NaNs in the dataframe before assignment be an issue for the downstream analysis?
Hi-- I don't know if I quite understand Deepti's case, but I think we discussed this in
Thanks for getting back to it! I think the non-trivial thing here is that align_track_with_cooler does not distinguish between bins that are absent from the track and bins that are present with NaN values, and it's not entirely clear why. The solution might be to raise an error only if the chromosome is completely absent from the track (or some bins of it, as you propose in #271).
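To make the distinction concrete, here is a minimal pandas sketch of telling absent bins apart from NaN-valued bins. It is not the cooltools implementation: the `bins` and `track` tables, the `value` column, and the variable names are made up for illustration, and it only assumes that both tables share `chrom`, `start` and `end` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical bin table for two chromosomes (a stand-in for a cooler's bins).
bins = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
})

# Hypothetical track: chr1 is present but all NaN; chr2 lacks its second bin.
track = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2"],
    "start": [0, 100, 0],
    "end": [100, 200, 100],
    "value": [np.nan, np.nan, 0.5],
})

# A left merge keeps every bin; indicator=True records whether the track had a row for it.
merged = bins.merge(track, on=["chrom", "start", "end"], how="left", indicator=True)

# Bins with no row in the track at all.
absent = merged["_merge"] == "left_only"

# Bins that have a row in the track, but whose value is NaN.
nan_assigned = (merged["_merge"] == "both") & merged["value"].isna()
```

After a plain left merge both kinds of bin end up as NaN in `value`, which is why the indicator column (or an equivalent bookkeeping step) is needed to tell them apart.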
Maybe it is because of its use in functions like cis_eig? E.g., if there is an all-NaN chromosome, maybe this would throw an error when you try to correlate? cooltools/cooltools/api/eigdecomp.py Line 435 in 45a56df
…into align_track_fix
…track_with_cooler. NaNs and truly unassigned values are now distinguishable. There are now two options for align_track_with_cooler: ignore missing values or not. This parameter is propagated to saddles (because this check does not affect the results compared to removing chromosomes with all NaNs), but set to default True in eigdecomp (because at least one value per chromosome is strictly needed for phasing).
Okay, these are all good points. I introduced a parameter for ignoring missing values in alignment and added it as a control option for saddle (because chromosomes with all NaNs do not really affect the output). For eigdecomp you are right, @gfudenberg: there should be at least one value per chromosome, otherwise phasing does not make sense. I set this new parameter to default True in eigdecomp. To sum up, this is a small update of align_track_with_cooler, but it might be important to those practicing various types of saddles.
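For readers following along, the error-versus-warning switch discussed here can be sketched roughly as below. This is only an illustration under stated assumptions: `check_regions_have_values`, its `ignore_all_nan` flag, and the `value` column are hypothetical names, not the parameter or code added to cooltools in this PR.

```python
import warnings

import numpy as np
import pandas as pd


def check_regions_have_values(track, value_col="value", ignore_all_nan=False):
    """Hypothetical helper, not the cooltools implementation.

    Returns the chromosomes that have no valid (non-NaN) track value.
    Raises ValueError for them unless ignore_all_nan is True, in which
    case a warning is issued instead.
    """
    # GroupBy.count() counts only non-NaN entries per chromosome.
    n_valid = track.groupby("chrom", sort=False)[value_col].count()
    empty = n_valid[n_valid == 0].index.tolist()
    if empty:
        msg = f"no valid track values for: {empty}"
        if ignore_all_nan:
            warnings.warn(msg)
        else:
            raise ValueError(msg)
    return empty


# Hypothetical track where chr1 is entirely NaN.
example_track = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2"],
    "value": [np.nan, np.nan, 0.5],
})
```

With a strict default the check suits a caller like eigdecomp, where each chromosome needs at least one value for phasing; a saddle-style caller could pass `ignore_all_nan=True` and carry on with a warning.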
maybe consider
drop_track_nas --> drop_track_na
proposed BUGFIX: changed raise ValueError in align_track_with_cooler to just a warning
Previously, the align_track_with_cooler() function raised a
ValueError if the track values of a given region in the viewframe
were all NaN. However, this check is not necessary because NaN
values are already masked anyway.