-
Notifications
You must be signed in to change notification settings - Fork 251
[query/vds] Don't densify past the end of contigs #13184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test?
hail/python/hail/vds/methods.py
Outdated
@@ -39,6 +39,13 @@ def to_dense_mt(vds: 'VariantDataset') -> 'MatrixTable': | |||
Dataset in dense MatrixTable representation. | |||
""" | |||
ref = vds.reference_data | |||
# FIXME(chrisvittal) consider changing END semantics on VDS to make this better | |||
# see https://github.com/hail-is/hail/issues/13183 for why this is here and more discussion | |||
ref = ref.transmute_entries(END_GLOBAL=hl.locus(contig=ref.locus.contig, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function this calls is not super cheap, better to separate this out into the pieces I suggested on that issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine. I disagree though. This is simple, correct, and maintainable. It is still possible to densify past the end of a contig with your solution, END
could be larger than the contig.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if END is greater than the max position of a contig? That's bad data, I'm OK having wrong results for densify there. If you want to catch that, can do so without calling the locus constructor per entry anyhow I think
fd40410
to
c51bbae
Compare
aa71a62
to
02b0a7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just one issue with field names
hail/python/hail/vds/methods.py
Outdated
# FIXME(chrisvittal) consider changing END semantics on VDS to make this better | ||
# see https://github.com/hail-is/hail/issues/13183 for why this is here and more discussion | ||
# we assume that END <= contig.length | ||
ref = ref.annotate_rows(locus_global_pos=ref.locus.global_position(), locus_pos=ref.locus.position) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field names should probably have underscore prefix to avoid clobbering user data
hail/python/hail/vds/methods.py
Outdated
# see https://github.com/hail-is/hail/issues/13183 for why this is here and more discussion | ||
# we assume that END <= contig.length | ||
ref = ref.annotate_rows(_locus_global_pos=ref.locus.global_position(), _locus_pos=ref.locus.position) | ||
ref = ref.transmute_entries(_END_GLOBAL=ref.locus_global_pos + (ref.END - ref.locus_pos)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦
fixes #13183