Skip to content

[query/vds] Don't densify past the end of contigs #13184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 21, 2023

Conversation

chrisvittal
Copy link
Collaborator

fixes #13183

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test?

@@ -39,6 +39,13 @@ def to_dense_mt(vds: 'VariantDataset') -> 'MatrixTable':
Dataset in dense MatrixTable representation.
"""
ref = vds.reference_data
# FIXME(chrisvittal) consider changing END semantics on VDS to make this better
# see https://github.com/hail-is/hail/issues/13183 for why this is here and more discussion
ref = ref.transmute_entries(END_GLOBAL=hl.locus(contig=ref.locus.contig,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function this calls is not super cheap, better to separate this out into the pieces I suggested on that issue

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine. I disagree though. This is simple, correct, and maintainable. It is still possible to densify past the end of a contig with your solution, END could be larger than the contig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if END is greater than the max position of a contig? That's bad data, I'm OK having wrong results for densify there. If you want to catch that, can do so without calling the locus constructor per entry anyhow I think

Copy link
Collaborator

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just one issue with field names

# FIXME(chrisvittal) consider changing END semantics on VDS to make this better
# see https://github.com/hail-is/hail/issues/13183 for why this is here and more discussion
# we assume that END <= contig.length
ref = ref.annotate_rows(locus_global_pos=ref.locus.global_position(), locus_pos=ref.locus.position)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field names should probably have underscore prefix to avoid clobbering user data

# see https://github.com/hail-is/hail/issues/13183 for why this is here and more discussion
# we assume that END <= contig.length
ref = ref.annotate_rows(_locus_global_pos=ref.locus.global_position(), _locus_pos=ref.locus.position)
ref = ref.transmute_entries(_END_GLOBAL=ref.locus_global_pos + (ref.END - ref.locus_pos))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦

@danking danking merged commit 6853168 into hail-is:main Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hl.vds.to_dense_mt incorrectly densifies the last reference block of a chromosome into the beginning of the next
4 participants