[vds] Add support for VDSes with truncated reference blocks #12645

Merged
merged 11 commits into from
Feb 6, 2023

Conversation

Contributor

@tpoterba tpoterba commented Feb 2, 2023

CHANGELOG: introduce hl.vds.truncate_reference_blocks to permit faster point queries against Hail VariantDatasets. Remove ref_allele as a required field in reference data.

  • add hl.vds.truncate_reference_blocks
  • add hl.vds.merge_reference_blocks
  • update hl.vds.filter_intervals to use this
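
Why capping block length helps point queries can be sketched in plain Python. This is a conceptual illustration only, not Hail's implementation: the function names, the `(start, end)` tuple representation with inclusive ends, and the `max_len` parameter are all assumptions made for the example. Once every block is at most `max_len` bases long, any block covering a position must start within the last `max_len` bases before it, so a lookup only has to inspect a bounded window of block starts.

```python
from bisect import bisect_left, bisect_right


def truncate_blocks(blocks, max_len):
    """Split each (start, end) block (end inclusive) into pieces of at most max_len bases."""
    out = []
    for start, end in blocks:
        pos = start
        while pos <= end:
            piece_end = min(pos + max_len - 1, end)
            out.append((pos, piece_end))
            pos = piece_end + 1
    return out


def blocks_covering(sorted_blocks, position, max_len):
    """Point query over blocks sorted by start, each at most max_len long.

    Only blocks starting in [position - max_len + 1, position] can cover position.
    """
    starts = [s for s, _ in sorted_blocks]
    lo = bisect_left(starts, position - max_len + 1)
    hi = bisect_right(starts, position)
    return [b for b in sorted_blocks[lo:hi] if b[1] >= position]


truncated = truncate_blocks([(1, 10), (20, 22)], max_len=4)
hits = blocks_covering(truncated, position=9, max_len=4)
```

Without a known upper bound on block length, the same query would have to consider every block that starts at or before the position.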

Notes
-----
After this function has been run, the
Collaborator

incomplete

Collaborator

@chrisvittal chrisvittal left a comment

here are the lint errors

hail/python/hail/expr/expressions/typed_expressions.py (outdated)
if ref_allele_function is None:
    rg = ht.locus.dtype.reference_genome
    if 'ref_allele' in ht.row:
        ref_allele_function = lambda ht: ht.ref_allele
Collaborator

either ignore lint or use a def

if 'ref_allele' in ht.row:
    ref_allele_function = lambda ht: ht.ref_allele
elif rg.has_sequence():
    ref_allele_function = lambda ht: ht.locus.sequence_context()
Collaborator

either ignore lint or use a def
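
For reference, the "use a def" option could look like the sketch below. The lint rule involved is presumably pycodestyle's E731 ("do not assign a lambda expression, use a def"); the thread does not name it. The wrapper function `pick_ref_allele_function` and its parameters are hypothetical, and `SimpleNamespace` objects stand in for Hail tables so the example runs without Hail.

```python
from types import SimpleNamespace


def pick_ref_allele_function(row_fields, has_sequence):
    """Return a function that extracts the reference allele, or None if unavailable."""
    if 'ref_allele' in row_fields:
        # Named function instead of an assigned lambda.
        def ref_allele_function(ht):
            return ht.ref_allele
    elif has_sequence:
        def ref_allele_function(ht):
            return ht.locus.sequence_context()
    else:
        ref_allele_function = None
    return ref_allele_function


# Stand-in for a table row; not a real Hail object.
row = SimpleNamespace(
    ref_allele='A',
    locus=SimpleNamespace(sequence_context=lambda: 'C'),
)
from_field = pick_ref_allele_function({'ref_allele'}, has_sequence=True)
from_sequence = pick_ref_allele_function(set(), has_sequence=True)
```

The "ignore lint" alternative would keep the lambdas and append a suppression comment such as `# noqa: E731` to each assignment.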

hail/python/hail/vds/methods.py (outdated)
hail/python/hail/vds/methods.py (outdated)
hail/python/hail/vds/methods.py (outdated)
Comment on lines 1096 to 1099
ht = ht.annotate(prev_block=hl.zip(hl.scan.array_agg(lambda elt: hl.scan.fold((hl.null(rd.entry.dtype), False),
lambda acc: keep_last(acc, (
elt, False)),
keep_last), ht.entries), ht.entries)
Collaborator

fix indentation here

hail/python/hail/vds/methods.py (outdated)
all_ref_max = n_with_ref_max_len == len(mts)

# if some mts have max ref len but not all, drop it
new_ref_block_len_max = None
Collaborator

unused?
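
The rule described in the snippet's comment (keep a maximum reference block length only if every input has one) can be sketched in plain Python. The function name is hypothetical, and taking the largest input bound as the combined bound is an assumption for illustration; the excerpt does not show how the combiner computes the value.

```python
from typing import Optional, Sequence


def combined_ref_block_max_len(max_lens: Sequence[Optional[int]]) -> Optional[int]:
    """Combine per-input maximum reference block lengths.

    Returns None if there are no inputs or any input lacks a bound, since the
    combined data then has no guaranteed bound. Otherwise returns the largest
    bound, which holds for every block from every input.
    """
    if not max_lens or any(m is None for m in max_lens):
        return None
    return max(max_lens)


all_bounded = combined_ref_block_max_len([1000, 5000, 2000])
partly_bounded = combined_ref_block_max_len([1000, None])
```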

hail/python/hail/vds/combiner/combine.py (outdated)
@tpoterba tpoterba force-pushed the vds-point-query-improvement branch from 5f360c4 to fedb013 Compare February 3, 2023 17:02
@chrisvittal
Collaborator

We may want to note a change to ref_allele behavior in the changelog as well.

@danking danking merged commit ac1c604 into hail-is:main Feb 6, 2023
3 participants