[vds] Add support for VDSes with truncated reference blocks #12645

Merged
danking merged 11 commits into hail-is:main from tpoterba:vds-point-query-improvement
Feb 6, 2023


@tpoterba tpoterba commented Feb 2, 2023

CHANGELOG: introduce hl.vds.truncate_reference_blocks to permit faster point queries against Hail VariantDatasets. Remove ref_allele as a required field in reference data.

  • add hl.vds.truncate_reference_blocks
  • add hl.vds.merge_reference_blocks
  • update hl.vds.filter_intervals to take advantage of truncated reference blocks
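
As an illustration of the idea behind truncation (a plain-Python sketch, not Hail's implementation): a reference block covering positions `start` through `end` is split into consecutive blocks no longer than a maximum length. Once every block is bounded this way, a point query at position `p` only needs to consider blocks that start in `[p - max_len + 1, p]` rather than looking arbitrarily far upstream. The names `truncate_block`, `blocks_covering`, and `max_len` are hypothetical and do not come from the PR.

```python
def truncate_block(start, end, max_len):
    """Split the inclusive block [start, end] into pieces of at most max_len bases."""
    if max_len < 1:
        raise ValueError("max_len must be positive")
    pieces = []
    pos = start
    while pos <= end:
        piece_end = min(pos + max_len - 1, end)
        pieces.append((pos, piece_end))
        pos = piece_end + 1
    return pieces


def blocks_covering(blocks, p, max_len):
    """Return blocks covering position p.

    Only blocks whose start lies in [p - max_len + 1, p] can cover p once
    every block is at most max_len long. A real system would seek directly
    to that window in sorted storage; this sketch simply filters a list.
    """
    lo = p - max_len + 1
    return [(s, e) for (s, e) in blocks if lo <= s <= p and e >= p]


blocks = truncate_block(100, 1099, 400)
print(blocks)
print(blocks_covering(blocks, 950, 400))
```

The last truncated piece may be shorter than `max_len`; the bound only needs to hold as an upper limit for the narrowed search window to be valid.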

Comment thread hail/python/hail/vds/methods.py Outdated

Notes
-----
After this function has been run, the

incomplete

@chrisvittal chrisvittal left a comment

here are the lint errors

Comment thread hail/python/hail/expr/expressions/typed_expressions.py Outdated
Comment thread hail/python/hail/vds/methods.py Outdated
if ref_allele_function is None:
    rg = ht.locus.dtype.reference_genome
    if 'ref_allele' in ht.row:
        ref_allele_function = lambda ht: ht.ref_allele

either ignore lint or use a def

Comment thread hail/python/hail/vds/methods.py Outdated
if 'ref_allele' in ht.row:
    ref_allele_function = lambda ht: ht.ref_allele
elif rg.has_sequence():
    ref_allele_function = lambda ht: ht.locus.sequence_context()

either ignore lint or use a def
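
The lint error here is presumably pycodestyle's E731 (assigning a lambda expression to a name); that rule code is an assumption, since the thread does not quote the linter output. A minimal sketch of the "use a def" option follows. `choose_ref_allele_function`, `FakeGenome`, and `FakeLocus` are hypothetical stand-ins so the example runs without Hail, and the `ValueError` branch is an assumption about what happens when neither source is available.

```python
from types import SimpleNamespace


def choose_ref_allele_function(ht, rg):
    """Pick how to compute the reference allele, using nested defs instead of lambdas."""
    if 'ref_allele' in ht.row:
        def ref_allele_function(ht):
            return ht.ref_allele
    elif rg.has_sequence():
        def ref_allele_function(ht):
            return ht.locus.sequence_context()
    else:
        # Assumed fallback; not shown in the quoted diff.
        raise ValueError("no 'ref_allele' field and no reference sequence loaded")
    return ref_allele_function


class FakeGenome:
    """Stand-in for a reference genome object (hypothetical)."""

    def __init__(self, sequence_loaded):
        self._sequence_loaded = sequence_loaded

    def has_sequence(self):
        return self._sequence_loaded


class FakeLocus:
    """Stand-in for a locus whose sequence context is a single base (hypothetical)."""

    def __init__(self, base):
        self._base = base

    def sequence_context(self):
        return self._base


with_field = SimpleNamespace(row={'locus', 'ref_allele'}, ref_allele='A')
without_field = SimpleNamespace(row={'locus'}, locus=FakeLocus('G'))

print(choose_ref_allele_function(with_field, FakeGenome(False))(with_field))
print(choose_ref_allele_function(without_field, FakeGenome(True))(without_field))
```

The other option, ignoring the lint, would typically be a trailing `# noqa: E731` on each lambda line, if that is indeed the rule being flagged.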

Comment thread hail/python/hail/vds/methods.py
Comment thread hail/python/hail/vds/methods.py Outdated
Comment on lines +1096 to +1099
ht = ht.annotate(prev_block=hl.zip(hl.scan.array_agg(lambda elt: hl.scan.fold((hl.null(rd.entry.dtype), False),
lambda acc: keep_last(acc, (
elt, False)),
keep_last), ht.entries), ht.entries)

fix indentation here

Comment thread hail/python/hail/vds/methods.py Outdated
Comment thread hail/python/hail/vds/variant_dataset.py Outdated
all_ref_max = n_with_ref_max_len == len(mts)

# if some mts have max ref len but not all, drop it
new_ref_block_len_max = None

unused?
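
The rule stated in the code comment of the snippet above ("if some mts have max ref len but not all, drop it") can be sketched in plain Python. Here `None` stands in for a dataset with no recorded maximum, and taking the largest value when every input has one is an assumption about how a combined bound would be chosen, not something shown in this thread. `combined_ref_block_max_len` is a hypothetical name.

```python
def combined_ref_block_max_len(max_lens):
    """Return a block-length bound valid for all inputs, or None if any input lacks one."""
    if not max_lens:
        return None
    n_with_ref_max_len = sum(1 for m in max_lens if m is not None)
    all_ref_max = n_with_ref_max_len == len(max_lens)
    if not all_ref_max:
        # Some inputs have unbounded blocks, so no bound holds for the union.
        return None
    # Assumed: the loosest individual bound is the tightest bound that holds for all.
    return max(max_lens)


print(combined_ref_block_max_len([1000, 500]))
print(combined_ref_block_max_len([1000, None]))
```

Returning the result directly, as here, avoids a separate placeholder variable that is assigned `None` and never read.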

Comment thread hail/python/hail/vds/combiner/combine.py Outdated
@tpoterba tpoterba force-pushed the vds-point-query-improvement branch from 5f360c4 to fedb013 on February 3, 2023 17:02
@chrisvittal

We may want to note a change to ref_allele behavior in the changelog as well.

@danking danking merged commit ac1c604 into hail-is:main Feb 6, 2023