Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query/vds] Rework VDS combiner so that it can run via Query on Batch #13206

Merged
merged 15 commits into from Jul 5, 2023

Conversation

chrisvittal
Copy link
Collaborator

@chrisvittal chrisvittal commented Jun 22, 2023

Key changes:

  • Remove old VCF combiner
  • Add StreamZipJoinProducers an IR that takes an array, and a function from array.elementType to stream and zip joins the result of calling that function on each member of the array.
  • Combine Table._generate and this new stream zip operation to rewrite the gvcf merge stage of the vds combiner in O(1) IR

@chrisvittal chrisvittal changed the title Combiner on QoB [query/vds] Combiner on QoB Jun 30, 2023
@chrisvittal chrisvittal changed the title [query/vds] Combiner on QoB [query/vds] Rework VDS combiner so that it can run via Query on Batch Jul 1, 2023
Copy link
Collaborator Author

@chrisvittal chrisvittal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core of the change (not my work), looks good to me.

The stream zip code duplication is present and will be handled in the future.

@danking danking merged commit 8d6b171 into hail-is:main Jul 5, 2023
8 checks passed
danking pushed a commit that referenced this pull request Jul 27, 2023
#13313)

On the current release (0.2.119), `is.hail.rvd.RVD.localSort` attempts
to serialize an entire `is.hail.rvd.RVD` object when called, which
causes some operations, like reading a set of VCF files, to fail. That
particular operation no longer fails due to some combination of the
changes made to reading VCF files in
#13206 and
#13229, but the serialization could
still cause issues if `is.hail.rvd.RVD.localSort` is called in another
context.

See
https://hail.zulipchat.com/#narrow/stream/123011-Hail-Query-Dev/topic/Possible.20bug.20in.20Hail.200.2E2.2E119/near/378610151
for more information.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants