@patrick-schultz (Member) commented Aug 10, 2023

fixes #13407

CHANGELOG: Resolves #13407, in which uses of union_rows could reduce parallelism to a single partition, resulting in severely degraded performance.

When the key was empty, TableUnion always collapsed its output to a single partition. This adds special-case handling for unkeyed tables that simply concatenates the partitions of the inputs.
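To make the difference concrete, here is a toy Python model of an unkeyed union that concatenates partitions, next to the collapsing behavior it replaces. It is not Hail's implementation or API: Table, Partition, union_unkeyed, and union_collapsed are hypothetical names, and a table is modeled simply as a list of partitions of rows.

```python
from typing import List, Sequence

# Hypothetical in-memory model (not Hail's API): a table is a list of
# partitions, and each partition is a list of rows.
Partition = List[dict]
Table = List[Partition]


def union_unkeyed(tables: Sequence[Table]) -> Table:
    """Union unkeyed tables by concatenating their partitions in order.

    With an empty key there is no sort order to maintain across inputs,
    so each input partition can become an output partition unchanged.
    """
    result: Table = []
    for table in tables:
        result.extend(table)
    return result


def union_collapsed(tables: Sequence[Table]) -> Table:
    """The behavior being fixed: every row lands in a single partition."""
    return [[row for table in tables for part in table for row in part]]


a = [[{"x": 1}, {"x": 2}], [{"x": 3}]]
b = [[{"x": 4}]]
print(len(union_unkeyed([a, b])))    # 3
print(len(union_collapsed([a, b])))  # 1
```

In this model the output partition count is the sum of the input partition counts, so parallelism is preserved instead of dropping to one.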

The body of the resulting TableStage is a little hacky: it performs a StreamMultiMerge in which exactly one input stream is non-empty. I expect the performance to be fine, and I didn't see a simpler way to do it.
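A rough sketch of why a multi-way merge with a single non-empty input is harmless: in this toy Python model, Python's heapq.merge stands in for Hail's StreamMultiMerge, and locate and output_partition are hypothetical helpers, not Hail code. Each output partition merges one stream per input table, but only the table that owns that partition contributes rows, so the merge passes that stream through in its original order.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple

# Same hypothetical model: a table is a list of partitions of rows.
Partition = List[dict]
Table = List[Partition]


def locate(tables: Sequence[Table], out_idx: int) -> Tuple[int, int]:
    """Map an output partition index to (input table, local partition index)."""
    offset = 0
    for t, table in enumerate(tables):
        if out_idx < offset + len(table):
            return t, out_idx - offset
        offset += len(table)
    raise IndexError(out_idx)


def output_partition(tables: Sequence[Table], out_idx: int) -> Iterator[dict]:
    """Produce one output partition via a k-way merge of k input streams.

    Only the stream from the table that owns out_idx can yield rows; the
    others are empty iterators, so the merge yields the owner's rows in order.
    """
    owner, local = locate(tables, out_idx)
    streams = [
        iter(tables[t][local]) if t == owner else iter(())
        for t in range(len(tables))
    ]
    # The key is empty, so all rows compare equal; this never matters here
    # because only one stream ever produces rows.
    return heapq.merge(*streams, key=lambda row: ())


a = [[{"x": 1}, {"x": 2}], [{"x": 3}]]
b = [[{"x": 4}]]
for i in range(3):
    print(i, list(output_partition([a, b], i)))
```

The design trade-off mirrors the one described above: reusing a general merge avoids a new code path, at the cost of building empty streams for the non-owning inputs.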

@patrick-schultz force-pushed the fix-unkeyed-table-union branch from c058d54 to 6c9cf8c on August 10, 2023 20:56
@danking (Contributor) commented Aug 15, 2023

bump @chrisvittal

@danking (Contributor) left a comment

Other than my comment, I think I understand this enough to feel confident approving.

@danking (Contributor) left a comment

Approving

@danking merged commit 8fb7d90 into hail-is:main on Aug 16, 2023
@patrick-schultz deleted the fix-unkeyed-table-union branch on April 11, 2024 10:51

Development

Successfully merging this pull request may close these issues.

[query] aggregation of split_multi collapses to one partition
