Add HivePartitionFunction #314


mbasmanova
Contributor

HivePartitionFunction is compatible with Hive bucketing and allows efficiently
joining bucketed and non-bucketed tables on bucket-by join keys by
partitioning the non-bucketed table into partitions made of whole buckets.
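A minimal sketch of what "partitions made of whole buckets" means. It assumes a hypothetical `bucketToPartition` map filled round-robin (`bucket % numPartitions`); the assignment HivePartitionFunction actually uses may differ. Because a row's partition is derived only from its bucket, all rows of a given bucket end up in the same partition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: assign every bucket to exactly one partition so that each
// partition is a union of whole buckets. Round-robin is an assumption.
std::vector<int32_t> makeBucketToPartition(int32_t numBuckets,
                                           int32_t numPartitions) {
  assert(numBuckets > 0 && numPartitions > 0);
  std::vector<int32_t> bucketToPartition(numBuckets);
  for (int32_t bucket = 0; bucket < numBuckets; ++bucket) {
    bucketToPartition[bucket] = bucket % numPartitions;
  }
  return bucketToPartition;
}

// A row's partition depends only on its bucket, so rows of the non-bucketed
// table with bucket b land in the same partition as every other row of b.
int32_t partitionOf(const std::vector<int32_t>& bucketToPartition,
                    int32_t bucket) {
  return bucketToPartition.at(static_cast<std::size_t>(bucket));
}
```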

This commit includes support for BIGINT and VARCHAR types. Future
commits will add support for other types.
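For illustration, here is a sketch of Hive-style bucket hashing for the two supported types. It is modeled on Hive's original (version 1) bucketing hash and is not taken from the Velox source: BIGINT folds its high 32 bits into the low 32 bits, as Java's `Long.hashCode` does, and VARCHAR computes `31 * h + b` over the UTF-8 bytes treated as signed Java bytes. The function names are hypothetical, and the exact formulas are assumptions to verify against Hive.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// BIGINT: (int)(value ^ (value >>> 32)) in Java terms.
int32_t hiveHashBigint(int64_t value) {
  uint64_t bits = static_cast<uint64_t>(value);
  return static_cast<int32_t>(static_cast<uint32_t>(bits ^ (bits >> 32)));
}

// VARCHAR: h = 31 * h + b over the UTF-8 bytes. Java bytes are signed, so
// each byte is sign-extended; unsigned arithmetic gives Java's wraparound.
int32_t hiveHashVarchar(std::string_view utf8) {
  uint32_t hash = 0;
  for (char c : utf8) {
    hash = hash * 31 + static_cast<uint32_t>(static_cast<int8_t>(c));
  }
  return static_cast<int32_t>(hash);
}

// Bucket number: clear the sign bit, then take the remainder.
int32_t hiveBucket(int32_t hash, int32_t numBuckets) {
  assert(numBuckets > 0);
  return (hash & INT32_MAX) % numBuckets;
}
```

For example, `hiveBucket(hiveHashVarchar("ab"), 8)` yields bucket 1, since the hash of `"ab"` is 3105.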

@facebook-github-bot added the CLA Signed label Sep 27, 2021
@mbasmanova mbasmanova marked this pull request as ready for review September 27, 2021 08:18
@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 27, 2021
Summary:
HivePartitionFunction is compatible with Hive bucketing and allows efficiently
joining bucketed and non-bucketed tables on bucket-by join keys by
partitioning the non-bucketed table into partitions made of whole buckets.

This commit includes support for BIGINT and VARCHAR types. Future
commits will add support for other types.

Pull Request resolved: facebookincubator#314

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Deployed a custom build to t10_3 and replayed 100 baymax queries that used to fail with `!leftKeys_.empty() HashJoinNode requires at least one join key`.

There were no crashes and 73 queries succeeded. The remaining errors are known issues such as T100597839.

https://fburl.com/unidash/9ol7pic2

Reviewed By: oerling

Differential Revision: D31201253

Pulled By: mbasmanova

fbshipit-source-id: 6ea0508a9be1838e942c55934af137bd65a0710a
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D31201253

Summary:
Introduce a CrossJoinNode plan node to specify a cross join in a query plan. A cross
join takes two plan nodes for the left and right sides and an output type, which
allows reordering the input columns.
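A minimal sketch of how an output type can reorder the input columns. It assumes output columns are matched to the left and right inputs by name; the `ColumnSource` struct and the `indexOf` and `resolveOutputColumns` functions are hypothetical and not part of the plan node API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Where an output column is read from: left or right input, and position.
struct ColumnSource {
  bool fromLeft;
  std::size_t index;
};

std::optional<std::size_t> indexOf(const std::vector<std::string>& names,
                                   const std::string& name) {
  auto it = std::find(names.begin(), names.end(), name);
  if (it == names.end()) {
    return std::nullopt;
  }
  return static_cast<std::size_t>(it - names.begin());
}

// Resolves each output column, in output order, to its input source.
std::vector<ColumnSource> resolveOutputColumns(
    const std::vector<std::string>& leftNames,
    const std::vector<std::string>& rightNames,
    const std::vector<std::string>& outputNames) {
  std::vector<ColumnSource> sources;
  sources.reserve(outputNames.size());
  for (const auto& name : outputNames) {
    if (auto i = indexOf(leftNames, name)) {
      sources.push_back({true, *i});
    } else if (auto j = indexOf(rightNames, name)) {
      sources.push_back({false, *j});
    } else {
      throw std::invalid_argument("unknown output column: " + name);
    }
  }
  return sources;
}
```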

A query plan with a cross join executes using two pipelines: (1) the build pipeline
processes the right-side data, collects all of it in a list of vectors, and makes
it available to the probe pipeline; (2) the probe pipeline processes the data on
the left side in a streaming fashion and combines it with the right-side data.
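A single-threaded sketch of the two pipelines, using hypothetical one-column `Batch`es of `int64_t` in place of real vectors. The build step drains the right side into a list; the probe step handles one left batch at a time and pairs each of its rows with every build row.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Batch = std::vector<int64_t>;       // hypothetical single-column batch
using Row = std::pair<int64_t, int64_t>;  // (left value, right value)

// Build pipeline: consume the entire right side and keep it in memory.
std::vector<Batch> runBuild(const std::vector<Batch>& rightInput) {
  std::vector<Batch> collected;
  for (const auto& batch : rightInput) {
    if (!batch.empty()) {
      collected.push_back(batch);
    }
  }
  return collected;
}

// Probe pipeline: called once per left batch as it streams in; only the
// current left batch and the collected build side are held at a time.
std::vector<Row> probeBatch(const Batch& leftBatch,
                            const std::vector<Batch>& build) {
  std::vector<Row> out;
  for (int64_t left : leftBatch) {
    for (const auto& batch : build) {
      for (int64_t right : batch) {
        out.emplace_back(left, right);
      }
    }
  }
  return out;
}
```

Because the build side is fully materialized before probing starts, memory use grows with the size of the right side, while the left side can be arbitrarily large.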

CrossJoinNode is translated into two operators: CrossJoinProbe and
CrossJoinBuild. The CrossJoinProbe operator becomes part of the probe pipeline.
The CrossJoinBuild operator is installed as the last operator of the build-side
pipeline. The output of the CrossJoinBuild operator is a list of build-side
vectors, which the CrossJoinProbe operator accesses via CrossJoinBridge.
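A minimal sketch of the hand-off between the two operators, using a hypothetical `SimpleCrossJoinBridge` built on a mutex and a condition variable. This version blocks the waiting thread, which an engine bridge would likely avoid (for example by handing the probe operator a future to wait on), so treat the synchronization here as illustrative only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

using Batch = std::vector<int64_t>;  // hypothetical single-column batch

class SimpleCrossJoinBridge {
 public:
  // Called once by the build-side operator when its input is exhausted.
  void setData(std::vector<Batch> data) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!data_.has_value() && "build side publishes exactly once");
      data_ = std::move(data);
    }
    cv_.notify_all();
  }

  // Called by probe-side operators; waits until setData() has run. The data
  // is never modified afterwards, so the returned reference stays valid.
  const std::vector<Batch>& data() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return data_.has_value(); });
    return *data_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::optional<std::vector<Batch>> data_;
};
```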

CrossJoinProbe wraps probe and build side vectors in dictionaries to represent
repeated values without copying.
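To show why dictionaries avoid copying, here is a sketch using a simplified, hypothetical `DictionaryColumn` (a pointer to base values plus row indices) rather than Velox's actual dictionary vectors. When a probe batch of `P` rows is crossed with a build batch of `B` rows, output row `k` maps to probe row `k / B` and build row `k % B`; only index arrays are created, never copies of the values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical dictionary-encoded column: row k reads (*base)[indices[k]].
// The base values are referenced, not copied.
template <typename T>
struct DictionaryColumn {
  const std::vector<T>* base;
  std::vector<int32_t> indices;

  const T& at(std::size_t k) const { return (*base)[indices[k]]; }
  std::size_t size() const { return indices.size(); }
};

// Builds the probe x build cross product as two dictionaries over the
// original vectors: each probe row repeats once per build row, and the
// build rows cycle for every probe row.
template <typename L, typename R>
std::pair<DictionaryColumn<L>, DictionaryColumn<R>> crossJoinDictionaries(
    const std::vector<L>& probe, const std::vector<R>& build) {
  DictionaryColumn<L> left{&probe, {}};
  DictionaryColumn<R> right{&build, {}};
  for (std::size_t p = 0; p < probe.size(); ++p) {
    for (std::size_t b = 0; b < build.size(); ++b) {
      left.indices.push_back(static_cast<int32_t>(p));
      right.indices.push_back(static_cast<int32_t>(b));
    }
  }
  return {std::move(left), std::move(right)};
}
```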

Pull Request resolved: facebookincubator#282

Differential Revision: D31141606

Pulled By: mbasmanova

fbshipit-source-id: 9c916721b93fd3767913455ff6d73eb0dae81f11
…ubator#311)

Summary:
Extract PartitionFunction logic from PartitionedOutput to allow for different
partition functions. An upcoming change will introduce a partition function that
uses Hive bucketing. Partitioning using Hive bucketing is useful when joining a
bucketed table on a bucket-by key with a non-bucketed table.
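The point of the extraction is that the output operator depends only on an interface, so partitioning schemes can be swapped. A minimal sketch of that shape follows. `PartitionFunction` mirrors the name in the commit, but its signature here, the `Row` stand-in, `ModuloPartitionFunction`, and `splitByPartition` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::vector<int64_t>;  // hypothetical stand-in for one input row

// Hypothetical interface: assigns a partition number to every input row.
class PartitionFunction {
 public:
  virtual ~PartitionFunction() = default;
  virtual void partition(const std::vector<Row>& input,
                         std::vector<uint32_t>& partitions) = 0;
};

// One possible implementation: key value modulo the number of partitions.
// A Hive-bucketing implementation could replace it without changing callers.
class ModuloPartitionFunction : public PartitionFunction {
 public:
  ModuloPartitionFunction(uint32_t numPartitions, std::size_t keyChannel)
      : numPartitions_(numPartitions), keyChannel_(keyChannel) {
    assert(numPartitions_ > 0);
  }

  void partition(const std::vector<Row>& input,
                 std::vector<uint32_t>& partitions) override {
    partitions.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      auto key = static_cast<uint64_t>(input[i][keyChannel_]);
      partitions[i] = static_cast<uint32_t>(key % numPartitions_);
    }
  }

 private:
  const uint32_t numPartitions_;
  const std::size_t keyChannel_;
};

// Plays the role of a partitioned-output operator: it sees only the interface.
std::vector<std::vector<Row>> splitByPartition(const std::vector<Row>& input,
                                               PartitionFunction& function,
                                               uint32_t numPartitions) {
  std::vector<uint32_t> partitions;
  function.partition(input, partitions);
  std::vector<std::vector<Row>> out(numPartitions);
  for (std::size_t i = 0; i < input.size(); ++i) {
    out.at(partitions[i]).push_back(input[i]);
  }
  return out;
}
```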

Pull Request resolved: facebookincubator#311

Differential Revision: D31200618

Pulled By: mbasmanova

fbshipit-source-id: 0b08fd7bb313e6755a0e6d9c016e3d41cafc4d11
Summary:
HivePartitionFunction is compatible with Hive bucketing and allows efficiently
joining bucketed and non-bucketed tables on bucket-by join keys by
partitioning the non-bucketed table into partitions made of whole buckets.

This commit includes support for BIGINT and VARCHAR types. Future
commits will add support for other types.

Pull Request resolved: facebookincubator#314

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Deployed a custom build to t10_3 and replayed 100 baymax queries that used to fail with `!leftKeys_.empty() HashJoinNode requires at least one join key`.

There were no crashes and 73 queries succeeded. The remaining errors are known issues such as T100597839.

https://fburl.com/unidash/9ol7pic2

Reviewed By: oerling

Differential Revision: D31201253

Pulled By: mbasmanova

fbshipit-source-id: b7437472c247fbf02641d9559a5272aea2923770
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D31201253
