Add HivePartitionFunction #314
Closed
mbasmanova wants to merge 3 commits into facebookincubator:main from mbasmanova:hive-compatible-partitioning
facebook-github-bot added the CLA Signed label Sep 27, 2021
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
mbasmanova force-pushed the hive-compatible-partitioning branch from 0af3806 to e3ca3d6 (September 27, 2021 08:17)
@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 27, 2021
Summary: HivePartitionFunction is compatible with Hive bucketing. It makes it possible to efficiently join bucketed and non-bucketed tables on bucket-by join keys by partitioning the non-bucketed table into partitions made of whole buckets. This commit includes support for BIGINT and VARCHAR types. Future commits will add support for other types.
Pull Request resolved: facebookincubator#314
Test Plan: Imported from GitHub, without a `Test Plan:` line. Deployed a custom build to t10_3 and replayed 100 baymax queries that used to fail with "!leftKeys_.empty() HashJoinNode requires at least one join key". No crashes, and 73 queries succeeded. The remaining errors are known issues like T100597839. https://fburl.com/unidash/9ol7pic2
Reviewed By: oerling
Differential Revision: D31201253
Pulled By: mbasmanova
fbshipit-source-id: 6ea0508a9be1838e942c55934af137bd65a0710a
mbasmanova force-pushed the hive-compatible-partitioning branch from e3ca3d6 to 1c02956 (September 27, 2021 23:14)
This pull request was exported from Phabricator. Differential Revision: D31201253
Summary: Introduce the CrossJoinNode plan node to specify a cross join in a query plan. A cross join takes two plan nodes for the left and right sides and an output type, which allows reordering the input columns.
A query plan with a cross join executes using two pipelines: (1) the build pipeline processes the right side data, collects all of it in a list of vectors and makes it available to the probe pipeline; (2) the probe pipeline processes the data on the left side in a streaming fashion and combines it with the right side data.
CrossJoinNode is translated into two operators: CrossJoinProbe and CrossJoinBuild. The CrossJoinProbe operator becomes part of the probe pipeline. The CrossJoinBuild operator is installed as the last operator of the build side pipeline. The output of the CrossJoinBuild operator is a list of build side vectors, which the CrossJoinProbe operator accesses via CrossJoinBridge. CrossJoinProbe wraps probe and build side vectors in dictionaries to represent repeated values without copying.
Pull Request resolved: facebookincubator#282
Differential Revision: D31141606
Pulled By: mbasmanova
fbshipit-source-id: 9c916721b93fd3767913455ff6d73eb0dae81f11
…ubator#311)
Summary: Extract PartitionFunction logic from PartitionedOutput to allow for different partition functions. An upcoming change will introduce a partition function that uses Hive bucketing. Partitioning using Hive bucketing is useful when joining a bucketed table on a bucket-by key with a non-bucketed table.
Pull Request resolved: facebookincubator#311
Differential Revision: D31200618
Pulled By: mbasmanova
fbshipit-source-id: 0b08fd7bb313e6755a0e6d9c016e3d41cafc4d11
Summary: HivePartitionFunction is compatible with Hive bucketing. It makes it possible to efficiently join bucketed and non-bucketed tables on bucket-by join keys by partitioning the non-bucketed table into partitions made of whole buckets. This commit includes support for BIGINT and VARCHAR types. Future commits will add support for other types.
Pull Request resolved: facebookincubator#314
Test Plan: Imported from GitHub, without a `Test Plan:` line. Deployed a custom build to t10_3 and replayed 100 baymax queries that used to fail with "!leftKeys_.empty() HashJoinNode requires at least one join key". No crashes, and 73 queries succeeded. The remaining errors are known issues like T100597839. https://fburl.com/unidash/9ol7pic2
Reviewed By: oerling
Differential Revision: D31201253
Pulled By: mbasmanova
fbshipit-source-id: b7437472c247fbf02641d9559a5272aea2923770
mbasmanova force-pushed the hive-compatible-partitioning branch from 1c02956 to 24650e9 (September 27, 2021 23:33)
This pull request was exported from Phabricator. Differential Revision: D31201253
yma11 pushed a commit to yma11/velox that referenced this pull request Jun 16, 2023
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Jun 27, 2023
HivePartitionFunction is compatible with Hive bucketing. It makes it possible to
efficiently join bucketed and non-bucketed tables on bucket-by join keys by
partitioning the non-bucketed table into partitions made of whole buckets.
This commit includes support for BIGINT and VARCHAR types. Future
commits will add support for other types.
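
To illustrate the idea, here is a minimal sketch of Hive-style bucketing for BIGINT and VARCHAR keys. It is not the code from this PR. It assumes Hive's legacy bucketing hash (a Java `Long.hashCode`-style fold for BIGINT, a 31-based rolling hash over signed bytes for strings, and `(hash & 0x7FFFFFFF) % numBuckets` for the bucket number). All function names are hypothetical, and the round-robin bucket-to-partition mapping is just one possible way to build partitions out of whole buckets.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// Assumed BIGINT hash: fold the high 32 bits into the low 32 bits,
// as Java's Long.hashCode does.
inline int32_t hashBigint(int64_t value) {
  const auto bits = static_cast<uint64_t>(value);
  return static_cast<int32_t>(static_cast<uint32_t>(bits ^ (bits >> 32)));
}

// Assumed VARCHAR hash: h = 31 * h + byte, with bytes treated as signed.
// Unsigned arithmetic is used so that overflow wraps instead of being UB.
inline int32_t hashVarchar(std::string_view s) {
  uint32_t h = 0;
  for (const char c : s) {
    const auto signedByte = static_cast<int32_t>(static_cast<int8_t>(c));
    h = h * 31u + static_cast<uint32_t>(signedByte);
  }
  return static_cast<int32_t>(h);
}

// Bucket number: clear the sign bit, then take the remainder.
inline int32_t bucketOf(int32_t hash, int32_t numBuckets) {
  return (hash & 0x7FFFFFFF) % numBuckets;
}

// Hypothetical mapping that assigns whole buckets to partitions round-robin.
inline std::vector<int32_t> makeBucketToPartition(
    int32_t numBuckets,
    int32_t numPartitions) {
  std::vector<int32_t> mapping(numBuckets);
  for (int32_t bucket = 0; bucket < numBuckets; ++bucket) {
    mapping[bucket] = bucket % numPartitions;
  }
  return mapping;
}

// Partition of a row: look up the partition that owns the row's bucket.
inline int32_t partitionOf(
    int32_t hash,
    int32_t numBuckets,
    const std::vector<int32_t>& bucketToPartition) {
  return bucketToPartition[bucketOf(hash, numBuckets)];
}
```

Because every row of a bucket lands in the same partition, a task reading buckets of the bucketed table can be paired with the partition of the non-bucketed table that holds the same buckets, provided both sides use the same hash and bucket count.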