Java utilities to aid in accelerating aggregations on 128-bit types #10201

Merged: 2 commits merged into rapidsai:branch-22.04 from the aggutil128 branch, Feb 4, 2022

Conversation

jlowe
Member

@jlowe jlowe commented Feb 2, 2022

This adds a couple of custom kernels for Java to help accelerate sum aggregations on 128-bit types and check for overflows. The first kernel extracts a 32-bit chunk from a 128-bit type, so that each 128-bit value can be fed into a sum aggregation as four 32-bit chunks. The second kernel takes the resulting upscaled 64-bit integer sums and reassembles the parts into a 128-bit type column, along with a boolean column indicating whether each value overflowed.

By splitting the 128-bit type into 32-bit chunks, a sum aggregation on DECIMAL128, which is a sort-based aggregation, can be turned into a hash-based aggregation on 32-bit integer inputs for improved performance. As a bonus, this approach can also check for overflow, which is difficult to do when summing DECIMAL128 values directly.
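The chunking idea described above can be illustrated on the CPU with a minimal sketch. This is not the code from this PR: the class and method names (`Int128ChunkSum`, `extractChunk`, `sumChunks`, `combine`, `overflows128`) are hypothetical, `BigInteger` stands in for a 128-bit column, and treating the lower three chunks as unsigned and the top chunk as signed is an assumption made for the illustration.

```java
import java.math.BigInteger;

// Hypothetical CPU illustration of summing 128-bit values via 32-bit chunks.
final class Int128ChunkSum {
    private static final long MASK32 = 0xFFFFFFFFL;

    // Returns 32-bit chunk `index` (0 = least significant) of a signed 128-bit value.
    // Assumption: chunks 0..2 are zero-extended, chunk 3 is sign-extended.
    static long extractChunk(BigInteger value, int index) {
        int bits = value.shiftRight(32 * index).intValue(); // low 32 bits after shift
        return index == 3 ? (long) bits : (bits & MASK32);
    }

    // Sums each chunk position independently into a 64-bit accumulator.
    // The accumulators stay exact for fewer than 2^31 input values.
    static long[] sumChunks(BigInteger[] values) {
        long[] sums = new long[4];
        for (BigInteger v : values) {
            for (int i = 0; i < 4; i++) {
                sums[i] += extractChunk(v, i);
            }
        }
        return sums;
    }

    // Reassembles the four 64-bit chunk sums into the exact total,
    // shifting each sum to its chunk position so carries propagate.
    static BigInteger combine(long[] sums) {
        BigInteger total = BigInteger.ZERO;
        for (int i = 0; i < 4; i++) {
            total = total.add(BigInteger.valueOf(sums[i]).shiftLeft(32 * i));
        }
        return total;
    }

    // True if the total does not fit in a signed 128-bit integer.
    static boolean overflows128(BigInteger total) {
        return total.bitLength() > 127;
    }
}
```

In this sketch `combine` returns the exact total and `overflows128` reports whether it fits in 128 bits; the kernels described in the PR instead produce a 128-bit column plus a boolean overflow column.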

@jlowe jlowe added the Java, Spark, 4 - Needs cuDF (Java) Reviewer, improvement, and non-breaking labels Feb 2, 2022
@jlowe jlowe requested a review from a team as a code owner February 2, 2022 23:02
@jlowe jlowe self-assigned this Feb 2, 2022
@jlowe jlowe added this to PR-WIP in v22.04 Release via automation Feb 2, 2022
@github-actions github-actions bot added the CMake label Feb 2, 2022
@codecov

codecov bot commented Feb 3, 2022

Codecov Report

Merging #10201 (c214cf4) into branch-22.04 (a7d88cd) will increase coverage by 0.05%.
The diff coverage is 0.00%.


@@               Coverage Diff                @@
##           branch-22.04   #10201      +/-   ##
================================================
+ Coverage         10.42%   10.47%   +0.05%     
================================================
  Files               119      122       +3     
  Lines             20603    20501     -102     
================================================
  Hits               2148     2148              
+ Misses            18455    18353     -102     
Impacted Files Coverage Δ
python/cudf/cudf/core/_base_index.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/categorical.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/datetime.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/numerical.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/string.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/timedelta.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column_accessor.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <ø> (ø)
... and 21 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update deb902a...c214cf4.

Contributor

@revans2 revans2 left a comment

This looks fine to me, but I would like someone with more C++ experience to review it too.

@jlowe jlowe requested a review from jrhemstad February 3, 2022 20:34
Contributor

@jrhemstad jrhemstad left a comment

Just the C++ stuff.

Contributor

@revans2 revans2 left a comment

The Java side and as much as I can understand of the C++.

v22.04 Release automation moved this from PR-WIP to PR-Reviewer approved Feb 3, 2022
@jlowe jlowe added the 5 - Ready to Merge label and removed the 4 - Needs cuDF (Java) Reviewer label Feb 3, 2022
@jlowe
Member Author

jlowe commented Feb 4, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 4e8cb4f into rapidsai:branch-22.04 Feb 4, 2022
v22.04 Release automation moved this from PR-Reviewer approved to Done Feb 4, 2022
@jlowe jlowe deleted the aggutil128 branch February 4, 2022 13:55