Add scan_aggregation and reduce_aggregation derived types. #10357

nvdbaranec · 2022-02-25T21:26:01Z

This PR adds the scan_aggregation and reduce_aggregation derived types. With it, all concrete aggregation types now derive from algorithm-specific subtypes.
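
For readers new to the design, here is a minimal sketch of what "derived from algorithm-specific subtypes" means. Everything lives in a hypothetical `sketch` namespace; the class bodies, the `name()` method, and the `reduce`/`scan` functions are simplified stand-ins I made up for illustration, not the libcudf declarations. Treating `nunique` as reduce-only is likewise an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace sketch {  // hypothetical stand-ins, not the libcudf headers

// Common base shared by every aggregation.
struct aggregation {
  virtual ~aggregation() = default;
  virtual std::string name() const = 0;
};

// Algorithm-specific subtypes. Virtual inheritance keeps a single
// `aggregation` subobject when a concrete type derives from several of them.
struct reduce_aggregation : virtual aggregation {};
struct scan_aggregation : virtual aggregation {};

// Concrete types derive only from the algorithms that support them.
struct sum_aggregation final : reduce_aggregation, scan_aggregation {
  std::string name() const override { return "sum"; }
};
struct nunique_aggregation final : reduce_aggregation {  // reduce-only in this sketch
  std::string name() const override { return "nunique"; }
};

// Each algorithm accepts only its own subtype.
inline std::string reduce(reduce_aggregation const& agg) { return "reduce(" + agg.name() + ")"; }
inline std::string scan(scan_aggregation const& agg) { return "scan(" + agg.name() + ")"; }

// An unsupported pairing is rejected by the type system instead of at run time.
static_assert(!std::is_base_of<scan_aggregation, nunique_aggregation>::value,
              "nunique is not a scan aggregation in this sketch");

// Usage: these calls compile; scan(nunique_aggregation{}) would not.
inline void demo()
{
  assert(reduce(sum_aggregation{}) == "reduce(sum)");
  assert(scan(sum_aggregation{}) == "scan(sum)");
  assert(reduce(nunique_aggregation{}) == "reduce(nunique)");
}

}  // namespace sketch
```

The point of the design, as I understand it, is that passing an aggregation to an algorithm that does not support it becomes a compile error rather than a run-time failure.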

nvdbaranec · 2022-02-25T21:28:38Z

Pinging @jrhemstad.

@karthikeyann, there are some comments in the existing python aggregations that I wasn't sure about:

cudf/python/cudf/cudf/_lib/aggregation.pyx, line 227 in 900d55c:

`# TODO: update this after adding per algorithm aggregation derived types`

If there's anything I can do in this PR to address them, let me know.

codecov · 2022-02-25T23:51:47Z

Codecov Report

Merging #10357 (6f940fd) into branch-22.04 (b613394) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@               Coverage Diff                @@
##           branch-22.04   #10357      +/-   ##
================================================
+ Coverage         86.13%   86.16%   +0.02%     
================================================
  Files               139      139              
  Lines             22460    22457       -3     
================================================
+ Hits              19347    19351       +4     
+ Misses             3113     3106       -7

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/column/column.py | `89.16% <100.00%> (+0.17%)` | ⬆️ |
| python/cudf/cudf/core/dataframe.py | `93.57% <100.00%> (ø)` | |
| python/cudf/cudf/core/frame.py | `91.72% <100.00%> (ø)` | |
| python/cudf/cudf/core/single_column_frame.py | `97.01% <100.00%> (ø)` | |
| python/cudf/cudf/core/column/string.py | `88.39% <0.00%> (+0.12%)` | ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | `84.49% <0.00%> (+0.30%)` | ⬆️ |
| python/cudf/cudf/core/groupby/groupby.py | `91.92% <0.00%> (+0.43%)` | ⬆️ |
| python/cudf/cudf/core/column/lists.py | `90.56% <0.00%> (+0.47%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c0f7fe6...6f940fd.

bdice

Comments attached. Reading issue #7106 helped me understand a lot more of this PR. A cross-link to that related issue would be nice to have in the PR description for future readers.

cpp/include/cudf/detail/aggregation/aggregation.hpp

cpp/src/aggregation/aggregation.cpp

cpp/tests/reductions/reduction_tests.cpp

java/src/main/native/src/ColumnViewJni.cpp

python/cudf/cudf/_lib/aggregation.pyx

karthikeyann · 2022-02-28T12:24:48Z

@nvdbaranec
cumsum, cummin, and cummax were aliases for the scan operations for sum, min, and max respectively, back when there was a single Aggregation class.
Now that per-algorithm aggregation types (reduction, scan, group_scan, rolling) are implemented, cumsum, cummin, and cummax should not exist.
They should be removed.
For example, wherever ScanAggregation::cumsum() is called, it should be replaced by ScanAggregation::sum().
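
A small C++ model of the point above, as I read it: once the aggregation type itself says "scan", the plain names are enough and the `cum*` aliases carry no extra information. The `ScanAggregation` name comes from the comment; `ReduceAggregation`, the `Kind` enum, and the struct layouts are hypothetical and only mirror the idea, not the Cython classes in aggregation.pyx.

```cpp
#include <cassert>

namespace sketch {  // hypothetical C++ model, not the Cython classes in aggregation.pyx

enum class Kind { SUM, MIN, MAX };

// A reduction over sum/min/max.
struct ReduceAggregation {
  Kind kind;
  static ReduceAggregation sum() { return {Kind::SUM}; }
  static ReduceAggregation min() { return {Kind::MIN}; }
  static ReduceAggregation max() { return {Kind::MAX}; }
};

// A scan over the same kinds. The type already says "scan", so the
// factories can use the plain names instead of cumsum/cummin/cummax.
struct ScanAggregation {
  Kind kind;
  static ScanAggregation sum() { return {Kind::SUM}; }  // instead of cumsum()
  static ScanAggregation min() { return {Kind::MIN}; }  // instead of cummin()
  static ScanAggregation max() { return {Kind::MAX}; }  // instead of cummax()
};

// Usage: both factories share one kind; only the algorithm type differs.
inline void demo()
{
  assert(ScanAggregation::sum().kind == ReduceAggregation::sum().kind);
  assert(ScanAggregation::max().kind == Kind::MAX);
}

}  // namespace sketch
```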

cpp/src/aggregation/aggregation.cpp

python/cudf/cudf/_lib/aggregation.pyx

cpp/include/cudf/aggregation.hpp

nvdbaranec · 2022-03-07T16:00:47Z

> @nvdbaranec cumsum, cummin, cummax were aliases for scan operations for sum, min, and max respectively when single Aggregation class was present. since per algorithm (reduction, scan, group_scan, rolling) is implemented, cumsum, cummin, cummax should not exist. They should be removed. For eg. wherever ScanAggregation::cumsum() is called, it should be replaced by ScanAggregation::sum()

In the interest of keeping this PR down in size, I'll do this work as a second PR. This PR is already a prereq for other high priority work (implementing percentile_approx as a reduction).

nvdbaranec · 2022-03-07T16:29:18Z

Added an issue for follow-up (assigned to me): #10394

revans2

From the java perspective this looks fine to me.

cpp/tests/reductions/reduction_tests.cpp

bdice

One fix for consistency -- otherwise LGTM. Thanks!

cpp/tests/reductions/reduction_tests.cpp

karthikeyann

Looks good. 👍
Just a couple of C++ suggestions.

cpp/include/cudf/aggregation.hpp

cpp/src/aggregation/aggregation.cpp

nvdbaranec · 2022-03-11T22:54:30Z

@gpucibot merge

Fixes benchmarks compile errors introduced by #10357

Example:

```
/cudf/cpp/benchmarks/reduction/reduce.cpp: In function ‘void BM_reduction(benchmark::State&, const std::unique_ptr<cudf::aggregation>&)’:
/cudf/cpp/benchmarks/reduction/reduce.cpp:52:46: error: invalid initialization of reference of type ‘const std::unique_ptr<cudf::reduce_aggregation>&’ from expression of type ‘const std::unique_ptr<cudf::aggregation>’
   52 |   auto result = cudf::reduce(input_column, agg, output_dtype);
```

Aggregation types for reduce and scan were modified to include template types.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- https://github.com/nvdbaranec

URL: #10428
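
The error above can be reproduced in miniature. The sketch below uses hypothetical stand-in types in a `sketch` namespace (not the libcudf headers) to show why a `std::unique_ptr` to the base type cannot be passed where a `std::unique_ptr` to the algorithm-specific type is expected, and how requesting the specific type from a templated factory avoids the problem. I assume this is roughly what "modified to include template types" refers to; the actual change is in #10428.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

namespace sketch {  // hypothetical stand-ins, not the libcudf headers

struct aggregation {
  virtual ~aggregation() = default;
  virtual std::string name() const = 0;
};
struct reduce_aggregation : virtual aggregation {};
struct sum_aggregation final : reduce_aggregation {
  std::string name() const override { return "sum"; }
};

// Factory templated on the pointer's static type, so the caller can ask for
// the algorithm-specific type up front.
template <typename Base = aggregation>
std::unique_ptr<Base> make_sum_aggregation()
{
  return std::make_unique<sum_aggregation>();
}

// Like the reduce call in the error message, this takes the specific type.
inline std::string reduce(std::unique_ptr<reduce_aggregation> const& agg)
{
  return "reduce(" + agg->name() + ")";
}

// A pointer to the base type does not convert to a pointer to the subtype,
// which is the situation the compiler error reports.
static_assert(!std::is_convertible<std::unique_ptr<aggregation>,
                                   std::unique_ptr<reduce_aggregation>>::value,
              "unique_ptr<aggregation> cannot initialize unique_ptr<reduce_aggregation>");

// Usage: request the reduce-specific type, then pass it straight through.
inline void demo()
{
  auto agg = make_sum_aggregation<reduce_aggregation>();
  assert(reduce(agg) == "reduce(sum)");
}

}  // namespace sketch
```

Calling `make_sum_aggregation()` with the default template argument yields a `std::unique_ptr<aggregation>`, which `reduce` in this sketch rejects at compile time, matching the shape of the benchmark failure.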

nvdbaranec added 4 commits February 22, 2022 11:44

Add scan_aggregation and reduce_aggregations. C++ side only.

245e68c

Java bindings.

c884d5c

Merge branch 'branch-22.04' into scan_reduce_aggregations

321c9b2

Python bindings.

900d55c

nvdbaranec added the libcudf, cuDF (Python), cuDF (Java), improvement, and breaking labels Feb 25, 2022

nvdbaranec requested review from a team as code owners February 25, 2022 21:26

nvdbaranec added this to PR-WIP in v22.04 Release via automation Feb 25, 2022

nvdbaranec requested review from bdice and rgsl888prabhu February 25, 2022 21:26

Copyright updates.

0398a0d

bdice requested changes Feb 26, 2022

v22.04 Release automation moved this from PR-WIP to PR-Needs review Feb 26, 2022

karthikeyann reviewed Feb 28, 2022

cpp/src/aggregation/aggregation.cpp

python/cudf/cudf/_lib/aggregation.pyx

python/cudf/cudf/_lib/aggregation.pyx

jrhemstad reviewed Mar 1, 2022

cpp/include/cudf/aggregation.hpp

PR review comments.

a3a71b8

nvdbaranec requested review from bdice and karthikeyann March 7, 2022 16:22

nvdbaranec mentioned this pull request Mar 7, 2022

[FEA] Cleanup of unneeded functions in Python aggregation types. #10394 (Open)

Formatting

56a6c0f

revans2 approved these changes Mar 7, 2022

bdice reviewed Mar 7, 2022

cpp/tests/reductions/reduction_tests.cpp

cpp/tests/reductions/reduction_tests.cpp (outdated)

cpp/tests/reductions/reduction_tests.cpp (outdated)

Clean up some test code.

e693562

nvdbaranec requested a review from bdice March 9, 2022 16:09

rgsl888prabhu approved these changes Mar 9, 2022

bdice approved these changes Mar 9, 2022

cpp/tests/reductions/reduction_tests.cpp (outdated)

v22.04 Release automation moved this from PR-Needs review to PR-Reviewer approved Mar 9, 2022

nvdbaranec added 2 commits March 9, 2022 15:56

Small test tweak.

23cae44

Merge branch 'branch-22.04' into scan_reduce_aggregations

6f940fd

karthikeyann reviewed Mar 11, 2022

cpp/include/cudf/aggregation.hpp

cpp/src/aggregation/aggregation.cpp

nvdbaranec requested a review from karthikeyann March 11, 2022 20:48

rapids-bot bot merged commit b1ea304 into rapidsai:branch-22.04 Mar 11, 2022

v22.04 Release automation moved this from PR-Reviewer approved to Done Mar 11, 2022

davidwendt mentioned this pull request Mar 14, 2022

Fix benchmarks to work with new aggregation types #10428 (Merged)
