Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Add dictionary support to cudf::reduce #6666

Merged
merged 46 commits into from
Dec 1, 2020

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Nov 4, 2020

Reference #5963

This PR adds dictionary column type support to the set of cudf::reduce functions.
This PR depends on utilities added in PR #6651

Here are the reduce operations that will be included in this PR.

  • all
  • any
  • max
  • mean
  • median
  • min
  • nth_element
  • product
  • quantile
  • std
  • sum_of_squares
  • sum
  • unique_count
  • var

@davidwendt davidwendt self-assigned this Nov 4, 2020
@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. labels Nov 4, 2020
@davidwendt davidwendt added this to PR-WIP in v0.17 Release via automation Nov 4, 2020
@codecov
Copy link

codecov bot commented Nov 4, 2020

Codecov Report

Merging #6666 (95f02a7) into branch-0.17 (1771a8f) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.17    #6666   +/-   ##
============================================
  Coverage        81.94%   81.94%           
============================================
  Files               96       96           
  Lines            16164    16164           
============================================
  Hits             13246    13246           
  Misses            2918     2918           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1771a8f...95f02a7. Read the comment docs.

@davidwendt
Copy link
Contributor Author

Some of cudf::reduce operations uses cudf::quantile which does not support dictionary columns.
So, I'll have to update cudf::quantile before finishing this PR.

@davidwendt
Copy link
Contributor Author

davidwendt commented Nov 6, 2020

Ok, the last few builds have failed because they timed out (>40 minutes!). Adding dictionary to the simple reduce functions (min, max, sum, sum-of-squares, and product) increase their compile time from ~5 mins to ~18 mins. For now, I've included changes to remove the 2nd type-dispatch call. And the compile time for these files are now ~50 seconds. All tests are passing but benchmarks have not been checked. I will create an issue to discuss these changes and then will likely redo them in a separate PR.

@davidwendt
Copy link
Contributor Author

Created PR #6727 to fix the compile-time issues. Now this PR will become dependent on that one.

cpp/include/cudf/detail/quantile.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/detail/quantile.hpp Outdated Show resolved Hide resolved
cpp/src/reductions/compound.cuh Show resolved Hide resolved
cpp/src/reductions/mean.cu Show resolved Hide resolved
cpp/src/reductions/simple.cuh Show resolved Hide resolved
v0.17 Release automation moved this from PR-Needs review to PR-Reviewer approved Nov 23, 2020
@davidwendt davidwendt added 5 - Ready to Merge Testing and reviews complete, ready to merge 3 - Ready for Review Ready for review by team and removed 3 - Ready for Review Ready for review by team 5 - Ready to Merge Testing and reviews complete, ready to merge labels Nov 24, 2020
v0.17 Release automation moved this from PR-Reviewer approved to PR-Needs review Nov 28, 2020
Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.
Few minor changes.

conda/recipes/libcudf/meta.yaml Show resolved Hide resolved
cpp/include/cudf/column/column_device_view.cuh Outdated Show resolved Hide resolved
cpp/src/reductions/reductions.cpp Outdated Show resolved Hide resolved
cpp/src/reductions/reductions.cpp Outdated Show resolved Hide resolved
cpp/src/reductions/simple.cuh Outdated Show resolved Hide resolved
cpp/src/reductions/nth_element.cu Outdated Show resolved Hide resolved
cpp/src/reductions/nth_element.cu Outdated Show resolved Hide resolved
cpp/tests/reductions/reduction_tests.cpp Outdated Show resolved Hide resolved
v0.17 Release automation moved this from PR-Needs review to PR-Reviewer approved Dec 1, 2020
@karthikeyann karthikeyann added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Dec 1, 2020
@karthikeyann karthikeyann added the feature request New feature or request label Dec 1, 2020
@rapids-bot rapids-bot bot merged commit 1c81827 into rapidsai:branch-0.17 Dec 1, 2020
v0.17 Release automation moved this from PR-Reviewer approved to Done Dec 1, 2020
@davidwendt davidwendt deleted the dictionary-reduce branch December 2, 2020 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
No open projects
v0.17 Release
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants