Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for STRUCT input to groupby #9024

Merged
merged 4 commits into from
Aug 26, 2021

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Aug 12, 2021

This commit adds support for STRUCT columns in groupby. This should now allow for groupby aggregations to work when any of the grouping columns are STRUCT, including nested STRUCTS.

Note: List columns are still not supported on groupby, even as members of STRUCT columns, at any level of nesting. Only STRUCT, STRUCT<STRUCT>, etc. are currently supported.

Depends on #8956 (i.e. unflatten_nested_columns()).

@mythrocks mythrocks self-assigned this Aug 12, 2021
@mythrocks mythrocks requested a review from a team as a code owner August 12, 2021 05:59
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Aug 12, 2021
@mythrocks mythrocks requested review from devavret and removed request for nvdbaranec August 12, 2021 06:00
@mythrocks
Copy link
Contributor Author

Note: This PR includes the changes in #8956 as well. The actual change will be well smaller, once #8956 is merged.

@mythrocks mythrocks added feature request New feature or request improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed feature request New feature or request labels Aug 12, 2021
@codecov
Copy link

codecov bot commented Aug 12, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@0ad36ff). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head faf1f10 differs from pull request most recent head 23e1ad2. Consider uploading reports for the commit 23e1ad2 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #9024   +/-   ##
===============================================
  Coverage                ?   10.76%           
===============================================
  Files                   ?      114           
  Lines                   ?    19086           
  Branches                ?        0           
===============================================
  Hits                    ?     2055           
  Misses                  ?    17031           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ad36ff...23e1ad2. Read the comment docs.

@vyasr
Copy link
Contributor

vyasr commented Aug 12, 2021

Note: This PR includes the changes in #8956 as well. The actual change will be well smaller, once #8956 is merged.

👍 I'll probably hold off reviewing until #8956 is merged then.

@mythrocks mythrocks force-pushed the groupby-structs branch 2 times, most recently from 5d46bd2 to b47ae4a Compare August 12, 2021 21:24
@mythrocks
Copy link
Contributor Author

rerun tests

This commit adds support for `STRUCT` columns in `groupby`. This should now allow for `groupby` aggregations to work when any of the grouping columns are `STRUCT`, including nested `STRUCTS`.

Note: `List` columns are still not supported on `groupby`, even as members of `STRUCT` columns, at any level of nesting. Only `STRUCT`, `STRUCT<STRUCT>`, etc. are currently supported.

Depends on rapidsai#8956 (i.e. `unflatten_nested_columns()`).
@mythrocks
Copy link
Contributor Author

mythrocks commented Aug 20, 2021

👍 I'll probably hold off reviewing until #8956 is merged then.

@vyasr, @devavret: Thank you for your patience. #8956 is merged. I have rebased to latest. This PR should now be in shape for a review.

Also, fixed Copyright date.
@jrhemstad
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d9d565e into rapidsai:branch-21.10 Aug 26, 2021
rapids-bot bot pushed a commit that referenced this pull request Sep 15, 2021
Fixes #8887.

#9024 added support for `STRUCT` groupby keys.
This commit adds a test for `grouped_rolling_window()` where the groupby keys are `STRUCT`.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - David Wendt (https://github.com/davidwendt)

URL: #9228
@mythrocks mythrocks deleted the groupby-structs branch September 16, 2021 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants