Support nested types as groupby keys in libcudf #11792
Codecov Report
Base: 87.47% // Head: 88.10% // Increases project coverage by +0.62%

Coverage diff:

| | branch-22.12 | #11792 | +/- |
|---|---|---|---|
| Coverage | 87.47% | 88.10% | +0.62% |
| Files | 133 | 135 | +2 |
| Lines | 21826 | 22133 | +307 |
| Hits | 19093 | 19500 | +407 |
| Misses | 2733 | 2633 | -100 |
@PointKernel as we saw yesterday, when attempting to do a groupby-agg with a list column as keys, we see the following error: RuntimeError: cuDF failure at: cudf/cpp/src/groupby/hash/groupby.cu:661: Null keys of nested type cannot be excluded. This is because the default option in Python is |
IIRC, the error was added to align with Spark's behavior, so we don't have to throw it in libcudf. I will just remove it and see if anyone complains.
I would like to see my comment regarding ordering of outputs in the test addressed. However, given the tight time constraints (code freeze starts tomorrow) I don't want to block this PR on that. If you have time to make that change in this PR that's great, but if not please file it away for future work. Thanks!
@gpucibot merge
Related to #8039
This PR replaces the old row operators in list groupby with the new ones, so nested types such as lists and structs can be used as groupby keys in libcudf. It only partially fixes the issue: a comprehensive Python refactoring requires non-trivial future work. This is a breaking change, since libcudf will no longer throw an error when nulls are excluded and the groupby keys are of a nested type. Comprehensive tests for complex nested types, such as structs of lists of lists of structs, depend on #11222.
Closes #10181
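
To make the intended semantics concrete, here is a minimal pure-Python sketch of grouping by nested keys with null keys either excluded or included. It is only an illustration of the behavior described above, not libcudf's implementation: `to_hashable`, `groupby_sum`, and the `include_nulls` flag are hypothetical names, lists and dicts stand in for list and struct columns, and `None` stands in for a null key.

```python
from collections import defaultdict


def to_hashable(value):
    """Recursively convert a nested key (lists/dicts) into a hashable form."""
    if isinstance(value, list):
        return ("list", tuple(to_hashable(v) for v in value))
    if isinstance(value, dict):
        # Sort fields by name so field order does not affect equality.
        return ("struct", tuple((k, to_hashable(v)) for k, v in sorted(value.items())))
    return value  # scalars and None are already hashable


def groupby_sum(keys, values, include_nulls=False):
    """Sum `values` per distinct key; top-level null keys are dropped unless included."""
    totals = defaultdict(int)
    originals = {}  # hashable form -> first original key seen
    for key, value in zip(keys, values):
        if key is None and not include_nulls:
            continue  # excluding a null nested key is simply a skip, not an error
        hashed = to_hashable(key)
        originals.setdefault(hashed, key)
        totals[hashed] += value
    # Groups come back in first-seen order here; a hash-based groupby
    # generally gives no ordering guarantee, so tests should sort first.
    return [(originals[h], total) for h, total in totals.items()]


keys = [[1, 2], None, [1, 2], [3], None, {"a": [1]}]
values = [10, 20, 30, 40, 50, 60]

excluded = groupby_sum(keys, values)
included = groupby_sum(keys, values, include_nulls=True)
```

In this sketch, dropping null keys for a nested key type is just a filter applied before hashing, which mirrors the behavior change described above: excluding nulls no longer needs to raise an error.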