-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] support max and min aggregations for nested structs #8974
Comments
This issue should cover the previous one: #7995 |
… aggregate() and scan() (#9545) This PR adds struct support for `min`, `max`, `argmin` and `argmax` in groupby aggregation, and `min`, `max` in groupby scan. Although these operations require implementation for both hash-based and sort-based approaches, this PR only adds struct support into the sort-based approach due to the complexity of the hash-based implementation. Struct support for the hash-based approach will be future work. Partially addresses #8974 and #7995. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - David Wendt (https://github.com/davidwendt) - Jake Hemstad (https://github.com/jrhemstad) URL: #9545
This PR continues to address #8974, adding support for structs in `min` and `max` reduction. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Mark Harris (https://github.com/harrism) - https://github.com/nvdbaranec URL: #9697
This PR continues to address #8974, adding support for structs in `min` and `max` inclusive scan. Exclusive scan support is not needed in the near future. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - David Wendt (https://github.com/davidwendt) - https://github.com/nvdbaranec URL: #9725
Currently, there are two more places we need to support struct
In these places, currently the internal functions call a kernel very early, which prohibits injecting any additional host function call. As such, supporting structs in those functions requires refactoring the entire computation pipeline. Thus, supporting structs in them will be postponed until such refactoring is done. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
) This PR adds support for `min` and `max` operations in rolling window for STRUCT type. It also does some minor modifications to the existing code, such as renaming some variables and refining some comments. Partially addresses #8974. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Jake Hemstad (https://github.com/jrhemstad) - David Wendt (https://github.com/davidwendt) URL: #10332
@revans2 is this issue solved? |
No this is not resolved, pending the unfinished task above. |
Yes it looks like we have full support now. |
@ttnghia I will file a follow on issue for the hash based min/max group by. |
#17241 is what I filed as a follow on issue for performance. From a usability standpoint this what we have works today. |
Is your feature request related to a problem? Please describe.
In the Spark plugin we have a push to try and support as much of structs as we can. Cudf supports sorting of nested structs (no lists just basic types and other structs). It would really be great if we could support max and min aggregations on these as well. There are a lot of different types of max/min aggregations and if we cannot get them all at once we can take it a bit at a time too. We would like to be able to support this for.
like described in #8964 null child column values would come before non-null child column values.
Describe the solution you'd like
Just what I asked for min/max aggregations that can work on structs.
Describe alternatives you've considered
I'm not sure there is an alternative that we can support.
Additional context
Add any other context, code examples, or references to existing implementations about the feature request here.
The text was updated successfully, but these errors were encountered: