[FEA] support min and max group by aggregations and reductions on lists of structs and strings #10408

revans2 · 2022-03-10T16:11:32Z

Is your feature request related to a problem? Please describe.
This is specifically for NVIDIA/spark-rapids#4929 for a customer query.

Describe the solution you'd like
Group-by and reduction aggregations for min/max that support lists of structs and lists of strings. Ideally we could solve ordering in general, as #5890 is trying to do. This would then be a follow-on PR that reuses the row comparison code for min/max in this case.

If more information is needed, such as null ordering, I can provide it.

Describe alternatives you've considered
Just ugly hacks that we should not do.

devavret · 2022-03-10T16:55:34Z

Can you explain what ordering would look like on a list of structs? Say I have

[{1, b}, {1, b}]
[{1, a}, {2, b}]

Would we compare the first child and then the second child? That would make the order of comparisons 1, 1, b, b vs 1, 2, a, b, and so idx 0 < idx 1.

Or would we compare in this order: 1, b, 1, b vs 1, a, 2, b, meaning idx 1 < idx 0?
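
To make the two candidate orders concrete, here is a minimal Python sketch. It is only an illustration: structs are modelled as plain tuples, and the helper names `child_major` and `element_major` are made up for this example and are not part of any library.

```python
# The two rows from the example; each struct is modelled as a (number, letter) tuple.
row0 = [(1, "b"), (1, "b")]
row1 = [(1, "a"), (2, "b")]


def child_major(row):
    """Hypothetical helper: all first fields, then all second fields."""
    return [s[0] for s in row] + [s[1] for s in row]


def element_major(row):
    """Hypothetical helper: each struct in turn, field by field."""
    return [field for s in row for field in s]


# Python compares lists lexicographically, left to right.
print(child_major(row0), child_major(row1))
print("child-major:   idx 0 < idx 1 ->", child_major(row0) < child_major(row1))

print(element_major(row0), element_major(row1))
print("element-major: idx 1 < idx 0 ->", element_major(row1) < element_major(row0))
```

The two traversals disagree on which row is smaller, which is why the choice matters for min/max.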

revans2 · 2022-03-10T17:18:06Z

It is the second one: 1, b, 1, b vs 1, a, 2, b. More generally, sorting a list in Spark looks at each element in the list starting at the first, and only if they are the same does it look at the next elements to break the tie.

null == null in sorting elements, but null < non-null

If one list is longer than another list, then the shorter list is less than the longer list if and only if all of the elements in the shorter list match those in the longer list up to the length of the shorter list.

The code for sorting lists/arrays in Spark is here:

https://github.com/apache/spark/blob/a26c01d85035b487e2048f1c106b14b89455e2d9/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ArrayType.scala#L116-L148

If you want some examples I can come up with a few.
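
The rules above can be sketched as a small comparator. This is a plain-Python illustration, not the Spark or libcudf implementation: the function name `compare` is hypothetical, nulls are modelled as `None`, and structs are modelled as tuples that are compared field by field (an assumption not spelled out above).

```python
from functools import cmp_to_key


def compare(a, b):
    """Return -1, 0, or 1 for a < b, a == b, a > b under the rules above."""
    # null == null, and null sorts before any non-null value.
    if a is None or b is None:
        return (a is not None) - (b is not None)
    # Lists (and tuples standing in for structs) are walked element by
    # element; the first element that differs decides the result.
    if isinstance(a, (list, tuple)):
        for x, y in zip(a, b):
            c = compare(x, y)
            if c != 0:
                return c
        # Every shared position tied, so the shorter list is the smaller one.
        return (len(a) > len(b)) - (len(a) < len(b))
    # Scalars fall back to their natural ordering.
    return (a > b) - (a < b)


rows = [
    [(1, "b"), (1, "b")],
    [(1, "a"), (2, "b")],
    [(1, "a")],
    [None, (9, "z")],
]
key = cmp_to_key(compare)
print(sorted(rows, key=key))
print("min:", min(rows, key=key))
print("max:", max(rows, key=key))
```

With this comparator the list that starts with a null element sorts first, and `[(1, "a")]` sorts before `[(1, "a"), (2, "b")]` because it is a strict prefix of it.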

ttnghia · 2022-03-10T21:59:29Z

@devavret I plan to work on this, if nobody else has been assigned.

jrhemstad · 2022-03-10T22:51:29Z

@ttnghia please hold off. This requires further discussion.

github-actions · 2022-04-09T23:03:06Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

sameerz · 2022-05-03T00:15:17Z

Still needed.

github-actions · 2022-06-02T01:32:18Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

sameerz · 2022-06-02T17:57:53Z

Still needed.

ttnghia · 2022-06-03T18:14:32Z

Depends on #11129.

github-actions · 2022-10-10T17:16:24Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

…uction (#13676)

This adds support for list type in `min` and `max` aggregations in groupby and reduction contexts.

Closes #13667 and closes #10408.

Status:
* [X] Implementation.
* [X] Unit tests.
* [X] Run `compute-sanitizer`.
* [X] Test with spark-rapids (NVIDIA/spark-rapids#8689).

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Divye Gala (https://github.com/divyegala)
- MithunR (https://github.com/mythrocks)

URL: #13676
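
As a rough illustration of what `min` and `max` over list values mean in the groupby and reduction contexts, here is a plain-Python sketch. It does not use the cudf API; the helper `groupby_min_max` and the sample data are hypothetical. It assumes lists of ASCII strings with no nulls, for which Python's built-in list comparison already gives the lexicographic, shorter-prefix-is-less ordering discussed in this issue.

```python
from collections import defaultdict


def groupby_min_max(keys, values):
    """Hypothetical helper: per-key (min, max) over list-valued rows."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: (min(vs), max(vs)) for k, vs in groups.items()}


keys = ["x", "y", "x", "y", "x"]
values = [["b", "a"], ["c"], ["b"], ["a", "z"], ["a", "c", "d"]]

# Groupby context: one (min, max) pair per key.
per_group = groupby_min_max(keys, values)
print(per_group)

# Reduction context: a single (min, max) pair over the whole column.
overall = (min(values), max(values))
print(overall)
```

Handling nulls or nested structs would need an explicit comparator rather than Python's default list ordering.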

revans2 added the feature request and Needs Triage labels Mar 10, 2022

revans2 mentioned this issue Mar 10, 2022

[FEA] Support min/max aggregation/reduction for arrays of structs and arrays of strings NVIDIA/spark-rapids#4929

Closed

github-actions bot added this to Needs prioritizing in Feature Planning Mar 10, 2022

revans2 changed the title ~~[FEA] support min and max group by aggregations and reductions on lists of structs~~ [FEA] support min and max group by aggregations and reductions on lists of structs and strings Mar 10, 2022

ttnghia self-assigned this Mar 10, 2022

devavret mentioned this issue Mar 21, 2022

[FEA] Story - Supporting row operators on nested types #10186

Closed

github-actions bot added the inactive-30d label Apr 9, 2022

github-actions bot removed the inactive-30d label May 3, 2022

github-actions bot added the inactive-30d label Jun 2, 2022

github-actions bot removed the inactive-30d label Jun 2, 2022

GregoryKimball added the libcudf label and removed the Needs Triage label Jun 28, 2022

bdice mentioned this issue Jul 7, 2022

Proposal for lexicographic comparators for lists of structs #11222

Closed

GregoryKimball mentioned this issue Oct 4, 2022

[FEA] Implement full support for nested types #11844

Closed

github-actions bot added the inactive-90d label Oct 10, 2022

GregoryKimball removed the inactive-90d label Apr 3, 2023

shwina mentioned this issue Apr 24, 2023

[FEA] Support aggregations/scans on lists via groupby #13208

Open

ttnghia mentioned this issue Jun 12, 2023

[FEA] Fully support nested types in Spark SQL functions NVIDIA/spark-rapids#8550

Open

ttnghia mentioned this issue Jul 8, 2023

Support min and max aggregations for list type in groupby and reduction #13676

Merged

rapids-bot bot closed this as completed in #13676 Jul 12, 2023
