[FEA] Hash-based groupby should perform multiple aggregations in a single pass #543

jrhemstad · 2018-12-17T19:37:31Z

Is your feature request related to a problem? Please describe.
In the existing hash-based groupby implementation in libcudf, when a cuDF user requests multiple aggregations on the same set of key columns, each groupby aggregation is performed independently and the results are then concatenated. Furthermore, to make that concatenation possible, the output of each independent groupby operation must be sorted so that all of the results are in the same order.

This is detrimental to performance for at least two reasons:

- The multiple passes through the data are redundant
- The sort is unnecessary overhead
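
To make the cost concrete, here is a minimal CPU-side sketch in plain Python of the pattern described above. It is not the libcudf code; `groupby_one` and `groupby_multi_pass` are hypothetical names, and a Python dict stands in for the GPU hash table. Each aggregation does its own full pass over the data, and each result is sorted by key so the outputs can be lined up.

```python
import operator


def groupby_one(keys, values, op):
    """One independent hash-based groupby for a single aggregation (hypothetical helper)."""
    table = {}
    for k, v in zip(keys, values):  # a full pass over the data
        table[k] = op(table[k], v) if k in table else v
    # Sorted by key so that separate results end up in the same row order.
    return sorted(table.items())


def groupby_multi_pass(keys, values, ops):
    """Run one groupby per aggregation, then stitch the columns together."""
    results = [groupby_one(keys, values, op) for op in ops]  # len(ops) passes
    out_keys = [k for k, _ in results[0]]
    columns = [[agg for _, agg in result] for result in results]
    return out_keys, columns


keys = ["a", "b", "a", "c", "b"]
values = [1, 2, 3, 4, 5]
out_keys, columns = groupby_multi_pass(keys, values, [operator.add, min, max])
```

With N aggregations this reads the key and value data N times and performs N sorts, which is the overhead listed above.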

Describe the solution you'd like
The hash-based groupby implementation in libcudf should be updated such that it can perform an arbitrary number of aggregation operations with only a single pass through all of the key and value columns.
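
A rough sketch of what this could look like, again in plain Python rather than CUDA and under my own assumptions: `groupby_single_pass` is a hypothetical name, a dict stands in for the hash table, and every aggregation is assumed to be a binary reduction that can be seeded with the first value seen for a key (sum, min, max). Count or mean would need a different initial value and are left out here.

```python
import operator


def groupby_single_pass(keys, values, ops):
    """Compute every aggregation in ops with one pass over keys and values (hypothetical sketch)."""
    table = {}  # key -> list of accumulators, one per aggregation
    for k, v in zip(keys, values):  # the only pass over the data
        accs = table.get(k)
        if accs is None:
            # First time this key is seen: seed every accumulator with the value.
            table[k] = [v] * len(ops)
        else:
            for i, op in enumerate(ops):
                accs[i] = op(accs[i], v)
    # All aggregations share one table, so their rows already line up: no sort needed.
    out_keys = list(table)
    columns = [[table[k][i] for k in out_keys] for i in range(len(ops))]
    return out_keys, columns


keys = ["a", "b", "a", "c", "b"]
values = [1, 2, 3, 4, 5]
out_keys, columns = groupby_single_pass(keys, values, [operator.add, min, max])
```

Because all accumulators for a key live in the same table entry, the output columns are aligned by construction, and both the repeated passes and the sort go away.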

jrhemstad · 2019-06-21T22:47:46Z

Done by #1478

jrhemstad added the feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), and Python (Affects Python cuDF API) labels Dec 17, 2018

jrhemstad self-assigned this Dec 17, 2018

jrhemstad mentioned this issue Dec 17, 2018

[FEA] Implement null support for groupby #544

Closed

thomcom mentioned this issue Jan 3, 2019

[BUG] Groupby aggregations are slow #533

Closed

jrhemstad mentioned this issue May 2, 2019

Single pass hash groupby #1478

Merged

jrhemstad closed this as completed Jun 21, 2019
