[FEA] Hash-based groupby should perform multiple aggregations in a single pass #543
Labels: feature request, libcudf, Python
Is your feature request related to a problem? Please describe.
In the existing hash-based groupby implementation in libcudf, when a cuDF user requests that multiple aggregations be performed on the same set of key columns, each aggregation is performed as an independent groupby and the results are then concatenated. Furthermore, to concatenate the results, the output of each independent groupby operation must be sorted so that all the results are in the same order.
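As a rough illustration of the multi-pass approach described above, here is a minimal pure-Python sketch. It is not libcudf code: the function names (`groupby_one`, `groupby_multi_pass`) and the `AGGS` table of reductions are hypothetical, and a Python dict stands in for the GPU hash table.

```python
from collections import defaultdict

# Hypothetical reductions standing in for libcudf aggregation kinds.
AGGS = {"sum": sum, "min": min, "max": max, "count": len}

def groupby_one(keys, values, agg):
    """One independent groupby: hash every key, gather its values, reduce them."""
    table = defaultdict(list)
    for k, v in zip(keys, values):
        table[k].append(v)
    # A GPU hash table gives no output-order guarantee (Python dicts happen to
    # keep insertion order), so sort by key to make separate passes line up.
    return sorted((k, AGGS[agg](vs)) for k, vs in table.items())

def groupby_multi_pass(keys, values, aggs):
    """Run one full groupby per aggregation, then concatenate the results column-wise."""
    results = [groupby_one(keys, values, agg) for agg in aggs]  # one pass per aggregation
    out_keys = [k for k, _ in results[0]]
    columns = {agg: [v for _, v in res] for agg, res in zip(aggs, results)}
    return out_keys, columns
```

Each entry in `aggs` triggers a full pass over `keys` plus a sort of that pass's result.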
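This is detrimental to performance for at least two reasons:
1. Each aggregation makes its own pass over the key and value columns, so the key columns are hashed and grouped once per aggregation instead of once in total.
2. Each independent result must be sorted before the results can be concatenated, adding a sort per aggregation that would be unnecessary if all aggregations were computed together.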
Describe the solution you'd like
The hash-based groupby implementation in libcudf should be updated such that it can perform an arbitrary number of aggregation operations with only a single pass through all of the key and value columns.
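A minimal sketch of the requested single-pass behavior, again in pure Python with a dict standing in for the hash table. The names (`groupby_single_pass`, `_update`, `_INITIAL`) are hypothetical and not part of libcudf. Each key is hashed once, and the running state for every requested aggregation is updated in that same pass, so all output columns share one row order and no per-aggregation sort is needed.

```python
# Initial running state for each supported (hypothetical) aggregation kind.
_INITIAL = {"sum": 0, "count": 0, "min": None, "max": None}

def _update(agg, acc, v):
    """Fold one value into the running state of a single aggregation."""
    if agg == "sum":
        return acc + v
    if agg == "count":
        return acc + 1
    if agg == "min":
        return v if acc is None else min(acc, v)
    if agg == "max":
        return v if acc is None else max(acc, v)
    raise ValueError(f"unsupported aggregation: {agg}")

def groupby_single_pass(keys, values, aggs):
    """Hash each key once and update every requested aggregation in that same pass."""
    table = {}  # key -> list of running states, one slot per aggregation
    for k, v in zip(keys, values):
        row = table.get(k)
        if row is None:
            row = table[k] = [_INITIAL[a] for a in aggs]
        for i, a in enumerate(aggs):
            row[i] = _update(a, row[i], v)
    # Every column is read from the same table, so rows line up without sorting.
    out_keys = list(table)
    columns = {a: [table[k][i] for k in out_keys] for i, a in enumerate(aggs)}
    return out_keys, columns
```

A GPU implementation would instead have many threads update the per-key state concurrently (for example with atomic operations), which this sequential sketch does not model.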