[FEA] Hash-based groupby should perform multiple aggregations in a single pass #543

Closed
jrhemstad opened this issue Dec 17, 2018 · 1 comment
Assignees: jrhemstad
Labels: feature request, libcudf, Python

Comments

@jrhemstad
Contributor

Is your feature request related to a problem? Please describe.
In the existing hash-based groupby implementation in libcudf, when a cuDF user requests multiple aggregations on the same set of key columns, each groupby aggregation is performed independently and the results are then concatenated. Furthermore, to make that concatenation possible, the output of each independent groupby operation must be sorted so that all the results are in the same order.

This is detrimental to performance for at least two reasons:

  1. The multiple passes through the data are redundant
  2. The sort is unnecessary overhead
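
To make the cost concrete, here is a minimal plain-Python sketch of the pattern described above. It is not the libcudf code: the function names (`groupby_one`, `multi_pass_groupby`) and the `(op, init)` encoding of an aggregation are assumptions made for illustration only. Each aggregation builds its own hash table in a separate pass, and each result is sorted by key so the columns line up for concatenation.

```python
def groupby_one(keys, values, op, init):
    """One independent hash-based groupby: a full pass over keys and values."""
    table = {}
    for k, v in zip(keys, values):
        table[k] = op(table.get(k, init), v)
    # Sort by key so that separately computed results line up row for row.
    return sorted(table.items())


def multi_pass_groupby(keys, values, aggs):
    """Run every aggregation separately, then concatenate the sorted results.

    `aggs` maps an output name to a hypothetical (op, init) pair.
    Cost: len(aggs) passes over the data plus len(aggs) sorts.
    """
    results = [groupby_one(keys, values, op, init) for op, init in aggs.values()]
    out_keys = [k for k, _ in results[0]]
    columns = {name: [v for _, v in res] for name, res in zip(aggs, results)}
    return out_keys, columns


keys = ["b", "a", "b", "a", "c"]
values = [1, 2, 3, 4, 5]
aggs = {
    "sum": (lambda acc, v: acc + v, 0),
    "max": (max, float("-inf")),
}
out_keys, columns = multi_pass_groupby(keys, values, aggs)
```

With two aggregations, the key column is hashed twice and two sorts are performed, even though both results describe the same groups.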

Describe the solution you'd like
The hash-based groupby implementation in libcudf should be updated such that it can perform an arbitrary number of aggregation operations with only a single pass through all of the key and value columns.
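
A rough sketch of the single-pass idea, again in plain Python rather than libcudf's C++/CUDA: one hash table maps each key to a row of accumulators, one per requested aggregation, so every input row is visited once. The name `single_pass_groupby` and the `(op, init)` pairs are hypothetical, and a single value column is assumed for brevity.

```python
def single_pass_groupby(keys, values, aggs):
    """Compute all aggregations with one pass over the keys and values.

    `aggs` maps an output name to a hypothetical (op, init) pair.
    """
    names = list(aggs)
    table = {}  # key -> list of accumulators, one slot per aggregation
    for k, v in zip(keys, values):
        accs = table.get(k)
        if accs is None:
            # First time this key is seen: start every accumulator at its init.
            accs = table[k] = [aggs[n][1] for n in names]
        for i, n in enumerate(names):
            accs[i] = aggs[n][0](accs[i], v)
    # Every output column is read from the same table, so the rows already
    # agree on order and no sort is needed to align them.
    out_keys = list(table)
    columns = {n: [table[k][i] for k in out_keys] for i, n in enumerate(names)}
    return out_keys, columns


keys = ["b", "a", "b", "a", "c"]
values = [1, 2, 3, 4, 5]
aggs = {
    "sum": (lambda acc, v: acc + v, 0),
    "max": (max, float("-inf")),
}
out_keys, columns = single_pass_groupby(keys, values, aggs)
```

Because all accumulators for a key live in the same table entry, the output order is whatever the table yields (first-seen order here), and it is identical for every aggregation column.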

@jrhemstad jrhemstad added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels Dec 17, 2018
@jrhemstad jrhemstad self-assigned this Dec 17, 2018
@jrhemstad jrhemstad mentioned this issue May 2, 2019
@jrhemstad
Contributor Author

Done by #1478
