Optimized dask interface groupby implementation #991

jlstevens merged 1 commit into master from dask_groupby_opt on Nov 29, 2016


3 participants

philippjfr commented Nov 29, 2016

The groupby implementation no longer loads all of the columns being grouped over. In the 1D case it uses the column's categories or unique values; otherwise it uses the itertuples method to accumulate the unique indices without loading the whole column at once.
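
For illustration only, the multi-column case described above might be sketched as follows. This is not the code from the PR: `unique_keys` and `iter_rows` are hypothetical names, the plain lists stand in for dask partitions, and keeping keys in first-seen order is an assumption. The point is that rows are consumed one at a time, so only the distinct keys are held in memory.

```python
def unique_keys(rows):
    """Collect the distinct group keys from an iterator of row tuples.

    Only the distinct keys are stored, never the full columns.
    Keys are returned in first-seen order (an assumption of this sketch).
    """
    seen = {}  # dicts preserve insertion order in Python 3.7+
    for row in rows:
        seen.setdefault(tuple(row), None)
    return list(seen)


def iter_rows(partitions):
    """Hypothetical stand-in for a lazy, partition-by-partition row iterator."""
    for partition in partitions:
        yield from partition


# Two small "partitions" of (key1, key2) rows standing in for grouped columns.
partitions = [
    [("a", 1), ("a", 2)],
    [("b", 1), ("a", 1)],
]
keys = unique_keys(iter_rows(partitions))
print(keys)  # [('a', 1), ('a', 2), ('b', 1)]
```

With a real dask DataFrame, the row iterator would presumably come from something like `ddf[dims].itertuples(index=False)`, where `ddf` and `dims` are placeholder names for the frame and the grouped column names.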

philippjfr added the data label Nov 29, 2016

philippjfr force-pushed the dask_groupby_opt branch from 68fc4bb to 1478820 Nov 29, 2016



jbednar commented Nov 29, 2016

Sounds good to me!



jlstevens commented Nov 29, 2016

Looks good and as this is a new datatype, there are no backwards compatibility implications to worry about. Merging.

jlstevens merged commit bc61b75 into master Nov 29, 2016

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed
coverage/coveralls Coverage decreased (-0.01%) to 75.751%
s3-reference-data-cache Test data is cached.

philippjfr deleted the dask_groupby_opt branch Dec 10, 2016
