Optimized dask interface groupby implementation #991

Merged: 1 commit merged into master from dask_groupby_opt on Nov 29, 2016


@philippjfr (Member) commented Nov 29, 2016

The groupby implementation no longer loads the full columns being grouped over. For the 1D case it now uses the column's categories or unique values; otherwise it uses the `itertuples` method to accumulate the unique indices without loading a whole column at once.
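A minimal pure-Python sketch of the key-accumulation idea described above. It is not the code merged in this PR. It assumes a hypothetical stand-in for a dask frame: an iterable of partitions, each an iterable of row dicts. `group_keys` and `make_partitions` are illustrative names, not HoloViews or dask APIs. For a non-categorical 1D column this sketch simply falls back to the row scan, whereas the PR describes using the column's unique values. Declared categories may also include values that never occur in the data.

```python
from typing import Dict, Hashable, Iterable, List, Mapping, Optional, Sequence, Tuple

# Hypothetical stand-in for a dask DataFrame: an iterable of partitions,
# each partition an iterable of row dicts. Real dask code would iterate
# over partitions and rows through the dask API instead.
Row = Mapping[str, Hashable]
Key = Tuple[Hashable, ...]


def group_keys(
    partitions: Iterable[Iterable[Row]],
    dims: Sequence[str],
    categories: Optional[Mapping[str, Sequence[Hashable]]] = None,
) -> List[Key]:
    """Return the unique key tuples for the grouping dimensions `dims`.

    Only the running set of distinct keys is held in memory; rows are
    consumed one at a time, so no grouped column is materialized in full.
    """
    categories = categories or {}
    # 1D fast path: a declared categorical column already enumerates its
    # possible values, so no rows need to be scanned at all.
    if len(dims) == 1 and dims[0] in categories:
        return [(value,) for value in categories[dims[0]]]

    # General path: stream row by row, recording each new key once.
    # A dict serves as an insertion-ordered set (Python 3.7+).
    seen: Dict[Key, None] = {}
    for partition in partitions:
        for row in partition:
            seen.setdefault(tuple(row[d] for d in dims), None)
    return list(seen)


def make_partitions():
    # Generator, so partitions are produced lazily, one at a time.
    yield [{"x": 1, "y": "a"}, {"x": 2, "y": "a"}]
    yield [{"x": 1, "y": "b"}, {"x": 1, "y": "a"}]


print(group_keys(make_partitions(), ["x", "y"]))
# [(1, 'a'), (2, 'a'), (1, 'b')]
print(group_keys(make_partitions(), ["y"], categories={"y": ["a", "b", "c"]}))
# [('a',), ('b',), ('c',)]
```

Holding only the distinct keys keeps memory proportional to the number of groups rather than the number of rows. The trade-off is a full pass over the data whenever no categories are declared.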

@philippjfr philippjfr force-pushed the dask_groupby_opt branch from 68fc4bb to 1478820 Nov 29, 2016
@jbednar (Member) commented Nov 29, 2016

Sounds good to me!

@jlstevens (Contributor) commented Nov 29, 2016

Looks good, and as this is a new datatype, there are no backwards-compatibility implications to worry about. Merging.

@jlstevens jlstevens merged commit bc61b75 into master Nov 29, 2016
4 checks passed
continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed
coverage/coveralls: Coverage decreased (-0.01%) to 75.751%
s3-reference-data-cache: Test data is cached.
@philippjfr philippjfr deleted the dask_groupby_opt branch Dec 10, 2016