Improve performance of Groupby #659

Merged
williamma12 merged 9 commits into modin-project:master on Jun 16, 2019

Conversation

@devin-petersohn
Member

commented Jun 7, 2019

What do these changes do?

Related issue number

  • passes flake8 modin
  • passes black --check modin
  • tests added and passing
@codecov

commented Jun 7, 2019

Codecov Report

Merging #659 into master will decrease coverage by 0.01%.
The diff coverage is 90%.

@@            Coverage Diff             @@
##           master     #659      +/-   ##
==========================================
- Coverage    90.5%   90.49%   -0.02%     
==========================================
  Files          37       37              
  Lines        5604     5658      +54     
==========================================
+ Hits         5072     5120      +48     
- Misses        532      538       +6

| Impacted Files | Coverage Δ | |
|---|---|---|
| modin/engines/base/frame/partition_manager.py | 80.6% <100%> (+0.29%) | ⬆️ |
| ...gines/ray/pandas_on_ray/frame/partition_manager.py | 100% <100%> (ø) | ⬆️ |
| modin/pandas/dataframe.py | 89.52% <75%> (-0.14%) | ⬇️ |
| modin/backends/pandas/query_compiler.py | 92.22% <81.81%> (-0.1%) | ⬇️ |
| modin/pandas/groupby.py | 88.74% <93.1%> (+0.37%) | ⬆️ |
| modin/pandas/__init__.py | 84.61% <0%> (-0.5%) | ⬇️ |
| modin/pandas/series.py | 91.52% <0%> (ø) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 565ce87...05ff363.

devin-petersohn added some commits Jun 13, 2019

@devin-petersohn changed the title from "[WIP] Improve performance of Groupby" to "Improve performance of Groupby" Jun 13, 2019

@williamma12
Collaborator

left a comment

Looks good! Just some leftover comments.

modin/pandas/dataframe.py (outdated, resolved)
modin/pandas/dataframe.py (outdated, resolved)

devin-petersohn added some commits Jun 14, 2019

@@ -203,6 +203,22 @@ def full_reduce(self, map_func, reduce_func, axis):
        """
        raise NotImplementedError("Blocked on Distributed Series")

    def groupby_reduce(self, axis, by, map_func, reduce_func):
        p = np.squeeze(by.partitions)

williamma12 Jun 14, 2019

Collaborator

Let's rename the variable `p` to something more descriptive, maybe `by_partitions`?
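
For readers following the thread: the `groupby_reduce(self, axis, by, map_func, reduce_func)` signature in the hunk above suggests a map/reduce style groupby, where a function runs on each partition and a second function combines the partial results. The following is a minimal sketch of that general idea in plain pandas, not Modin's actual implementation. The helper name `groupby_reduce_sketch`, the row-wise split into two partitions, and the count/sum pairing are all assumptions made for illustration.

```python
import pandas as pd


def groupby_reduce_sketch(partitions, by_partitions, map_func, reduce_func):
    """Hypothetical helper: map over (data, by) partition pairs, then reduce.

    It only borrows the argument names from the diff; it is not Modin code.
    """
    # Map phase: each data partition is grouped by its matching `by` partition.
    mapped = [map_func(part, by) for part, by in zip(partitions, by_partitions)]
    # Reduce phase: combine the partial results into one final answer.
    return reduce_func(pd.concat(mapped))


df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "c"], "val": [1, 2, 3, 4, 5, 6]}
)

# Assumed layout: two row-wise partitions of the data and of the grouping key.
partitions = [df.iloc[:3][["val"]], df.iloc[3:][["val"]]]
by_partitions = [df.iloc[:3]["key"], df.iloc[3:]["key"]]

result = groupby_reduce_sketch(
    partitions,
    by_partitions,
    # Partial counts per partition ...
    map_func=lambda part, by: part.groupby(by).count(),
    # ... are summed per key to obtain the global count.
    reduce_func=lambda mapped: mapped.groupby(level=0).sum(),
)

# Reference result computed on the unpartitioned frame.
expected = df.groupby("key")[["val"]].count()
```

Under these assumptions `result` matches `expected`. The pairing matters: a count is reduced with a sum, while something like a mean would need a different map/reduce pair.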

return map_func(df, other)

map_func = ray.put(map_func)
p = np.squeeze(by.partitions)

williamma12 Jun 14, 2019

Collaborator

same as above
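
As background on the line under review: `np.squeeze(by.partitions)` drops length-one axes, so a grid of partitions with a single column becomes a flat 1-D array. The sketch below uses placeholder strings in place of real partition objects, and the 3x1 grid shape is an assumption rather than something stated in the PR. It uses the name `by_partitions` suggested in the review.

```python
import numpy as np

# Stand-in for `by.partitions`: 3 row partitions x 1 column partition (assumed).
partition_grid = np.array([["part0"], ["part1"], ["part2"]], dtype=object)

# Squeezing removes the length-one column axis.
by_partitions = np.squeeze(partition_grid)
shape = by_partitions.shape  # (3,)

# Caveat: a 1x1 grid squeezes all the way down to a 0-d array,
# which cannot be iterated over directly.
single = np.squeeze(np.array([["part0"]], dtype=object))
single_ndim = single.ndim  # 0
```

The 0-d case is worth keeping in mind if the grouping key ever lives in a single partition.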

modin/engines/base/frame/partition_manager.py (outdated, resolved)

devin-petersohn added some commits Jun 15, 2019

@williamma12
Collaborator

left a comment

Looks good, @devin-petersohn!

@williamma12 williamma12 merged commit 6bec6c5 into modin-project:master Jun 16, 2019

1 of 3 checks passed

codecov/patch 90% of diff hit (target 90.5%)
codecov/project 90.49% (-0.02%) compared to 565ce87
Travis CI - Pull Request Build Passed
@modin-bot

commented Jun 19, 2019

This pull request has been mentioned on Modin Discuss. There might be relevant details there:

https://discuss.modin.org/t/modin-0-5-3-release-notes/66/1
