PERF: Performance regression in Groupby.apply with group_keys=True #53195

Merged
merged 1 commit into from May 12, 2023

phofl
Member

@phofl phofl commented May 12, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

I came across this bottleneck through the changed default of group_keys. Technically this is not a regression, but it's visible to all users that don't set group_keys explicitly. This change provides around a 20-30% speedup in Groupby.apply when you have many groups.
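A rough way to observe the difference described above is sketched here. It is not the benchmark used for this PR: the data shape, the `identity` function, and the `run` helper are all hypothetical, chosen only to produce many small groups. The timings depend on the pandas version and the machine, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: many small groups, the case the PR description mentions.
n_rows, n_groups = 10_000, 2_000
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "key": rng.integers(0, n_groups, size=n_rows),
        "value": rng.random(n_rows),
    }
)


def identity(group):
    # Returns each group unchanged, so the only difference between the two
    # runs is whether the group label is prepended to the result index.
    return group


def run(group_keys):
    # group_keys is passed explicitly so the comparison does not depend on
    # the default of the installed pandas version.
    return df.groupby("key", group_keys=group_keys)["value"].apply(identity)


for group_keys in (True, False):
    # Take the best of three single runs to reduce noise.
    seconds = min(timeit.repeat(lambda: run(group_keys), number=1, repeat=3))
    print(f"group_keys={group_keys}: {seconds:.3f}s")
```

With `group_keys=True` the result carries the group label as an extra index level, which is the code path this PR targets. With `group_keys=False` the result keeps a single-level index.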

@phofl phofl added Groupby Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 12, 2023
@phofl phofl added this to the 2.0.2 milestone May 12, 2023
@mroeschke mroeschke merged commit c5524e4 into pandas-dev:main May 12, 2023
40 checks passed
@mroeschke
Member

Nice find. Thanks @phofl

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request May 12, 2023
@phofl phofl deleted the 53195 branch May 12, 2023 17:08
phofl added a commit that referenced this pull request May 12, 2023
…roupby.apply with group_keys=True) (#53202)

Backport PR #53195: PERF: Performance regression in Groupby.apply with group_keys=True

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
@rhshadrach
Member

Technically this is not a regression, but it's visible to all users that don't set group_keys explicitly.

Just want to make sure I understand this - group_keys=False was much faster than group_keys=True, and this PR improves the group_keys=True case. Is that right?

@phofl
Member Author

phofl commented May 16, 2023

Yep
