-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: GroupBy().fillna() performance regression #37149
Conversation
an asv is appropriate here (or if we have one that covers just show the results) |
if not u can add one along the lines of what i posted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add an asv or show an existing one that is caught by this
No existing one seemed to catch it so added a new one.
(pandas-dev) C:\git\pandas-smithto1\asv_bench>python C:\anaconda3\envs\pandas-dev\Scripts\asv.exe continuous --quick -f 1.1 -b groupby.FillNA upstream/master HEAD
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls show the results of the asv
|
LGTM, thanks @smithto1 fixing the issue reported by me recently so swiftly |
@jreback can you have another look. |
thanks @smithto1 |
@meeseeksdev backport 1.1.x |
…37223) Co-authored-by: Thomas Smith <thomassmith0304@gmail.com>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
A performance regression was introduced with #30679 in handling grouping on a duplicate axis. The regression can be avoided by skipping the call to
get_indexer_non_unique
if the axis is unchanged (i.e. like it is infillna
).I don't know how to write a test for this fix since it is just an issue of speed. If there is a test pattern or other check that should be included please highlight and I'm happy to add it.
In lieu of a test, running the minimal example from the issue report on the fixed branch shows the performance fix: