CLN/PERF: Simplify argmin/argmax #58019
Overall this looks really nice
@@ -1066,7 +1066,7 @@ def test_idxmin(self, float_frame, int_frame, skipna, axis):
frame.iloc[15:20, -2:] = np.nan
for df in [frame, int_frame]:
if (not skipna or axis == 1) and df is not int_frame:
if axis == 1:
if skipna:
Shouldn't we still be hitting this with the axis == 1 case?
When axis==1 we run into all NAs for a single row, but when skipna=False we still raise the error message "Encountered an NA value with skipna=False" rather than an all-NA-specific one, for performance. See the bottom half of #57971 (comment).
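A minimal sketch of the behavior described above, using plain NumPy rather than the pandas internals. The helper name `argmax_sketch` is hypothetical and this is not the code from the PR; only the error message is taken from the comment. The point it illustrates is that with skipna=False a single "any NA" check suffices, so an all-NA row is reported with the same message as a partially-NA row and no separate all-NA pass is needed.

```python
import numpy as np


def argmax_sketch(values, skipna=True):
    """Hypothetical helper illustrating the skipna error semantics."""
    values = np.asarray(values, dtype=float)
    if not skipna:
        # One cheap check covers both "some NA" and "all NA" rows,
        # so no second pass is needed to tell them apart.
        if np.isnan(values).any():
            raise ValueError("Encountered an NA value with skipna=False")
        return int(values.argmax())
    # With skipna=True, NumPy ignores NaN and raises ValueError
    # only when every value is NaN.
    return int(np.nanargmax(values))


print(argmax_sketch([1.0, np.nan, 3.0]))  # NaN is skipped

try:
    argmax_sketch([np.nan, np.nan], skipna=False)
except ValueError as err:
    print(err)
```

An all-NA input with skipna=False therefore produces the same message as a partially-NA input, which matches what the adjusted test expects for the axis == 1 case.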
pandas/core/base.py
Outdated
@@ -733,12 +733,6 @@ def argmax(
"""
delegate = self._values
Without using the delegate here and the nanops.nanargmax branch below, I'm seeing a 50x perf regression in some ASVs, e.g. series_methods.NanOps.time_func('argmax', 1000000, 'int8'):
import numpy as np
from pandas import Series

N = 1000000
dtype = 'int8'
s = Series(np.ones(N), dtype=dtype)
s.argmax()
@jbrockmendel - perhaps that branch should be deeper in the code where the argmin/argmax implementation is for EAs?
Do you mean if you do return self.array.argmax(skipna=skipna) without the EA check?
It looks like sorting.nanargminmax is just a lot slower than the nanops.arg(min|max). Best case would be to improve sorting.nanargminmax, but failing that I think making NumpyExtensionArray.argmax call the faster version would be viable (need to double-check they are behaviorally equivalent)
Do you mean if you do return self.array.argmax(skipna=skipna) without the EA check?
Correct.
Best case would be to improve sorting.nanargminmax, but failing that I think making NumpyExtensionArray.argmax call the faster version would be viable
Thanks - will give these a shot.
Looked into this, and I don't think either is viable. The main reason nanops.nanargmax is faster than sorting.nanargminmax is that the former only works for numeric dtypes, filling in NA values where appropriate (typically with -np.inf), whereas for EAs (e.g. StringArray) we need to perform indexing operations to remove NA values from consideration, since there is no equivalent of -np.inf, I think.
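The difference between the two strategies can be sketched in plain NumPy. This is an illustration only, not the actual nanops.nanargmax or sorting.nanargminmax code: the function names `argmax_by_filling` and `argmax_by_indexing` are hypothetical, an object array with None stands in for a string EA, and both helpers assume at least one non-NA value.

```python
import numpy as np


def argmax_by_filling(values, mask):
    """Numeric-only strategy: overwrite NA slots with -inf, then one argmax."""
    filled = np.where(mask, -np.inf, values)
    return int(filled.argmax())


def argmax_by_indexing(values, mask):
    """Generic strategy: drop NA slots, argmax the rest, map back to positions."""
    positions = np.flatnonzero(~mask)          # original positions of non-NA values
    best = values[positions].argmax()          # argmax within the filtered subset
    return int(positions[best])


# Numeric data: both strategies apply and agree.
nums = np.array([2.0, np.nan, 5.0, 1.0])
num_mask = np.isnan(nums)
print(argmax_by_filling(nums, num_mask), argmax_by_indexing(nums, num_mask))

# String-like data: there is no sentinel comparable to -inf,
# so only the indexing strategy is available.
strs = np.array(["b", None, "d", "a"], dtype=object)
str_mask = np.array([v is None for v in strs])
print(argmax_by_indexing(strs, str_mask))
```

The filling path does a single vectorized pass over the data, while the indexing path builds an intermediate position array and a filtered copy before the argmax, which is one plausible source of the extra cost described above.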
Bummer. Thanks for taking a look.
Thanks @rhshadrach
* CLN/PERF: Simplify argmin/argmax
* More simplifications
* Partial revert
* Remove comments
* fixups
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Ref: #57971 (comment)
ASVs against main:
ASVs against 2.2.x show no perf change.