CLN/PERF: Simplify argmin/argmax #58019

rhshadrach · 2024-03-26T21:47:08Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

ASVs against main:

| Change   | Before [b63ae8c7]    | After [46bbd5eb] <cln_argmin_argmax>   |   Ratio | Benchmark (Parameter)                                         |
|----------|----------------------|----------------------------------------|---------|---------------------------------------------------------------|
| -        | 184±1μs              | 138±3μs                                |    0.75 | series_methods.NanOps.time_func('argmax', 1000000, 'int32')   |
| -        | 14.4±0.1μs           | 10.5±0.2μs                             |    0.73 | series_methods.NanOps.time_func('argmax', 1000, 'float64')    |
| -        | 1.23±0.06ms          | 863±20μs                               |    0.7  | series_methods.NanOps.time_func('argmax', 1000000, 'float64') |
| -        | 77.9±0.6μs           | 40.8±0.7μs                             |    0.52 | series_methods.NanOps.time_func('argmax', 1000000, 'int8')    |
| -        | 7.42±0.2μs           | 2.44±0.06μs                            |    0.33 | series_methods.NanOps.time_func('argmax', 1000, 'int64')      |
| -        | 7.18±0.2μs           | 2.29±0.02μs                            |    0.32 | series_methods.NanOps.time_func('argmax', 1000, 'int32')      |
| -        | 7.25±0.1μs           | 2.29±0.03μs                            |    0.32 | series_methods.NanOps.time_func('argmax', 1000, 'int8')       |

ASVs against 2.2.x show no perf change.
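For a quick spot-check of one of the rows above outside of asv, something like the following might be enough. It is a rough sketch, not the asv benchmark itself: `best_of` and `time_argmax` are hypothetical helper names, and the Series setup is assumed to match the `np.ones(N)` setup quoted for `series_methods.NanOps` later in this thread.

```python
import timeit


def best_of(func, number=100, repeat=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number


def time_argmax(n, dtype):
    """Time Series.argmax for a Series of ones (assumed benchmark setup)."""
    # Imported lazily so that best_of stays usable without pandas installed.
    import numpy as np
    import pandas as pd

    s = pd.Series(np.ones(n), dtype=dtype)
    return best_of(s.argmax)
```

Calling `time_argmax(1_000_000, "int8")` on both branches should give a rough before/after comparison; the absolute numbers will differ from the asv output.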

WillAyd

Overall this looks really nice

pandas/core/indexes/base.py

WillAyd · 2024-03-26T22:09:16Z

pandas/tests/frame/test_reductions.py

@@ -1066,7 +1066,7 @@ def test_idxmin(self, float_frame, int_frame, skipna, axis):
        frame.iloc[15:20, -2:] = np.nan
        for df in [frame, int_frame]:
            if (not skipna or axis == 1) and df is not int_frame:
-                if axis == 1:
+                if skipna:


Shouldn't we still be hitting this with the axis == 1 case?

When axis == 1 we run into a row that is all NA, but when skipna=False we still raise the error message "Encountered an NA value with skipna=False" for performance reasons; see the bottom half of #57971 (comment)
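The precedence described above can be sketched in plain Python. This is a simplified model, not the pandas implementation: `argmax_row` and `is_na` are hypothetical names, NA is modeled as a float NaN, the row is assumed to be non-empty, and the "Encountered all NA values" wording is an assumption about the other error message.

```python
import math


def is_na(value):
    """Treat float NaN as the only NA marker (a simplification)."""
    return isinstance(value, float) and math.isnan(value)


def argmax_row(row, skipna=True):
    """Return the position of the largest non-NA value in a non-empty row."""
    mask = [is_na(v) for v in row]
    if not skipna and any(mask):
        # Raised as soon as any NA is present, without checking whether
        # the whole row is NA; that extra check is what costs time.
        raise ValueError("Encountered an NA value with skipna=False")
    if skipna and all(mask):
        # Assumed wording for the all-NA case.
        raise ValueError("Encountered all NA values")
    candidates = [i for i, masked in enumerate(mask) if not masked]
    # max() returns the first maximal candidate, so ties resolve to the
    # lowest position.
    return max(candidates, key=lambda i: row[i])
```

Under this model an all-NA row raises the skipna=False message whenever skipna is False, which is why the test branches on skipna rather than on axis.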

rhshadrach · 2024-03-26T22:30:21Z

pandas/core/base.py

@@ -733,12 +733,6 @@ def argmax(
        """
        delegate = self._values


Without using the delegate here and the nanops.nanargmax branch below, I'm seeing a 50x perf regression in some ASVs, e.g. series_methods.NanOps.time_func('argmax', 1000000, 'int8'):

    import numpy as np
    from pandas import Series

    N = 1000000
    dtype = 'int8'
    s = Series(np.ones(N), dtype=dtype)
    s.argmax()

@jbrockmendel - perhaps that branch should be deeper in the code where the argmin/argmax implementation is for EAs?

Do you mean if you do return self.array.argmax(skipna=skipna) without the EA check?

It looks like sorting.nanargminmax is just a lot slower than the nanops.arg(min|max). Best case would be to improve sorting.nanargminmax, but failing that I think making NumpyExtensionArray.argmax call the faster version would be viable (need to double-check they are behaviorally equivalent)

Do you mean if you do return self.array.argmax(skipna=skipna) without the EA check?

Correct.

Best case would be to improve sorting.nanargminmax, but failing that I think making NumpyExtensionArray.argmax call the faster version would be viable

Thanks - will give these a shot.

I looked into this, and I don't think either is viable. The main reason nanops.nanargmax is faster than sorting.nanargminmax is that the former only works for numeric dtypes, where NA values can be filled with a sentinel (typically -np.inf). For EAs (e.g. StringArray) there is, I think, no equivalent of -np.inf, so we need to perform indexing operations to remove NA values from consideration.
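The difference between the two strategies can be illustrated with a small plain-Python sketch. It is only a model of the idea, not the code in nanops or sorting: `argmax_fill` and `argmax_mask` are hypothetical names, and NA is modeled as NaN for numbers and as None for strings.

```python
import math


def argmax_fill(values):
    """Numeric only: replace NaN with -inf, then take a plain argmax.

    A real implementation would still need a separate all-NA check.
    """
    filled = [-math.inf if math.isnan(v) else v for v in values]
    return max(range(len(filled)), key=filled.__getitem__)


def argmax_mask(values, is_na):
    """Generic: drop NA positions, take the argmax, map back to the input."""
    kept = [i for i, v in enumerate(values) if not is_na(v)]
    if not kept:
        raise ValueError("Encountered all NA values")
    subset = [values[i] for i in kept]  # the extra indexing step
    pos = max(range(len(subset)), key=subset.__getitem__)
    return kept[pos]
```

The fill variant needs a value that sorts below everything else, which exists for floats but not for strings; the mask variant works for any orderable values at the cost of building the filtered subset and translating the position back.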

Bummer. Thanks for taking a look.

mroeschke · 2024-04-01T18:13:21Z

Thanks @rhshadrach

* CLN/PERF: Simplify argmin/argmax
* More simplifications
* Partial revert
* Remove comments
* fixups

rhshadrach added 2 commits March 26, 2024 17:39

CLN/PERF: Simplify argmin/argmax

5cab58d

More simplifications

0517148

rhshadrach added the Missing-data, Performance, Clean, and Reduction Operations labels Mar 26, 2024

rhshadrach mentioned this pull request Mar 26, 2024

Potential regression induced by "CLN: Enforce deprecation of argmin/max and idxmin/max with NA values" #58013

Closed

WillAyd reviewed Mar 26, 2024

rhshadrach added 2 commits March 26, 2024 18:17

Partial revert

7ad5f94

Remove comments

c9bd1f8

rhshadrach commented Mar 26, 2024

fixups

46bbd5e

rhshadrach marked this pull request as ready for review March 30, 2024 13:14

rhshadrach requested review from jbrockmendel and WillAyd March 31, 2024 11:45

WillAyd approved these changes Apr 1, 2024

mroeschke approved these changes Apr 1, 2024

mroeschke added this to the 3.0 milestone Apr 1, 2024

mroeschke merged commit aad1136 into pandas-dev:main Apr 1, 2024
46 checks passed

rhshadrach deleted the cln_argmin_argmax branch April 2, 2024 02:19

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024

CLN/PERF: Simplify argmin/argmax (pandas-dev#58019)

2c30be4

