PERF: Series.any #52341

jbrockmendel · 2023-04-01T03:58:09Z

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 2, 100000)).astype(bool) 

In [8]: %timeit s.any(skipna=True)                                                                     
19.5 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  # <- main
7.71 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  # <- PR

In [10]: %timeit s.values.any() 
3.14 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
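For readers without IPython, the comparison above can be approximated with the standard-library timeit module. This is a minimal sketch, not the benchmark used for this PR: the helper name best_per_call_us and the loop/repeat counts are illustrative assumptions, and absolute timings will differ by machine and by pandas/numpy version.

```python
import timeit

import numpy as np
import pandas as pd

# Same setup as above: 100,000 random booleans.
s = pd.Series(np.random.randint(0, 2, 100_000)).astype(bool)


def best_per_call_us(func, number=2_000, repeat=5):
    """Best-of-`repeat` time per call in microseconds (hypothetical helper)."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6


series_us = best_per_call_us(lambda: s.any(skipna=True))
ndarray_us = best_per_call_us(lambda: s.values.any())

print(f"Series.any:  {series_us:.2f} us per call")
print(f"ndarray.any: {ndarray_us:.2f} us per call")
print(f"slowdown:    {series_us / ndarray_us:.1f}x")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; it is not the mean ± std. dev. that %timeit reports, so the numbers are only roughly comparable.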

In the OP for #26032, Series.any was 223x slower than ndarray.any. On main it is 6.2x, and this PR gets it down to 2.5x. Let's look at what's left:

In [21]: %prun -s cumtime for n in range(10000): s.any(skipna=True)
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.010    0.000    0.182    0.000 generic.py:11509(any)
    10000    0.009    0.000    0.171    0.000 generic.py:11208(any)
    10000    0.026    0.000    0.163    0.000 generic.py:11165(_logical_func)
    10000    0.020    0.000    0.126    0.000 series.py:4530(_reduce)
    10000    0.016    0.000    0.094    0.000 nanops.py:487(nanany)
    10000    0.006    0.000    0.046    0.000 {method 'any' of 'numpy.ndarray' objects}
    10000    0.004    0.000    0.040    0.000 _methods.py:55(_any)
    10000    0.036    0.000    0.036    0.000 {method 'reduce' of 'numpy.ufunc' objects}
    10000    0.012    0.000    0.022    0.000 nanops.py:259(_get_values)
    20000    0.010    0.000    0.014    0.000 common.py:1093(needs_i8_conversion)
    10000    0.005    0.000    0.008    0.000 series.py:726(_values)
    30000    0.006    0.000    0.006    0.000 {built-in method builtins.isinstance}
    10000    0.004    0.000    0.006    0.000 _validators.py:224(validate_bool_kwarg)
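A comparable profile can be collected outside IPython with cProfile and pstats. The following is a sketch under the same setup as above; the row limit of 15 is an arbitrary choice, and line numbers and timings in the report will depend on the installed pandas version.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 2, 100_000)).astype(bool)

# Profile 10,000 calls, as in the %prun invocation above.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(10_000):
    s.any(skipna=True)
profiler.disable()

# Sort by cumulative time (the equivalent of `-s cumtime`) and keep the top rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(15)
report = buffer.getvalue()
print(report)
```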

We have a few layers between the Series.any call and nanops.nanany that we could refactor away. Their tottime adds up to .01+.009+.026+.02=.065 seconds here, about 36% of the total runtime.
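The overhead of those wrapper layers can be recomputed from the profile listing. This small sketch copies the tottime values of the four frames above nanany and assumes the 0.182 s cumtime of the outermost any call is the total runtime; the variable names are illustrative.

```python
# tottime values (seconds) copied from the profile listing above.
wrapper_tottime = {
    "generic.py:11509(any)": 0.010,
    "generic.py:11208(any)": 0.009,
    "generic.py:11165(_logical_func)": 0.026,
    "series.py:4530(_reduce)": 0.020,
}
total_cumtime = 0.182  # cumtime of the outermost any() call (assumed total)

overhead = sum(wrapper_tottime.values())
share = overhead / total_cumtime

print(f"wrapper overhead: {overhead:.3f} s")
print(f"share of total:   {share:.1%}")
```

With the values in the listing, this gives 0.065 s of wrapper overhead, or 35.7% of the 0.182 s total.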

The ufunc_config+seterr calls add up to .177s here. Disabling the with np.errstate(all="ignore") block in Series._reduce cuts the timeit result down to 9 µs. It also doesn't break any tests or surface any warnings in the tests. Update: did this after determining that we already set errstate within the relevant nanops functions.
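The per-call cost of entering and leaving np.errstate can be seen in isolation from pandas. This is a sketch, not the code path in Series._reduce: the function names plain_any and guarded_any are made up for illustration, and the size of the gap depends on the numpy version.

```python
import timeit

import numpy as np

arr = np.random.randint(0, 2, 100_000).astype(bool)


def plain_any():
    return arr.any()


def guarded_any():
    # Same reduction, but wrapped in an errstate context that is
    # entered and exited on every call.
    with np.errstate(all="ignore"):
        return arr.any()


number = 20_000
plain_us = min(timeit.repeat(plain_any, number=number, repeat=5)) / number * 1e6
guarded_us = min(timeit.repeat(guarded_any, number=number, repeat=5)) / number * 1e6

print(f"without errstate: {plain_us:.2f} us per call")
print(f"with errstate:    {guarded_us:.2f} us per call")
```

Because the reduction itself takes only a few microseconds, a fixed context-manager cost of similar size is a large fraction of each call, which is why dropping a redundant errstate block is noticeable here.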

_get_values is still ~~more than half~~ (update: a quarter) of nanany, ~~and more than half of _get_values is in extract_array. According to the annotation the extract_array is unnecessary (though it is necessary for the doctests and a handful of tests in test_nanops).~~ Update: the extract_array call has now been removed.

So there is still some room for optimization, but some of it would be more invasive than this. I would be +1 on making all these optimizations, will wait for a little buy-in first though.

mroeschke · 2023-04-03T18:19:28Z

Thanks @jbrockmendel

* PERF: Series.any
* optimize
* remove unnecessary np.errstate

jbrockmendel added 3 commits March 31, 2023 19:52

PERF: Series.any

ddf974b

optimize

b5c82c9

Merge branch 'main' into perf-26032

b345959

jbrockmendel added the Performance (memory or execution speed performance) and Reduction Operations (sum, mean, min, max, etc.) labels Apr 2, 2023

jbrockmendel added 2 commits April 1, 2023 18:37

remove unnecessary np.errstate

fe0b664

Merge branch 'main' into perf-26032

720aa23

mroeschke approved these changes Apr 3, 2023


mroeschke added this to the 2.1 milestone Apr 3, 2023

mroeschke merged commit 1209f27 into pandas-dev:main Apr 3, 2023

jbrockmendel deleted the perf-26032 branch April 3, 2023 19:04

topper-123 pushed a commit to topper-123/pandas that referenced this pull request Apr 6, 2023

PERF: Series.any (pandas-dev#52341)

88b78f6

* PERF: Series.any
* optimize
* remove unnecessary np.errstate
