
PERF: DataFrame reductions with axis=None and EA dtypes #54308

Merged
mroeschke merged 3 commits into pandas-dev:main from perf-ea-axis-none-reductions on Jul 31, 2023


lukemanley
Member

```python
import pandas as pd
import numpy as np

# "float64[pyarrow]" requires pyarrow to be installed
df = pd.DataFrame(np.random.randn(1000, 1000), dtype="float64[pyarrow]")

# %timeit is an IPython/Jupyter magic command
%timeit df.min(axis=None)
```

```
316 ms ± 52.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)     <- main
196 ms ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)     <- this PR
22.9 ms ± 1.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- this PR + #54299 (pending PR)
```
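`%timeit` only works inside IPython or Jupyter. Below is a rough plain-script equivalent that uses the standard-library `timeit` module. It is a sketch under a few assumptions: it needs pandas 2.x, where `axis=None` returns a scalar, and it falls back to the masked `Float64` dtype when pyarrow is not installed, which the original benchmark does not do. It also reports the best of 7 single-call runs instead of `%timeit`'s mean ± std. dev. Absolute timings will vary with hardware and pandas version.

```python
import importlib.util
import timeit

import numpy as np
import pandas as pd

# Assumption: fall back to pandas' masked "Float64" dtype if pyarrow is missing.
dtype = "float64[pyarrow]" if importlib.util.find_spec("pyarrow") else "Float64"

# Seeded generator so repeated runs time the same data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 1000)), dtype=dtype)

# 7 repeats of a single call each, similar to "7 runs, 1 loop each" above.
timings = timeit.repeat(lambda: df.min(axis=None), repeat=7, number=1)
print(f"{dtype}: best {min(timings) * 1e3:.1f} ms over {len(timings)} runs")
```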

Existing ASV:

```
       before           after         ratio
     [aefede93]       [bf708351]
     <main>           <perf-ea-axis-none-reductions>
-      8.83±0.7ms       7.03±0.3ms     0.80  stat_ops.FrameOps.time_op('std', 'Int64', None)
-        99.6±4ms       13.7±0.3ms     0.14  stat_ops.FrameOps.time_op('skew', 'Int64', None)
-        96.7±4ms       13.2±0.3ms     0.14  stat_ops.FrameOps.time_op('kurt', 'Int64', None)
-        107±10ms       12.9±0.3ms     0.12  stat_ops.FrameOps.time_op('median', 'Int64', None)
-        63.9±2ms       6.01±0.8ms     0.09  stat_ops.FrameOps.time_op('mean', 'Int64', None)
```
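Some of the benchmarked statistics, such as median, std, skew, and kurt, cannot be computed by reducing each column first and then reducing the per-column results. An `axis=None` reduction has to consider every value together. The sketch below compares the two approaches on a small hypothetical `Int64` frame, using only public pandas API and assuming pandas 2.x. It concatenates the columns and reduces once to illustrate a single-pass reduction in general. It is not taken from this PR's diff. For this data the two-step result is 3.0, while the median over all six values is 3.5.

```python
import pandas as pd

# Hypothetical data chosen so the two approaches disagree.
df = pd.DataFrame({"a": [1, 2, 9], "b": [3, 4, 5]}, dtype="Int64")

# Two-step: median of each column (2.0 and 4.0), then the median of those.
median_of_medians = df.median().median()

# One-step: put all values in a single Int64 Series, then reduce once.
flat = pd.concat([df[col] for col in df.columns], ignore_index=True)
overall_median = flat.median()

# pandas >= 2.0: axis=None reduces over both axes and returns a scalar.
result = df.median(axis=None)

print(median_of_medians, overall_median, result)
```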

lukemanley added the Performance, ExtensionArray, and Reduction Operations labels on Jul 29, 2023
lukemanley added this to the 2.1 milestone on Jul 29, 2023
```
@@ -1883,7 +1884,10 @@ def test_reduction_axis_none_returns_scalar(method, numeric_only):
            tm.assert_almost_equal(result, expected)
        else:
            expected = getattr(np, method)(np_arr, axis=None)
            assert result == expected
    if dtype == "float64":
```
Member


Why do you need this?

Member Author


Not needed; I removed it. I had been testing with pyarrow types, and reductions on pyarrow types can show small floating-point differences compared with NumPy.
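Small floating-point differences like these can occur when two implementations accumulate the same values in a different order or with different intermediate rounding. The pure-Python sketch below does not model pyarrow or NumPy internals, which are not described here. It only shows two summation strategies disagreeing in the last bit, which is why a test comparing results from different backends would use a tolerance rather than `==`.

```python
import math

values = [0.1] * 10

# Naive left-to-right accumulation: every += rounds to the nearest double.
naive = 0.0
for v in values:
    naive += v

# math.fsum tracks exact partial sums and rounds only once at the end.
exact = math.fsum(values)

print(naive == exact)              # False
print(math.isclose(naive, exact))  # True
```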

mroeschke merged commit 069554a into pandas-dev:main on Jul 31, 2023
36 of 37 checks passed
@mroeschke
Member

Thanks @lukemanley

lukemanley deleted the perf-ea-axis-none-reductions branch on August 1, 2023 21:29
3 participants