PERF: ArrowExtensionArray comparison methods #50524

Merged (2 commits) on Jan 3, 2023

lukemanley
Member

Performance improvement in the ArrowExtensionArray comparison methods when the array contains nulls. The speedup comes from avoiding a conversion to an object array.

import pandas as pd
import numpy as np

data = np.random.randn(10**6)
data[0] = np.nan

arr = pd.array(data, dtype="float64[pyarrow]")

%timeit arr > 0

# 80.5 ms ± 619 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)    <- main
# 4.05 ms ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  <- PR

@lukemanley added the "Performance" (memory or execution speed performance) and "Arrow" (pyarrow functionality) labels on Jan 2, 2023
@mroeschke added this to the 2.0 milestone on Jan 3, 2023
@mroeschke merged commit d555c1d into pandas-dev:main on Jan 3, 2023
@mroeschke
Member

Thanks @lukemanley

@lukemanley deleted the arrow-ea-cmp-perf branch on January 19, 2023 04:31