PERF: pd.eval when using mixed types #5485

Closed
jreback opened this issue Nov 10, 2013 · 3 comments
Labels
expressions (pd.eval, query), Performance (Memory or execution speed performance)

Comments

@jreback
Contributor

jreback commented Nov 10, 2013

In [11]: df = pd.DataFrame({"A": np.arange(1000000), "B": np.arange(1000000, 0, -1), "C": np.random.randn(1000000)})

In [13]: %timeit pd.eval('df*df')
1 loops, best of 3: 635 ms per loop

In [14]: df = df.astype(float)

In [15]: %timeit pd.eval('df*df')
100 loops, best of 3: 5.87 ms per loop

The time is all spent in _interleave, since the mixed-dtype values have to be combined into a single array before they can be passed to numexpr.
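
To illustrate why the mixed-dtype frame is the slow case, here is a minimal sketch using only public API (it does not call _interleave itself, and the smaller size `n` is an assumption to keep it quick). It shows that the frame holds two dtype kinds and that asking for a single 2-D array forces everything to one common dtype:

```python
import numpy as np
import pandas as pd

n = 1000  # assumption: smaller than the 1,000,000 rows in the report

mixed = pd.DataFrame(
    {"A": np.arange(n), "B": np.arange(n, 0, -1), "C": np.random.randn(n)}
)

# Two integer columns and one float column: more than one dtype kind.
dtype_kinds = sorted({dt.kind for dt in mixed.dtypes})

# Requesting one 2-D array means pandas has to pick a common dtype
# and build a new array holding all three columns.
values = mixed.to_numpy()
common_dtype = values.dtype

print(dtype_kinds, common_dtype, values.shape)
```

After `df.astype(float)` every column already has the same dtype, which would be consistent with the much faster second timing above.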

@cpcloud
Member

cpcloud commented Nov 10, 2013

How could this be fixed? I don't see how we can get around a copy being made to unify the type to pass to numexpr.

@jreback
Contributor Author

jreback commented Nov 10, 2013

Not sure.
Maybe if you detect that everything is numeric, you could just either use python or astype first.
In this case the expression is quite simple.
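
A rough sketch of the "astype first" idea, done on the user side rather than inside pandas. The helper name `eval_numeric_as_float` and its parameters are hypothetical, and restricting the check to integer and float dtypes is an assumption. It is not how pd.eval works internally:

```python
import numpy as np
import pandas as pd


def eval_numeric_as_float(expr, frame, name="df"):
    """Hypothetical helper: cast an int/float frame to float, then pd.eval.

    Assumes `expr` refers to the frame under `name`.
    """
    # Only cast when every column is a signed/unsigned integer or a float.
    if all(dt.kind in "iuf" for dt in frame.dtypes):
        frame = frame.astype(float)
    # local_dict makes the frame visible to the expression under `name`.
    return pd.eval(expr, local_dict={name: frame})


n = 1000  # assumption: small size for a quick check
df = pd.DataFrame(
    {"A": np.arange(n), "B": np.arange(n, 0, -1), "C": np.random.randn(n)}
)
result = eval_numeric_as_float("df * df", df)
print(result.dtypes)
```

Note the trade-off: integer columns come back as float64, and very large integers can lose precision in the cast, so this only fits cases where a float result is acceptable.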

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 1, 2015
@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@jbrockmendel jbrockmendel added the expressions pd.eval, query label Oct 22, 2019
@mroeschke mroeschke removed the Internals Related to non-user accessible pandas implementation label Apr 25, 2020
@mroeschke mroeschke removed the Numeric Operations Arithmetic, Comparison, and Logical operations label Apr 11, 2021
@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022
@mroeschke
Member

Looks like the perf is pretty close now so closing

In [6]: df = pd.DataFrame({"A": np.arange(1000000), "B": np.arange(1000000, 0, -1), "C": np.random.randn(1000000)})

In [7]: %timeit pd.eval('df*df')
39.1 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: df = df.astype(float)

In [9]: %timeit pd.eval('df*df')
34.9 ms ± 259 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
