Perf regression in 0.13+ for df.apply #6013

Closed
8one6 opened this issue Jan 20, 2014 · 1 comment · Fixed by #6024
Labels: Performance (memory or execution speed performance), Regression (functionality that used to work in a prior pandas version)
Comments


8one6 commented Jan 20, 2014

On this page:
http://pandas.pydata.org/pandas-docs/stable/enhancingperf.html#enhancingperf

Right after In [11] displays its timing results, the text claims a 10x speedup over the original code. However, the original code runs at 336 ms per loop, and the code executed in In [11] runs at 105 ms per loop.

So that is only about a 3x speedup, right?

(Edit) Later on, after In [14], the text claims a 3x speedup, but that run seems to take execution from 105 ms per loop down to 2.5 ms per loop, which is roughly a 40x speedup.
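Both questions come down to the same arithmetic: speedup = old time / new time. A tiny sketch using the per-loop timings quoted above (the variable and function names are illustrative only):

```python
# Per-loop timings quoted in the comment above, in ms per loop.
original_ms = 336.0  # original code
in11_ms = 105.0      # code timed at In [11]
in14_ms = 2.5        # code timed at In [14]


def speedup(before_ms: float, after_ms: float) -> float:
    """Speedup factor: old time divided by new time."""
    return before_ms / after_ms


print(f"In [11] vs original: {speedup(original_ms, in11_ms):.1f}x")  # 3.2x
print(f"In [14] vs In [11]: {speedup(in11_ms, in14_ms):.1f}x")       # 42.0x
```

So the two claims in the text look swapped: the first step is about 3x and the second is about 40x.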


ghost commented Jan 20, 2014

Good catch. This might be another case of a known regression in the performance of apply in 0.13.

In [4]: %load_ext cythonmagic

In [10]: %%cython
   ...: cdef double f_typed(double x) except? -2:
   ...:     return x * (x - 1)
   ...: cpdef double integrate_f_typed(double a, double b, int N):
   ...:     cdef int i
   ...:     cdef double s, dx
   ...:     s = 0
   ...:     dx = (b - a) / N
   ...:     for i in range(N):
   ...:         s += f_typed(a + i * dx)
   ...:     return s * dx
   ...:
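For readers without a Cython toolchain, here is a pure-Python sketch of what `integrate_f_typed` computes, assuming its intent is the left Riemann-sum approximation of the integral of x(x - 1) over [a, b] with N steps. The names `f` and `integrate_f` are illustrative and not part of pandas; this version will be far slower than the typed Cython one.

```python
def f(x: float) -> float:
    """Integrand from the Cython cell: f(x) = x * (x - 1)."""
    return x * (x - 1)


def integrate_f(a: float, b: float, N: int) -> float:
    """Left Riemann sum of f over [a, b] using N equal steps.

    Pure-Python mirror of integrate_f_typed (hypothetical helper name).
    """
    s = 0.0
    dx = (b - a) / N  # width of each step
    for i in range(N):
        s += f(a + i * dx)  # sample f at the left edge of step i
    return s * dx
```

As a sanity check: over [0, 1] the exact integral is 1/3 - 1/2 = -1/6, so `integrate_f(0.0, 1.0, 1000)` should land very close to -0.1667.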

In 0.13.0-246-g1e1907c:

In [7]: df = DataFrame({'a': randn(1000), 'b': randn(1000),'N': randint(100, 1000, (1000)), 'x': 'x'})
In [8]: %timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)
10 loops, best of 3: 116 ms per loop

But in 0.12:

In [5]: df = DataFrame({'a': randn(1000), 'b': randn(1000),'N': randint(100, 1000, (1000)), 'x': 'x'})
In [6]: %timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)
10 loops, best of 3: 19.6 ms per loop
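By the numbers above, the same call is roughly 116 / 19.6 ≈ 5.9x slower in the 0.13 development build than in 0.12. To compare versions without building the Cython extension, here is a minimal reproduction sketch that times the same row-wise `df.apply(..., axis=1)` pattern with a pure-Python stand-in for `integrate_f_typed`. It assumes a NumPy recent enough to have `default_rng`; `integrate_f`, `make_frame`, `row_apply` and `time_row_apply` are hypothetical helper names. Absolute timings will be much slower than with Cython, so only compare runs across pandas versions on the same machine.

```python
import timeit

import numpy as np
import pandas as pd


def integrate_f(a, b, N):
    # Pure-Python stand-in for integrate_f_typed (hypothetical helper):
    # left Riemann sum of x * (x - 1) over [a, b] using N steps.
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        x = a + i * dx
        s += x * (x - 1)
    return s * dx


def make_frame(n_rows=1000, seed=0):
    # Mirrors the frame in the report: two float columns, an integer step
    # count drawn from [100, 1000), and a constant string column "x".
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "a": rng.standard_normal(n_rows),
        "b": rng.standard_normal(n_rows),
        "N": rng.integers(100, 1000, n_rows),
        "x": "x",
    })


def row_apply(df):
    # The pattern under test: one Python-level function call per row (axis=1).
    return df.apply(lambda row: integrate_f(row["a"], row["b"], row["N"]), axis=1)


def time_row_apply(df, repeat=3):
    # Best-of-`repeat` wall time, in seconds, for a single row_apply call.
    return min(timeit.repeat(lambda: row_apply(df), number=1, repeat=repeat))


if __name__ == "__main__":
    df = make_frame()
    print(f"pandas {pd.__version__}: {time_row_apply(df) * 1e3:.1f} ms per apply")
```

The constant string column `x` is kept because the mixed dtypes make each row passed to the lambda an object-dtype Series, matching the frame used in the report.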

@jreback, is this apply performance hit here to stay?

Related: #5654, #5656?
