diff --git a/doc/source/user_guide/enhancingperf.rst b/doc/source/user_guide/enhancingperf.rst
index 9941ffcc9de4d..525f9abb1d1ae 100644
--- a/doc/source/user_guide/enhancingperf.rst
+++ b/doc/source/user_guide/enhancingperf.rst
@@ -234,14 +234,18 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros arra
 
 .. code-block:: ipython
 
-   In [4]: %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f(df['a'].to_numpy(),
+                                     df['b'].to_numpy(),
+                                     df['N'].to_numpy())
    1000 loops, best of 3: 1.25 ms per loop
 
 We've gotten another big improvement. Let's check again where the time is spent:
 
 .. ipython:: python
 
-   %prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   %prun -l 4 apply_integrate_f(df['a'].to_numpy(),
+                                df['b'].to_numpy(),
+                                df['N'].to_numpy())
 
 As one might expect, the majority of the time is now spent in
 ``apply_integrate_f``, so if we wanted to make anymore efficiencies we must continue to concentrate our
@@ -286,7 +290,9 @@ advanced Cython techniques:
 
 .. code-block:: ipython
 
-   In [4]: %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f_wrap(df['a'].to_numpy(),
+                                          df['b'].to_numpy(),
+                                          df['N'].to_numpy())
    1000 loops, best of 3: 987 us per loop
 
 Even faster, with the caveat that a bug in our Cython code (an off-by-one error,
@@ -349,8 +355,9 @@ take the plain Python code from above and annotate with the ``@jit`` decorator.
 
    def compute_numba(df):
-       result = apply_integrate_f_numba(df['a'].values, df['b'].values,
-                                        df['N'].values)
+       result = apply_integrate_f_numba(df['a'].to_numpy(),
+                                        df['b'].to_numpy(),
+                                        df['N'].to_numpy())
        return pd.Series(result, index=df.index, name='result')
 
 Note that we directly pass NumPy arrays to the Numba function.
 ``compute_numba`` is just a wrapper that provides a
@@ -394,7 +401,7 @@ Consider the following toy example of doubling each observation:
    1000 loops, best of 3: 233 us per loop
 
    # Custom function with numba
-   In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df.a.values)
+   In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
    1000 loops, best of 3: 145 us per loop
 
 Caveats
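
For reviewers trying the substitution locally, here is a minimal, self-contained sketch of the ``.values`` → ``.to_numpy()`` change this patch makes throughout. The frame construction is illustrative (arbitrary sizes), not the exact setup used in the timed examples:

```python
import numpy as np
import pandas as pd

# A small frame shaped like the one in the docs (columns 'a', 'b', 'N');
# the sizes here are arbitrary.
df = pd.DataFrame({
    "a": np.random.randn(1000),
    "b": np.random.randn(1000),
    "N": np.random.randint(100, 1000, (1000,)),
})

# Series.to_numpy() is the recommended way to get the underlying ndarray;
# it behaves like .values here, but its return type is well-defined and it
# accepts explicit dtype/copy arguments.
arr = df["a"].to_numpy()
print(type(arr))    # <class 'numpy.ndarray'>
print(arr.shape)    # (1000,)

# Requesting a different dtype forces a converted (copied) array:
arr32 = df["a"].to_numpy(dtype="float32")
print(arr32.dtype)  # float32
```

Functions compiled with Cython or Numba then receive these plain ndarrays directly, exactly as in the timed calls above.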