DOC: Replace .values with .to_numpy() in enhancingperf (#26313)
* DOC: Replace .values with .to_numpy() 

As suggested in #24807 (comment)

Replace `.values` with `.to_numpy()` in the benchmark demonstration code.
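For context, a minimal sketch of the equivalence this change relies on. The toy frame below is illustrative only (not the benchmark frame from the docs): for a single-dtype column, `Series.values` and `Series.to_numpy()` return the same ndarray data; `.to_numpy()` is simply the more explicit, recommended API (and accepts `dtype=` and `copy=` arguments).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, for illustration only.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "N": [10, 20, 30]})

arr_old = df["a"].values        # legacy attribute
arr_new = df["a"].to_numpy()    # recommended replacement

assert isinstance(arr_new, np.ndarray)
assert (arr_old == arr_new).all()
```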
huizew authored and gfyoung committed May 8, 2019
1 parent 6d7ba05 commit 7bfbd81
19 changes: 13 additions & 6 deletions doc/source/user_guide/enhancingperf.rst
@@ -234,14 +234,18 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros arra

 .. code-block:: ipython

-   In [4]: %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f(df['a'].to_numpy(),
+                                     df['b'].to_numpy(),
+                                     df['N'].to_numpy())

    1000 loops, best of 3: 1.25 ms per loop
We've gotten another big improvement. Let's check again where the time is spent:

 .. ipython:: python

-   %prun -l 4 apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+   %prun -l 4 apply_integrate_f(df['a'].to_numpy(),
+                                df['b'].to_numpy(),
+                                df['N'].to_numpy())
As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
so if we wanted to make anymore efficiencies we must continue to concentrate our
@@ -286,7 +290,9 @@ advanced Cython techniques:

 .. code-block:: ipython

-   In [4]: %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
+   In [4]: %timeit apply_integrate_f_wrap(df['a'].to_numpy(),
+                                          df['b'].to_numpy(),
+                                          df['N'].to_numpy())

    1000 loops, best of 3: 987 us per loop
Even faster, with the caveat that a bug in our Cython code (an off-by-one error,
@@ -349,8 +355,9 @@ take the plain Python code from above and annotate with the ``@jit`` decorator.
     def compute_numba(df):
-        result = apply_integrate_f_numba(df['a'].values, df['b'].values,
-                                         df['N'].values)
+        result = apply_integrate_f_numba(df['a'].to_numpy(),
+                                         df['b'].to_numpy(),
+                                         df['N'].to_numpy())
         return pd.Series(result, index=df.index, name='result')
Note that we directly pass NumPy arrays to the Numba function. ``compute_numba`` is just a wrapper that provides a
@@ -394,7 +401,7 @@ Consider the following toy example of doubling each observation:
     1000 loops, best of 3: 233 us per loop

     # Custom function with numba
-    In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df.a.values)
+    In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
     1000 loops, best of 3: 145 us per loop
Caveats
