# Enhancing performance of Pandas

This demo is adapted from the example in the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html). We investigate how to speed up functions operating on pandas DataFrames with three different backends, Cython, Numba and Pythran, all driven through `transonic`.

In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(1000),
                  'b': np.random.randn(1000),
                  'N': np.random.randint(100, 1000, (1000)),
                  'x': 'x'})

Here's the function in pure Python:

In [18]:
from transonic import jit

def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


# JIT-compiled variants of both functions, one per backend, for later use
f_cython = jit(backend="cython")(f)
f_numba = jit(backend="numba")(f)
f_pythran = jit(backend="pythran")(f)

integrate_f_cython = jit(backend="cython")(integrate_f)
integrate_f_numba = jit(backend="numba")(integrate_f)
integrate_f_pythran = jit(backend="pythran")(integrate_f)
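
To make explicit what `integrate_f` computes: it is a left Riemann sum of `f(x) = x * (x - 1)` over `[a, b]` with `N` equal steps. The sketch below is a sanity check that is not part of the original demo; it compares the sum with the closed-form integral, and the helper name `exact_integral` is hypothetical.

```python
def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    # Left Riemann sum with N steps of width dx
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def exact_integral(a, b):
    # Hypothetical helper: antiderivative of x**2 - x is x**3 / 3 - x**2 / 2
    def F(x):
        return x**3 / 3 - x**2 / 2

    return F(b) - F(a)


approx = integrate_f(0.0, 1.0, 1000)
exact = exact_integral(0.0, 1.0)
print(approx, exact)
```

The two values should agree to several decimal places, and the gap shrinks as `N` grows.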

We compute the result by calling `integrate_f` on every row with `DataFrame.apply` (`axis=1`):

In [19]:
%timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

85.2 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
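
To illustrate what the row-wise `apply` does, here is a minimal sketch on a two-row frame. The frame `small` and its values are made up for illustration; with `axis=1`, the lambda receives each row as a Series and the results are collected into a new Series with one entry per row.

```python
import pandas as pd


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


# Illustrative data only; same column layout as the demo's DataFrame
small = pd.DataFrame({"a": [0.0, 0.5],
                      "b": [1.0, 2.0],
                      "N": [100, 200],
                      "x": "x"})

# axis=1 passes one row at a time; int() guards against a non-integer N
result = small.apply(
    lambda row: integrate_f(row["a"], row["b"], int(row["N"])), axis=1
)
print(result)
```

Every row goes through a Python-level call plus three label lookups, which is why the per-row overhead matters in the timings.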


Let's see where the time is spent during this operation, using the `%prun` IPython magic and limiting the report to the four most time-consuming calls:

In [20]:
%prun -l 4 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

 

         659732 function calls (654706 primitive calls) in 0.151 seconds

   Ordered by: internal time
   List reduced from 217 to 4 due to restriction <4>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.073    0.000    0.107    0.000 <ipython-input-18-33c8deee669b>:6(integrate_f)
   539827    0.035    0.000    0.035    0.000 <ipython-input-18-33c8deee669b>:3(f)
     3000    0.005    0.000    0.029    0.000 base.py:4702(get_value)
     3000    0.003    0.000    0.033    0.000 series.py:1068(__getitem__)
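
Outside IPython, a similar report can be produced with the standard-library `cProfile` and `pstats` modules. This is a sketch, assuming a plain loop over `integrate_f` as the workload rather than the `DataFrame.apply` call profiled above; sorting by `tottime` corresponds to the "internal time" ordering, and `print_stats(4)` limits the report to four entries.

```python
import cProfile
import io
import pstats


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


# Profile an illustrative workload (not the DataFrame.apply call)
profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    integrate_f(0.0, 1.0, 500)
profiler.disable()

# Sort by internal time and keep at most four entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(4)
report = buffer.getvalue()
print(report)
```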

## Enter: `transonic`

In [21]:
from transonic import wait_for_all_extensions
from transonic.util import print_versions, timeit_verbose

print_versions()

Transonic 0.4.2
Pythran 0.9.4post0
Numba 0.46.0
Cython 0.29.14


## Cython + transonic

In [23]:
# warmup: the first call triggers compilation; then block until the extension is ready
df.apply(lambda x: integrate_f_cython(x['a'], x['b'], x['N']), axis=1)
wait_for_all_extensions()

# benchmark
%timeit df.apply(lambda x: integrate_f_cython(x['a'], x['b'], x['N']), axis=1)

INFO: Schedule cythonization of file /home/avmo/.transonic/cython/__jit__/__ipython__7c98424550fb42df9527dd5a5f550c17/integrate_f.py


compile extension
61.5 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


## Numba + transonic

In [25]:
# warmup: the first call triggers compilation; then block until the extension is ready
df.apply(lambda x: integrate_f_numba(x['a'], x['b'], x['N']), axis=1)
wait_for_all_extensions()

# benchmark
%timeit df.apply(lambda x: integrate_f_numba(x['a'], x['b'], x['N']), axis=1)

20.9 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


## Pythran + transonic

In [26]:
# warmup: the first call triggers compilation; then block until the extension is ready
df.apply(lambda x: integrate_f_pythran(x['a'], x['b'], x['N']), axis=1)
wait_for_all_extensions()

# benchmark
%timeit df.apply(lambda x: integrate_f_pythran(x['a'], x['b'], x['N']), axis=1)

INFO: Schedule pythranization of file /home/avmo/.transonic/pythran/__jit__/__ipython__7c98424550fb42df9527dd5a5f550c17/integrate_f.py


compile extension
20.9 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
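
For a quick comparison, the mean timings reported in this notebook can be turned into speedup factors relative to the pure-Python run. The snippet is a small arithmetic sketch; the numbers are copied from the `%timeit` outputs above and will differ on other machines and library versions.

```python
# Mean %timeit results from this notebook, in milliseconds
timings_ms = {
    "python": 85.2,
    "cython": 61.5,
    "numba": 20.9,
    "pythran": 20.9,
}

baseline = timings_ms["python"]
speedups = {name: baseline / t for name, t in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name:8s} {timings_ms[name]:6.1f} ms  x{factor:.2f}")
```

On these figures, Numba and Pythran are roughly four times faster than pure Python, while the Cython backend gains about 40 %.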
