# Enhancing performance of Pandas

This demo is adapted from the example in the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html). We investigate how to speed up functions that operate on pandas DataFrames using three backends, Cython, Numba, and Pythran, all driven through `transonic`.

In [None]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(1000),
                  'b': np.random.randn(1000),
                  'N': np.random.randint(100, 1000, (1000)),
                  'x': 'x'})

Here are the functions in pure Python, together with JIT-compiled variants created with `transonic.jit` for later use:

In [None]:
from transonic import jit

def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


# JIT-compiled variants of integrate_f, one per backend, for later use
# Note: transonic automatically includes f(x) in the generated modules
integrate_f_cython = jit(backend="cython")(integrate_f)
integrate_f_numba = jit(backend="numba")(integrate_f)
integrate_f_pythran = jit(backend="pythran")(integrate_f)
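
As a sanity check, `integrate_f` is a left Riemann sum of `x * (x - 1)` over `[a, b]` with `N` equal steps, so for large `N` it should approach the exact integral. The sketch below restates the two pure-Python functions so that it runs on its own; `antiderivative` and `exact_integral` are hypothetical helpers that are not part of the original demo, and the interval `[0, 2]` is chosen only for illustration.

```python
def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    # Left Riemann sum with N equal steps of width dx
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def antiderivative(x):
    # Hypothetical helper: antiderivative of x**2 - x
    return x**3 / 3 - x**2 / 2


def exact_integral(a, b):
    # Hypothetical helper: closed-form value of the integral on [a, b]
    return antiderivative(b) - antiderivative(a)


approx = integrate_f(0.0, 2.0, 100_000)
exact = exact_integral(0.0, 2.0)
print(approx, exact)
```

The two printed values should agree to several decimal places; the gap shrinks roughly in proportion to `1 / N`.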

We compute the result by applying the function row-wise with `DataFrame.apply` (`axis=1`):

In [None]:
%timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
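
With `axis=1`, `DataFrame.apply` calls the Python function once per row and passes each row as a `Series`, so the cost is that of a Python-level loop. A minimal sketch of that equivalence follows; the 10-row frame `small` and the fixed seed are assumptions made only to keep the example fast and reproducible.

```python
import numpy as np
import pandas as pd


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


# Small stand-in for the DataFrame used in the demo (assumed size and seed)
rng = np.random.default_rng(0)
small = pd.DataFrame({'a': rng.standard_normal(10),
                      'b': rng.standard_normal(10),
                      'N': rng.integers(100, 1000, 10),
                      'x': 'x'})

# Row-wise apply, as in the benchmark
via_apply = small.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

# The same work written as an explicit Python loop over the rows
via_loop = pd.Series(
    [integrate_f(a, b, N) for a, b, N in zip(small['a'], small['b'], small['N'])],
    index=small.index,
)

print(np.allclose(via_apply.to_numpy(dtype=float), via_loop.to_numpy()))
```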

Let's see where the time is spent during this operation, using the `%prun` IPython magic and limiting the report to the four most time-consuming calls:

In [None]:
%prun -l 4 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
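
Outside IPython, a similar report can be produced with the standard-library `cProfile` and `pstats` modules. This is only a sketch of the idea, not a description of what `%prun` does internally; the 100-row frame `df_small` and the `report` variable are assumptions for illustration.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


# Smaller stand-in for the demo DataFrame (assumed size)
df_small = pd.DataFrame({'a': np.random.randn(100),
                         'b': np.random.randn(100),
                         'N': np.random.randint(100, 1000, 100),
                         'x': 'x'})

# Profile only the row-wise apply call
profiler = cProfile.Profile()
profiler.enable()
df_small.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
profiler.disable()

# Sort by time spent inside each function and keep the top four entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(4)
report = buffer.getvalue()
print(report)
```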

## Enter: `transonic`

In [None]:
from transonic import wait_for_all_extensions
from transonic.util import print_versions, timeit_verbose

print_versions()

## Cython + transonic

In [None]:
# warmup
df.apply(lambda x: integrate_f_cython(x['a'], x['b'], x['N']), axis=1)
wait_for_all_extensions()

# benchmark
%timeit df.apply(lambda x: integrate_f_cython(x['a'], x['b'], x['N']), axis=1)

## Numba + transonic

In [None]:
# warmup
df.apply(lambda x: integrate_f_numba(x['a'], x['b'], x['N']), axis=1)
wait_for_all_extensions()

# benchmark
%timeit df.apply(lambda x: integrate_f_numba(x['a'], x['b'], x['N']), axis=1)

## Pythran + transonic

In [None]:
# warmup
df.apply(lambda x: integrate_f_pythran(x['a'], x['b'], x['N']), axis=1)
wait_for_all_extensions()

# benchmark
%timeit df.apply(lambda x: integrate_f_pythran(x['a'], x['b'], x['N']), axis=1)

## Cython + types + transonic

In [None]:
%%file _pandas_cython_boost.py
from transonic import boost

@boost(backend="cython", inline=True)
def f_typed(x: float):
    return x * (x - 1)


@boost(backend="cython")
def integrate_f_typed(a: float, b: float, N: int):
    i: int
    s: float
    dx: float = (b - a) / N
    s = 0
    for i in range(N):
        s += f_typed(a + i * dx)
    return s * dx

In [None]:
!transonic -b cython _pandas_cython_boost.py

In [None]:
from transonic import set_compile_at_import, wait_for_all_extensions

set_compile_at_import(True)

In [None]:
from _pandas_cython_boost import integrate_f_typed

wait_for_all_extensions()

# benchmark
%timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)