# How does Cython speed up Python?

## Reason 1: Interpreted -> Compiled

## Cython version of trivial function

In [None]:
%load_ext Cython

In [None]:
%%cython -n cyfoo

def cyfoo(a, b):
    return a + b

## Profiling

In [None]:
%timeit cyfoo(1, 2)

In [None]:
import sys
sys.modules['cyfoo']

In [None]:
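# The two numbers are hard-coded timings from an earlier run (baseline first, Cython second); yours will differ.
print("Cython integer addition speedup: {:0.1f}%".format((112. - 79.) / 112. * 100))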

In [None]:
%timeit cyfoo('a', 'b')

In [None]:
# Again hard-coded timings from an earlier run (baseline first, Cython second).
print("Cython string addition speedup: {:0.1f}%".format((159. - 133.) / 159. * 100))
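The percentages printed above are reductions in runtime relative to the baseline, which is not the same number as a speedup factor. As a minimal sketch, the two helper functions below (names invented for illustration, not part of the original notebook) compute both from a pair of timings in the same units:

```python
def percent_reduction(before, after):
    """Runtime reduction relative to the baseline, as a percentage."""
    return (before - after) / before * 100.0


def speedup_factor(before, after):
    """How many times faster the new version is than the baseline."""
    return before / after


# Same hard-coded timings as in the cells above (baseline, Cython).
for label, before, after in [("integer", 112.0, 79.0), ("string", 159.0, 133.0)]:
    print("{} addition: {:0.1f}% less time, {:0.2f}x faster".format(
        label, percent_reduction(before, after), speedup_factor(before, after)))
```

Reporting both avoids ambiguity: a 50% reduction in runtime corresponds to a 2x speedup factor, not 1.5x.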

### For simple addition, the Cython version gives a consistent speedup

* With all the caveats for microbenchmarks...
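One common way to reduce noise in a microbenchmark like this is to repeat the measurement several times and keep the fastest run, since slower runs mostly reflect interference from other processes. The sketch below does this outside IPython with the standard `timeit` module; the function `add` and the repeat counts are illustrative choices, not taken from the notebook.

```python
import timeit


def add(a, b):
    return a + b


NUMBER = 100_000  # calls per timing run (illustrative value)

# Five independent timing runs; each entry is the total time for NUMBER calls.
timings = timeit.repeat("add(1, 2)", globals={"add": add},
                        repeat=5, number=NUMBER)

# The minimum is usually the least noisy estimate of the per-call cost.
best_ns = min(timings) / NUMBER * 1e9
print("best of 5: {:0.1f} ns per call".format(best_ns))
```

Even so, the absolute numbers depend on the machine, the Python version, and the system load, so only the relative comparison is meaningful.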

## We see the same `PyNumber_Add()` entry point as for interpreted Python

In [None]:
!cat /home/jovyan/.cache/ipython/cython/cyfoo.c | nl



```c
static PyObject *
__pyx_pf_5cyfoo_cyfoo(CYTHON_UNUSED PyObject *__pyx_self,
                      PyObject *__pyx_v_a,
                      PyObject *__pyx_v_b) {
  [...]
  /* "cyfoo.pyx":3
 * 
 * def cyfoo(a, b):
 *     return a + b             # <<<<<<<<<<<<<<
 */
  __pyx_t_1 = PyNumber_Add(__pyx_v_a, __pyx_v_b);
  [...]
}
```
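For comparison, the interpreted side can be inspected with the standard `dis` module. This is a sketch assuming CPython; `pyfoo` is a stand-in name for the pure-Python equivalent of `cyfoo`. The `+` compiles to a single binary-operation bytecode (`BINARY_ADD` up to CPython 3.10, `BINARY_OP` from 3.11 on), which the interpreter loop dispatches and which, in the generic case, ends up in `PyNumber_Add()`.

```python
import dis


def pyfoo(a, b):
    return a + b


# Collect the opcode names the interpreter executes for pyfoo.
opnames = [ins.opname for ins in dis.get_instructions(pyfoo)]

# The addition shows up as one BINARY_* instruction; its exact name
# depends on the CPython version.
binary_ops = [name for name in opnames if name.startswith("BINARY_")]
print(opnames)
print(binary_ops)
```

The compiled Cython function calls `PyNumber_Add()` directly and skips the bytecode fetch-and-dispatch step, which is consistent with the modest speedup measured above.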

## We conclude: converting from interpreted to compiled code gives some speedup

## Reason 2: Dynamic -> Static Typing

In [None]:
def pyfac(n):
    if n <= 1:
        return 1
    return n * pyfac(n - 1)

In [None]:
%timeit pyfac(20.0)
pyfac(20.0)

In [None]:
%%cython

def cyfac(n):
    if n <= 1:
        return 1
    return n * cyfac(n - 1)

def cyfac_double(double n):
    if n <= 1:
        return 1.0
    return n * cyfac_double(n - 1)

In [None]:
%timeit cyfac(20.0)
cyfac(20.0)

In [None]:
%timeit cyfac_double(20.0)
cyfac_double(20.0)

## Optimal Cython solution: up to 40x speedup

* Optimal for *this* recursive implementation...

In [None]:
%%cython

cpdef double cyfac_double_fast(double n):
    if n <= 1:
        return 1.0
    return n * cyfac_double_fast(n - 1)

In [None]:
%timeit cyfac_double_fast(20.0)
cyfac_double_fast(20.0)

## For the record: what about a loop-based version?

In [None]:
def pyfac_loop(n):
    r = 1.0
    for i in range(1, n+1):
        r *= i
    return r

In [None]:
%timeit pyfac_loop(20)
pyfac_loop(20)

In [None]:
%%cython

cpdef double cyfac_loop(int n):
    cdef double r = 1.0
    cdef int i
    for i in range(1, n+1):
        r *= <double>i
    return r

In [None]:
%timeit cyfac_loop(20)
cyfac_loop(20)

In [None]:
print("Cython speedup factor--loop-based version: {:0.1f}".format((1.81 / 0.062)))

## Exercises / questions

* Why are we using `double` here instead of `long`?
* Why are the `pyfac_loop()` and `cyfac_loop()` versions *better* from a robustness point of view?
* Write a trivial no-op function in Python and measure its performance with `timeit`.  Now write a Cython no-op `def` function and measure *it*.  How do they compare?  Conjecture why.  What does this imply for function call overhead between pure Python and Cython code?

In [None]:
def pynoop(): pass
%timeit pynoop()

In [None]:
%%cython
def cynoop(): pass

In [None]:
%timeit cynoop()