# Cython and NumPy, buffers, fused types

We'll re-implement the `scipy.stats.poisson.entropy()` method using Cython and its NumPy support, and see what sort of speedup we get for our efforts.

Obligatory XKCD:

<img src="https://imgs.xkcd.com/comics/password_strength.png" width=600 height=600 />

In [None]:
%load_ext Cython
import numpy as np

## Pure Python implementation

In [None]:
# https://en.wiktionary.org/wiki/Shannon_entropy

def shannon_entropy_py(p_x):
    return - np.sum(p_x * np.log(p_x))

## Cythonized version

In [None]:
%%cython -a

import numpy as np
cimport numpy as cnp

def shannon_entropy_cy(cnp.ndarray p_x):
    return - np.sum(p_x * np.log(p_x))

## What's `cimport`?

Cython introduces a new keyword, `cimport`, that allows interfacing with other Cython code _at the C level_ and _at compile time_.

This contrasts with Python's regular `import` statement, which interfaces with other _Python_ modules at _runtime_.

The statement `cimport numpy as cnp` allows Cython to interface with NumPy arrays at the C level and at compile time for improved performance.

The `cimport numpy` statement causes the Cython compiler to look for a `numpy.pxd` Cython declaration file at compile time.  This is where we can find the C-level declarations of the NumPy C-API.  Here we're using the `ndarray` object from `numpy.pxd` to declare the argument of `shannon_entropy_cy()`.

## `scipy.stats` comparison

In [None]:
from scipy.stats import poisson
poi = poisson(10.0)
n = 100
pmf = poi.pmf(np.arange(n))

In [None]:
print(poi.entropy())
print(shannon_entropy_py(pmf))
print(shannon_entropy_cy(pmf))

In [None]:
%%timeit
poi.entropy()

In [None]:
%%timeit
shannon_entropy_py(pmf)

In [None]:
%%timeit
shannon_entropy_cy(pmf)

## Explicit `for` loop

In [None]:
%%cython -a

cimport numpy as cnp
from libc.math cimport log as clog

def shannon_entropy_v1(cnp.ndarray p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * clog(p_x[i])
    return -res

In [None]:
%%timeit
shannon_entropy_v1(pmf)

## What's `from libc.math cimport log`?

Cython allows us to `cimport` C (and C++) functions from the C and C++ standard (template) libraries.

To access functions declared in the C stdlib `math.h` header file, we write

```python
from libc.math cimport exp, log, sqrt
```
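As a rough point of comparison, the loop in `shannon_entropy_v1` has the same shape as the plain-Python loop below, which calls the standard library's `math.log` on one scalar at a time. This is only a sketch for intuition: the name `shannon_entropy_loop` is hypothetical, and unlike the Cython version it runs entirely in the interpreter.

```python
import math

def shannon_entropy_loop(p_x):
    """Scalar loop: H = -sum(p * log(p)), using the natural log."""
    res = 0.0
    for p in p_x:
        res += p * math.log(p)  # math.log raises ValueError if p == 0
    return -res

# A uniform distribution over 4 outcomes has entropy log(4).
uniform = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy_loop(uniform), math.log(4))
```

The Cython versions keep this structure but replace the interpreted loop and the Python-level `log` call with their C equivalents.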

## NumPy buffer special declaration

In [None]:
%%cython -a

cimport numpy as cnp
from libc.math cimport log as clog

def shannon_entropy_v2(cnp.ndarray[double] p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * clog(p_x[i])
    return -res

In [None]:
%%timeit
shannon_entropy_v2(pmf)

## The `cnp.ndarray[double]` syntax

The `cnp.ndarray[double]` syntax declares a NumPy array _buffer_ object. Cython knows how to index this array-like object efficiently, without going through Python-level indexing. The `double` in square brackets is the (scalar) dtype of the array elements.
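Because the buffer is declared with element type `double`, the function expects a one-dimensional `float64` array; an array of another dtype (say `float32`) should be rejected at call time with a dtype-mismatch error. One way to guard callers, sketched here in plain NumPy with a hypothetical helper `as_double_1d`, is to coerce the input first:

```python
import numpy as np

def as_double_1d(x):
    """Hypothetical helper: return x as a 1-D float64 NumPy array."""
    arr = np.asarray(x, dtype=np.float64)  # converts (and copies) only if needed
    if arr.ndim != 1:
        raise ValueError("expected a 1-D array, got %d dimensions" % arr.ndim)
    return arr

pmf32 = np.array([0.5, 0.25, 0.25], dtype=np.float32)
pmf64 = as_double_1d(pmf32)
print(pmf32.dtype, "->", pmf64.dtype)  # float32 -> float64
```

The result of `as_double_1d(...)` can then be passed to `shannon_entropy_v2` regardless of the caller's original dtype.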

## Turn off bounds checking and wraparound checking

In [None]:
%%cython -a

cimport cython
cimport numpy as cnp
from libc.math cimport log

@cython.boundscheck(False)
@cython.wraparound(False)
def shannon_entropy_v3(cnp.ndarray[double] p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * log(p_x[i])
    return -res

In [None]:
%%timeit
shannon_entropy_v3(pmf)

## The `cython` cimported magic module

Cython allows us to control its compile-time semantics and behavior via the magic `cython` module that we `cimport`.

We can then use Cython directives like these to control how code is generated:

```python
@cython.boundscheck(False)
@cython.wraparound(False)
def func(...):
  ...
```

In this case, we're telling Cython not to generate code that does bounds checking or wraparound checking (support for negative indexing).
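For reference, these are the two behaviors being switched off, shown here with an ordinary Python list rather than Cython code. With both directives set to `False`, the generated code no longer performs these checks on buffer indexing, so the loop itself must guarantee `0 <= i < n`; an out-of-range or negative index then reads arbitrary memory (or crashes) instead of raising.

```python
p_x = [0.2, 0.3, 0.5]

# Wraparound: a negative index counts from the end.
print(p_x[-1])  # 0.5

# Bounds checking: an out-of-range index raises instead of reading stray memory.
caught = None
try:
    p_x[3]
except IndexError as exc:
    caught = exc
    print("IndexError:", exc)
```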

## Typed memoryview syntax

In [None]:
%%cython -a

cimport cython
from libc.math cimport log

@cython.boundscheck(False)
@cython.wraparound(False)
def shannon_entropy_mv(double[::1] p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * log(p_x[i])
    return -res

In [None]:
%%timeit
shannon_entropy_mv(pmf)

## The `double[::1]` syntax

The declaration

```python
def shannon_entropy_mv(double[::1] p_x):
   ...
```

declares `p_x` to be a one-dimensional, contiguous typed memoryview -- a Cython-only construct that's compatible with NumPy arrays and PEP 3118 buffer objects.  We include it here because you may see this syntax used in other situations.
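Python's built-in `memoryview` exposes the same kind of PEP 3118 buffer metadata that a typed memoryview is checked against. The standard-library sketch below (plain Python, not Cython) shows the item format, dimensionality, and contiguity of a buffer of C doubles. The `::1` in `double[::1]` asks for the contiguous case, so a strided view such as `mv[::2]` would not match that signature.

```python
import array

buf = array.array("d", [0.1, 0.2, 0.3, 0.4])  # "d" = C double
mv = memoryview(buf)

print(mv.format, mv.itemsize, mv.ndim)  # d 8 1
print(mv.strides, mv.c_contiguous)      # (8,) True

# Taking every other element leaves gaps between items,
# so the view is no longer contiguous.
strided = mv[::2]
print(strided.strides, strided.c_contiguous)  # (16,) False
```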

## Fused types example

In [None]:
%%cython -a

cimport cython
from libc.math cimport log

@cython.boundscheck(False)
@cython.wraparound(False)
def shannon_entropy_mv(cython.floating[::1] p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        if p_x[i] > 0.0: # Have to guard against underflow...
            res += p_x[i] * log(p_x[i])
    return -res

In [None]:
print(shannon_entropy_mv(pmf.astype('f8')))
print(shannon_entropy_mv(pmf.astype('f4')))
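The `p_x[i] > 0.0` guard matters mainly for the single-precision call. The far tail of `pmf` is extremely small (on the order of 1e-62 for the last entry, by a rough estimate), which a double can hold but which becomes exactly 0 after `astype('f4')`. In C, `log(0)` is `-inf` and `0 * -inf` is NaN, so one such term would spoil the whole sum. The standard-library sketch below illustrates the underflow and the guard; `to_float32` and `shannon_entropy_guarded` are hypothetical names, not part of the notebook.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (a C double) to single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def shannon_entropy_guarded(p_x):
    """Hypothetical pure-Python analogue of the guarded loop above."""
    res = 0.0
    for p in p_x:
        if p > 0.0:  # treat 0 * log(0) as 0
            res += p * math.log(p)
    return -res

tiny = 1e-60                 # representable as a double
print(to_float32(tiny))      # 0.0: underflows in single precision
print(shannon_entropy_guarded([0.5, 0.5, to_float32(tiny)]))
```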