# Cython demonstrated

Let's begin at the start. How do we use Cython within a Jupyter notebook?

First, we need to enable Cython within our notebook. This is done with the `%load_ext` magic:

In [1]:
```python
%load_ext cython
```

## Example: Computing cos(x) on an array

In the optimization lecture we began by profiling a set of functions that compute an integral. Let's start with one of those functions here: it computes cos(x) for every value in a time series. We also provide the NumPy-optimized (vectorized) version of the same function, as we showed last week.

In [2]:
```python
import numpy

def generate_time_series(tmin, tmax, delta_t):
    """
    Generates a time series between tmin and tmax sampled at delta_t
    """
    tseries = numpy.arange(tmin, tmax, delta_t)
    # We shift tseries by delta_t / 2 to ensure that we are using the midpoint rule (see wikipedia page)
    tseries = tseries + delta_t / 2.
    return tseries


def compute_cosx(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    cosx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        # enumerate already gives us the value, so no need to index tseries again
        cosx[idx] = numpy.cos(tval)
    return cosx

def compute_cosx_numpy(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    return numpy.cos(tseries)
```
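The shift by `delta_t / 2` in `generate_time_series` is what makes this the midpoint rule: each sampled value sits at the centre of a strip of width `delta_t`, so summing cos over `tseries` and multiplying by `delta_t` approximates the integral. A minimal sketch, assuming we integrate cos(t) over a short range where the exact answer, sin(b) - sin(a), is available for comparison (the helper `midpoint_integral_cos` is hypothetical, not part of the lecture code). We pick a step that divides the range exactly because, with a non-integer step, `numpy.arange` can include or drop the final point through floating-point rounding.

```python
import numpy


def generate_time_series(tmin, tmax, delta_t):
    """Midpoints of the strips of width delta_t covering [tmin, tmax)."""
    tseries = numpy.arange(tmin, tmax, delta_t)
    return tseries + delta_t / 2.


def midpoint_integral_cos(tmin, tmax, delta_t):
    """Hypothetical helper: midpoint-rule estimate of the integral of cos(t)."""
    tseries = generate_time_series(tmin, tmax, delta_t)
    # Each midpoint value stands for one strip of width delta_t
    return numpy.sum(numpy.cos(tseries)) * delta_t


# delta_t = 1/64 divides [0, 2] exactly, so arange yields exactly 128 strips
delta_t = 1. / 64.
approx = midpoint_integral_cos(0.0, 2.0, delta_t)
exact = numpy.sin(2.0) - numpy.sin(0.0)
print(approx, exact, abs(approx - exact))
```

The midpoint rule's error shrinks with the square of `delta_t`, so halving the step should reduce the difference by roughly a factor of four.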

Remember that the NumPy version is considerably faster than the non-vectorized version: in the timings below, by more than a factor of 100.

In [3]:
```python
tseries = generate_time_series(1., 1000., 1./100.)
%timeit compute_cosx(tseries)
%timeit compute_cosx_numpy(tseries)
```

29.1 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
214 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


Now let's try to do this with Cython. As a first cut, we can simply compile the code with Cython, with almost no changes. In Jupyter we begin a cell with `%%cython` to achieve this, and add the `-a` option to get some useful annotated output. **NOTE** The Cython cell does not have access to any imports made in other cells (it is compiled as a separate module), so you will need to repeat any imports inside it.

In [4]:
```cython
%%cython -a
import math, numpy
def compute_cosx_cython(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    cosx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        cosx[idx] = math.cos(tval)
    return cosx

def compute_cosx_numpy_cython(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    return numpy.cos(tseries)
```



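In the notebook, this cell displays an annotated copy of the code, headed "Generated by Cython" followed by the version number, in which each line is shaded yellow according to how much it interacts with Python.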

This annotation gives a sense of how fast the code will be: the darker the yellow, the more Python interaction a line involves, and the slower it is likely to be. You do *not* want time-critical lines to be yellow. However, it is nice to be able to compile *unedited* Python code in this way. Look at the timings below! Our slow, unoptimized version of the code is *significantly* faster when compiled like this ... and it took no real effort on our part! Though the NumPy code is still much faster ... for now!

In [5]:
```python
%timeit compute_cosx_cython(tseries)
%timeit compute_cosx_numpy_cython(tseries)
```

9.45 ms ± 27.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
214 µs ± 871 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
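One caveat when reading these numbers: the Cython cell did not only compile the loop, it also swapped `numpy.cos` for `math.cos`. Calling `numpy.cos` on a single scalar carries NumPy's per-call overhead, so some of the speed-up may come from that swap alone. A quick pure-Python check, assuming the same `tseries` as above (the two function names are illustrative, not from the lecture):

```python
import math
import timeit

import numpy

# Same series as generate_time_series(1., 1000., 1./100.)
tseries = numpy.arange(1., 1000., 1. / 100.) + 1. / 200.


def loop_numpy_cos(tseries):
    """Pure-Python loop calling numpy.cos on one element at a time."""
    cosx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        cosx[idx] = numpy.cos(tval)
    return cosx


def loop_math_cos(tseries):
    """Same loop, but calling math.cos, which avoids NumPy's per-call overhead."""
    cosx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        cosx[idx] = math.cos(tval)
    return cosx


# Both loops should agree with the vectorized call
assert numpy.allclose(loop_numpy_cos(tseries), numpy.cos(tseries))
assert numpy.allclose(loop_math_cos(tseries), numpy.cos(tseries))

# Timings are machine dependent; compare the two numbers rather than their absolute size
t_numpy = timeit.timeit(lambda: loop_numpy_cos(tseries), number=3) / 3
t_math = timeit.timeit(lambda: loop_math_cos(tseries), number=3) / 3
print(f"numpy.cos loop: {t_numpy:.4f} s, math.cos loop: {t_math:.4f} s")
```

If the `math.cos` loop is already much faster in plain Python, the fair comparison for Cython's contribution is against that loop rather than against `compute_cosx`.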


How can we go about making this faster? Well, let's focus on the first function, the one with the for loop: the second version just calls a NumPy function, and there's not much point trying to optimize that. There are a few things we can do to optimize the loop:

* We use C's math library to call cos/sin. If we call `math.cos` or `math.sin` we're back in Python code, and we must avoid that to be fast!
* We declare the type of every variable. This is done using `cdef`, followed by the type of the variable, followed by its name. So `cdef int idx` says that `idx` is going to be a C integer.
* Inputs to the function are declared in the same way. Adding `[::1]` declares a typed memoryview: a one-dimensional array whose elements are contiguous in memory. Any object exposing such a buffer, including a numpy array, can be passed in. So `double [::1]` can be read as "a 1D numpy array of floats". (Python floats are 64-bit by default, which C calls `double`.)
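Because a `double[::1]` argument only accepts a buffer of C doubles laid out contiguously, it is worth knowing how to check an array before passing it in. Python's built-in `memoryview` reports the same buffer properties, so the sketch below inspects them from plain Python. As far as we know, Cython raises a `ValueError` at call time when the dtype or contiguity does not match; the array names here are illustrative.

```python
import numpy

# A freshly created float64 array: the layout a `double[::1]` argument expects
tseries = numpy.zeros(10)
view = memoryview(tseries)
print(view.format, view.itemsize, view.c_contiguous)  # 'd' is the C double format

# Every other element: still float64, but no longer contiguous in memory
strided = tseries[::2]
print(memoryview(strided).c_contiguous)

# An integer array exposes an integer buffer format, not 'd'
int_array = numpy.arange(10)
print(memoryview(int_array).format)

# numpy.ascontiguousarray copies into a contiguous float64 buffer when needed
fixed = numpy.ascontiguousarray(strided, dtype=numpy.float64)
print(memoryview(fixed).c_contiguous, memoryview(fixed).format)
```

Converting with `numpy.ascontiguousarray(..., dtype=numpy.float64)` before the call is a cheap way to avoid a dtype or contiguity mismatch.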

Declaring variable types explicitly makes the function a little less flexible, but allows the compiled code to be significantly faster, as the compiler knows ahead of time exactly what it will be asked to do! Note that I also declare the length of the array, `n`, and the loop index, `idx`, before the for loop starts.
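Declaring `cdef int idx` also means `idx` follows C rules rather than Python's: a C `int` has a fixed width and can overflow silently, whereas a Python `int` grows as needed. For the roughly 100,000-element `tseries` used here that is harmless, but for arrays with more than about two billion elements a pointer-sized type such as `Py_ssize_t` is the usual choice for indices in Cython. Python's standard `ctypes` module exposes the same C types, so a small sketch can illustrate the difference without compiling anything:

```python
import ctypes

# A C `int` has a fixed width (32 bits on common platforms); a Python int does not
bits = ctypes.sizeof(ctypes.c_int) * 8
largest = 2 ** (bits - 1) - 1
print(bits, "bits, largest value", largest)

# Like C, ctypes does no overflow checking: one past the largest value wraps around
wrapped = ctypes.c_int(largest + 1).value
print(wrapped)

# The same arithmetic on a Python int simply keeps growing
print(largest + 1)

# Py_ssize_t (ctypes.c_ssize_t) is pointer-sized, large enough to index any array in memory
print(ctypes.sizeof(ctypes.c_ssize_t) == ctypes.sizeof(ctypes.c_void_p))
```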

In [6]:
```cython
%%cython -a
import numpy

from libc.math cimport cos # This imports C's cos function from the math library

def compute_cosx_cython(double [::1] timeseries):
    """
    Computes cos(t) for all values in timeseries
    """
    cdef int n = timeseries.size # How many values in the timeseries
    cdef int idx
    cdef double[::1] cosx = numpy.zeros(n) # Create an array to store the cos(x) values
    for idx in range(n):
        cosx[idx] = cos(timeseries[idx])
    # Return a numpy array; returning cosx directly would give a Cython memoryview object
    return numpy.asarray(cosx)
```




Now how fast is this?

In [7]:
```python
%timeit compute_cosx_cython(tseries)
```

215 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


Finally, we can add a few compiler directives, written as decorators on the function, to turn off some of Python's sanity checks. These checks are useful, but they can slow code down. In this case turning them off doesn't help much, but it can in other cases. Note that turning them off can make a buggy program fail in obscure ways: an out-of-range index may silently give wrong results, or crash the program with no explanation (what's called a "segmentation fault"). If this happens, remove these directives and see if you then get a warning/error telling you what is wrong!
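Each of the three directives used in the next cell switches off a piece of Python behaviour. The plain-Python sketch below shows what is being given up; with the directives set, the compiled code simply assumes these cases never occur (the list `data` is just an illustration).

```python
import math

data = [10.0, 20.0, 30.0]

# wraparound: Python maps a negative index to an offset from the end
print(data[-1])  # 30.0

# boundscheck: Python refuses an index past the end instead of reading stray memory
try:
    data[3]
except IndexError as err:
    print("IndexError:", err)

# cdivision: Python's // and % round towards minus infinity ...
print(-7 // 2, -7 % 2)  # -4 1
# ... whereas C truncates towards zero; int() of the quotient and math.fmod mimic that
print(int(-7 / 2), math.fmod(-7, 2))  # -3 -1.0

# cdivision(True) also drops the ZeroDivisionError check that Python performs
try:
    1.0 / 0.0
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```

Our loop never uses negative indices, never indexes past `n`, and performs no division, which is consistent with these directives making little difference here.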

In [8]:
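```cython
%%cython -a
import numpy
from cython import wraparound, boundscheck, cdivision

from libc.math cimport cos # This imports C's cos function from the math library

@boundscheck(False)  # Do not check that indices are within the array
@wraparound(False)   # Do not support negative (from-the-end) indices
@cdivision(True)     # Use C division semantics, without zero-division checks
def compute_cosx_cython(double [::1] timeseries):
    """
    Computes cos(t) for all values in timeseries
    """
    cdef int n = timeseries.size # How many values in the timeseries
    cdef int idx
    cdef double[::1] cosx = numpy.zeros(n) # Create an array to store the cos(x) values
    for idx in range(n):
        cosx[idx] = cos(timeseries[idx])
    # Return a numpy array; returning cosx directly would give a Cython memoryview object
    return numpy.asarray(cosx)
```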




Note that there are now no yellow lines inside the for loop. The setup code might still be slow, but the for loop accounts for the bulk of the cost, and it now seems well optimized!

In [9]:
```python
%timeit compute_cosx_cython(tseries)
```

213 µs ± 997 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


This is now about as fast as the NumPy function. Don't forget that NumPy is itself compiled C code, so it's often hard to beat. Indeed, in this case just using the NumPy function would be the best choice. However, the point is that if there *wasn't* a NumPy cos function, you would be able to achieve a function that's basically as fast using Cython.

It is worth emphasizing, though, that making Cython code fast is not trivial. The main changes relative to plain Python code are illustrated here:

 * Declare variable types (arrays in particular can be complicated here ... I used `numpy.zeros` to create the output array, and I recommend doing this to avoid managing memory manually in C).
 * Use C library functions (here from `libc.math`) for things like cos, sin or exp.