# Cython - Static compilation for Python

Creating a compiled version of your Python code can be easy with Cython, and the speed-ups can be huge.

Let's start with Paul's prime sieve code from earlier.

In [1]:
import math
def sieve_primes(n):
    a = [True for x in range(n + 1)]
    i = 2
    while i <= math.sqrt(n):
        if a[i]:
            for j in range(i*i, n + 1, i):
                a[j] = False
        i += 1
    return [i for i in range(2, len(a)) if a[i]]

In [2]:
#Check it's working OK
sieve_primes(30)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
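
Checking the first few primes by eye is a good start. A slightly stronger check, sketched here as an addition to the original notebook, compares the sieve against a simple trial-division test. The helper `is_prime_trial` and the bound `limit` are hypothetical names chosen for this illustration.

```python
import math

def sieve_primes(n):
    # Same sieve as in the cell above, repeated so this block runs on its own
    a = [True for x in range(n + 1)]
    i = 2
    while i <= math.sqrt(n):
        if a[i]:
            for j in range(i * i, n + 1, i):
                a[j] = False
        i += 1
    return [i for i in range(2, len(a)) if a[i]]

def is_prime_trial(k):
    # Hypothetical helper: slow but obviously correct primality test
    if k < 2:
        return False
    for d in range(2, math.isqrt(k) + 1):
        if k % d == 0:
            return False
    return True

limit = 1000
expected = [k for k in range(2, limit + 1) if is_prime_trial(k)]
assert sieve_primes(limit) == expected
print(len(expected))  # 168 primes up to 1000
```

The same comparison could be reused for any compiled variant of the function, since each one should return an identical list.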

Let's time it. 

In [3]:
#Time it for all primes less than 5 million
N = 5000000
original_speed = %timeit -o sieve_primes(N)

1.19 s ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
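
The `%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. This is a rough equivalent rather than a reproduction of what `%timeit -o` does internally. The helper `best_time` and the smaller input `small_n` are assumptions made to keep the example quick to run.

```python
import math
import timeit

def sieve_primes(n):
    # Same sieve as above, repeated so this block runs on its own
    a = [True for x in range(n + 1)]
    i = 2
    while i <= math.sqrt(n):
        if a[i]:
            for j in range(i * i, n + 1, i):
                a[j] = False
        i += 1
    return [i for i in range(2, len(a)) if a[i]]

def best_time(func, arg, repeat=3, number=1):
    """Return the fastest per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number

small_n = 100_000  # hypothetical value, much smaller than the notebook's N
elapsed = best_time(sieve_primes, small_n)
print(f"best of 3 runs: {elapsed * 1e3:.2f} ms")
```

Taking the minimum mirrors the `.best` attribute used later in the notebook to compare versions.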


**Now we'll try Cython**

You'll need to ensure Cython is installed, e.g. using `pip` or `conda`, and then we can load the Cython extension.

In [4]:
%load_ext Cython

We'll use the `%%cython` cell magic with the optional `-a` (annotate) flag to get some insight into what Cython is doing.

In [5]:
%%cython -a
import math
def sieve_primes_cython(n):
    a = [True for x in range(n + 1)]
    i = 2
    while i <= math.sqrt(n):
        if a[i]:
            for j in range(i*i, n + 1, i):
                a[j] = False
        i += 1
    return [i for i in range(2, len(a)) if a[i]]

The yellow colouring tells us how much the code generated by Cython interacts with Python. Lines in white don't interact with Python, and hence will run as fast as normal C code, which is good. The darker the yellow, the higher the number of Python API calls for that line, and the less optimised the code will be, which is bad! There was a lot of yellow above, so the speed-up will be modest.

In [6]:
#Check it's working OK
sieve_primes_cython(30)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

In [7]:
#Time it for all primes less than 5 million
cython_speed = %timeit -o sieve_primes_cython(N)

717 ms ± 8.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
print('Cython is {0:.1f} times faster than pure Python for the primes function'.format(original_speed.best/cython_speed.best))

Cython is 1.7 times faster than pure Python for the primes function


**A speed-up of around 1.7 times**

This isn't bad given that the code is unchanged apart from the `%%cython` cell magic, but can we do better?

The Cython documentation says a better speed-up may be obtained by declaring the types of the various objects in the code. In this example we have integers and a list, and their types can be given explicitly using `cdef int` and `cdef list`. Note that we now use `cpdef` instead of `def`, which lets us declare the return type while keeping the function callable from Python; see [the Cython documentation](https://notes-on-cython.readthedocs.io/en/latest/function_declarations.html) for details.

In [9]:
%%cython -a
import math
cpdef list sieve_primes_cython2(int n):
    cdef int i, j
    cdef list a 
    a = [True for x in range(n + 1)]
    i = 2
    while i <= math.sqrt(n):
        if a[i]:
            for j in range(i*i, n + 1, i):
                a[j] = False
        i += 1
    return [i for i in range(2, len(a)) if a[i]]

Is there less yellow? I think so, e.g. `i += 1` is now white. Let's test it and time it. 

In [10]:
sieve_primes_cython2(30)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

In [11]:
cython_speed = %timeit -o sieve_primes_cython2(N)

548 ms ± 31.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [12]:
print('Cython version 2 is {0:.1f} times faster than pure Python for the primes function'.format(original_speed.best/cython_speed.best))

Cython version 2 is 2.2 times faster than pure Python for the primes function


**What about another example?**

This function estimates pi using random numbers (a Monte Carlo method).

In [13]:
#Plain Python version 
import random
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
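
As a brief aside on why this estimates pi (a standard argument, not spelled out in the original notebook): `x` and `y` are drawn uniformly from [0, 1), so each point lands uniformly in the unit square, and the test `x ** 2 + y ** 2 < 1.0` accepts the points inside the quarter of the unit circle lying in that square. The accepted fraction therefore approximates the ratio of the two areas:

$$
\frac{\text{acc}}{\text{nsamples}} \approx \frac{\pi/4}{1}
\quad\Longrightarrow\quad
\pi \approx \frac{4\,\text{acc}}{\text{nsamples}}
$$

which is the expression the function returns.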

In [14]:
N = 1000000
python_time = %timeit -o monte_carlo_pi(N)

373 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [15]:
%%cython
# Cython without types
import random
def monte_carlo_pi_cython(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

In [16]:
cython_time = %timeit -o monte_carlo_pi_cython(N)

254 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [17]:
print('Cython version is {0:.1f} times faster than pure Python'.format(python_time.best/cython_time.best))

Cython version is 1.5 times faster than pure Python


In [18]:
%%cython
# Cython with types
import random
cpdef double monte_carlo_pi_cython2(int nsamples):
    cdef int acc, i
    cdef double x,y
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return (4.0 * acc) / nsamples

In [19]:
cython_time = %timeit -o monte_carlo_pi_cython2(N)

89.4 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [20]:
print('Cython version is {0:.1f} times faster than pure Python'.format(python_time.best/cython_time.best))

Cython version is 4.2 times faster than pure Python


Not bad.

And I managed to find an example which gives huge speed-ups.

**Example 3**

This code computes the Fibonacci series.

In [21]:
def fibonaci_series(n: int):
    i = 2
    a = 0
    b = 1
    vals = []
    if n > 0:
        vals.append(a)
    if n > 1:
        vals.append(b)
    while i < n:
        c = a + b
        vals.append(c)
        a = b
        b = c
        i += 1
    return vals

In [22]:
fibonaci_series(10)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [23]:
n = 200000
python_speed = %timeit -o fib = fibonaci_series(n)

1.24 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [24]:
%%cython
cpdef fibonaci_series_cython(int n):
    cdef int i, a, b, c
    cdef list vals
    i = 2
    a = 0
    b = 1
    vals = []
    if n > 0:
        vals.append(a)
    if n > 1:
        vals.append(b)
    while i < n:
        c = a + b
        vals.append(c)
        a = b
        b = c
        i += 1
    return vals

In [26]:
fibonaci_series_cython(10)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [27]:
cython_speed = %timeit -o fib = fibonaci_series_cython(n)

5.71 ms ± 41.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [28]:
print('Cython version is {0:.1f} times faster than pure python for this operation'.format(python_speed.best/cython_speed.best))

Cython version is 217.4 times faster than pure python for this operation


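Nice!!!

One caveat: much of this speed-up comes from replacing Python's arbitrary-precision integers with fixed-size C `int`s. Assuming the usual 32-bit `int`, only the first 47 terms of the series fit, so for `n = 200000` the Cython version overflows and its later values will not match the pure Python results. The two functions are therefore not doing the same work.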
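
Since `fibonaci_series_cython` declares `a`, `b` and `c` as C `int`s, it is worth checking how far the series can go before those variables run out of range. The plain-Python sketch below is an addition to the original notebook and assumes a signed 32-bit `int`, which is typical but platform dependent. `INT_MAX` and `first_overflow_index` are names made up for this illustration.

```python
INT_MAX = 2**31 - 1  # assumption: C int is a signed 32-bit integer

def first_overflow_index(limit=INT_MAX):
    """Return (index, value) of the first Fibonacci number larger than `limit`."""
    a, b = 0, 1  # a holds F(index), starting from F(0) = 0
    index = 0
    while a <= limit:
        a, b = b, a + b
        index += 1
    return index, a

index, value = first_overflow_index()
print(index, value)  # 47 2971215073
```

Any call with `n` greater than 47 asks for at least one value that a 32-bit `int` cannot represent, so a comparison against the pure Python output for large `n` would be a useful extra check.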