# Cython 
Cython is a superset of Python that compiles to `C` extension modules.

Let's start by finding a given number of prime numbers.

In [12]:
%%writefile python_primes.py
def py_prime(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break

        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p

Overwriting python_primes.py
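
The `else` attached to the `for` loop in `py_prime` runs only when the loop finishes without hitting `break`. A minimal sketch of that behaviour, using a hypothetical helper `classify` that is not part of the original code:

```python
def classify(n, known_primes):
    """Return 'composite' if any known prime divides n, otherwise 'prime'."""
    for p in known_primes:
        if n % p == 0:
            verdict = "composite"
            break
    else:
        # Reached only when the loop ran to completion without a break.
        verdict = "prime"
    return verdict


print(classify(9, [2, 3, 5, 7]))   # 3 divides 9, so the loop breaks
print(classify(11, [2, 3, 5, 7]))  # no divisor found, so the else branch runs
print(classify(2, []))             # empty loop: the else branch still runs
```

The last call is why `py_prime` works on its first iteration, when the list of primes is still empty.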


In [13]:
from python_primes import  py_prime

## Compiled
A compiled language like `C` is converted to machine code before it is ever run, unlike Python, which is interpreted at run time.  This allows compiled code to run faster.  So we are going to go over how to compile the code we just wrote.  You will be compiling any Cython code you write before using it.
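
For contrast, CPython does translate source code into bytecode, but that bytecode is then executed step by step by the interpreter rather than directly by the CPU. The sketch below lists the bytecode for a small illustrative function (`is_divisible` is a made-up name, and the exact opcode names vary between Python versions):

```python
import dis


def is_divisible(n, i):
    return n % i == 0


# Each instruction is dispatched by the interpreter loop at run time.
instructions = list(dis.get_instructions(is_divisible))
for ins in instructions:
    print(ins.opname, ins.argrepr)
```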


To avoid confusion, I am going to make another file with a different name and a modified function name, which we will compile.

In [14]:
%%writefile python_primes_compiled.py
def py_prime_compiled(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break

        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p

Overwriting python_primes_compiled.py


## Setup file

We need a setup file to compile our Python code into `C` code that Python can then import.  This file tells Cython which files to compile and any additional options you want set.  You can also choose your compiler.  One important note: setting the `annotate` parameter to `True` produces an HTML file showing which parts of the code still use Python functionality and data types, which slows down run time.

In [15]:
%%writefile setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize(["python_primes_compiled.py"], annotate=True)
)


Overwriting setup.py


Now that we have a setup file, we need to execute it so that Cython generates the `C` code and the compiler builds it.

In [16]:
!python setup.py build_ext --inplace

Compiling python_primes_compiled.py because it changed.
[1/1] Cythonizing python_primes_compiled.py
running build_ext
building 'python_primes_compiled' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.6m -c python_primes_compiled.c -o build/temp.macosx-10.7-x86_64-3.6/python_primes_compiled.o
gcc -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/python_primes_compiled.o -o /Users/daniel.rupp/Desktop/Cython_Demo/python_primes_compiled.cpython-36m-darwin.so


The output will report any errors that occur during the build.  However, only major errors are reported; subtler problems, such as a poorly chosen variable type, will not show up here.

We can now import and use our code.

In [17]:
from python_primes import py_prime
from python_primes_compiled import py_prime_compiled

%timeit py_prime(500)
%timeit py_prime_compiled(500)

8.08 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.14 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
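
Outside IPython, where the `%timeit` magic is not available, a similar measurement can be sketched with the standard-library `timeit` module. The function is repeated here so the block is self-contained; the input size of 200 and the repeat counts are arbitrary choices for illustration:

```python
import timeit


def py_prime(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        for i in p:
            if n % i == 0:
                break
        else:
            p.append(n)
        n += 1
    return p


# Run 3 batches of 5 calls each and report the fastest batch.
calls_per_batch = 5
runs = timeit.repeat(lambda: py_prime(200), number=calls_per_batch, repeat=3)
best_ms = min(runs) / calls_per_batch * 1000
print(f"best of 3: {best_ms:.3f} ms per call")
```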


### Cython code
There are many changes we can make to code to improve its run time.  The more `C`-like you make the code the better, but a few small changes can produce major improvements.

The first and most important is declaring the data type each variable will hold.  Specifying that variables are ints, doubles, etc. allows them to be converted into `C` data types, speeding up processing.

The `cdef` before the variable type signals to Cython that we are declaring a `C` variable.

In [18]:
%%writefile cy_primes.pyx
def cy_prime(int nb_primes):
    cdef int n, i, len_p
    cdef int p[1000]
    if nb_primes > 1000:
        nb_primes = 1000

    len_p = 0  # The current number of elements in p.
    n = 2
    while len_p < nb_primes:
        # Is n prime?
        for i in p[:len_p]:
            if n % i == 0:
                break

        # If no break occurred in the loop, we have a prime.
        else:
            p[len_p] = n
            len_p += 1
        n += 1

    # Let's return the result in a python list:
    result_as_list  = [prime for prime in p[:len_p]]
    return result_as_list


Overwriting cy_primes.pyx
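
The main structural change in `cy_prime` is that the growing Python list is replaced by a fixed-size buffer plus a counter `len_p`. A pure-Python sketch of the same idea, using `array.array` as a stand-in for the `C` array (the name `prime_buffer` and the `capacity` parameter are illustrative, not part of the original code):

```python
from array import array


def prime_buffer(nb_primes, capacity=1000):
    # Requests beyond the buffer size are silently capped, as in cy_prime.
    nb_primes = min(nb_primes, capacity)

    p = array("i", [0]) * capacity  # fixed-size buffer, like `cdef int p[1000]`
    len_p = 0                       # number of slots of p currently in use
    n = 2
    while len_p < nb_primes:
        for i in p[:len_p]:
            if n % i == 0:
                break
        else:
            p[len_p] = n
            len_p += 1
        n += 1
    return list(p[:len_p])


print(prime_buffer(5))
print(len(prime_buffer(10, capacity=4)))  # capped at the buffer size
```

Note the consequence of the cap: asking for more primes than the buffer holds returns fewer results than requested without any error.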


We now need to update our `setup.py` file to include our new file when compiling.  We also need to rerun it.

In [19]:
%%writefile setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize(["cy_primes.pyx","python_primes_compiled.py"], annotate=True)
)

Overwriting setup.py


In [20]:
!python setup.py build_ext --inplace

Compiling cy_primes.pyx because it changed.
[1/1] Cythonizing cy_primes.pyx
running build_ext
building 'cy_primes' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.6m -c cy_primes.c -o build/temp.macosx-10.7-x86_64-3.6/cy_primes.o
gcc -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/cy_primes.o -o /Users/daniel.rupp/Desktop/Cython_Demo/cy_primes.cpython-36m-darwin.so


#### We can now import our new function and run it.

In [21]:
from cy_primes import cy_prime

In [22]:
%timeit py_prime(500)
%timeit py_prime_compiled(500)
%timeit cy_prime(500)

8.55 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.89 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
570 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


By setting up our variables as `C` data types and compiling our code into `C` (and allowing Cython to create a wrapper to call the `C` code) we have made our code over 10 times faster.
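
The speedups can be worked out directly from the mean timings reported above (the variable names are only for illustration):

```python
# Mean run times from the %timeit output above, in milliseconds.
py_ms = 8.55        # py_prime
compiled_ms = 3.89  # py_prime_compiled
cy_ms = 0.570       # cy_prime (570 µs)

print(round(py_ms / compiled_ms, 1))  # compiling unchanged Python: about 2.2x
print(round(compiled_ms / cy_ms, 1))  # adding C types on top: about 6.8x
print(round(py_ms / cy_ms, 1))        # overall: about 15.0x
```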

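### Let's look at the HTML file for a second

The yellow lines show where the code still interacts with Python; the darker the yellow, the more Python interaction.  These are the places where changes can speed up the code.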

## Numpy and Cython

NumPy and Cython were set up to work well together.  With the addition of `memoryviews`, this has become easier and more general in usage.

The example below is just for showing the importance of using `C` structures and memoryviews to speed up execution.

Below is the way to do this using `numpy`, which will be the fastest of the versions shown here.

In [13]:
import numpy as np
def compute_np(array_1, array_2, a, b, c):
    return np.clip(array_1, 2, 10) * a + array_2 * b + c

Here is the same process without using NumPy's ability to broadcast.

In [14]:
def clip(a, min_value, max_value):
    return min(max(a, min_value), max_value)


def compute(array_1, array_2, a, b, c):
    """
    This function must implement the formula
    np.clip(array_1, 2, 10) * a + array_2 * b + c

    array_1 and array_2 are 2D.
    """
    x_max = array_1.shape[0]
    y_max = array_1.shape[1]

    assert array_1.shape == array_2.shape

    result = np.zeros((x_max, y_max), dtype=array_1.dtype)

    for x in range(x_max):
        for y in range(y_max):
            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result[x, y] = tmp + c
            
    return result
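
To make the formula concrete, here is a small worked example on nested lists instead of NumPy arrays. The input values are made up for illustration; the constants `a`, `b`, `c` match the ones used later in this notebook:

```python
def clip(a, min_value, max_value):
    return min(max(a, min_value), max_value)


array_1 = [[1, 5], [15, 10]]
array_2 = [[0, 1], [7, 2]]
a, b, c = 4, 3, 9

# Element-wise: clip(array_1, 2, 10) * a + array_2 * b + c
result = [
    [clip(v1, 2, 10) * a + v2 * b + c for v1, v2 in zip(row_1, row_2)]
    for row_1, row_2 in zip(array_1, array_2)
]
print(result)  # [[17, 32], [70, 55]]
```

For example, the entry 15 is clipped to 10, giving 10 * 4 + 7 * 3 + 9 = 70.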

Finally, here is a Cython version that declares all the scalar variables as `C` data types.  Note, however, that the NumPy arrays themselves are not yet typed.

In [15]:
%%writefile compute_cy.pyx
import numpy as np

# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.intc

# cdef means here that this function is a plain C function (so faster).
# To get all the benefits, we type the arguments and the return value.
cdef int clip(int a, int min_value, int max_value):
    return min(max(a, min_value), max_value)


def compute(array_1, array_2, int a, int b, int c):
    
    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    cdef Py_ssize_t x_max = array_1.shape[0]
    cdef Py_ssize_t y_max = array_1.shape[1]
    
    assert array_1.shape == array_2.shape
    assert array_1.dtype == DTYPE
    assert array_2.dtype == DTYPE

    result = np.zeros((x_max, y_max), dtype=DTYPE)
    
    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    # For the "tmp" variable, we want to use the same data type as is
    # stored in the array, so we use int because it correspond to np.intc.
    # NB! An important side-effect of this is that if "tmp" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.

    cdef int tmp

    # Py_ssize_t is the proper C type for Python array indices.
    cdef Py_ssize_t x, y

    for x in range(x_max):
        for y in range(y_max):

            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result[x, y] = tmp + c

    return result

Writing compute_cy.pyx
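
The comment in the code above about `tmp` wrapping around can be illustrated from plain Python with `ctypes`, whose integer types perform no overflow checking. This is only a sketch of the wraparound effect, not of what Cython generates; the width of a `C` int is read from the platform rather than assumed:

```python
import ctypes

bits = 8 * ctypes.sizeof(ctypes.c_int)  # width of a C int on this platform
int_max = 2 ** (bits - 1) - 1
int_min = -(2 ** (bits - 1))

# A Python int simply grows; the same value stored in a C int wraps around.
python_value = int_max + 1
wrapped = ctypes.c_int(int_max + 1).value

print(python_value)
print(wrapped)
```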


In [16]:
%%writefile setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize(["cy_primes.pyx","python_primes_compiled.py","compute_cy.pyx"], annotate=True)
)

Overwriting setup.py


In [17]:
!python setup.py build_ext --inplace

Compiling compute_cy.pyx because it changed.
[1/1] Cythonizing compute_cy.pyx
running build_ext
building 'compute_cy' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.6m -c compute_cy.c -o build/temp.macosx-10.7-x86_64-3.6/compute_cy.o
gcc -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/compute_cy.o -o /Users/daniel.rupp/Desktop/Cython_Demo/compute_cy.cpython-36m-darwin.so


In [18]:

from compute_cy import compute

array_1 = np.random.uniform(0, 1000, size=(1000, 500)).astype(np.intc)
array_2 = np.random.uniform(0, 1000, size=(1000, 500)).astype(np.intc)
a = 4
b = 3
c = 9

%timeit compute_np(array_1, array_2, a, b, c)
%timeit compute(array_1, array_2, a, b, c)


1.43 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.41 s ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


As we can see, this makes the function very slow compared to the `numpy` implementation.

If we look at the resulting `html` file we can see that we are still running Python code for much of the work.  We need `C`-level array access, which brings us to memoryviews.

In the version below, array_1 and array_2 are typed as `int[:, :]`.  This makes Cython treat them as two-dimensional memoryviews and allows `C` speed access.

In [19]:
%%writefile compute_cy_memview.pyx

import numpy as np


DTYPE = np.intc


cdef int clip(int a, int min_value, int max_value):
    return min(max(a, min_value), max_value)


def compute_memview(int[:, :] array_1, int[:, :] array_2, int a, int b, int c):
     
    cdef Py_ssize_t x_max = array_1.shape[0]
    cdef Py_ssize_t y_max = array_1.shape[1]

    # array_1.shape is now a C array, so it's not possible
    # to compare it simply by using == without a for-loop.
    # To be able to compare it to array_2.shape easily,
    # we convert them both to Python tuples.
    assert tuple(array_1.shape) == tuple(array_2.shape)

    result = np.zeros((x_max, y_max), dtype=np.intc)
    
    cdef int[:, :] result_view = result

    cdef int tmp
    cdef Py_ssize_t x, y

    for x in range(x_max):
        for y in range(y_max):

            tmp = clip(array_1[x, y], 2, 10)
            tmp = tmp * a + array_2[x, y] * b
            result_view[x, y] = tmp + c

    return result

Writing compute_cy_memview.pyx
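
Cython's typed memoryviews are built on the buffer protocol, the same mechanism behind Python's built-in `memoryview`. The pure-Python sketch below is only an analogy (it runs at Python speed), but it shows the key property: a view gives shaped, indexed access to existing memory without copying it. The sample data is made up for illustration:

```python
from array import array

data = array("i", [1, 2, 3, 4, 5, 6])

# View the same memory as a 2 x 3 grid of C ints, without copying.
# Reshaping goes through a byte view first, as memoryview.cast requires.
grid = memoryview(data).cast("B").cast("i", shape=[2, 3])

print(grid.shape)  # (2, 3)
print(grid[1, 2])  # last element of the second row

# A change to the underlying array is visible through the view.
data[5] = 60
print(grid[1, 2])
```

This is also why `compute_memview` can write through `result_view` and then return `result`: both names refer to the same memory.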


In [20]:
%%writefile setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize(["cy_primes.pyx","python_primes_compiled.py","compute_cy.pyx","compute_cy_memview.pyx"], annotate=True)
)

Overwriting setup.py


In [21]:
!python setup.py build_ext --inplace

Compiling compute_cy_memview.pyx because it changed.
[1/1] Cythonizing compute_cy_memview.pyx
running build_ext
building 'compute_cy_memview' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.6m -c compute_cy_memview.c -o build/temp.macosx-10.7-x86_64-3.6/compute_cy_memview.o
gcc -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/compute_cy_memview.o -o /Users/daniel.rupp/Desktop/Cython_Demo/compute_cy_memview.cpython-36m-darwin.so


In [22]:
from compute_cy_memview import compute_memview

In [23]:
%timeit compute_memview(array_1, array_2, a, b, c)

2.51 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


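This is still slower than numpy.  We can improve the run time by turning off bounds checking and negative-index wraparound.  The decorators go directly above the function definition.

```
cimport cython              # Needed for the decorators below

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing
```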
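
Both directives switch off behaviour that Python provides by default. The plain-Python sketch below shows what is being given up; with the checks disabled in Cython, an invalid index is no longer caught and may read or write memory outside the array, so they should only be disabled when the indices are known to be valid:

```python
values = [10, 20, 30]

# Wraparound: a negative index counts from the end of the sequence.
last = values[-1]

# Bounds checking: an out-of-range index raises instead of touching stray memory.
try:
    values[3]
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True

print(last, out_of_range_raised)
```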

# Using Cython in Jupyter 

First load the extension; then any cell that starts with `%%cython` will be compiled.  You can add the `--annotate` flag to get the analysis of the code.

In [24]:
%load_ext Cython

In [25]:
%%cython

cdef int a = 0
for i in range(10):
    a += i
print(a)

45


In [27]:
%%cython --annotate

cdef int i,a = 0
for i in range(10):
    a += i
print(a)

45


## Data Objects 

We can also define data objects as `C`-level extension types using `cdef class`.  This speeds up interaction with them and any computing they do.  Remember that the full storage for the declared attributes is allocated when the object is created.

In [28]:
class Particle():
    
    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v
        
    def get_momentum(self):
        return self.mass * self.velocity

In [29]:
%%writefile particle_class.pyx

cdef class CParticle(object):
    
    cdef double mass, position, velocity
    def __init__(self, m, p, v):
        
        self.mass = m
        self.position = p
        self.velocity = v
        
    def get_momentum(self):
        return self.mass * self.velocity

Writing particle_class.pyx
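
A rough pure-Python analogy for the fixed attribute layout of a `cdef class` is a class with `__slots__`: the set of attributes is declared up front and no others can be added later. This is only an analogy, not what Cython generates, and `SlottedParticle` is a made-up name:

```python
class SlottedParticle:
    # Storage is reserved for exactly these attributes; there is no __dict__.
    __slots__ = ("mass", "position", "velocity")

    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v

    def get_momentum(self):
        return self.mass * self.velocity


sp = SlottedParticle(1.0, 2.2, 3.3)
print(sp.get_momentum())

# Adding an undeclared attribute fails, much as it would on a cdef class.
try:
    sp.charge = -1.0
    extra_allowed = True
except AttributeError:
    extra_allowed = False
print(extra_allowed)
```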


In [30]:
%%writefile setup.py
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize(["particle_class.pyx","cy_primes.pyx","python_primes_compiled.py","compute_cy.pyx","compute_cy_memview.pyx"], annotate=True)
)

Overwriting setup.py


In [31]:
!python setup.py build_ext --inplace

Compiling particle_class.pyx because it changed.
[1/1] Cythonizing particle_class.pyx
running build_ext
building 'particle_class' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.6m -c particle_class.c -o build/temp.macosx-10.7-x86_64-3.6/particle_class.o
gcc -bundle -undefined dynamic_lookup -L/anaconda3/lib -arch x86_64 -L/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/particle_class.o -o /Users/daniel.rupp/Desktop/Cython_Demo/particle_class.cpython-36m-darwin.so


In [32]:
from particle_class import CParticle

In [33]:
particle = Particle(1.0, 2.2, 3.3)
c_particle = CParticle(1.0, 2.2, 3.3)

In [34]:
particle.get_momentum()

3.3

In [35]:
c_particle.get_momentum()

3.3