<h1 align="center">Scientific Programming in Python</h1>
<h2 align="center">Topic 5: Accelerating Python with Cython: Writing C in Python</h2>


_Notebook created by Martín Villanueva - `martin.villanueva@usm.cl` - DI UTFSM - May 2017._

In [1]:
%matplotlib inline

import numpy as np
import numexpr as ne
import numba
import math
import random
import matplotlib.pyplot as plt
import scipy as sp
import sys

%load_ext Cython

## Table of Contents
* [1.- Cython Basic Usage](#cython)
* [2.- Advanced usage](#cython++)
* [3.- Pure C in Python](#C)


<div id='cython' />
## 1.- Cython Basic Usage

__Cython__ is both a __superset of Python__ and a __Python library__ that lets you combine C and Python in various ways. There are two main use cases:
1. Optimizing your Python code by statically compiling it to C.
2. Wrapping a C/C++ library in Python.

To get it working properly, you need Cython and a C compiler:
1. __Cython__: `conda install cython`
2. __C compiler__: Install the GNU C compiler with your package manager (Unix/Linux) or install Xcode (macOS).
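
A quick way to confirm both prerequisites from Python is sketched below. It uses only the standard library; the helper name `check_cython_prerequisites` and the compiler names it tries (`gcc`, `cc`, `clang`) are assumptions about a typical Unix-like setup, not part of Cython itself.

```python
import importlib.util
import shutil


def check_cython_prerequisites(compilers=("gcc", "cc", "clang")):
    """Report whether Cython is importable and which C compiler (if any) is on PATH."""
    # find_spec returns None when the package cannot be found.
    cython_found = importlib.util.find_spec("Cython") is not None

    # Keep the first compiler executable found on PATH (assumed candidate names).
    compiler_path = None
    for name in compilers:
        compiler_path = shutil.which(name)
        if compiler_path is not None:
            break

    return {"cython": cython_found, "compiler": compiler_path}


status = check_cython_prerequisites()
print("Cython importable:", status["cython"])
print("C compiler found:", status["compiler"])
```

If either entry comes back `False` or `None`, the `%%cython` magic used in this notebook will most likely fail at compile time.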

***

We will introduce basic Cython usage by implementing the [__Sieve of Eratosthenes__](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes), an algorithm that finds all prime numbers smaller than a given number.

In [8]:
def primes_python(n):
    primes = [False, False] + [True] * (n - 2)
    i= 2
    while i < n:
        # We do not deal with composite numbers.
        if not primes[i]:
            i += 1
            continue 
        k= i+i
        # We mark multiples of i as composite numbers.
        while k < n:
            primes[k] = False
            k += i 
        i += 1
    # We return all numbers marked with True.
    return [i for i in range(2, n) if primes[i]]

In [9]:
primes_python(20)

[2, 3, 5, 7, 11, 13, 17, 19]
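
Before optimizing, it helps to have an independent reference to compare against. The sketch below uses plain trial division rather than the sieve; the names `is_prime_trial_division` and `reference_primes` are hypothetical helpers introduced here for illustration only.

```python
def is_prime_trial_division(m):
    """Return True if m is prime, testing divisors up to sqrt(m)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def reference_primes(n):
    """Return all primes smaller than n (slow, but simple to verify by eye)."""
    return [m for m in range(2, n) if is_prime_trial_division(m)]


print(reference_primes(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

In the notebook, a check such as `assert primes_python(1000) == reference_primes(1000)` can be repeated for every optimized variant to make sure the speedups do not change the results.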

Let's evaluate the performance for the first version:

In [10]:
tp = %timeit -o primes_python(10000)

100 loops, best of 3: 5.11 ms per loop
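
The `-o` flag makes `%timeit` return a result object whose `.best` attribute is used later to compute speedups. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The helper `best_time` and the stand-in `workload` function are hypothetical names; in the notebook the function under test would be `primes_python`.

```python
import timeit


def best_time(func, *args, repeat=3, number=100):
    """Return the best per-call time in seconds over `repeat` runs of `number` calls."""
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # Each entry in `totals` is the time for `number` calls, so divide to get per-call time.
    return min(totals) / number


# Stand-in workload (assumption); replace with primes_python in the notebook.
def workload(n):
    return sum(range(n))


t = best_time(workload, 10000)
print("best: {0:.3e} s per call".format(t))
```

Taking the minimum over several repeats, as `%timeit` does, reduces the influence of other processes running on the machine.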


Now we write our first Cython version by simply adding the `%%cython` magic as the first line of the cell:

In [11]:
%%cython
def primes_cython1(n):
    primes = [False, False] + [True] * (n - 2)
    i= 2
    while i < n:
        # We do not deal with composite numbers.
        if not primes[i]:
            i += 1
            continue 
        k= i+i
        # We mark multiples of i as composite numbers.
        while k < n:
            primes[k] = False
            k += i 
        i += 1
    # We return all numbers marked with True.
    return [i for i in range(2, n) if primes[i]]

In [13]:
tc1 = %timeit -o primes_cython1(10000)

100 loops, best of 3: 2.61 ms per loop


__We achieve a roughly 2x speedup by doing (practically) nothing!__

When we add `%%cython` at the beginning of the cell, the code gets compiled by Cython into a C extension. Then, this extension is loaded, and the compiled function is readily available in the interactive namespace. 

Let's help the compiler by explicitly declaring the types of the variables with the __`cdef`__ keyword:

In [14]:
%%cython
def primes_cython2(int n):
    # Note the type declarations below
    cdef list primes = [False, False] + [True] * (n - 2)
    cdef int i = 2
    cdef int k = 0
    # The rest of the function is unchanged
    while i < n:
        # We do not deal with composite numbers.
        if not primes[i]:
            i += 1
            continue 
        k= i+i
        # We mark multiples of i as composite numbers.
        while k < n:
            primes[k] = False
            k += i 
        i += 1
    # We return all numbers marked with True.
    return [i for i in range(2, n) if primes[i]]

In [15]:
tc2 = %timeit -o primes_cython2(10000)

1000 loops, best of 3: 308 µs per loop


In [18]:
print("Cython version 1 speedup: {0}".format(tp.best/tc1.best))
print("Cython version 2 speedup: {0}".format(tp.best/tc2.best))

Cython version 1 speedup: 1.9537728803827772
Cython version 2 speedup: 16.57108263494032


__Takeaway__: _In general, Cython is most efficient when it can compile data structures and operations directly to C, __making as few CPython API calls as possible__. Specifying the types of the variables often leads to greater speed improvements._

Just out of curiosity, let's see the performance that Numba's JIT achieves:

In [22]:
@numba.jit(nopython=True)
def primes_numba(n):
    primes = [False, False] + [True] * (n - 2)
    i= 2
    while i < n:
        # We do not deal with composite numbers.
        if not primes[i]:
            i += 1
            continue 
        k= i+i
        # We mark multiples of i as composite numbers.
        while k < n:
            primes[k] = False
            k += i 
        i += 1
    # We return all numbers marked with True.
    res = []
    for i in range(2,n):
        if primes[i]: res.append(i)
    return res

In [42]:
tn = %timeit -o primes_numba(10000)

10000 loops, best of 3: 159 µs per loop


Numba wins this time! But __this is not the final form of Cython...__

### Inspecting Cython bottlenecks with annotations

We can inspect the C code generated by Cython with the `-a` argument. Let's inspect the code used above.

The non-optimized lines will be shown in a gradient of yellow (__white lines are faster, yellow lines are slower__), telling you which lines are the least efficiently compiled to C. By clicking on a line, you can see the generated C code corresponding to that line.

In [43]:
%%cython -a
def primes_cython1(n):
    primes = [False, False] + [True] * (n - 2)
    i= 2
    while i < n:
        # We do not deal with composite numbers.
        if not primes[i]:
            i += 1
            continue 
        k= i+i
        # We mark multiples of i as composite numbers.
        while k < n:
            primes[k] = False
            k += i 
        i += 1
    # We return all numbers marked with True.
    return [i for i in range(2, n) if primes[i]]

In [44]:
%%cython -a
def primes_cython2(int n):
    # Note the type declarations below
    cdef list primes = [False, False] + [True] * (n - 2)
    cdef int i = 2
    cdef int k = 0
    # The rest of the function is unchanged
    while i < n:
        # We do not deal with composite numbers.
        if not primes[i]:
            i += 1
            continue 
        k= i+i
        # We mark multiples of i as composite numbers.
        while k < n:
            primes[k] = False
            k += i 
        i += 1
    # We return all numbers marked with True.
    return [i for i in range(2, n) if primes[i]]

### Alternative usage of Cython: Outside the notebook

If you want to use Cython outside the notebook (the way it was originally meant to be used), you have to do the work of the magic yourself:
1. Write the function into a `.pyx` file.
2. __Cythonize it__ with `cython filename.pyx`, which generates the `filename.c` file.
3. Compile it with GCC into a shared library that Python can import (adjust the include path to match your own Python installation):

`gcc -shared -fPIC -fwrapv -O3 -fno-strict-aliasing -I/home/username/anaconda/include/python2.7 -o filename.so filename.c`
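
Because the include directory and the extension-module suffix depend on the Python installation, the compile command can be assembled programmatically instead of hard-coding a path. The sketch below is one possible approach using the standard `sysconfig` module; `build_command` is a hypothetical helper, and the flags are simply copied from the command above.

```python
import sysconfig


def build_command(module_name, compiler="gcc"):
    """Assemble a compile command for `module_name`.c using this interpreter's settings."""
    # Directory containing Python.h for the running interpreter.
    include_dir = sysconfig.get_paths()["include"]
    # Platform-specific suffix for extension modules; fall back to ".so" if undefined.
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return [
        compiler, "-shared", "-fPIC", "-fwrapv", "-O3", "-fno-strict-aliasing",
        "-I" + include_dir,
        "-o", module_name + ext_suffix,
        module_name + ".c",
    ]


print(" ".join(build_command("filename")))
```

The printed command can be pasted into a shell or passed as a list to `subprocess.run`. Naming the output with the interpreter's own suffix lets `import filename` find the compiled module.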

<div id='cython++' />
## 2.- Advanced usage


### Compiler directives

### NumPy Arrays

### Typed Memory Views

### Classes and methods

<div id='C' />
## 3.- Pure C in Python

### `def`, `cdef` and `cpdef`

### Function inlining
