# Extending Python 
This is a short notebook, based on lectures by Jake Vanderplas,
that I put together to talk about different approaches to leveraging C code in Python.

In [3]:
!ls /usr/lib

bundle1.o			   libkmod.a
charset.alias			   libkmodc++.a
cron				   libkrb4.dylib
crt1.10.5.o			   libkrb5.dylib
crt1.10.6.o			   libkrb524.dylib
crt1.o				   libkrb5support.dylib
dtrace				   libktrace.dylib
dyld				   libl.a
dylib1.10.5.o			   liblangid.dylib
dylib1.o			   liblapack.dylib
gcrt1.o				   liblber.dylib
groff				   libldap.dylib
lazydylib1.o			   libldap_r.dylib
libACSClient.dylib		   liblzma.5.dylib
libAccountPolicyTranslation.dylib  liblzma.dylib
libBSDPClient.A.dylib		   libm.dylib
libBSDPClient.dylib		   libmecab.1.0.0.dylib
libCRFSuite.dylib		   libmecab.dylib
libCRFSuite0.12.dylib		   libmecabra.dylib
libCoreStorage.dylib		   libmenu.5.4.dylib
libDHCPServer.A.dylib		   libmenu.dylib
libDHCPServer.dylib		   libmx.A.dylib
libDiagnosticMessagesClient.dylib  libmx.dylib
libIASAuthReboot.dylib		   libncurses.5.4.dylib
libMatch.1.dylib		   libncurses.5.dylib
libMatch.dylib			   libncurses.dylib
libOpenScriptingUtil.dylib	   libnetsnmp.15.1.2.dyl

In [4]:
from ctypes import CDLL
libc_name = 'libc.dylib'
libc = CDLL(libc_name)

In [5]:
libc.time

<_FuncPtr object at 0x105e49688>
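
The `_FuncPtr` object above is a callable handle to the C function. As a minimal sketch of actually calling it, one could declare its signature first. This assumes a POSIX system where `time_t` fits in a C `long`, and it uses `ctypes.util.find_library` instead of the hard-coded macOS name `libc.dylib`; neither choice comes from the original notebook.

```python
import ctypes
import ctypes.util
import time

# Look up the C standard library by its short name. If the lookup returns None,
# CDLL(None) exposes the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature: time_t time(time_t *tloc). Assumption: time_t is a C long here.
libc.time.argtypes = [ctypes.c_void_p]
libc.time.restype = ctypes.c_long

# Passing None sends a NULL pointer, so the result comes back only as the return value.
c_now = libc.time(None)
print(c_now, int(time.time()))
```

Declaring `argtypes` and `restype` matters because ctypes otherwise assumes every function returns a C `int`.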

In [6]:
%%file my_sum.c
#include <stdio.h>

// sum all the values in the array x
// x is a pointer to a memory block 
// of length n
int sum(int *x, int n)
{
  int i, counter;
  counter = 0;
  for(i=0; i<n; i++)
    {
      counter += x[i];
    }
  return counter;
}

Overwriting my_sum.c


In [7]:
%%bash 
gcc -c my_sum.c
gcc -shared -o my_sum.so my_sum.o

In [8]:
!ls my_sum.*

my_sum.c  my_sum.o  my_sum.so



Now we'll use the `ctypes` API to create Python objects that can be passed to this function:

In [9]:
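from ctypes import CDLL, POINTER, c_int
import numpy as np

# Load the library built above; the explicit ./ avoids relying on the loader's search path
my_sum = CDLL('./my_sum.so')

# Declare the C signature: int sum(int *x, int n)
my_sum.sum.argtypes = [POINTER(c_int), c_int]
my_sum.sum.restype = c_int

# dtype=np.int32 matches the C int that the function reads
a = np.arange(10, dtype=np.int32)
adata = a.ctypes.data_as(POINTER(c_int))
asize = a.size

my_sum.sum(adata, asize)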

45
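
To make the hand-off concrete: `a.ctypes.data_as(...)` does not copy the array, it hands C the address of NumPy's own buffer. The sketch below stays in Python (no compiled library needed) and uses a typed pointer, `POINTER(c_int32)`, so that the elements can be read back through it; the variable names are illustrative.

```python
import ctypes
import numpy as np

a = np.arange(10, dtype=np.int32)

# A typed pointer to the array's first element; no data is copied.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_int32))

# Walk the buffer the way the C loop in my_sum.c does: x[i] for i < n.
total = sum(ptr[i] for i in range(a.size))
print(total)  # 45

# A write through the pointer shows up in the NumPy array: both share memory.
ptr[0] = 100
print(a[0])  # 100
```

This is also why `dtype=np.int32` matters: the C function interprets the raw bytes as `int`, so an array with a different element width would be misread.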

In [10]:
a.sum()

45

## Cython
ctypes and F2Py let you wrap existing Fortran, C, and C++ code so that it can be imported into Python. Cython can do this as well, though we will not focus on that part of it here. Cython's main strength is that it lets you **convert** Python code and Python-like code into compiled C code, which can run many times faster than the original code.
Let's see a quick example. Here's a Python function which computes the Nth Fibonacci number:

In [11]:
def nth_fib(n):
    a, b = 0, 1
    for i in range(n):
        b, a = a + b, b
    return a

In [12]:
[nth_fib(i) for i in range(20)]

[0,
 1,
 1,
 2,
 3,
 5,
 8,
 13,
 21,
 34,
 55,
 89,
 144,
 233,
 377,
 610,
 987,
 1597,
 2584,
 4181]

In [13]:
%timeit nth_fib(10000)

100 loops, best of 3: 2.37 ms per loop



Now we'll take the exact same code and compile it with Cython. In general, this is done by saving the code to a file and running `cython` on the command line. You can read about that in the [documentation](http://cython.org). An example is included in this folder in the source code. Here we'll use IPython's Cython magic to streamline the process:

In [14]:
%load_ext cython

In [16]:
%%cython 
def nth_fib2(n):
    a, b = 0, 1
    for i in range(n):
        b, a = a + b, b
    return a

In [17]:
[nth_fib2(i) for i in range(10)]

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [18]:
%timeit nth_fib2(10000)

100 loops, best of 3: 2 ms per loop


Just compiling the code in Cython gave us a roughly 15% speedup (2.37 ms down to 2 ms).
But we can do better by adding **type** annotations.
The main reason Python is slow is that it has to do dynamic type checking each time it evaluates an expression. If we tell Cython what the types are from the beginning, this step can be skipped, and we get large time savings. We do this with the `cdef` statement. We also do the temporary assignment explicitly to remove the Python tuple assignment. Keep in mind that a C `int` has a fixed width, so unlike Python's arbitrary-precision integers, the typed version overflows for large `n`:

In [19]:
%%cython
def nth_fib3(int n):
    cdef int a = 0
    cdef int b = 1
    cdef int tmp
    for i in range(n):
        tmp = b
        b = a + b
        a = tmp 
    return a

In [20]:
[nth_fib3(i) for i in range(10)]

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [21]:
print("Python only:")
%timeit nth_fib(10000)
print("Bare Cython:")
%timeit nth_fib2(10000)
print("Typed Cython:")
%timeit nth_fib3(10000)

Python only:
100 loops, best of 3: 2.44 ms per loop
Bare Cython:
100 loops, best of 3: 2.37 ms per loop
Typed Cython:
100000 loops, best of 3: 3.14 µs per loop
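
One caveat when reading these timings: `nth_fib3` does its arithmetic in a fixed-width C `int`, while the Python versions use arbitrary-precision integers, so for `n = 10000` the typed function is timing a much cheaper computation whose result has overflowed. The sketch below, in plain Python, finds where a signed 32-bit integer stops being enough. It assumes a 32-bit C `int` and two's-complement wraparound (signed overflow is formally undefined in C), emulated here with `ctypes.c_int32`.

```python
import ctypes

def nth_fib(n):
    # Same recurrence as the notebook's nth_fib, using Python's unbounded ints.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

INT32_MAX = 2**31 - 1

# Smallest n whose Fibonacci number no longer fits in a signed 32-bit int.
first_overflow = next(n for n in range(100) if nth_fib(n) > INT32_MAX)
print(first_overflow)           # 47
print(nth_fib(first_overflow))  # 2971215073

# ctypes integer types do no overflow checking, so this keeps only the low
# 32 bits and reinterprets them as signed.
wrapped = ctypes.c_int32(nth_fib(first_overflow)).value
print(wrapped)                  # -1323752223
```

If exact large values are needed, the variables would have to stay Python objects, which gives up most of the speedup.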


## Using Cython with NumPy
Cython provides a really nice interface to NumPy arrays via the [Typed Memoryview](http://docs.cython.org/src/userguide/memoryviews.html) syntax. Let's write a variant of the Fibonacci function above that stores the first N values in a NumPy array, and then speed it up with Cython.
First we'll write it in pure Python, then simply compile the same code with Cython:

In [23]:

import numpy as np

def fib_py(N):
    x = np.zeros(N, dtype=float)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i - 1] + x[i - 2]
    return x

In [24]:
%%cython
import numpy as np
def fib_cy(N):
    x = np.zeros(N, dtype=float)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i - 1] + x[i - 2]
    return x

In [25]:
print(np.allclose(fib_py(10000), fib_cy(10000)))
%timeit fib_py(10000)
%timeit fib_cy(10000)

True
100 loops, best of 3: 4.36 ms per loop

100 loops, best of 3: 2.97 ms per loop


Again, a modest improvement (4.36 ms down to 2.97 ms). Let's add some type information:

In [27]:
%%cython
import numpy as np
from numpy cimport float_t

def fib_cy2(int N):
    cdef int i
    cdef float_t[::1] x = np.zeros(N, dtype=float)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i - 1] + x[i - 2]
    return x

In [30]:
%timeit fib_cy(10000)


100 loops, best of 3: 2.92 ms per loop




In [29]:
%timeit fib_cy2(10000)

10000 loops, best of 3: 33.6 µs per loop


Wow! Simply adding some type information for the Cython compiler made our code nearly two orders of magnitude faster (2.92 ms down to 33.6 µs)! This is because the array can now be treated as a **contiguous memory block** rather than a Python object. This makes each indexing operation much more efficient, because it no longer has to go through the Python interface layer.
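
What "contiguous" means can be checked from plain NumPy. The `[::1]` in `float_t[::1]` declares a C-contiguous one-dimensional buffer, and to my understanding Cython raises an error for arrays that do not satisfy it rather than copying them. The sketch below (NumPy only, names illustrative) shows how a strided slice differs from a packed array and how `np.ascontiguousarray` produces one that qualifies. It also shows `np.asarray`, which can wrap a buffer object back into an `ndarray`; a built-in `memoryview` stands in for the typed memoryview that `fib_cy2` returns.

```python
import numpy as np

x = np.zeros(10, dtype=np.float64)
strided = x[::2]  # every other element: a view that steps 16 bytes at a time

print(x.flags['C_CONTIGUOUS'], x.strides)              # True (8,)
print(strided.flags['C_CONTIGUOUS'], strided.strides)  # False (16,)

# Copy the strided view into a freshly packed buffer.
packed = np.ascontiguousarray(strided)
print(packed.flags['C_CONTIGUOUS'], packed.strides)    # True (8,)

# Stand-in for a returned typed memoryview: a built-in memoryview over x.
view = memoryview(x)
back = np.asarray(view)  # wraps the same buffer as an ndarray
print(type(back).__name__, np.shares_memory(back, x))  # ndarray True
```

Wrapping the result of `fib_cy2` in `np.asarray` in the same way would give callers a regular array with the usual NumPy methods.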