# Compiling Python with `numba` and `cython`

Reproduce the Python function from lecture and measure its execution time:

In [1]:
def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

%time loop(2, 10**6)

CPU times: user 67.2 ms, sys: 606 µs, total: 67.8 ms
Wall time: 65.6 ms


inf
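A side note on the `inf` result: repeatedly multiplying by 2.5 exceeds the largest representable double long before the loop finishes, so most of the iterations multiply `inf` by 2.5. The sketch below (not part of the original notebook) counts how many multiplications it takes, starting from the float `2.0` rather than the integer `2` used above.

```python
import math

# Start from the same value as the benchmark, as a float
x = 2.0
steps = 0

# Keep multiplying until the value overflows to infinity
while not math.isinf(x):
    x *= 2.5
    steps += 1

print(f"x overflowed to inf after {steps} multiplications")
```

The timings below are still valid as a comparison between implementations, since each one performs the same million multiplications.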

## Using `numba`

First, let's try compiling "Just in Time" using `numba`:

In [1]:
from numba import jit

# jit compiles the function the first time it is called
# nopython=True compiles without the Python interpreter and raises an error if that is not possible
@jit(nopython=True)
def loop_jit(x, r):
    for i in range(r):
        x *= 2.5
    return x

%time loop_jit(2, 10**6) # includes compilation time

CPU times: user 519 ms, sys: 1.5 s, total: 2.02 s
Wall time: 271 ms


inf

In [3]:
%time loop_jit(2, 10**6) # much faster after compilation

CPU times: user 931 µs, sys: 1.63 ms, total: 2.56 ms
Wall time: 2.57 ms


inf

In [4]:
%timeit loop(2, 10**6) # better to time across multiple runs using `timeit`

41.8 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [5]:
%timeit loop_jit(2, 10**6)

1.61 ms ± 96.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
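Outside IPython, the `%timeit` magic is not available, but the standard-library `timeit` module gives similar repeated measurements. The following is a minimal sketch, assuming a plain script; the smaller iteration count (`10**5`) and the `repeat`/`number` values are arbitrary choices to keep the run short, not settings taken from the notebook.

```python
import timeit

def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

# 5 measurement runs, each timing 10 consecutive calls
NUMBER = 10
runs = timeit.repeat(lambda: loop(2, 10**5), repeat=5, number=NUMBER)

# Convert each run's total time (seconds) to milliseconds per call
per_call_ms = [1000 * t / NUMBER for t in runs]
print(f"best of {len(runs)} runs: {min(per_call_ms):.3f} ms per call")
```

Reporting the minimum rather than the mean is a common convention for `timeit.repeat`, since slower runs usually reflect interference from other processes.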


We might want to compile our code ahead of time, though, so that we can see a speed-up the first time we use it. `numba` allows us to compile ahead of time like so:


In [2]:
from numba.pycc import CC

# name of compiled module to create:
cc = CC('test_aot')

# exported function name and explicit type signature: float32(float32, int32), i.e. 4-byte floats and ints
@cc.export('loop_aot', 'f4(f4,i4)')
def loop_aot(x, r):
    for i in range(r):
        x *= 2.5
    return x

cc.compile()

Note that we now have a compiled shared-object file (`.so`) in our current directory. This is a compiled extension module that contains our function.

In [3]:
ls

1W_python_compilation.ipynb  test_aot.cpython-39-x86_64-linux-gnu.so*


To use our function, we just need to import our pre-compiled module, as we would any other Python module:

In [4]:
import test_aot
%time test_aot.loop_aot(2, 10**6) # first time running it is fast this time

CPU times: user 3.12 ms, sys: 0 ns, total: 3.12 ms
Wall time: 3.13 ms


inf

In [5]:
%timeit test_aot.loop_aot(2, 10**6) # same overall performance as before

1.37 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
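To put the numbers so far in perspective, the speed-ups can be computed directly from the mean `%timeit` values reported above. This is just arithmetic on those three measurements; the dictionary keys are labels chosen here for readability.

```python
# Mean per-call timings reported by %timeit above, in milliseconds
timings_ms = {
    "loop (plain Python)": 41.8,
    "loop_jit (numba JIT)": 1.61,
    "loop_aot (numba AOT)": 1.37,
}

# Speed-up of each version relative to the plain Python loop
baseline = timings_ms["loop (plain Python)"]
speedups = {name: baseline / t for name, t in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

Both compiled versions come out roughly 25 to 30 times faster than the interpreted loop. The gap between the JIT and AOT figures is small relative to their reported standard deviations, so it should not be read as a real difference.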


## Using `cython`

Another common (albeit slightly uglier) way to compile Python code is to add explicit static types with `cython`. Here we use the IPython `cython` extension to compile the cell:

In [10]:
%load_ext cython

In [11]:
%%cython
# the cell magic must be the first line of the cell
# Cython translates the code below to C, then compiles it to machine code

# explicitly add static types to function itself:
def loop_cython(float x, int r):
    cdef int i
    for i in range(r):
        x *= 2.5
    return x

In [None]:
# for running on local machine:

import cython

In [13]:
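# for running on local machine
# (pure Python mode: the annotations below only take effect once Cython
# compiles this code; run as-is, it is ordinary interpreted Python)

# explicitly add static types to function itself: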
def loop_cython(x: cython.float, r: cython.int):
    i: cython.int
    for i in range(r):
        x *= 2.5
    return x

In [14]:
%timeit loop_cython(2, 10**6) # not compiled here, so this matches plain Python (~40 ms), not numba

40.2 ms ± 5.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
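The 40.2 ms above is close to the 41.8 ms measured for the plain `loop`, which suggests the annotated function ran as ordinary interpreted Python. Cython's pure Python mode exposes a `cython.compiled` flag that can confirm this. The sketch below assumes the `cython` package may or may not be installed; the `ImportError` fallback is an addition here so the snippet runs either way.

```python
try:
    import cython
    # False when this code runs in the interpreter,
    # True only inside a module that Cython has compiled
    compiled = cython.compiled
except ImportError:
    # Assumption for this sketch: without cython installed,
    # nothing can have been compiled by it
    compiled = False

print("running as compiled Cython code:", compiled)
```

Run in a regular notebook cell or script, this prints `False`. To get a real speed-up, the annotated function has to be built by Cython, for example in a `%%cython` cell or with `cythonize`.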
