<a href="https://colab.research.google.com/drive/1xp1hk0gnvFQgVD5Avvi7DqoMMlLAJptu?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Compiling Python with `numba` and `cython`

Reproduce the Python function from lecture and measure its execution time:

In [5]:
def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

%time loop(2, 10**6)

CPU times: total: 93.8 ms
Wall time: 83.8 ms


inf
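The `inf` result is expected rather than a bug: repeatedly multiplying by 2.5 overflows a 64-bit float long before the loop finishes, and float multiplication in Python then yields `inf` instead of raising an error. A minimal sketch that counts how many multiplications this takes (the helper name `steps_until_inf` is made up for illustration and is not part of the lecture code):

```python
import math
import sys

def steps_until_inf(x, factor=2.5, limit=10**6):
    """Return how many multiplications by `factor` turn x into inf (or None)."""
    x = float(x)
    for i in range(1, limit + 1):
        x *= factor
        if math.isinf(x):
            return i
    return None

print(sys.float_info.max)   # largest finite 64-bit float
print(steps_until_inf(2))   # far fewer than the 10**6 iterations used above
```

The timings in this notebook are still meaningful, since the loop keeps running (and multiplying `inf`) for all `r` iterations.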

## Using `numba`

First, let's try compiling "Just in Time" using `numba`:

In [None]:
from numba import jit

# `jit` compiles the function the first time it is called.
# `nopython=True` compiles the whole function to machine code that runs
# without the Python interpreter (and raises an error if that is not possible).

# The decorator must be applied to every function we want compiled;
# the rest of the code is left untouched.
@jit(nopython=True)
def loop_jit(x, r):
    for i in range(r):
        x *= 2.5
    return x

%time loop_jit(2, 10**6) # includes compilation time

# The first call includes a one-time compilation cost. The larger that cost,
# the more you gain from compiling before the function is first needed.
# There is no fixed rule for when it is worth it; measure it.

CPU times: total: 109 ms
Wall time: 107 ms


inf

In [None]:
%time loop_jit(2, 10**6) # much faster after compilation

CPU times: user 1.47 ms, sys: 62 µs, total: 1.53 ms
Wall time: 1.54 ms


inf
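The gap between the two `%time` cells above is the one-time compilation cost. A rough sketch of how to measure that gap for any callable without IPython magics, using only the standard library (the helper `first_vs_later` is a hypothetical name introduced here; the demonstration uses the plain Python `loop` so that it runs without `numba` installed):

```python
import time

def first_vs_later(fn, *args, repeats=5):
    """Time the first call separately from the fastest of `repeats` later calls."""
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        later.append(time.perf_counter() - start)
    return first, min(later)

def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

# Plain Python has no compile step, so both numbers should be similar.
# Passing a freshly defined @jit function instead should show a large gap.
first, later = first_vs_later(loop, 2, 10**5)
print(f"first call: {first * 1e3:.2f} ms, later calls: {later * 1e3:.2f} ms")
```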

In [7]:
# `%timeit` averages over multiple runs, which is more reliable than a single `%time`
%timeit loop(3, 10**6)

93.7 ms ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [8]:
%timeit loop_jit(3, 10**6)

2.02 ms ± 150 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
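`%timeit` only exists inside IPython. In a plain script, the standard-library `timeit` module offers the same idea of repeating the measurement. The sketch below mirrors the "7 runs, 10 loops each" layout of the output above; the smaller `r` and the choice of `statistics.pstdev` are assumptions made to keep the example quick, not a claim about how IPython computes its summary:

```python
import statistics
import timeit

def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

# 7 runs of 10 calls each (r is reduced to 10**5 only to keep the sketch quick).
number = 10
totals = timeit.repeat(lambda: loop(3, 10**5), number=number, repeat=7)
per_call = [t / number for t in totals]

mean_ms = statistics.mean(per_call) * 1e3
std_ms = statistics.pstdev(per_call) * 1e3
print(f"{mean_ms:.2f} ms ± {std_ms:.2f} ms per loop")
```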


We might want to compile our code ahead of time, though, so that we can see a speed-up the first time we use it. `numba` allows us to compile ahead of time like so:


In [1]:
from numba.pycc import CC

# name of compiled module to create:
cc = CC('test_aot')

# name of the exported function and its explicit signature:
# returns a 4-byte (32-bit) float; takes a 32-bit float and a 32-bit int
@cc.export('loop_aot', 'f4(f4,i4)')
def loop_aot(x, r):
    for i in range(r):
        x *= 2.5
    return x

cc.compile()

Note that we now have a compiled extension file in our current directory (`.so` on Linux/macOS, `.pyd` on Windows, as in the listing below). This is a compiled module that contains our function.

In [3]:
ls

 Volume in drive C has no label.
 Volume Serial Number is FEE0-2E68

 Directory of c:\Users\HP\Downloads\UChicago\1. Courses\3. Spring Quarter 2025\Large-scale computing\course-materials\in-class-activities\01_Introduction

25/03/2025  02:53 p. m.    <DIR>          .
25/03/2025  12:37 p. m.    <DIR>          ..
25/03/2025  02:53 p. m.            27,350 1M_python_compilation.ipynb
25/03/2025  02:53 p. m.            40,960 test_aot.cp313-win_amd64.pyd
               2 File(s)         68,310 bytes
               2 Dir(s)  98,227,331,072 bytes free
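The output of `ls` depends on the operating system and shell. As a platform-independent alternative, the following sketch looks for the compiled module using the extension suffixes that the running interpreter can import. The helper name `find_compiled_modules` is hypothetical and introduced only for illustration:

```python
import importlib.machinery
from pathlib import Path

def find_compiled_modules(name, directory="."):
    """List files in `directory` that look like a compiled extension for `name`."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(name + ".") and p.name.endswith(suffixes)
    )

print(importlib.machinery.EXTENSION_SUFFIXES)  # suffixes this interpreter can import
print(find_compiled_modules("test_aot"))       # empty list if nothing was compiled here
```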


To use our function, we just need to import our pre-compiled module, as we would any other Python module:

In [4]:
import test_aot
%time test_aot.loop_aot(2, 10**6) # first time running it is fast this time

CPU times: total: 0 ns
Wall time: 1.1 ms


inf

In [5]:
%timeit test_aot.loop_aot(2, 10**6) # same overall performance as before

1.01 ms ± 3.37 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


## Using `cython`

Another common way to compile Python code (albeit with slightly uglier syntax) is to add explicit static types with `cython`. Here we use the IPython `cython` extension to compile the cell:

In [6]:
%load_ext cython

In [7]:
%%cython

# explicitly add static C types to the function arguments and the loop counter:
def loop_cython(float x, int r):
    cdef int i
    for i in range(r):
        x *= 2.5
    return x

Content of stdout:
_cython_magic_f1d8097bee103e88fa9bf836eb9513f3b915de8e.c
   Creating library C:\Users\HP\.ipython\cython\_cython_magic_f1d8097bee103e88fa9bf836eb9513f3b915de8e.cp313-win_amd64.lib and object C:\Users\HP\.ipython\cython\_cython_magic_f1d8097bee103e88fa9bf836eb9513f3b915de8e.cp313-win_amd64.exp
Generating code
Finished generating code
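In Cython, `float x` and `int r` are C types: `float` is a single-precision C float and `int` is a fixed-width C int, unlike Python's `float` (a C double) and unbounded `int`. This is comparable to the widths requested by `f4` and `i4` in the `numba` signature earlier. The sketch below uses only the standard library to show those sizes on the current platform and what rounding to 32 bits does to a value (`as_float32` is a helper name invented for this example):

```python
import ctypes
import struct

# Sizes in bytes of the C types behind `float x` and `int r`
print(ctypes.sizeof(ctypes.c_float))   # 4 (single precision)
print(ctypes.sizeof(ctypes.c_double))  # 8 (what a Python float uses)
print(ctypes.sizeof(ctypes.c_int))     # platform-dependent, commonly 4

def as_float32(value):
    """Round a Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

print(as_float32(2.5))   # 2.5 is exactly representable, so nothing is lost
print(as_float32(0.1))   # 0.1 is not, so the result differs slightly from 0.1
```

For this loop the reduced precision does not matter, but it is worth keeping in mind when the static types are applied to numerically sensitive code.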

In [8]:
%timeit loop_cython(2, 10**6) # comparable performance to numba

3.53 ms ± 15 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
