## The Numba Project

> Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. 

[Numba](http://numba.pydata.org/) compiles (snippets of) Python code to optimized machine code using the [LLVM](https://llvm.org/) compiler infrastructure, and can thus decrease the execution time compared to standard Python code or even `numpy` routines.
It runs on CPUs but can also exploit GPU hardware acceleration, and it offers advanced features such as vectorization and automatic parallelization:
* using the `vectorize()` decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.
* the "parallel" option attempts to automatically parallelize and perform other optimizations on (part of) a function. At the moment, this feature only works on CPUs.

In [None]:
# generate two large vectors
import numpy as np
l = 10000
x = np.random.randn(l)
y = np.random.randn(l)
x

First, we take the standard approach with a simple `for` loop:

In [None]:
def compute_mean_distance(x, y):
    # computes the mean element-wise difference x - y of two equal-length vectors
    l = len(x)
    s = 0
    for idx in range(l):
        s = s + (x[idx]-y[idx])
    return s / l
print(compute_mean_distance(x, y))

This is, of course, very slow, because every loop iteration is executed by the Python interpreter:

In [None]:
%%timeit
compute_mean_distance(x, y)

Next, we will use `numpy`'s vectorized array operations (the element-wise subtraction is a "universal function", or `ufunc`):

In [None]:
def compute_mean_distance_numpy(x, y):
    return (x-y).mean()
compute_mean_distance_numpy(x, y)

In [None]:
%%timeit
compute_mean_distance_numpy(x, y)

This is much faster because the loop runs in compiled code inside `numpy` rather than in the Python interpreter.
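
`%%timeit` is an IPython cell magic and is only available in notebooks and IPython shells. In a plain Python script, a comparable measurement can be made with the standard library's `timeit` module. The sketch below redefines both functions so that it is self-contained; the call count `n` and the repeat count are arbitrary choices.

```python
import timeit
import numpy as np

def compute_mean_distance(x, y):
    # pure-Python loop version
    s = 0.0
    for idx in range(len(x)):
        s += x[idx] - y[idx]
    return s / len(x)

def compute_mean_distance_numpy(x, y):
    # vectorized numpy version
    return (x - y).mean()

x = np.random.randn(10000)
y = np.random.randn(10000)

# Both versions should agree before their speeds are compared.
assert np.isclose(compute_mean_distance(x, y), compute_mean_distance_numpy(x, y))

n = 20  # arbitrary number of calls per measurement
# Take the best of 3 repeats and divide by n to get the time per call.
t_loop = min(timeit.repeat(lambda: compute_mean_distance(x, y), number=n, repeat=3)) / n
t_numpy = min(timeit.repeat(lambda: compute_mean_distance_numpy(x, y), number=n, repeat=3)) / n
print(f"loop:  {t_loop * 1e6:.1f} us per call")
print(f"numpy: {t_numpy * 1e6:.1f} us per call")
```

The absolute numbers depend on the machine, so only the ratio between the two is meaningful.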

Next, we will bring Numba into the game. (Numba needs to be installed first, as it is not part of the Python standard library. Here, we import it as `nb`.)

In [None]:
import numba as nb
# Note that we apply this on our handmade, slow for-loop implementation.
# jit stands for "just in time [compilation]".
compute_mean_distance_numba = nb.jit(compute_mean_distance, nopython=True)
# Numba compiles lazily on the first call; by calling the function once here, the timing below reuses the already compiled code
print(compute_mean_distance_numba(x, y))

We are instructing Numba here to run in "nopython mode", meaning that it will compile the function so that it runs entirely without the involvement of the Python interpreter. This is the recommended way to use Numba, as it leads to the best performance.

Let us now see whether this is any faster:

In [None]:
%%timeit
compute_mean_distance_numba(x, y)

This is now even faster than the `numpy` version! 

However, running in "nopython mode" will not always work: Numba cannot be used to compile arbitrary Python code, as it supports only a (large) subset of the Python language and of `numpy`. Its main use is thus to speed up numerical algorithms that involve long loops, with very little additional effort. Note that we did not even have to rewrite our code or do an additional compilation step.

Combining `numpy` and `numba` does not bring such a large boost, as the `numpy` code already runs in compiled code rather than in the Python interpreter:

In [None]:
compute_mean_distance_numby = nb.jit(compute_mean_distance_numpy, nopython=True)
print(compute_mean_distance_numby(x, y))

In [None]:
%%timeit
compute_mean_distance_numby(x, y)

In addition, there is some internal overhead from the translation and compilation steps that (invisible to us) are done in the background.