# Numba
Numba translates your user-defined Python functions into very fast machine code for whatever CPU you are running on. Its speed shows up in repeated calls to the same function: Numba compiles the function to machine code the first time it is called and reuses the compiled version on every later call. This works best for functions that do mostly scalar and array operations and do not rely on specialized Python libraries and objects. The more precisely Numba can determine the types and dimensions of the inputs and outputs, the faster the compiled code can be.

In [1]:
from timeit import default_timer as timer
import numpy as np
from numba import jit
from sys import getsizeof

Define a random 2-D array:

In [2]:
n = 10**3
x = np.random.random((n,n))
print('Size of array x (MB): ',round(x.nbytes/1024/1024,2)) # nbytes counts the array data itself

Size of array x (MB):  7.63


Define an arbitrary function that does some computation and looping:

In [3]:
def function(x):
    y = np.zeros(x.shape)
    for i in range(0,x.shape[0]):
        for j in range(0,x.shape[1]):
            y[j,i] = x[i,j] + x[j,i] - x[i,j]*x[j,i] # arbitrary symmetrization w/ Hadamard product
    return y

Time one evaluation of the function:

In [4]:
s = timer()
y = function(x)
e = timer()
normal_tfe = e - s
print('Time of 1 function evaluation without Numba (seconds): ',normal_tfe)

Time of 1 function evaluation without Numba (seconds):  0.8829193459999998


Then redefine the same function, this time decorated with Numba's just-in-time compiler:

In [5]:
@jit(nopython=True)
def function(x):
    y = np.zeros(x.shape)
    for i in range(0,x.shape[0]):
        for j in range(0,x.shape[1]):
            y[j,i] = x[i,j] + x[j,i] - x[i,j]*x[j,i] # arbitrary symmetrization w/ Hadamard product
    return y

In [6]:
s = timer()
y = function(x)
e = timer()
print('Time of 1st function evaluation with Numba (seconds): ',e - s)

Time of 1st function evaluation with Numba (seconds):  0.24062856900000007


The first call is already faster than pure Python, even though it includes the one-time cost of compiling the function to machine code. Later calls skip compilation and are faster still. Let's measure how much faster with a function that calls the compiled function repeatedly until the elapsed time reaches the time of a single pure-Python evaluation.

In [7]:
def loop_numba(x,normal_tfe):
    i = 0 # number of completed evaluations
    s = timer() # start timer
    e = s # elapsed time (e - s) starts at zero
    while e-s < normal_tfe:
        y = function(x)
        e = timer()
        i = i + 1
    return i


Compare the function evaluation times:

In [8]:
count_nfes_numba = loop_numba(x,normal_tfe)
print('Numba-compiled function ran {} times in the amount of time it took to run once in Python.'.format(count_nfes_numba))

Numba-compiled function ran 119 times in the amount of time it took to run once in Python.


But of course, for this operation, we could just use NumPy:

In [9]:
def loop_numpy(x,normal_tfe):
    i = 0 # number of completed evaluations
    s = timer() # start timer
    e = s # elapsed time (e - s) starts at zero
    while e-s < normal_tfe:
        y = x + np.transpose(x) - np.multiply(x,np.transpose(x)) # same as what we coded in for-loops above
        e = timer()
        i = i + 1
    return i

In [10]:
count_nfes_numpy = loop_numpy(x,normal_tfe)
print('Numpy-vectorized function ran {} times in the amount of time it took to run once in Python.'.format(count_nfes_numpy))

Numpy-vectorized function ran 118 times in the amount of time it took to run once in Python.
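
The timing comparison only makes sense if the loop and the vectorized expression compute the same thing, so a quick check on a small array is worthwhile. This is a sketch with hypothetical helper names (`loop_version`, `numpy_version`); it uses plain Python and NumPy only.

```python
import numpy as np

def loop_version(x):
    # Same double loop as the function defined earlier in this notebook
    y = np.zeros(x.shape)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            y[j, i] = x[i, j] + x[j, i] - x[i, j] * x[j, i]
    return y

def numpy_version(x):
    # Vectorized form: x.T is the transpose, * is the element-wise product
    return x + x.T - x * x.T

x_small = np.random.random((20, 20))
same = np.allclose(loop_version(x_small), numpy_version(x_small))
print('Loop and NumPy versions agree:', same)
```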


However, the same approach extends to more complex user-defined functions that are written mostly in NumPy, like a watershed model. An application can be sped up with Numba as long as its functions:
1. Accept and work mostly with scalars and arrays (not DataFrames or other objects/classes)
2. Do mostly looping and array operations
3. Avoid dictionaries (in general), tuples (in operations), and sets, since Numba's support for these containers is limited
4. Avoid Python libraries other than NumPy (Numba supports a large subset of NumPy; its SciPy support in compiled code is much more limited)
5. Are called many times
6. Return scalars or arrays, not objects

This comes at the cost of losing the flexibility of working with various Python objects and syntactical tricks, but most of the common things like if-else, logical comparisons, Python math, and lists/list comprehensions are still usable.

Numba also has some support for automatic parallelization and kernelization of functions and loops. This means Numba can split the work across CPU cores and choose how the operations are distributed, without explicit instructions from you. This is an advantage over other approaches, such as the mpi4py or multiprocessing libraries, where we explicitly define how to parallelize code. Numba can also compile your functions to run on GPUs! But the basic functionality is in the @jit decorator.
