# CLASSIFICATION: UNCLASSIFIED
# UNDER CONSTRUCTION

Numba uses just-in-time (JIT) compilation to greatly speed up functions written in Python. It works especially well on functions that are heavy on array and mathematical operations, where its performance is said to be comparable to that of C, C++ and Fortran.

We will first look at a simple example: filtering a 2-dimensional array with explicit loops.

Suppose we have an image and want to apply a filter to it.

In [1]:
import numpy as np
image = np.arange(256*256).reshape(256,256)
filt = np.arange(50*50).reshape(50,50)

In [2]:
%%time 
def filter2d(image, filt):
    M, N = image.shape
    Mf, Nf = filt.shape
    Mf2 = Mf // 2
    Nf2 = Nf // 2
    result = np.zeros_like(image)
    
    for i in range(Mf2, M - Mf2):
        for j in range(Nf2, N - Nf2):
            num = 0.0
            for ii in range(Mf):
                for jj in range(Nf):
                    num += (filt[Mf-1-ii,Nf-1-jj]*image[i-Mf2+ii,j-Nf2+jj])
            result[i,j] = num
    return result
    

filter2d(image, filt)

CPU times: user 1min 46s, sys: 0 ns, total: 1min 46s
Wall time: 1min 46s


Now we define the same function, except that we add the jit decorator.

In [3]:
from numba import jit

In [4]:
%%time 

@jit
def filter2d_jit(image, filt):
    M, N = image.shape
    Mf, Nf = filt.shape
    Mf2 = Mf // 2
    Nf2 = Nf // 2
    result = np.zeros_like(image)
    
    for i in range(Mf2, M - Mf2):
        for j in range(Nf2, N - Nf2):
            num = 0.0
            for ii in range(Mf):
                for jj in range(Nf):
                    num += (filt[Mf-1-ii,Nf-1-jj]*image[i-Mf2+ii,j-Nf2+jj])
            result[i,j] = num
    return result


CPU times: user 24.9 ms, sys: 14.1 ms, total: 39 ms
Wall time: 39.3 ms


In [6]:
%%time 

filter2d_jit(image, filt);

CPU times: user 280 ms, sys: 10 µs, total: 280 ms
Wall time: 279 ms


array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

On a moderately sized image, this is nearly 400 times faster (279 ms versus 1 min 46 s).

Numba can also be used to vectorize functions, that is, to turn a function written for a single scalar into one that operates elementwise on whole arrays. We will consider the function $$f(x) = \frac{\sin(\pi x)}{\pi x}$$ which is defined to be 1 when $x=0$. Note that in this example you could use np.sinc or define the function with NumPy functions. However, there are many cases where the function you need does not exist in NumPy.
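As a quick check of the remark about np.sinc, the following sketch compares it with a scalar implementation of the formula above. The helper name sinc_scalar and the sample points are illustrative assumptions.

```python
import math

import numpy as np


def sinc_scalar(x):
    """Scalar version of f(x) = sin(pi x) / (pi x), with f(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(x * math.pi) / (x * math.pi)


# Nine evenly spaced points from -2 to 2; the middle one is exactly 0.
x = np.linspace(-2.0, 2.0, 9)
reference = np.array([sinc_scalar(v) for v in x])

# np.sinc is the normalized sinc, so it should agree with the formula.
print(np.allclose(np.sinc(x), reference))
```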

In [29]:
from numba import vectorize
import math

Here is the unvectorized version, which we must apply to each element of our vector one at a time.

In [30]:
def sinc_slow(x):
    if x == 0:
        return 1.0
    return math.sin(x*math.pi) / (x*math.pi)

In [31]:
%%time
np.array([sinc_slow(x) for x in np.random.random(10000)])

CPU times: user 16.2 ms, sys: 6 µs, total: 16.2 ms
Wall time: 14.8 ms


array([ 0.31738475,  0.69071155,  0.0101941 , ...,  0.03863917,
        0.98279504,  0.54119015])

Now, we vectorize it using the vectorize decorator. We also include a signature.

float32(float32) means that the function accepts a float32 argument and returns a float32: the type before the parentheses is the return type, and multiple argument types can be listed inside the parentheses, separated by commas. See the Numba documentation for a full list of types and their short forms.

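Passing a list of signatures makes Numba compile one version of the function per signature, and the dtype of the input selects which one runs. The signatures should be ordered from most specific to least specific (e.g. int8 before the wider ints, then float32 before float64); otherwise type-based dispatch may not work as expected.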

In [33]:
@vectorize(['float32(float32)','float64(float64)'])
def sinc(x):
    if x == 0:
        return 1.0
    return math.sin(x*math.pi) / (x*math.pi)

In [34]:
%%time
sinc(np.random.random(10000))

CPU times: user 880 µs, sys: 1.01 ms, total: 1.89 ms
Wall time: 3.23 ms


array([ 0.31922822,  0.99091863,  0.7181361 , ...,  0.05047162,
        0.52803751,  0.17092169])

Even for a simple function, this is about five times faster.