Numba is a just-in-time (JIT) compiler for Python code, focused on NumPy arrays and scientific Python. I've seen various tutorials around the web and at conferences, but I have yet to see someone use Numba "in the wild". In the past few months, I've been using Numba in my own code, and I recently released my first real package using Numba, [skan](). The short version is that Numba is *amazing* and you should strongly consider it to speed up your scientific Python bottlenecks. Read on for the longer version.

## Part 1: some toy examples

Let me illustrate what Numba is good for with the most basic example: adding two arrays together. You've probably seen similar examples around the web.

We start by defining a pure Python function for iterating over a pair of arrays and adding them:

In [1]:
import numpy as np


def addarr(x, y):
    result = np.zeros_like(x)
    for i in range(x.size):
        result[i] = x[i] + y[i]
    return result

How long does this take in pure Python?

In [2]:
n = int(1e6)
a = np.random.rand(n)
b = np.random.rand(n)

In [3]:
%timeit -r 1 -n 1 addarr(a, b)

1 loop, best of 1: 566 ms per loop


About half a second on my machine. Let's try with Numba using its JIT decorator:

In [4]:
import numba

addarr_nb = numba.jit(addarr)

In [5]:
%timeit -r 1 -n 1 addarr_nb(a, b)

1 loop, best of 1: 396 ms per loop


The first time it runs, it's only a tiny bit faster. That's because of the nature of JITs: they only compile code *as it is being run*, so that they can use the type information of the objects passed into the function. The first call therefore includes the compilation time as well as the run time. (Note that, in Python, the arguments `a` and `b` to `addarr` could be anything: an array, as expected, but also a list, a tuple, even a `Banana`, if you've defined such a class, and the meaning of the function body is different for each of those types.)
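To make the "compile on first call, per argument type" idea concrete, here is a toy model in pure Python. It is only a sketch of the concept, not how Numba is implemented: `ToyJit`, `specializations`, and `compile_count` are hypothetical names invented for this illustration, and the "compilation" step is a counter rather than real code generation.

```python
class ToyJit:
    """Toy model of a JIT dispatcher (hypothetical; not Numba's implementation).

    It keeps one "compiled" version of a function per tuple of argument types.
    """

    def __init__(self, func):
        self.func = func
        self.specializations = {}  # argument types -> "compiled" callable
        self.compile_count = 0     # how many times we "compiled"

    def _compile(self, arg_types):
        # A real JIT would generate machine code specialised for arg_types
        # here, which is the slow part of the first call. This toy only
        # counts the event and reuses the original Python function.
        self.compile_count += 1
        return self.func

    def __call__(self, *args):
        arg_types = tuple(type(arg) for arg in args)
        if arg_types not in self.specializations:
            # First call with these types: pay the compilation cost once.
            self.specializations[arg_types] = self._compile(arg_types)
        return self.specializations[arg_types](*args)


def add(x, y):
    return x + y


toy_add = ToyJit(add)
float_sum = toy_add(1.0, 2.0)   # first (float, float) call: "compiles"
float_sum2 = toy_add(3.0, 4.0)  # same types: reuses the cached version
list_sum = toy_add([1], [2])    # (list, list): a second "compilation"
```

After these three calls, `toy_add.compile_count` is 2: once for floats and once for lists. The same pattern explains the timings in this post, where the first call with a new argument type is slow and later calls with that type are fast.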

Let's see what happens the next time we run it:

In [6]:
%timeit -r 1 -n 1 addarr_nb(a, b)

1 loop, best of 1: 5.71 ms per loop


Whoa! Now the code takes under 6 ms, about 100 times faster than the pure Python version. And the NumPy equivalent?

In [7]:
%timeit -r 1 -n 1 a + b

1 loop, best of 1: 4.92 ms per loop


Only marginally faster than Numba, even though NumPy addition is implemented in highly optimised C code. And, for some data types, Numba even beats NumPy!

In [8]:
r = np.random.randint(0, 128, size=n).astype(np.uint8)
s = np.random.randint(0, 128, size=n).astype(np.uint8)

In [9]:
%timeit -r 1 -n 1 r + s

1 loop, best of 1: 3.96 ms per loop


In [10]:
%timeit -r 1 -n 1 addarr_nb(r, s)

1 loop, best of 1: 201 ms per loop


In [12]:
%timeit -r 1 -n 1 addarr_nb(r, s)

1 loop, best of 1: 261 µs per loop


WOW! For smaller data types, Numba beats NumPy by about 15x (261 µs versus 3.96 ms)!

I'm only speculating, but since my clock speed is about 1 GHz (I'm writing this on a base MacBook with a 1.1 GHz Core m processor), a million additions in 261 µs works out to more than three additions per clock cycle. I therefore suspect that Numba is taking advantage of some [SIMD]() capabilities of the processor, whereas NumPy is treating each array element as an individual arithmetic operation. (If any Numba or NumPy devs are reading this and have more concrete implementation details that explain this, please share them in the comments!)
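The back-of-the-envelope arithmetic behind that suspicion can be written out explicitly. This is a rough sketch using the single-run timings reported above, and it assumes the processor stays at its nominal 1.1 GHz for the whole measurement, which may not hold if the clock speed changes under load.

```python
# Timings reported above for one million uint8 additions.
n_elements = 1_000_000
numba_seconds = 261e-6   # second call to addarr_nb(r, s)
numpy_seconds = 3.96e-3  # r + s

# Assumption: the CPU runs at its nominal 1.1 GHz throughout.
clock_hz = 1.1e9

numba_per_cycle = n_elements / numba_seconds / clock_hz
numpy_per_cycle = n_elements / numpy_seconds / clock_hz
speedup = numpy_seconds / numba_seconds

print(f"Numba: {numba_per_cycle:.2f} additions per cycle")  # 3.48
print(f"NumPy: {numpy_per_cycle:.2f} additions per cycle")  # 0.23
print(f"Speedup: {speedup:.1f}x")                           # 15.2x
```

Several additions per cycle, each of which also needs two loads and a store, is difficult to explain with one-element-at-a-time instructions, which is consistent with vectorised code. Single timing runs are noisy, though, so this is suggestive rather than conclusive.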

So hopefully I've got your attention now. For years, NumPy has been the go-to library for performance Python in scientific computing, and now Numba generally matches that for arbitrary code and sometimes beats it handily!

In this context, I decided to use Numba to do something a little less trivial, as part of my research.