## Introduction to numba

- Accelerate pure Python code
- JIT compiler
- Easy to use
- Supports some parallelization (YMMV)
- Ability to write GPU code (from Python)
- https://numba.pydata.org
- Cross platform


## How does it work?

- Analyzes your code
- Generates low level machine code
- Uses LLVM (the same compiler backend used by Clang, Rust, Julia and Swift)


## Installation

- Use `conda` or `pip`
- Should work on most OSs


## Features

- Can happily use numpy code
- Broadcasting and numpy-style indexing
- Code built on pure Python data structures (lists, dicts, arbitrary objects) generally will not be faster
- Nor will code that calls generic Python modules such as pandas
- Much easier to write than native GPU code for GPU execution

<br/>

- Ideally suited for numerical computation


## Simple example

- First write a small kernel in plain Python and numpy
- Then compare with the numba-compiled versions


In [1]:
import numpy as np
import numba

In [2]:
def vaxpb(y, x, a, b):
    y[:] = a*x + np.sin(b)

In [3]:
def axpb(y, x, a, b):
    for i in range(y.shape[0]):
        y[i] = a[i]*x[i] + np.sin(b[i])

## Performance with numpy


In [4]:
def make_data(n):
    x = np.linspace(0, 2*np.pi, n)
    a, b = np.random.random((2, n))
    y = np.zeros_like(x)
    return y, x, a, b

```python
y, x, a, b = make_data(100)
```

In [5]:
y, x, a, b = make_data(1000)

In [6]:
%timeit vaxpb(y, x, a, b)

10.7 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [7]:
%timeit axpb(y, x, a, b)

1.07 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


## With numba


In [8]:
@numba.njit
def nvaxpb(y, x, a, b):
    y[:] = a*x + np.sin(b)

In [9]:
def dumb_dec(f):
    print("Haha I got the function 2")
    def _shadow_f(x):
        print("I am called every time!")
        return f(x)
    return _shadow_f

In [10]:
@dumb_dec
def g(x):
    return x + 1

Haha I got the function 2


In [11]:
g(1)

I am called every time!


2

A decorator is just a function applied to another function, so the `@numba.njit` form is the same as:

In [12]:
nvaxpb = numba.njit(vaxpb)

In [13]:
naxpb = numba.njit(axpb)

In [14]:
%timeit nvaxpb(y, x, a, b)

19.2 µs ± 9.64 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [15]:
%timeit naxpb(y, x, a, b)

11.2 µs ± 610 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


## Some details

- `numba.njit` == `numba.jit(nopython=True)`
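- What is nopython? The function is compiled entirely to machine code and never goes through the Python interpreter; code that numba cannot type raises an error
- Avoid the alternative (object mode, which falls back to the interpreter and is slow)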


## Parallel computing

- This has been somewhat experimental


In [16]:
from numba import prange
@numba.njit(parallel=True)
def paxpb(y, x, a, b):
    for i in prange(y.shape[0]):
        y[i] = a[i]*x[i] + np.sin(b[i])


- Doesn't work for me!

In [17]:
y, x, a, b = make_data(1000000)

In [18]:
%timeit paxpb(y, x, a, b)

3.74 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]:
@numba.njit(parallel=True)
def pvaxpb(y, x, a, b):
    y[:] = a*x + np.sin(b)

- Works and is very fast.

In [20]:
%timeit pvaxpb(y, x, a, b)

4.51 ms ± 2.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [21]:
@numba.vectorize
def junk(x):
    if x > 0:
        return np.sin(x)
    else:
        return np.cos(x)

In [22]:
x = np.linspace(-1, 1, 100000)

In [23]:
%timeit junk(x)

1.78 ms ± 650 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


## More options

- `@vectorize` - compile a scalar function into a numpy ufunc

- `@jitclass` - for jitted classes

- Many more: see documentation: https://numba.pydata.org

- Possible to get excellent performance with Python
- Use the right tools
