# PyHEP Numba tutorial on February 3, 2021

## Simple stuff

Even if you've used Numba before, let's start with the basics.

In fact, let's start with NumPy itself.

### NumPy

In [1]:
import numpy as np

NumPy accelerates Python code by replacing loops in Python's virtual machine (with type-checks at runtime) with precompiled loops that transform arrays into arrays.

In [2]:
data = np.arange(1000000)
data

array([     0,      1,      2, ..., 999997, 999998, 999999])

In [3]:
%%timeit

output = np.empty(len(data))

for i, x in enumerate(data):
    output[i] = x**2

346 ms ± 20.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [4]:
%%timeit

output = data**2

695 µs ± 8.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


But if you have to compute a complex formula, NumPy would have to _make an array for each intermediate step_.

(There are tricks for circumventing this, but we won't get into that.)

In [5]:
energy = np.random.normal(100, 10, 1000000)
px = np.random.normal(0, 10, 1000000)
py = np.random.normal(0, 10, 1000000)
pz = np.random.normal(0, 10, 1000000)

In [6]:
%%timeit

mass = np.sqrt(energy**2 - px**2 - py**2 - pz**2)

6.56 ms ± 498 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The above is equivalent to

In [7]:
%%timeit

tmp1 = energy**2
tmp2 = px**2
tmp3 = py**2
tmp4 = pz**2
tmp5 = tmp1 - tmp2
tmp6 = tmp5 - tmp3
tmp7 = tmp6 - tmp4
mass = np.sqrt(tmp7)

8.02 ms ± 403 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Numba

(I always mistype "numba" as "numpy"...)

In [8]:
import numba as nb

Numba lets us compile a function to compute a whole formula in one step.

```python
@nb.jit
```

means "JIT-compile" and

```python
@nb.njit
```

means "really JIT-compile": it's shorthand for `@nb.jit(nopython=True)`. The original has a fallback ("object mode") that's getting deprecated, and if we're using Numba at all, we don't want it to fall back to ordinary Python.

In [9]:
@nb.njit
def compute_mass(energy, px, py, pz):
    mass = np.empty(len(energy))
    for i in range(len(energy)):
        mass[i] = np.sqrt(energy[i]**2 - px[i]**2 - py[i]**2 - pz[i]**2)
    return mass

The `compute_mass` function is now "ready to be compiled." Numba compiles it the first time we call it, because only then does it know the argument types it needs to propagate through the function.

In [10]:
compute_mass(energy, px, py, pz)

array([102.52544721,  99.6333991 , 109.71246135, ...,  86.76466324,
        96.45253913,  99.89977107])

In [11]:
%%timeit

mass = compute_mass(energy, px, py, pz)

2.7 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


_(Note to self: show `fastmath` and `parallel`.)_

### Numba performance mistakes

What's wrong with the following?

In [12]:
@nb.njit
def compute_mass_i(energy_i, px_i, py_i, pz_i):
    return np.sqrt(energy_i**2 - px_i**2 - py_i**2 - pz_i**2)

compute_mass_i(energy[0], px[0], py[0], pz[0])

102.52544721062226

In [13]:
%%timeit

mass = np.empty(len(energy))
for i in range(len(energy)):
    mass[i] = compute_mass_i(energy[i], px[i], py[i], pz[i])

789 ms ± 24.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


What do you think about this one?

In [14]:
@nb.njit
def compute_mass_arrays(energy, px, py, pz):
    return np.sqrt(energy**2 - px**2 - py**2 - pz**2)

compute_mass_arrays(energy, px, py, pz)

array([102.52544721,  99.6333991 , 109.71246135, ...,  86.76466324,
        96.45253913,  99.89977107])

In [15]:
%%timeit

mass = compute_mass_arrays(energy, px, py, pz)

2.35 ms ± 180 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


_(Note to self: show `nb.vectorize`.)_

### Dynamically typed programs that are not statically typed

Much of Python's flexibility comes from the fact that it does not need to know the types of all variables before it starts to run.

That dynamism makes it easier to express complex logic (what I call "bookkeeping"), but it is a hurdle for speed. Dynamic typing was the first thing to go in Didier Verna's _How to make LISP go faster than C_.

In [16]:
def perfectly_valid_script(data):
    output = np.empty(len(data))
    for i, group in enumerate(data):
        minimum = "nothing yet"
        for x in group:
            if minimum == "nothing yet":
                minimum = x
            elif x < minimum:
                minimum = x
        output[i] = minimum
    return output

In [17]:
data = np.random.normal(2, 1, (100000, 3))
data

array([[3.15127999, 0.39407033, 2.40792158],
       [2.4719327 , 1.951657  , 1.19690607],
       [1.50390663, 2.6622091 , 1.33609662],
       ...,
       [1.64232371, 1.97757591, 3.66571519],
       [0.45069564, 1.97901781, 0.73951216],
       [4.11026688, 2.44254492, 3.55893654]])

In [18]:
perfectly_valid_script(data)

array([0.39407033, 1.19690607, 1.33609662, ..., 1.64232371, 0.45069564,
       2.44254492])

In [19]:
invalid_for_numba = nb.njit(perfectly_valid_script)

In [20]:
invalid_for_numba(data)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function lt>) found for signature:

 >>> lt(float64, Literal[str](nothing yet))

There are 22 candidate implementations:
  - Of which 20 did not match due to:
  Overload of function 'lt': File: <numerous>: Line N/A.
    With argument(s): '(float64, unicode_type)':
   No match.
  - Of which 2 did not match due to:
  Operator Overload in function 'lt': File: unknown: Line unknown.
    With argument(s): '(float64, unicode_type)':
   No match for registered cases:
    * (bool, bool) -> bool
    * (int8, int8) -> bool
    * (int16, int16) -> bool
    * (int32, int32) -> bool
    * (int64, int64) -> bool
    * (uint8, uint8) -> bool
    * (uint16, uint16) -> bool
    * (uint32, uint32) -> bool
    * (uint64, uint64) -> bool
    * (float32, float32) -> bool
    * (float64, float64) -> bool

During: typing of intrinsic-call at <ipython-input-16-17299725d0ad> (8)

File "<ipython-input-16-17299725d0ad>", line 8:
def perfectly_valid_script(data):
    <source elided>
                minimum = x
            elif x < minimum:
            ^


How can we fix it up?

### What is "type unification"?

Consider the following:

In [21]:
def another_valid_script(data):
    output = np.empty(len(data))
    for i, group in enumerate(data):
        total = np.sum(group)
        if total < 0:
            return "we don't like negative sums"
        else:
            output[i] = total
    return output

In [22]:
another_valid_script(data[:10])

array([5.95327189, 5.62049577, 5.50221235, 5.64634391, 4.34770178,
       6.03129462, 3.79146602, 6.91879592, 9.54459418, 4.55619239])

In [23]:
invalid_for_numba = nb.njit(another_valid_script)

In [24]:
invalid_for_numba(data)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Can't unify return type from the following types: Literal[str](we don't like negative sums), array(float64, 1d, C)
Return of: IR name '$54return_value.2', type 'Literal[str](we don't like negative sums)', location:
File "<ipython-input-21-dade312fbc5a>", line 6:
def another_valid_script(data):
    <source elided>
        if total < 0:
            return "we don't like negative sums"
            ^
Return of: IR name '$68return_value.1', type 'array(float64, 1d, C)', location:
File "<ipython-input-21-dade312fbc5a>", line 9:
def another_valid_script(data):
    <source elided>
            output[i] = total
    return output
    ^

Does it matter whether we limit `data` to the first 10 elements?

How can we fix it up?

### Avoid lists and dicts

Although Numba developers are doing a lot of work on supporting Python `list` and `dict` (with identically typed contents), I find it too easy to run into unsupported cases. The main problem is that their contents _must_ be typed, but the Python language had no way to express those types.

(Yes, Python has type annotations now, but they're not fully integrated into Numba yet.)

Let's start with something that works and make small changes to it.

In [25]:
@nb.njit
def output_a_list(data):
    output = []
    for group in data:
        total = 0.0
        for x in group:
            total += x
        output.append(total)
    return output

In [26]:
data = np.random.normal(2, 1, (10, 3))
data

array([[ 2.74829774e+00,  7.79848689e-01,  2.04009583e+00],
       [-3.71287798e-01,  2.81971845e+00,  2.21857585e+00],
       [ 1.46419160e+00,  2.39163555e+00,  2.14499998e+00],
       [ 3.59957855e+00,  6.12961221e-01,  3.44959895e+00],
       [ 1.98402123e+00, -1.43813068e-03,  9.88503922e-01],
       [ 4.07944930e+00,  1.87490606e+00,  2.04257216e+00],
       [ 8.69135777e-01,  3.23197955e+00,  1.35330466e+00],
       [-3.46567874e-01,  2.91559542e+00,  2.61019679e+00],
       [ 1.48838143e+00,  3.11515648e+00,  1.53754639e+00],
       [ 2.04065633e+00,  1.11114511e+00,  2.16449657e+00]])

In [27]:
output_a_list(data)

[5.5682422577706365,
 4.667006503512932,
 6.000827124822459,
 7.662138716429409,
 2.9710870225622283,
 7.996927518540405,
 5.454419989342256,
 5.179224328118463,
 6.1410842988971,
 5.316298015049456]

### Closures versus arguments

In Python, you can create functions at runtime, and these functions can reference data defined outside of the function.

In [28]:
accumulate = nb.typed.List([])

def yet_another_valid_script(data):
    for group in data:
        total = 0.0
        for x in group:
            total += x
        accumulate.append(total)

In [29]:
accumulate

ListType[Undefined]([])

In [30]:
yet_another_valid_script(data)

In [31]:
accumulate

ListType[float64]([5.5682422577706365, 4.667006503512932, 6.000827124822459, 7.662138716429409, 2.9710870225622283, 7.996927518540405, 5.454419989342256, 5.179224328118463, 6.1410842988971, 5.316298015049456])

In [32]:
try_it_in_numba = nb.njit(yet_another_valid_script)

In [33]:
try_it_in_numba(data)

TypingError: Failed in nopython mode pipeline (step: ensure IR is legal prior to lowering)
The use of a ListType[float64] type, assigned to variable 'accumulate' in globals, is not supported as globals are considered compile-time constants and there is no known way to compile a ListType[float64] type as a constant.

File "<ipython-input-28-725ca9e5f929>", line 8:
def yet_another_valid_script(data):
    <source elided>
            total += x
        accumulate.append(total)
        ^

In [None]:
accumulate