# Numba

[Numba](https://numba.pydata.org/numba-doc/dev/user/overview.html) is a compiler for Python array and numerical functions that lets you speed up your applications with high-performance functions written directly in Python.

Numba generates optimized machine code from pure Python code using the LLVM compiler infrastructure. With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time optimized to performance similar to C, C++ and Fortran, without having to switch languages or Python interpreters.

Numba’s main features are:

* on-the-fly code generation (at import time or runtime, at the user’s preference)

* native code generation for the CPU (default) and GPU hardware

* integration with the Python scientific software stack (thanks to NumPy)

## Compiling Python code with `@jit`

### Lazy compilation
The recommended way to use the `@jit` decorator is to let Numba decide when and how to optimize:

```python
from numba import jit

@jit
def sum(x, y):
    return x + y

%timeit sum(2,2)
```

```text
210 ns ± 3.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```


In this mode, compilation is deferred until the first function execution. Numba infers the argument types at call time and generates optimized code based on this information. Numba can also compile separate specializations depending on the input types. For example, calling the `sum()` function above with integer or complex numbers will generate different code paths.

### Eager compilation

You can also tell Numba the function signature you are expecting. The function `sum()` would now look like:

```python
@jit('int8(int8,int8)')
def sum(x, y):
    return x + y

%timeit sum(2,2)
```

```text
202 ns ± 0.169 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```


In this case, the corresponding specialization will be compiled by the `@jit` decorator, and no other specialization will be allowed. This is useful if you want fine-grained control over types chosen by the compiler (for example, to use single-precision floats).

### Signature specifications

Explicit `@jit` signatures can use a number of types. Here are some common ones:

- `void` is the return type of functions returning nothing (which actually return None when called from Python)
- `intp` and `uintp` are pointer-sized integers (signed and unsigned, respectively)
- `intc` and `uintc` are equivalent to C int and unsigned int integer types
- `int8`, `uint8`, `int16`, `uint16`, `int32`, `uint32`, `int64`, `uint64` are fixed-width integers of the corresponding bit width (signed and unsigned)
- `float32` and `float64` are single- and double-precision floating-point numbers, respectively
- `complex64` and `complex128` are single- and double-precision complex numbers, respectively
- array types can be specified by indexing any numeric type, e.g. `float32[:]` for a one-dimensional single-precision array or `int8[:,:]` for a two-dimensional array of 8-bit integers.

### Compilation options

A number of keyword-only arguments can be passed to the `@jit` decorator.

#### nopython

Numba has two compilation modes: `nopython` mode and `object` mode. The former produces much faster code, but has limitations that, in Numba releases before 0.59, can force Numba to fall back to the latter. To prevent Numba from falling back and raise an error instead, pass `nopython=True` (the default since Numba 0.59).

```python
@jit(nopython=True)
def sum(x, y):
    return x + y

%timeit sum(2,2)
```

```text
190 ns ± 0.257 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```


#### nogil

Whenever Numba optimizes Python code to native code that only works on native types and variables (rather than Python objects), it is no longer necessary to hold Python’s global interpreter lock (GIL). Numba will release the GIL when entering such a compiled function if you pass `nogil=True`. This is not possible if the function is compiled in `object` mode.

```python
@jit(nogil=True)
def sum(x, y):
    return x + y

%timeit sum(2,2)
```

```text
249 ns ± 0.259 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```


#### cache
To avoid paying the compilation cost each time you run a Python program, you can instruct Numba to write the result of function compilation into a file-based cache. This is done by passing `cache=True`:

```python
@jit(cache=True)
def sum(x, y):
    return x + y

%timeit sum(2,2)
```

```text
191 ns ± 0.563 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```


#### parallel

The `parallel` option enables automatic parallelization (and related optimizations) for those operations in the function known to have parallel semantics. For a list of supported operations, see "Automatic parallelization with `@jit`" in the Numba documentation. This feature is enabled by passing `parallel=True` and must be used in conjunction with `nopython=True`:

```python
@jit(nopython=True, parallel=True)
def sum(x, y):
    return x + y
```


## The explicit matrix multiplication example, now with Numba!

```python
from numba import jit
import numpy as np

# what kind of decorator will you put here?
def explicit_matmul(A, B):
    # A[m][n]
    # B[n][p]
    # C[m][p]
    # The result has one row per row of A and one column per column of B.
    C_temp = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):  # (i=1...m) rows in A
        for j in range(B.shape[1]):  # (j=1...p) columns in B
            for k in range(A.shape[1]):  # (k=1...n) columns in A
                C_temp[i, j] += A[i, k] * B[k, j]
    return C_temp

AX = AY = BX = BY = 5
print("Multiplying 2 matrices of shape (" + str(AX) + "," + str(AY) + ")")

A = np.random.rand(AX, AY)
B = np.random.rand(BX, BY)

%timeit explicit_matmul(A, B)

%timeit np.matmul(A, B)
```

```text
Multiplying 2 matrices of shape (5,5)
122 µs ± 95.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.21 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
