# `numba` Tutorial

REF: https://numba.pydata.org/numba-doc/latest/user/index.html 

In [None]:
# uncomment the below line to install numba
#!conda install numba --yes

In [1]:
import numpy as np
from numba import jit

## Numba's `@jit`

Let's start with some simple Python code that does not use `numba`.

In [None]:
x = np.arange(10_000).reshape(100, 100)

def some_func(a):
    value = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            value += np.tanh(a[i,j])
    return value

print(some_func(x)) 


In [None]:
%timeit some_func(x)

Now, let's rewrite it with `numba`.

In [None]:
@jit(nopython=True)
def numba_func(a):
    value = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            value += np.tanh(a[i,j])
    return value

print(numba_func(x)) 

In [None]:
%timeit numba_func(x)

The Numba `@jit` decorator operates in two compilation modes: `nopython` mode and `object` mode. In the `numba_func` example above, `nopython=True` is set in the `@jit` decorator, which instructs Numba to operate in `nopython` mode. In this mode, Numba compiles the decorated function so that it runs entirely without the involvement of the Python interpreter. This is the recommended way to use the Numba `@jit` decorator, as it leads to the best performance.

If compilation in `nopython` mode fails, Numba can compile using `object` mode instead. This is the fallback mode for the `@jit` decorator when `nopython=True` is not set. In this mode, Numba identifies the loops that it can compile and compiles them into functions that run in machine code, and it runs the rest of the code in the interpreter. For best performance, avoid this mode! (The automatic fallback applies to older Numba releases; since version 0.59, `@jit` defaults to `nopython` mode and object mode has to be requested explicitly with `forceobj=True`.)

## Exercise 1: N-body

Consider `N=100_000` particles randomly distributed in a 3D Cartesian domain with `-5 < x/y/z < 5`. Use nested `for` loops to calculate the gravitational acceleration ($a = [a_x, a_y, a_z]$) of each particle. Assume that every particle has mass `m=1` and that the gravitational constant is `G=1`.

The gravitational force between two particles of masses $M_1$ and $M_2$ separated by a distance $r$ is

$\begin{equation}
F=- \frac{GM_1M_2}{r^2},
\end{equation}$

or, in component form,

$\begin{equation}
F_{21,x/y/z} = - \frac{GM_1M_2}{r^3} r_{21,x/y/z},
\end{equation}$

where $F_{21}$ is the force on particle 2 (caused by particle 1), 
$r_{21} = r_2 - r_1$, and $r = |r_{21}|$.
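As a quick sanity check of the component formula, here is a small worked example for just two particles. The positions are arbitrary values chosen for illustration (a 3-4-5 triangle, so that $r = 5$); they are not part of the exercise.

```python
import numpy as np

G = 1.0
m1 = m2 = 1.0

r1 = np.array([0.0, 0.0, 0.0])  # position of particle 1
r2 = np.array([3.0, 4.0, 0.0])  # position of particle 2

r21 = r2 - r1                # separation vector pointing from particle 1 to particle 2
r = np.linalg.norm(r21)      # distance between the particles

F21 = -G * m1 * m2 / r**3 * r21  # force on particle 2, pointing back toward particle 1
a2 = F21 / m2                    # acceleration of particle 2

print("r   =", r)
print("F21 =", F21)
print("a2  =", a2)
```

The magnitude of `F21` equals $GM_1M_2/r^2$, and the minus sign makes the force point from particle 2 toward particle 1, i.e. the force is attractive.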

1. Write a pure `python`/`numpy` version with nested `for` loops.
2. Modify the above code with `numba`'s `@jit`.
3. Compare the performance of the two versions. 

Note: this exercise can also be done purely with `numpy` functions (without `for` loops).
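As a rough sketch of the loop-free approach mentioned in the note, the pairwise separations can be built with NumPy broadcasting. The function name `accel_broadcast` is hypothetical, and the sketch deliberately uses a small `N`: the intermediate `(N, N, 3)` array of `float64` values would need roughly 240 GB for `N=100_000`, so the full problem would have to be processed in chunks.

```python
import numpy as np


def accel_broadcast(pos, G=1.0, m=1.0):
    """Acceleration of every particle due to all others (equal masses m)."""
    pos = np.asarray(pos, dtype=np.float64)
    # r[i, j] = pos[i] - pos[j]: separation vector pointing from j to i
    r = pos[:, None, :] - pos[None, :, :]
    dist2 = np.sum(r * r, axis=-1)
    # a particle exerts no force on itself: an infinite distance gives zero weight
    np.fill_diagonal(dist2, np.inf)
    inv_r3 = dist2 ** -1.5
    # a_i = -G * m * sum_j r_ij / |r_ij|^3
    return -G * m * np.sum(r * inv_r3[:, :, None], axis=1)


rng = np.random.default_rng(0)
pos = rng.uniform(-5.0, 5.0, size=(200, 3))  # small N on purpose
acc = accel_broadcast(pos)
print(acc.shape)
```

Since all masses are equal, the accelerations should sum to (numerically) zero, which is a convenient check for the loop-based versions as well.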

In [None]:
# TODO:













## `@vectorize`

When we discussed `ufunc`s earlier, we saw a Numba example that used the `@vectorize` decorator.

In [None]:
from numba import vectorize, float64

In [None]:
# prepare the input lists and arrays
N = 1_000_000
py_list1 = [x for x in range(N)]
py_list2 = [2.0*x for x in range(N)]
np_list1 = np.arange(N)
np_list2 = 2.0 * np.arange(N)

In [None]:
def python_add(x,y):
    res = []
    for v1,v2 in zip(x,y):
        res.append(v1+v2)
    return res

list3 = python_add(py_list1,py_list2)
print(list3[:10])  # show only the first few of the 1_000_000 entries

In [None]:
%timeit python_add(py_list1,py_list2)

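NumPy's `np.vectorize` (provided mainly for convenience; it is essentially a Python `for` loop, so do not expect a speed-up):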

In [None]:
def np_add(x,y):
    return x+y

numpy_add = np.vectorize(np_add)

In [None]:
%timeit numpy_add(np_list1,np_list2)

Numba's version

In [None]:
@vectorize([float64(float64, float64)])
def numba_add(x,y):
    return x+y

In [None]:
%timeit list3 = numba_add(np_list1,np_list2)

## Automatic parallelization with `@jit`

In [None]:
from numba import njit, prange, set_num_threads

First, we should make the problem size slightly bigger.

In [None]:
x = np.arange(250_000).reshape(500, 500)

In [None]:
# the Numba version without parallelization
%timeit numba_func(x)

In [None]:
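from numba import config
num_threads = min(8, config.NUMBA_NUM_THREADS)  # asking for more than the maximum raises an error
set_num_threads(num_threads)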

In [None]:
@njit(parallel=True)
def parallel_numba_func(a):
    value = 0.0
    for i in prange(a.shape[0]):
        for j in range(a.shape[1]):  # only the outermost prange is parallelized
            value += np.tanh(a[i,j])
    return value

print(parallel_numba_func(x)) 

In [None]:
%timeit parallel_numba_func(x)

## Exercise 2: N-body

Redo Exercise 1 with `njit` and `prange`.

In [None]:
#TODO:











