## Parallel Distributed Processing with `numba` in `Python3`
- `njit` is shorthand for `jit(nopython=True)`, i.e. a `nopython jit`
- 8-node cluster, 4 cores per node

#### James Gaboardi

In [1]:
import numpy as np
import math
from numba import njit, vectorize
import multiprocessing as mp
cores = mp.cpu_count()
print('Using up to', cores, 'cores.')

Using up to 32 cores.


### Create 2 `numpy.array` objects

In [2]:
array_dimensions = 15000
np.random.seed(352)
thing_1 = np.random.random((array_dimensions,
                            array_dimensions))
np.random.seed(850)
thing_2 = np.random.random((array_dimensions,
                            array_dimensions))

### Define a function without using `numpy` or the `njit` decorator

In [3]:
def no_njit(a, b):
    return math.sin(a**2) * math.exp(b)

### This will not work because `math.sin` and `math.exp` accept only scalars, not arrays:  `TypeError: only length-1 arrays can be converted to Python scalars`

In [4]:
%timeit -o no_njit(thing_1, thing_2)

TypeError: only length-1 arrays can be converted to Python scalars
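
A slow, pure-Python workaround is to apply the scalar function one element at a time. The following is a minimal sketch on small arrays, not part of the original benchmark; the helper name `elementwise_no_njit` and the array size are assumptions made for illustration. It checks the loop against the equivalent `numpy` expression:

```python
import math
import numpy as np

def no_njit(a, b):
    # Scalar-only: math.sin and math.exp reject multi-element arrays.
    return math.sin(a**2) * math.exp(b)

def elementwise_no_njit(a, b):
    # Hypothetical helper: loop over every element pair in Python.
    out = np.empty_like(a)
    for idx in np.ndindex(a.shape):
        out[idx] = no_njit(a[idx], b[idx])
    return out

rng = np.random.RandomState(352)
small_1 = rng.random_sample((4, 4))
small_2 = rng.random_sample((4, 4))

result = elementwise_no_njit(small_1, small_2)
expected = np.sin(small_1**2) * np.exp(small_2)
print(np.allclose(result, expected))
```

A Python-level loop like this would be far too slow for the 15000 x 15000 arrays used here, which is the motivation for compiling the function instead.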

### So decorate the function with `njit` and replace the `math` calls with their `numpy` equivalents, which operate on whole arrays, for a 'faster' implementation.

In [5]:
@njit
def with_njit(a, b):
    # Named with_njit so the imported njit decorator is not shadowed.
    return np.sin(a**2) * np.exp(b)

In [6]:
%timeit -o with_njit(thing_1, thing_2)

1 loop, best of 3: 10.9 s per loop


<TimeitResult : 1 loop, best of 3: 10.9 s per loop>

### To take advantage of all the available cores, vectorize the `no_njit` function: declare the return type and the two argument types passed to `no_njit`, then set the keyword argument `target='parallel'`.

In [7]:
vect_no_njit = vectorize('float64(float64, float64)', target='parallel')(no_njit)

### In this example we are utilizing 32 cores, so it is significantly faster than the `njit`-compiled function above: roughly 16 times faster (658 ms versus 10.9 s).

In [8]:
%timeit -o vect_no_njit(thing_1, thing_2)

1 loop, best of 3: 658 ms per loop


<TimeitResult : 1 loop, best of 3: 658 ms per loop>

-----------------