# Numba

In [38]:
import random

import numpy as np
from numba import jit, njit, prange, vectorize

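Let's define a function that estimates $\pi$ with a Monte Carlo method: draw random points in the unit square and count the fraction that falls inside the quarter circle of radius 1.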

In [39]:
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
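
Why `4.0 * acc / nsamples` approximates $\pi$ can be sketched as follows, assuming `random.random()` yields independent samples that are uniform on $[0, 1)$:

$$
P\left(x^2 + y^2 < 1\right) = \frac{\text{area of the quarter disc}}{\text{area of the unit square}} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{\text{acc}}{\text{nsamples}}
$$

Under the same assumption, the standard error of the estimate is $\sqrt{\pi (4 - \pi) / N} \approx 1.64 / \sqrt{N}$ for $N$ samples, so with $N = 10^7$ the result is typically accurate to about three decimal places.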

In [3]:
%%time
print(monte_carlo_pi(10000000))

3.14072
CPU times: user 971 ms, sys: 70 μs, total: 971 ms
Wall time: 970 ms


## `@jit` decorator 

The `@jit` decorator compiles a Python function to machine code the first time it is called, which speeds up subsequent calls.

In [21]:
@jit
def monte_carlo_pi_jit(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

First execution (includes compilation of the function)

In [22]:
%%time
print(monte_carlo_pi_jit(10000000))

3.1413036
CPU times: user 369 ms, sys: 262 μs, total: 369 ms
Wall time: 367 ms


Second execution (faster execution time)

In [23]:
%%time
print(monte_carlo_pi_jit(10000000))

3.141134
CPU times: user 177 ms, sys: 2.16 ms, total: 179 ms
Wall time: 177 ms


Another execution

In [24]:
%%time
print(monte_carlo_pi_jit(10000000))

3.1413204
CPU times: user 193 ms, sys: 493 μs, total: 193 ms
Wall time: 190 ms


## `@njit` decorator 

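The `@njit` decorator is shorthand for `@jit(nopython=True)`. In nopython mode, Numba compiles the whole function to machine code that runs without the Python interpreter, and raises an error if it cannot. Since Numba 0.59, plain `@jit` also defaults to this mode. For a function that Numba can compile fully, such as the one below, `@jit` and `@njit` generate the same code, so the timings are similar.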

In [25]:
@njit
def monte_carlo_pi_njit(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

First execution (includes compilation)

In [26]:
%%time
print(monte_carlo_pi_njit(10000000))

3.1409964
CPU times: user 271 ms, sys: 3.13 ms, total: 274 ms
Wall time: 273 ms


Second execution

In [27]:
%%time
print(monte_carlo_pi_njit(10000000))

3.142288
CPU times: user 194 ms, sys: 1.45 ms, total: 195 ms
Wall time: 192 ms


Another execution

In [28]:
%%time
print(monte_carlo_pi_njit(10000000))

3.1422568
CPU times: user 171 ms, sys: 0 ns, total: 171 ms
Wall time: 168 ms


## Parallelize loops

Numba can parallelize loops across multiple CPU cores when a function is decorated with `@njit(parallel=True)`. Inside such a function, use `prange` instead of `range` for each loop you want to parallelize.

Parallelizing a simple loop

In [73]:
# Without parallelization
def sum_of_squares(n):
    result = 0
    for i in range(n):
        result += i ** 2
    return result

In [74]:
%%time
print(sum_of_squares(1000000))

333332833333500000
CPU times: user 126 ms, sys: 2.63 ms, total: 129 ms
Wall time: 128 ms
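
The printed value can be checked against the closed form $\sum_{i=0}^{n-1} i^2 = \frac{(n-1)\,n\,(2n-1)}{6}$. The sketch below also compares the result with the largest 64-bit signed integer: Numba compiles integers to fixed-width machine types (typically 64-bit), so a compiled version of this loop would overflow silently for larger `n`, unlike plain Python. The names `sum_of_squares_closed_form` and `INT64_MAX` are introduced here for illustration.

```python
def sum_of_squares_closed_form(n):
    """Sum of i**2 for i in range(n), i.e. 0**2 + 1**2 + ... + (n - 1)**2."""
    return (n - 1) * n * (2 * n - 1) // 6


INT64_MAX = 2**63 - 1

print(sum_of_squares_closed_form(1_000_000))                 # 333332833333500000
print(sum_of_squares_closed_form(1_000_000) <= INT64_MAX)    # True
print(sum_of_squares_closed_form(10_000_000) <= INT64_MAX)   # False
```

With `n = 1000000` the result fits comfortably in 64 bits, so the comparison with compiled code is fair for the input used here.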


Try to parallelize the function with Numba

In [75]:
# With parallelization
@njit(parallel=True)
def parallel_sum_of_squares(n):
    result = 0
    for i in prange(n):
        result += i ** 2
    return result

Execute the next cell twice: the first run includes compilation, so only the second run shows the actual execution time.

In [77]:
%%time
print(parallel_sum_of_squares(1000000))

333332833333500000
CPU times: user 1.79 ms, sys: 63 μs, total: 1.86 ms
Wall time: 1.32 ms


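## `@vectorize` decorator

The `@vectorize` decorator turns a function written for scalar inputs into a NumPy universal function (ufunc) that is applied element-wise to arrays. With `target='parallel'`, the elements are processed on multiple CPU cores.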

In [78]:
@vectorize(['float64(float64, float64)'], target='parallel')
def parallel_vectorize(x, y):
    return x * y

In [82]:
arr1 = np.random.rand(10000000)
arr2 = np.random.rand(10000000)

In [83]:
%%time
result = parallel_vectorize(arr1, arr2)

CPU times: user 221 ms, sys: 50 ms, total: 271 ms
Wall time: 38 ms


## Using cache

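You can tell Numba to cache compiled functions on disk to avoid recompilation in later sessions. This is done by passing `cache=True` to the decorator. Within a single session the compiled code is kept in memory anyway, so timings taken after the first call, as below, are expected to be similar with and without the cache; the benefit appears on the first call after the kernel is restarted.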

In [88]:
arr = np.random.rand(1000000)

In [89]:
@njit
def function(arr):
    return np.sum(arr ** 2)

In [91]:
%%time
function(arr)

CPU times: user 3.2 ms, sys: 0 ns, total: 3.2 ms
Wall time: 3.24 ms


333560.439060744

In [92]:
@njit(cache=True)
def cached_function(arr):
    return np.sum(arr ** 2)

In [94]:
%%time
cached_function(arr)

CPU times: user 3.29 ms, sys: 25 μs, total: 3.31 ms
Wall time: 3.32 ms


333560.439060744