<a href="https://colab.research.google.com/github/keuperj/DataEngineering22/blob/main/week_9/Numba_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Numba Demo
* API:  https://numba.pydata.org/numba-doc/latest/index.html


### Implementing a simple function and getting the runtime

In [1]:
import random
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

In [2]:
%%timeit
monte_carlo_pi(10000)

100 loops, best of 5: 4.27 ms per loop


### Now the same thing with NUMBA compilation

In [3]:
from numba import jit
import random

@jit(nopython=True)
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

#NOTE calling the function once before timing so that the compilation time is not included in our comparison
monte_carlo_pi(10000)

3.1336

In [4]:
%%timeit
monte_carlo_pi(10000)

1000 loops, best of 5: 244 µs per loop


### Now with Multi-Threading

In [5]:
#need extra threading lib
!pip install tbb

Collecting tbb
  Downloading tbb-2021.6.0-py2.py3-none-manylinux1_x86_64.whl (4.0 MB)
[K     |████████████████████████████████| 4.0 MB 7.8 MB/s 
[?25hInstalling collected packages: tbb
Successfully installed tbb-2021.6.0


In [6]:
import numba as nb
@jit(nopython=True, parallel=True)
def monte_carlo_pi_parallel(nsamples):
    acc = 0
    for i in nb.prange(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

#NOTE calling the function once before timing so that the compilation time is not included in our comparison
monte_carlo_pi_parallel(10000)



3.16

In [7]:
%%timeit
monte_carlo_pi(10000)

10000 loops, best of 5: 120 µs per loop


not always faster -> overhead of parallelization :-(