# Interfacing Python with Other Languages: Numba

## Numba

• Numba lets you speed up your applications
with high-performance functions written directly in Python.

• With a few annotations, array-oriented and math-heavy
Python code can be just-in-time compiled to native
machine instructions, similar in performance to C, C++ and
Fortran, without having to switch languages or Python
interpreters.

• Numba works by generating optimized machine code using
the LLVM compiler infrastructure at import time, runtime,
or statically (using the included pycc tool).

• Numba supports compilation of Python to run on either
CPU or GPU hardware, and is designed to integrate with the
Python scientific software stack.
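
As a minimal sketch of the "few annotations" idea in the points above: the only Numba-specific line is the decorator. The function name `mean_square` and the `HAVE_NUMBA` flag are hypothetical, and the `try`/`except` fallback is an assumption added only so the sketch still runs (uncompiled) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # njit is shorthand for jit(nopython=True)
    HAVE_NUMBA = True
except ImportError:
    # Fallback (assumption for this sketch): a no-op decorator,
    # so the code runs as plain Python without Numba.
    HAVE_NUMBA = False

    def njit(func):
        return func


@njit
def mean_square(values):
    """Return the mean of the squared elements of a 1-D array."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total / values.shape[0]


data = np.arange(5, dtype=np.float64)  # [0, 1, 2, 3, 4]
result = mean_square(data)             # with Numba, the first call compiles
print(result)

if HAVE_NUMBA:
    # Numba keeps one compiled specialization per argument-type signature.
    print(mean_square.signatures)
```

With this lazy form of the decorator, compilation happens at run time, on the first call for a given combination of argument types.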

---

## Example 1

* Without Numba

In [0]:
import numpy as np

def sum_all(A):
    acc = 0.0
    for i in range(A.shape[0]):
        acc += A[i]
    return acc

sample_array = np.arange(10000)

In [4]:
%timeit sum_all(sample_array)

100 loops, best of 3: 2.94 ms per loop


* With Numba

In [0]:
from numba import jit
import numpy as np

# @jit tells Numba to compile this function
@jit
def sum_all(A):
    acc = 0.0
    for i in range(A.shape[0]):
        acc += A[i]
    return acc

sample_array = np.arange(10000)

In [6]:
%timeit sum_all(sample_array)

The slowest run took 15711.31 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 15.4 µs per loop
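
The "slowest run took ... times longer" message is what one would expect if the first call includes the just-in-time compilation and later calls reuse the compiled code. A rough way to see this, sketched here with `perf_counter` instead of `%timeit` (the variable names `first_call` and `second_call` are hypothetical, and the no-op fallback is an assumption so the sketch runs without Numba):

```python
from time import perf_counter

import numpy as np

try:
    from numba import jit
except ImportError:
    # Fallback (assumption for this sketch): run uncompiled without Numba.
    def jit(func):
        return func


@jit
def sum_all(A):
    acc = 0.0
    for i in range(A.shape[0]):
        acc += A[i]
    return acc


sample_array = np.arange(10000)

# First call: with Numba this includes the compilation time.
start = perf_counter()
first = sum_all(sample_array)
first_call = perf_counter() - start

# Second call with the same argument type: reuses the compiled code.
start = perf_counter()
second = sum_all(sample_array)
second_call = perf_counter() - start

print(f"first call:  {first_call:.6f} s")
print(f"second call: {second_call:.6f} s")
```

When benchmarking a jitted function, it is therefore common to call it once beforehand so that the compilation cost is not counted in the measurement.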


---

## Example 2

* With Numba

In [0]:
from numba import jit
from numpy import arange

@jit
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

a = arange(10000).reshape(100,100)

In [8]:
%timeit sum2d(a)

The slowest run took 6925.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 15.5 µs per loop


* Without Numba

In [0]:
from numpy import arange

def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

a = arange(10000).reshape(100,100)

In [10]:
%timeit sum2d(a)

100 loops, best of 3: 3.42 ms per loop


---

## Pi calculation with vectorization

* With Numba

In [3]:
import numpy as np
from time import perf_counter
import matplotlib.pyplot as plt
from numba import jit

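@jit
def withVectorization(n_points):
    # Draw n_points random points uniformly in the unit square
    x = np.random.rand(n_points)
    y = np.random.rand(n_points)

    # Count the points that fall inside the quarter circle of radius 1
    inside = np.where((x**2 + y**2)**(0.5) < 1, 1, 0).sum()

    # The fraction of points inside approximates pi/4
    mypi = 4.0*(inside/n_points)

    return mypi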

calcpi = withVectorization(10**6)
print(calcpi)

3.141456


In [4]:
%timeit withVectorization(10**6)

100 loops, best of 3: 18.5 ms per loop


* Without Numba

In [5]:
import numpy as np
from time import perf_counter
import matplotlib.pyplot as plt

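def withVectorization(n_points):
    # Draw n_points random points uniformly in the unit square
    x = np.random.rand(n_points)
    y = np.random.rand(n_points)

    # Count the points that fall inside the quarter circle of radius 1
    inside = np.where((x**2 + y**2)**(0.5) < 1, 1, 0).sum()

    # The fraction of points inside approximates pi/4
    mypi = 4.0*(inside/n_points)

    return mypi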

calcpi = withVectorization(10**6)
print(calcpi)

3.1414


In [6]:
%timeit withVectorization(10**6)

10 loops, best of 3: 30.7 ms per loop
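
Both versions above estimate π by Monte Carlo sampling. As a short worked derivation (the symbols $N$ and $N_{\text{inside}}$ are introduced here for illustration; they stand for the function argument and the `inside` count in the code): the points $(x, y)$ are drawn uniformly from the unit square, and the region $x^2 + y^2 < 1$ inside that square is a quarter disc of radius 1, so

$$
P\left(x^2 + y^2 < 1\right)
= \frac{\text{area of quarter disc}}{\text{area of unit square}}
= \frac{\pi/4}{1}
\quad\Longrightarrow\quad
\pi \approx 4\,\frac{N_{\text{inside}}}{N}.
$$

Since $\sqrt{x^2 + y^2} < 1$ holds exactly when $x^2 + y^2 < 1$, the square root in the code does not change which points are counted.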


---

## Pi calculation without vectorization

* Without Numba

In [0]:
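import numpy as np

def withoutVectorization(n_points):
    inside = 0

    x = np.random.rand(n_points)
    y = np.random.rand(n_points)

    # Test each point individually instead of using array operations
    for i in range(n_points):
        if (x[i]**2 + y[i]**2)**(0.5) < 1:
            inside += 1

    mypi = 4.0 * (float(inside)/n_points)

    return mypi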

In [10]:
%timeit withoutVectorization(10**6)

1 loop, best of 3: 1.6 s per loop


* With Numba

In [0]:
from numba import jit
import numpy as np

@jit
def withoutVectorization(n_points):
    inside = 0

    x = np.random.rand(n_points)
    y = np.random.rand(n_points)

    # Test each point individually instead of using array operations
    for i in range(n_points):
        if (x[i]**2 + y[i]**2)**(0.5) < 1:
            inside += 1

    mypi = 4.0 * (float(inside)/n_points)

    return mypi

In [12]:
%timeit withoutVectorization(10**6)

The slowest run took 15.01 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 15.8 ms per loop
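
To put the measurements in this section side by side, the following sketch computes the speedup factors from the `%timeit` results reported above. The numbers are copied from those outputs; the dictionary names `timings` and `speedups` are hypothetical.

```python
# Timings in seconds, copied from the %timeit outputs in this section:
# (without Numba, with Numba)
timings = {
    "Example 1 (sum_all)": (2.94e-3, 15.4e-6),
    "Example 2 (sum2d)": (3.42e-3, 15.5e-6),
    "Pi, vectorized": (30.7e-3, 18.5e-3),
    "Pi, explicit loop": (1.6, 15.8e-3),
}

# Speedup = time without Numba / time with Numba
speedups = {
    name: plain / compiled for name, (plain, compiled) in timings.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster with Numba")
```

On these figures the explicit Python loops gain roughly two orders of magnitude (about 190x, 220x and 100x), while the version that already relies on NumPy array operations gains less than 2x. This is consistent with the loops being the part of the work that benefits most from compilation.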


---