# Using the Numba package to compile functions to machine code
Compiling can make our function WAY faster, since machine code is basically the language of our processor (CPU), so it runs without the Python interpreter in between.

In [1]:
from numba import njit
import numpy as np

### Let's go for some big array of this shape:

In [2]:

ROWS, COLUMNS = 2000, 2000

### TL;DR
### - We calculate an ugly formula for every cell and put the result in that cell.

(You really don't want to read the details... :) )

We're gonna populate it with a pretty ugly calculation:
- Every cell will contain a number that is
- the sum of two products: the row number cubed times the column number squared, plus the row number times the column number.
- Well... that sounds awful, so let's write it as an equation.


Let's call the row number R and the column number C. Then what we get is:
- result = ( (R^3) x (C^2) ) + ( R x C ),
- and we put the result of this calculation in every cell, using that cell's own row and column number.
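To make the formula concrete, here's a tiny sketch on a 3x4 grid instead of the big one. The names `cell_value` and `small` are made up just for this illustration. Note that in Python "to the power of" is written `**`, not `^`.

```python
def cell_value(r, c):
    # result = (R^3 x C^2) + (R x C)
    return (r**3) * (c**2) + r * c

# A grid with 3 rows and 4 columns; rows and columns are numbered from 0.
small = [[cell_value(r, c) for c in range(4)] for r in range(3)]

for row in small:
    print(row)
# [0, 0, 0, 0]
# [0, 2, 6, 12]
# [0, 10, 36, 78]
```

For example, the cell in row 2, column 3 holds (2^3 x 3^2) + (2 x 3) = 72 + 6 = 78.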

### First we do it the default Python way - with a list comprehension - and time the first run:

In [3]:
def f(x,y):
    return [[(i**3) * (j**2) + i*j for j in range(y)] for i in range(x)]
%time res = f(ROWS, COLUMNS)

Wall time: 2.89 s


("Wall time" is just a fancy way to say "the real, clock-on-the-wall time it took your PC to run this piece".)

Let's check how it scores on average over multiple runs:

In [4]:
%timeit f(ROWS, COLUMNS)

3.08 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Now let's see how it works with a NumPy array filled in by plain loops, and time the first run:

In [5]:
def numpy_f(x, y):
    np_array = np.zeros(shape=(x, y), dtype=np.int64)
    for i in range(x):
        for j in range(y):
            np_array[i, j] = (i**3) * (j**2) + i*j
    return np_array

%time res = numpy_f(ROWS, COLUMNS)

Wall time: 3.52 s


Let's check how it scores on average over multiple runs:

In [6]:
%timeit numpy_f(ROWS, COLUMNS)

3.5 s ± 90.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### And finally let's create a compiled version of our function using the @njit decorator we imported from Numba. Then we time it:
(We're still using a NumPy array, because Numba works best with NumPy arrays and plain loops. That's how you get crazy times after compiling to machine code.)

In [7]:
@njit
def machine_f(x, y):
    np_array = np.zeros(shape=(x, y), dtype=np.int64)
    for i in range(x):
        for j in range(y):
            np_array[i, j] = (i**3) * (j**2) + i*j
    return np_array

# We could also use the fact that we already created this function before,
# and just create a compiled version of it like this:
# machine_f = njit()(numpy_f)

%time res = machine_f(ROWS, COLUMNS)

Wall time: 624 ms


Notice that the first run wasn't exceptionally fast. Numba compiles the function the first time you call it, so that first run includes the compilation time.
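If you want to see that compile-on-first-call effect outside of a notebook (so without `%time`), here's a rough sketch using `time.perf_counter`. The names `fill` and `timed_call` and the 200x200 size are just picked for this example, and the `ImportError` fallback is an assumption added so the snippet still runs without Numba installed (in that case nothing gets compiled and both calls take about the same time).

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: a do-nothing stand-in so the sketch runs without Numba.
    def njit(func):
        return func

@njit
def fill(x, y):
    out = np.zeros((x, y), dtype=np.int64)
    for i in range(x):
        for j in range(y):
            out[i, j] = (i**3) * (j**2) + i * j
    return out

def timed_call(x, y):
    # Returns the filled array and how long the call took, in seconds.
    start = time.perf_counter()
    result = fill(x, y)
    return result, time.perf_counter() - start

first_result, first_time = timed_call(200, 200)    # with Numba: includes compilation
second_result, second_time = timed_call(200, 200)  # with Numba: reuses the compiled code

print(f"first call:  {first_time:.4f} s")
print(f"second call: {second_time:.4f} s")
```

With Numba installed you should typically see the second call come out much faster than the first one, and both calls return the same array.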

And now that it's compiled into machine code, let's check how it scores on average over multiple runs:

In [8]:
%timeit machine_f(ROWS, COLUMNS)

16.9 ms ± 742 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Wow! That's awesome! It's pretty quick in comparison with the interpreted Python code. :D
We went down from around 3 seconds per run with the Python list version to around 17 milliseconds in machine code. That's roughly 180 times faster!

## Conclusion:

It's sometimes worth compiling your code to machine code, but you need the right circumstances.

If you're just gonna run the function ONCE, it's not really worth the effort, because that single call also has to pay for the compilation.

But if you're about to use the same big, bulky function A LOT OF TIMES in your project,
it's worth considering compiling it!



#### If you like to 'gotta go fast', check [these performance tips](https://numba.pydata.org/numba-doc/dev/user/performance-tips.html) on the Numba documentation webpage.

## It's been fun. :) Have a nice day!