# Speeding up your code using Just-In-Time compilation

Numba is a high-performance just-in-time (JIT) compiler for Python. If you are not familiar with the JIT concept, here is a simple explanation.

Imagine you have a Python function. If you decorate it with Numba, then the first time you call it, Numba compiles it to machine code for the argument types it received. That first call pays the compilation cost. Later calls with the same argument types are not interpreted by the Python interpreter; they run the already compiled machine code, which is usually significantly faster.

According to the manual, Numba works best on code that uses loops, NumPy arrays and functions, and broadcasting. For more information, refer to the official documentation: [Link to Numba Documentation](https://numba.pydata.org/numba-doc/latest/index.html)

Let's check this out using a real example below.


In [1]:
# Installing numba first

%pip install numba pandas_datareader

Note: you may need to restart the kernel to use updated packages.


## Comparing runtime

First, let's write a simple script that runs a time-consuming operation, and execute it to get a baseline.

### Example 1. Simple loop

In [2]:
%%writefile main.py

from numba import jit 
import random
import math
import time

def some_function(n):
    z = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        z = math.sqrt(x**2 + y**2)
        
    return z


start = time.time()
some_function(10000000)
end = time.time()

print(end-start)

Overwriting main.py


In [3]:
!python main.py 

3.755401134490967


Executing this simple function takes ~3.7 seconds. Now, let's see how JIT compilation speeds it up. All we need to do is add the `@jit` decorator before the function declaration.
The decorator accepts several parameters; the one that matters here is:
- `nopython` (default: `False` in the Numba version used here; `True` since Numba 0.59). If `nopython` is `False` and Numba fails to compile the function into machine code, it falls back to object mode, which goes through the Python interpreter. If `nopython` is `True`, Numba raises an error instead of falling back. Code that relies on some libraries (e.g. pandas) cannot be compiled in nopython mode.

In [4]:
%%writefile main.py

from numba import jit 
import random
import math
import time

@jit(nopython=True)
def some_function(n):
    z = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        z = math.sqrt(x**2 + y**2)
        
    return z


start = time.time()
some_function(10000000)
end = time.time()

print(end-start)

Overwriting main.py


In [5]:
!python main.py 

0.3960449695587158

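You might expect a second run to be even faster, because the first measurement includes compilation time. Within a single Python process that is true: the function is compiled once and then reused. Here, however, every `!python main.py` starts a new process, so Numba compiles the function again (unless on-disk caching is enabled with `cache=True`), and the measured time barely changes. Let's check it out: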


In [6]:
!python main.py 

0.38800787925720215

### Example 2. NumPy

In [7]:
%%writefile main.py

from numba import jit 
import random
import math
import time
import numpy as np

def some_function(n):
    z = np.zeros((n,n))
    for i in range(n):
        x = np.random.rand(n,n)
        y = np.random.rand(n,n)
        z += np.sqrt(x**2 + y**2)   
    return z


start = time.time()
some_function(1000)
end = time.time()

print(end-start)

Overwriting main.py


In [8]:
!python main.py 

18.17553400993347


In [9]:
%%writefile main.py

from numba import jit 
import random
import math
import time
import numpy as np

@jit(nopython=True)
def some_function(n):
    z = np.zeros((n,n))
    for i in range(n):
        x = np.random.rand(n,n)
        y = np.random.rand(n,n)
        z += np.sqrt(x**2 + y**2)   
    return z


start = time.time()
some_function(1000)
end = time.time()

print(end-start)

Overwriting main.py


In [10]:
!python main.py 

12.464415073394775

In [11]:
!python main.py 

12.477339267730713

### Example 3. Pandas

In [12]:
%pip install pandas_datareader

Note: you may need to restart the kernel to use updated packages.


In [13]:
%%writefile main.py

from numba import jit 
import random
import math
import time
import numpy as np
import pandas as pd
import pandas_datareader as web

#just a meaningless function that does something
def pandas_function(data):
    result = data.sort_values(by=['Volume'])
    result = result.applymap(math.sqrt)
    result += 2
    result = result.applymap(lambda x: x**2)
    result = result.T
    return result

data = web.DataReader('AAPL','stooq')

start = time.time()
pandas_function(data)
end = time.time()

print(end-start)

Overwriting main.py


In [14]:
!python main.py 

0.006792783737182617


In [15]:
%%writefile main.py

from numba import jit 
import random
import math
import time
import numpy as np
import pandas as pd
import pandas_datareader as web

#just a meaningless function that does something
@jit(nopython=True)
def pandas_function(data):
    result = data.sort_values(by=['Volume'])
    result = result.applymap(math.sqrt)
    result += 2
    result = result.applymap(lambda x: x**2)
    result = result.T
    return result

data = web.DataReader('AAPL','stooq')

start = time.time()
pandas_function(data)
end = time.time()

print(end-start)

Overwriting main.py


In [16]:
!python main.py 

Traceback (most recent call last):
  File "main.py", line 23, in <module>
    pandas_function(data)
  File "/Users/narmina/.pyenv/versions/env-3.8.10/lib/python3.8/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/Users/narmina/.pyenv/versions/env-3.8.10/lib/python3.8/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at main.py (13)

File "main.py", line 13:
def pandas_function(data):
    result = data.sort_values(by=['Volume'])
    ^

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'pandas.core.frame.DataFrame'>

In [17]:
%%writefile main.py

from numba import jit 
import random
import math
import time
import numpy as np
import pandas as pd
import pandas_datareader as web

#just a meaningless function that does something
@jit
def pandas_function(data):
    result = data.sort_values(by=['Volume'])
    result = result.applymap(math.sqrt)
    result += 2
    result = result.applymap(lambda x: x**2)
    result = result.T
    return result

data = web.DataReader('AAPL','stooq')

start = time.time()
pandas_function(data)
end = time.time()

print(end-start)

Overwriting main.py


In [18]:
!python main.py 

Compilation is falling back to object mode WITH looplifting enabled because Function "pandas_function" failed type inference due to: non-precise type pyobject
During: typing of argument at main.py (13)

File "main.py", line 13:
def pandas_function(data):
    result = data.sort_values(by=['Volume'])
    ^

  @jit

File "main.py", line 12:
@jit
def pandas_function(data):
^

Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "main.py", line 12:
@jit
def pandas_function(data):
^

Traceback (most recent call last):
  File "main.py", line 23, in <module>
    pandas_function(data)
NameError: global name '<lambda>' is not defin

Even in object mode, Numba could not handle the lambda, so let's get rid of it:

In [19]:
%%writefile main.py

from numba import jit 
import random
import math
import time
import numpy as np
import pandas as pd
import pandas_datareader as web

#just a meaningless function that does something
@jit
def pandas_function(data):
    result = data.sort_values(by=['Volume'])
    result = result.applymap(math.sqrt)
    result += 2
    result = result.T
    return result

data = web.DataReader('AAPL','stooq')

start = time.time()
pandas_function(data)
end = time.time()

print(end-start)

Overwriting main.py


In [20]:
!python main.py 

Compilation is falling back to object mode WITH looplifting enabled because Function "pandas_function" failed type inference due to: non-precise type pyobject
During: typing of argument at main.py (13)

File "main.py", line 13:
def pandas_function(data):
    result = data.sort_values(by=['Volume'])
    ^

  @jit

File "main.py", line 12:
@jit
def pandas_function(data):
^

Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "main.py", line 12:
@jit
def pandas_function(data):
^

0.2825660705566406

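As you can see, Numba doesn't work with Pandas objects. In nopython mode it cannot determine a type for the `DataFrame` and raises a `TypingError`. With the object-mode fallback the function does run, but it takes ~0.28 seconds (compilation included) compared with ~0.007 seconds for the plain Pandas version, even though the lambda step was removed.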