# numba

https://numba.readthedocs.io/en/stable/user/5minguide.html

Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops. The most common way to use Numba is through its collection of decorators that can be applied to your functions to instruct Numba to compile them. When a call is made to a Numba-decorated function it is compiled to machine code “just-in-time” for execution and all or part of your code can subsequently run at native machine code speed!

A JIT compiler runs after the program has started and compiles code (usually bytecode or some other kind of VM instructions) on the fly, or "just in time", into a faster form, typically the host CPU's native instruction set. Unlike an ahead-of-time compiler, a JIT has access to runtime information, so it can apply optimizations that depend on how the program actually behaves, such as inlining functions that turn out to be called frequently.


# numba.jit
numba.jit is a function decorator that tells Numba to compile a Python function into native machine code using just-in-time (JIT) compilation. The function is compiled when it is first called, and the compiled version can run much faster than the same code executed by the Python interpreter.

# numba.njit
numba.njit is shorthand for numba.jit(nopython=True): it compiles the function in nopython mode (described below) and raises an error if the types in the function cannot be inferred. In older Numba releases, plain numba.jit could silently fall back to a slower mode that still uses the Python interpreter; numba.njit never does. This makes numba.njit more restrictive about which functions it can compile, but the code it does compile is the fastest Numba can produce.

# nopython mode
A Numba compilation mode that generates code that does not access the Python C API. This compilation mode produces the highest performance code, but requires that the native types of all values in the function can be inferred.

# cache
To avoid paying the compilation cost each time you run a Python program, you can pass cache=True to instruct Numba to write the result of function compilation into a file-based cache.

# parallel
Passing parallel=True enables automatic parallelization (and related optimizations) for those operations in the function known to have parallel semantics, such as element-wise array operations.

In [52]:
import pandas as pd
import numpy as np
from numba import njit, jit
from functools import lru_cache, cache

In [3]:
N = 1000000
A_list = np.random.randint(1, 200, N)
B_list = np.random.randint(1, 200, N)
df = pd.DataFrame({'A': A_list, 'B': B_list})
df.head()

|   | A | B |
|---|---|---|
| 0 | 23 | 101 |
| 1 | 32 | 95 |
| 2 | 57 | 11 |
| 3 | 74 | 26 |
| 4 | 179 | 144 |


In [30]:
def f(x, y):
    return x + y

In [58]:
@njit
def f_jit(x, y):
    return x + y

@njit(parallel=True)
def f_jit_parallel(x, y):
    return x + y

@njit(cache=True)
def f_jit_cache(x, y):
    return x + y

@njit(cache=True, parallel=True)
def f_jit_cache_p(x, y):
    return x + y



In [46]:
%timeit df['apply'] = df.apply(lambda row: f(row['A'], row['B']), axis=1)

18.2 s ± 95.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [47]:
%timeit f_jit(df['A'].values, df['B'].values) 

3.27 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [48]:
%timeit f_jit_parallel(df['A'].values, df['B'].values) 

1.37 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [49]:
%timeit f_jit_cache(df['A'].values, df['B'].values) 

3.3 ms ± 59.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [50]:
%timeit f_jit_cache_p(df['A'].values, df['B'].values) 

1.37 ms ± 9.32 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [60]:
(18.2 *1000) / 3.27

5565.749235474006

In [59]:
%timeit df['vectorize'] = np.vectorize(f)(df['A'], df['B'])

405 ms ± 3.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [61]:
(18.2 *1000) / 405

44.93827160493827
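The `np.vectorize` timing is worth reading with care: NumPy's documentation describes `np.vectorize` as a convenience that is essentially a Python-level for loop, which would explain why it trails the compiled versions. For element-wise addition specifically, the plain `+` operator on the arrays is already vectorized in NumPy. The sketch below, using smaller arrays than the benchmark and a seed chosen only for reproducibility, checks that the two approaches agree; no timings are claimed.

```python
import numpy as np

def f(x, y):
    return x + y

rng = np.random.default_rng(0)          # seeded only for reproducibility
a = rng.integers(1, 200, size=1_000)
b = rng.integers(1, 200, size=1_000)

looped = np.vectorize(f)(a, b)          # calls the Python function f element by element
direct = a + b                          # a single call into NumPy's compiled loop

print(np.array_equal(looped, direct))   # True
```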

In [70]:
name_series = pd.Series(np.random.choice(['adam', 'chang', 'eliza', 'odom'], replace=True, size=100000))

def parse_name(name):
    if name.lower().startswith('a'):
        return 'A'
    elif name.lower().startswith('e'):
        return 'E'
    elif name.lower().startswith('i'):
        return 'I'
    elif name.lower().startswith('o'):
        return 'O'
    elif name.lower().startswith('u'):
        return 'U'
    return name

def parse_name_p(name):
    if name.lower().startswith('a'):
        return 'A'
    elif name.lower().startswith('e'):
        return 'E'
    elif name.lower().startswith('o'):
        return 'O'
    elif name.lower().startswith('i'):
        return 'I'
    elif name.lower().startswith('u'):
        return 'U'
    return name

def parse_name_pp(name):
    if name.lower().startswith('c'):
        return name
    elif name.lower().startswith('a'):
        return 'A'
    elif name.lower().startswith('e'):
        return 'E'
    elif name.lower().startswith('o'):
        return 'O'
    elif name.lower().startswith('i'):
        return 'I'
    elif name.lower().startswith('u'):
        return 'U'
    return name
    
@jit(nopython=True)
def parse_name_jit(name):
    if name.lower().startswith('a'):
        return 'A'
    elif name.lower().startswith('e'):
        return 'E'
    elif name.lower().startswith('i'):
        return 'I'
    elif name.lower().startswith('o'):
        return 'O'
    elif name.lower().startswith('u'):
        return 'U'
    return name

@jit(nopython=True,parallel=True)
def parse_name_parallel(name):
    if name.lower().startswith('a'):
        return 'A'
    elif name.lower().startswith('e'):
        return 'E'
    elif name.lower().startswith('i'):
        return 'I'
    elif name.lower().startswith('o'):
        return 'O'
    elif name.lower().startswith('u'):
        return 'U'
    return name

@jit(nopython=True,cache=True)
def parse_name_cache(name):
    if name.lower().startswith('a'):
        return 'A'
    elif name.lower().startswith('e'):
        return 'E'
    elif name.lower().startswith('i'):
        return 'I'
    elif name.lower().startswith('o'):
        return 'O'
    elif name.lower().startswith('u'):
        return 'U'
    return name

In [64]:
%timeit name_series.apply(parse_name)

159 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [65]:
%timeit name_series.apply(parse_name_p)

149 ms ± 831 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [71]:
%timeit name_series.apply(parse_name_pp)

141 ms ± 860 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [67]:
%timeit name_series.apply(parse_name_jit)

747 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [68]:
%timeit name_series.apply(parse_name_parallel)

The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.readthedocs.io/en/stable/user/parallel.html#diagnostics for help.

File "C:\Users\User\AppData\Local\Temp\ipykernel_51492\90145841.py", line 44:
@jit(nopython=True,parallel=True)
def parse_name_parallel(name):
^


734 ms ± 2.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [69]:
%timeit name_series.apply(parse_name_cache)

724 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
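One plausible reading of these timings is that `Series.apply` still calls the function once per element from Python, so every call to a jitted function pays for crossing between the interpreter and compiled code and for converting the string argument. On a function body this small, that overhead outweighs any gain. Numba is aimed at numeric loops over arrays, so for string work like this a vectorized pandas expression may be a better fit. The sketch below is an assumption about how that could look, not something measured in this notebook; `parse_names_vectorized` is a hypothetical helper.

```python
import pandas as pd

def parse_names_vectorized(names: pd.Series) -> pd.Series:
    # Upper-cased first character of every name, computed for the whole column.
    first = names.str[0].str.upper()
    starts_with_vowel = first.isin(["A", "E", "I", "O", "U"])
    # Keep the original name unless it starts with a vowel,
    # in which case use the upper-cased first letter instead.
    return names.where(~starts_with_vowel, first)

sample = pd.Series(["adam", "chang", "eliza", "odom"])
parsed = parse_names_vectorized(sample)
print(parsed.tolist())  # ['A', 'chang', 'E', 'O']
```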
