<!--NAVIGATION-->
<span style='background: rgb(128, 128, 128, .15); width: 100%; display: block; padding: 10px 0 10px 10px'>< [More functionality: SciPy](03.02-SciPy.ipynb) | [Contents](00.00-Index.ipynb) | [Quiz 3](03.04-Quiz.ipynb)></span>


<a id='top'></a>

# More speed: Numba & Cython
## Content  
- [Numba](#numba)
- [Cython](#cython)
- [Numba or Cython](#compare)

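Earlier ([Numpy - Vectorized Functions](03.01-Numpy.ipynb#ufunc)) we discussed vectorization (and ufuncs), which is one way to improve speed and efficiency in numerical work.  
But vectorization is not always an option: some algorithms, such as loops in which each step depends on the result of the previous one, are hard to express as whole-array operations.  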
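
As an illustration of a loop that resists vectorization, consider a recurrence in which every value depends on the previous one. The sketch below uses the logistic map purely as an example; the function name `logistic_map` and its parameters are illustrative and not part of the course material.

```python
def logistic_map(x0, r, n):
    """Return [x_0, ..., x_n] for the recurrence x_{t+1} = r * x_t * (1 - x_t)."""
    values = [x0]
    for _ in range(n):
        # Each step needs the previous result, so the steps cannot be
        # computed all at once as a single whole-array operation.
        x_prev = values[-1]
        values.append(r * x_prev * (1.0 - x_prev))
    return values


trajectory = logistic_map(0.2, 3.7, 5)
print(trajectory)
```

An explicit loop like this is exactly the kind of code that benefits from compilation.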

<a id='numba'></a>

## Numba
Fortunately, a Python library called [Numba](http://numba.pydata.org/) handles many of the cases where vectorization falls short.

It does so through something called **just in time (JIT) compilation**.

The key idea is to compile functions to native machine code instructions on the fly.  
When it succeeds, the compiled code is extremely fast.
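
A minimal sketch of what "on the fly" means in practice: the first call to a JIT-compiled function also pays for the compilation, while later calls with the same argument types reuse the compiled code. The function `sum_of_squares` and the helper `timed_call` are made up for this illustration, and the `ImportError` fallback is an assumption added only so that the sketch still runs (as plain Python) where Numba is not installed.

```python
import time

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for this sketch: without Numba, run as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    total = 0.0
    for k in range(n):
        total += k * k
    return total


def timed_call(n):
    """Call sum_of_squares(n) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = sum_of_squares(n)
    return result, time.perf_counter() - start


first_result, first_time = timed_call(1_000_000)    # includes compilation
second_result, second_time = timed_call(1_000_000)  # reuses compiled code
print(f"first call:  {first_time:.4f} s")
print(f"second call: {second_time:.4f} s")
```

With Numba installed, the second call should be much faster than the first; the exact timings depend on the machine.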

Numba is specifically designed for numerical work and can also do other tricks such as [multithreading](https://en.wikipedia.org/wiki/Multithreading_%28computer_architecture%29).
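
One way to use this multithreading support is Numba's `parallel=True` option together with `prange`. The following is a sketch under assumptions: the function name is invented for illustration, and the fallback branch only exists so that the code also runs serially where Numba is not installed.

```python
try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch: without Numba, fall back to a serial loop.
    prange = range

    def njit(**options):
        return lambda func: func


@njit(parallel=True)
def sum_of_squares_parallel(n):
    total = 0.0
    # prange marks the loop as safe to split across threads; Numba
    # combines the per-thread partial sums into `total`.
    for k in prange(n):
        total += k * k
    return total


print(sum_of_squares_parallel(1_000))
```

Only loops whose iterations are independent of each other (apart from a simple reduction such as the sum above) are suitable for `prange`.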

### Example:
(from the [Numba documentation](http://numba.pydata.org/numba-doc/0.15.1/examples.html))

Suppose we want to write an image-processing function in Python. Here’s how it might look:

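In [None]:
import numpy


def filter_2d(image, filt):
    """Convolve a 2-D image with a 2-D filter; border pixels stay zero."""
    M, N = image.shape
    Mf, Nf = filt.shape
    Mf2 = Mf // 2  # half-height of the filter
    Nf2 = Nf // 2  # half-width of the filter
    result = numpy.zeros_like(image)
    # Visit every pixel far enough from the border for the filter to fit
    for i in range(Mf2, M - Mf2):
        for j in range(Nf2, N - Nf2):
            num = 0.0
            # Sum the flipped filter multiplied by the image patch around (i, j)
            for ii in range(Mf):
                for jj in range(Nf):
                    num += (filt[Mf-1-ii, Nf-1-jj] * image[i-Mf2+ii, j-Nf2+jj])
            result[i, j] = num
    return result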

This kind of quadruply nested for-loop is quite slow in pure Python.  
Using Numba, we can compile this code to LLVM intermediate representation, which is then compiled to machine code:

In [None]:
import numpy
from numba import double, jit

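# The explicit signature (two 2-D float64 arrays in, one 2-D float64 array out)
# makes Numba compile the function right away
fastfilter_2d = jit(double[:,:](double[:,:], double[:,:]))(filter_2d)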

# Now fastfilter_2d runs at speeds as if you had first translated
# it to C, compiled the code and wrapped it with Python
image = numpy.random.random((100, 100))
filt = numpy.random.random((10, 10))

%timeit res = filter_2d(image, filt)
%timeit res = fastfilter_2d(image, filt)

Numba actually produces two functions. The first is the low-level compiled version of `filter_2d`. The second is a Python wrapper around that low-level function, so that it can be called from Python. Other Numba functions can call the first function directly, which eliminates all Python overhead in function calls.
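
A small sketch of one compiled function calling another. The names `cube` and `sum_of_cubes` are invented for this illustration, and the `ImportError` fallback is an assumption added so that the code also runs as plain Python where Numba is not installed.

```python
try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for this sketch: without Numba, run as plain Python.
    def njit(func):
        return func


@njit
def cube(x):
    return x * x * x


@njit
def sum_of_cubes(n):
    total = 0.0
    for k in range(n):
        # Inside compiled code this call goes to the low-level version
        # of cube, not through its Python wrapper.
        total += cube(k)
    return total


print(sum_of_cubes(5))
```

Both functions can still be called from ordinary Python code; in that case the call goes through the Python wrapper.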

### Decorator Notation

To target a function for JIT compilation we can put `@jit` before the function definition.

Here’s what this looks like for `filter_2d`:

In [None]:
@jit
def filter_2d(image, filt):
    M, N = image.shape
    Mf, Nf = filt.shape
    Mf2 = Mf // 2
    Nf2 = Nf // 2
    result = numpy.zeros_like(image)
    for i in range(Mf2, M - Mf2):
        for j in range(Nf2, N - Nf2):
            num = 0.0
            for ii in range(Mf):
                for jj in range(Nf):
                    num += (filt[Mf-1-ii, Nf-1-jj] * image[i-Mf2+ii, j-Nf2+jj])
            result[i, j] = num
    return result

%timeit res = filter_2d(image, filt)

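This is equivalent to `filter_2d = jit(filter_2d)`. Because no signature is given, Numba compiles the function on its first call, using the types of the arguments it receives.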

<a id='cython'></a>

## Cython
There are additional options for accelerating Python loops.  

Like [Numba](#numba), [Cython](http://cython.org/) provides an approach to generating fast compiled code that can be used from Python.

As was the case with Numba, a key problem is the fact that Python is dynamically typed.  
Numba solves this problem (where possible) by inferring types.

Cython’s approach is different — programmers add type definitions directly to their “Python” code.  
As such, the Cython language can be thought of as Python with type definitions.

In addition to a language specification, Cython is also a language translator, transforming Cython code into optimized C and C++ code.

Cython also takes care of building language extensions — the wrapper code that interfaces between the resulting compiled code and Python.
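
As a rough sketch of "Python with type definitions", the example below uses Cython's pure Python mode, in which C types are written as ordinary annotations such as `cython.double`. The function `integrate_square` is a made-up example, and the stand-in `cython` class is an assumption that exists only so that the file also runs as ordinary, uncompiled Python when Cython is not installed.

```python
try:
    import cython  # provides the C type names used in the annotations
except ImportError:
    # Stand-in (assumption for this sketch) so that the file still runs
    # as ordinary Python without Cython installed.
    class cython:
        double = float
        int = int


def integrate_square(a: cython.double, b: cython.double, n: cython.int) -> cython.double:
    """Approximate the integral of x**2 over [a, b] with the midpoint rule."""
    i: cython.int
    x: cython.double
    dx: cython.double = (b - a) / n
    total: cython.double = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx  # midpoint of the i-th sub-interval
        total += x * x
    return total * dx


print(integrate_square(0.0, 1.0, 1000))
```

Run uncompiled, this is plain Python. Compiling the same file with Cython (for instance with the `cythonize` command-line tool) lets the annotated variables become C variables, so the loop can run without Python object overhead.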

<a id='compare'></a>

## Numba or Cython
Numba is the simpler of the two: you only need to add an import and a decorator to your code and it is ready to use. It does have limitations, although these shrink with each release. For some optimizations that are complicated to write in Cython, Numba can even give better results.

Cython gives you fine-grained control over every aspect of the optimization. It takes longer to digest because it has many options, and this learning curve is its biggest drawback: in a way, it is like learning a new programming language based on Python.


<span style='background: rgb(128, 128, 128, .15); width: 100%; display: block; padding: 10px 0 10px 10px'>This is the Jupyter notebook version of the __Python for Official Statistics__ produced by Eurostat; the content is available [on GitHub](https://github.com/eurostat/e-learning/tree/main/python-official-statistics).
<br>The text and code are released under the [EUPL-1.2 license](https://github.com/eurostat/e-learning/blob/main/LICENSE).</span>