# Compilers

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukeconibear/swd6_hpp/blob/main/docs/05_compilers.ipynb)

## [CPython](https://www.python.org/)

CPython is the main Python distribution.

(Not to be confused with Cython, which I'll touch on later).

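CPython compiles Python source code to bytecode, which its interpreter then executes.

The interpreter itself and many of its extension modules are compiled Ahead-Of-Time (AOT), i.e., compiled in advance, as an assortment of statically compiled C extensions.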

CPython is a general purpose interpreter, allowing it to work on a variety of problems.
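
As a rough illustration of what the interpreter works on, the standard-library `dis` module can list the bytecode instructions CPython generates for a function. This is only a sketch: the function `add` is a made-up example, and the exact instruction names differ between Python versions.

```python
import dis


def add(a, b):
    """A trivial function whose bytecode we want to look at."""
    return a + b


# Collect the names of the bytecode instructions CPython generated for `add`.
# The exact names vary between Python versions, so none are listed here.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Each of these instructions is dispatched by the interpreter at run time, which is part of why tight Python loops are slow compared with compiled machine code.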

It is dynamically typed, so types can change as you go.

For example:

In [107]:
# define x to be an integer
x = 5
print(x)

# then define x to be a string
x = 'Gary'
print(x)

5
Gary


## [Numba](http://numba.pydata.org/)

Numba uses a JIT (just-in-time) compiler on functions i.e., compiles the function at execution time.

This converts the function to fast machine code using the [LLVM](https://en.wikipedia.org/wiki/LLVM) compiler library.

Numba works with the default CPython.

It works by adding decorators around functions.

There are two main modes:
- [`object`](https://numba.readthedocs.io/en/stable/glossary.html#term-object-mode) mode: [`@jit(forceobj=True)`](https://numba.readthedocs.io/en/stable/user/jit.html)
    - Compiles code that handles all values as Python objects and uses CPython to work on those objects.
    - In older Numba releases (before 0.59), plain `@jit` first tried `nopython` mode and fell back to `object` mode if that failed. Newer releases default to `nopython` mode.
    - Main improvement over CPython is for [loops](https://numba.readthedocs.io/en/stable/glossary.html#term-loop-lifting).
- [`nopython`](https://numba.readthedocs.io/en/stable/glossary.html#term-nopython-mode) mode: `@jit(nopython=True)` aliased as `@njit`.
    - Compiles code that does not access CPython.
    - Highest performance.
    - Requires [specific types](https://numba.readthedocs.io/en/stable/reference/pysupported.html#pysupported) (mainly numbers), otherwise raises an error.


Numba is helpful when you want to speed up numerical operations in specific functions.  

For example:

### [`@njit`](https://numba.readthedocs.io/en/stable/glossary.html#term-nopython-mode)

In [108]:
import numpy as np
from numba import njit

First, let's profile an example numerical function without Numba:

In [109]:
nums = np.arange(1_000_000)

In [110]:
def slow_function(nums):
    trace = 0.0
    for num in nums:          # loop
        trace += np.cos(num)  # numpy
    return nums + trace       # broadcasting

In [111]:
%%timeit
slow_function(nums)

900 ms ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Now, let's add the Numba `njit` decorator on the same function:

In [112]:
@njit
def fast_function(nums):
    trace = 0.0
    for num in nums:         # loop
        trace += np.cos(num) # numpy
    return nums + trace      # broadcasting

The first call to the Numba function has an overhead to compile the function.

In [113]:
%%timeit -n 1 -r 1 # -n 1 means execute the statement once, -r 1 means for one repetition
fast_function(nums)

89.1 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


Then, all subsequent calls use this compiled version, which is much faster.

In [114]:
%%timeit -n 1 -r 1
fast_function(nums)

11.1 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


```{admonition} Question
For the function (`fast_add`) below, what will happen when it is called with: `fast_add(1, (2,))`?
```

In [115]:
@njit()
def fast_add(x, y):
    return x + y

```{admonition} Solution
:class: dropdown

A `TypingError` is raised.

This is because Numba is trying to compile a function that adds an `int64` to a `tuple`.

Take care with types.

Ensure the function works without Numba first.

```

### [`@vectorize`](https://numba.readthedocs.io/en/stable/user/vectorize.html#vectorize)

Numba also simplifies the creation of a NumPy ufunc using `vectorize`.

This works on one element at a time.

The resulting ufuncs can be targeted to different hardware through the decorator arguments (the `target` argument).

The [default](http://numba.pydata.org/numba-doc/latest/user/vectorize.html#dynamic-universal-functions) target (`cpu`) runs on a single CPU thread and has the least overhead.

This is suitable for smaller data sizes (<1 KB) and low compute intensities.

For example:

In [132]:
import math
from numba import vectorize

In [133]:
SQRT_2PI = np.float32((2.0 * math.pi)**0.5)

@vectorize
def numba_my_function(x, mean, sigma):
    '''Compute the value of a Gaussian probability density function at x with given mean and sigma.'''
    return math.exp(-0.5 * ((x - mean) / sigma)**2.0) / (sigma * SQRT_2PI)

In [134]:
x = np.random.uniform(-3.0, 3.0, size=1_000_000)

First call to compile:

In [135]:
%%timeit -n 1 -r 1
numba_my_function(x, 0.0, 1.0)

61.7 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


Second call to see how fast it goes:

In [136]:
%%timeit -n 1 -r 1
numba_my_function(x, 0.0, 1.0)

14.7 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### [`@guvectorize`](http://numba.pydata.org/numba-doc/latest/user/vectorize.html#the-guvectorize-decorator)

`guvectorize` extends `vectorize` to work on arrays of input elements rather than one element at a time (similar to NumPy's gufuncs).

The signature requires the types to be specified first in a list.

For example: `[(int64[:], int64, int64[:])]` means an n-element one-dimensional array of `int64`, a scalar of `int64`, and another n-element one-dimensional array of `int64`.

The signature also includes the input(s) and output(s) dimensions in symbolic form.

For example: `'(n),()->(n)'` means input an n-element one-dimensional array (`(n)`) and a scalar (`()`), and output an n-element one-dimensional array (`(n)`).

In [95]:
from numba import guvectorize, int64  # int64 is needed for the type signature below

In [96]:
@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y

In [101]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [102]:
g(x, 5)

array([5, 6, 7, 8, 9])

In [103]:
x = np.arange(6).reshape(2, 3)
x

array([[0, 1, 2],
       [3, 4, 5]])

In [105]:
g(x, 10)

array([[10, 11, 12],
       [13, 14, 15]])

In [106]:
g(x, np.array([10, 20]))

array([[10, 11, 12],
       [23, 24, 25]])

### [`parallel = True`](https://numba.readthedocs.io/en/stable/user/performance-tips.html#parallel-true)

Another target is for a multi-core CPU.

This has small additional overheads for threading.

This is suitable for medium data sizes (1 KB - 1 MB).

If code contains operations that are parallelisable (and [supported](https://numba.readthedocs.io/en/stable/user/parallel.html#numba-parallel-supported)), Numba can compile a version that will run in parallel on multiple threads.

This parallelisation is performed automatically and is enabled by simply adding the keyword argument `parallel=True` to `@njit`.

In [3]:
x = np.arange(1.e7)

First, in serial (i.e., `parallel=False`, also the default):

In [4]:
@njit(parallel=False)
def ident_serial(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2

In [5]:
%%timeit
ident_serial(x)

139 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


And now in parallel:

In [6]:
@njit(parallel=True)
def ident_parallel(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2

In [7]:
%%timeit
ident_parallel(x)



50.1 ms ± 3.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## Exercise

...

## Further information

### More information and considerations

- Factor out the performance-critical part of the code for compilation in Numba.
- Consider what data precision is required, i.e., is 64-bit needed?
- Numba examples for [NumPy](https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html) and [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html#numba-jit-compilation).
- Custom functions beyond ufuncs ([kernels](https://numba.pydata.org/numba-doc/dev/cuda/kernels.html)).
- [Troubleshooting](https://numba.readthedocs.io/en/stable/user/troubleshoot.html#)
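
On the data precision point above, a quick sketch of the trade-off using NumPy alone: halving the precision halves the memory footprint, at the cost of a larger rounding error (machine epsilon).

```python
import numpy as np

values64 = np.arange(1_000_000, dtype=np.float64)
values32 = values64.astype(np.float32)

# Memory used by each array in bytes.
print(values64.nbytes, values32.nbytes)

# Machine epsilon: the gap between 1.0 and the next representable number.
print(np.finfo(np.float64).eps, np.finfo(np.float32).eps)
```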

### Other options

- [Cython](https://cython.org/)
  - *Compiles to statically typed C/C++*.
  - Use for any amount of code.
  - Use with the default CPython.
  - Helpful when you need static typing and want to optimise libraries.
  - Examples [not using IPython](https://cython.readthedocs.io/en/latest/src/quickstart/build.html#building-a-cython-module-using-setuptools), [NumPy](https://cython.readthedocs.io/en/latest/src/tutorial/numpy.html), [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html).
- [PyPy](https://www.pypy.org/)
  - *Just-In-Time (JIT) compiler (written in RPython, a restricted subset of Python).*
  - Enables optimisations at run time, especially for numerical tasks with repetition and loops.
  - Replaces CPython.
  - Faster, though with overheads for start-up and memory.
  - Helpful when you want to speed up numerical operations in all of your code.
  - May not be [compatible](http://packages.pypy.org/) with the libraries you use.
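
Since PyPy replaces CPython while Numba and Cython work with it, it can be useful to check which implementation is running the code. A small sketch using the standard-library `platform` module:

```python
import platform

# Name of the Python implementation running this code,
# for example 'CPython' or 'PyPy'.
implementation = platform.python_implementation()
print(implementation, platform.python_version())
```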

### Resources

- [Why is Python slow?](https://youtu.be/I4nkgJdVZFA), Anthony Shaw, PyCon 2020. [CPython Internals](https://realpython.com/products/cpython-internals-book/).