# Compilers

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukeconibear/swd6_hpp/blob/main/docs/05_compilers.ipynb)

- [CPython](https://www.python.org/)
  - *Reference implementation: compiles source to bytecode, then interprets it.*
    - C extensions are statically compiled Ahead-Of-Time (AOT).
  - General purpose interpreter.
    - Can work on a variety of problems.
  - Dynamically typed.
    - Types can change e.g. `x = 5`, then later `x = 'gary'`.
- [PyPy](https://www.pypy.org/)
  - *Just-In-Time (JIT) compiler (written in RPython, a restricted subset of Python).*
    - Enables optimisations at run time, especially for numerical tasks with repetition and loops.
    - Used in place of CPython.
    - Often faster for long-running code, though with start-up and memory overheads.
    - Helpful when you want to speed up numerical operations across all of the code.
- [Numba](http://numba.pydata.org/)
  - *Uses a JIT compiler on individual functions.*
    - Compiles them to fast machine code using LLVM.
    - Applied with decorators around functions.
    - Used with the default CPython.
    - Helpful when you want to speed up numerical operations in specific functions.
    - Examples for [NumPy](https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html) and [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html#using-numba).
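
The CPython points above can be sketched with the standard library alone. This is a minimal illustration, not part of the original notebook: the `add` function and variable names are hypothetical. It shows a name being rebound to a different type, one function working on several types at run time, and the bytecode that CPython produces for that function.

```python
import dis

# Dynamic typing: the same name can be rebound to a different type.
x = 5
type_before = type(x).__name__
x = "gary"
type_after = type(x).__name__
print(type_before, "->", type_after)


def add(a, b):  # hypothetical example function
    return a + b


# The types are only resolved at run time, so one function handles them all.
results = [add(1, 2), add(1.5, 2.5), add("ga", "ry")]
print(results)

# CPython compiles the function body to bytecode, which the interpreter runs.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

The exact instruction names printed depend on the Python version. Looking up the types on every operation is part of what JIT compilers and static typing try to avoid.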

In [1]:
import numpy as np
from numba import njit

In [2]:
nums = np.arange(1_000_000)

In [3]:
def super_function(nums):
    trace = 0.0
    for num in nums: # loop
        trace += np.cos(num) # numpy
    return nums + trace # broadcasting

In [4]:
%timeit super_function(nums)

1.39 s ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [20]:
@njit # numba decorator
def super_function(nums):
    trace = 0.0
    for num in nums: # loop
        trace += np.cos(num) # numpy
    return nums + trace # broadcasting

The first call of the function includes the overhead of compiling it.

In [21]:
%%timeit -n 1 -r 1
super_function(nums)

225 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


All subsequent calls use this compiled version, and are therefore much faster.

In [22]:
%%timeit -n 1 -r 1
super_function(nums)

73.2 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
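
Outside IPython, the same first-call versus later-call comparison can be sketched with `time.perf_counter`. This is an illustration under assumptions: `trace_sum` and `time_call` are hypothetical helpers (a loop-only variant of the function above that uses `math.cos`), and if Numba is not installed the sketch falls back to a no-op decorator so it still runs, just without any speed-up.

```python
import math
import time

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, use a no-op decorator
    # so the example still runs (with no compilation and no speed-up).
    def njit(func):
        return func


@njit
def trace_sum(n):  # hypothetical helper, not from the original notebook
    total = 0.0
    for i in range(n):
        total += math.cos(i)
    return total


def time_call(n):
    """Return the result and the wall-clock time of one call."""
    start = time.perf_counter()
    result = trace_sum(n)
    return result, time.perf_counter() - start


first_result, first_time = time_call(100_000)    # includes compilation under Numba
second_result, second_time = time_call(100_000)  # reuses the compiled version

print(f"first call:  {first_time:.4f} s")
print(f"second call: {second_time:.4f} s")
```

With Numba installed, the first call is expected to be the slower of the two, because it includes compilation.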


- [Cython](https://cython.org/)
  - *Compiles to statically typed C/C++*.
  - Use for any amount of code.
  - Use with the default CPython.
  - Helpful when you need static typing.
  - Examples [not using IPython](https://cython.readthedocs.io/en/latest/src/quickstart/build.html#building-a-cython-module-using-setuptools), [NumPy](https://cython.readthedocs.io/en/latest/src/tutorial/numpy.html), [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html) (example below).

In [7]:
import pandas as pd

In [8]:
df = pd.DataFrame({
    "a": np.random.randn(1000),
    "b": np.random.randn(1000),
    "N": np.random.randint(100, 1000, (1000)),
    "x": "x",
})
df.head()

|   | a | b | N | x |
|---|---|---|---|---|
| 0 | -0.923216 | -0.254888 | 974 | x |
| 1 | 0.641271 | 0.205449 | 497 | x |
| 2 | 0.254692 | 0.409151 | 391 | x |
| 3 | -0.967972 | -1.980833 | 497 | x |
| 4 | 0.154094 | 0.510685 | 480 | x |


In [9]:
def f(x):
    return x * (x - 1)
   

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
        
    return s * dx

In [10]:
%timeit df.apply(lambda x: integrate_f(x["a"], x["b"], x["N"]), axis=1)

97.3 ms ± 83.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [11]:
%load_ext Cython

The only change below is the addition of the `%%cython` IPython magic command, which marks this as a Cython cell.

In [13]:
%%cython
def f(x):
    return x * (x - 1)
   

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
        
    return s * dx

In [14]:
%timeit df.apply(lambda x: integrate_f(x["a"], x["b"], x["N"]), axis=1)

52.3 ms ± 83.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [15]:
%%cython
cdef double f(double x) except? -2:                  # adding types
    return x * (x - 1)
   

cpdef double integrate_f(double a, double b, int N): # adding types
    cdef int i                                       # adding types
    cdef double s, dx                                # adding types
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
        
    return s * dx

In [16]:
%timeit df.apply(lambda x: integrate_f(x["a"], x["b"], x["N"]), axis=1)

13.4 ms ± 27.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
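
When swapping in a faster implementation, it is worth checking that the results are unchanged. The sketch below is an addition, not part of the original notebook: it compares the pure Python `integrate_f` (a left Riemann sum) against the analytic integral of `x * (x - 1)`, using a hypothetical helper `exact_integral`. The same check could be applied to the Cython versions.

```python
def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    # Left Riemann sum of f over [a, b] with N intervals.
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def exact_integral(a, b):  # hypothetical helper for this check
    # Antiderivative of x * (x - 1) = x**2 - x is x**3 / 3 - x**2 / 2.
    def antiderivative(x):
        return x**3 / 3 - x**2 / 2

    return antiderivative(b) - antiderivative(a)


approx = integrate_f(0.0, 2.0, 100_000)
exact = exact_integral(0.0, 2.0)
error = abs(approx - exact)

print(f"approx = {approx:.6f}, exact = {exact:.6f}, error = {error:.2e}")
```

The error shrinks as `N` grows, so a tolerance-based comparison (rather than exact equality) is the safer check between implementations.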


## Further information
- [Why is Python slow?](https://youtu.be/I4nkgJdVZFA), Anthony Shaw, PyCon 2020.
- [CPython Internals](https://realpython.com/products/cpython-internals-book/).