# Advanced programming: performance

https://hackmd.io/@fmaussion/rJlfAkJm5

## The rules of optimization


1. **Don't do it.**
2. **(For experts only) Don't do it yet.**

Inspired by well-known programmer folklore.

### Rules of thumb

<img src="https://imgs.xkcd.com/comics/is_it_worth_the_time_2x.png" alt="xkcd" width="400"/>

### If you really really want to do it anyway

1. **Write tests**
2. **Profile before optimizing**

Writing tests is important to keep the code running and to make sure that it works the same way after optimizing. At the very least, keep a copy of the non-optimized code and test your optimized code against it.
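As a minimal sketch of "keep a copy and test against it": the two functions below are hypothetical stand-ins (they are not from the lecture), with the slow version kept as the reference and the fast version checked against it on random input.

```python
import numpy as np


def mean_reference(values):
    """Straightforward (slow) implementation, kept as the reference."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def mean_optimized(values):
    """Candidate optimized version (vectorized with numpy)."""
    return np.asarray(values, dtype=float).sum() / len(values)


def test_optimized_matches_reference():
    # Fixed seed so that a failure is reproducible
    rng = np.random.default_rng(seed=0)
    values = rng.uniform(240, 300, size=1000)
    # Compare within floating point tolerance, not with ==
    np.testing.assert_allclose(mean_optimized(values), mean_reference(values))


test_optimized_matches_reference()
```

A tolerance-based comparison is used because reordering floating point operations can change the last digits of the result.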

Profiling helps to focus on what really matters. You might find the results of profiling very surprising!

## Some optimisation stories from Fabien

- vectorization instead of for-loops
- porting code from Fortran to C (or vice versa): change the order in which rows and columns are accessed, because Fortran arrays are column-major and C arrays are row-major
- conversion between xarray/numpy arrays etc. is expensive - avoid it if you need performance
- writing functions instead of hard-coding every single step multiple times
- opening a file once before starting a for-loop instead of reopening the same file on every iteration
- think of writing your own functions for routines you are using a lot instead of potentially slow package functions (e.g. `np.average`). See examples below.
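The row/column access order story can be illustrated with numpy, which supports both memory layouts. This is only a sketch: the array size and the helper `sum_by_rows` are arbitrary choices for illustration, and the actual timings depend on the machine.

```python
import timeit

import numpy as np

a_c = np.ones((1000, 1000))      # C order (row-major), numpy's default
a_f = np.asfortranarray(a_c)     # same values, Fortran order (column-major)


def sum_by_rows(a):
    # Walks along the first axis: each row is contiguous in memory
    # for C order, but strided for Fortran order
    return sum(row.sum() for row in a)


t_c = timeit.timeit(lambda: sum_by_rows(a_c), number=5)
t_f = timeit.timeit(lambda: sum_by_rows(a_f), number=5)
print(f"C order: {t_c:.4f} s, Fortran order: {t_f:.4f} s")
```

Row-wise access is typically faster on the C-ordered array, and the opposite holds for column-wise access, which is why loops often need to be reordered when porting between the two languages.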

## Profiling code

- default: https://docs.python.org/3/library/profile.html
- better: https://github.com/benfred/py-spy
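A minimal sketch of the standard library profiler, assuming a toy workload (`slow_sum` and `workload` are made-up names for illustration):

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately slow pure-Python loop
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum(20_000) for _ in range(20)]


# Collect timing statistics only for the code between enable() and disable()
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the report to a string, sorted by cumulative time, top 5 entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists the functions in which most of the time is spent, which is where optimization effort should go first.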

## Micro optimisation examples

### Back to basics: xarray -> numpy -> own functions

Certain numpy and xarray functions are slow because of "generalization overhead": they are meant to be general and do a lot of input checks.

If you know what your input looks like and have control over it, you can speed up operations quite a bit.

#### xarray and pandas are slow on arithmetic

In [2]:
import xarray as xr
import numpy as np

da = xr.DataArray(np.random.uniform(240, 300))

In [3]:
%timeit da**2

31 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [4]:
%timeit da.data**2

922 ns ± 9.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


*Question to class: why?*

#### numpy is also slow in certain situations

In [5]:
d = np.arange(1, 1000)
w = d + 2

In [6]:
%timeit np.average(d, weights=w)

20.3 µs ± 993 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [7]:
def my_avg(d, wgt):
    return np.multiply(d, wgt).sum() / wgt.sum()

In [8]:
%timeit my_avg(d, w)

5.35 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


*Question to class: why?*

See also (similar story): https://github.com/numpy/numpy/issues/14281

**As for ALL cases of optimisation: please be aware of the risks! Make a careful gain / costs analysis.**
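One concrete risk, sketched here with deliberately pathological weights (chosen for illustration): the checks that make `np.average` slower also catch bad input, while the hand-written version silently returns a meaningless value.

```python
import numpy as np


def my_avg(d, wgt):
    # Same fast version as above: no input checks at all
    return np.multiply(d, wgt).sum() / wgt.sum()


d = np.array([1.0, 2.0])
w = np.array([1.0, -1.0])  # weights sum to zero

# np.average checks the weights and refuses to normalize by zero
try:
    np.average(d, weights=w)
    numpy_raised = False
except ZeroDivisionError:
    numpy_raised = True

# my_avg divides by zero and returns a non-finite value instead
with np.errstate(divide="ignore", invalid="ignore"):
    fast_result = my_avg(d, w)

print(numpy_raised, fast_result)
```

Whether this matters depends on how much control you have over the input, which is part of the gain / costs analysis.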

#### Array creation is costly 

In [44]:
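nx = 200  # assumed grid size: nx is not defined in the cells shown here
dx = 100
surface_h = np.linspace(3000, 1000, nx)

def my_grad(surface_h, dx):
    gradient = np.zeros(surface_h.shape)
    gradient[1:-1] = (surface_h[2:] - surface_h[:-2]) / (2 * dx)  # centered differences, no dependence on the global nx
    gradient[[-1, 0]] = 0
    return gradient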

def grad_np(surface_h, dx):
    gradient = np.gradient(surface_h, dx)
    gradient[[-1, 0]] = 0
    return gradient

In [45]:
np.testing.assert_allclose(grad_np(surface_h, dx), my_grad(surface_h, dx))

In [46]:
%timeit grad_np(surface_h, dx)

21 µs ± 2.06 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [47]:
%timeit my_grad(surface_h, dx)

5.51 µs ± 53.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
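A related way to avoid array creation is to reuse a preallocated buffer through the `out=` argument of numpy ufuncs. The sketch below uses made-up arrays and function names. Whether it pays off depends on the array size and on how often the function is called, so measure it with `%timeit` before adopting it.

```python
import numpy as np

a = np.linspace(0.0, 1.0, 1000)
b = np.linspace(1.0, 2.0, 1000)


def product_new_array(a, b):
    # Allocates a fresh result array on every call
    return a * b


def product_reuse(a, b, out):
    # Writes the result into an existing array: no new allocation
    np.multiply(a, b, out=out)
    return out


buffer = np.empty_like(a)  # allocated once, outside of the hot loop
result = product_reuse(a, b, buffer)

np.testing.assert_allclose(result, product_new_array(a, b))
```

The price is that `buffer` is overwritten on each call, so a result that must be kept needs an explicit copy.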


#### Vectorization is awesome

Examples: salem, MetPy.
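As a generic illustration (not taken from salem or MetPy), here is the same unit conversion written as an explicit loop and as a vectorized expression. Both function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
temps_k = rng.uniform(240, 300, size=10_000)


def to_celsius_loop(temps):
    # One Python-level operation per element
    out = np.empty_like(temps)
    for i in range(temps.size):
        out[i] = temps[i] - 273.15
    return out


def to_celsius_vectorized(temps):
    # A single numpy operation on the whole array
    return temps - 273.15


np.testing.assert_allclose(to_celsius_loop(temps_k), to_celsius_vectorized(temps_k))
```

Timing both with `%timeit` usually shows the vectorized version to be much faster, because the loop runs in compiled code instead of in the Python interpreter.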