# Introduction to Python V: modules and testing

## Content
- How to write and use modules?
- How to use testing frameworks?

## Reminder: Jupyter notebooks
- To run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>.
- To get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pytest
from regression import mean
from regression import scalar_product
from regression import linear_regression
from langevin import langevin

## Example I: linear regression

We consider a linear regression problem: an experiment yielded data points $(x_n, y_n)$, $n=1,\dots,N$. In this example, `y_true` is the noise-free ground truth, while the observed values `y_observed` deviate from it because of experimental inaccuracies.

In [None]:
x = np.random.uniform(low=0, high=10, size=500)
y_true = 0.5 * x + 1
y_observed = y_true + np.random.normal(size=x.shape)

plt.scatter(x, y_observed, s=1, label='observation')
plt.plot(x, y_true, color='C1', label='ground truth')
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.legend();

We now use `linear_regression()` as provided by the `regression` module to estimate the best linear model to describe the $(x_n,y_n)$ relation:

In [None]:
plt.scatter(x, y_observed, s=1, label='observation')
plt.plot(x, y_true, color='C1', label='ground truth')
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')

slope, const = linear_regression(x, y_observed)

x_model = np.linspace(x.min(), x.max(), 20)
y_model = slope * x_model + const

plt.plot(x_model, y_model, '--o', color='C2', label='model')
plt.legend()

print(f'model: y = {slope:.3f} * x + {const:.3f}')

### Black box testing

In this part, we try to write tests blindly, i.e., without looking at the actual implementation of `mean()`, `scalar_product()`, and `linear_regression()`. These tests should tell us whether the functions behave as we expect. 

We begin with `mean()`; our only sources of information are the function's signature

```Python
mean(a: iterable) -> float
```

and a mathematical expression

$$\bar{a} = \frac{1}{N}\sum_{n=0}^{N-1} a_n$$

In [None]:
def test_mean():
    assert mean([0]) == 0
    assert mean([0, 0]) == 0
    assert mean([1]) == 1
    assert mean([1, 1]) == 1
    assert mean([float(i + 1) for i in range(100)]) == 50.5


def test_mean_border_cases():
    with pytest.raises(TypeError):
        mean()
    with pytest.raises(TypeError):
        mean(1)
    with pytest.raises(TypeError):
        mean('hello, world')
    assert mean([]) == 0


test_mean()
test_mean_border_cases()

We have located a possible first issue: for an empty iterable, `mean()` raises a `ZeroDivisionError`.

Next, we write unit tests for

```Python
scalar_product(a: iterable, b: iterable) -> float
```

$$\left\langle \mathbf{a},\mathbf{b} \right\rangle = \sum\limits_{n=0}^{N-1} a_n b_n$$

In [None]:
def test_scalar_product_orthogonal():
    assert scalar_product([1, 1], [0, 0]) == 0.0
    assert scalar_product([0, 1], [1, 0]) == 0.0
    assert scalar_product([1, 1], [1, -1]) == 0.0


def test_scalar_product_squared_norm():
    assert scalar_product([1, 1], [1, 1]) == 2.0
    assert scalar_product([3, 4], [3, 4]) == 5**2


def test_scalar_product_border_cases():
    with pytest.raises(TypeError):
        scalar_product()
    with pytest.raises(TypeError):
        scalar_product(1, 1)
    with pytest.raises(TypeError):
        scalar_product([1], 1)
    with pytest.raises(TypeError):
        scalar_product(1, [1])
    with pytest.raises(ValueError):
        scalar_product([1, 1], [1])
    with pytest.raises(ValueError):
        scalar_product([1], [1, 1])
    with pytest.raises(TypeError):
        scalar_product('hello', 'world')
    assert scalar_product([], []) == 0


test_scalar_product_orthogonal()
test_scalar_product_squared_norm()
test_scalar_product_border_cases()

This function seems to be quite predictable.
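One caveat about the tests above: they compare floats with `==`, which works here only because every input and result is exactly representable. The sketch below shows how this can break for other inputs and how a tolerance-based comparison avoids it. `scalar_product_sketch()` is a hypothetical stand-in written for this illustration, not the implementation from the `regression` module.

```python
import math


def scalar_product_sketch(a, b):
    """Hypothetical stand-in that follows the formula above term by term."""
    if len(a) != len(b):
        raise ValueError('a and b must have the same length')
    total = 0.0
    for a_n, b_n in zip(a, b):
        total += a_n * b_n
    return total


result = scalar_product_sketch([0.1, 0.2], [1.0, 1.0])

# Exact comparison fails because 0.1 + 0.2 is not exactly 0.3 in binary floats
exact_match = result == 0.3

# Comparison within a relative tolerance succeeds
close_match = math.isclose(result, 0.3, rel_tol=1e-12)

print(f'result={result!r}, exact_match={exact_match}, close_match={close_match}')
```

In a pytest-based suite, `pytest.approx` serves the same purpose as `math.isclose()` here.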

Finally, we write integration tests (as we reuse `mean()` and `scalar_product()`) for

```Python
linear_regression(x: iterable, y: iterable) -> (float, float)
```

$$\begin{eqnarray*}
\textrm{slope} & = & \frac{\sum_{n=0}^{N-1} \left( x_n - \bar{x} \right) \left( y_n - \bar{y} \right)}{\sum_{n=0}^{N-1} \left( x_n - \bar{x} \right)^2} \\[0.5em]
\textrm{const} & = & \bar{y} - \textrm{slope } \bar{x}
\end{eqnarray*}$$
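A common black-box strategy is to compare the function under test against an independent transcription of the formula. The following is a minimal sketch of such an oracle in plain Python. The name `reference_linear_regression()` is hypothetical, and the code is not meant to reflect how the `regression` module is actually implemented.

```python
import math


def reference_linear_regression(x, y):
    """Hypothetical oracle: direct transcription of the two formulas above."""
    if len(x) != len(y):
        raise ValueError('x and y must have the same length')
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((x_n - x_bar) * (y_n - y_bar) for x_n, y_n in zip(x, y))
    denominator = sum((x_n - x_bar) ** 2 for x_n in x)
    slope = numerator / denominator
    const = y_bar - slope * x_bar
    return slope, const


# Noise-free data on the line y = 0.5 * x + 1 must be recovered
x_exact = [0.0, 1.0, 2.0, 3.0]
y_exact = [0.5 * x_n + 1.0 for x_n in x_exact]
ref_slope, ref_const = reference_linear_regression(x_exact, y_exact)

assert math.isclose(ref_slope, 0.5)
assert math.isclose(ref_const, 1.0)
```

Feeding the same noise-free data to `linear_regression()` and comparing both results within a tolerance gives a test that does not depend on hand-computed values.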

In [None]:
def test_linear_regression():
    slope, const = linear_regression([0, 1], [0, 0])
    assert slope == 0
    assert const == 0
    slope, const = linear_regression([0, 1], [1, 1])
    assert slope == 0
    assert const == 1
    slope, const = linear_regression([0, 1], [1, 0])
    assert slope == -1
    assert const == 1


def test_linear_regression_border_cases():
    with pytest.raises(TypeError):
        linear_regression()
    with pytest.raises(TypeError):
        linear_regression(1)
    with pytest.raises(TypeError):
        linear_regression(1, 1)
    with pytest.raises(TypeError):
        linear_regression(1, [1])
    with pytest.raises(TypeError):
        linear_regression([1], 1)
    with pytest.raises(ValueError):
        linear_regression([1, 1], [1])
    with pytest.raises(ValueError):
        linear_regression([1], [1, 1])
    slope, const = linear_regression([], [])
    assert slope == 0
    assert const == 0


test_linear_regression()
test_linear_regression_border_cases()

And, again, we find the `ZeroDivisionError` raised by `mean()` in the integration test for our `linear_regression()`.

We now must decide how to deal with this situation. Do we keep the current behaviour and deal with the raised exception? Or do we catch this issue within `mean()` and use a sensible fix, e.g., set `mean([])` to zero?
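The two options can be sketched side by side. All three function names below are hypothetical and exist only for this illustration; `mean_current()` is assumed to mimic the observed behaviour of `mean()` and may differ from the real implementation.

```python
def mean_current(a):
    """Assumed current behaviour: raises ZeroDivisionError for empty input."""
    values = list(a)
    return sum(values) / len(values)


def mean_with_fallback(a, default=0.0):
    """Option 1: keep mean_current() as is and handle the exception in the caller."""
    try:
        return mean_current(a)
    except ZeroDivisionError:
        return default


def mean_fixed(a):
    """Option 2: catch the empty case inside the function and return zero."""
    values = list(a)
    if not values:
        return 0.0
    return sum(values) / len(values)


print(mean_with_fallback([]), mean_fixed([]), mean_fixed([1, 2, 3]))
```

Whichever option is chosen, the border-case tests should state it explicitly, so that a later change of behaviour makes a test fail instead of going unnoticed.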

## Example II: a Langevin integrator

The `langevin` module provides a function of the same name, `langevin()`, with the following signature and docstring:

```Python
def langevin(
        force, n_steps, x_init, v_init, mass,
        time_step=0.001, damping=0.1, beta=1.0):
    """Langevin integrator for initial value problems

    This function implements the BAOAB algorithm of Benedict Leimkuhler
    and Charles Matthews. See J. Chem. Phys. 138, 174102 (2013) for
    further details.

    Arguments:
        force (function): computes the forces of a single configuration
        n_steps (int): number of integration steps
        x_init (numpy.ndarray(n, d)): initial configuration
        v_init (numpy.ndarray(n, d)): initial velocities
        mass (numpy.ndarray(n)): particle masses
        time_step (float): time step for the integration
        damping (float): damping term, use zero if not coupled
        beta (float): inverse temperature

    Returns:
        x (numpy.ndarray(n_steps + 1, n, d)): configuration trajectory
        v (numpy.ndarray(n_steps + 1, n, d)): velocity trajectory
    """
```

In [None]:
def harmonic_potential(x):
    return 0.5 * np.sum(x**2, axis=(-2, -1))


def harmonic_force(x):
    return -x


def kinetic_energy(v, mass):
    return 0.5 * np.sum(v**2 * mass[None, :, None], axis=(-2, -1))


x_init = np.array([[1.0]])
v_init = np.array([[0.0]])
mass = np.array([1.0])

fig, axes = plt.subplots(1, 3, figsize=(10, 4), sharex=True, sharey=True)
for ax, damping in zip(axes.flat, (0, 0.001, 0.005)):
    ax.set_title(f'damping={damping}')
    x, v = langevin(harmonic_force, 3000, x_init, v_init, mass, time_step=0.01, damping=damping)
    ax.scatter(x.reshape(-1), v.reshape(-1), c=np.arange(x.size), s=1)
    ax.set_aspect('equal')
    ax.set_xlabel(r'$x$')
axes[0].set_ylabel(r'$v$')
fig.tight_layout()

**Your task**: design a test suite for the `langevin` module!
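As a possible starting point, the sketch below shows two properties such a suite could check: the trajectory length and starting frame, and energy conservation without damping. It rests on two assumptions. First, with `damping=0` the standard BAOAB splitting has no stochastic part and reduces to velocity Verlet, so a plain-Python velocity Verlet integrator for a single particle in one dimension is used as a stand-in here. Second, the first trajectory frame is assumed to equal the initial condition. `velocity_verlet_1d()` and `total_energy()` are hypothetical helpers, not part of the `langevin` module.

```python
def velocity_verlet_1d(force, n_steps, x_init, v_init, mass, time_step=0.001):
    """Hypothetical 1D stand-in for langevin(..., damping=0)."""
    x, v = [x_init], [v_init]
    for _ in range(n_steps):
        v_half = v[-1] + 0.5 * time_step * force(x[-1]) / mass
        x_new = x[-1] + time_step * v_half
        v_new = v_half + 0.5 * time_step * force(x_new) / mass
        x.append(x_new)
        v.append(v_new)
    return x, v


def total_energy(x, v, mass):
    """Kinetic plus harmonic potential energy of a single 1D particle."""
    return 0.5 * mass * v**2 + 0.5 * x**2


def test_trajectory_length_and_start():
    x, v = velocity_verlet_1d(lambda q: -q, 100, 1.0, 0.0, 1.0, time_step=0.01)
    assert len(x) == len(v) == 101  # n_steps + 1 frames
    assert x[0] == 1.0
    assert v[0] == 0.0


def test_energy_is_conserved_without_damping():
    x, v = velocity_verlet_1d(lambda q: -q, 3000, 1.0, 0.0, 1.0, time_step=0.01)
    e_init = total_energy(x[0], v[0], 1.0)
    drift = max(abs(total_energy(x_n, v_n, 1.0) - e_init) for x_n, v_n in zip(x, v))
    assert drift < 1e-3  # loose bound; the integrator is not exactly energy conserving


test_trajectory_length_and_start()
test_energy_is_conserved_without_damping()
```

To test the real integrator, the same checks would call `langevin()` with NumPy arrays as in the cell above. Tests with non-zero damping are stochastic and would need a fixed random seed or statistical tolerances.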