# Introduction to Python III: numpy and matplotlib


## Content
- numpy.ndarray creation and usage
- basic plotting with matplotlib

## Remember Jupyter notebooks
- To run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>.
- To get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>.

## A notebook "preamble"
The first code block prepares our notebook by specifying how to render plots and importing two required packages.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

## ndarray: numpy's central data structure

In [None]:
a = list(range(5))
print(a, type(a))

In [None]:
b = np.asarray(a)
print(b, type(b))

In [None]:
a = [[0, 1, 2], [3, 4, 5]]
print(a, type(a))

In [None]:
b = np.asarray(a)
print(b, type(b))

In [None]:
print(b.size)
print(b.ndim)
print(b.shape)
print(b.dtype)

How does `numpy` select the appropriate `dtype`?

In [None]:
a = np.array([0, 1])
print(a, a.dtype)

In [None]:
a = np.array([0, 1, 2.0])
print(a, a.dtype)

In [None]:
a = np.array([0, 1, 2.0, 3+0j])
print(a, a.dtype)

In [None]:
a = np.array([0, 1, 2.0, 3+0j, 'four'])
print(a, a.dtype)

In [None]:
a = np.array([0, 1, 2.0, 3+0j, 'four', None])
print(a, a.dtype)
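
The cells above suggest that `numpy` picks the "smallest" `dtype` that can represent every element. The following sketch makes that rule explicit. The helper `inferred_dtype` is a hypothetical name introduced only for this illustration; `np.result_type` is the `numpy` function that applies the same promotion rules to dtypes directly.

```python
import numpy as np


def inferred_dtype(values):
    """Hypothetical helper: return the dtype numpy infers for a list."""
    return np.array(values).dtype


# Integers only: a platform-dependent integer type.
print(inferred_dtype([0, 1]))
# One float promotes all elements to float64.
print(inferred_dtype([0, 1, 2.0]))
# One complex number promotes all elements to complex128.
print(inferred_dtype([0, 1, 2.0, 3 + 0j]))
# A None cannot be represented numerically, so numpy falls back to object.
print(inferred_dtype([0, 'four', None]))

# The same promotion rules, applied to dtypes instead of values.
print(np.result_type(np.int64, np.float64))
print(np.result_type(np.float64, np.complex128))
```
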

Creating arrays with "default" values.

In [None]:
a = np.zeros((2, 3, 4), dtype=np.float64)
print(a)
print(a.size, a.ndim, a.shape)

In [None]:
# np.int was removed in NumPy 1.24; the builtin int works as a dtype
print(np.ones((4, 3, 2), dtype=int))

In [None]:
a = np.arange(16)
print(a)

The `shape` can be changed...

In [None]:
a = a.reshape(-1, 4)
print(a)

In [None]:
a.reshape(-1)

... and the `dtype`, too:

In [None]:
a = a.astype(np.float64)
print(a)
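
In the `reshape` calls above, `-1` asks `numpy` to infer that dimension from the total number of elements. A minimal sketch of this behaviour, using only the calls shown above (the variable names are chosen for this illustration):

```python
import numpy as np

a = np.arange(16)
b = a.reshape(-1, 4)      # -1 is inferred as 16 / 4 = 4 rows
c = b.reshape(-1)         # a single -1 flattens back to one dimension
d = b.astype(np.float64)  # astype returns a new array, b keeps its dtype

print(b.shape, c.shape)
print(b.dtype, d.dtype)

# A shape that does not divide the number of elements is rejected.
try:
    a.reshape(-1, 5)
except ValueError as error:
    print('cannot reshape:', error)
```
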

You can index like a nested `list`/`tuple`...

In [None]:
print(a[-1][0])

... or via the `numpy` way:

In [None]:
print(a[-1, 0])

Slicing works, too:

In [None]:
print(a[:, 0])
print(a[0, :])

Even for assignments!

In [None]:
a[:, -1] *= -1
print(a)

You can (implicitly) iterate over the first index:

In [None]:
for b in a:
    print(b)

In [None]:
for b in a.T:
    print(b)

Can we add/subtract/... something to/from an array?

In [None]:
a = np.arange(5)
print(a)

In [None]:
print(a + 1)

In [None]:
print(a - 1.0)

In [None]:
print(a * (1+0j))

In [None]:
print(a / 2)

In [None]:
print(a // 2)

In [None]:
print(a**2)

In [None]:
print(a % 2)

What about adding/... two arrays?

In [None]:
b = np.ones(a.size) * 2
print(b)

In [None]:
print(a + b)

In [None]:
print(a - b)

In [None]:
print(a * b)

In [None]:
print(a / b)

We can evaluate functions on the whole array in one step:

In [None]:
print(np.sqrt(a))

In [None]:
print(np.exp(a))

In [None]:
print(np.log(a + 1))

In [None]:
print(np.sin(a))

Summations/multiplications over the whole array or selected axes are possible:

In [None]:
a = np.ones((3, 5))
print(a)
print(a.sum())
print(a.sum(axis=0))
print(a.sum(axis=1))

In [None]:
a = np.ones((3, 5)) * 2
print(a.prod())
print(a.prod(axis=0))
print(a.prod(axis=1))

In [None]:
a = np.ones((5, 3))
print(np.sqrt(np.sum(a**2, axis=-1)))

In [None]:
print(np.linalg.norm(a, axis=-1))
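
A reduction along an axis removes that axis from the shape of the result. The sketch below uses an array with distinct values so that the effect of `axis` is easier to see than with `np.ones`. It also checks that the manual norm from the previous cells agrees with `np.linalg.norm`. The `keepdims` argument is an addition not used elsewhere in this notebook.

```python
import numpy as np

a = np.arange(15, dtype=np.float64).reshape(3, 5)

col_sums = a.sum(axis=0)             # collapses the 3 rows    -> shape (5,)
row_sums = a.sum(axis=1)             # collapses the 5 columns -> shape (3,)
kept = a.sum(axis=1, keepdims=True)  # keeps the reduced axis  -> shape (3, 1)
print(col_sums.shape, row_sums.shape, kept.shape)

# The manual Euclidean norm and np.linalg.norm give the same row norms.
manual = np.sqrt(np.sum(a**2, axis=-1))
builtin = np.linalg.norm(a, axis=-1)
print(np.allclose(manual, builtin))
```
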

## Vectorisation
Computing all pairwise distances between $N$ points can be an expensive task as the number of pairs grows as $\mathcal{O}(N^2)$.

In [None]:
def get_distances(coordinates):
    distances = np.zeros((len(coordinates), len(coordinates)))
    for i in range(len(coordinates)):
        for j in range(len(coordinates)):
            distances[i, j] = np.linalg.norm(
                coordinates[i] - coordinates[j],
                axis=-1)
    return distances


coordinates = np.random.rand(1000, 3)
%timeit get_distances(coordinates)

We can, of course, exploit symmetry:

In [None]:
def get_distances2(coordinates):
    distances = np.zeros((len(coordinates), len(coordinates)))
    for i in range(1, len(coordinates)):
        for j in range(i):
            distances[i, j] = np.linalg.norm(
                coordinates[i] - coordinates[j],
                axis=-1)
            distances[j, i] = distances[i, j]
    return distances


%timeit get_distances2(coordinates)

But **vectorisation** is much faster and easier to write:

In [None]:
def get_distances3(coordinates):
    return np.linalg.norm(
        coordinates[:, None, :] - coordinates[None, :, :],
        axis=-1)


%timeit get_distances3(coordinates)

In the above example, we traded loops for a higher memory requirement. To see how that works, let's look at what a `None` does for array indexing:

In [None]:
a = np.arange(5)
print(a)

In [None]:
print(a[:, None])

In [None]:
print(a[None, :])

In [None]:
print(a[None, :, None])
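
Each `None` inserts a new axis of length 1, and `numpy` stretches such axes to match the other operand (broadcasting). The following sketch traces the shapes of the vectorised distance computation for a small example and compares the result with an explicit double loop. The point count of 4 and the seeded generator are choices made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
coordinates = rng.random((4, 3))  # 4 points in 3 dimensions

left = coordinates[:, None, :]    # shape (4, 1, 3)
right = coordinates[None, :, :]   # shape (1, 4, 3)
differences = left - right        # broadcast to shape (4, 4, 3)
distances = np.linalg.norm(differences, axis=-1)  # shape (4, 4)
print(left.shape, right.shape, differences.shape, distances.shape)

# Reference result from an explicit double loop.
reference = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        reference[i, j] = np.linalg.norm(coordinates[i] - coordinates[j])
print(np.allclose(distances, reference))
```

The intermediate `differences` array holds $N \times N \times 3$ values. For the 1000 points used in the timings above, that is $3 \times 10^6$ `float64` values or about 24 MB, which is the memory cost of avoiding the loops.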

In [None]:
a = np.arange(16).reshape(4, -1)
b = a[1:-1, 1:-1]
print(a)
print(b)

In [None]:
b *= -1
print(a)
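
The last two cells show that a slice is a *view*: `b` shares its memory with `a`, so modifying `b` in place also changes `a`. A short sketch of how to check this and how to obtain an independent copy instead; the variable names `view` and `copy` are chosen for this illustration.

```python
import numpy as np

a = np.arange(16).reshape(4, -1)
view = a[1:-1, 1:-1]         # basic slicing returns a view on a's data
copy = a[1:-1, 1:-1].copy()  # .copy() allocates independent memory

print(np.shares_memory(a, view))
print(np.shares_memory(a, copy))

copy *= -1  # leaves a untouched
print(a)
view *= -1  # flips the sign of the inner 2x2 block of a
print(a)
```
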

## Plotting
Let's try to visualise a function:

In [None]:
x = np.linspace(-np.pi, np.pi, 100)
s = np.sin(x)

plt.plot(x, s)
plt.xlabel('$x$ / rad', fontsize=15)
plt.ylabel(r'$\sin(x)$', fontsize=15)

In [None]:
c = np.cos(x)

plt.plot(x, s, label='sin')
plt.plot(x, c, label='cos')
plt.xlabel('$x$ / rad', fontsize=15)
plt.legend(fontsize=15)

Let's revisit the $\pi$ sampling exercise.

In [None]:
def sample_pi2(n):
    n_hits = np.sum(np.linalg.norm(np.random.rand(n, 2), axis=1) < 1.0)
    return 4.0 * n_hits / n


%timeit sample_pi2(1000)
%timeit sample_pi2(10000)
%timeit sample_pi2(100000)

In [None]:
n_values = [
    10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]

std = [np.std([sample_pi2(n) for _ in range(10000)])
       for n in n_values]

In [None]:
f = np.sqrt(n_values[0]) * std[0]
inv_sqrt = f / np.sqrt(n_values)

fig, ax = plt.subplots()
ax.plot(n_values, std, linewidth=2, label='data')
ax.plot(n_values, inv_sqrt, 'o', label='model')
ax.fill_between(n_values, 0, std, alpha=0.3)
ax.semilogx()
ax.legend()
ax.set_xlabel(r'sample size for $\pi$ estimation')
ax.set_ylabel(r'standard deviation')
fig.tight_layout()
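
The "model" curve above is scaled to match the first data point. Under the assumption that each random point is an independent hit with probability $p = \pi/4$, the number of hits is binomially distributed, and the standard deviation of the estimate is $4\sqrt{p(1-p)/n}$. The sketch below compares this closed form with a sampled value. The names `theoretical_std` and `sample_pi` and the seeded generator are introduced for this illustration and are not part of the cells above.

```python
import numpy as np


def theoretical_std(n):
    """Standard deviation of 4 * n_hits / n for binomial n_hits."""
    p = np.pi / 4.0  # probability that a point lands inside the quarter circle
    return 4.0 * np.sqrt(p * (1.0 - p) / n)


def sample_pi(n, rng):
    """Estimate pi from n uniform random points in the unit square."""
    points = rng.random((n, 2))
    n_hits = np.sum(np.linalg.norm(points, axis=1) < 1.0)
    return 4.0 * n_hits / n


rng = np.random.default_rng(42)
n = 100
estimates = [sample_pi(n, rng) for _ in range(2000)]
print(np.std(estimates), theoretical_std(n))
```

The two printed values should agree to within a few percent; the remaining difference is the statistical error of estimating a standard deviation from 2000 repetitions.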

## Exercise: scalar product

Implement a function

```Python
def scalar_product(a, b):
    pass
```

which implements the scalar product

$$\left\langle \mathbf{a},\mathbf{b} \right\rangle = \sum\limits_{n=0}^{N-1} a_n b_n$$

where $N$ is the number of elements in each of $\mathbf{a}$ and $\mathbf{b}$. Both variables `a` and `b` can be `list`s or `tuple`s, and their elements should be numerical (`float` or `int`).

**Bonus**: the function should not return a numerical result if the two variables have different lengths or contain non-numerical elements.

In [None]:
def scalar_product(a, b):
    pass

In [None]:
assert scalar_product([0] * 100, [1] * 100) == 0
assert scalar_product([1] * 100, [1, -1] * 50) == 0
assert scalar_product([1] * 100, range(100)) == 99 * 50
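
Once your own implementation passes, one possible cross-check is `numpy`'s `np.dot`, which computes the same sum for one-dimensional inputs. This is only a reference for comparison, not the intended solution of the exercise.

```python
import numpy as np

a = [1] * 100
b = list(range(100))

# np.dot of two 1-d sequences is the sum of element-wise products.
reference = np.dot(a, b)
print(reference)

# Sequences of different lengths are rejected with a ValueError.
try:
    np.dot([1, 2], [1, 2, 3])
except ValueError as error:
    print('length mismatch:', error)
```
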

## Exercise: arithmetic mean

Implement a function
```Python
def mean(a):
    pass
```
which computes the arithmetic mean of a sequence:

$$\bar{a} = \frac{\sum_{n=0}^{N-1} a_n}{N}$$

where $N$ is the number of elements $a_0,\dots,a_{N-1}$. The parameter `a` may be any type of `iterable` with only numerical elements.

**Bonus**: for a sequence of length 0, e.g., an empty list, the function should return 0.

In [None]:
def mean(a):
    pass

In [None]:
assert mean(range(100)) == 99 * 0.5
assert mean([]) == 0
assert mean([1] * 1000) == 1

## Exercise: linear regression

Implement a function
```Python
def linear_regression(x, y):
    slope = None
    const = None
    return slope, const
```
which performs a simple linear regression

$$\begin{eqnarray*}
\textrm{slope} & = & \frac{\sum_{n=0}^{N-1} \left( x_n - \bar{x} \right) \left( y_n - \bar{y} \right)}{\sum_{n=0}^{N-1} \left( x_n - \bar{x} \right)^2} \\[0.5em]
\textrm{const} & = & \bar{y} - \textrm{slope } \bar{x}
\end{eqnarray*}$$

for value pairs $(x_0, y_0),\dots,(x_{N-1},y_{N-1})$. The parameters `x` and `y` may be any type of `iterable` with only numerical elements; both must have the same length.

In [None]:
def linear_regression(x, y):
    slope = None
    const = None
    return slope, const

In [None]:
x = [10, 14, 16, 15, 16, 20]
y = [ 1,  3,  5,  6,  5, 11]
slope, const = linear_regression(x, y)
assert 0.97 < slope < 0.99
assert -9.72 < const < -9.70
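
To cross-check your result, you can compare it with `np.polyfit`, which fits a polynomial of a given degree by least squares and returns the coefficients from highest to lowest order. For degree 1 these are the slope and the constant. The sketch below is a reference for comparison only, not the intended solution; the names `reference_slope` and `reference_const` are chosen for this illustration.

```python
import numpy as np

x = [10, 14, 16, 15, 16, 20]
y = [1, 3, 5, 6, 5, 11]

# Degree-1 least-squares fit: coefficients are (slope, const).
reference_slope, reference_const = np.polyfit(x, y, deg=1)
print(reference_slope, reference_const)
```
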