# Introduction to Python VI: pytorch basics

## Content
- tensors vs arrays
- automatic differentiation

## Prerequisites
Visit [pytorch.org](http://pytorch.org) and follow the installation instructions.

## Remember jupyter notebooks
- To run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>.
- To get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch.utils import data

## `torch.Tensor` vs `numpy.ndarray`
We shall see in the next few cells how to create `pytorch`'s main data structure: tensors. We will also see that the syntax is really close to that of `numpy`.

In [None]:
print(torch.ones(3, 5))

In [None]:
print(torch.zeros(4, 2))

In [None]:
print(torch.arange(5))
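
For comparison, here is a small sketch that places the corresponding `numpy` calls next to their `pytorch` counterparts. The variable names are purely illustrative.

```python
import numpy as np
import torch

# numpy expects the shape as a single tuple; torch also accepts separate integers
ones_np = np.ones((3, 5))
ones_pt = torch.ones(3, 5)

# both libraries provide arange() with the same meaning
range_np = np.arange(5)
range_pt = torch.arange(5)

print(ones_np.shape, tuple(ones_pt.shape))
print(range_np.tolist(), range_pt.tolist())
```

Note that `shape` is a plain tuple in `numpy` and a `torch.Size` object (a tuple subclass) in `pytorch`.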

Calling the standard `Tensor` constructor without arguments creates an empty tensor:

In [None]:
a = torch.Tensor()
print(a)
print(a.dim())
print(a.shape)
print(a.type())

It can be initialised with (nested) lists...

In [None]:
a = torch.Tensor([0, 1, 2])
print(a)
print(a.dim())
print(a.shape)
print(a.type())

In [None]:
a = torch.Tensor([[0, 1, 2], [3, 4, 5]])
print(a)
print(a.dim())
print(a.shape)
print(a.type())

... or with `numpy.ndarray` objects:

In [None]:
a = torch.Tensor(np.asarray([[0, 1, 2], [3, 4, 5]]))
print(a)
print(a.dim())
print(a.shape)
print(a.type())

The standard `Tensor` defaults to single precision (32 bit), independent of the initial data.

The `tensor()` function, however, uses the same type as the supplied data:

In [None]:
a = torch.tensor(np.array([[0, 1, 2], [3, 4, 5]]))
print(a)
print(a.dim())
print(a.shape)
print(a.type())

In [None]:
a = torch.tensor(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]))
print(a)
print(a.dim())
print(a.shape)
print(a.type())
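
The difference between the two constructors can be checked directly via the `dtype` attribute. The following is a minimal sketch, assuming the default floating-point type has not been changed from single precision; the array dtypes are set explicitly so the result does not depend on the platform.

```python
import numpy as np
import torch

ints = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int64)
floats = ints.astype(np.float64)

legacy = torch.Tensor(ints)            # constructor: converts to the default float type
inferred_int = torch.tensor(ints)      # function: keeps the integer type of the data
inferred_float = torch.tensor(floats)  # function: keeps double precision

print(legacy.dtype)
print(inferred_int.dtype)
print(inferred_float.dtype)
```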

There is a dedicated function, `torch.from_numpy()`, to create a `Tensor` from a `numpy.ndarray`:

In [None]:
a = torch.from_numpy(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], dtype=np.float32))
print(a)
print(a.dim())
print(a.shape)
print(a.type())

And, like in `numpy`, you can change a `Tensor`'s type:

In [None]:
a = torch.from_numpy(np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])).float()
print(a)
print(a.dim())
print(a.shape)
print(a.type())

A `torch.Tensor` created with `torch.from_numpy()` shares its memory with the original `numpy.ndarray`, so changes to one are visible in the other:

In [None]:
a = np.arange(6).reshape(-1, 3)
b = torch.from_numpy(a)
print(a)
print(b)

In [None]:
a[:, 1] *= -1
print(a)
print(b)
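
To make the memory sharing explicit, this sketch contrasts `torch.from_numpy()` with `torch.tensor()`, which copies the data, and uses `np.shares_memory()` to check the round trip through the `numpy()` method. The variable names are illustrative only.

```python
import numpy as np
import torch

a = np.arange(6).reshape(-1, 3)
shared = torch.from_numpy(a)  # uses the same memory as a
copied = torch.tensor(a)      # makes an independent copy

a[0, 0] = 100  # modify the array after creating both tensors

print(shared[0, 0].item())  # reflects the change
print(copied[0, 0].item())  # still holds the old value

back = shared.numpy()  # converting back does not copy either
print(np.shares_memory(a, back))
```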

A type cast, however, creates a copy and thereby disconnects array and tensor:

In [None]:
a = np.arange(6).reshape(-1, 3)
b = torch.from_numpy(a.astype(np.float64))  # np.float was removed from recent numpy versions
c = torch.from_numpy(a).float()

a[:, 1] = -1

print(a)
print(b)
print(c)

You can use `torch.Tensor`s (nearly) like `numpy.ndarray`s:

In [None]:
a = torch.arange(6).float()

print(a + 1)
print(a - 1)
print(a * 2)
print(a / 2)
print(a // 2)
print(a % 2)
print(a**2)

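Note that recent `pytorch` versions promote data types much like `numpy` does, so dividing an integer tensor by a float yields a floating-point tensor (older versions kept the original `dtype`):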

In [None]:
a = torch.LongTensor([[1, 2, 3], [4, 5, 6]])
print(a / 2.0)
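
The resulting data types can be inspected directly. This is a small sketch, assuming a recent `pytorch` release with type promotion and the default single-precision float type.

```python
import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])  # Python integers give an int64 tensor

true_div = a / 2.0   # division by a float promotes to the default float type
floor_div = a // 2   # floor division of integers stays integer
product = a * 2      # integer times integer stays integer

print(a.dtype)
print(true_div.dtype)
print(floor_div.dtype)
print(product.dtype)
```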

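Older `pytorch` versions were **really** strict about using the right data type and raised an error when, e.g., single and double precision tensors were mixed. Recent versions promote the result to the wider type instead: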

In [None]:
a = torch.Tensor([[1, 2, 3], [4, 5, 6]])
b = torch.Tensor([[1, 2, 3], [4, 5, 6]]).double()

try:
    print(a + b)
except Exception as e:
    print(type(e))
    print(e)
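
Regardless of the installed version, an explicit cast makes the intended precision unambiguous. Here is a minimal sketch using the `to()` method, assuming the default float type is single precision.

```python
import torch

a = torch.ones(2, 3)                       # default float type (float32)
b = torch.ones(2, 3, dtype=torch.float64)  # double precision

as_double = a.to(torch.float64) + b  # cast up before adding
as_single = a + b.to(torch.float32)  # or cast down, accepting the lower precision

print(as_double.dtype)
print(as_single.dtype)
```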

Operating on `numpy.ndarray`s usually creates new objects:

In [None]:
a = np.ones((3, 5))
b = np.exp(a)
print(id(a))
print(id(b))

To perform operations **in place**, we have to make some (small) effort:

In [None]:
b = np.exp(a, out=a)
print(id(a))
print(id(b))

In [None]:
print(id(a))
a[:] = np.exp(a)
print(id(a))

In `pytorch`, the situation is similar:

In [None]:
a = torch.ones(3, 5)
b = torch.exp(a)
print(id(a))
print(id(b))

In [None]:
b = torch.exp(a, out=a)
print(id(a))
print(id(b))

In [None]:
print(id(a))
a[:] = torch.exp(a)
print(id(a))

`torch.Tensor`s also offer these operations as methods: a trailing underscore (e.g. `exp_()`) marks the in-place variant, while the plain name (e.g. `exp()`) returns a new tensor:

In [None]:
print(id(a))
print(id(a.exp_()))

In [None]:
print(id(a))
print(id(a.exp()))
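
Since `id()` only identifies the Python object, it can help to also look at the underlying memory. The sketch below uses the `data_ptr()` method, which returns the address of a tensor's first element, to tell the two variants apart.

```python
import torch

a = torch.ones(3, 5)
address = a.data_ptr()  # memory address of the first element of a

new_tensor = a.exp()   # allocates fresh memory for the result
same_tensor = a.exp_()  # overwrites the memory of a and returns a itself

print(new_tensor.data_ptr() == address)
print(same_tensor.data_ptr() == address)
print(same_tensor is a)
```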

If we have a GPU available, the `cuda()` method returns a copy of a `torch.Tensor` on the GPU, and all subsequent calculations with that copy are performed there. With the `cpu()` method, we can copy a `torch.Tensor` back from the GPU.

In [None]:
a = torch.randn(1000, 1000)

if torch.cuda.is_available():
    print('We have CUDA!')
    a = a.cuda()
else:
    print('No CUDA :(')

a.exp_().cpu()

What happens, though, if you call `.cuda()` on a `torch.Tensor` without having a CUDA-compatible GPU at your disposal?

In [None]:
try:
    a.cuda()
except Exception as e:
    print(type(e))
    print(e)
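
A common way to avoid this error is to pick the device once and pass it around. The following is a sketch of that pattern with `torch.device` and the `to()` method; the variable names are illustrative.

```python
import torch

# choose the GPU if one is usable, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.randn(1000, 1000).to(device)  # a no-op if a already lives on this device
result = a.exp().cpu()                  # compute on the chosen device, then copy back

print(device)
print(result.device)
```

The same code then runs unchanged on machines with and without a CUDA-compatible GPU.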

Remember the earlier programming exercises `mean(a)`, `scalar_product(a, b)`, and `linear_regression(x, y)`?

Here, we refactor them for `torch.Tensor`s. We only use methods of already existing `torch.Tensor`s as well as the operators `=`, `-`, `*`, and `/`. For `linear_regression(x, y)` we then use `mean(a)` and `scalar_product(a, b)`:

In [None]:
def mean(a):
    return a.sum().div(len(a))

def scalar_product(a, b):
    return a.mul(b).sum()

def linear_regression(x, y):
    x_mean = mean(x)
    y_mean = mean(y)
    x = x.sub(x_mean)
    y = y.sub(y_mean)
    slope = scalar_product(x, y) / x.pow(2).sum()
    const = y_mean - slope * x_mean
    return slope, const

In [None]:
assert -0.1 < mean(torch.randn(1000)) < 0.1
assert scalar_product(torch.Tensor([0, 1, 2]), torch.Tensor([1, 1, 1])) == 3

x = torch.Tensor([10, 14, 16, 15, 16, 20])
y = torch.Tensor([ 1,  3,  5,  6,  5, 11])
slope, const = linear_regression(x, y)
assert 0.97 < slope < 0.99
assert -9.72 < const < -9.70
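
As an independent cross-check, the same slope and intercept can be compared against `np.polyfit`. This sketch recomputes the closed-form solution with the built-in `mean()` method so that it is self-contained; using double precision here is an assumption made only to allow a tight comparison.

```python
import numpy as np
import torch

x = torch.tensor([10, 14, 16, 15, 16, 20], dtype=torch.float64)
y = torch.tensor([1, 3, 5, 6, 5, 11], dtype=torch.float64)

# closed-form least squares: covariance of x and y divided by the variance of x
x_centered = x - x.mean()
y_centered = y - y.mean()
slope = (x_centered * y_centered).sum() / x_centered.pow(2).sum()
const = y.mean() - slope * x.mean()

# reference fit of a first-degree polynomial; coefficients come highest power first
ref_slope, ref_const = np.polyfit(x.numpy(), y.numpy(), deg=1)

print(slope.item(), ref_slope)
print(const.item(), ref_const)
```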

We can make `pytorch` track which operations are performed on a `torch.Tensor` by using the `requires_grad=True` parameter:

In [None]:
a = torch.rand(3)
b = torch.rand_like(a, requires_grad=True)
c = torch.sum(a * b**2)
print(a)
print(b)
print(c)

`pytorch` is now able to differentiate `c` with respect to `b`.
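
As a minimal sketch of what this means in practice: calling `backward()` on the scalar result fills the `grad` attribute of `b`, which can be compared with the derivative worked out by hand, $\partial c / \partial b_i = 2 a_i b_i$.

```python
import torch

a = torch.rand(3)
b = torch.rand_like(a, requires_grad=True)
c = torch.sum(a * b**2)

c.backward()  # computes dc/db and stores it in b.grad

expected = 2 * a * b.detach()  # analytic gradient, detached from the graph
print(b.grad)
print(expected)
print(torch.allclose(b.grad, expected))
```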

Let's make use of that to solve an actual optimisation problem:

In [None]:
def rbf(x, y):
    """Rosenbrock function"""
    return (1 - x)**2 + 100 * (y - x**2)**2


xx, yy = np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-1, 3, 100))
zz = rbf(xx, yy)

fig, ax = plt.subplots(figsize=(5, 5))
ax.contour(xx, yy, zz, np.linspace(51, 2000, 20), colors='k', linewidths=0.1)
ax.contourf(xx, yy, zz, np.linspace(0, 50, 20))
ax.plot([-2, 2], [1, 1], '--', linewidth=1, color='C1')
ax.plot([1, 1], [-1, 3], '--', linewidth=1, color='C1')
ax.set_aspect('equal')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
fig.tight_layout()

We create a starting point and require a gradient for this tensor. We then evaluate the Rosenbrock function at this position and obtain its gradient there via automatic differentiation. Finally, we take a small step along the negative gradient and repeat until we converge to the global minimum at $(1, 1)$.

In short, we locate the minimum of the Rosenbrock function via steepest descent:

In [None]:
xy = torch.tensor([-0.3, 2.8], requires_grad=True)

path, conv = [], []
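while True:
    f = rbf(*xy)
    path.append(xy.detach().numpy().copy())  # record the current position
    conv.append(f.item())
    if conv[-1] < 0.00001:  # stop once the function value is close to zero
        break
    f.backward()  # compute the gradient of f with respect to xy
    with torch.no_grad():  # the update itself must not be tracked by autograd
        xy -= 0.0005 * xy.grad  # step along the negative gradient
    xy.grad.zero_()  # reset the accumulated gradient for the next iteration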
path = np.asarray(path)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].plot(conv)
axes[0].semilogx()
axes[0].semilogy()
axes[0].set_xlabel('steps')
axes[0].set_ylabel('function value')
axes[1].contour(xx, yy, zz, np.linspace(51, 2000, 20), colors='k', linewidths=0.1)
axes[1].contourf(xx, yy, zz, np.linspace(0, 50, 20))
axes[1].plot(*path.T, linewidth=3, color='C3')
axes[1].plot([-2, 2], [1, 1], '--', linewidth=1, color='C1')
axes[1].plot([1, 1], [-1, 3], '--', linewidth=1, color='C1')
axes[1].set_aspect('equal')
axes[1].set_xlabel('$x$')
axes[1].set_ylabel('$y$')
fig.tight_layout()
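
The manual update in the loop above is what `torch.optim.SGD` without momentum performs. The sketch below is an alternative formulation, not the method used in this notebook: the function name `rosenbrock` and the fixed count of 2000 steps are assumptions chosen for illustration, so the run stops well before full convergence.

```python
import torch


def rosenbrock(x, y):
    """Rosenbrock function, same formula as rbf() above."""
    return (1 - x)**2 + 100 * (y - x**2)**2


xy = torch.tensor([-0.3, 2.8], requires_grad=True)
optimizer = torch.optim.SGD([xy], lr=0.0005)  # plain gradient descent, no momentum

initial_value = rosenbrock(*xy).item()
for step in range(2000):  # fixed number of steps, for illustration only
    optimizer.zero_grad()  # reset the gradient from the previous step
    f = rosenbrock(*xy)
    f.backward()           # compute the gradient at the current position
    optimizer.step()       # xy <- xy - lr * gradient
final_value = rosenbrock(*xy).item()

print(initial_value, final_value)
```

Delegating the update to an optimiser object makes it easy to try other schemes later by changing a single line.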

In [None]:
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax, cut in zip(axes.flat, [2, 4, 11, 101, 1001, 2501, 5001, 10001]):
    ax.contour(xx, yy, zz, np.linspace(51, 2000, 20), colors='k', linewidths=0.1)
    ax.contourf(xx, yy, zz, np.linspace(0, 50, 20))
    ax.plot(*path[:cut].T, '-o', markersize=3, linewidth=1, color='C3')
    ax.plot([-2, 2], [1, 1], '--', linewidth=1, color='C1')
    ax.plot([1, 1], [-1, 3], '--', linewidth=1, color='C1')
    ax.set_aspect('equal')
    ax.text(-1.9, -0.8, f'steps: {cut - 1}', fontsize=15)
    ax.set_axis_off()
fig.tight_layout()