# PyTorch

[PyTorch](https://en.wikipedia.org/wiki/PyTorch) is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing. PyTorch provides two high-level features:

- Tensor computing with acceleration via graphics processing units (GPUs)
- Deep neural networks built on a tape-based automatic differentiation system

In [1]:
import torch

## Tensors

The central data abstraction in PyTorch is the `torch.Tensor` class. It is the counterpart of the `numpy.ndarray` class in NumPy, and many methods of the two classes have similar syntax.

### Tensor creation

Ways to create PyTorch tensors include:

- `torch.tensor()`
- `torch.empty()`
- `torch.zeros()`
- `torch.ones()`
- `torch.rand()`
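
As a minimal sketch of the creation functions listed above (the variable names are illustrative and not part of the original notebook), each call returns a new tensor of the requested shape:

```python
import torch

# Build a tensor directly from nested Python lists.
from_data = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Shapes are passed as positional integers.
zeros = torch.zeros(2, 3)    # every element is 0
ones = torch.ones(2, 3)      # every element is 1
uniform = torch.rand(2, 3)   # samples drawn uniformly from [0, 1)
empty = torch.empty(2, 3)    # uninitialized memory, contents are arbitrary

print(from_data.shape, zeros.shape, ones.shape, uniform.shape, empty.shape)
```

Note that `torch.empty()` only allocates memory, so its contents should be overwritten before use.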

In [2]:
a = torch.rand(3, 3, dtype=torch.float32)

In [3]:
print(a)

tensor([[0.2332, 0.4575, 0.9422],
        [0.1645, 0.8829, 0.9889],
        [0.0236, 0.0347, 0.6368]])


By default, PyTorch tensors are populated with 32-bit (single-precision) floating-point numbers, which are well suited to arithmetic on GPUs. Many other data types are available, including:

- `torch.bool`
- `torch.int8`
- `torch.int16`
- `torch.int32`
- `torch.int64`
- `torch.half` or `torch.float16`
- `torch.float`
- `torch.double` or `torch.float64`
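
The following sketch illustrates how a data type can be chosen at creation or changed afterwards. It assumes that the global default data type has not been modified (for example with `torch.set_default_dtype`); the variable names are illustrative.

```python
import torch

x = torch.ones(2, 2)                          # default dtype: torch.float32
x_int = torch.ones(2, 2, dtype=torch.int64)   # dtype selected at creation
x_double = x.to(torch.float64)                # cast, returns a new tensor

print(x.dtype, x_int.dtype, x_double.dtype)

# The short names are aliases of the explicit ones.
print(torch.float == torch.float32, torch.double == torch.float64)
```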

A PyTorch tensor can be converted to a regular Python list.

In [4]:
a.tolist()

[[0.23320060968399048, 0.4574585556983948, 0.9422464370727539],
 [0.16446655988693237, 0.8828853368759155, 0.9889333844184875],
 [0.02362501621246338, 0.03473007678985596, 0.6367791295051575]]

Conversely, a Python list can be converted to a PyTorch tensor.

In [5]:
torch.tensor(a.tolist())

tensor([[0.2332, 0.4575, 0.9422],
        [0.1645, 0.8829, 0.9889],
        [0.0236, 0.0347, 0.6368]])

### Tensor operations

PyTorch tensors have over three hundred operations that can be performed on them, including:

- `torch.abs()`
- `torch.max()`
- `torch.mean()`
- `torch.std()`
- `torch.prod()`
- `torch.unique()`
- `torch.matmul()`
- `torch.svd()`
- `torch.sin()`
- `torch.cos()`
- `torch.flatten()`

In [6]:
a.mean()

tensor(0.4849)

Note that the result is itself a tensor holding a single scalar value. To obtain a plain Python number instead, call the `item()` method:

In [7]:
a.mean().item()

0.4849250018596649
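
Many of these operations also accept a `dim` argument that restricts them to one axis. The sketch below uses a small hand-written matrix (not taken from the notebook) so that the results can be checked by hand:

```python
import torch

m = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

col_means = m.mean(dim=0)        # average over rows, one value per column
row_max = m.max(dim=1).values    # largest element in each row
product = torch.matmul(m, m)     # matrix product, equivalent to m @ m
flat = torch.flatten(m)          # rank-1 tensor with all four elements

print(col_means.tolist())  # [2.0, 3.0]
print(row_max.tolist())    # [2.0, 4.0]
print(product.tolist())    # [[7.0, 10.0], [15.0, 22.0]]
print(flat.tolist())       # [1.0, 2.0, 3.0, 4.0]
```

When called with `dim`, `max()` returns both the values and their indices, which is why `.values` is selected here.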

### NumPy bridge

In [8]:
import numpy as np

In [9]:
np_array = np.ones((2, 3))
pth_tensor = torch.from_numpy(np_array)

In [10]:
print(pth_tensor)

tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)


We note that NumPy's default data type, float64 (double precision), is preserved. In fact, we merely created a pointer to the same data in memory, so a change made through one object is reflected in both.

In [11]:
np_array[1, 2] = 2

In [12]:
print("Modified numpy array:\n", np_array)
print("Bridged pytorch tensor:\n", pth_tensor)

Modified numpy array:
 [[1. 1. 1.]
 [1. 1. 2.]]
Bridged pytorch tensor:
 tensor([[1., 1., 1.],
        [1., 1., 2.]], dtype=torch.float64)


One reason to bridge the two libraries is to take advantage of the easily accessible GPU acceleration in PyTorch for scientific codes developed with NumPy.
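
The sketch below contrasts the memory-sharing bridge with an explicit copy, and shows the reverse direction with `Tensor.numpy()`. The names `shared`, `copied`, `back`, and `on_device` are illustrative. The last lines assume that a CUDA device may or may not be present and fall back to the CPU otherwise; a tensor moved to a GPU is a copy and no longer shares memory with the NumPy array.

```python
import numpy as np
import torch

np_array = np.ones((2, 3))
shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # independent copy of the data

np_array[0, 0] = 5.0
print(shared[0, 0].item())   # 5.0, follows the change
print(copied[0, 0].item())   # 1.0, unaffected

back = shared.numpy()        # CPU tensor to NumPy array, memory is shared again

# Pick a GPU if one is available, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
on_device = shared.to(device)
print(on_device.device)
```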

## Neural networks

Machine learning models in PyTorch are built as neural networks with layers of neurons. The input layer receives the input data, hidden layers transform the data, and the output layer provides the results on which model predictions are based.

Every neuron has an associated activation level. Apart from the input layer, the activation levels in a given layer, say $L$, are determined from those in the previous layer by means of weights collected in a matrix $\boldsymbol{W}$ and biases collected in a row vector $\boldsymbol{b}$.

A layer is referred to as *linear* if the weights and biases are applied in a linear transformation

$$
\boldsymbol{a}^{(L)} = f(\boldsymbol{a}^{(L-1)} \boldsymbol{W}^T 
+ \boldsymbol{b})
$$

As indicated, the final activation levels are obtained by also applying a (typically) nonlinear activation function, $f$, elementwise.

![Neural Network](../images/neural_network.svg)

### Flatten tensors

A linear layer in PyTorch acts on the last dimension of its input, so each sample in a batch is normally supplied as a rank-1 tensor. If the data are stored as tensors of higher rank, they first need to be flattened.

Assume that we have three $2\times 2$ tensors as input, e.g. three greyscale images of two-by-two pixels.

In [13]:
batch_size = 3

tensor_batch = torch.rand(batch_size, 2, 2)

In [14]:
flatten = torch.nn.Flatten()

In [15]:
a0 = flatten(tensor_batch)

These have now been flattened to become three row vectors of dimension four.

In [16]:
print(a0)

tensor([[0.6447, 0.9236, 0.2894, 0.3067],
        [0.6588, 0.4814, 0.0097, 0.1238],
        [0.0132, 0.8080, 0.1293, 0.4597]])
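
As a hedged aside, the same result can be obtained in other ways. The sketch below assumes the same batch shape as above and compares the `torch.nn.Flatten` module with the function `torch.flatten` and with `reshape`; the names `a0_func` and `a0_reshape` are illustrative.

```python
import torch

tensor_batch = torch.rand(3, 2, 2)

# The module keeps dimension 0 (the batch) and flattens the rest.
a0 = torch.nn.Flatten()(tensor_batch)

# Functional form: start_dim=1 leaves the batch dimension untouched.
a0_func = torch.flatten(tensor_batch, start_dim=1)

# Plain reshape: -1 lets PyTorch infer the remaining size (here 4).
a0_reshape = tensor_batch.reshape(3, -1)

print(a0.shape)
print(torch.equal(a0, a0_func), torch.equal(a0, a0_reshape))
```

The module form is convenient when the flattening step is to be placed inside a model alongside other layers.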


### Layer transformations

The linear layer transformation described above is achieved with the `torch.nn.Linear` class.

In [17]:
linear = torch.nn.Linear(4, 2, bias=True)

Here we consider a transformation from an input layer with four neurons, $n_0 = 4$, to a hidden layer with only two, $n_1 = 2$.

When instantiated, the hidden layer object receives weight and bias attributes that are initialized randomly with values

$$
-1/\sqrt{n_0} < w_{ij}, b_i < 1 / \sqrt{n_0}
$$

In [18]:
linear.weight

Parameter containing:
tensor([[ 0.2367,  0.4125,  0.0317, -0.1664],
        [-0.0286, -0.0914, -0.1212, -0.2656]], requires_grad=True)

In [19]:
linear.bias

Parameter containing:
tensor([0.0288, 0.2220], requires_grad=True)
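
The initialization bounds stated above can be checked numerically. The following is a small sketch that assumes the default initialization of `torch.nn.Linear` is left unchanged; `bound` is an illustrative name.

```python
import math
import torch

n0, n1 = 4, 2
linear = torch.nn.Linear(n0, n1, bias=True)
bound = 1 / math.sqrt(n0)   # 0.5 for n0 = 4

# The weight matrix has one row per output neuron: shape (n1, n0).
print(linear.weight.shape)

# All weights and biases should lie within [-bound, bound].
print(linear.weight.abs().max().item() <= bound)
print(linear.bias.abs().max().item() <= bound)
```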

Use PyTorch to perform the layer transformation.

In [20]:
linear(a0)

tensor([[0.5205, 0.0026],
        [0.3630, 0.1251],
        [0.2928, 0.0100]], grad_fn=<AddmmBackward0>)

Check the transformation with an explicit calculation of the linear transformation:

$$
\boldsymbol{a}^{(0)} \boldsymbol{W}^T 
+ \boldsymbol{b}
$$

In [21]:
torch.matmul(a0, linear.weight.T) + linear.bias

tensor([[0.5205, 0.0026],
        [0.3630, 0.1251],
        [0.2928, 0.0100]], grad_fn=<AddBackward0>)

We note that the two results are identical.
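
Since the printed output shows only four decimals, a programmatic comparison is a more robust check. The sketch below uses freshly generated data rather than the values shown above, and compares the two routes with `torch.allclose`, which tolerates small floating-point differences; the names `via_layer` and `via_matmul` are illustrative.

```python
import torch

torch.manual_seed(0)            # reproducible random numbers
a0 = torch.rand(3, 4)
linear = torch.nn.Linear(4, 2)

with torch.no_grad():           # gradients are not needed for this comparison
    via_layer = linear(a0)
    via_matmul = torch.matmul(a0, linear.weight.T) + linear.bias

print(torch.allclose(via_layer, via_matmul, atol=1e-6))
```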

What remains is to apply the nonlinear activation function, $f$, according to

$$
\boldsymbol{a}^{(1)} =f( 
\boldsymbol{a}^{(0)} \boldsymbol{W}^T 
+ \boldsymbol{b})
$$

A common choice in machine learning is the rectified linear unit (ReLU) function

$$
\mathrm{ReLU}(x) = \max(0,x) = \frac{x + |x|}{2}
$$

In [22]:
relu = torch.nn.ReLU()

In [23]:
a1 = relu(linear(a0))

In [24]:
a1

tensor([[0.5205, 0.0026],
        [0.3630, 0.1251],
        [0.2928, 0.0100]], grad_fn=<ReluBackward0>)

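Since all values produced by the linear transformation happen to be positive here, the ReLU function leaves them unchanged. Any negative values would have been set to zero.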
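
To see the ReLU function act on negative values as well, consider the following sketch with a hand-picked input vector (not taken from the notebook). It also checks the alternative expression $(x + |x|)/2$ given above.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
relu = torch.nn.ReLU()

clipped = relu(x)                  # negative entries become zero
via_formula = (x + x.abs()) / 2    # same function written as (x + |x|) / 2

print(clipped.tolist())            # [0.0, 0.0, 0.0, 0.5, 2.0]
print(torch.equal(clipped, via_formula))
```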

## Loss function

In the process of training the network, we need a measure of closeness between the prediction in the output layer and the correct result. This measure is given by a *loss function*. Several [loss functions are available in PyTorch](https://pytorch.org/docs/stable/nn.html#loss-functions) for different purposes. In multi-class classification networks, `CrossEntropyLoss()` is a typical choice.

In [26]:
loss_func = torch.nn.CrossEntropyLoss()

Let us assume that we have four classes in the output layer and that we are concerned with a specific item in the data set for which the correct answer is class number three.

In [93]:
correct_answer = torch.tensor([0.0, 0.0, 1.0, 0.0])

Let us further assume that we have made two separate predictions (one good and one bad) in the output layer, leading to the following activation levels.

In [99]:
good_prediction = torch.tensor([0.2, 0.5, 3.1, -0.1])

bad_prediction = torch.tensor([2.0, 2.5, 1.1, -0.5])

The associated loss function values (errors) are given by:

In [100]:
print("good prediction loss =", loss_func(good_prediction, correct_answer))
print("bad prediction loss  =", loss_func(bad_prediction, correct_answer))

good prediction loss = tensor(0.1571)
bad prediction loss  = tensor(2.0434)


As expected, the error is deemed much larger for the bad prediction.
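
The correct answer can also be given as a class index instead of a vector of class probabilities. The sketch below assumes a PyTorch version in which `CrossEntropyLoss` accepts both target forms, and adds an explicit batch dimension of size one; the names `logits`, `index_target`, and `prob_target` are illustrative.

```python
import torch

loss_func = torch.nn.CrossEntropyLoss()

# One sample with four output activations: shape (1, 4).
logits = torch.tensor([[0.2, 0.5, 3.1, -0.1]])

# Class number three corresponds to the zero-based index 2.
index_target = torch.tensor([2])
prob_target = torch.tensor([[0.0, 0.0, 1.0, 0.0]])

loss_index = loss_func(logits, index_target)
loss_probs = loss_func(logits, prob_target)

print(loss_index.item(), loss_probs.item())
```

Both forms should give the same loss for the good prediction, since the probability vector puts all weight on a single class.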

Let us see how PyTorch came to this conclusion.

In a first step, the predictions are exponentiated, promoting large positive numbers.

In [101]:
good_p1 = torch.exp(good_prediction)
bad_p1 = torch.exp(bad_prediction)

print("step 1: good prediction loss =", good_p1)
print("step 1: bad prediction loss  =", bad_p1)

step 1: good prediction loss = tensor([ 1.2214,  1.6487, 22.1979,  0.9048])
step 1: bad prediction loss  = tensor([ 7.3891, 12.1825,  3.0042,  0.6065])


In a second step, the exponentiated values are normalized so that they sum to one.

In [102]:
good_p2 = good_p1 / good_p1.sum()
bad_p2 = bad_p1 / bad_p1.sum()

print("step 2: good prediction loss =", good_p2)
print("step 2: bad prediction loss  =", bad_p2)

step 2: good prediction loss = tensor([0.0470, 0.0635, 0.8547, 0.0348])
step 2: bad prediction loss  = tensor([0.3187, 0.5255, 0.1296, 0.0262])
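
Taken together, the first two steps amount to what is commonly called the softmax function. As a sketch in notation that is not used elsewhere in this text (assuming $z_i$ denotes the activation level of output neuron $i$ and $p_i$ the resulting normalized value):

$$
p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad \sum_i p_i = 1
$$

For the good prediction, $p_3 = 22.1979 / 25.9728 \approx 0.8547$, in agreement with the output above.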


In a third step, we take the negative logarithm so that values close to one become close to zero (low loss).

In [103]:
good_p3 = -torch.log(good_p2)
bad_p3 = -torch.log(bad_p2)

print("step 3: good prediction loss =", good_p3)
print("step 3: bad prediction loss  =", bad_p3)

step 3: good prediction loss = tensor([3.0571, 2.7571, 0.1571, 3.3571])
step 3: bad prediction loss  = tensor([1.1434, 0.6434, 2.0434, 3.6434])


In a fourth step, we pick out the loss for the correct class by multiplying elementwise with the one-hot encoded correct answer.

In [104]:
good_p4 = good_p3 * correct_answer
bad_p4 = bad_p3 * correct_answer

print("step 4: good prediction loss =", good_p4)
print("step 4: bad prediction loss  =", bad_p4)

step 4: good prediction loss = tensor([0.0000, 0.0000, 0.1571, 0.0000])
step 4: bad prediction loss  = tensor([0.0000, 0.0000, 2.0434, 0.0000])


In a fifth step, a summation is performed to produce a scalar loss value.

In [105]:
print("good prediction loss =", good_p4.sum())
print("bad prediction loss  =", bad_p4.sum())

good prediction loss = tensor(0.1571)
bad prediction loss  = tensor(2.0434)


We note that the resulting losses are identical to those obtained with the PyTorch loss function.
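
The five steps can be condensed into a few lines. The sketch below uses `torch.nn.functional.log_softmax`, which combines the exponentiation, normalization, and logarithm in a numerically stable way; the helper `cross_entropy_by_hand` is a hypothetical name introduced here for illustration.

```python
import torch
import torch.nn.functional as F

correct_answer = torch.tensor([0.0, 0.0, 1.0, 0.0])
good_prediction = torch.tensor([0.2, 0.5, 3.1, -0.1])
bad_prediction = torch.tensor([2.0, 2.5, 1.1, -0.5])

def cross_entropy_by_hand(logits, target):
    """Cross entropy for one sample with a probability (one-hot) target."""
    log_probs = F.log_softmax(logits, dim=-1)   # steps 1 to 3 (without the sign)
    return -(target * log_probs).sum()          # sign, then steps 4 and 5

good_loss = cross_entropy_by_hand(good_prediction, correct_answer)
bad_loss = cross_entropy_by_hand(bad_prediction, correct_answer)

print("good prediction loss =", round(good_loss.item(), 4))  # 0.1571
print("bad prediction loss  =", round(bad_loss.item(), 4))   # 2.0434
```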

## Backward propagation