# PyTorch

[PyTorch](https://en.wikipedia.org/wiki/PyTorch) is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing. PyTorch provides two high-level features:

- Tensor computing with acceleration via graphics processing units (GPUs)
- Deep neural networks built on a tape-based automatic differentiation system

```python
import torch
```

## Tensors

The central data abstraction in PyTorch is the `torch.Tensor` class. It is the counterpart of the `numpy.ndarray` class in NumPy, and many of their methods have similar syntax.
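As a small sketch of this similarity, the factory function `torch.tensor()` returns an object of class `torch.Tensor`, whose attributes and methods such as `shape`, `T`, `sum()` and `reshape()` mirror their NumPy counterparts. The values are arbitrary illustrations.

```python
import torch

# torch.tensor() is a factory function; the object it returns is a torch.Tensor
t = torch.tensor([[1, 2], [3, 4]])
print(type(t))        # <class 'torch.Tensor'>

# NumPy-like attributes and methods
print(t.shape)        # torch.Size([2, 2])
print(t.T)            # transpose, as with ndarray.T
print(t.sum())        # tensor(10)
print(t.reshape(4))   # tensor([1, 2, 3, 4])
```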

### Tensor creation

Ways to create PyTorch tensors include:

- `torch.tensor()`
- `torch.empty()`
- `torch.zeros()`
- `torch.ones()`
- `torch.rand()`

```python
a = torch.rand(3, 3, dtype=torch.float32)
print(a)
```

```
tensor([[0.1264, 0.2128, 0.4394],
        [0.1689, 0.6541, 0.3996],
        [0.9370, 0.2361, 0.0902]])
```


By default, PyTorch creates floating-point tensors with 32-bit (single-precision) numbers, which are well suited to arithmetic on GPUs, but many other data types are available, including:

- `torch.bool`
- `torch.int8`
- `torch.int16`
- `torch.int32`
- `torch.int64`
- `torch.half` or `torch.float16`
- `torch.float` or `torch.float32`
- `torch.double` or `torch.float64`

A PyTorch tensor can be converted to a regular Python list.

```python
a.tolist()
```

```
[[0.12643754482269287, 0.2127731442451477, 0.4393724799156189],
 [0.16894280910491943, 0.6540544033050537, 0.39956939220428467],
 [0.9369684457778931, 0.23605263233184814, 0.09021276235580444]]
```

Conversely, a Python list can be converted to a PyTorch tensor.

```python
torch.tensor(a.tolist())
```

```
tensor([[0.1264, 0.2128, 0.4394],
        [0.1689, 0.6541, 0.3996],
        [0.9370, 0.2361, 0.0902]])
```
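To make the round trip explicit, this sketch uses a freshly generated random tensor, so its values differ from those above. It checks the types produced by `tolist()` and confirms that converting back recovers the original tensor exactly, since every float32 value is exactly representable as a Python float (a 64-bit double).

```python
import torch

a = torch.rand(3, 3, dtype=torch.float32)

# tolist() yields nested Python lists of Python floats
values = a.tolist()
print(type(values), type(values[0][0]))   # <class 'list'> <class 'float'>

# Converting back gives float32 again, with no loss of precision
b = torch.tensor(values)
print(b.dtype, torch.equal(a, b))         # torch.float32 True
```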

### Tensor operations

PyTorch tensors have over three hundred operations that can be performed on them, including:

- `torch.abs()`
- `torch.max()`
- `torch.mean()`
- `torch.std()`
- `torch.prod()`
- `torch.unique()`
- `torch.matmul()`
- `torch.svd()`
- `torch.sin()`
- `torch.cos()`
- `torch.flatten()`

```python
a.mean()
```

```
tensor(0.3627)
```

Note that the result is a zero-dimensional tensor holding a single scalar. To get a plain Python number instead, we can call `.item()`:

```python
a.mean().item()
```

```
0.36270925402641296
```
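The sketch below, using a new random tensor, confirms that the mean is a zero-dimensional tensor, shows a reduction along a single dimension with the `dim` argument, and illustrates that `.item()` only applies to one-element tensors.

```python
import torch

a = torch.rand(3, 3)

m = a.mean()
print(m.dim(), m.shape)         # 0 torch.Size([])
print(type(m.item()))           # <class 'float'>

# Reduce along one dimension instead: one mean per column
col_means = a.mean(dim=0)
print(col_means.shape)          # torch.Size([3])

# .item() fails for tensors with more than one element
raised = False
try:
    col_means.item()
except (RuntimeError, ValueError) as err:   # exact exception type may vary by version
    raised = True
    print(type(err).__name__, err)
```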

### NumPy bridge

```python
import numpy as np

np_array = np.ones((2, 3))
pth_tensor = torch.from_numpy(np_array)
print(pth_tensor)
```

```
tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
```


We note that the NumPy default data type, float64 (double precision), is preserved. Moreover, no data is copied: the tensor shares the underlying memory with the array, so a change to one is reflected in the other.

```python
np_array[1, 2] = 2
print("Modified numpy array:\n", np_array)
print("Bridged pytorch tensor:\n", pth_tensor)
```

```
Modified numpy array:
 [[1. 1. 1.]
 [1. 1. 2.]]
Bridged pytorch tensor:
 tensor([[1., 1., 1.],
        [1., 1., 2.]], dtype=torch.float64)
```


One reason to bridge data in this way is to let scientific code developed with NumPy take advantage of the easily accessible GPU acceleration in PyTorch. Note that the memory is shared only while the tensor stays on the CPU; moving it to a GPU creates a copy.
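The bridge works in both directions. This sketch contrasts `torch.from_numpy()`, which shares memory, with `torch.tensor()`, which copies the data, and shows that `Tensor.numpy()` on a CPU tensor also shares memory. The GPU branch only runs if CUDA is available.

```python
import numpy as np
import torch

np_array = np.ones((2, 3))

shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # independent copy

np_array[0, 0] = 5.0
print(shared[0, 0].item(), copied[0, 0].item())   # 5.0 1.0

# Reverse bridge: .numpy() on a CPU tensor also shares memory
back = shared.numpy()
back[1, 1] = 7.0
print(np_array[1, 1])                              # 7.0

# Moving to a GPU copies the data, ending the link to np_array
if torch.cuda.is_available():
    gpu_tensor = shared.to("cuda")
    print(gpu_tensor.device)
```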

## Neural networks

Machine learning models in PyTorch are built as neural networks with layers of neurons. The input layer receives the input data, hidden layers transform the data, and the output layer provides the results on which model predictions are based.

Every neuron has an associated activation level. Apart from the input layer, the activation levels in a given layer, say $L$, are determined from those in the previous layer using weights, collected in a matrix $\boldsymbol{W}$, and biases, collected in a row vector $\boldsymbol{b}$.

A layer is referred to as *linear* if the weights and biases are applied in a linear (strictly speaking, affine) transformation:

$$
\boldsymbol{a}^{(L)} = f(\boldsymbol{a}^{(L-1)} \boldsymbol{W}^T 
+ \boldsymbol{b})
$$

As indicated, obtaining the final activation levels also involves applying a (typically) nonlinear activation function, $f$, elementwise.

![Neural Network](../images/neural_network.svg)
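As a small worked example with made-up numbers, take a single sample with two neurons in each of layers $L-1$ and $L$, and let $f(x) = \max(0, x)$. Row $i$ of $\boldsymbol{W}$ holds the weights feeding neuron $i$ of layer $L$, which is why $\boldsymbol{W}$ appears transposed when it multiplies the row vector $\boldsymbol{a}^{(L-1)}$:

$$
\boldsymbol{a}^{(L-1)} = \begin{pmatrix} 1 & 2 \end{pmatrix}, \quad
\boldsymbol{W} = \begin{pmatrix} 1 & -1 \\ 0 & 2 \end{pmatrix}, \quad
\boldsymbol{b} = \begin{pmatrix} 0.5 & -3 \end{pmatrix}
$$

$$
\boldsymbol{a}^{(L-1)} \boldsymbol{W}^T + \boldsymbol{b}
= \begin{pmatrix} 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ -1 & 2 \end{pmatrix}
+ \begin{pmatrix} 0.5 & -3 \end{pmatrix}
= \begin{pmatrix} -1 & 4 \end{pmatrix} + \begin{pmatrix} 0.5 & -3 \end{pmatrix}
= \begin{pmatrix} -0.5 & 1 \end{pmatrix}
$$

$$
\boldsymbol{a}^{(L)} = f\left(\begin{pmatrix} -0.5 & 1 \end{pmatrix}\right) = \begin{pmatrix} 0 & 1 \end{pmatrix}
$$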

### Flatten tensors

Linear layers in PyTorch act on the last dimension of their input, so each sample in a batch is typically represented as a tensor of rank 1. If the samples are stored as tensors of higher rank, they first need to be flattened.

Let us assume we have three $2\times 2$ tensors as input, e.g. three greyscale images of two-by-two pixels.

```python
batch_size = 3

tensor_batch = torch.rand(batch_size, 2, 2)
flatten = torch.nn.Flatten()
a0 = flatten(tensor_batch)
```

These have now been flattened to become three row vectors of dimension four.
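`torch.nn.Flatten` preserves the batch dimension because its `start_dim` defaults to 1. The sketch below, using a new random batch, checks this against an explicit `reshape()` and contrasts it with the function `torch.flatten()`, whose `start_dim` defaults to 0.

```python
import torch

batch_size = 3
tensor_batch = torch.rand(batch_size, 2, 2)

# nn.Flatten keeps dimension 0 (the batch) and flattens the rest
a0 = torch.nn.Flatten()(tensor_batch)
print(a0.shape)                                                 # torch.Size([3, 4])

# Same result with an explicit reshape
print(torch.equal(a0, tensor_batch.reshape(batch_size, -1)))    # True

# The function torch.flatten() flattens everything by default
print(torch.flatten(tensor_batch).shape)                        # torch.Size([12])
```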

```python
print(a0)
```

```
tensor([[0.7347, 0.9150, 0.3196, 0.1653],
        [0.3985, 0.5302, 0.5364, 0.3005],
        [0.2823, 0.5272, 0.7644, 0.2547]])
```


### Layer transformations

The linear layer transformation described above is achieved with the `torch.nn.Linear` class.

```python
linear = torch.nn.Linear(4, 2, bias=True)
```

Here we consider a transformation from an input layer with four neurons, $n_0 = 4$, to a hidden layer with only two, $n_1 = 2$.

When instantiated, the layer object receives weight and bias attributes that are initialized randomly, with values drawn uniformly within the bounds

$$
-1/\sqrt{n_0} < w_{ij}, b_i < 1 / \sqrt{n_0}
$$
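According to the `torch.nn.Linear` documentation, both weights and biases are drawn from the uniform distribution $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ with $k = 1/n_0$, where $n_0$ is `in_features`. The sketch below checks the bound on a larger layer with arbitrary sizes and an arbitrary seed, and shows that `weight` has shape $(n_1, n_0)$, which is why the transpose $\boldsymbol{W}^T$ appears in the formula.

```python
import math
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only

n0, n1 = 64, 32       # arbitrary sizes, larger than above so many values are drawn
layer = torch.nn.Linear(n0, n1)
bound = 1 / math.sqrt(n0)   # 0.125

# One row per output neuron: shape (n1, n0)
print(layer.weight.shape, layer.bias.shape)       # torch.Size([32, 64]) torch.Size([32])

# All initial values lie within the stated bound
print(layer.weight.abs().max().item() <= bound)   # True
print(layer.bias.abs().max().item() <= bound)     # True
```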

```python
linear.weight
```

```
Parameter containing:
tensor([[ 0.2979, -0.1393, -0.1280,  0.2627],
        [ 0.0665,  0.0518,  0.2140,  0.1802]], requires_grad=True)
```

```python
linear.bias
```

```
Parameter containing:
tensor([ 0.3289, -0.3964], requires_grad=True)
```

We use PyTorch to perform the layer transformation:

```python
linear(a0)
```

```
tensor([[ 0.4229, -0.2020],
        [ 0.3840, -0.1735],
        [ 0.3086, -0.1409]], grad_fn=<AddmmBackward0>)
```

Check the transformation with an explicit calculation of the linear transformation:

$$
\boldsymbol{a}^{(0)} \boldsymbol{W}^T 
+ \boldsymbol{b}
$$

```python
torch.matmul(a0, linear.weight.T) + linear.bias
```

```
tensor([[ 0.4229, -0.2020],
        [ 0.3840, -0.1735],
        [ 0.3086, -0.1409]], grad_fn=<AddBackward0>)
```

We note that the two results are identical.
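Bitwise-identical results are not guaranteed in general, since different code paths may round differently, so comparing with `torch.allclose()` is more robust. The sketch below, with fresh random inputs, also shows the stateless `torch.nn.functional.linear()`, which takes the weight and bias as arguments.

```python
import torch
import torch.nn.functional as F

a0 = torch.rand(3, 4)
linear = torch.nn.Linear(4, 2)

out_module = linear(a0)
out_functional = F.linear(a0, linear.weight, linear.bias)   # same computation, no module state
out_explicit = a0 @ linear.weight.T + linear.bias            # @ is shorthand for torch.matmul

# allclose tolerates small rounding differences
print(torch.allclose(out_module, out_functional), torch.allclose(out_module, out_explicit))  # True True
```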

What remains is to apply the nonlinear activation function, $f$, according to

$$
\boldsymbol{a}^{(1)} =f( 
\boldsymbol{a}^{(0)} \boldsymbol{W}^T 
+ \boldsymbol{b})
$$

A common choice in machine learning is the rectified linear unit (ReLU) function

$$
\mathrm{ReLU}(x) = \max(0,x) = \frac{x + |x|}{2}
$$
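A quick numerical check that the two expressions for ReLU agree with `torch.nn.ReLU` on a few arbitrary values, using `torch.clamp()` with `min=0` for $\max(0, x)$:

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

relu_module = torch.nn.ReLU()(x)
relu_max = torch.clamp(x, min=0)           # max(0, x)
relu_abs = (x + torch.abs(x)) / 2          # (x + |x|) / 2

print(relu_module)                         # values 0, 0, 0, 1.5
print(torch.equal(relu_module, relu_max), torch.equal(relu_module, relu_abs))  # True True
```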

```python
relu = torch.nn.ReLU()
a1 = relu(linear(a0))
a1
```

```
tensor([[0.4229, 0.0000],
        [0.3840, 0.0000],
        [0.3086, 0.0000]], grad_fn=<ReluBackward0>)
```

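The effect of the ReLU function is as anticipated: the negative values in the second column are set to zero, while the positive values in the first column are left unchanged.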
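For reference, the flatten, linear and ReLU steps can be composed with `torch.nn.Sequential`, which passes the output of each module to the next. This is a sketch reusing the layer sizes from above with fresh random data and an arbitrary seed, not a model taken from the text.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only

flatten = torch.nn.Flatten()
linear = torch.nn.Linear(4, 2)
relu = torch.nn.ReLU()

# The same three steps, chained into a single module
model = torch.nn.Sequential(flatten, linear, relu)

tensor_batch = torch.rand(3, 2, 2)
a1_chained = model(tensor_batch)
a1_stepwise = relu(linear(flatten(tensor_batch)))

print(a1_chained.shape)                       # torch.Size([3, 2])
print(torch.equal(a1_chained, a1_stepwise))   # True
print(bool((a1_chained >= 0).all()))          # True: ReLU output is non-negative
```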

## Loss function

## Backward propagation