# Introduction to PyTorch

In this notebook we are going to learn about the open source machine learning framework [PyTorch](https://pytorch.org/). There are many machine learning frameworks of varying popularity and functionality, such as [TensorFlow](https://www.tensorflow.org/) and [MXNet](https://mxnet.apache.org/). We will be using PyTorch because it is flexible and has a low barrier to entry. For those reasons it is extensively used in research, with many papers being accompanied by PyTorch code. All machine learning frameworks are broadly similar, so learning one will make it easier to pick up another later.

If you have already worked with PyTorch, feel free to skip this tutorial.

Let's start by importing everything we need.

In [1]:
# standard libraries
import math, os, time
import numpy as np

# plotting
import matplotlib.pyplot as plt

# progress bars
from tqdm.notebook import trange, tqdm

# PyTorch
import torch

### Reproducibility

When we perform scientific experiments we should ensure that others can reproduce them. That is a problem when we use stochastic functions. The workaround is to fix the seed of the random number generator (RNG) so that every run produces the same sequence of random numbers. We can fix the RNG seed in PyTorch as follows.

In [2]:
torch.manual_seed(0)

<torch._C.Generator at 0x20cea1c2f50>
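As a quick check of this idea, the sketch below resets the seed and draws the same numbers twice. The variable names `first` and `second` are only illustrative.

```python
import torch

# Seed the RNG, draw three uniform random numbers, then repeat from the same seed.
torch.manual_seed(0)
first = torch.rand(3)

torch.manual_seed(0)
second = torch.rand(3)

# Both draws start from the same point in the RNG sequence, so they match exactly.
print(torch.equal(first, second))
```

Keep in mind that identical seeds are generally only expected to give identical numbers on the same PyTorch version and hardware.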

## Tensors

Tensors in PyTorch are multi-dimensional arrays of numbers. They work much like Numpy arrays or numeric arrays in Mathematica, but add functionality such as GPU and backpropagation support that we will need for neural networks. There are several ways to create tensors. The most straightforward is calling `torch.empty` with the desired dimensions of the tensor.

In [3]:
a = torch.empty(2, 3, 4)
a

tensor([[[1.0102e-38, 9.0919e-39, 1.0102e-38, 8.9082e-39],
         [8.4489e-39, 1.0102e-38, 1.0561e-38, 1.0010e-38],
         [1.0653e-38, 1.0102e-38, 8.4490e-39, 9.6429e-39]],

        [[8.4490e-39, 9.6429e-39, 9.2755e-39, 1.0286e-38],
         [9.0919e-39, 8.9082e-39, 9.2755e-39, 8.4490e-39],
         [1.0194e-38, 9.0919e-39, 8.4490e-39, 1.0653e-38]]])

You can see this has created a $2 \times 3 \times 4$ array. The dimensions of a tensor are called its __shape__.

In [4]:
print(a.shape) # get the shape of a tensor
dim1, dim2, dim3 = a.shape
print(f"dim1={dim1}, dim2={dim2}, dim3={dim3}")

torch.Size([2, 3, 4])
dim1=2, dim2=3, dim3=4


The numbers in `a` look arbitrary because `torch.empty` allocates memory for the tensor but does not initialize it, so whatever was left in that part of memory is now interpreted as the contents of the tensor.

We can instead provide our own contents by passing a (nested) Python list to `torch.tensor`:

In [5]:
b = torch.tensor([[1,2],[3,4]])
b

tensor([[1, 2],
        [3, 4]])

PyTorch provides many other functions for creating new tensors, such as
- `torch.zeros` : all values set to zero
- `torch.ones` : all values set to 1
- `torch.full` : all values set to a constant
- `torch.rand` : uniformly distributed random values in $[0,1)$
- `torch.randn` : normally distributed values with mean 0 and variance 1
- `torch.arange` : 1D tensor with evenly spaced values from a half-open interval (the end point is excluded)

In [6]:
c = torch.zeros(2, 3)
print(f"c = \n{c}")

d = torch.ones(2, 3)
print(f"d = \n{d}")

e = torch.full((2, 3), 5.)
print(f"e = \n{e}")

f = torch.rand(2, 3)
print(f"f = \n{f}")

g = torch.randn(2, 3)
print(f"g = \n{g}")

h = torch.arange(1, 10)
print(f"h = \n{h}")

c = 
tensor([[0., 0., 0.],
        [0., 0., 0.]])
d = 
tensor([[1., 1., 1.],
        [1., 1., 1.]])
e = 
tensor([[5., 5., 5.],
        [5., 5., 5.]])
f = 
tensor([[0.4963, 0.7682, 0.0885],
        [0.1320, 0.3074, 0.6341]])
g = 
tensor([[ 1.2645, -0.6874,  0.1604],
        [-0.6065, -0.7831,  1.0622]])
h = 
tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
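To make the difference between `torch.rand` and `torch.randn` more concrete, the following sketch draws a large sample from each and prints a few summary statistics. The sample size of 100,000 is an arbitrary choice for illustration.

```python
import torch

torch.manual_seed(0)

u = torch.rand(100_000)   # uniform samples from [0, 1)
n = torch.randn(100_000)  # samples from a standard normal distribution

# The uniform samples stay inside [0, 1).
print(f"rand:  min={u.min().item():.4f}, max={u.max().item():.4f}")

# The normal samples have mean close to 0 and standard deviation close to 1.
print(f"randn: mean={n.mean().item():.4f}, std={n.std().item():.4f}")
```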


## Numpy Compatibility

PyTorch tensors and Numpy arrays can be converted into each other. This is handy because many packages are built to handle Numpy arrays, so we can use those packages with PyTorch as well.

In [7]:
# Converting a Numpy array to a PyTorch tensor:
numpy_array = np.array([[1.,2.],[3.,4.]])
tensor = torch.from_numpy(numpy_array)

print(tensor)

# Converting a PyTorch tensor to a Numpy array:
tensor = torch.rand(2, 2)
numpy_array = tensor.numpy()

print(numpy_array)

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)
[[0.16102946 0.28226858]
 [0.68160856 0.915194  ]]
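One detail worth knowing about these conversions: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory. The sketch below illustrates this and contrasts it with `torch.tensor`, which makes a copy. The names `shared` and `independent` are only illustrative.

```python
import numpy as np
import torch

numpy_array = np.zeros(3)

# from_numpy shares memory with the array: changes are visible on both sides.
shared = torch.from_numpy(numpy_array)
numpy_array[0] = 7.0
print(shared)

# torch.tensor copies the data, so later changes to the array do not propagate.
independent = torch.tensor(numpy_array)
numpy_array[1] = 9.0
print(independent)
```

If an independent tensor is needed, copying explicitly (as with `torch.tensor` here) avoids surprising side effects.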


## Operations

Operating on tensors is straightforward: the usual arithmetic operators are supported and act element-wise.

In [8]:
a = torch.tensor([[1,2],[3,4]])
b = torch.tensor([[-1,3],[0,1]])
print(100*a)
print(a+b)
print(a*b)
print(a**b)
print(a-b)
print(a/b)

tensor([[100, 200],
        [300, 400]])
tensor([[0, 5],
        [3, 5]])
tensor([[-1,  6],
        [ 0,  4]])
tensor([[1, 8],
        [1, 4]])
tensor([[ 2, -1],
        [ 3,  3]])
tensor([[-1.0000,  0.6667],
        [    inf,  4.0000]])
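Note that `*` multiplies element by element. A matrix product is written with the `@` operator (equivalently `torch.matmul`). A small sketch that reuses the values of `a` and `b` for comparison:

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[-1, 3], [0, 1]])

elementwise = a * b  # multiplies matching entries
matrix_product = a @ b  # rows of a times columns of b

print(elementwise)
print(matrix_product)
```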


PyTorch contains a large collection of operations implementing common mathematical functions, see <https://pytorch.org/docs/stable/tensors.html>.

In [9]:
# We can pass the tensor to an operator:
print(torch.sin(a))
print(torch.cos(a))
print(torch.sqrt(a))
print(torch.abs(b))
print(torch.t(a)) # transpose

tensor([[ 0.8415,  0.9093],
        [ 0.1411, -0.7568]])
tensor([[ 0.5403, -0.4161],
        [-0.9900, -0.6536]])
tensor([[1.0000, 1.4142],
        [1.7321, 2.0000]])
tensor([[1, 3],
        [0, 1]])
tensor([[1, 3],
        [2, 4]])


In [10]:
# Alternatively, we can call the operation on the tensor itself:
print(a.sin())
print(a.cos())
print(a.sqrt())
print(b.abs())
print(a.t())

tensor([[ 0.8415,  0.9093],
        [ 0.1411, -0.7568]])
tensor([[ 0.5403, -0.4161],
        [-0.9900, -0.6536]])
tensor([[1.0000, 1.4142],
        [1.7321, 2.0000]])
tensor([[1, 3],
        [0, 1]])
tensor([[1, 3],
        [2, 4]])


## Reshaping & Selecting

Sometimes it is necessary to change the shape of a tensor.

In [11]:
a = torch.arange(6)
print(a)

b = a.reshape(2, 3) # new shape
print(b)

c = a.reshape(-1, 2) # one dimension can be automatically inferred by specifying it as -1
print(c)

tensor([0, 1, 2, 3, 4, 5])
tensor([[0, 1, 2],
        [3, 4, 5]])
tensor([[0, 1],
        [2, 3],
        [4, 5]])
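Reshaping only rearranges the existing elements, so the new shape has to account for exactly the same number of entries. The sketch below shows the `-1` inference once more and what happens when the requested shape does not fit. The flag `reshape_failed` is a hypothetical helper used only to record the outcome.

```python
import torch

a = torch.arange(6)

# -1 asks PyTorch to infer that dimension: 6 elements / 3 rows = 2 columns.
c = a.reshape(3, -1)
print(c.shape)

# A shape that needs 8 elements cannot be built from 6, so reshape raises an error.
try:
    a.reshape(4, 2)
    reshape_failed = False
except RuntimeError as err:
    reshape_failed = True
    print(f"reshape failed: {err}")
```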


Selections in tensors can be made in the same way as in Numpy; indexing starts at 0.

In [12]:
print(a[2:5]) # items [2, 5) in the first dimension
print(b[0,:]) # first item of the first dimension, all in the second dimension
print(c[-1,1:]) # last item of the first dimension, all except the first in the second dimension

tensor([2, 3, 4])
tensor([0, 1, 2])
tensor([5])
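A few more Numpy-style selections, sketched under the assumption that `a` and `b` are defined as in the reshaping example: a boolean mask, a column slice, and negative indices.

```python
import torch

a = torch.arange(6)
b = a.reshape(2, 3)

evens = a[a % 2 == 0]  # boolean mask: keep only the even entries
column = b[:, 1]       # all rows, second column
corner = b[-1, -1]     # negative indices count from the end

print(evens)
print(column)
print(corner)
```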


## Datatypes

Another important property of a tensor is its datatype, or _dtype_. To ensure performance, tensors only handle native numeric types. The default datatype of a floating-point tensor is _single-precision floating-point_, referred to in PyTorch as `torch.float32` since it occupies 32 bits. We will do most, if not all, of our computations in single-precision floating-point format. We do not use double-precision floating point (as most scientific computing work does) because we simply do not need that level of precision, and because 64-bit floating-point numbers require twice the memory capacity and bandwidth. In fact, modern GPUs support various 16-bit floating-point formats (`torch.float16`, `torch.bfloat16`) specifically for deep learning. But since 16-bit support is not universal, we will stick to 32-bit floats for our work.

More information can be found at <https://pytorch.org/docs/stable/tensor_attributes.html>.

In [13]:
a = torch.empty(2, 3, 4) # float32 is the default
print(a.dtype)

b = torch.arange(4) # yields int64 by default
print(b.dtype)

c = a + b # mixing dtypes promotes the result; floating point takes precedence over integer
print(c.dtype)

# we can also explicitly cast to another dtype:
d = b.to(torch.float64)
print(d.dtype)

torch.float32
torch.int64
torch.float32
torch.float64
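The memory argument can be made concrete with `element_size()`, which reports the number of bytes per element. The sketch below compares three floating-point dtypes. The tensor length of one million and the dictionary `bytes_per_element` are arbitrary illustrative choices.

```python
import torch

# Bytes per element and total storage for one million values of each dtype.
bytes_per_element = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.zeros(1_000_000, dtype=dtype)
    bytes_per_element[dtype] = t.element_size()
    total_mb = t.element_size() * t.numel() / 1e6
    print(f"{dtype}: {t.element_size()} bytes per element, {total_mb:.0f} MB in total")
```

Each step up in precision doubles the storage, which is one reason 32-bit floats are a common compromise.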


## Broadcasting

Broadcasting is what happens when we do operations on tensors with different shapes. Subject to certain restrictions, the smaller tensor will be _broadcast_ to match the size of the larger one. This happens without physically copying data. Broadcasting in PyTorch works the same as in Numpy, see <https://pytorch.org/docs/stable/notes/broadcasting.html> for details.

In [14]:
a = torch.tensor([[1,2,3],[4,5,6]])
b = torch.tensor([[1],[2]])
c = torch.tensor([-1,-2,-3])
d = torch.tensor([-10])

# shape of a = [ 2 , 3 ]
# shape of b = [ 2 , 1 ]
# shape of c = [     3 ]
# shape of d = [     1 ]

print(a+b)
print(a+c)
print(a*d)
print(b+c)

tensor([[2, 3, 4],
        [6, 7, 8]])
tensor([[0, 0, 0],
        [3, 3, 3]])
tensor([[-10, -20, -30],
        [-40, -50, -60]])
tensor([[ 0, -1, -2],
        [ 1,  0, -1]])
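The sketch below looks a little closer at these rules: it prints the resulting shapes, uses `expand` to make the broadcast of `b` explicit, and tries a combination that violates the restrictions. The names `b_expanded` and `broadcast_failed` are hypothetical helpers for illustration.

```python
import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape [2, 3]
b = torch.tensor([[1], [2]])              # shape [2, 1]
c = torch.tensor([-1, -2, -3])            # shape [3]

# Shapes are aligned from the right; dimensions of size 1 are stretched.
print((a + b).shape)
print((b + c).shape)

# expand performs the same stretching explicitly. The stride of 0 in the
# stretched dimension shows that the same memory is reused instead of copied.
b_expanded = b.expand(2, 3)
print(b_expanded.stride())

# Trailing dimensions 3 and 2 are neither equal nor 1, so this cannot broadcast.
try:
    a + torch.tensor([1, 2])
    broadcast_failed = False
except RuntimeError as err:
    broadcast_failed = True
    print(f"broadcast failed: {err}")
```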


## Gradients

So far we have done nothing with PyTorch that we cannot do with Numpy or any other scientific computing program. What distinguishes PyTorch is its support for calculating gradients automatically. We have to let PyTorch know which tensors we want to calculate gradients for, and we do this by setting `requires_grad=True`. We can pass `requires_grad=True` during construction of a new tensor or call `.requires_grad_(True)` (note the trailing underscore) on an existing tensor.
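Both ways of flagging a tensor can be sketched as follows. Note that the in-place method is spelled `requires_grad_`, with a trailing underscore. The tensors `w` and `v` are only illustrative.

```python
import torch

# Option 1: request gradient tracking at construction time.
w = torch.zeros(3, requires_grad=True)

# Option 2: switch tracking on for an existing tensor (in-place method).
v = torch.zeros(3)
tracked_before = v.requires_grad  # tensors do not track gradients by default
v.requires_grad_(True)

print(w.requires_grad, tracked_before, v.requires_grad)
```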

PyTorch implements _Autograd_ to compute gradients, which is distinct from finite differences or symbolic derivatives. Autograd will only give us the gradient at one particular point and will only do so after having computed the function at that point. First evaluating the function at our point of interest is called the _forward pass_; let us do that for a simple function.

In [15]:
# set up our tensors and indicate which we want to compute the gradient for (just the derivative in 1D)
x = torch.tensor([1.0], requires_grad=True)
a = torch.tensor([2.0])
b = torch.tensor([3.0])

# execute the forward pass
y = a*x + b

# let's see what's in y
print(y)

tensor([5.], grad_fn=<AddBackward0>)


As expected, `y` contains the right answer $2 \cdot 1 + 3 = 5$. But it also carries an additional field, `grad_fn=<AddBackward0>`. When we do a computation with a tensor that has `requires_grad=True`, PyTorch tracks everything we do with that tensor so that it can later use backpropagation to calculate the gradient. The `grad_fn` property is part of this mechanism. Its details are not important for us to understand (if you are interested, see <https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html>), but we can now use backpropagation to calculate $\frac{\partial y}{\partial x}(1)$.

We execute the backward pass by calling `.backward()` on the output tensor `y`. Afterward, we can look up the gradient with respect to `x` by accessing `x.grad`.

In [16]:
# execute the backward pass
y.backward()

# gradient of y w.r.t. x:
print(x.grad)

tensor([2.])


Indeed $\frac{\partial y}{\partial x}(1)=2$.
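For comparison with the finite-difference approach mentioned earlier, the sketch below computes the same derivative both ways. The helper `f`, the step size `h`, and the use of `float64` (to keep rounding error small) are assumptions made for this illustration.

```python
import torch

def f(x):
    # the same function as above: y = 2x + 3
    return 2.0 * x + 3.0

# Autograd: one forward pass followed by one backward pass.
x = torch.tensor([1.0], dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_value = x.grad.item()

# Central finite difference: two extra function evaluations and a step size to choose.
h = 1e-4
x0 = x.detach()
finite_difference = ((f(x0 + h) - f(x0 - h)) / (2 * h)).item()

print(autograd_value, finite_difference)
```

The finite-difference value is only an approximation that depends on `h`, whereas Autograd applies the chain rule to the recorded operations.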

Let us do something a little more complicated:
$$
    y = - x_1^2 + 0\, x_2^2 + 2\, x_3^2 .
$$

In [17]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
a = torch.tensor([-1.0, 0.0, 2.0])

y = (a*(x**2)).sum()
print(y)

y.backward()
print(x.grad)

tensor(17., grad_fn=<SumBackward0>)
tensor([-2.,  0., 12.])


You can verify that this is indeed the correct answer.
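One way to verify it: for $y = \sum_i a_i x_i^2$ the partial derivatives are $\frac{\partial y}{\partial x_i} = 2 a_i x_i$, which at $x = (1, 2, 3)$ gives $(-2, 0, 12)$. The sketch below compares this hand-derived formula with the Autograd result. The name `analytic` is only illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
a = torch.tensor([-1.0, 0.0, 2.0])

# Forward and backward pass, as above.
y = (a * (x**2)).sum()
y.backward()

# Hand-derived gradient: dy/dx_i = 2 * a_i * x_i.
analytic = 2 * a * x.detach()

print(x.grad)
print(analytic)
print(torch.allclose(x.grad, analytic))
```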

Implementing gradient descent would now come down to repeatedly applying the update `x = x - learning_rate * x.grad`. We will not have to do this manually, as PyTorch has optimizers that handle all these details, but it is good to know what is happening when we are running an optimizer.
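As a minimal sketch of what such a manual loop could look like, the code below minimizes the hypothetical objective $(x - 3)^2$. The objective, the learning rate, and the number of steps are all assumptions chosen for illustration. Two practical details differ from the one-line formula: the update is done in place inside `torch.no_grad()` so that it is not tracked by Autograd, and the gradient is reset after each step because `.backward()` accumulates into `x.grad`.

```python
import torch

x = torch.tensor([0.0], requires_grad=True)
learning_rate = 0.1

for step in range(100):
    # forward pass: evaluate the (hypothetical) objective at the current x
    y = ((x - 3.0) ** 2).sum()

    # backward pass: fills x.grad with dy/dx = 2 * (x - 3)
    y.backward()

    # gradient descent update, excluded from gradient tracking
    with torch.no_grad():
        x -= learning_rate * x.grad

    # reset the gradient so the next backward pass starts from zero
    x.grad.zero_()

print(x)
```

After enough steps `x` ends up close to the minimizer at 3.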