# Introduction to PyTorch

The purpose of this notebook is to introduce you to the basics of [PyTorch](https://pytorch.org), the deep learning framework we will be using for the labs. Many good introductions to PyTorch are available online. This notebook focuses on those basics that you will encounter in the labs. Beyond it, you will also need to get comfortable with the [PyTorch documentation](https://pytorch.org/docs/stable/).

We start by installing PyTorch (if it is not yet available in your environment) and importing the module:

In [2]:
pip install torch

In [1]:
import torch

The following code prints the current version of the module:

In [2]:
print(torch.__version__)

2.5.1+cu124


This notebook was written for PyTorch 2.1; the output shown above comes from a later run with version 2.5.1.

## Tensors

The fundamental data structure in PyTorch is the **tensor**, a multi-dimensional matrix containing elements of a single numerical data type. Tensors are similar to *arrays* as you may know them from NumPy or MATLAB.

### Creating tensors

One way to create a tensor is to call the function [`torch.tensor()`](https://pytorch.org/docs/stable/generated/torch.tensor.html) on a Python list or NumPy array.

The code in the following cell creates a 2-dimensional tensor with 4 elements.

In [3]:
x = torch.tensor([[0, 1], [2, 3]])
x

tensor([[0, 1],
        [2, 3]])

Each tensor has a *shape*, which specifies the number and sizes of its dimensions:

In [4]:
x.shape

torch.Size([2, 2])

Each tensor also has a *data type* for its elements. [More information about data types](https://pytorch.org/docs/stable/tensors.html#data-types)

In [5]:
x.dtype

torch.int64

When creating a tensor, you can explicitly pass the intended data type as a keyword argument:

In [6]:
y = torch.tensor([[0, 1], [2, 3]], dtype=torch.float)
y.dtype

torch.float32

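For many data types, there also exists a specialised constructor. These typed constructors are an older interface; calling `torch.tensor()` with an explicit `dtype` gives the same result: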

In [7]:
z = torch.FloatTensor([[0, 1], [2, 3]])
z.dtype

torch.float32
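
The following sketch is not part of the original notebook. It illustrates how the data type is inferred from the Python values when no `dtype` is given, and how an existing tensor can be converted with the `to()` method. The variable names `a`, `b` and `c` are chosen for illustration only.

```python
import torch

# Integer literals give a 64-bit integer tensor.
a = torch.tensor([0, 1, 2])
print(a.dtype)

# Floating-point literals give a tensor of the default float type
# (float32 unless the default has been changed).
b = torch.tensor([0.0, 1.0, 2.0])
print(b.dtype)

# to() returns a converted tensor; the original keeps its data type.
c = a.to(torch.float32)
print(c.dtype, a.dtype)
```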

### More creation operations

Create a 3D-tensor of the specified shape and filled with the scalar value zero:

In [8]:
x = torch.zeros(2, 3, 5)
x

tensor([[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]])

Create a 3D-tensor filled with random values:

In [9]:
x = torch.rand(2, 3, 5)
x

tensor([[[0.8331, 0.0285, 0.2408, 0.7361, 0.8328],
         [0.0200, 0.1057, 0.9195, 0.5148, 0.6081],
         [0.9909, 0.8657, 0.9227, 0.7304, 0.5469]],

        [[0.5788, 0.2761, 0.7405, 0.0853, 0.1384],
         [0.9075, 0.2957, 0.8944, 0.5108, 0.2113],
         [0.3106, 0.3052, 0.6089, 0.4745, 0.8937]]])

Create a tensor with the same shape as another one, but filled with ones:

In [10]:
y = torch.ones_like(x)
y    # shape: [2, 3, 5]

tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])

For a complete list of tensor-creating operations, see [Creation ops](https://pytorch.org/docs/stable/torch.html#creation-ops).
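
As a small, non-exhaustive sample of that list, the sketch below uses four further creation operations. The shapes and fill values are arbitrary choices made for this illustration.

```python
import torch

x = torch.rand(2, 3)

a = torch.zeros_like(x)        # zeros with the shape and data type of x
b = torch.full((2, 3), 7.0)    # every element equals 7.0
c = torch.eye(3)               # 3x3 identity matrix
d = torch.arange(0, 10, 2)     # start, end (exclusive), step

print(a.shape, b.shape, c.shape)
print(d)    # tensor([0, 2, 4, 6, 8])
```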

### Embrace vectorisation!

Iteration or “looping” is one of the most useful techniques for processing data in Python. However, you should **not loop over tensors**. Instead, try to *vectorise* any operations. Looping over tensors is slow, while vectorised operations on tensors are fast (and can be made even faster when the code runs on a GPU). To illustrate this point, let us create a 1D-tensor containing the first 1M integers:

In [11]:
x = torch.arange(1000000)
x

tensor([     0,      1,      2,  ..., 999997, 999998, 999999])

Summing up the elements of the tensor with Python’s built-in `sum()`, which loops over the elements one by one, is relatively slow:

In [12]:
sum(x)

tensor(499999500000)

Doing the same thing using a tensor operation is much faster:

In [13]:
x.sum()

tensor(499999500000)
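
To see the difference on your own machine, you can time both variants. The sketch below is one possible way to do this with `time.perf_counter()`; it uses a smaller tensor than the cell above so that the loop finishes quickly, and the measured times will vary between machines.

```python
import time
import torch

x = torch.arange(100_000)    # smaller than above to keep the loop short

start = time.perf_counter()
loop_total = sum(x)          # Python-level loop over 0D-tensors
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_total = x.sum()          # a single vectorised tensor operation
vec_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorised: {vec_seconds:.4f} s")
print(loop_total == vec_total)    # both variants give the same result
```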

### Indexing and slicing

To access the contents of a tensor, you can use an extended version of Python’s syntax for indexing and slicing. Essentially the same syntax is used by NumPy. For more information, see [Indexing on ndarrays](https://numpy.org/doc/stable/user/basics.indexing.html).

To illustrate indexing and slicing, we create a 3D-tensor with random numbers:

In [14]:
x = torch.rand(2, 3, 5)
x

tensor([[[0.3796, 0.2778, 0.2722, 0.2055, 0.6587],
         [0.0524, 0.5954, 0.7191, 0.3592, 0.1064],
         [0.3551, 0.4273, 0.8624, 0.5856, 0.2218]],

        [[0.3764, 0.4098, 0.6999, 0.0573, 0.6372],
         [0.6945, 0.4628, 0.0162, 0.5217, 0.1795],
         [0.2321, 0.5784, 0.9938, 0.8904, 0.5163]]])

Index an element by a 3D-coordinate; this gives a 0D-tensor:

In [15]:
x[0,1,2]

tensor(0.7191)

(If you want the result as a non-tensor, use the method [`item()`](https://pytorch.org/docs/stable/generated/torch.Tensor.item.html#torch.Tensor.item).)
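
A minimal illustration of the difference between a 0D-tensor and the plain Python number returned by `item()`; the names `element` and `value` are only used here.

```python
import torch

x = torch.rand(2, 3, 5)

element = x[0, 1, 2]      # 0D-tensor
value = element.item()    # plain Python float

print(element.dim())      # 0
print(type(value))        # <class 'float'>
```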

Index the second element along the first dimension; this gives a 2D-tensor:

In [16]:
x[1]

tensor([[0.3764, 0.4098, 0.6999, 0.0573, 0.6372],
        [0.6945, 0.4628, 0.0162, 0.5217, 0.1795],
        [0.2321, 0.5784, 0.9938, 0.8904, 0.5163]])

Index the second-to-last element along the first dimension (because that dimension has size 2, this is the first element):

In [20]:
x[-2]

tensor([[0.3796, 0.2778, 0.2722, 0.2055, 0.6587],
        [0.0524, 0.5954, 0.7191, 0.3592, 0.1064],
        [0.3551, 0.4273, 0.8624, 0.5856, 0.2218]])

Slice out the sub-tensor with elements from index 1 onwards; this gives a 3D-tensor:

In [21]:
x[1:]

tensor([[[0.3764, 0.4098, 0.6999, 0.0573, 0.6372],
         [0.6945, 0.4628, 0.0162, 0.5217, 0.1795],
         [0.2321, 0.5784, 0.9938, 0.8904, 0.5163]]])

Here is a more complex example of slicing. As in Python, the colon `:` selects all indices of a dimension.

In [22]:
x[:,:,2:4]

tensor([[[0.2722, 0.2055],
         [0.7191, 0.3592],
         [0.8624, 0.5856]],

        [[0.6999, 0.0573],
         [0.0162, 0.5217],
         [0.9938, 0.8904]]])

The syntax for indexing and slicing is very powerful. For example, the same effect as in the previous cell can be obtained with the following code, which uses the ellipsis (`...`) to match all dimensions but the ones explicitly mentioned:

In [23]:
x[...,2:4]

tensor([[[0.2722, 0.2055],
         [0.7191, 0.3592],
         [0.8624, 0.5856]],

        [[0.6999, 0.0573],
         [0.0162, 0.5217],
         [0.9938, 0.8904]]])
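
The sketch below checks the equivalence of the two notations programmatically and adds two further slicing patterns. It is meant as an illustration only; the variable names are not used elsewhere in this notebook.

```python
import torch

x = torch.rand(2, 3, 5)

# The ellipsis stands for as many full slices as are needed.
same = torch.equal(x[:, :, 2:4], x[..., 2:4])
print(same)    # True

# Mixing an index and slices: the indexed dimension is dropped,
# the sliced dimensions are kept.
sub = x[0, :, 1:3]
print(sub.shape)    # torch.Size([3, 2])

# A step can be given as the third part of a slice.
every_other = x[..., ::2]
print(every_other.shape)    # torch.Size([2, 3, 3])
```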

### Creating views

You will sometimes want to use a tensor with a different shape than its initial shape. In these situations, you can **re-shape** the tensor or create a **view** of the tensor. The latter is preferable because views can share the same data as their base tensors and thus do not require copying.

We create a 3D-tensor of 12 random values:

In [24]:
x = torch.rand(2, 3, 2)
x

tensor([[[0.0045, 0.2899],
         [0.0932, 0.5359],
         [0.1924, 0.3770]],

        [[0.8246, 0.1700],
         [0.4106, 0.2826],
         [0.9137, 0.0094]]])

Create a view of this tensor as a 2D-tensor:

In [25]:
x.view(3, 4)

tensor([[0.0045, 0.2899, 0.0932, 0.5359],
        [0.1924, 0.3770, 0.8246, 0.1700],
        [0.4106, 0.2826, 0.9137, 0.0094]])

When creating a view, the special size `-1` is inferred from the other sizes:

In [26]:
x.view(3, -1)

tensor([[0.0045, 0.2899, 0.0932, 0.5359],
        [0.1924, 0.3770, 0.8246, 0.1700],
        [0.4106, 0.2826, 0.9137, 0.0094]])

Modifying a view affects the data in the base tensor:

In [28]:
y = torch.rand(2, 3, 2)
z = y.view(3, 4)
z[2, 3] = 42
y

tensor([[[ 0.9914,  0.0751],
         [ 0.9203,  0.9408],
         [ 0.6629,  0.6393]],

        [[ 0.4338,  0.2184],
         [ 0.7391,  0.0696],
         [ 0.3290, 42.0000]]])
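
One way to check that a view shares its data with the base tensor is to compare the memory addresses returned by `data_ptr()`. The following sketch does this and also shows `clone()` as a way to obtain an independent copy when sharing is not wanted. The names `w` and the written values are chosen for illustration.

```python
import torch

y = torch.rand(2, 3, 2)
z = y.view(3, 4)

# Both tensors start at the same memory address, so no data was copied.
print(y.data_ptr() == z.data_ptr())    # True

# A write through the view is visible in the base tensor.
z[0, 0] = -1.0
print(y[0, 0, 0].item())    # -1.0

# clone() makes an independent copy; writing to it leaves y untouched.
w = y.clone().view(3, 4)
w[0, 0] = 5.0
print(y[0, 0, 0].item())    # -1.0
```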

### More viewing operations

There are a few other useful methods that create views. [More information about views](https://pytorch.org/docs/stable/tensor_view.html)

In [29]:
x = torch.rand(2, 3, 5)
x

tensor([[[0.1049, 0.1715, 0.1327, 0.1321, 0.3301],
         [0.3387, 0.9927, 0.8197, 0.2893, 0.2409],
         [0.0671, 0.8600, 0.3635, 0.2731, 0.9266]],

        [[0.4652, 0.4348, 0.2471, 0.6105, 0.3925],
         [0.6466, 0.2843, 0.4745, 0.7827, 0.1334],
         [0.7219, 0.7289, 0.7846, 0.7203, 0.9696]]])

The [`permute()`](https://pytorch.org/docs/stable/generated/torch.permute.html) method returns a view of the base tensor with some of its dimensions permuted. In the example, we keep the third dimension but swap the first and the second dimension:

In [34]:
y = x.permute(1, 0, 2)
print(y)
y.shape

tensor([[[0.1049, 0.1715, 0.1327, 0.1321, 0.3301],
         [0.4652, 0.4348, 0.2471, 0.6105, 0.3925]],

        [[0.3387, 0.9927, 0.8197, 0.2893, 0.2409],
         [0.6466, 0.2843, 0.4745, 0.7827, 0.1334]],

        [[0.0671, 0.8600, 0.3635, 0.2731, 0.9266],
         [0.7219, 0.7289, 0.7846, 0.7203, 0.9696]]])


torch.Size([3, 2, 5])

The [`unsqueeze()`](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html) method returns a tensor with a dimension of size one inserted at the specified position. This is useful e.g. in the training of neural networks when you want to create a batch with just one example.

In [35]:
y = x.unsqueeze(0)
print(y)
y.shape

tensor([[[[0.1049, 0.1715, 0.1327, 0.1321, 0.3301],
          [0.3387, 0.9927, 0.8197, 0.2893, 0.2409],
          [0.0671, 0.8600, 0.3635, 0.2731, 0.9266]],

         [[0.4652, 0.4348, 0.2471, 0.6105, 0.3925],
          [0.6466, 0.2843, 0.4745, 0.7827, 0.1334],
          [0.7219, 0.7289, 0.7846, 0.7203, 0.9696]]]])


torch.Size([1, 2, 3, 5])

The inverse operation to [`unsqueeze()`](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html) is [`squeeze()`](https://pytorch.org/docs/stable/generated/torch.squeeze.html):

In [36]:
y = y.squeeze(0)
print(y)
y.shape

tensor([[[0.1049, 0.1715, 0.1327, 0.1321, 0.3301],
         [0.3387, 0.9927, 0.8197, 0.2893, 0.2409],
         [0.0671, 0.8600, 0.3635, 0.2731, 0.9266]],

        [[0.4652, 0.4348, 0.2471, 0.6105, 0.3925],
         [0.6466, 0.2843, 0.4745, 0.7827, 0.1334],
         [0.7219, 0.7289, 0.7846, 0.7203, 0.9696]]])


torch.Size([2, 3, 5])
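
The position argument of `unsqueeze()` and `squeeze()` can be any valid dimension, including negative ones. The sketch below shows a few variants; it is an illustration with arbitrary shapes rather than part of the lab code.

```python
import torch

x = torch.rand(2, 3, 5)

print(x.unsqueeze(1).shape)     # torch.Size([2, 1, 3, 5])
print(x.unsqueeze(-1).shape)    # torch.Size([2, 3, 5, 1])

# Indexing with None inserts a dimension in the same way.
print(x[None].shape)            # torch.Size([1, 2, 3, 5])

# squeeze() without an argument removes every dimension of size one.
y = torch.rand(1, 3, 1)
print(y.squeeze().shape)        # torch.Size([3])

# squeeze(dim) leaves the shape unchanged if that dimension is not of size one.
print(x.squeeze(0).shape)       # torch.Size([2, 3, 5])
```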

### Re-shaping tensors

In some cases, you cannot create a view and need to explicitly re-shape a tensor. In particular, this happens when the requested shape is not compatible with the way the tensor’s data is laid out in memory, for example because the tensor is no longer contiguous after a permutation.

In [37]:
x = torch.rand(2, 3, 5)
x

tensor([[[0.0397, 0.7922, 0.8714, 0.1917, 0.6529],
         [0.9475, 0.9698, 0.2215, 0.0835, 0.8347],
         [0.0510, 0.2833, 0.4865, 0.0013, 0.5803]],

        [[0.6776, 0.9788, 0.4542, 0.5153, 0.7025],
         [0.5484, 0.9998, 0.8732, 0.7258, 0.5052],
         [0.3438, 0.8520, 0.0194, 0.7828, 0.6367]]])

We permute the tensor `x` to create a new tensor `y` in which the data is no longer consecutive in memory:

In [38]:
y = x.permute(0, 2, 1)
# y = y.view(-1)    # raises a runtime error
y

tensor([[[0.0397, 0.9475, 0.0510],
         [0.7922, 0.9698, 0.2833],
         [0.8714, 0.2215, 0.4865],
         [0.1917, 0.0835, 0.0013],
         [0.6529, 0.8347, 0.5803]],

        [[0.6776, 0.5484, 0.3438],
         [0.9788, 0.9998, 0.8520],
         [0.4542, 0.8732, 0.0194],
         [0.5153, 0.7258, 0.7828],
         [0.7025, 0.5052, 0.6367]]])
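
The following sketch makes the problem visible. It inspects the memory layout with `is_contiguous()` and `stride()`, catches the error raised by `view()`, and then uses `contiguous()` to obtain a copy with a layout on which `view()` works. Treat it as an illustration under the same shapes as above.

```python
import torch

x = torch.rand(2, 3, 5)
y = x.permute(0, 2, 1)

print(x.is_contiguous())    # True
print(y.is_contiguous())    # False
print(x.stride(), y.stride())    # (15, 5, 1) (15, 1, 5)

try:
    y.view(-1)
except RuntimeError as err:
    print("view failed:", err)

# contiguous() copies the data into a contiguous layout,
# after which view() works.
flat = y.contiguous().view(-1)
print(flat.shape)    # torch.Size([30])
```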

When it is not possible to create a view of a tensor, you can explicitly re-shape it, which will *copy* the data if necessary:

In [39]:
y = x.permute(0, 2, 1)
y = y.reshape(-1)
y

tensor([0.0397, 0.9475, 0.0510, 0.7922, 0.9698, 0.2833, 0.8714, 0.2215, 0.4865,
        0.1917, 0.0835, 0.0013, 0.6529, 0.8347, 0.5803, 0.6776, 0.5484, 0.3438,
        0.9788, 0.9998, 0.8520, 0.4542, 0.8732, 0.0194, 0.5153, 0.7258, 0.7828,
        0.7025, 0.5052, 0.6367])

Modifying a reshaped tensor *will not necessarily* change the data in the base tensor. This depends on whether the reshaped tensor is a copy of the base tensor or a view.

In [40]:
y = torch.rand(2, 3, 2)
# z = y.permute(0, 1, 2).reshape(-1)    # z is a view of y => data is shared
z = y.permute(0, 2, 1).reshape(-1)    # z is a copy of y => data is not shared
z[0] = 42
y

tensor([[[0.5288, 0.1765],
         [0.7806, 0.7651],
         [0.9727, 0.2944]],

        [[0.3365, 0.5711],
         [0.5636, 0.6276],
         [0.9473, 0.2401]]])
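
Both cases from the cell above can be compared side by side. The sketch below assumes, as is the behaviour in current PyTorch versions, that `reshape()` returns a view whenever the memory layout allows it; the names `z_view` and `z_copy` are chosen for illustration.

```python
import torch

y = torch.rand(2, 3, 2)

# Identity permutation: the layout is unchanged, so reshape() can return a view.
z_view = y.permute(0, 1, 2).reshape(-1)
print(z_view.data_ptr() == y.data_ptr())    # True: memory is shared

# Swapping two dimensions: reshape() has to copy the data.
z_copy = y.permute(0, 2, 1).reshape(-1)
print(z_copy.data_ptr() == y.data_ptr())    # False: separate memory

z_view[0] = 42
z_copy[1] = -42
print(y[0, 0, 0].item())          # 42.0, written through the view
print((y == -42).any().item())    # False, the copy does not affect y
```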

## Computing with tensors

Now that you know how to create tensors and extract data from them, we can turn to actual computations on tensors.

### Element-wise operations

Unary mathematical operations defined on numbers can be “lifted” to tensors by applying them element-wise. This includes multiplication by a constant, exponentiation (`**`), taking roots ([`torch.sqrt()`](https://pytorch.org/docs/stable/generated/torch.sqrt.html)), and the logarithm ([`torch.log()`](https://pytorch.org/docs/stable/generated/torch.log.html)).

In [41]:
x = torch.rand(2, 3)
print(x)
x * 2    # element-wise multiplication with 2

tensor([[0.8290, 0.2129, 0.2680],
        [0.3700, 0.8070, 0.7745]])


tensor([[1.6579, 0.4259, 0.5359],
        [0.7399, 1.6139, 1.5490]])
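
The other operations mentioned above work in the same way. The sketch below applies them to a small tensor with hand-picked values, so that the results are easy to check by eye.

```python
import torch

x = torch.tensor([[1.0, 4.0, 9.0],
                  [16.0, 25.0, 36.0]])

print(x ** 2)           # element-wise square
print(torch.sqrt(x))    # element-wise square root
print(torch.log(x))     # element-wise natural logarithm

# exp() undoes log(), up to floating-point rounding.
print(torch.allclose(torch.exp(torch.log(x)), x))
```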

Similarly, we can apply binary mathematical operations to tensors, as long as they have the same shape. For example, the Hadamard product of two tensors $X$ and $Y$ is the tensor $X \odot Y$ obtained by the element-wise multiplication of the elements of $X$ and $Y$.

In [42]:
x = torch.rand(2, 3)
y = torch.rand(2, 3)
torch.mul(x, y)    # shape: [2, 3]

tensor([[0.1238, 0.2418, 0.2857],
        [0.0967, 0.0110, 0.6991]])

The Hadamard product can be written more succinctly as follows:

In [43]:
x * y

tensor([[0.1238, 0.2418, 0.2857],
        [0.0967, 0.0110, 0.6991]])

### Matrix product

When computing the matrix product between two 2D-tensors (matrices) $X$ and $Y$, the size of the last dimension of $X$ and the size of the first dimension of $Y$ must match. The shape of the resulting tensor is the concatenation of the shapes of $X$ and $Y$, with the last dimension of $X$ and the first dimension of $Y$ removed. (For tensors with more than two dimensions, `torch.matmul()` treats the leading dimensions as batch dimensions.)

In [47]:
x = torch.rand(2, 3)
y = torch.rand(3, 5)
torch.matmul(x, y)    # shape: [2, 5]

tensor([[0.5230, 1.0319, 0.7987, 1.1509, 1.1243],
        [0.4919, 1.0185, 0.7368, 1.0934, 0.9525]])

The matrix product can be written more succinctly as follows:

In [48]:
x @ y

tensor([[0.5230, 1.0319, 0.7987, 1.1509, 1.1243],
        [0.4919, 1.0185, 0.7368, 1.0934, 0.9525]])
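
The shape rule can be checked directly. The sketch below shows a matching product, a product whose inner sizes do not match, and a batched product of 3D-tensors; the batch size 4 and the names `a` and `b` are arbitrary choices for this illustration.

```python
import torch

x = torch.rand(2, 3)
y = torch.rand(3, 5)
print((x @ y).shape)    # torch.Size([2, 5])

# Inner sizes that do not match raise an error.
try:
    y @ y               # [3, 5] @ [3, 5]: 5 != 3
except RuntimeError as err:
    print("matmul failed:", err)

# With more than two dimensions, the leading dimensions are treated as
# batch dimensions and the trailing matrices are multiplied pairwise.
a = torch.rand(4, 2, 3)
b = torch.rand(4, 3, 5)
print((a @ b).shape)    # torch.Size([4, 2, 5])
```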

### Sum and argmax

Let us define a tensor of random numbers:

In [53]:
x = torch.rand(2, 3, 5)
x

tensor([[[0.8050, 0.3297, 0.7755, 0.0629, 0.2826],
         [0.5261, 0.2465, 0.9033, 0.1597, 0.9989],
         [0.5812, 0.0917, 0.7472, 0.1316, 0.4948]],

        [[0.9821, 0.7539, 0.5887, 0.6504, 0.0081],
         [0.4127, 0.8225, 0.4966, 0.3511, 0.3926],
         [0.8495, 0.1275, 0.4866, 0.8073, 0.7037]]])

You have already seen that we can compute the sum of a tensor:

In [54]:
torch.sum(x)

tensor(15.5701)

There is a second form of the sum operation where we can specify the dimension along which the sum should be computed. This will return a tensor with the specified dimension removed.

In [55]:
torch.sum(x, dim=0)    # shape: [3, 5]

tensor([[1.7871, 1.0836, 1.3642, 0.7133, 0.2907],
        [0.9388, 1.0690, 1.3999, 0.5107, 1.3915],
        [1.4307, 0.2192, 1.2338, 0.9389, 1.1985]])

In [57]:
torch.sum(x, dim=1)   # shape: [2, 5]

tensor([[1.9123, 0.6679, 2.4260, 0.3542, 1.7763],
        [2.2443, 1.7040, 1.5719, 1.8088, 1.1045]])
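
If you want to keep the summed dimension with size one instead of removing it, you can pass `keepdim=True`. The sketch below shows the effect on the shape and one typical use, normalising along a dimension; the division relies on broadcasting, which is covered later in this notebook.

```python
import torch

x = torch.rand(2, 3, 5)

s = torch.sum(x, dim=1)                       # shape: [2, 5]
s_keep = torch.sum(x, dim=1, keepdim=True)    # shape: [2, 1, 5]
print(s.shape, s_keep.shape)

# Because the summed dimension is kept with size one, the result can be
# combined with x directly, here to normalise along dimension 1.
normalised = x / s_keep
print(torch.sum(normalised, dim=1))    # all entries close to 1
```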

The same idea also applies to the operation [`torch.argmax()`](https://pytorch.org/docs/stable/generated/torch.argmax.html), which returns the index of the component with the maximal value along the specified dimension.

In [58]:
torch.argmax(x)    # index of the highest component in the flattened tensor

tensor(9)

In [59]:
torch.argmax(x, dim=0)   # index of the highest component along the first dimension

tensor([[1, 1, 0, 1, 0],
        [0, 1, 0, 1, 0],
        [1, 1, 0, 1, 1]])
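
When called without `dim`, the index returned by `torch.argmax()` refers to the flattened tensor. The sketch below shows one way to map it back to a 3D-coordinate using plain integer arithmetic; the helper names `i`, `j`, `k` and `rest` are introduced here for illustration.

```python
import torch

x = torch.rand(2, 3, 5)

flat_index = torch.argmax(x)    # index into the flattened tensor
print(x.flatten()[flat_index] == x.max())    # tensor(True)

# Recover the 3D-coordinate from the flat index by integer arithmetic.
i, rest = divmod(flat_index.item(), 3 * 5)
j, k = divmod(rest, 5)
print((i, j, k), x[i, j, k] == x.max())

# argmax along the last dimension: one index per row of length 5.
print(torch.argmax(x, dim=-1).shape)    # torch.Size([2, 3])
```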

### Concatenating tensors

A list or tuple of tensors can be combined into one long tensor by concatenation.

In [71]:
x = torch.rand(2, 2)
y = torch.rand(4, 2)
z = torch.cat((x, y))
print(z)
z.shape

tensor([[0.3834, 0.2750],
        [0.7931, 0.8744],
        [0.6593, 0.3228],
        [0.2185, 0.4540],
        [0.3980, 0.8587],
        [0.1849, 0.1999]])


torch.Size([6, 2])

In [72]:
print(x)
print(y)

tensor([[0.3834, 0.2750],
        [0.7931, 0.8744]])
tensor([[0.6593, 0.3228],
        [0.2185, 0.4540],
        [0.3980, 0.8587],
        [0.1849, 0.1999]])


You can also concatenate along a specific dimension:

In [73]:
x = torch.rand(2, 2)
y = torch.rand(2, 2)
print(x)
print(y)
print(torch.cat((x, y), dim=0))    # shape: [4, 2]
print(torch.cat((x, y), dim=1))    # shape: [2, 4]

tensor([[0.0624, 0.6167],
        [0.8719, 0.1743]])
tensor([[0.5262, 0.3413],
        [0.9979, 0.7353]])
tensor([[0.0624, 0.6167],
        [0.8719, 0.1743],
        [0.5262, 0.3413],
        [0.9979, 0.7353]])
tensor([[0.0624, 0.6167, 0.5262, 0.3413],
        [0.8719, 0.1743, 0.9979, 0.7353]])
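
Concatenation requires the tensors to have the same size in every dimension except the one along which they are joined. The sketch below illustrates this constraint with the shapes from the first example of this section.

```python
import torch

x = torch.rand(2, 2)
y = torch.rand(4, 2)

print(torch.cat((x, y), dim=0).shape)    # torch.Size([6, 2])

try:
    torch.cat((x, y), dim=1)    # sizes 2 and 4 differ in dimension 0
except RuntimeError as err:
    print("cat failed:", err)

# More than two tensors can be concatenated in one call.
print(torch.cat([x, x, x], dim=1).shape)    # torch.Size([2, 6])
```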


### Broadcasting

The term *broadcasting* describes how PyTorch treats tensors with different shapes. In short, if a PyTorch operation supports broadcasting, then its Tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data). In many situations, this can avoid explicit looping. 

In the simplest case, two tensors have the same shape. This is the case for the matrix `x @ W` and the bias vector `b` in the linear model below:

In [76]:
x = torch.rand(1, 2)
print(x)
W = torch.rand(2, 3)
print(W)
b = torch.rand(1, 3)
print(b)
z = x @ W    # shape: [1, 3]
print(z)
z = z + b    # shape: [1, 3]
print(z)
z.shape

tensor([[0.8031, 0.2381]])
tensor([[0.8065, 0.3058, 0.6378],
        [0.1298, 0.8368, 0.5933]])
tensor([[0.1408, 0.2541, 0.7891]])
tensor([[0.6786, 0.4448, 0.6534]])
tensor([[0.8194, 0.6989, 1.4425]])


torch.Size([1, 3])

Now suppose that we do not have a single input `x` but a whole batch (a matrix) of inputs `X`. Watch what happens when adding the bias vector `b`:

In [80]:
X = torch.rand(5, 2)
print(X)
print(W)
Z = X @ W    # shape: [5, 3]
print(Z)
Z = Z + b    # shape: [5, 3]    Broadcasting happens here!
print(Z)
Z.shape
print(b)

tensor([[0.9857, 0.6690],
        [0.6373, 0.6645],
        [0.4231, 0.2638],
        [0.3463, 0.5496],
        [0.0192, 0.8854]])
tensor([[0.8065, 0.3058, 0.6378],
        [0.1298, 0.8368, 0.5933]])
tensor([[0.8818, 0.8612, 1.0255],
        [0.6002, 0.7509, 0.8007],
        [0.3755, 0.3501, 0.4263],
        [0.3506, 0.5658, 0.5469],
        [0.1304, 0.7468, 0.5375]])
tensor([[1.0226, 1.1153, 1.8146],
        [0.7410, 1.0050, 1.5898],
        [0.5162, 0.6042, 1.2154],
        [0.4914, 0.8199, 1.3360],
        [0.2711, 1.0009, 1.3266]])
tensor([[0.1408, 0.2541, 0.7891]])


In the example, broadcasting expands the shape of `b` from $[1, 3]$ into $[5, 3]$. The matrix `Z` is formed by effectively adding `b` *to each row* of `X @ W`. However, this is not implemented by a Python loop but happens implicitly through broadcasting.

PyTorch uses the same broadcasting semantics as NumPy. [More information about broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)
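
The rule behind these semantics can be tried out with a few shapes. The sketch below compares broadcasting with an explicit `expand()`, uses `torch.broadcast_shapes()` to compute the resulting shape without creating tensors, and shows a pair of shapes that cannot be broadcast. The shapes are arbitrary choices for this illustration.

```python
import torch

Z = torch.rand(5, 3)
b = torch.rand(1, 3)

# Broadcasting gives the same result as expanding b to shape [5, 3] explicitly.
# expand() returns a view and does not copy the data.
print(torch.equal(Z + b, Z + b.expand(5, 3)))    # True

# Shapes are compared from the last dimension backwards; two sizes are
# compatible if they are equal or if one of them is 1.
print(torch.broadcast_shapes((5, 1), (1, 3)))    # torch.Size([5, 3])
print(torch.broadcast_shapes((3,), (5, 3)))      # torch.Size([5, 3])

try:
    torch.rand(5, 3) + torch.rand(5, 2)    # 3 and 2 differ, and neither is 1
except RuntimeError as err:
    print("broadcast failed:", err)
```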

## Final note

There is a lot more to learn about PyTorch, but after working through this notebook, you should be in a good position to take on the labs. Have a look at the [PyTorch documentation](https://pytorch.org/docs/stable/) for further details and more examples.