# Introduction to PyTorch

The purpose of this notebook is to introduce you to the basics of [PyTorch](https://pytorch.org), the deep learning framework we will be using for the labs. Many good introductions to PyTorch are available online. This notebook focuses on basics you will need for the labs. Beyond it, you will also need to get comfortable with finding information in the [PyTorch documentation](https://pytorch.org/docs/stable/).

We start by importing the PyTorch module:

In [None]:
import torch

The following code prints the current version of the module:

In [None]:
print(torch.__version__)

The version of PyTorch at the time of writing this notebook was 2.7.0.

## Tensors

The fundamental data structure in PyTorch is the **tensor**, a multi-dimensional matrix containing elements of a single numerical data type. Tensors are similar to *arrays* as you may know them from NumPy or MATLAB.

### Creating tensors

One way to create a tensor is to call the function [`torch.tensor()`](https://pytorch.org/docs/stable/generated/torch.tensor.html) on a Python list or NumPy array. As an example, the code in the following cell creates a 2D-tensor with 6 elements.

In [None]:
x = torch.tensor([[0, 1, 2], [3, 4, 5]])
x
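
The same function also accepts a NumPy array. The following is a minimal sketch, assuming NumPy is installed; the names `a` and `t` are only used for illustration. It shows that `torch.tensor()` copies its input, so later changes to the array do not affect the tensor.

```python
import numpy as np
import torch

# A NumPy array with the same contents as the list above
a = np.array([[0, 1, 2], [3, 4, 5]])

# torch.tensor() copies the data into a new tensor
t = torch.tensor(a)

# Changing the array afterwards leaves the tensor untouched
a[0, 0] = 99

print(t.shape)
print(t[0, 0].item())
```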

Each tensor has a **shape**, which specifies the number and sizes of its dimensions:

In [None]:
x.shape

Each tensor also has a **data type** for its elements. [More information about data types](https://pytorch.org/docs/stable/tensors.html#data-types)

In [None]:
x.dtype

When creating a tensor, you can explicitly pass the intended data type as a keyword argument:

In [None]:
y = torch.tensor([[0, 1, 2], [3, 4, 5]], dtype=torch.float)
y.dtype
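
If no data type is given, PyTorch infers one from the values. The sketch below illustrates this under PyTorch's default settings; the variable names are illustrative only. It also shows one way to convert an existing tensor with the `to()` method.

```python
import torch

ints = torch.tensor([0, 1, 2])           # Python integers
floats = torch.tensor([0.0, 1.0, 2.0])   # Python floats
bools = torch.tensor([True, False])      # Python booleans

print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32 with the default settings
print(bools.dtype)   # torch.bool

# torch.float is an alias for torch.float32
print(torch.float == torch.float32)

# Convert an existing tensor to another data type (this creates a new tensor)
as_float = ints.to(torch.float)
print(as_float.dtype)
```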

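For many data types, there also exists a shorthand constructor. (The PyTorch documentation lists these as legacy constructors, so passing `dtype` to `torch.tensor()` is the safer habit in new code.)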

In [None]:
z = torch.FloatTensor([[0, 1, 2], [3, 4, 5]])
z.dtype

In many examples in this notebook, we will use a special tensor `Q`. This tensor has shape $[2, 3, 5]$ and data type `int64`.

In [None]:
Q = torch.arange(30).reshape(2, 3, 5)

We will refer to the dimensions of `Q` as *batches* (size&nbsp;2), *rows* (size&nbsp;3), and *columns* (size&nbsp;5) and use the shorthands $B$, $R$, and $C$ to refer to these dimensions and their sizes.

### 🧩 Task 1: More creation operations

Browse the [list of tensor-creating operations](https://pytorch.org/docs/stable/torch.html#creation-ops) in the PyTorch documentation and write code to create a tensor of the same shape as `Q` but filled with:

* the scalar value zero
* random values
* the scalar value one, but without explicitly specifying the shape

### Indexing and slicing

To access the contents of a tensor, you can use an extended version of Python’s syntax for indexing and slicing. Essentially the same syntax is used by NumPy. For more information, see [Indexing on ndarrays](https://numpy.org/doc/stable/user/basics.indexing.html).

We start by indexing a single element of the tensor `Q` by a 3D-coordinate.

In [None]:
y = Q[0, 1, 2]
y

Note that this gives a 0D-tensor:

In [None]:
y.shape

If you want to unpack the result as a non-tensor, use the method [`item()`](https://pytorch.org/docs/stable/generated/torch.Tensor.item.html#torch.Tensor.item):

In [None]:
y.item()

As in standard Python, we can use negative indices to index from the end. For example, to index the second-to-last element along the first dimension (which, since $B = 2$, is also the first element), we can write:

In [None]:
y = Q[-2]
assert y.shape == (3, 5)
y

If we index a tensor with fewer coordinates than it has dimensions, we get a subtensor with the remaining dimensions. For example:

In [None]:
y = Q[1]
assert y.shape == (3, 5)
y

(The subtensor is actually a *view* on the original tensor, not a copy; see below for more information on this.)
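
To make the view behaviour concrete, here is a small sketch. It works on a clone of `Q` (the name `base` is only used for illustration) so that `Q` itself stays unchanged.

```python
import torch

Q = torch.arange(30).reshape(2, 3, 5)

base = Q.clone()   # independent copy, so Q is not modified
sub = base[1]      # indexing returns a view on base

# Writing through the view changes the tensor it was taken from
sub[0, 0] = -1

print(base[1, 0, 0].item())  # -1
print(Q[1, 0, 0].item())     # 15, because Q was cloned beforehand
```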

Slicing differs from indexing in that it takes elements from one given index to another given index. For example, we can slice out the subtensor with elements from index 1 onwards:

In [None]:
y = Q[1:]
assert y.shape == (1, 3, 5)
y

Note that the shape of the resulting tensor is the same as that of the original tensor, except that the size of the batch dimension has changed from 2 to 1.

Here is a more complex example of slicing. As in Python, the colon `:` selects all indices of a dimension.

In [None]:
y = Q[:, :, 2:4]
assert y.shape == (2, 3, 2)
y

The syntax for indexing and slicing is very powerful. For example, the same effect as in the previous cell can be obtained with the following code, which uses the ellipsis (`...`) to match all dimensions but the ones explicitly mentioned:

In [None]:
y = Q[..., 2:4]
assert y.shape == (2, 3, 2)
y

### 🧩 Task 2: Indexing and slicing, part 1

Can you predict the shapes of the following subtensors?

In [None]:
# a = Q[0]
# b = Q[:, 1]
# c = Q[:, 1:2]

### 🧩 Task 3: Indexing and slicing, part 2

Write code to achieve the following using indexing and slicing on the tensor `Q`:

* Extract the subtensor containing the last two columns. (The resulting shape should be $[B, R, 2]$.)
* Extract the subtensor with elements from index 1 to index 2 (inclusive) in the $R$ dimension and index 2 to index 3 in the $C$ dimension.
* From the first element along the batch dimension only, take the last two entries along dimensions $R$ and $C$.

## Computing with tensors

Now that you know how to create tensors and extract data from them, we can turn to actual computations on tensors.

### Element-wise operations

Unary mathematical operations defined on numbers can be “lifted” to tensors by applying them element-wise. This includes multiplication by a constant, exponentiation (`**`), taking roots ([`torch.sqrt()`](https://pytorch.org/docs/stable/generated/torch.sqrt.html)), and the logarithm ([`torch.log()`](https://pytorch.org/docs/stable/generated/torch.log.html)).

In [None]:
x = torch.rand(2, 3)
print(x)
x * 2    # element-wise multiplication with 2
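
The other operations mentioned above work the same way. The following sketch uses a tensor with fixed values (chosen here only so that the results are easy to check by hand):

```python
import torch

x = torch.tensor([1.0, 4.0, 9.0])

squared = x ** 2        # element-wise exponentiation: 1, 16, 81
roots = torch.sqrt(x)   # element-wise square root: 1, 2, 3
logs = torch.log(x)     # element-wise natural logarithm; log(1) is 0

print(squared)
print(roots)
print(logs)
```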

Similarly, we can apply binary mathematical operations to tensors, as long as they have the same shape. For example, the Hadamard product of two tensors $X$ and $Y$ is the tensor $X \odot Y$ obtained by the element-wise multiplication of the elements of $X$ and $Y$.

In [None]:
x = torch.rand(2, 3)
y = torch.rand(2, 3)
z = torch.mul(x, y)
assert z.shape == (2, 3)
z

The Hadamard product can be written more succinctly as follows:

In [None]:
x * y

### Matrix product

When computing the matrix product of two matrices (2D tensors) $X$ and $Y$, the size of the last dimension of $X$ must match the size of the first dimension of $Y$. The shape of the result is the concatenation of the shapes of $X$ and $Y$ with these two dimensions removed; for example, shapes $[2, 3]$ and $[3, 5]$ give $[2, 5]$.

In [None]:
x = torch.rand(2, 3)
y = torch.rand(3, 5)
z = torch.matmul(x, y)
assert z.shape == (2, 5)
z

The matrix product can be written more succinctly as follows:

In [None]:
x @ y
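
The shape rule can be checked directly. The sketch below also shows what happens when the inner dimensions do not match; the name `mismatch_error` is only used for illustration.

```python
import torch

x = torch.rand(2, 3)
y = torch.rand(3, 5)

# (2, 3) @ (3, 5): the inner sizes match, the outer sizes form the result shape
z = x @ y
print(z.shape)

# (3, 5) @ (2, 3): the inner sizes 5 and 2 differ, so PyTorch raises an error
mismatch_error = None
try:
    y @ x
except RuntimeError as err:
    mismatch_error = err
    print("Shape mismatch:", err)
```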

### Sum and argmax

Let us define a tensor of random numbers:

In [None]:
x = torch.rand(3)
x

We can compute the sum over all elements of this tensor like this:

In [None]:
torch.sum(x)

We can also compute the “argmax” of a tensor, which is the index of its largest element:

In [None]:
torch.argmax(x)
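
Because the tensor above is random, the results differ from run to run. Here is the same pair of operations on fixed values (the tensor `scores` is a made-up example):

```python
import torch

scores = torch.tensor([1.0, 5.0, 2.0])

total = torch.sum(scores)     # 1 + 5 + 2 = 8
best = torch.argmax(scores)   # the largest value, 5.0, sits at index 1

# Both results are 0D tensors; item() unpacks them into Python numbers
print(total.item())
print(best.item())
```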

### 🧩 Task 4: Sum and argmax over specific dimensions

There is a second form of the sum operation where we can specify the dimension along which the sum should be computed. This will return a tensor with the specified dimension removed.

* Write code to compute the sum over the batch dimension of `Q`.
* Write code to compute the argmax over the column dimension of `Q`.

### Concatenating tensors

A list or tuple of tensors can be combined into one long tensor by concatenation:

In [None]:
x = torch.rand(2, 3)
y = torch.rand(3, 3)
z = torch.cat((x, y))
assert z.shape == (5, 3)
z

You can also concatenate along a specific dimension:

In [None]:
x = torch.rand(2, 2)
y = torch.rand(2, 2)
print(x)
print(y)
print(torch.cat((x, y), dim=0))  # Shape: (4, 2)
print(torch.cat((x, y), dim=1))  # Shape: (2, 4)
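
Concatenation only works if the tensors agree in all dimensions other than the one being concatenated. As a small sketch (the name `cat_error` is illustrative only):

```python
import torch

x = torch.rand(2, 3)
y = torch.rand(3, 3)

# Along dim=0 the sizes 2 and 3 are simply added; the other dimension (3) matches
rows = torch.cat((x, y), dim=0)
print(rows.shape)

# Along dim=1 the tensors would have to agree in dim 0, but 2 != 3
cat_error = None
try:
    torch.cat((x, y), dim=1)
except RuntimeError as err:
    cat_error = err
    print("Cannot concatenate:", err)
```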

### Creating views

We sometimes want to use a tensor with a different shape than its original shape. In these situations, we can **re-shape** the tensor or create a **view** of the tensor. The latter is preferable because views can share the same data as their base tensors and thus do not require copying.

The following line creates a view of the `Q` tensor as a 2D-tensor:

In [None]:
Q.view(2 * 3, 5)

When creating a view, the special size `-1` is inferred from the other sizes:

In [None]:
Q.view(-1, 5)

Updating a tensor through a view affects the data in the base tensor:

In [None]:
y = torch.rand(2, 3, 5)
z = y.view(-1, 5)
z[2, 3] = 42
y
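
One way to convince yourself that the view and its base tensor share data is to compare their storage addresses with `data_ptr()`. The sketch below starts from a tensor of zeros so that the updated entry is easy to spot:

```python
import torch

y = torch.zeros(2, 3, 5)
z = y.view(-1, 5)   # shape (6, 5)

# Row 2 of the (6, 5) view corresponds to batch 0, row 2 of the base tensor
z[2, 3] = 42

print(y[0, 2, 3].item())             # 42.0
print(z.data_ptr() == y.data_ptr())  # True: both start at the same memory address
```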

### 🧩 Task 6: More viewing operations

There are a few other useful methods that create views. Browse the [PyTorch documentation](https://pytorch.org/docs/stable/tensor_view.html) and write code to return views of `Q` where:

* The second and the third dimension are swapped.
* A dimension of size 1 is inserted before the first dimension.
* The tensor is flattened into a 1D tensor.

### Re-shaping tensors

In some cases, you cannot create a view and need to explicitly re-shape a tensor. In particular, this happens when the data in the base tensor and the view are not in contiguous memory regions.

The next cell permutes the tensor `Q` to create a new tensor `y` in which the data is no longer contiguous in memory:

In [None]:
y = Q.permute(0, 2, 1)
# y = y.view(-1)    # raises a RuntimeError
y
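
You can check contiguity yourself with `is_contiguous()`. The following sketch also catches the error instead of commenting out the failing line, and shows `contiguous()` as one possible way to obtain a tensor that can be viewed; the names `view_error` and `flat` are illustrative only.

```python
import torch

Q = torch.arange(30).reshape(2, 3, 5)
y = Q.permute(0, 2, 1)   # same data, but the dimensions are traversed in a different order

print(Q.is_contiguous())  # True
print(y.is_contiguous())  # False

# Flattening the permuted tensor as a view is not possible
view_error = None
try:
    y.view(-1)
except RuntimeError as err:
    view_error = err
    print("view failed:", err)

# contiguous() copies the data into a contiguous layout, after which view() works
flat = y.contiguous().view(-1)
print(flat.shape)
```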

When it is not possible to create a view of a tensor, you can explicitly re-shape it, which will *copy* the data if necessary:

In [None]:
y = Q.permute(0, 2, 1)
y = y.reshape(-1)
y

Modifying a reshaped tensor *will not necessarily* change the data in the base tensor. This depends on whether the reshaped tensor is a copy of the base tensor or a view.

In [None]:
y = torch.rand(2, 3, 5)
# z = y.permute(0, 1, 2).reshape(-1)    # z is a view of y => data is shared
z = y.permute(0, 2, 1).reshape(-1)    # z is a copy of y => data is not shared
z[0] = 42
y
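
Whether `reshape()` returned a view or a copy can be checked by comparing storage addresses. This is a minimal sketch with illustrative names `shared` and `copied`:

```python
import torch

y = torch.rand(2, 3, 5)

shared = y.reshape(-1)                    # y is contiguous, so this is a view
copied = y.permute(0, 2, 1).reshape(-1)   # not contiguous, so the data is copied

print(shared.data_ptr() == y.data_ptr())  # True: same memory
print(copied.data_ptr() == y.data_ptr())  # False: separate memory

# Writing to the copy leaves the base tensor unchanged
before = y[0, 0, 0].item()
copied[0] = 42
print(y[0, 0, 0].item() == before)
```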

## Embrace vectorisation!

Iteration or “looping” is one of the most useful techniques for processing data in Python. However, you should **not loop over tensors**. Instead, learn to love *vectorisation*. Looping over tensors is slow, while vectorised operations on tensors are fast (and can be made even faster when the code runs on a GPU). To illustrate this point, let us create a 1D-tensor containing the first 1M integers:

In [None]:
x = torch.arange(1_000_000)
x

Summing up the elements of the tensor using a loop is relatively slow:

In [None]:
s = 0
for value in x:    # iterate over the elements directly
    s += value
s

Doing the same thing using a tensor operation is much faster:

In [None]:
x.sum()
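
To see the difference in numbers, you can time both variants. The sketch below uses `time.perf_counter()` and a smaller tensor (100,000 elements, an arbitrary choice to keep the loop short); the exact timings depend on your machine.

```python
import time

import torch

small = torch.arange(100_000)

# Variant 1: Python loop over the elements
start = time.perf_counter()
loop_sum = 0
for value in small:
    loop_sum += value.item()
loop_time = time.perf_counter() - start

# Variant 2: a single vectorised operation
start = time.perf_counter()
vec_sum = small.sum().item()
vec_time = time.perf_counter() - start

print(f"loop:       {loop_time:.4f} s")
print(f"vectorised: {vec_time:.4f} s")
print(loop_sum == vec_sum)  # both variants compute the same sum
```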

## Broadcasting

The term **broadcasting** describes how PyTorch treats tensors with different shapes. In short, if a PyTorch operation supports broadcasting, then its Tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data). In many situations, this can avoid explicit looping. 

In the simplest case, the two tensors already have the same shape, and no expansion is needed. This is the case for the matrix `x @ W` and the bias vector `b` in the linear model below:

In [None]:
x = torch.rand(1, 2)
W = torch.rand(2, 3)
b = torch.rand(1, 3)
z = x @ W  # Shape: (1, 3)
z = z + b  # Shape: (1, 3)
print(z)
z.shape

Now suppose that we do not have a single input `x` but a whole batch (a matrix) of inputs `X`. Watch what happens when adding the bias vector `b`:

In [None]:
X = torch.rand(5, 2)
Z = X @ W  # Shape: [5, 3]
Z = Z + b  # Shape: [5, 3]  Broadcasting happens here!
print(Z)
Z.shape

In the example, the matrix `Z` is formed by effectively adding `b` *to each row* of `X @ W`. This is not implemented by a Python loop but happens implicitly through broadcasting. The basic rule is this: Right-align the shapes of the tensors and add extra dimensions of size 1 as needed. Two dimensions are compatible when they are equal, or when one of them is 1. The latter cases are situations where implicit looping happens.

In [None]:
# Shape of X @ W      =  (5, 3)
# Shape of b          =  (1, 3)  # 5 and 1 are compatible because one of them is 1
# Shape of X @ W + b  =  (5, 3)
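
The rule can also be explored in code. The sketch below uses `torch.broadcast_shapes()` to compute the resulting shape without creating any data, then shows a case where a dimension is added on the left and a case where the shapes are incompatible. The tensor names are illustrative only.

```python
import torch

# The shapes from the example above
print(torch.broadcast_shapes((5, 3), (1, 3)))

# Shape (3,) is right-aligned and treated as (1, 3); (5, 1) and (1, 3) give (5, 3)
col = torch.rand(5, 1)
row = torch.rand(3)
both = col + row
print(both.shape)

# 5 and 2 are neither equal nor 1, so these shapes cannot be broadcast
broadcast_error = None
try:
    torch.rand(5, 3) + torch.rand(2, 3)
except RuntimeError as err:
    broadcast_error = err
    print("Not broadcastable:", err)
```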

PyTorch uses the same broadcasting semantics as NumPy. [More information about broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)

## Final note

There is a lot more to learn about PyTorch, but after working through this notebook, you should be in a good position to take on the labs. If you want more practice, ask your favourite AI assistant to generate some PyTorch exercises for you. And if you get stuck, do not hesitate to ask your tutor for help!