# Introduction to Machine Learning through a PyTorch Tutorial

By Philippe, mentor on the Quandela team

## Introduction

PyTorch is an open-source machine learning library that is widely used in research to define deep learning models and train them efficiently. Some alternatives to PyTorch are [TensorFlow](https://www.tensorflow.org) and [JAX](https://docs.jax.dev/en/latest/notebooks/thinking_in_jax.html). There are many PyTorch tutorials out there, so feel free to explore. In particular, the PyTorch team maintains a collection of tutorials [here](https://docs.pytorch.org/tutorials/), and this tutorial is inspired by a notebook from the University of Amsterdam, authored by Phillip Lippe, available [here](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial2/Introduction_to_PyTorch.html).

As a prerequisite for this tutorial, you should have some knowledge of NumPy. If that is not the case, we recommend following a NumPy tutorial like [this one](https://numpy.org/doc/stable/user/quickstart.html) beforehand.

### Why is it so important to know how to use PyTorch?

#### In the context of classical machine learning:
PyTorch is a predominant tool because it is efficient, simple to use and highly customizable, which lets researchers experiment freely on top of a performant framework.

1. **Automatic differentiation**

PyTorch builds a computational graph and uses it to compute gradients automatically through backpropagation, which is what gradient-based optimization of models relies on.

2. **GPU acceleration**

PyTorch has integrated many features to effortlessly use GPUs, making large-scale training feasible.

3. **Modular neural network building blocks**

This library provides customizable building blocks to assemble flexible models which ease prototyping and exploration.

4. **Ecosystem**

PyTorch is well established in the machine learning ecosystem: many libraries, pretrained models and tutorials build on it, so the skills you learn here transfer to a wide range of projects.

#### In the context of quantum machine learning:
PyTorch is a wonderful tool here too: its well-known optimization engine can be used for quantum circuits in most cases, it allows research prototyping with quantum and hybrid models, and it is integrated into many quantum computing frameworks.

1. **Quantum-classical workflow**

The partition of work between PyTorch and the quantum computing libraries is straightforward. PyTorch handles the optimization whereas the quantum circuit simulator/hardware provides the forward pass (direct computation). Computing gradients of quantum circuits is a complex task, but with a QML library wrapped with PyTorch's autograd, the code to train a quantum model is the same as the one to train a classical model.

2. **Integration with quantum frameworks**

Libraries like [PennyLane](https://docs.pennylane.ai/en/stable/), [Merlin](https://merlinquantum.ai), [TorchQuantum](https://torchquantum.readthedocs.io/en/main/) or [Qiskit Machine Learning](https://qiskit-community.github.io/qiskit-machine-learning/) integrate with PyTorch.

### Imports
We will use a set of standard libraries that are popular in machine learning.

In [None]:
import os
import math
import matplotlib.pyplot as plt
import numpy as np
import time
import torch
from tqdm.notebook import tqdm

The PyTorch library was imported by calling `import torch`. Let's check its version:

In [None]:
print(f'Using torch {torch.__version__}')

Using torch 2.8.0


As soon as you run the previous cell you will see the exact version installed in your environment. These examples target PyTorch 2.x, so a nearby version should work without changes. If you ever need to install a different release, run `pip install torch==<desired version>` before the import block and restart the notebook kernel to load it.

Next, we fix the random seed so that the randomly generated tensors in this tutorial are reproducible:

In [None]:
torch.manual_seed(42)

<torch._C.Generator at 0x10f3101d0>

## Tensors

Tensors are the foundational data structure of PyTorch. They are similar to NumPy arrays, but they additionally support automatic differentiation and GPU acceleration. A tensor has a certain number of dimensions: a vector is represented by a 1-D tensor and a matrix by a 2-D tensor. Each dimension has a size that indicates the number of elements along that dimension, and the shape of a tensor is the tuple of the sizes of all its dimensions. For example:

tensor([[1, 2],
        [3, 4],
        [5, 6]])

would have a shape of (3, 2) because it has 3 rows (first dimension) and 2 columns (second dimension). And a tensor with shape (1, 3, 5) would have a first dimension of size 1, a second dimension of size 3 and a third dimension of size 5.

### Tensor Initialization
There are many ways to initialize a tensor but the simplest is to call `torch.tensor({array})` which converts the input array into a tensor:

In [None]:
x = torch.tensor([1, 3, 5])
print(x)

tensor([1, 3, 5])


Other methods for initialization include:
- `torch.zeros({shape})`: Creates a tensor filled with zeros of shape {shape}
- `torch.ones({shape})`: Creates a tensor filled with ones of shape {shape}
- `torch.rand({shape})`: Creates a tensor of shape {shape} filled with random values uniformly sampled between 0 and 1
- `torch.arange({N}, {M+1})`: Creates a 1-D tensor containing the values N, N+1, N+2, ... M

In [None]:
x_1 = torch.zeros((2, 3))
print(f'- torch.zeros((2, 3)):\n{x_1}')
x_2 = torch.ones((3, 2))
print(f'- torch.ones((3, 2)):\n{x_2}')
x_3 = torch.rand((3, 3))
print(f'- torch.rand((3, 3)):\n{x_3}')
x_4 = torch.arange(10, 15)
print(f'- torch.arange(10, 15):\n{x_4}')

- torch.zeros((2, 3)):
tensor([[0., 0., 0.],
        [0., 0., 0.]])
- torch.ones((3, 2)):
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
- torch.rand((3, 3)):
tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009],
        [0.2566, 0.7936, 0.9408]])
- torch.arange(10, 15):
tensor([10, 11, 12, 13, 14])


Note that these functions also accept the shape as separate integer arguments, so the second pair of parentheses is optional:

In [None]:
x_5 = torch.zeros(2, 3)
print(x_5)

tensor([[0., 0., 0.],
        [0., 0., 0.]])


The shape of a tensor is one of its most important characteristics because most operations between tensors require their shapes to be compatible. Matrix operations provide simple examples:

1. Matrix addition ($M_1 + M_2$) requires that $M_1$ and $M_2$ have the same shape.
2. Matrix product ($M_1 \cdot M_2$) requires that the size of the second dimension of $M_1$ equals the size of the first dimension of $M_2$.

Moreover, you can obtain the shape of a tensor by simply using `.shape` or `.size()` on a tensor:

In [None]:
x_1_shape = x_1.shape
print(x_1_shape)
x_2_size = x_2.size()
print(x_2_size)

torch.Size([2, 3])
torch.Size([3, 2])
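
The two shape rules listed above can be checked directly. The following is a minimal sketch; the names `m1`, `m2` and `m3` are chosen here purely for illustration.

```python
import torch

m1 = torch.ones(2, 3)
m2 = torch.ones(2, 3)
m3 = torch.ones(3, 4)

# Addition: both operands have shape (2, 3), so the result does too.
sum_shape = (m1 + m2).shape

# Matrix product: (2, 3) @ (3, 4) works because the inner sizes (3 and 3) match.
product_shape = (m1 @ m3).shape

# (2, 3) @ (2, 3) is invalid because the inner sizes (3 and 2) differ.
try:
    m1 @ m2
    mismatch_raises = False
except RuntimeError:
    mismatch_raises = True

print(sum_shape, product_shape, mismatch_raises)
```

Reading the shapes before running an operation is usually the quickest way to diagnose this kind of error.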


### Tensor to NumPy and NumPy to Tensor

Most NumPy functions that operate on arrays have a tensor counterpart. Additionally, you can convert NumPy arrays to tensors via `torch.tensor({NumPy array})` and convert back using `tensor.numpy()`. The `.detach().cpu()` call below is a safeguard that ensures the conversion works for any tensor; it is explained in the GPU Support section.

In [None]:
np_array = np.array([1, 2, 3, 4])
tensor = torch.tensor(np_array)
new_np_array = tensor.detach().cpu().numpy()

print(f'- Numpy array: {np_array}')
print(f'- Tensor: {tensor}')
print(f'- New Numpy array: {new_np_array}')

- Numpy array: [1 2 3 4]
- Tensor: tensor([1, 2, 3, 4])
- New Numpy array: [1 2 3 4]
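
One detail worth knowing when converting: `torch.tensor` copies the data, whereas `torch.from_numpy` (not used elsewhere in this tutorial) returns a tensor that shares memory with the array. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np
import torch

np_array = np.array([1, 2, 3, 4])

copied = torch.tensor(np_array)      # independent copy of the data
shared = torch.from_numpy(np_array)  # shares memory with np_array

np_array[0] = 100  # modify the NumPy array in place

print('- Copied tensor:', copied)  # still starts with 1
print('- Shared tensor:', shared)  # now starts with 100
```

Copying is the safer default; sharing memory avoids the copy but means that in-place changes on one side are visible on the other.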


### Tensor Operations

Any function that transforms a tensor or combines tensors is called an operation. We often chain operations to prepare data, clean features, or interpret model outputs.

#### Unary Tensor Operations

Unary operations take a single tensor as input. They usually reshape the data, cast it to another dtype, move it between devices, or apply an elementwise transformation.

Some helpful shape utilities:

- `tensor.reshape(new_shape)`: reshape a tensor to any compatible shape.
- `tensor.unsqueeze(dim)`: insert a dimension of size 1 at the given index.
- `tensor.squeeze(dim)`: remove a dimension if its size is 1.
- `tensor.permute(dims)`: reorder all dimensions at once.
- `tensor.transpose(dim0, dim1)`: swap two dimensions.
- `tensor.flatten(start_dim=0)`: merge all dimensions from `start_dim` onward into a single dimension (with the default, the result is 1-D).
- `tensor.repeat(*sizes)`: tile the tensor along specified dimensions.

Dimensions in PyTorch are zero-indexed: `dim=0` refers to the first dimension (often rows), `dim=1` to the second (often columns), and so on. The meaning of each dimension depends on the tensor: in an image tensor shaped `(batch, channels, height, width)`, for instance, `dim=2` would correspond to the height dimension.

Type and device helpers:

- `tensor.to(device_or_dtype)`: move the tensor to a device or cast its dtype.
- `tensor.cpu()` / `tensor.cuda()`: explicit device moves between CPU and GPU.
- `tensor.clone()`: copy the tensor while keeping the original intact.
- `tensor.detach()`: get a tensor that shares storage but is disconnected from autograd.

A **device** tells PyTorch where the tensor's data lives (CPU vs. GPU), while a **dtype** specifies the data type of the elements (e.g., `torch.float32`, `torch.int64`). Choosing the right device enables hardware acceleration, and choosing the right dtype balances precision and memory usage.

Elementwise math and reductions:

- `tensor.abs()`, `tensor.exp()`, `tensor.sqrt()`: apply functions to each value.
- `tensor.mean()`, `tensor.sum()`, `tensor.max()`: summarise values across dimensions.
- `tensor.norm()`: compute vector or matrix norms.

These building blocks cover most shape and value manipulations you will need before stepping into multi-tensor operations.

In [None]:
tensor = torch.arange(6).reshape(2, 3)
print('- Original tensor of shape:', tensor.shape, '\n', tensor)

reshaped = tensor.reshape(3, 2)
print('- Reshaped to 3x2:\n', reshaped)

unsqueezed = tensor.unsqueeze(0)
print('- Unsqueezed shape:\n', unsqueezed.shape, '\nUnsqueezed tensor:\n', unsqueezed)

squeezed = unsqueezed.squeeze(0)
print('- Squeezed back shape:\n', squeezed.shape)

transposed = tensor.transpose(0, 1)
print('- Transposed tensor:\n', transposed)

flattened = tensor.flatten()
print('- Flattened tensor:\n', flattened)

float_tensor = tensor.float()
print('- Cast to float dtype:', float_tensor.dtype)

cloned = float_tensor.clone()
print('- Clone has the same values as original:', torch.equal(float_tensor, cloned))

absolute = torch.abs(torch.tensor([-2.0, 0.0, 3.5]))
print('- Absolute value example:\n', absolute)

- Original tensor of shape: torch.Size([2, 3]) 
 tensor([[0, 1, 2],
        [3, 4, 5]])
- Reshaped to 3x2:
 tensor([[0, 1],
        [2, 3],
        [4, 5]])
- Unsqueezed shape:
 torch.Size([1, 2, 3]) 
Unsqueezed tensor:
 tensor([[[0, 1, 2],
         [3, 4, 5]]])
- Squeezed back shape:
 torch.Size([2, 3])
- Transposed tensor:
 tensor([[0, 3],
        [1, 4],
        [2, 5]])
- Flattened tensor:
 tensor([0, 1, 2, 3, 4, 5])
- Cast to float dtype: torch.float32
- Clone has the same values as original: True
- Absolute value example:
 tensor([2.0000, 0.0000, 3.5000])
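
The reductions, `permute` and `repeat` from the lists above are not exercised in the previous cell, so here is a short sketch of how they behave. The `images` tensor is a hypothetical batch used only to illustrate the `(batch, channels, height, width)` layout.

```python
import torch

t = torch.arange(6.0).reshape(2, 3)  # [[0., 1., 2.], [3., 4., 5.]]

# Reductions accept a `dim` argument: that dimension is the one that disappears.
col_sums = t.sum(dim=0)    # one value per column
row_means = t.mean(dim=1)  # one value per row

# permute reorders all dimensions at once.
images = torch.zeros(8, 3, 32, 16)          # (batch, channels, height, width)
channels_last = images.permute(0, 2, 3, 1)  # (batch, height, width, channels)

# repeat tiles the tensor: 2 copies along a new first dim, 3 along the last.
tiled = torch.tensor([1, 2]).repeat(2, 3)

print(col_sums, row_means, channels_last.shape, tiled.shape)
```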


#### Multi-Tensor Operations

Operations can also take several tensors as inputs to combine or compare them.

Elementwise arithmetic (`+`, `-`, `*`, `/` or `torch.add`) works when the tensors have the same shape or when broadcasting can align them. Matrix operations such as `torch.matmul` or the `@` operator (matrix multiplication) follow the familiar linear algebra rules. There are also utilities for concatenating or stacking tensors along a chosen dimension (`torch.cat`, `torch.stack`) and for comparisons (`torch.eq`, `torch.max`, etc.).

**Broadcasting**

Broadcasting automatically expands smaller tensors so that elementwise operations can run without copying data manually. PyTorch compares shapes from the last dimension backward and treats missing leading dimensions as size 1. Two dimensions are compatible when their sizes are equal or when one of them is 1, in which case that dimension is stretched to match the other. This is handy when, for instance, you add a bias vector to every row of a matrix.

In [None]:
a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
b = torch.tensor([[10., 20., 30.],
                  [40., 50., 60.]])

print('- Elementwise addition:\n', a + b)
print('- Elementwise multiplication:\n', a * b)

matrix = torch.tensor([[1., 2.], [3., 4.]])
vector = torch.tensor([0.5, 1.0])
print('- Matrix @ vector:\n', matrix @ vector)

print('- Broadcasted addition (matrix + vector):\n', matrix + vector)
# Without broadcasting, this addition would fail because matrix.shape is not equal to vector.shape.

stacked = torch.stack([vector, vector])
print('- Stacked vectors:\n', stacked)

- Elementwise addition:
 tensor([[11., 22., 33.],
        [44., 55., 66.]])
- Elementwise multiplication:
 tensor([[ 10.,  40.,  90.],
        [160., 250., 360.]])
- Matrix @ vector:
 tensor([2.5000, 5.5000])
- Broadcasted addition (matrix + vector):
 tensor([[1.5000, 3.0000],
        [3.5000, 5.0000]])
- Stacked vectors:
 tensor([[0.5000, 1.0000],
        [0.5000, 1.0000]])
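
To make the broadcasting rule more concrete, the sketch below combines shapes that differ in both dimensions and shows one combination that fails. Variable names are illustrative.

```python
import torch

column = torch.arange(3).reshape(3, 1)  # shape (3, 1)
row = torch.arange(4).reshape(1, 4)     # shape (1, 4)

# Size-1 dimensions are stretched: (3, 1) + (1, 4) gives (3, 4).
table = column + row

# A 1-D tensor of shape (4,) is treated as (1, 4) and added to every row.
bias = torch.tensor([10, 20, 30, 40])
shifted = table + bias

# Trailing sizes 4 and 3 are neither equal nor 1, so this cannot broadcast.
try:
    table + torch.ones(3)
    incompatible_raises = False
except RuntimeError:
    incompatible_raises = True

print(table.shape, shifted[0], incompatible_raises)
```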


### Indexing Tensors

Indexing lets you read or write specific elements of a tensor. PyTorch adopts NumPy's slicing rules, so you can select rows, columns, or sub-blocks with `tensor[start:stop:step]`. Use commas to index multiple dimensions at once, ellipses (`...`) to skip middle dimensions, and integer or list indices for fancy indexing. Boolean masks (tensors of `True`/`False`) let you keep only the elements that satisfy a condition. Mastering indexing is key when you prepare batches or extract predictions.

In [None]:
grid = torch.arange(1, 13).reshape(3, 4)
print('- Grid:', grid)

first_row = grid[0]
print('- First row:', first_row)

last_column = grid[:, -1]
print('- Last column:', last_column)

center_block = grid[1:, 1:3]
print('- Center block:', center_block)

every_other = grid[:, ::2]
print('- Every other column:', every_other)

mask = grid > 6
print('- Boolean mask:', mask)
print('- Values > 6:', grid[mask])

- Grid: tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
- First row: tensor([1, 2, 3, 4])
- Last column: tensor([ 4,  8, 12])
- Center block: tensor([[ 6,  7],
        [10, 11]])
- Every other column: tensor([[ 1,  3],
        [ 5,  7],
        [ 9, 11]])
- Boolean mask: tensor([[False, False, False, False],
        [False, False,  True,  True],
        [ True,  True,  True,  True]])
- Values > 6: tensor([ 7,  8,  9, 10, 11, 12])
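
The cell above covers slices and boolean masks. As a complement, here is a small sketch of list-based (fancy) indexing, the ellipsis, and writing through a mask; the `cube` and `clipped` tensors are introduced only for this illustration.

```python
import torch

grid = torch.arange(1, 13).reshape(3, 4)

# A list of indices selects several rows at once.
picked_rows = grid[[0, 2]]

# Two index lists are paired up: elements (0, 0) and (2, 3).
corners = grid[[0, 2], [0, 3]]

# The ellipsis stands for "all remaining dimensions".
cube = torch.arange(24).reshape(2, 3, 4)
last_slice = cube[..., -1]  # same as cube[:, :, -1]

# Indexing also works on the left-hand side of an assignment.
clipped = grid.clone()
clipped[clipped > 6] = 6

print(picked_rows, corners, last_slice.shape, clipped, sep='\n')
```

Cloning before the masked assignment keeps `grid` itself unchanged.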


## Dynamic Computation Graph and Backpropagation

PyTorch builds a computation graph on the fly as you run operations. Each tensor can remember how it was created, and when `requires_grad=True`, PyTorch tracks the operations so that it can compute derivatives later.

To compute gradients you call `.backward()` (usually on a scalar loss). PyTorch walks the graph in reverse (backpropagation) and fills the `.grad` attribute of leaf tensors (typically model parameters or inputs). If a tensor should participate in gradient computation, call `tensor.requires_grad_()` or create it with `requires_grad=True`. Gradients are added to `.grad` on every backward pass, so reset them with `tensor.grad.zero_()` or `optimizer.zero_grad()` before the next one to avoid accidental accumulation.

In [None]:
x = torch.tensor([2.0, -1.0], requires_grad=True)

y = (x ** 2).sum()
print('Loss value:', y.item())

y.backward()
print('- Gradient stored in x.grad:', x.grad)  # x.grad returns the derivative of y (because we called y.backward()) with respect to x
# Since y = x[0] ** 2 + x[1] ** 2
# x.grad should be tensor([2 * x[0], 2 * x[1]])
# which is tensor([4, -2])

x.grad.zero_()  # Important to reset the gradient
z = (3 * x).sum()
z.backward()
print('- New gradient after second backward:', x.grad)  # x.grad returns the derivative of z (because we called z.backward()) with respect to x
# Here z = 3 * x[0] + 3 * x[1]
# x.grad should be tensor([3, 3])

Loss value: 5.0
- Gradient stored in x.grad: tensor([ 4., -2.])
- New gradient after second backward: tensor([3., 3.])
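
To see why the reset matters, the following sketch repeats the same two backward passes, first without and then with zeroing the gradient in between. The variable names `first`, `accumulated` and `fresh` are illustrative.

```python
import torch

x = torch.tensor([2.0, -1.0], requires_grad=True)

(x ** 2).sum().backward()
first = x.grad.clone()        # gradient of the first loss: 2 * x

(3 * x).sum().backward()      # no reset: the new gradient is added to the old one
accumulated = x.grad.clone()  # 2 * x + 3

x.grad.zero_()                # reset, then recompute the second gradient alone
(3 * x).sum().backward()
fresh = x.grad.clone()

print(first, accumulated, fresh)
```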


## GPU Support

One of PyTorch's strengths is the ability to run the same code on CPUs or GPUs. GPU execution is often much faster for large models because thousands of operations can run in parallel. Before moving tensors or models, check whether a CUDA-enabled GPU is available with `torch.cuda.is_available()`. Then create a `torch.device` that points to `"cuda"` or falls back to `"cpu"`. Moving tensors or modules to the device is explicit, which keeps you aware of where the computation happens.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('- Selected device:', device)

data = torch.arange(5)
print('- Default device:', data.device)

data_on_device = data.to(device)
print('- After move:', data_on_device.device)

- Selected device: cpu
- Default device: cpu
- After move: cpu


Using a GPU means every tensor that interacts in an operation must live on the same device. Mixing CPU and GPU tensors raises an error because PyTorch does not implicitly copy data between devices. When you want to visualize or convert a GPU tensor to a NumPy array, first move it to the CPU using `.cpu()`. If the tensor requires gradients, call `.detach()` beforehand to remove it from the computation graph. The `.detach()` method returns a new tensor that shares the same data but is no longer tracked by PyTorch’s autograd system.
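
The `.detach().cpu().numpy()` chain described above can be wrapped in a small helper. This is only a sketch: `to_numpy` is a hypothetical convenience function, not part of PyTorch.

```python
import torch

def to_numpy(tensor):
    """Hypothetical helper: convert any tensor to a NumPy array."""
    # detach() drops the autograd history, cpu() is a no-op if already on CPU.
    return tensor.detach().cpu().numpy()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
weights = torch.ones(3, requires_grad=True, device=device)
outputs = weights * 2

# Calling outputs.numpy() directly would raise an error here,
# because outputs is still attached to the computation graph.
array = to_numpy(outputs)
print(type(array), array)
```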

## Models

Models in PyTorch are subclasses of `torch.nn.Module`. A module bundles parameters, buffers, and the computations that produce outputs. You can assemble models from prebuilt layers found in `torch.nn` or write your own subclass by defining `__init__` and `forward`. Pretrained models from libraries such as `torchvision` or `torch.hub` are also `nn.Module` instances, so the usage pattern is the same: create or load the module, send inputs through it, and optimise its parameters.

Common building blocks include:

- `nn.Linear(in_features, out_features)`: affine transformation that multiplies inputs by a weight matrix and adds a bias term.
- `nn.ReLU()`, `nn.Sigmoid()`, `nn.Softmax(dim)`: activation layers that apply nonlinear functions elementwise (or across a dimension for `Softmax`). You can also call functional counterparts such as `torch.nn.functional.relu` if you prefer not to instantiate modules.
- `nn.Dropout(p)` and `nn.BatchNorm1d(num_features)`: regularisation and normalisation layers that behave differently during training and evaluation.

When subclassing `nn.Module`, implement:

- `__init__(self)`: define submodules or parameters and register them as attributes. Calling `super().__init__()` first ensures PyTorch tracks them.
- `forward(self, x)`: describe the computation performed at every call. This method receives input tensors, applies submodules or operations, and returns outputs. You normally avoid heavy side effects here because `forward` runs during every training and evaluation step.


In [None]:
import torch.nn as nn

prebuilt_mlp = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
)
# nn.Sequential chains multiple nn.Module layers together. The order they are listed determines the computation order.
# In our prebuilt_mlp, the data first traverses the nn.Linear(4, 8) layer, then the nn.ReLU() and finally the nn.Linear(8, 1)

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.activation = nn.ReLU()
        self.output = nn.Linear(8, 1)

    def forward(self, x):
        x = self.activation(self.hidden(x))
        return self.output(x)

custom_model = MLP()

sample_input = torch.rand(2, 4)
print('- Sequential model output:', prebuilt_mlp(sample_input))
print('- Custom model output:', custom_model(sample_input))

- Sequential model output: tensor([[-0.1587],
        [-0.1259]], grad_fn=<AddmmBackward0>)
- Custom model output: tensor([[-0.2893],
        [-0.3136]], grad_fn=<AddmmBackward0>)


### Parameters

Every learnable weight inside a module is stored as an instance of `torch.nn.parameter.Parameter`. Modules register parameters automatically when you assign `nn.Module` layers inside `__init__`, but you can also create standalone parameters with `nn.Parameter(tensor)`.

Helpful utilities:

- Count parameters with `sum(p.numel() for p in model.parameters())` or inspect only trainable ones by adding `if p.requires_grad`.
- Custom initialisation can be done with functions from `torch.nn.init`, e.g. `nn.init.xavier_uniform_(layer.weight)` or by manipulating `parameter.data` directly (inside a `torch.no_grad()` block).
- `model.named_parameters()` yields `(name, parameter)` pairs, which is useful for logging or applying different learning rates.


In [None]:
total_params = sum(p.numel() for p in custom_model.parameters() if p.requires_grad)
print('Trainable parameters of our previously created custom_model:', total_params)

for name, param in custom_model.named_parameters():
    print(f'{name}: shape={tuple(param.shape)}, requires_grad={param.requires_grad}')


Trainable parameters of our previously created custom_model: 49
hidden.weight: shape=(8, 4), requires_grad=True
hidden.bias: shape=(8,), requires_grad=True
output.weight: shape=(1, 8), requires_grad=True
output.bias: shape=(1,), requires_grad=True
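
The utilities listed above can be combined in a custom module. The sketch below is illustrative only: `ScaledLinear` is a hypothetical layer invented for this example, showing a standalone `nn.Parameter` next to a prebuilt layer with custom initialisation.

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Hypothetical layer: a linear layer followed by a learnable scalar scale."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Assigning an nn.Parameter as an attribute registers it automatically.
        self.scale = nn.Parameter(torch.ones(1))
        # Custom initialisation with functions from torch.nn.init.
        nn.init.xavier_uniform_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)

    def forward(self, x):
        return self.scale * self.linear(x)

layer = ScaledLinear(4, 8)
names = [name for name, _ in layer.named_parameters()]
num_params = sum(p.numel() for p in layer.parameters())
print(names, num_params)  # 4 * 8 weights + 8 biases + 1 scale
```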


## Data Manipulation

Training rarely uses raw tensors directly. The `torch.utils.data.Dataset` abstraction lets you describe how to load a single sample (`__getitem__`) and how many samples you have (`__len__`); `TensorDataset` is a ready-made dataset that wraps tensors you already hold in memory. The `DataLoader` wraps a dataset to create batches, optionally shuffles the order, and can load data in parallel. Batching is essential because it keeps memory usage under control while still providing stable gradient estimates.

In [None]:
from torch.utils.data import TensorDataset, DataLoader

# Generation of example inputs and targets
inputs = torch.linspace(-1, 1, steps=12).unsqueeze(1)
targets = inputs.pow(2)

# Initialize TensorDataset and DataLoader
dataset = TensorDataset(inputs, targets)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# You can access each batch of data iteratively
for batch_idx, (batch_inputs, batch_targets) in enumerate(loader):
    print('Batch:', batch_idx)
    print('Batch inputs:\n', batch_inputs)
    print('Batch targets:\n', batch_targets)

Batch: 0
Batch inputs:
 tensor([[-0.6364],
        [-0.2727],
        [ 0.0909],
        [-0.0909]])
Batch targets:
 tensor([[0.4050],
        [0.0744],
        [0.0083],
        [0.0083]])
Batch: 1
Batch inputs:
 tensor([[ 1.0000],
        [ 0.8182],
        [-1.0000],
        [-0.8182]])
Batch targets:
 tensor([[1.0000],
        [0.6694],
        [1.0000],
        [0.6694]])
Batch: 2
Batch inputs:
 tensor([[ 0.6364],
        [ 0.4545],
        [ 0.2727],
        [-0.4545]])
Batch targets:
 tensor([[0.4050],
        [0.2066],
        [0.0744],
        [0.2066]])
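
When your data does not fit the `TensorDataset` pattern, you can write your own dataset by implementing `__len__` and `__getitem__`. The following is a minimal sketch; `SquaresDataset` is a hypothetical class that reproduces the same inputs and targets as above but computes each target on demand.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical dataset pairing each input x with the target x ** 2."""

    def __init__(self, num_points):
        self.inputs = torch.linspace(-1, 1, steps=num_points).unsqueeze(1)

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.inputs)

    def __getitem__(self, index):
        # Load (here: compute) a single sample.
        x = self.inputs[index]
        return x, x.pow(2)

squares = SquaresDataset(12)
squares_loader = DataLoader(squares, batch_size=4, shuffle=False)
first_inputs, first_targets = next(iter(squares_loader))
print(len(squares), len(squares_loader), first_inputs.shape)
```

The `DataLoader` only relies on these two methods, so the same loader code works with either dataset.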


## Optimization

Training a model is an iterative optimisation process. For each batch you:

1. Load a batch from the `DataLoader`.
2. Run the model to obtain predictions.
3. Evaluate a loss function that compares predictions with targets.
4. Call `loss.backward()` to compute gradients.
5. Update the parameters with an optimiser such as SGD or Adam.

### Loss Modules

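Loss modules in `torch.nn` wrap common objective functions (`nn.MSELoss`, `nn.CrossEntropyLoss`, etc.). They accept predictions and targets and, with the default reduction, return a scalar tensor that is connected to the computation graph (so `requires_grad=True` when the predictions depend on trainable parameters), ready for backpropagation.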
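
As a quick sanity check of what a loss module computes, the sketch below compares `nn.MSELoss` with the same formula written by hand. The prediction and target values are made up for the example.

```python
import torch
import torch.nn as nn

predictions = torch.tensor([[1.0], [2.0], [4.0]], requires_grad=True)
targets = torch.tensor([[1.0], [3.0], [2.0]])

loss_fn = nn.MSELoss()  # averages the squared errors by default
loss = loss_fn(predictions, targets)

# Same quantity written out by hand: mean of (prediction - target) ** 2.
manual = ((predictions - targets) ** 2).mean()

print(loss.item(), manual.item(), loss.requires_grad)
```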

### Optimizers

Optimisers in `torch.optim` (e.g. `SGD`, `Adam`, `RMSprop`) manage how parameters are updated. They expect the model's parameters and hyperparameters like the learning rate. The standard pattern is `optimizer.zero_grad()`, `loss.backward()`, and `optimizer.step()` every iteration. All in all, the optimizer updates the parameters of the model in order to minimize the loss.

In [None]:
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

# Generation of data + labels
x_batch = torch.tensor([[0.0], [1.0], [2.0]])
y_batch = 3 * x_batch + 1

predictions = model(x_batch)
loss = loss_fn(predictions, y_batch)
print('Initial loss:', loss.item())
print('Initial parameters:', model.weight.data, model.bias.data)

optimizer.zero_grad()
loss.backward()
optimizer.step()

updated_predictions = model(x_batch)
updated_loss = loss_fn(updated_predictions, y_batch)
print('Updated loss:', updated_loss.item())
print('Updated parameters:', model.weight.data, model.bias.data)

Initial loss: 29.755882263183594
Initial parameters: tensor([[-0.5987]]) tensor([0.0028])
Updated loss: 8.413677215576172
Updated parameters: tensor([[0.8003]]) tensor([0.9220])
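
To demystify `optimizer.step()`, the following sketch checks that one step of plain SGD (no momentum or weight decay, as configured here) matches the update rule `parameter - lr * gradient` computed by hand. The seed and variable names are choices made for this example.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(1, 1)
lr = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x_batch = torch.tensor([[0.0], [1.0], [2.0]])
y_batch = 3 * x_batch + 1

loss = torch.nn.functional.mse_loss(model(x_batch), y_batch)
optimizer.zero_grad()
loss.backward()

# What plain SGD is expected to do: parameter - lr * gradient.
expected_weight = model.weight.detach() - lr * model.weight.grad
expected_bias = model.bias.detach() - lr * model.bias.grad

optimizer.step()
print(torch.allclose(model.weight, expected_weight),
      torch.allclose(model.bias, expected_bias))
```

Other optimisers such as Adam apply more elaborate rules, but the calling pattern stays the same.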


## Training a Model

A full training loop repeats the optimisation steps for many epochs (passes over the dataset). During training you may log metrics, adjust the learning rate, or validate on a separate dataset. Switching between `model.train()` and `model.eval()` toggles behaviours such as dropout or batch-normalisation, so remember to call `model.train()` before the training loop.

In [None]:
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Generate data + labels
x_values = torch.linspace(-1, 1, steps=100).unsqueeze(1)
y_values = 2 * x_values + 1 + 0.1 * torch.randn_like(x_values)
#        weight=2    bias=1 + noise

# Initialize TensorDataset and DataLoader
training_dataset = TensorDataset(x_values, y_values)
training_loader = DataLoader(training_dataset, batch_size=16, shuffle=True)

# Initialize model, optimizer, loss function and put model in training mode
linear_model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(linear_model.parameters(), lr=0.2)
loss_fn = torch.nn.MSELoss()
linear_model.train()

# Training loop
for epoch in range(1, 51):  # Use all the data for gradient descent 50 times
    epoch_loss = 0.0
    for batch_x, batch_y in training_loader:  # For every batch of data
        optimizer.zero_grad()
        preds = linear_model(batch_x)
        loss = loss_fn(preds, batch_y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    if epoch % 10 == 0:  # Print updates every 10 epochs
        avg_loss = epoch_loss / len(training_loader)
        print(f'Epoch {epoch:02d} - average loss: {avg_loss:.4f}')

print('Learned weight:', linear_model.weight.item())
print('Learned bias:', linear_model.bias.item())
trained_model = linear_model

Epoch 10 - average loss: 0.0129
Epoch 20 - average loss: 0.0120
Epoch 30 - average loss: 0.0100
Epoch 40 - average loss: 0.0105
Epoch 50 - average loss: 0.0101
Learned weight: 2.005248546600342
Learned bias: 1.025797724723816
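
The loop above trains on all available data. If you want to validate on a separate dataset, as mentioned earlier, one option is `torch.utils.data.random_split`. The sketch below is an illustration under assumed settings: the 80/20 split and the 20 epochs are arbitrary choices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)
x_values = torch.linspace(-1, 1, steps=100).unsqueeze(1)
y_values = 2 * x_values + 1 + 0.1 * torch.randn_like(x_values)
full_dataset = TensorDataset(x_values, y_values)

# Hold out 20 of the 100 samples for validation (arbitrary split).
train_set, val_set = random_split(full_dataset, [80, 20])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=20)

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)
loss_fn = torch.nn.MSELoss()

for epoch in range(20):
    model.train()
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(batch_x), batch_y).backward()
        optimizer.step()

# Validation: evaluation mode, no gradient tracking, data unseen in training.
model.eval()
with torch.no_grad():
    val_x, val_y = next(iter(val_loader))
    val_loss = loss_fn(model(val_x), val_y).item()
print(f'Validation loss: {val_loss:.4f}')
```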


### Saving and Loading a Model

After training you usually want to persist the model's learned parameters. The recommended approach is to save the `state_dict`, a simple mapping from parameter names to tensors. Saving only the state dict with `torch.save(model.state_dict(), path)`, rather than the full module, keeps files lightweight and less tied to your code structure. Later, create the same model architecture, load the saved dictionary with `load_state_dict`, and optionally move it to the desired device.

In [None]:
model_path = 'linear_regression.pth'
torch.save(trained_model.state_dict(), model_path)
print(f'State dict saved to {model_path}')

# Recreate model structure
reloaded_model = torch.nn.Linear(1, 1)
# Load saved parameters
reloaded_model.load_state_dict(torch.load(model_path))
# Put model in evaluation mode
reloaded_model.eval()

sample_input = torch.tensor([[0.5]])
with torch.no_grad():
    original_pred = trained_model(sample_input)
    reloaded_pred = reloaded_model(sample_input)
print('- Original model prediction:', original_pred.item())
print('- Reloaded model prediction:', reloaded_pred.item())

State dict saved to linear_regression.pth
- Original model prediction: 2.0284218788146973
- Reloaded model prediction: 2.0284218788146973
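
If you plan to resume training later, a common pattern is to store the optimizer state and some bookkeeping alongside the model's `state_dict`. The sketch below assumes a checkpoint layout with the hypothetical keys `epoch`, `model_state` and `optimizer_state`, and writes to a temporary directory so that it leaves no files behind.

```python
import os
import tempfile
import torch

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)

# Hypothetical checkpoint layout: the key names are our own choice.
checkpoint = {
    'epoch': 50,
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'checkpoint.pth')
    torch.save(checkpoint, path)
    # map_location lets you load on a machine without the original device.
    restored = torch.load(path, map_location='cpu')

new_model = torch.nn.Linear(1, 1)
new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.2)
new_model.load_state_dict(restored['model_state'])
new_optimizer.load_state_dict(restored['optimizer_state'])
print('Resuming from epoch', restored['epoch'])
```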


### Evaluating a Model

Evaluation mode (`model.eval()`) switches layers such as dropout and batch normalisation to their inference behaviour, but it does not stop gradient tracking on its own. Wrap evaluation code in `with torch.no_grad():` to skip autograd bookkeeping, which makes inference faster and lighter on memory. Compute metrics such as accuracy, mean absolute error, or F1 score to understand how well the model generalises.

In [None]:
trained_model.eval()

with torch.no_grad():
    predictions = trained_model(x_values)
    mse = torch.mean((predictions - y_values) ** 2)
    print(f'Mean squared error on training data: {mse:.4f}')

    example_points = torch.tensor([[-0.2], [0.0], [0.8]])
    example_preds = trained_model(example_points)
print('Example predictions:', example_preds.squeeze().tolist())

Mean squared error on training data: 0.0110
Example predictions: [0.6247479915618896, 1.025797724723816, 2.6299965381622314]
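
The linear model above contains no dropout or batch-normalisation layers, so `eval()` has no visible effect on it. The following sketch uses a standalone `nn.Dropout` layer to show what the mode switch changes, and a separate tensor `w` to show that gradient tracking is controlled by `torch.no_grad()` instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
ones = torch.ones(1000)

dropout.train()
train_out = dropout(ones)  # some entries zeroed, the rest scaled by 1 / (1 - p)

dropout.eval()
eval_out = dropout(ones)   # dropout is inactive: the input passes through unchanged

# Gradient tracking is independent of train/eval mode.
w = torch.ones(1, requires_grad=True)
tracked = (w * 2).requires_grad        # still tracked outside no_grad
with torch.no_grad():
    untracked = (w * 2).requires_grad  # not tracked inside no_grad

print(train_out[:8], eval_out[:8], tracked, untracked)
```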


## Example: simple use case

To tie everything together, let's train a tiny neural network to classify 2D points into two classes. We'll generate synthetic data, wrap it in a `DataLoader`, define a model, train it on a chosen device, and measure accuracy. The pattern mirrors what you would do with a real dataset; only the data loading and the model complexity grow.

In [None]:
torch.manual_seed(1)

num_samples = 200
class0 = torch.randn(num_samples, 2) - 1.0
class1 = torch.randn(num_samples, 2) + 1.0

features = torch.cat([class0, class1], dim=0)
labels = torch.cat([torch.zeros(num_samples), torch.ones(num_samples)], dim=0).long()

perm = torch.randperm(features.size(0))  # randomise order
features = features[perm]
labels = labels[perm]

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
classification_model = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1)
).to(device)

loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(classification_model.parameters(), lr=0.01)

# Training loop
classification_model.train()
for epoch in range(1, 21):
    epoch_loss = 0.0
    for batch_features, batch_labels in loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device).float()

        optimizer.zero_grad()
        logits = classification_model(batch_features).squeeze(1)
        loss = loss_fn(logits, batch_labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    if epoch % 5 == 0:
        print(f'Epoch {epoch:02d} - loss: {epoch_loss / len(loader):.4f}')

# Evaluation on the whole dataset
classification_model.eval()
with torch.no_grad():
    logits = classification_model(features.to(device)).squeeze(1)
    predictions = (torch.sigmoid(logits) > 0.5).long().cpu()
    accuracy = (predictions == labels).float().mean().item()

print(f'Accuracy on the generated dataset: {accuracy:.3f}')
print('Sample predictions:', predictions[:10].tolist())
print('Sample labels:     ', labels[:10].tolist())

Epoch 05 - loss: 0.2053
Epoch 10 - loss: 0.2027
Epoch 15 - loss: 0.2074
Epoch 20 - loss: 0.1985
Accuracy on the generated dataset: 0.915
Sample predictions: [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
Sample labels:      [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]


## Conclusion

You now know how to create and manipulate tensors, build models, prepare data loaders, train with optimisation loops, leverage GPUs, save checkpoints, and evaluate results. PyTorch's dynamic graph and clean API make experimentation approachable, so keep iterating on these building blocks with your own datasets and architectures. Whenever you need deeper details, refer to the official [PyTorch documentation](https://pytorch.org/docs/stable/index.html).