# Lecture 13: Introduction to Machine Learning with Python

## Learning Goals
* Understand the basic concepts of machine learning.
* Know core PyTorch concepts: tensors, datasets, modules, autograd, optimisers.
* Apply PyTorch to build and train a simple machine learning model.

## Basic Concepts of Machine Learning
* Machine learning involves training models to make predictions or decisions based on data.

<center><img src="images/ML_overview.png" alt="ML Overview schematic"/></center>

* In this introduction, we will focus on **supervised learning**.

## Background

### What is Machine Learning?
* Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from and make decisions based on data.

* Unlike traditional programming, where explicit instructions are provided, machine learning algorithms identify patterns in data and make predictions or decisions without being explicitly programmed for specific tasks.

* Common applications of machine learning include:
  - Image and speech recognition
  - Natural language processing
  - Recommender systems
  - Fraud detection
  - Autonomous vehicles

### Machine Learning as Function Approximation
* At its core, machine learning can be viewed as a function approximation problem. We use supervised learning as the running example.

* Given a dataset $D$ consisting of input-output pairs $(x_i, y_i)$, the goal is to learn a function $f(x; \theta)$ parameterised by $\theta$ that maps inputs $x$ to outputs $y$.

* The learning process involves finding the optimal parameters $\theta^*$ that minimise a loss function $L(y, f(x; \theta))$, which quantifies the difference between the predicted outputs and the true outputs.

* The optimisation problem can be formulated as:
$$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} L(y_i, f(x_i; \theta))$$

* Common loss functions include Mean Squared Error (MSE) for regression tasks and Cross-Entropy Loss for classification tasks.

* The optimisation is typically performed using gradient-based methods, such as Stochastic Gradient Descent (SGD) or Adam.
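Here is a small sketch that computes both losses by hand and compares them with PyTorch's `nn.MSELoss` and `nn.CrossEntropyLoss`. The values are made up. Note that PyTorch averages over samples by default (`reduction="mean"`), so it uses $\frac{1}{N}\sum_i$ rather than the plain sum above. This scaling does not change the minimiser $\theta^*$.

```python
import torch
import torch.nn as nn

# Regression: mean squared error, (1/N) * sum_i (y_i - f(x_i))^2
y_true = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 1.5, 2.0])
mse_manual = ((y_true - y_pred) ** 2).mean()
mse_torch = nn.MSELoss()(y_pred, y_true)
print("MSE:", mse_manual.item(), mse_torch.item())

# Classification: cross-entropy, -(1/N) * sum_i log softmax(f(x_i))[y_i]
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores: 2 samples, 3 classes
labels = torch.tensor([0, 2])              # true class index of each sample
log_probs = torch.log_softmax(logits, dim=1)
ce_manual = -log_probs[torch.arange(len(labels)), labels].mean()
ce_torch = nn.CrossEntropyLoss()(logits, labels)
print("Cross-entropy:", ce_manual.item(), ce_torch.item())
```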

### Gradient Descent
* Gradient descent is an optimisation algorithm used to minimise the loss function by iteratively updating the model parameters in the direction of the negative gradient.

* The update rule for gradient descent is given by:
$$\theta^{(t+1)} = \theta^{(t)} - \eta \nabla_{\theta} L(y, f(x; \theta^{(t)}))$$
where $\eta$ is the learning rate, and $\nabla_{\theta} L(y, f(x; \theta^{(t)}))$ is the gradient of the loss function with respect to the parameters $\theta$ at iteration $t$.

* Variants of gradient descent include:
  - Batch Gradient Descent: Uses the entire dataset to compute the gradient.
  - Stochastic Gradient Descent (SGD): Uses a single data point to compute the gradient.
  - Mini-batch Gradient Descent: Uses a small batch of data points to compute the gradient.

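* In practice, mini-batch gradient descent is commonly used. It balances the low per-update cost of SGD against the more stable (lower-variance) gradient estimates of batch gradient descent, and it makes efficient use of vectorised hardware such as GPUs.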
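The update rule can be written out directly. Below is a minimal sketch, not how PyTorch works internally. It fits a one-dimensional linear model $f(x; w, b) = wx + b$ to synthetic data using mini-batch gradient descent and a hand-derived MSE gradient. The data-generating line $y = 2x + 1$, the learning rate and the batch size are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)

# Synthetic regression data: y = 2x + 1 plus a little noise (illustrative only)
N = 256
x = torch.randn(N)
y = 2.0 * x + 1.0 + 0.1 * torch.randn(N)

w, b = 0.0, 0.0      # parameters theta = (w, b)
eta = 0.1            # learning rate
batch_size = 32      # N gives batch gradient descent, 1 gives SGD

for epoch in range(50):
    perm = torch.randperm(N)                  # reshuffle the data every epoch
    for start in range(0, N, batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        residual = w * xb + b - yb            # f(x; theta) - y on the mini-batch
        # Analytic gradient of the mini-batch mean squared error
        grad_w = (2 * residual * xb).mean().item()
        grad_b = (2 * residual).mean().item()
        # theta^(t+1) = theta^(t) - eta * gradient
        w -= eta * grad_w
        b -= eta * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # should end up close to 2 and 1
```

Setting `batch_size = N` turns this into batch gradient descent, and `batch_size = 1` turns it into SGD. This makes it easy to compare how noisy the three variants are.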

### Machine Learning Workflow

* Key components of a machine learning workflow:
  - Data Collection: Gathering relevant data for training.
  - Data Preprocessing: Cleaning and transforming data into a suitable format.
  - Model Selection: Choosing an appropriate algorithm or architecture.
  - Training: Optimising the model parameters using training data.
  - Validation: Assessing model performance on unseen data.
  - Deployment: Integrating the model into a production environment.

### Machine Learning Framework
* A machine learning framework provides tools and libraries to facilitate the development, training, and deployment of machine learning models.
* Popular machine learning frameworks include:
  - PyTorch
  - TensorFlow
  - Scikit-learn
  - Keras
* In this introduction, we will use **PyTorch**, which is widely used for deep learning applications.

* Within the workflow above, we will focus on the **training** and **validation** steps.

## Environment Setup

* We need to first set up a Python environment with the necessary libraries.
* It is good practice to use virtual environments to manage dependencies. We will use `venv` to create a virtual environment and install PyTorch.
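* Note: the exclamation mark `!` below is used to run shell commands in Jupyter notebooks. When running in a terminal, you must **NOT** include it.
* Each `!` command runs in its own subshell, so `source ml_env/bin/activate` has no lasting effect inside a notebook. It is best to run these commands in a terminal and then launch Jupyter from the activated environment. On Windows, activate with `ml_env\Scripts\activate` instead.

In [None]:
!python -m venv ml_env
!source ml_env/bin/activate
!pip install torch torchvision matplotlib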

* You can run PyTorch on both CPU and GPU. If you have a compatible NVIDIA GPU, you can install the CUDA version for better performance.
* The following code block checks if PyTorch is installed correctly and whether a GPU is available.

In [None]:
import torch

In [None]:
print("PyTorch version:", torch.__version__)

In [None]:
print("CUDA available:", torch.cuda.is_available())
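Most code works best with a single `device` variable that adapts to whatever machine it runs on. Below is a sketch of a common selection pattern. It assumes the preference order CUDA, then Apple silicon's MPS backend (`torch.backends.mps.is_available()`, available since PyTorch 1.12), then the CPU.

```python
import torch

# Prefer an NVIDIA GPU, then Apple silicon (MPS), otherwise fall back to the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print("Using device:", device)

# Tensors (and later, models) can be created on or moved to this device
t = torch.ones(3, device=device)
print(t.device)
```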

## PyTorch Basics
* PyTorch is a popular deep learning library that provides tools for building and training neural networks.

* Key concepts in PyTorch:
  - Tensors: Multi-dimensional arrays that are the basic building blocks of PyTorch.
  - Datasets and DataLoaders: Utilities for handling and loading data.
  - Modules: Building blocks for neural networks (e.g., layers).
  - Autograd: Automatic differentiation for computing gradients.
  - Optimisers: Algorithms for updating model parameters during training.

### Tensors
* Tensors are similar to NumPy arrays but can also be used on GPUs for acceleration.
* You can create tensors in various ways, such as from lists or using random values.

* Here is an example of creating a random tensor and checking its properties.
* You can specify the device (CPU or GPU) and data type (e.g., float32, int64) when creating tensors.

In [None]:
x = torch.randn(32, 3, 224, 224, device="cpu", dtype=torch.float32)  # e.g. a batch of 32 RGB images of 224x224 pixels
print("Tensor:", x)

In [None]:
print("Tensor shape:", x.shape)

In [None]:
print("Tensor device:", x.device)

In [None]:
print("Tensor dtype:", x.dtype)
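The cells above use only random values. Below is a short sketch of other common ways to create tensors, along with a few operations and the round trip to and from NumPy. The values are arbitrary.

```python
import numpy as np
import torch

a = torch.tensor([[1, 2], [3, 4]])                      # from a nested Python list (int64)
b = torch.zeros(2, 2)                                   # all zeros (float32 by default)
c = torch.arange(4, dtype=torch.float32).reshape(2, 2)  # [[0., 1.], [2., 3.]]

print(a.dtype, b.dtype)   # torch.int64 torch.float32
print(a.float() + c)      # element-wise addition after casting a to float32
print(a @ a)              # matrix multiplication

arr = c.numpy()                    # CPU tensor -> NumPy array (shares memory with c)
d = torch.from_numpy(np.ones(3))   # NumPy array -> tensor (keeps NumPy's float64)
print(type(arr), d.dtype)
```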

### Datasets and DataLoaders
* Datasets are used to represent a collection of data samples.
* DataLoaders provide an efficient way to iterate over datasets in batches, with options for shuffling and parallel loading.
* You can create custom datasets by subclassing `torch.utils.data.Dataset` and implementing the `__len__` and `__getitem__` methods.

* Here is an example of a simple custom dataset and using a DataLoader to iterate over it in batches.

In [None]:
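from torch.utils.data import Dataset, DataLoader

class SimpleDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data      # inputs, indexed along the first dimension
        self.labels = labels  # one label per sample

    def __len__(self):
        # Number of samples in the dataset
        return len(self.data)

    def __getitem__(self, idx):
        # Return the (sample, label) pair at position idx
        return self.data[idx], self.labels[idx]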

In [None]:
data = torch.randn(100, 10)
print("Data shape:", data.shape)

In [None]:
labels = torch.randint(0, 2, (100,))
print("Labels shape:", labels.shape)

In [None]:
dataset = SimpleDataset(data, labels)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

In [None]:
for batch_data, batch_labels in dataloader:
    print("Batch data shape:", batch_data.shape)
    print("Batch labels shape:", batch_labels.shape)

* Note that the last batch may be smaller than the specified batch size if the total number of samples is not divisible by the batch size.
* In practice, you would typically use built-in datasets from libraries like `torchvision` for image data or `torchtext` for text data (note that `torchtext` is no longer actively developed).
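When the data already lives in tensors, PyTorch's `torch.utils.data.TensorDataset` does the same job as `SimpleDataset`. Below is a minimal sketch that also uses `random_split` to hold out a validation set. The 80/20 split and the seed are arbitrary choices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

data = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))

# TensorDataset wraps tensors that share the same first dimension
dataset = TensorDataset(data, labels)

# Hold out 20 samples for validation; the seeded generator makes the split reproducible
train_set, val_set = random_split(dataset, [80, 20],
                                  generator=torch.Generator().manual_seed(0))

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)   # shuffle for training
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)      # fixed order for evaluation
print(len(train_set), len(val_set))
```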

### Modules
* Modules are the building blocks of neural networks in PyTorch.
* You can create custom modules by subclassing `torch.nn.Module` and defining the `__init__` and `forward` methods.
* Here is an example of a simple feedforward neural network module.

In [None]:
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # Fully connected layer
        self.relu = nn.ReLU()                           # Activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

In [None]:
model = SimpleNN(input_size=10, hidden_size=20, output_size=2)

In [None]:
print(model)

* You can combine multiple modules to create more complex architectures.
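For simple stacks of layers, `nn.Sequential` can chain modules without a custom class. Below is a sketch of a network with the same layers as `SimpleNN(10, 20, 2)`, followed by a parameter count as a sanity check.

```python
import torch
import torch.nn as nn

# Same layers as SimpleNN(input_size=10, hidden_size=20, output_size=2)
seq_model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 2),
)

out = seq_model(torch.randn(4, 10))   # batch of 4 samples
print(out.shape)                      # torch.Size([4, 2])

# Trainable parameters: 10*20 + 20 (fc1) + 20*2 + 2 (fc2) = 262
n_params = sum(p.numel() for p in seq_model.parameters() if p.requires_grad)
print(n_params)                       # 262
```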

### Autograd
* Autograd is PyTorch's automatic differentiation engine that computes gradients for tensor operations.
* When you perform operations on tensors with `requires_grad=True`, PyTorch builds a computation graph to track these operations.
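* You can compute gradients by calling the `backward()` method on a scalar tensor, such as a loss. The gradients are then accumulated in the `.grad` attribute of each leaf tensor with `requires_grad=True`.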

* Here is an example of using autograd to compute gradients.

In [None]:
x = torch.randn(5, requires_grad=True)
print("Input tensor:", x)

In [None]:
y = x ** 2 + 3 * x + 2
print("Output tensor:", y)

In [None]:
y_sum = y.sum()
print("Sum of output tensor:", y_sum)

In [None]:
y_sum.backward()
print("Gradients:", x.grad)
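Because $y_{\text{sum}} = \sum_i (x_i^2 + 3x_i + 2)$, the gradient with respect to each $x_i$ is $2x_i + 3$. The sketch below checks autograd against this formula. It also shows that a second `backward()` call adds to `.grad` instead of overwriting it.

```python
import torch

x = torch.randn(5, requires_grad=True)
(x ** 2 + 3 * x + 2).sum().backward()

# d/dx_i of sum_j (x_j^2 + 3 x_j + 2) is 2 x_i + 3
expected = 2 * x.detach() + 3
first_grad = x.grad.clone()
print(torch.allclose(first_grad, expected))        # True

# Gradients accumulate: a second backward pass adds to x.grad
(x ** 2 + 3 * x + 2).sum().backward()
second_grad = x.grad.clone()
print(torch.allclose(second_grad, 2 * expected))   # True

# Clear before the next computation; optimizer.zero_grad() does the same for model parameters
x.grad = None
```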

### Optimisers
* Optimisers are algorithms used to update model parameters based on computed gradients.
* PyTorch provides several built-in optimisers in the `torch.optim` module, such as SGD and Adam.
* You need to create an optimiser instance by passing the model parameters and learning rate.

* Here is an example of using the Adam optimiser to update model parameters.

In [None]:
import torch.optim as optim
model = SimpleNN(input_size=10, hidden_size=20, output_size=2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
print("Initial model parameters:")
for param in model.parameters():
    print(param)

In [None]:
# Create dummy input and target
input_data = torch.randn(32, 10)
target = torch.randint(0, 2, (32,))
print("Input data shape:", input_data.shape)
print("Target shape:", target.shape)

In [None]:
# Forward pass
output = model(input_data)
print("Output shape:", output.shape)

In [None]:
# Compute loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print("Loss:", loss.item())

In [None]:
# Backward pass and optimisation step
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("Updated model parameters:")
for param in model.parameters():
    print(param)
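To connect `optimizer.step()` with the update rule $\theta^{(t+1)} = \theta^{(t)} - \eta \nabla_{\theta} L$, the sketch below compares one step of `optim.SGD` (without momentum) against the same update written by hand. A small `nn.Linear` model and random data serve as stand-ins. Adam's update is more involved and is not reproduced here.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(10, 2)
model_b = copy.deepcopy(model_a)          # identical starting parameters
inputs = torch.randn(8, 10)
targets = torch.randint(0, 2, (8,))
criterion = nn.CrossEntropyLoss()
lr = 0.1

# (a) Built-in optimiser
opt = optim.SGD(model_a.parameters(), lr=lr)
opt.zero_grad()
criterion(model_a(inputs), targets).backward()
opt.step()

# (b) The same update by hand: theta <- theta - lr * grad
criterion(model_b(inputs), targets).backward()
with torch.no_grad():                     # the update itself must not be tracked by autograd
    for p in model_b.parameters():
        p -= lr * p.grad
        p.grad = None

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)   # True
```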

## Training and Validation Loop
* A typical training loop involves iterating over the dataset, performing forward and backward passes, and updating model parameters.
* After training, you should evaluate the model on a validation or test dataset to assess its performance.

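* Here is a simplified example of a training and validation loop. For brevity, the validation phase reuses the training data, so the reported accuracy is really training accuracy. In practice, you should evaluate on a separate held-out set, as in the MNIST example below.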

In [None]:
# Create dummy input and target
input_data = torch.randn(100, 10)
target = torch.sum(input_data, dim=1).long() % 2  # Binary classification target
print("Input data shape:", input_data.shape)
print("Target shape:", target.shape)

In [None]:
# Create dataset and dataloader
dataset = SimpleDataset(input_data, target)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

In [None]:
# Create model, criterion, and optimiser
model = SimpleNN(input_size=10, hidden_size=20, output_size=2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

In [None]:
# Training and validation loop
num_epochs = 10
for epoch in range(num_epochs):
    # Training phase
    model.train()
    running_loss = 0.0
    for batch_data, batch_labels in dataloader:
        optimizer.zero_grad()
        output = model(batch_data)
        loss = criterion(output, batch_labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * batch_data.size(0)  # accumulate per-sample loss
    train_loss = running_loss / len(dataloader.dataset)  # mean loss over the epoch

    # Validation phase
    model.eval()
    with torch.no_grad():
        total_correct = 0
        total_samples = 0
        for batch_data, batch_labels in dataloader:
            output = model(batch_data)
            _, predicted = torch.max(output, 1)  # index of the highest score = predicted class
            total_correct += (predicted == batch_labels).sum().item()
            total_samples += batch_labels.size(0)
        accuracy = total_correct / total_samples

    print(f"Epoch {epoch + 1}/{num_epochs}, Train loss: {train_loss:.4f}, Accuracy: {accuracy * 100:.2f}%.")

## Saving and Loading Models
* After training a model, you may want to save its parameters for later use.
* PyTorch provides functions to save and load model state dictionaries.
* Here is an example of saving and loading a model.

In [None]:
# Save model
torch.save(model.state_dict(), "simple_nn.pth")

In [None]:
# Load model
loaded_model = SimpleNN(input_size=10, hidden_size=20, output_size=2)
loaded_model.load_state_dict(torch.load("simple_nn.pth", weights_only=True))  # load tensors only, no arbitrary objects
print("Loaded model parameters:")
for param in loaded_model.parameters():
    print(param)
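To resume training later, people usually save more than the weights. Below is a sketch of a checkpoint dictionary that also stores the optimiser state and the epoch. The keys `epoch`, `model_state_dict` and `optimizer_state_dict` are a common convention, not names PyTorch requires, and the file name is hypothetical. `weights_only=True` requires PyTorch 1.13 or newer.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

ckpt_model = nn.Linear(10, 2)
ckpt_optimizer = optim.Adam(ckpt_model.parameters(), lr=0.001)

# One training step so that Adam has internal state worth saving
ckpt_loss = ckpt_model(torch.randn(4, 10)).pow(2).mean()
ckpt_loss.backward()
ckpt_optimizer.step()

path = os.path.join(tempfile.gettempdir(), "checkpoint.pth")  # hypothetical file name
torch.save({
    "epoch": 1,
    "model_state_dict": ckpt_model.state_dict(),
    "optimizer_state_dict": ckpt_optimizer.state_dict(),
}, path)

# Restore into freshly constructed objects
checkpoint = torch.load(path, weights_only=True)
restored_model = nn.Linear(10, 2)
restored_model.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer = optim.Adam(restored_model.parameters(), lr=0.001)
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1   # continue training from the next epoch
```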

## Example: Training an MLP Classifier on the MNIST Dataset
* Let's put everything together and train a simple MLP classifier on the MNIST dataset using PyTorch.
* We will use the `torchvision` library to load the MNIST dataset.
* Note: In case you get errors when downloading the datasets, you can download them from [here](https://github.com/golbin/TensorFlow-MNIST/tree/master/mnist/data) and place them under `./mnist_data/MNIST/raw`.
* Note: The training process may take some time depending on your hardware. You can prepare some coffee and enjoy!

In [None]:
import torchvision
import torchvision.transforms as transforms

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = torchvision.datasets.MNIST(root='./mnist_data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./mnist_data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Define MLP model
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Instantiate model, criterion, and optimiser
model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training and validation loop
num_epochs = 10
for epoch in range(num_epochs):
    # Training phase
    model.train()
    running_loss = 0.0
    for batch_data, batch_labels in train_loader:
        optimizer.zero_grad()
        output = model(batch_data)
        loss = criterion(output, batch_labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * batch_data.size(0)  # accumulate per-sample loss
    train_loss = running_loss / len(train_loader.dataset)  # mean loss over the epoch

    # Validation phase (on the held-out test set)
    model.eval()
    with torch.no_grad():
        total_correct = 0
        total_samples = 0
        for batch_data, batch_labels in test_loader:
            output = model(batch_data)
            _, predicted = torch.max(output, 1)
            total_correct += (predicted == batch_labels).sum().item()
            total_samples += batch_labels.size(0)
        accuracy = total_correct / total_samples

    print(f"Epoch {epoch + 1}/{num_epochs}, Train loss: {train_loss:.4f}, Test accuracy: {accuracy * 100:.2f}%.")


In [None]:
# Show some example predictions
import matplotlib.pyplot as plt

model.eval()
dataiter = iter(test_loader)
images, labels = next(dataiter)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(images)
_, predicted = torch.max(outputs, 1)
fig, axes = plt.subplots(2, 4, figsize=(10, 5))
for i in range(8):
    ax = axes[i // 4, i % 4]
    ax.imshow(images[i].squeeze(), cmap='gray')
    ax.set_title(f"Pred: {predicted[i].item()}, True: {labels[i].item()}")
    ax.axis('off')
plt.show()

## Common PyTorch Pitfalls

* Mixing up CPU and GPU tensors, leading to runtime errors.
* Here is an example that raises an error due to device mismatch.
* Note: This example only behaves as expected on a machine with an accelerator: either a CUDA-capable NVIDIA GPU with the CUDA build of PyTorch (`"cuda"`), or Apple silicon (`"mps"`).

In [None]:
x = torch.randn(5, device="cpu")
y = torch.randn(5, device="cuda")  # use device="mps" on Apple silicon
z = x + y  # RuntimeError: tensors are on different devices
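The usual fix is to use `.to(device)` to put the model and every tensor it touches on the same device. Below is a minimal sketch. So that it runs anywhere, it falls back to the CPU when no CUDA GPU is available.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no CUDA GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)   # moves all parameters to `device`
batch = torch.randn(4, 10)            # data usually starts on the CPU
output = model(batch.to(device))      # move the batch to the model's device first
print(output.device)
```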

* Wrong data type for labels, especially in classification tasks (e.g., using float instead of long for class indices).
* Here is an example that raises an error due to incorrect label data type.

In [None]:
output = torch.randn(3, 5)  # Batch size 3, 5 classes
labels = torch.tensor([1.0, 0.0, 4.0])  # Float tensor instead of Long
criterion = nn.CrossEntropyLoss()
loss = criterion(output, labels)  # RuntimeError: expected Long but got Float
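Here is a sketch of the fix: cast class indices to `torch.long` (int64). For contrast, `nn.CrossEntropyLoss` does accept float targets when they hold class probabilities and have the same shape as the output. This feature was added in PyTorch 1.10.

```python
import torch
import torch.nn as nn

output = torch.randn(3, 5)                # batch size 3, 5 classes
labels = torch.tensor([1.0, 0.0, 4.0])    # float labels, e.g. read from a file
criterion = nn.CrossEntropyLoss()

# Fix: cast class indices to int64 (torch.long) before computing the loss
loss = criterion(output, labels.long())
print(loss.item())

# Float targets are valid when they are class probabilities of shape (batch, classes)
probs = torch.softmax(torch.randn(3, 5), dim=1)
loss_soft = criterion(output, probs)
print(loss_soft.item())
```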

* Other common pitfalls include:
  - Not switching to the correct mode with `model.train()` or `model.eval()`. Layers such as dropout and batch normalisation behave differently in the two modes.
  - Forgetting to call `optimizer.zero_grad()` before each `loss.backward()`. PyTorch accumulates gradients by default, so gradients from earlier iterations are silently added to the new ones.
  - Forgetting to shuffle the training data. This may lead to poor generalisation.
  - Using an inappropriate learning rate. This can cause slow convergence or divergence.
  - Incorrectly reshaping tensors. This can cause dimension mismatches.
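A dropout layer on its own shows the first pitfall clearly. Below is a sketch; the dropout probability and input are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()          # training mode: elements are randomly zeroed
y_train = dropout(x)     # surviving elements are scaled by 1 / (1 - p) = 2
print(y_train)

dropout.eval()           # evaluation mode: dropout is the identity
y_eval = dropout(x)
print(y_eval)            # tensor([1., 1., 1., 1., 1., 1., 1., 1.])
```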