## Machine Learning vs Deep Learning

### Introduction
Machine learning is a subset of Artificial Intelligence (AI) that enables computers to learn from data without being explicitly programmed. It involves algorithms that identify patterns in data and make predictions or decisions based on them. Common machine learning techniques include:
- *Supervised Learning* (e.g., linear regression, decision trees, support vector machines)
- *Unsupervised Learning* (e.g., clustering, principal component analysis)
- *Reinforcement Learning* (e.g., Q-learning, policy gradients)

*Deep Learning (DL)* is a specialised subset of machine learning that uses artificial neural networks with multiple layers to model complex data representations. Deep learning models can automatically extract features from raw data, reducing the need for the manual feature engineering that traditional models typically require.

The "deep" in deep learning refers to the number of layers in a neural network. Traditional machine learning models, like logistic regression or decision trees, map inputs to outputs through one or a few stages of computation. In contrast, deep learning models stack multiple hidden layers (often more than three), allowing them to learn hierarchical, increasingly abstract features from the data. For example, in image recognition:

- The *first layers* detect simple patterns like edges and textures.
- The *middle layers* recognise shapes and structures.
- The *final layers* identify objects and their relationships.
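To make the idea of an early layer "detecting edges" concrete, here is a minimal NumPy sketch. The two-number kernel is hand-picked for illustration (an assumption: a real network learns its own kernels from data), and sliding it across the image is the same multiply-and-add that a convolutional layer performs at each position.

```python
import numpy as np

# A tiny 4x6 "image": dark (0) on the left, bright (1) on the right, i.e. a vertical edge
image = np.zeros((4, 6))
image[:, 3:] = 1.0

# Hand-picked (not learned) kernel: responds to a left-to-right increase in brightness
kernel = np.array([-1.0, 1.0])

# Slide the kernel along each row at every valid position and record the response
response = np.zeros((image.shape[0], image.shape[1] - 1))
for row in range(image.shape[0]):
    for col in range(image.shape[1] - 1):
        response[row, col] = np.dot(image[row, col:col + 2], kernel)

print(response)
```

Every entry of `response` is zero except in column 2, where the window straddles the boundary between the dark and bright pixels.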

### When should you move to Deep Learning?

You should consider using deep learning when:
- Data size is large – deep learning thrives on large datasets (millions of examples). If you have limited data, traditional machine learning models often perform better.
- Feature engineering is complex – if manually extracting features from data is challenging (e.g., images, speech, raw text), deep learning can automatically learn these representations.
- High computational power is available – training deep networks requires powerful GPUs/TPUs. If you lack resources, simpler machine learning models might be more efficient.
- Problem complexity is high – tasks such as natural language processing, image recognition, and reinforcement learning benefit from deep learning because of its ability to capture intricate patterns.
- End-to-End learning is required – when you want the model to learn directly from raw data (e.g., pixels in an image, audio waveforms) without extensive preprocessing.

### When to stick with traditional machine learning?
- Small to medium datasets – if your dataset is small, simpler models (e.g., random forests, SVMs) often generalise better.
- Interpretable models – deep learning models are often "black boxes", whereas decision trees and linear models provide clear decision-making insights.
- Faster training & deployment – traditional machine learning models train faster and require fewer computational resources.

In summary, deep learning is a powerful tool for complex problems with large datasets, but traditional machine learning is often preferable for smaller, structured data and interpretable models.

### Understanding TensorFlow, Keras, and PyTorch
Deep learning models are built from artificial neural networks, which are loosely inspired by the way neurons in the brain pass signals to one another. These networks consist of layers of interconnected neurons that can learn complex patterns from data. We will cover the fundamentals of deep learning by looking at three widely used libraries, TensorFlow, Keras, and PyTorch, and demonstrate their use on a real dataset.

### Install Python libraries

In [None]:
!pip install tensorflow torch

### MNIST Dataset

The *MNIST* dataset is one of the most well-known resources in machine learning, especially for image classification tasks. It consists of 70,000 grayscale images of handwritten digits ranging from 0 to 9. These images are already split into two sets: 60,000 are used for training a model, and the remaining 10,000 are reserved for testing how well the model performs on unseen data.

Each image in the dataset is 28 by 28 pixels in size and contains a single handwritten digit, with the corresponding label provided. Because of its clean and consistent format, the MNIST dataset is especially helpful for beginners. The images are already centred and size-normalised, which means there’s very little preprocessing needed before feeding them into a model.

One of the main reasons MNIST is so widely used is that it allows developers and researchers to quickly prototype and test ideas without needing a large or complex dataset. It’s often used to compare the performance of different algorithms, from basic classifiers like logistic regression to more advanced deep learning models like convolutional neural networks (CNNs). While MNIST is relatively simple by today’s standards, it remains a great first step in learning how to work with image data and deep learning.

### Load the data
The dataset comes already split into a training set and a test set, so we do not need to split it ourselves:

In [None]:
from tensorflow.keras.datasets import mnist

# Load dataset (MNIST)
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

### Preprocess
The main preprocessing step is to normalise the pixel values, scaling them from the range [0, 255] to [0, 1]:

In [None]:
X_train, X_test = X_train / 255.0, X_test / 255.0  # Normalising pixel values
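As a quick check of what this line does, the NumPy sketch below applies the same division to a small, made-up array of `uint8` pixel values (the dtype that `mnist.load_data()` returns). Dividing by `255.0` both rescales the values into [0, 1] and converts the array to floating point.

```python
import numpy as np

# Made-up pixel values standing in for MNIST: unsigned 8-bit integers in [0, 255]
pixels = np.array([[0, 64, 128],
                   [32, 200, 255]], dtype=np.uint8)

# Same operation as the cell above: true division by 255.0 gives float64 values in [0, 1]
scaled = pixels / 255.0

print(pixels.dtype, "->", scaled.dtype)
print(scaled.min(), scaled.max())
```

Keeping inputs on a small, consistent scale tends to make gradient-based training more stable. Note that the result is `float64`; the PyTorch section later casts it to `float32` when building tensors.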

You can think of a *neural network* as a flexible, powerful extension of simpler models like *logistic regression*. In logistic regression, we’re essentially drawing a straight line through the data (with more than two input features, a flat boundary called a hyperplane) to separate categories like “yes” vs “no” or “cat” vs “dog”; the boundary only curves if we feed in non-linear transformations of the inputs. This line is learned by adjusting weights so that the model fits the training data as well as possible while, usually with the help of regularisation, avoiding overfitting.

A *neural network* works in a similar way, but instead of just one line or curve, it’s like drawing a very flexible, squiggly line that can twist and bend through the data in much more complex ways. Each *neuron* in the network contributes a little curve or bend to this overall shape. As you stack more layers and add more neurons, the network gets better at shaping the decision boundary in ways that simpler models can't: it becomes able to separate data that’s tangled, overlapping, or follows patterns too complex for straight lines.
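The classic example of data that no single straight line can separate is XOR: the label is 1 when exactly one of two inputs is 1. The NumPy sketch below shows that the best least-squares linear fit (a simple stand-in for a linear model like logistic regression) predicts 0.5 for every point. By contrast, two ReLU neurons with hand-picked weights (an assumption for illustration; training would find its own weights) reproduce XOR exactly by combining two "bends".

```python
import numpy as np

# The four XOR inputs and their labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Best straight-line (least-squares linear) fit: y ~ w1*x1 + w2*x2 + b
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_pred = A @ coef

def relu(z):
    return np.maximum(z, 0.0)

# Hidden layer of two ReLU neurons with hand-picked (not learned) weights
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # column j holds the input weights of hidden neuron j
b1 = np.array([0.0, -1.0])    # neuron 2 only activates when both inputs are 1
w2 = np.array([1.0, -2.0])    # output weights combining the two neurons

hidden = relu(X @ W1 + b1)
nn_pred = hidden @ w2

print("linear:", np.round(linear_pred, 3))
print("network:", nn_pred)
```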

In this sense, deep learning is like giving your model a pencil that can draw not just straight lines or simple curves, but intricate, detailed shapes that thread through the data, adapting to the underlying structure in order to classify or predict more accurately. The process of training, adjusting all those weights, is what sculpts that squiggly line to best fit the data.

But just like with logistic regression, there's a balance. Too much bending (overfitting), and the model memorises the training data instead of learning general rules. The challenge of using neural networks is to get the line just flexible enough to learn real patterns without chasing the noise in our data.
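The same trade-off appears in the simplest curve-fitting setting. In this NumPy sketch (made-up, noisy points along a straight line, with polynomial degree playing the role of "flexibility"), a degree-9 polynomial can pass through all 10 training points, driving its training error to essentially zero. That is the memorising behaviour described above: a perfect training score says nothing about how the curve behaves between and beyond the points, which is why we also track validation data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: a straight-line trend plus random noise
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)

# A stiff model (straight line) and a very flexible one (degree 9 through 10 points)
stiff = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
flexible = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((model(x) - y) ** 2))

train_mse_stiff = mse(stiff, x_train, y_train)
train_mse_flexible = mse(flexible, x_train, y_train)

print(f"training MSE, degree 1: {train_mse_stiff:.4f}")
print(f"training MSE, degree 9: {train_mse_flexible:.2e}")
```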

> In the next few sections we will discuss the main deep learning Python libraries you can use. We will not cover the architecture involved in constructing these models. For now, we will focus on looking at the key differences between the available APIs. After this gentle introduction, we will look deeper into the architecture and methods needed to work with deep learning models.

### Introduction to TensorFlow
*TensorFlow* is an open-source platform developed by Google for building and training machine learning and deep learning models. It provides a powerful yet flexible way to create systems that can learn from data, recognise patterns, and make predictions, such as classifying images, translating languages, or detecting spam emails.

TensorFlow allows developers and researchers to build models using a set of modular building blocks, like layers of artificial neurons, mathematical operations, and training algorithms. These components can be combined to create simple models for beginners or complex neural networks for advanced applications like image recognition or natural language processing.

One of TensorFlow’s strengths is that it efficiently manages computations using CPUs or GPUs (or even TPUs), and supports automatic differentiation (automatically computes derivatives), which is crucial for training models. It also integrates well with high-level APIs like *Keras*, making it easier to design, train, and evaluate models with minimal code.
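TensorFlow's automatic differentiation (exposed directly through `tf.GradientTape`) records the operations performed on its tensors and then works backwards through them with the chain rule. The pure-Python `Value` class below is a hypothetical toy sketch of that idea: reverse-mode differentiation on single numbers, supporting only `+` and `*`. It is not how TensorFlow is implemented internally.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow backwards."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # the Values this one was computed from
        self._local_grads = local_grads  # map the upstream gradient to each parent's share

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Topologically sort the graph so each node's gradient is complete
        # before it is passed on to the nodes it was computed from
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in zip(node._parents, node._local_grads):
                parent.grad += local_grad(node.grad)


x = Value(3.0)
y = x * x + x * 2      # y = x^2 + 2x
y.backward()
print(y.data, x.grad)  # 15.0 8.0
```

For y = x² + 2x at x = 3, the derivative is 2x + 2 = 8, which is what `x.grad` holds after `y.backward()`.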

TensorFlow is widely used in both academia and industry due to its flexibility and scalability. Let's demonstrate how to work with TensorFlow and our dataset:

In [None]:
import tensorflow as tf
from tensorflow import keras

# Build a simple feedforward neural network model
model = tf.keras.Sequential([

    # Declare the input shape: each example is a 28x28 image
    keras.Input(shape=(28, 28)),

    # This layer flattens the 28x28 input images into 1D vectors of length 784
    keras.layers.Flatten(),

    # A hidden layer with 128 neurons and ReLU activation function
    keras.layers.Dense(128, activation='relu'),

    # Output layer with 10 neurons (one for each digit 0–9) and softmax activation
    # Softmax turns the output into a probability distribution
    keras.layers.Dense(10, activation='softmax')
])

# Compile the model with appropriate settings:
# Adam optimiser: adjusts the weights efficiently
# Sparse categorical crossentropy: suitable for integer class labels
# Accuracy: track how often predictions match the labels
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model for 5 passes (epochs) through the training data:
# Also evaluate accuracy on the test data during training
history = model.fit(X_train, Y_train, epochs=5, validation_data=(X_test, Y_test))

# Print a summary of the model's layers and parameters
# (summary() prints the table itself and returns None, so it is not wrapped in print())
model.summary()

The model has three layers. The first (`flatten`) takes each 28×28 input image and unrolls it into a single row of 784 numbers. The second (`dense`) connects these numbers to 128 "units" that learn intermediate patterns. The third (`dense_1`) reduces everything to just 10 numbers, one for each possible class (here, the digits 0–9). If you build several models in the same session, Keras numbers the layer names differently (for example `dense_2`).
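To see the shapes involved, here is a NumPy sketch of the same three steps for a made-up batch of 32 images. Random weights stand in for the learned ones (an assumption; only the shapes and operations matter here). A `Dense` layer computes `activation(inputs @ kernel + bias)`, and softmax turns each row of 10 scores into probabilities that sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up batch of 32 "images", 28x28 each, with values in [0, 1]
batch = rng.random((32, 28, 28))

# Flatten: (32, 28, 28) -> (32, 784); no weights involved
flat = batch.reshape(len(batch), -1)

# Dense(128, relu): random weights stand in for learned ones
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
hidden = np.maximum(flat @ W1 + b1, 0.0)

# Dense(10, softmax)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)
logits = hidden @ W2 + b2
exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # subtract row max for stability
probs = exp / exp.sum(axis=1, keepdims=True)

print(flat.shape, hidden.shape, probs.shape)
```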

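In total, the summary reports about 305,000 parameters, but only 101,770 of them belong to the model itself: these are the weights and biases that training adjusts to improve performance. Depending on your Keras version, the summary may also count the optimiser’s internal state. Adam keeps two running averages for every trainable parameter (about 203,500 values here, plus a couple of bookkeeping variables), which accounts for the rest; older Keras versions leave these out and report only the 101,770 trainable parameters.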
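The trainable-parameter count can be checked by hand: each `Dense` layer has one weight per input-unit pair plus one bias per unit, and `Flatten` has none. The optimiser figure assumes Adam's two moment estimates per trainable parameter; exactly how your Keras version reports any remaining scalar variables may differ.

```python
# Flatten only reshapes, so it has no parameters
flatten_params = 0

# Dense layers: (inputs x units) weights + one bias per unit
dense_params = 784 * 128 + 128     # hidden layer
dense_1_params = 128 * 10 + 10     # output layer

trainable = flatten_params + dense_params + dense_1_params
print(dense_params, dense_1_params, trainable)  # 100480 1290 101770

# Adam stores two moment estimates (running averages) per trainable parameter
adam_moment_slots = 2 * trainable
print(adam_moment_slots)  # 203540
```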

We can now evaluate the model by visually inspecting the training and validation loss (more on this later). In short, the next cell creates two charts that show how the model’s performance changes over the training epochs, helping us understand whether it’s learning well or whether there are problems.

The first thing we do is set up a wide figure so that two plots fit side by side. On the *left-hand plot*, we draw two lines: one showing the *training loss* (how wrong the model is on the examples it was taught with) and one showing the *validation loss* (how wrong it is on new, unseen examples). Ideally, both of these lines should go down over time, meaning the model is making fewer mistakes as it trains.

On the *right-hand plot*, we draw another pair of lines: one showing *training accuracy* (how often the model gets the training examples right) and the other showing *validation accuracy* (how often it gets the new examples right). Here, we want the lines to go up, showing that the model is improving and getting more correct answers over time.

Viewed together, these two charts give us a clear picture of whether the model is learning effectively or if it’s starting to overfit, that is, doing well on training examples, but struggling on new ones. This kind of visual check is a simple way to monitor how the model is doing:


In [None]:
import matplotlib.pyplot as plt  

plt.figure(figsize=(10, 4))

# Plot the loss (how wrong the model is) over training epochs
plt.subplot(1, 2, 1)  # Create the first subplot (1 row, 2 columns, first plot)

# Plot the training loss over time
plt.plot(history.history['loss'], marker='o', label='Train Loss')
# Plot the validation loss (on unseen data) over time
plt.plot(history.history['val_loss'], marker='o', label='Val Loss')

plt.title('Loss over Epochs')  # Set the plot title
plt.xlabel('Epoch')            # Label the x-axis (number of training rounds)
plt.ylabel('Loss')             # Label the y-axis (error)
plt.grid(True)                 # Add a grid for easier reading
plt.legend()                   # Show the legend (labels for the lines)

# Plot the accuracy (how often the model is right) over epochs
plt.subplot(1, 2, 2)  # Create the second subplot (1 row, 2 columns, second plot)

# Plot the training accuracy over time
plt.plot(history.history['accuracy'], marker='o', label='Train Accuracy')
# Plot the validation accuracy (on unseen data) over time
plt.plot(history.history['val_accuracy'], marker='o', label='Val Accuracy')

plt.title('Accuracy over Epochs')  # Set the plot title
plt.xlabel('Epoch')                # Label the x-axis
plt.ylabel('Accuracy')             # Label the y-axis (fraction correct, from 0 to 1)
plt.grid(True)                     # Add a grid
plt.legend()                       # Show the legend

# Adjust layout so plots don't overlap and display the figure
plt.tight_layout()
plt.show()
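Beyond eyeballing the charts, the same check can be written down as a rule of thumb. The helper below is hypothetical (it is not part of Keras): it flags epochs where the training loss kept falling while the validation loss rose, which is one common early sign of overfitting. It expects a dictionary with `"loss"` and `"val_loss"` lists, the format of `history.history` when `fit()` is given `validation_data`.

```python
def epochs_with_rising_val_loss(history_dict):
    """Return 1-based epoch numbers where training loss fell but validation loss rose."""
    train, val = history_dict["loss"], history_dict["val_loss"]
    flagged = []
    for i in range(1, len(train)):
        if train[i] < train[i - 1] and val[i] > val[i - 1]:
            flagged.append(i + 1)
    return flagged


# Made-up numbers for illustration (not results from the model above)
example = {"loss": [0.30, 0.15, 0.10, 0.07, 0.05],
           "val_loss": [0.16, 0.11, 0.09, 0.10, 0.12]}
print(epochs_with_rising_val_loss(example))  # [4, 5]
```

With the model trained above, you could call `epochs_with_rising_val_loss(history.history)`; an empty list means no such epochs occurred.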


Now that the model has been trained, we can test it by passing in an image from the test set, displaying it with its true label, and printing the model's predicted label:

In [None]:
import numpy as np
# Choose an image index from the test set to examine
sample_index = 0

# Extract the image and its corresponding true label
sample_image = X_test[sample_index]         # Shape: (28, 28)
true_label = Y_test[sample_index]           # Ground truth digit

# Display the selected image using matplotlib
plt.imshow(sample_image, cmap='gray')       # Show grayscale image
plt.title(f"True Label: {true_label}")      # Title with actual label
plt.axis('off')                             # Hide axis ticks for cleaner look
plt.show()

# Prepare the image for prediction:
# The model expects a batch dimension → input shape should be (1, 28, 28)
input_image = np.expand_dims(sample_image, axis=0)  # Add batch dimension

# Use the trained model to predict probabilities for each digit class (0–9)
predictions = model.predict(input_image)

# Find the class with the highest predicted probability
predicted_label = np.argmax(predictions[0])  # Returns the digit with highest confidence

# Print out the predicted digit
print(f"Predicted Label: {predicted_label}")


### Introduction to Keras
*Keras* is a high-level API that sits on top of lower-level deep learning frameworks like TensorFlow. Its main goal is to make the process of building, training, and testing neural networks much simpler and more intuitive. Instead of writing complex code to define each part of a neural network manually, Keras allows you to build models using easy-to-understand, modular building blocks like layers, activation functions, and optimisers with just a few lines of code.

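In earlier versions, Keras was a separate library that could run on different backends (such as TensorFlow, Theano, or CNTK). Keras has since been bundled with TensorFlow as `tf.keras`, and since TensorFlow 2.0 it has been TensorFlow’s official high-level API, so installing TensorFlow gives you Keras as well and you can access all of its functionality through `tensorflow.keras`. (Keras 3 has since reintroduced multi-backend support, with TensorFlow, JAX, and PyTorch backends, but `tf.keras` remains the natural choice when working within TensorFlow.)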

You may notice that the Keras code below looks almost identical to the TensorFlow example above. That is because the TensorFlow example was already using the Keras API: the core principles and syntax are the same, and you get the full power and flexibility of TensorFlow behind the scenes, including access to lower-level tools when needed and compatibility with TensorFlow’s ecosystem (such as data pipelines, visualisation, and deployment tools).

For most users, `tf.keras` is the recommended choice. It offers the simplicity of Keras with the scalability of TensorFlow, making it easier to get started without sacrificing performance or flexibility:

In [None]:
from tensorflow import keras

# Define a simple feedforward neural network using Keras Sequential API:
model_keras = keras.Sequential([
    keras.Input(shape=(28, 28)),                     # Input: each example is a 28x28 image
    keras.layers.Flatten(),                          # Flattens each 28x28 image to a 1D vector of size 784
    keras.layers.Dense(128, activation='relu'),      # Hidden layer: 128 neurons with ReLU activation
    keras.layers.Dense(10, activation='softmax')     # Output layer: 10 neurons for 10 classes with softmax for probability output
])

# Compile the model:
# Optimiser: Adam (adaptive learning rate)
# Loss function: sparse categorical crossentropy (for integer labels in classification)
# Metric: accuracy (track performance during training)
model_keras.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model on the training data for 5 epochs:
# Also evaluate performance on validation (test) data during each epoch
history = model_keras.fit(
    X_train, Y_train,
    epochs=5,
    validation_data=(X_test, Y_test)
)

# Print a summary of the model architecture and parameter count
# (summary() prints the table itself and returns None)
model_keras.summary()


Above we see the same summary as before, showing the structure of the network and the number of parameters. Just like before, we plot our evaluation metrics (training and validation loss, and accuracy) from the model:

In [None]:
import matplotlib.pyplot as plt

# Create a figure with two side-by-side subplots (1 row, 2 columns)
plt.figure(figsize=(10, 4))  # Set the overall figure size

# Plot training and validation loss
plt.subplot(1, 2, 1)  # First subplot (left side)

# Plot training loss over epochs
plt.plot(history.history['loss'], marker='o', label='Train Loss')

# Plot validation loss over epochs
plt.plot(history.history['val_loss'], marker='o', label='Val Loss')

plt.title('Loss over Epochs')  # Title for the plot
plt.xlabel('Epoch')            # X-axis label
plt.ylabel('Loss')             # Y-axis label
plt.grid(True)                 # Add gridlines
plt.legend()                   # Display legend

# Plot training and validation accuracy
plt.subplot(1, 2, 2)  # Second subplot (right side)

# Plot training accuracy over epochs
plt.plot(history.history['accuracy'], marker='o', label='Train Accuracy')

# Plot validation accuracy over epochs
plt.plot(history.history['val_accuracy'], marker='o', label='Val Accuracy')

plt.title('Accuracy over Epochs')  # Title for the plot
plt.xlabel('Epoch')                # X-axis label
plt.ylabel('Accuracy')             # Y-axis label
plt.grid(True)                     # Add gridlines
plt.legend()                       # Display legend

# Automatically adjust layout so plots don't overlap
plt.tight_layout()
plt.show()

Let's feed an image from the test set into the model and predict its label:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Pick an image from the test set by index
sample_index = 1
sample_image = X_test[sample_index]          # Extract a 28x28 image from the test set
true_label = Y_test[sample_index]            # Extract the true label for reference

# Display the selected image using matplotlib
plt.imshow(sample_image, cmap='gray')        # Show the image in grayscale
plt.title(f"True Label: {true_label}")       # Display the true label in the plot title
plt.axis('off')                              # Remove axis ticks for cleaner display
plt.show()

# Prepare the image for prediction:
# The model expects input in batch format: shape (batch_size, 28, 28)
# So we add a new first dimension (batch of size 1)
input_image = np.expand_dims(sample_image, axis=0)  # New shape: (1, 28, 28)

# Use the trained model to make predictions on the input image:
# The output is a vector of 10 probabilities (one for each digit class)
predictions = model_keras.predict(input_image)

# Get the index of the class with the highest predicted probability
predicted_label = np.argmax(predictions[0])

# Print out the predicted label
print(f"Predicted Label: {predicted_label}")


So far we have defined and trained the same neural network twice using the Keras API within TensorFlow. The two versions are functionally equivalent; any small differences in the reported loss and accuracy come from random weight initialisation and data shuffling, not from the API. The only difference lies in how the model is referenced. In the first example, the model is created using `tf.keras.Sequential`, which calls the Keras API through the main TensorFlow namespace.

In the second example, the model is created using `keras.Sequential`, having imported Keras directly from `tensorflow`. Although the import styles differ slightly, both refer to the same `tf.keras` module, the version of Keras that is fully integrated into TensorFlow. Since TensorFlow 2.0, this is the recommended way to build models, as it ensures compatibility with other TensorFlow tools and simplifies the workflow.

In practice, there is no behavioural or performance difference between the two approaches. Using `tf.keras.Sequential` is generally preferred for consistency and clarity, especially when working on larger projects that use the full TensorFlow ecosystem.

### Introduction to PyTorch

*PyTorch* is an open-source deep learning framework originally developed by Facebook’s AI Research lab (now part of Meta). It is widely used for building and training neural networks, particularly in research and academic settings. PyTorch is known for its simplicity, flexibility, and Pythonic design, making it easy to learn and intuitive to use. PyTorch builds its computation graph dynamically as the code runs (unlike the static graphs of TensorFlow 1.x), meaning models can be built and modified on the fly, which is especially helpful for tasks that require flexibility or experimentation.

It provides a range of tools for building deep learning models, managing data, and optimising performance, all while integrating smoothly with Python libraries like NumPy and matplotlib. PyTorch also includes built-in support for GPUs, enabling faster training on large datasets.

In PyTorch, every custom model inherits from `nn.Module`, and its constructor should call `super().__init__()`. This ensures that all the internal mechanisms provided by `nn.Module` (like parameter tracking, saving/loading models, etc.) are correctly set up; without it, PyTorch raises an error as soon as you try to assign a layer to the model. Before defining the model, we reload the data and convert it into PyTorch tensors and DataLoaders:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

from tensorflow.keras.datasets import mnist

# Load MNIST handwritten digits data from TensorFlow
# X_train, Y_train → 60,000 training images and labels
# X_test, Y_test → 10,000 test images and labels
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# Normalise pixel values from [0, 255] to [0, 1] for better neural network performance
X_train, X_test = X_train / 255.0, X_test / 255.0

# Split off part of the test set (first 5000 images) to use as a validation set
X_val, Y_val = X_test[:5000], Y_test[:5000]             # Validation set (5000 examples)
X_test_final, Y_test_final = X_test[5000:], Y_test[5000:]  # Remaining test set (5000 examples)

# Convert to PyTorch tensors
# Convert training data to PyTorch tensors and flatten each 28x28 image into a 784-length vector
x_train_torch = torch.tensor(X_train, dtype=torch.float32).view(-1, 28 * 28)
y_train_torch = torch.tensor(Y_train, dtype=torch.long)

# Convert validation data
x_val_torch = torch.tensor(X_val, dtype=torch.float32).view(-1, 28 * 28)
y_val_torch = torch.tensor(Y_val, dtype=torch.long)

# Create datasets and DataLoaders
# Combine input and label tensors into TensorDataset objects
train_dataset = TensorDataset(x_train_torch, y_train_torch)
val_dataset = TensorDataset(x_val_torch, y_val_torch)

# Create DataLoaders to load data in mini-batches:
# - Training loader shuffles the data each epoch for better learning
# - Validation loader does not shuffle (order doesn’t matter, but consistency helps)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=64, shuffle=False)
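A `DataLoader` slices the dataset into mini-batches, and with the default `drop_last=False` the final batch simply holds whatever samples are left over. The NumPy sketch below mimics that batching logic in simplified form (it uses NumPy's random generator rather than PyTorch's, so the shuffled order will differ) to show the arithmetic: 60,000 training images in batches of 64 give 938 batches, the last holding 32 images, so `len(train_dataloader)` should be 938. Because the training loop below averages per-batch losses, that smaller final batch counts as much as a full one, though the effect on the reported average is tiny.

```python
import math
import numpy as np

def minibatch_indices(n_samples, batch_size, shuffle, seed=0):
    """Yield arrays of sample indices, one per mini-batch (simplified, drop_last=False)."""
    order = np.arange(n_samples)
    if shuffle:
        # A DataLoader reshuffles every epoch; this sketch uses one fixed seed
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

train_batches = list(minibatch_indices(60_000, 64, shuffle=True))
val_batches = list(minibatch_indices(5_000, 64, shuffle=False))

print(len(train_batches), len(train_batches[-1]))     # 938 32
print(len(val_batches), len(val_batches[-1]))         # 79 8
print(math.ceil(60_000 / 64), math.ceil(5_000 / 64))  # 938 79
```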


We define a simple feedforward neural network using PyTorch. The goal of this model, as before, is to classify handwritten digits (0–9) from the MNIST dataset. 

We start by creating a custom class called `SimpleNN`, which inherits from `nn.Module`. This is the standard way to define neural networks in PyTorch, as it allows us to build and manage model layers and behaviour.

Inside the `__init__` method (the constructor), we define the layers of our network:

- The first layer is a *fully connected layer* (also called a dense layer) that takes in 784 inputs, corresponding to the 28×28 pixels of the flattened image. It outputs 128 values, which represent learned features.
- Next, we apply a *ReLU activation function*, which introduces non-linearity into the model. This allows the network to learn more complex patterns.
- Finally, we have an *output layer* with 10 neurons, one for each digit class (0 to 9). Its outputs are raw, unnormalised scores called *logits*; the class with the highest score is the model’s prediction.

The `forward()` method defines how data moves through the network on each forward pass. Input `x` is passed through the first dense layer, transformed by the activation function, and then passed to the output layer. The result is a vector of 10 values, one for each class. During training, these values will be used to compute a loss and adjust the model’s weights accordingly:

In [None]:
# Define a simple feedforward neural network (inherits from nn.Module)
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()  # Call the nn.Module constructor so layers and parameters are tracked

        # Define the first fully connected (dense) layer:
        # Input size is 28x28 (flattened image) → output size 128 neurons
        self.fc1 = nn.Linear(28 * 28, 128)

        # Define a ReLU activation function (adds non-linearity)
        self.relu = nn.ReLU()

        # Define the second fully connected (dense) layer:
        # Input size 128 neurons → output size 10 neurons (for 10 digit classes)
        self.fc2 = nn.Linear(128, 10)

    # Define the forward pass (how data flows through the network)
    def forward(self, x):
        x = self.fc1(x)   # Pass input through first dense layer
        x = self.relu(x)  # Apply ReLU activation
        x = self.fc2(x)   # Pass through second dense layer (outputs raw class scores)
        return x          # Return the final output (logits, before softmax)
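Note that, unlike the Keras models, this network has no softmax layer. `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, whereas Keras's `sparse_categorical_crossentropy` (with its default `from_logits=False`) expects probabilities, which is why the Keras models end in a softmax. The NumPy sketch below, using made-up logits, checks that the two routes give the same loss and the same predicted classes.

```python
import numpy as np

def log_softmax(logits):
    # Subtract each row's max before exponentiating, for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Made-up logits for a batch of 2 images over 10 classes, and their true labels
logits = np.array([[2.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
                   [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 2])
rows = np.arange(len(labels))

# PyTorch route: log-softmax inside the loss, applied to raw logits (mean over the batch)
loss_from_logits = float(-log_softmax(logits)[rows, labels].mean())

# Keras route: softmax inside the model, then -log(probability of the true class)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss_from_probs = float(-np.log(probs[rows, labels]).mean())

print(loss_from_logits, loss_from_probs)

# Softmax preserves the ordering of scores, so the predicted class is the same either way
print(logits.argmax(axis=1), probs.argmax(axis=1))
```

This is also why taking `torch.max(outputs, 1)` on the logits in the training loop gives the same predicted class as taking the argmax of probabilities.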


Once we have defined the class, we can instantiate it, set up the loss function and optimiser, and write the training loop:

In [None]:
# Instantiate the model, loss function, and optimiser
model_pytorch = SimpleNN()                         # Create an instance of the neural network
criterion = nn.CrossEntropyLoss()                 # Use cross-entropy loss for multi-class classification
optimiser = optim.Adam(model_pytorch.parameters(), lr=0.001)  # Adam optimiser with learning rate 0.001

# Set training parameters
epochs = 10                                       # Number of times to loop over the full training data
epoch_losses = []                                 # To store average training loss per epoch
epoch_accuracies = []                             # To store average training accuracy per epoch
val_epoch_losses = []                             # To store average validation loss per epoch
val_epoch_accuracies = []                         # To store average validation accuracy per epoch

# Main training loop
for epoch in range(epochs):
    model_pytorch.train()                         # Set the model to training mode
    running_loss = 0.0                            # Accumulate total training loss this epoch
    correct = 0                                   # Count correct predictions on training data
    total = 0                                     # Count total training samples

    # Loop over batches from the training DataLoader
    for data, labels in train_dataloader:
        optimiser.zero_grad()                     # Reset gradients from the previous batch
        outputs = model_pytorch(data)             # Forward pass: compute predictions
        loss = criterion(outputs, labels)         # Compute the loss against true labels
        loss.backward()                           # Backward pass: compute gradients
        optimiser.step()                          # Update model weights

        running_loss += loss.item()               # Add batch loss to total
        _, predicted = torch.max(outputs, 1)      # Get predicted class (index of max logit)
        correct += (predicted == labels).sum().item()  # Count correct predictions
        total += labels.size(0)                   # Count total samples seen

    avg_loss = running_loss / len(train_dataloader)    # Average loss over epoch
    accuracy = correct / total                          # Average accuracy over epoch
    epoch_losses.append(avg_loss)                       # Save for plotting later
    epoch_accuracies.append(accuracy)                  # Save for plotting later

    # Validation loop
    model_pytorch.eval()                           # Set model to evaluation mode (no dropout, etc.)
    val_loss = 0.0                                # Accumulate total validation loss this epoch
    val_correct = 0                               # Count correct predictions on validation data
    val_total = 0                                 # Count total validation samples

    with torch.no_grad():                         # Disable gradient tracking for validation
        for val_data, val_labels in val_dataloader:
            val_outputs = model_pytorch(val_data)        # Forward pass on validation data
            v_loss = criterion(val_outputs, val_labels)  # Compute validation loss
            val_loss += v_loss.item()                   # Add to total validation loss

            _, val_predicted = torch.max(val_outputs, 1)  # Get predicted class
            val_correct += (val_predicted == val_labels).sum().item()  # Count correct
            val_total += val_labels.size(0)                        # Count total samples

    avg_val_loss = val_loss / len(val_dataloader)        # Average validation loss
    val_accuracy = val_correct / val_total               # Average validation accuracy
    val_epoch_losses.append(avg_val_loss)               # Save for plotting
    val_epoch_accuracies.append(val_accuracy)          # Save for plotting

    # Print summary of the epoch’s performance
    print(f"Epoch {epoch + 1}/{epochs}, "
          f"Train Loss: {avg_loss:.4f}, Train Acc: {accuracy * 100:.2f}%, "
          f"Val Loss: {avg_val_loss:.4f}, Val Acc: {val_accuracy * 100:.2f}%")


As with our other models, we can now plot the training and validation loss and accuracy to evaluate the model:

```python
import matplotlib.pyplot as plt

# Number epochs from 1 so the x-axis matches the printed training log
epoch_numbers = range(1, len(epoch_losses) + 1)

# Plotting loss and accuracy over epochs
plt.figure(figsize=(10, 4))

# Plot loss (left-hand side)
plt.subplot(1, 2, 1)  # Create the first subplot (1 row, 2 columns, position 1)

# Plot training loss over epochs
plt.plot(epoch_numbers, epoch_losses, marker='o', label='Train Loss')

# Plot validation loss over epochs
plt.plot(epoch_numbers, val_epoch_losses, marker='o', label='Val Loss')

# Add title and axis labels
plt.title('Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')

# Add gridlines for easier reading
plt.grid(True)

# Show the legend to label the lines
plt.legend()

# Plot accuracy (right-hand side)
plt.subplot(1, 2, 2)  # Create the second subplot (1 row, 2 columns, position 2)

# Plot training accuracy over epochs
plt.plot(epoch_numbers, epoch_accuracies, marker='o', label='Train Accuracy')

# Plot validation accuracy over epochs
plt.plot(epoch_numbers, val_epoch_accuracies, marker='o', label='Val Accuracy')

# Add title and axis labels
plt.title('Accuracy over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

# Add gridlines
plt.grid(True)

# Show the legend
plt.legend()

# Adjust layout to prevent overlap between plots and labels
plt.tight_layout()

plt.show()
```
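The curves are easier to act on if you also read them numerically. As a minimal sketch (the helper names `best_epoch` and `accuracy_gap` are hypothetical, not part of PyTorch or Matplotlib, and the example numbers are made up rather than results from this model), the following finds the epoch with the lowest validation loss and the per-epoch gap between training and validation accuracy; a gap that keeps widening is a common sign of overfitting:

```python
def best_epoch(val_losses):
    """Return the 1-based epoch number with the lowest validation loss."""
    # min() over indices keyed by loss; ties resolve to the earliest epoch
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_idx + 1

def accuracy_gap(train_accs, val_accs):
    """Per-epoch difference between training and validation accuracy."""
    return [t - v for t, v in zip(train_accs, val_accs)]

# Hypothetical example values, NOT results from the model trained above
example_val_losses = [0.52, 0.38, 0.31, 0.33, 0.36]
example_train_accs = [0.85, 0.90, 0.93, 0.95, 0.97]
example_val_accs = [0.86, 0.89, 0.91, 0.91, 0.90]

print("Best epoch:", best_epoch(example_val_losses))  # Best epoch: 3
print("Accuracy gaps:",
      [round(g, 2) for g in accuracy_gap(example_train_accs, example_val_accs)])
```

In the notebook you would pass the real lists instead, for example `best_epoch(val_epoch_losses)` and `accuracy_gap(epoch_accuracies, val_epoch_accuracies)`.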

Let's see what the model has achieved by using it to make a prediction in PyTorch:

```python
import matplotlib.pyplot as plt
import torch

# Get a single sample from the test set: the 4th image and its label (index 3)
sample_img = X_test_final[3]         # Shape: (28, 28), NumPy array
sample_label = Y_test_final[3]       # Integer label

# Convert the image to a PyTorch tensor and flatten it
sample_img_torch = torch.tensor(sample_img, dtype=torch.float32).view(-1)  # Shape: (784,)

# Reshape back to 28x28 for visualisation
image_2d = sample_img_torch.view(28, 28)

# Display the image
plt.imshow(image_2d.numpy(), cmap='gray')
plt.title(f"True Label: {sample_label}")
plt.axis('off')
plt.show()

# Prepare the image for prediction:
# add a batch dimension, giving shape (1, 784), and move it to the
# same device (CPU or GPU) as the model's parameters
device = next(model_pytorch.parameters()).device
sample_input = sample_img_torch.unsqueeze(0).to(device)

# Make a prediction with the trained model
model_pytorch.eval()  # Switch to evaluation mode (disables dropout, etc.)

with torch.no_grad():  # No gradients are needed for inference
    output = model_pytorch(sample_input)
    predicted_label = torch.argmax(output, dim=1).item()

# Display the model's predicted label
print(f"Predicted Label: {predicted_label}")
```
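`argmax` gives the predicted class but not how confident the model is. A minimal sketch of batch prediction with confidence scores follows; it assumes the model's final layer outputs raw scores (logits) with no softmax already applied, and it uses a hypothetical stand-in network and random inputs in place of `model_pytorch` and `X_test_final`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for model_pytorch: flattened 28x28 images (784 values)
# mapped to 10 class scores (logits)
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Hypothetical batch of 5 flattened images (random values, not real digits)
images = torch.rand(5, 784)

model.eval()                                  # evaluation mode for inference
with torch.no_grad():                         # no gradients needed
    logits = model(images)                    # shape: (5, 10)
    probs = torch.softmax(logits, dim=1)      # each row now sums to 1
    confidence, predicted = probs.max(dim=1)  # top probability and its class index

for i in range(len(predicted)):
    print(f"Sample {i}: predicted {predicted[i].item()} "
          f"with probability {confidence[i].item():.2f}")
```

Because softmax preserves the ordering of the scores, the predicted classes are the same as those from `torch.argmax(logits, dim=1)`; the probabilities simply add a confidence value to each prediction.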


Building the model and its training loop manually in PyTorch, as we have done here, is a good starting point because it gives you more transparency and control over what’s happening under the hood. Compared to TensorFlow and Keras, PyTorch tends to feel more flexible and intuitive for those who want to understand how neural networks work step by step.

In PyTorch, you write your own training loop and manage each optimisation step explicitly: `optimiser.zero_grad()` clears the previous gradients, `loss.backward()` computes new ones, and `optimiser.step()` updates the weights. You can also easily inspect intermediate outputs or tweak the computation at any point. This level of granular control is especially helpful when you're learning or debugging, or when you're working on custom research models that don’t fit into standard layers or workflows.
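To make these steps concrete, here is a minimal sketch of a manual training loop on random toy data; the data, layer sizes and learning rate are assumptions chosen for illustration, not the notebook's setup. It also uses `register_forward_hook`, a standard `nn.Module` method, to capture the output of an intermediate layer during the forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: 64 random "images" (784 features) with labels 0-9
inputs = torch.rand(64, 784)
labels = torch.randint(0, 10, (64,))

model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.05)

# Store the hidden-layer activations each time the forward pass runs
captured = {}
def save_hidden(module, module_input, module_output):
    captured["hidden"] = module_output.detach()

hook = model[1].register_forward_hook(save_hidden)  # hook on the ReLU layer

losses = []
for step in range(20):
    optimiser.zero_grad()            # clear gradients from the previous step
    outputs = model(inputs)          # forward pass (also triggers the hook)
    loss = loss_fn(outputs, labels)
    loss.backward()                  # compute gradients for every parameter
    optimiser.step()                 # update the weights using those gradients
    losses.append(loss.item())

hook.remove()                        # stop capturing activations
print(f"First loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print("Hidden activations shape:", tuple(captured["hidden"].shape))  # (64, 32)
print("First-layer gradient norm:", model[0].weight.grad.norm().item())
```

Because every step is ordinary Python, you can print, plot or modify any of these values between steps, which is the kind of control higher-level `fit()`-style APIs hide from you.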

Keras is great for getting things working quickly thanks to its high-level abstractions, but PyTorch gives you a deeper understanding of how learning actually happens, which is one reason many learners and researchers choose to start with it before moving on to higher-level tools.

### Summary of approaches
*TensorFlow* is a powerful, all-in-one machine learning framework developed by Google. It allows developers to build, train, and deploy machine learning models not just in research environments but also in production settings such as mobile apps, web services, and cloud platforms. One of its key strengths is its ability to scale from small experiments on a laptop to full-scale deployment across servers or devices. It’s widely used in industry because of its strong performance and its deployment tools, such as TensorFlow Lite (for mobile and embedded devices) and TensorFlow Serving (for serving models in production).

*Keras*, which is now built into TensorFlow, is a high-level API that makes creating deep learning models more intuitive and beginner-friendly. Instead of having to write long and complex code, Keras allows users to define models in just a few lines, using simple building blocks like layers, activations, and loss functions. This makes it a great starting point for anyone new to deep learning or for quickly prototyping ideas.

*PyTorch*, developed by Facebook's AI Research lab (now part of Meta), is often the go-to choice in academic and research settings. Its main appeal lies in how easy it is to use and modify. Unlike early versions of TensorFlow (1.x), which required you to build a static computation graph before running it, PyTorch executes operations immediately as ordinary Python code, which makes it easier to understand and debug. Researchers appreciate this flexibility when trying out new ideas or rapidly iterating on model designs.
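As a minimal sketch of what "more like regular Python" means in practice (the class `RepeatBlockNet` and its `repeats` argument are hypothetical, made up for illustration), the forward pass below uses a normal `for` loop and `if` statement, so the number of layers applied can change from one call to the next, and you can add a `print()` or breakpoint anywhere inside it:

```python
import torch
import torch.nn as nn

class RepeatBlockNet(nn.Module):
    """Hypothetical model whose depth is chosen at call time with a Python loop."""

    def __init__(self, in_features=784, hidden=64, num_classes=10):
        super().__init__()
        self.inp = nn.Linear(in_features, hidden)
        self.block = nn.Linear(hidden, hidden)   # one layer, reused several times
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, x, repeats=1):
        h = torch.relu(self.inp(x))
        for _ in range(repeats):                 # ordinary Python loop: depth varies per call
            h = torch.relu(self.block(h))
            if torch.isnan(h).any():             # ordinary Python check, easy to debug
                raise ValueError("NaN in hidden activations")
        return self.out(h)

torch.manual_seed(0)
net = RepeatBlockNet()
x = torch.rand(3, 784)                           # hypothetical batch of 3 flattened images
for r in (1, 3):
    logits = net(x, repeats=r)
    print(f"repeats={r}: output shape {tuple(logits.shape)}")  # (3, 10) each time
```

In a static-graph setting, this kind of data- or argument-dependent control flow needs special graph operations; in PyTorch it is just Python.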

In short, TensorFlow is ideal for scaling and deploying models; Keras makes TensorFlow easier to use; and PyTorch is popular in research because of its flexibility and ease of experimentation.

### What have we learnt?
We've explored how different deep learning frameworks approach the task of building and training models. Both *TensorFlow* and *Keras* provide a structured and user-friendly way to define neural networks. These tools simplify the process of creating complex models and are designed to scale easily from small experiments to full-scale applications. Because of this, they are particularly well suited for real-world use cases, where models need to be deployed in production, whether on websites, apps, or embedded devices.

On the other hand, *PyTorch* offers a more flexible and intuitive coding experience, which makes it very popular in academic research and experimentation. It allows researchers to test new ideas and make changes to model architecture on the fly, using code that closely resembles standard Python. This flexibility makes it a valuable tool for developing new techniques or quickly prototyping novel approaches.

Each framework has its own strengths, and ultimately the choice of framework depends on what you're trying to achieve.

For our purposes, we will primarily use *TensorFlow* with *Keras*, as this combination offers an excellent balance between ease of use and the ability to deploy models efficiently.