## Hyperparameter tuning in Neural Networks

### Introduction
Think of training a neural network like preparing for a marathon. The data you feed into the model is your training ground: the roads you run and the distances you cover. But how you train (how far you run each day, your pace, how often you rest) determines whether you cross the finish line in record time or burn out halfway through. These “how you train” choices are what we call hyperparameters in machine learning.

Hyperparameters are not learnt by the model; they’re set by you before training begins. And they matter enormously. If we set them well, the model trains efficiently and performs strongly. If we get them wrong, it may struggle, overfit, or simply never improve. That’s why tuning these values, by experimenting to find the best setup, is such an important part of building neural networks, as we'll see in this example.

**Note**: This notebook may take a while to complete, as we will be training a model several times with different parameters.

### Hyperparameters
There are many hyperparameters to consider, and each plays a different role in how the network learns:

- *Learning rate*: This controls how quickly the model updates its knowledge. A learning rate that’s too high might cause the model to overshoot the best solution, while a rate that’s too low can make learning painfully slow.

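- *Batch size*: This is the number of data samples the model looks at before making an update. Smaller batches give noisier updates, which can sometimes help the model generalise, but they mean more updates per epoch and often slower training. Larger batches give smoother updates but need more memory.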

- *Number of layers and units per layer*: Just like an athlete might add more types of training sessions to improve performance, adding more layers or more units (neurons) per layer can help a network learn more complex patterns. But it also makes the network heavier and harder to train.

- *Dropout rate*: This is a regularisation technique, like scheduling in rest days to prevent overtraining. By randomly “dropping” certain neurons during training, the model becomes more robust and less likely to memorise the training data.
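
The effect of the learning rate is easiest to see on a toy problem. The sketch below is not a neural network: it runs plain gradient descent on the function f(x) = x², and the three learning rates are illustrative values chosen for this example rather than settings used later in the notebook.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x                  # derivative of x**2
        x = x - learning_rate * gradient  # one update step
    return x

# Illustrative learning rates (assumptions for this sketch)
final_x = {lr: gradient_descent(lr) for lr in (0.01, 0.3, 1.1)}

for lr, x in final_x.items():
    print(f"learning rate {lr}: x after 20 steps = {x:.6g}")
```

The best value of x is 0. With the smallest rate, x has barely moved towards it after 20 steps; with the middle rate it is almost exactly there; with the largest rate every step overshoots and x grows instead of shrinking.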

Tuning hyperparameters isn’t just trial and error; there are structured ways to do it. One method is a *grid search*, where you try out every combination of selected values, like testing every possible training schedule. It’s thorough but can take a lot of time.

A more efficient approach is *random search*, where you try a random sample of combinations. Surprisingly, this often finds good results faster. Then there’s *Bayesian optimisation*, which uses past tuning results to make smarter guesses about what combinations to try next. It’s like an athlete reviewing past training logs to fine-tune future sessions.
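
To make the difference between grid search and random search concrete, here is a minimal sketch using only the Python standard library. The search space and the budget of 10 trials are hypothetical values chosen for illustration. For simplicity, the random trials are drawn from the same grid; a real random search would usually sample each hyperparameter independently, possibly from a continuous range.

```python
import itertools
import random

# A small, hypothetical search space (values chosen only for illustration)
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.4, 0.5],
}

names = list(search_space)

# Grid search: every combination of every value
grid = [dict(zip(names, values))
        for values in itertools.product(*search_space.values())]

# Random search: a fixed budget of combinations, drawn without repetition
rng = random.Random(0)
random_trials = rng.sample(grid, k=10)

print(f"grid search trials:   {len(grid)}")
print(f"random search trials: {len(random_trials)}")
```

Even this tiny space needs 3 × 3 × 4 = 36 training runs for a full grid search, and the count multiplies with every hyperparameter you add. Random search lets you fix the budget up front instead.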


#### Installing Python Libraries
Keras Tuner is a library designed to help automate the process of choosing the best hyperparameters for a neural network. Instead of manually tweaking settings like the learning rate, number of layers, or dropout rate, Keras Tuner allows you to define a range of values for each parameter and then runs experiments to discover which combination works best. It’s built to work seamlessly with Keras models and supports several powerful search strategies that balance thoroughness with efficiency.

One of the most basic methods Keras Tuner supports is *grid search*, which tests every possible combination of the parameters you specify. While this is guaranteed to explore all options, it can be very slow and computationally expensive, especially when the number of combinations becomes large. To address this, *random search* offers a more efficient alternative: it samples random combinations from your parameter space. Though it might sound less thorough, random search has been shown to perform surprisingly well in practice, often finding good solutions more quickly.

For more intelligent searching, Keras Tuner also supports *Bayesian optimisation*, a technique that learns from previous results to choose the next set of hyperparameters more strategically. This approach tries to predict which combinations are likely to do well, rather than picking at random.

Finally, *Hyperband* is an advanced method that blends random search with early stopping: it tests many configurations quickly but stops training the worst-performing ones early on, saving time and resources.


In [None]:
!pip install matplotlib numpy tensorflow tensorflow-datasets keras-tuner

### Rock, Paper, Scissors...
To make all of this more concrete, we’ll walk through an example using the *Keras Tuner*, a Python library that helps automate the process of finding the best hyperparameters.

We’ll use something a little more exciting than some of our previous datasets: the TensorFlow “Rock, Paper, Scissors” dataset. It contains roughly 2,900 computer-generated images of hands showing rock, paper, or scissors gestures, modelled on hands of different ages, genders, and skin tones.

It’s visually rich, culturally recognisable, and instantly relatable, and it turns our hyperparameter tuning into a real-world challenge: can we train a model to see a hand gesture and correctly classify it as rock, paper, or scissors?

We’ll build a neural network to do just that. Along the way, we’ll explore how different tuning choices, such as the learning rate, the number of layers, and the dropout rate, affect the model’s accuracy and training time.

By the end of the example, you’ll see how tuning hyperparameters can help our models perform better, faster, and more reliably. Just as adjusting a runner’s training plan can lead to a personal best, careful tuning can lift a model’s performance, provided you have already experimented and found a reasonable model to tune.

### Load the dataset
We first prepare the *Rock, Paper, Scissors* dataset for training a neural network. The dataset is loaded from *TensorFlow Datasets (TFDS)*, a library that provides datasets in a standard format. We’re using it to download and prepare images of hand gestures representing "rock", "paper", or "scissors".

Once the dataset is loaded, the code defines a function to format the images: each image is resized to a fixed size (150x150 pixels) and normalised so that pixel values fall between 0 and 1.

The training data is shuffled (so that the model sees it in a different order each epoch, which improves learning), and both the training and validation datasets are grouped into batches of 32 images. The validation data here comes from the dataset's `test` split. The `map()` function applies the formatting function to every image-label pair in the dataset. This setup ensures the data is clean, consistent, and ready for use in training a convolutional neural network.
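
As a quick check of what batching means in practice, the arithmetic below works out how many batches the model sees per epoch. It is a sketch in plain Python rather than `tf.data`; the helper name `batch_sizes` is made up for this example, and the dataset sizes (300 and 100) are the subset sizes kept by the code cell that follows. It assumes the final, smaller batch is kept, which is the default behaviour of `batch()` in `tf.data`.

```python
import math

def batch_sizes(num_examples, batch_size=32):
    """Return the size of each batch when a dataset is split into batches."""
    num_batches = math.ceil(num_examples / batch_size)
    return [min(batch_size, num_examples - i * batch_size)
            for i in range(num_batches)]

train_batches = batch_sizes(300)   # 300 training images are kept in this notebook
val_batches = batch_sizes(100)     # 100 validation images are kept

print(f"training:   {len(train_batches)} batches, last one has {train_batches[-1]} images")
print(f"validation: {len(val_batches)} batches, last one has {val_batches[-1]} images")
```

So each training epoch consists of 10 weight updates: nine on full batches of 32 images and one on the remaining 12.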


In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_datasets as tfds

import keras_tuner as kt

import matplotlib.pyplot as plt
import numpy as np
import os

# Load the Rock, Paper, Scissors dataset
(train_ds, val_ds), ds_info = tfds.load(
    'rock_paper_scissors',
    split=['train', 'test'],
    as_supervised=True,
    with_info=True
)

# Define a function to format the images for training
def format_image(image, label):
    image = tf.image.resize(image, (150, 150))
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# Take only a subset for demonstration
train_ds = train_ds.take(300)   # Take first 300 examples from training
val_ds = val_ds.take(100)       # Take first 100 examples from validation

# Prepare the sampled datasets:
# apply formatting, shuffle the individual training examples,
# then group them into batches of 32
train_ds = train_ds.map(format_image).shuffle(300).batch(32)  # shuffle buffer matches sample size

val_ds = val_ds.map(format_image).batch(32)                   # no shuffle for validation


### Visualise the data

In [None]:
import matplotlib.pyplot as plt

# Get the class names from the dataset metadata
label_names = ds_info.features['label'].names

# Get a batch of images and labels
for images, labels in train_ds.take(1):  # Take one batch
    
    plt.figure(figsize=(10, 4))
    
    for i in range(10):  # Plot the first 10 images (2 rows of 5)
        ax = plt.subplot(2, 5, i + 1)

        plt.imshow(images[i].numpy())
        plt.title(label_names[labels[i].numpy()])

        plt.axis("off")

    plt.show()


### The model

We define a function where Keras Tuner can try different values for the model’s hyperparameters. This function, `build_model(hp)` (see below), defines a tunable neural network model using Keras and returns it. It's written specifically for use with *Keras Tuner*, which will call this function multiple times with different hyperparameter combinations to find the best-performing model.

The function below creates a *Convolutional Neural Network (CNN)* designed for image classification. It begins with an input layer for 150×150 pixel RGB images. Then, it adds a flexible number of convolutional blocks, where each block includes a convolutional layer followed by a max-pooling layer. The number of blocks and the number of filters (channels) in each convolutional layer are *hyperparameters* that the tuner will experiment with.

After flattening the output of the convolutional layers, the model adds a fully connected (dense) layer. The size of this layer, as well as the dropout rate (used to prevent overfitting), are also tunable. Finally, the model ends with an output layer of 3 neurons, one for each class (rock, paper, or scissors), using softmax activation for multi-class classification. The model is compiled using the Adam optimiser, where the learning rate is yet another hyperparameter that Keras Tuner will tune.


In [None]:
def build_model(hp):
    model = keras.Sequential()

    # Input layer expects 150x150 RGB images (3 channels)
    model.add(layers.Input(shape=(150, 150, 3)))

    # Add a tunable number of convolutional blocks (1 to 3)
    for i in range(hp.Int('conv_blocks', 1, 3)):
        # Each block has a tunable number of filters (32 to 128, in steps of 32)
        model.add(layers.Conv2D(
            filters=hp.Int(f'filters_{i}', 32, 128, step=32),
            kernel_size=3,        # Use a 3x3 filter
            activation='relu'     # ReLU activation for non-linearity
        ))
        model.add(layers.MaxPooling2D())  # Reduce spatial dimensions

    # Flatten the 2D feature maps into a 1D vector
    model.add(layers.Flatten())

    # Add a dense (fully connected) layer with tunable number of units
    model.add(layers.Dense(
        units=hp.Int('dense_units', 32, 128, step=32),
        activation='relu'
    ))

    # Add dropout with a tunable rate (between 0.2 and 0.5)
    model.add(layers.Dropout(rate=hp.Float('dropout', 0.2, 0.5, step=0.1)))

    # Output layer: 3 units (rock, paper, scissors) with softmax for multi-class output
    model.add(layers.Dense(3, activation='softmax'))

    # Compile the model with a tunable learning rate (sampled logarithmically)
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
        ),
        loss='sparse_categorical_crossentropy',  # Suitable for integer labels
        metrics=['accuracy']  # Track accuracy during training
    )

    return model
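
It can help to count how many distinct architectures `build_model` can produce. The arithmetic below is a rough count read off the ranges in the function above. It ignores the learning rate, which is drawn from a continuous range, so the real search space is larger still.

```python
# Values each hyperparameter can take, read off the ranges in build_model
filter_choices = len(range(32, 128 + 1, 32))   # 32, 64, 96, 128
dense_choices = len(range(32, 128 + 1, 32))    # 32, 64, 96, 128
dropout_choices = 4                            # 0.2, 0.3, 0.4, 0.5
block_choices = (1, 2, 3)                      # hp.Int bounds are inclusive

# Each convolutional block picks its own number of filters
conv_configs = sum(filter_choices ** blocks for blocks in block_choices)

total = conv_configs * dense_choices * dropout_choices
print(f"convolutional configurations: {conv_configs}")
print(f"distinct architectures:       {total}")
```

That is 84 ways to arrange the convolutional blocks and 1,344 architectures in total, before the learning rate is even considered. Training every one of them would be impractical, which is why a search strategy that samples and prunes is useful here.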


### The tuner
We’ll use *Hyperband*, a fast search method that tests lots of configurations, but stops the worst ones early. We set up a Keras Tuner object using the Hyperband strategy. Hyperband works by running many different configurations for a few epochs, then focusing training only on the most promising ones. It combines random search with early stopping, making it both broad and efficient.

We pass the `build_model` function into `kt.Hyperband` to tell Keras Tuner to try out different variations of the model we defined earlier, each with its own set of hyperparameters. We also tell the tuner what to optimise for (in this case, validation accuracy). We set the longest training run per model trial with `max_epochs`, and use `factor` to control how aggressively underperforming models are stopped early.
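
To see why stopping weak models early saves so much work, here is a simplified sketch of *successive halving*, the idea that Hyperband builds on. It is not Keras Tuner's exact schedule: the function name, the starting number of models (9), and the starting budget (1 epoch) are assumptions made for this illustration. Only `factor=3` and `max_epochs=10` mirror the settings used in this notebook.

```python
def successive_halving_schedule(n_configs, min_epochs, max_epochs, factor):
    """Return (number of models, epoch budget per model) for every round."""
    schedule = []
    epochs = min_epochs
    while n_configs >= 1 and epochs <= max_epochs:
        schedule.append((n_configs, epochs))
        n_configs //= factor      # keep only the best 1/factor of the models
        epochs *= factor          # give the survivors a larger budget
    return schedule

# Illustrative numbers: 9 starting models, factor=3, max_epochs=10
schedule = successive_halving_schedule(n_configs=9, min_epochs=1,
                                       max_epochs=10, factor=3)

epochs_used = sum(n * e for n, e in schedule)
epochs_if_all_trained_fully = 9 * schedule[-1][1]

print(schedule)
print(f"{epochs_used} epochs instead of {epochs_if_all_trained_fully}")
```

In this sketch, nine models get 1 epoch each, the best three get 3 epochs, and the single best gets 9 epochs: 27 epochs of training in total, compared with 81 if all nine models had been given the full budget.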

We set the directory and a project name to tell Keras Tuner where to store its search results, which is helpful if you want to pause and resume tuning later or compare across different experiments:

In [None]:
tuner = kt.Hyperband(
    build_model,           # The model-building function with tunable hyperparameters
    objective='val_accuracy',  # Optimise for best validation accuracy
    max_epochs=10,         # Maximum number of epochs to train each model
    factor=3,              # Each round, train fewer models for longer (early stopping logic)
    directory='rps_tuning',     # Folder to save tuning logs and models
    project_name='rock-paper-scissors'  # Subfolder name for this specific project
)


### Start tuning the model
We start the hyperparameter search using the Keras Tuner object we previously set up. We also introduce a tool called EarlyStopping, which helps prevent overtraining: it monitors the validation loss during model training and stops the process early if things stop improving for a few consecutive epochs (in this case, after 3).

Below, when we call `tuner.search(...)`, we instruct Keras Tuner to begin testing different combinations of hyperparameters from our `build_model` function. It will train each version of the model using the `train_ds` dataset, evaluate it on `val_ds`, and keep track of which hyperparameter combinations perform best.

This is where the actual learning and evaluation happens, it's the part where the tuner is actively hunting for the most effective architecture and training settings for your model.

> **Warning**: It can take a while to run this next cell! It will run for approximately 30 trials to find the best parameters for our sample of data.

Below we set up an `EarlyStopping` callback. A callback is a tool that lets you intervene during training; in this case, to stop it early if things aren’t improving. This specific early stopping rule says:

- Monitor `val_loss`: Watch the validation loss (how well the model is doing on unseen data).

- `patience=3`: If the validation loss doesn't get better for three consecutive epochs, stop training that model trial early.

This prevents wasting time on models that are clearly not improving. Instead of running all 10 epochs blindly, you allow weaker models to "quit" earlier, saving computation and speeding up the whole tuning process.
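
The patience rule can be sketched in a few lines of plain Python. This is a simplified illustration of the logic, not the Keras implementation (it ignores options such as a minimum improvement threshold), and the validation losses are made-up numbers.

```python
def epochs_run_with_patience(val_losses, patience=3):
    """Return the epoch at which training stops under a simple patience rule."""
    best = float("inf")
    wait = 0                          # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch          # stop: no improvement for `patience` epochs
    return len(val_losses)            # never triggered, so all epochs were run

# Hypothetical validation losses for a 10-epoch run
val_losses = [1.10, 0.90, 0.80, 0.85, 0.82, 0.81, 0.79, 0.70, 0.69, 0.68]

stopped_at = epochs_run_with_patience(val_losses, patience=3)
print(f"training stopped after epoch {stopped_at}")
```

Here the best loss (0.80) is reached in epoch 3, and epochs 4, 5, and 6 all fail to beat it, so training stops after epoch 6. Note that in this made-up run the loss would have improved again later: a small patience saves time but can occasionally cut off a model that was about to recover.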

In [None]:
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

tuner.search(train_ds, validation_data=val_ds, epochs=10, callbacks=[stop_early])


The above output is a snapshot of what happens during the hyperparameter tuning process using Keras Tuner. Each *trial* refers to a full training session using a specific combination of hyperparameters, for example how many convolutional layers to use, how many filters in each layer, or what learning rate to apply. You will see several trials. Looking at Trial 2, we see it completes and achieves a validation accuracy of approximately 73.7%, meaning it correctly classified about three-quarters of the images in the validation set. This is the best result seen so far across all trials (Trial 1 was not great), which makes it the leading configuration at this point.

Keras Tuner then moves on to Trial 3, where it’s trying a new set of hyperparameter values. This time it’s testing a simpler architecture (fewer convolutional blocks) but with more filters in the first convolutional layer and a much smaller learning rate. The output shows a direct comparison between the values currently being tested and the best values from previous trials. This helps you see which choices are being explored and how they relate to the best-performing configuration so far.

The training log from Trial 3 reveals that the model isn’t performing well with this setup. Accuracy on both the training and validation sets remains low, and the loss remains high, suggesting the model is not learning effectively. Because Hyperband gives each configuration only a small training budget in its early rounds, this underperforming model is trained for just two epochs before the tuner moves on. This saves time and resources by not training poor models for longer than necessary.

Overall, this output illustrates the real-time process of automated experimentation. Keras Tuner is actively exploring different architectures and learning strategies to find a configuration that delivers the highest accuracy, keeping what works and discarding what doesn’t.

### Evaluate
After Keras Tuner has finished trying out different model configurations, you can ask it to tell you which combination of hyperparameters worked best; in other words, the one that achieved the highest validation accuracy. The next part of the code prints out the values of those winning hyperparameters:

In [None]:
best_hps = tuner.get_best_hyperparameters(1)[0]

print(f"""
Best number of convolutional blocks: {best_hps.get('conv_blocks')}
Best dense units: {best_hps.get('dense_units')}
Best dropout rate: {best_hps.get('dropout')}
Best learning rate: {best_hps.get('learning_rate')}
""")


The statement `tuner.get_best_hyperparameters(1)` returns a list of the top 1 best-performing configurations. We access the first (and only) item in the list with `[0]`. This gives us a HyperParameters object `best_hps`, which holds all the settings for that top-performing model.

Each call to `best_hps.get(...)` extracts one of the hyperparameters from the best-performing configuration that Keras Tuner found. Together, these values gave the highest validation accuracy of the combinations tried, and understanding what each one does helps you interpret how the model was optimised.

The `conv_blocks` parameter tells us how many convolutional blocks (sets of convolution and pooling layers) to use. These blocks are essential for extracting spatial features from images, such as edges, shapes, and textures. More blocks allow the model to learn more complex patterns, but too many can lead to overfitting or slow training. The chosen number gives us insight into how deep our model needs to be for this task.
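
One concrete effect of adding blocks is that the feature maps shrink. The sketch below works this out for the 150×150 input used in this notebook. It assumes each block is a 3×3 convolution without padding followed by 2×2 max pooling, which is what `layers.Conv2D(kernel_size=3)` and `layers.MaxPooling2D()` give with their default settings; the helper name is made up for this example.

```python
def feature_map_size(input_size, conv_blocks, kernel_size=3, pool_size=2):
    """Height/width of the feature map after a number of conv + pool blocks."""
    size = input_size
    for _ in range(conv_blocks):
        size = size - (kernel_size - 1)   # 3x3 convolution without padding
        size = size // pool_size          # 2x2 max pooling
    return size

sizes = {blocks: feature_map_size(150, blocks) for blocks in (1, 2, 3)}

for blocks, size in sizes.items():
    print(f"{blocks} block(s): {size} x {size} feature map")
```

The feature map goes from 74×74 after one block to 36×36 after two and 17×17 after three. Since the flattened vector has size × size × filters values, a one-block model with 32 filters passes 175,232 values to the dense layer, while a three-block model whose last block has 32 filters passes only 9,248.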

The `dense_units` value refers to the size of the fully connected layer that comes after flattening the output of the convolutional layers. This layer acts as a decision-making stage, taking all the learned features and combining them to produce final predictions. A larger number of units allows the model to represent more complex combinations of features, while a smaller number forces it to make simpler, potentially more generalisable decisions.

Dropout is a regularisation technique used to prevent overfitting, and the `dropout` rate sets how strongly it is applied. During training, dropout randomly disables that fraction of neurons in a layer, which encourages the network to rely on a wider variety of features rather than memorising specific ones. The best dropout value represents a good balance: enough to improve generalisation, but not so much that learning becomes unstable or ineffective.
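
The mechanics of dropout can be sketched in plain Python. This is an illustration of the idea rather than the Keras layer itself: each value is zeroed with probability `rate`, and the surviving values are scaled up by 1 / (1 - rate) so that the average output stays roughly the same. The list of ones stands in for a layer's outputs, and the rate of 0.3 is just an example value.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate` and scale up the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(42)
activations = [1.0] * 10_000            # stand-in for a layer's outputs
dropped = dropout(activations, rate=0.3, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

print(f"fraction zeroed: {zero_fraction:.3f}")
print(f"mean after dropout: {mean_after:.3f}")
```

About 30 per cent of the values are switched off, yet the mean stays close to 1. Dropout is only active during training; at prediction time all neurons are used.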

Finally, the `learning_rate` controls how quickly the model updates its weights as it learns. If this rate is too high, the model may overshoot the optimal solution and struggle to converge. If it's too low, learning may be too slow or get stuck in suboptimal solutions. The learning rate found by the tuner is usually in a “sweet spot” that supports steady and effective training.
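
In `build_model`, the learning rate is searched with `sampling='log'`. The sketch below shows conceptually why that matters for a range like 1e-4 to 1e-2. It is a standalone illustration using the standard library, not Keras Tuner's internal code, and the helper names are made up for this example.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniformly distributed between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

def share_below(samples, threshold=1e-3):
    """Fraction of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)

rng = random.Random(0)
low, high = 1e-4, 1e-2                  # the range used in build_model

log_samples = [sample_log_uniform(low, high, rng) for _ in range(10_000)]
linear_samples = [rng.uniform(low, high) for _ in range(10_000)]

print(f"below 1e-3 with log sampling:    {share_below(log_samples):.2f}")
print(f"below 1e-3 with linear sampling: {share_below(linear_samples):.2f}")
```

With log sampling, roughly half the candidates fall between 1e-4 and 1e-3 and half between 1e-3 and 1e-2, so each order of magnitude gets equal attention. With linear sampling, only about one candidate in eleven would fall in the lower range, and small learning rates would rarely be tried.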

Altogether, these hyperparameters define the most effective architecture and training configuration that the tuner found for your model on this data.

### Use the suggested parameters
Once Keras Tuner has identified the best-performing set of hyperparameters, the next step is to build a final model using those settings and train it properly. This allows us to move beyond the short trial runs used during tuning and give the best configuration a fair chance to learn from the data in full. The goal here is to see how well the model performs when trained consistently with the most promising hyperparameter values.

The first line of code below rebuilds the model from scratch using the optimal configuration. Even though this configuration was tested during tuning, we don’t reuse the previously trained model. Instead, we start fresh to avoid any inconsistencies that might arise from early stopping or partial training. This ensures a clean and unbiased training process.

The second line trains this newly built model for 10 full epochs using the training and validation datasets we prepared earlier. This time, we’re not stopping early unless we explicitly set up early stopping again (which we haven’t here). The training history is stored in the `history` variable, which can later be used to plot learning curves and evaluate how the model improved over time:

In [None]:
# Build a new model using the best hyperparameter combination found by the tuner
model = tuner.hypermodel.build(best_hps)

# Train the model
history = model.fit(train_ds, validation_data=val_ds, epochs=10)

Over just ten epochs, this model learnt extremely quickly. On the training images, its accuracy climbed from about 32 per cent in the first epoch to almost 99 per cent by the ninth, with its training loss falling from around 1.13 to just 0.06. On the unseen validation set it also improved strongly at first: validation accuracy rose from roughly 31 per cent to a peak of 85 per cent in epoch 8, and validation loss dipped from about 1.13 to 0.43.

However, after that point the gains levelled off and even slipped back slightly, finishing at 78 per cent accuracy and a loss of 0.51.

In short, the network learnt the training data exceptionally well and generalised effectively up to around epoch 8, but beyond that it began to overfit, with validation performance tailing off a little.

We could therefore stop training earlier, around epoch 8, or tweak the model further if we need better performance.

### Evaluate
We create a visualisation of how the model's accuracy changed during training, both on the training set and the validation set. It’s a good way to check whether the model is learning effectively, and to spot signs of underfitting or overfitting.

When a model is training well, you’ll typically see the training accuracy improve steadily over epochs. Ideally, validation accuracy should also improve, showing that the model is generalising to unseen data. If the training accuracy keeps rising but validation accuracy stagnates or drops, that could be a sign of overfitting:

In [None]:
# Plot the model's accuracy on the training set and validation set over each epoch
plt.plot(history.history['accuracy'], label='Train Accuracy')        # Training accuracy per epoch
plt.plot(history.history['val_accuracy'], label='Val Accuracy')      # Validation accuracy per epoch

plt.title('Model accuracy over time')

plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.legend()

plt.show()

This graph gives you a quick visual check of the model’s learning behaviour. If both training and validation accuracy rise and stay close together, it suggests the model is learning well without too much overfitting. The main takeaway is that we have empirically tested the model and searched for hyperparameters that give us better accuracy.

### Predict
Now we have a final model, let's look at some actual images and compare the predicted labels with the true labels:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Fetch one batch from the validation dataset
val_batch = next(iter(val_ds))
images, true_labels = val_batch

# Get predictions
pred_probs = model.predict(images)
pred_labels = np.argmax(pred_probs, axis=1)

# Get the class names from the dataset metadata (rock, paper, scissors)
class_names = ds_info.features['label'].names

# Plot a 3x3 grid of images with true vs predicted labels
fig, axes = plt.subplots(3, 3, figsize=(8, 8))
axes = axes.flatten()

for idx, ax in enumerate(axes):
    img = images[idx].numpy()

    ax.imshow(img)

    ax.axis('off')

    ax.set_title(f"True: {class_names[true_labels[idx]]}\nPred: {class_names[pred_labels[idx]]}")

plt.tight_layout()

plt.show()
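
The step from probabilities to labels in the cell above is worth spelling out. The softmax output layer returns three probabilities per image, and `np.argmax(pred_probs, axis=1)` picks the position of the largest one in each row. The sketch below does the same in plain Python; the probabilities are made-up values, not output from the trained model.

```python
# Hypothetical softmax outputs for three images (each row sums to 1)
pred_probs = [
    [0.80, 0.15, 0.05],
    [0.10, 0.20, 0.70],
    [0.30, 0.60, 0.10],
]
class_names = ['rock', 'paper', 'scissors']

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

# One predicted class index per image
pred_labels = [argmax(row) for row in pred_probs]

print([class_names[i] for i in pred_labels])
```

The three rows are classified as rock, scissors, and paper. The size of the winning probability also gives a rough sense of how confident the model is: 0.80 for the first image is a firmer call than 0.60 for the third.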


### What have we learnt
We’ve gained a practical understanding of hyperparameter tuning in neural networks, not just what it is, but why it matters, how to do it well, and how to evaluate the results.

We began by treating model training like marathon training: just as athletes need to tune their routines for the best performance, machine learning models require us to tune their hyperparameters. These include settings like the learning rate, number of layers, number of units, and dropout rate. All of these influence how the model learns, generalises, and performs.

Using the *Keras Tuner* library, we saw how to automate the search for the best hyperparameters rather than choosing them manually. We explored the idea of trying many combinations through efficient strategies like *Hyperband*, which quickly discards poor performers and focuses training on promising ones. This often makes the process faster than grid search or purely random search.

We used the *Rock, Paper, Scissors* image dataset, an accessible classification task, to walk through the full process: loading and preparing the data, building a tunable convolutional model, running tuning trials, retrieving the best hyperparameters, training the final model, and finally, visualising the results to assess learning over time.