# Hyperparameter Tuning with Keras Tuner

This notebook demonstrates how to use Keras Tuner to search automatically for well-performing hyperparameters for deep learning models. We'll explore how to:

1. Define a hyperparameter search space
2. Use different search strategies (Random, Hyperband, Bayesian)
3. Analyze and visualize tuning results
4. Build the best model from the best hyperparameters found

## Keras Tuner Setup

The cells below install Keras Tuner, load Fashion MNIST, and carve out a reduced subset (10,000 training and 2,000 validation images) so that each tuning trial runs quickly.

In [1]:
!pip install keras-tuner

Collecting keras-tuner
  Downloading keras_tuner-1.4.7-py3-none-any.whl.metadata (5.4 kB)
Collecting kt-legacy (from keras-tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl.metadata (221 bytes)
Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras-tuner
Successfully installed keras-tuner-1.4.7 kt-legacy-1.0.5


In [2]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout, BatchNormalization
import keras_tuner as kt
import matplotlib.pyplot as plt

# Load and preprocess the Fashion MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize pixel values

# Use smaller dataset for faster tuning
train_size = 10000
val_size = 2000
x_train_sample = x_train[:train_size]
y_train_sample = y_train[:train_size]
x_val_sample = x_train[train_size:train_size+val_size]
y_val_sample = y_train[train_size:train_size+val_size]

print(f"Training sample shape: {x_train_sample.shape}")
print(f"Validation sample shape: {x_val_sample.shape}")

Training sample shape: (10000, 28, 28)
Validation sample shape: (2000, 28, 28)
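
The slicing above takes the validation samples from the rows immediately after the training samples, so the two subsets do not overlap. As a small sanity check, the sketch below computes the same slice bounds in plain Python. `split_bounds` is a hypothetical helper introduced only for illustration, and the total of 60,000 is the size of the Fashion MNIST training set.

```python
def split_bounds(n_total, train_size, val_size):
    """Return (start, stop) index pairs for contiguous, non-overlapping
    training and validation slices."""
    if train_size + val_size > n_total:
        raise ValueError("train_size + val_size exceeds the available samples")
    train = (0, train_size)
    val = (train_size, train_size + val_size)
    return train, val

# Same sizes as in the cell above
train_bounds, val_bounds = split_bounds(n_total=60000, train_size=10000, val_size=2000)
print(f"Training rows:   {train_bounds}")
print(f"Validation rows: {val_bounds}")
```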


## Defining the Hyperparameter Search Space

The build_model function below defines a model architecture with tunable hyperparameters, including:
- Layer sizes (units in dense layers)
- Dropout rate
- Whether to include batch normalization
- Learning rate

Keras Tuner samples configurations from this space and keeps track of the ones that score best on the chosen objective.

In [3]:
# Define the model-building function with hyperparameters to tune
def build_model(hp):
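    model = Sequential()
    # Declare the input shape explicitly; recent Keras versions warn when
    # input_shape is passed to a layer inside a Sequential model
    model.add(tf.keras.Input(shape=(28, 28)))
    model.add(Flatten())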

    # Tune number of units in first dense layer
    hp_units1 = hp.Int('units_1', min_value=32, max_value=256, step=32)
    model.add(Dense(hp_units1, activation='relu'))

    # Tune dropout rate
    hp_dropout = hp.Float('dropout', min_value=0.0, max_value=0.5, step=0.1)
    model.add(Dropout(hp_dropout))

    # Tune whether to include batch normalization
    if hp.Boolean('batch_normalization'):
        model.add(BatchNormalization())

    # Tune number of units in second dense layer
    hp_units2 = hp.Int('units_2', min_value=16, max_value=128, step=16)
    model.add(Dense(hp_units2, activation='relu'))

    model.add(Dense(10, activation='softmax'))

    # Tune learning rate
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-3, 5e-3, 1e-2])

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model
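
To get a feel for how large this search space is, the sketch below counts the distinct values each hyperparameter can take and multiplies them. It is a back-of-the-envelope calculation in plain Python, not a Keras Tuner API: `count_grid_values` is a hypothetical helper, and it assumes that a stepped range includes both endpoints.

```python
from math import prod

def count_grid_values(min_value, max_value, step):
    """Count the values min_value, min_value + step, ... that do not exceed max_value."""
    # The small epsilon guards against floating-point error with float steps
    return int((max_value - min_value) / step + 1e-9) + 1

# Number of candidate values for each hyperparameter declared in build_model
space = {
    "units_1": count_grid_values(32, 256, 32),
    "dropout": count_grid_values(0.0, 0.5, 0.1),
    "batch_normalization": 2,  # True or False
    "units_2": count_grid_values(16, 128, 16),
    "learning_rate": 3,        # three explicit choices
}

total_combinations = prod(space.values())
print(space)
print(f"Total combinations: {total_combinations}")
```

Under these assumptions the space holds 2,304 combinations, so a search limited to a handful of trials covers only a small fraction of it.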

## Hyperparameter Tuning with Random Search

We use Keras Tuner's RandomSearch tuner, limited to 5 trials to keep the run short. Each trial samples one combination of hyperparameters at random, trains the model, and records its validation accuracy; the tuner then reports the best-performing configuration among the trials it ran.
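
The loop behind random search can be sketched in plain Python. In the sketch below, `sample_config`, `toy_score`, and `random_search` are hypothetical stand-ins: a real trial would train a model and report `val_accuracy`, whereas `toy_score` is an arbitrary formula. Treat it as an illustration of the idea rather than Keras Tuner's actual implementation.

```python
import random

# Candidate values mirroring the search space in build_model
SEARCH_SPACE = {
    "units_1": list(range(32, 257, 32)),
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "batch_normalization": [True, False],
    "units_2": list(range(16, 129, 16)),
    "learning_rate": [1e-3, 5e-3, 1e-2],
}

def sample_config(rng):
    """Draw one value per hyperparameter, independently and uniformly."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def toy_score(config):
    """Hypothetical stand-in for a validation metric; no model is trained."""
    return config["units_1"] / 256 - config["dropout"] - 10 * config["learning_rate"]

def random_search(max_trials, seed=0):
    """Score max_trials random configurations and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        config = sample_config(rng)
        trials.append((toy_score(config), config))
    # The best trial is the one with the highest score
    best_score, best_config = max(trials, key=lambda trial: trial[0])
    return best_score, best_config, trials

best_score, best_config, trials = random_search(max_trials=5)
print(f"Best toy score: {best_score:.3f}")
print(f"Best configuration: {best_config}")
```

Because every draw is independent, two trials can land on similar configurations while whole regions of the space go unvisited, which is one reason a small trial budget gives only a rough picture.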

In [4]:
# Initialize the Random Search tuner
tuner = kt.RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,  # Limit to 5 trials for brevity
    executions_per_trial=1,
    directory='keras_tuner',
    project_name='fashion_mnist'
)

# Define early stopping callback for each trial
# (with epochs=5 and patience=3, it can shorten a trial by at most one epoch)
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

# Start the search
tuner.search(
    x_train_sample, y_train_sample,
    epochs=5,
    validation_data=(x_val_sample, y_val_sample),
    callbacks=[stop_early],
    verbose=1
)

# Get the best hyperparameters and build a fresh (untrained) model from them
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
best_model = tuner.hypermodel.build(best_hps)

# Print the best hyperparameters
print("Best Hyperparameters:")
print(f"Learning rate: {best_hps.get('learning_rate')}")
print(f"Units in first layer: {best_hps.get('units_1')}")
print(f"Units in second layer: {best_hps.get('units_2')}")
print(f"Dropout rate: {best_hps.get('dropout')}")
print(f"Batch normalization: {best_hps.get('batch_normalization')}")

# Train the rebuilt model from scratch
best_model.fit(
    x_train_sample, y_train_sample,
    epochs=5,
    validation_data=(x_val_sample, y_val_sample),
    verbose=1
)

# Evaluate on test set
test_loss, test_acc = best_model.evaluate(x_test, y_test, verbose=1)
print(f"Test accuracy: {test_acc:.4f}")

Trial 5 Complete [00h 00m 12s]
val_accuracy: 0.8339999914169312

Best val_accuracy So Far: 0.8619999885559082
Total elapsed time: 00h 01m 10s
Best Hyperparameters:
Learning rate: 0.001
Units in first layer: 224
Units in second layer: 32
Dropout rate: 0.1
Batch normalization: False
Epoch 1/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6489 - loss: 1.0209 - val_accuracy: 0.8250 - val_loss: 0.5126
Epoch 2/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8142 - loss: 0.5264 - val_accuracy: 0.8445 - val_loss: 0.4362
Epoch 3/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8485 - loss: 0.4279 - val_accuracy: 0.8520 - val_loss: 0.4316
Epoch 4/5
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8525 - loss: 0.4172 - val_accuracy: 0.8590 - val_loss: 0.4207
Epoch 5/5
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0