# HyperTuning with KerasTuner and TensorFlow
---

Building machine learning models is an iterative process that involves optimizing the model's performance and compute resources. The settings that you adjust during each iteration are called *hyperparameters*. They govern the training process and are held constant during training. 

The process of searching for optimal hyperparameters is called *hyperparameter tuning* or *hypertuning*, and is essential in any machine learning project. Hypertuning helps boost performance and reduces model complexity by removing unnecessary parameters (e.g., number of units in a dense layer).
There are two types of hyperparameters:
1. *Model hyperparameters* that influence model architecture (e.g., number and width of hidden layers in a DNN)
2. *Algorithm hyperparameters* that influence the speed and quality of training (e.g., learning rate and activation function).

The number of hyperparameter combinations, even in a shallow DNN, can grow so large that manually searching for the optimal set is neither feasible nor scalable.
This post will introduce you to KerasTuner, a library made to automate the hyperparameter search. We'll build a deep learning model and train it on the [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) with:
* Pre-selected hyperparameters
* Optimized hyperparameters with KerasTuner
* A predefined HyperResNet model tuned with KerasTuner

Let's begin!

## Imports and Preprocessing

In [1]:
import tensorflow as tf
import kerastuner as kt

from tensorflow import keras

print(f"TensorFlow Version: {tf.__version__}")
print(f"KerasTuner Version: {kt.__version__}")

TensorFlow Version: 2.5.0
KerasTuner Version: 1.0.1


In [2]:
# Load and split data into train and test sets
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

In [3]:
# Normalize pixels to values between 0 and 1
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

## Baseline Performance
As mentioned, we will first train a shallow dense neural network (DNN) with preselected hyperparameters to give us a baseline performance. We'll see later on how even simple models, like our shallow DNN, can take some time to tune.

In [4]:
# Build baseline model with Sequential API
b_model = keras.Sequential()
b_model.add(keras.layers.Flatten(input_shape=(28,28)))
b_model.add(keras.layers.Dense(units=512, activation='relu', name='dense_1'))
b_model.add(keras.layers.Dropout(0.2))
b_model.add(keras.layers.Dense(10, activation='softmax'))

# Print model summary
b_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________




Notice how we hardcode all of the hyperparameters in the code above. These include the number of hidden layers (in our case there is 1 hidden layer), the number of units in our hidden layer (512), its activation function (ReLU), and the dropout rate (0.2). We'll tune all of these except the activation function later.

Let's now set up the optimizer, loss, and metrics. One more hyperparameter we'll tune later on is the learning rate, but for now we'll set it to 0.001.

In [5]:
# Set training parameters
b_model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                loss=keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])



With our model's settings defined, we are ready to train! We'll set the number of epochs to 20 and use early stopping to interrupt training if the validation loss hasn't improved for 5 consecutive epochs.

In [6]:
# Number of epochs
NUM_EPOCHS = 20

# Stop early if val_loss hasn't improved for 5 consecutive epochs
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

# Train model
b_model.fit(X_train, y_train, epochs=NUM_EPOCHS, validation_split=0.2, callbacks=[stop_early], verbose=2)

Epoch 1/20
1500/1500 - 4s - loss: 0.5155 - accuracy: 0.8153 - val_loss: 0.4157 - val_accuracy: 0.8507
Epoch 2/20
1500/1500 - 4s - loss: 0.3944 - accuracy: 0.8559 - val_loss: 0.3943 - val_accuracy: 0.8592
Epoch 3/20
1500/1500 - 4s - loss: 0.3547 - accuracy: 0.8689 - val_loss: 0.3409 - val_accuracy: 0.8748
Epoch 4/20
1500/1500 - 4s - loss: 0.3333 - accuracy: 0.8776 - val_loss: 0.3349 - val_accuracy: 0.8768
Epoch 5/20
1500/1500 - 4s - loss: 0.3151 - accuracy: 0.8824 - val_loss: 0.3261 - val_accuracy: 0.8800
Epoch 6/20
1500/1500 - 4s - loss: 0.3050 - accuracy: 0.8877 - val_loss: 0.3347 - val_accuracy: 0.8770
Epoch 7/20
1500/1500 - 4s - loss: 0.2930 - accuracy: 0.8896 - val_loss: 0.3470 - val_accuracy: 0.8790
Epoch 8/20
1500/1500 - 4s - loss: 0.2846 - accuracy: 0.8928 - val_loss: 0.3410 - val_accuracy: 0.8785
Epoch 9/20
1500/1500 - 4s - loss: 0.2719 - accuracy: 0.8979 - val_loss: 0.3175 - val_accuracy: 0.8875
Epoch 10/20
1500/1500 - 4s - loss: 0.2626 - accuracy: 0.8998 - val_loss: 0.3245 - val_accurac

<tensorflow.python.keras.callbacks.History at 0x7fc7cb1653d0>
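
To make the patience rule concrete, here is a minimal pure-Python sketch of the idea behind early stopping, applied to the ten validation losses visible in the log above. The helper `early_stopping_epoch` is a hypothetical name used only for illustration. It is not part of Keras, and it simplifies the real callback (for example, it ignores `min_delta` and weight restoration).

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified illustration of the patience rule: stop once the monitored
    loss has failed to improve for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best value resets the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss values copied from the first ten epochs of the baseline log
logged_val_losses = [0.4157, 0.3943, 0.3409, 0.3349, 0.3261,
                     0.3347, 0.3470, 0.3410, 0.3175, 0.3245]
print(early_stopping_epoch(logged_val_losses, patience=5))  # None

# Hypothetical run whose loss stalls after epoch 2
stalled = [0.50, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.30]
print(early_stopping_epoch(stalled, patience=5))  # 7
```

In the logged run the loss improves again at epoch 9, which resets the counter, so the rule does not fire within the epochs shown.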


Finally, we'll want to see how our model performs against the test set. We'll define a helper function to easily display the results.

In [8]:
import pandas as pd

def evaluate_model(model, X_test, y_test):
    """
    Evaluate model on test set and show results in a DataFrame.
    
    Parameters
    ----------
    model : keras model
        trained keras model.
    X_test : numpy array
        Features of holdout set.
    y_test : numpy array
        Labels of holdout set.
        
    Returns
    -------
    display_df : DataFrame
        Pandas dataframe containing evaluation results.
    """
    eval_dict = model.evaluate(X_test, y_test, return_dict=True)
    
    # One row of metrics, with the metric names as plain column labels
    display_df = pd.DataFrame([eval_dict])
    
    return display_df

In [9]:
results = evaluate_model(b_model, X_test, y_test)

results.index = ['Baseline']

results.head()



|          | loss     | accuracy |
|----------|----------|----------|
| Baseline | 0.356434 | 0.8869   |


Those are the results for a single set of hyperparameters. Imagine trying out different learning rates, dropout rates, numbers of hidden layers, and numbers of neurons in each hidden layer. As you can see, manual hypertuning is neither feasible nor scalable. In the next section you'll see how Keras Tuner solves these problems by automating the process and searching the hyperparameter space efficiently.

## Keras Tuner
Keras Tuner is a simple, distributable hyperparameter optimization framework that automates the painful process of manually searching for optimal hyperparameters. It comes with built-in Random Search, Hyperband, and Bayesian Optimization search algorithms, and is designed to fit many use cases, including:
* Distributed tuning
* Custom training loops (e.g., GANs, reinforcement learning, etc.)
* Adding hyperparameters outside of the model building function (preprocessing, data augmentation, test time augmentation, etc.)

These processes are outside the scope of this write-up, but feel free to read more in the official documentation.
There are four steps to hypertune our shallow DNN using Keras Tuner:
1. Define the model
2. Specify which hyperparameters to tune
3. Define the search space
4. Define the search algorithm

### Define the model
The model we set up for hypertuning is called a hypermodel. We define the hyperparameter search space when we build our hypermodel.
There are two ways to build a hypermodel:
1. By using a model builder function
2. Using a [HyperModel subclass](https://keras.io/guides/keras_tuner/getting_started/#you-can-use-a-hypermodel-subclass-instead-of-a-modelbuilding-function) of the Keras Tuner API

We will use the first approach and define our DNN in a model building function. You'll notice that the hyperparameters are defined inline. Our model building function uses the defined hyperparameters to return a compiled model.

### Specify Which Hyperparameters to Tune
The model we'll be building is very similar to the shallow DNN we trained earlier, except we'll be tuning four of the model's hyperparameters:
* The number of hidden layers
* The number of units in each hidden layer
* The dropout percentage after each hidden layer
* The learning rate of the Adam optimizer

### Define the Search Space
The search space is defined through a HyperParameters object, which is passed as an argument to the model building function and configures the hyperparameters you'd like to tune. In our function we will use:
* `hp.Int()` to define the search space for the number of hidden layers and units in each hidden layer. This allows you to define minimum and maximum values, as well as a step size to increment by.
* `hp.Float()` to define the search space for the dropout percentage. This is similar to `hp.Int()` except it takes floating values.
* `hp.Choice()` to define the search space of the learning rate. This allows you to define discrete values.

For more information on all the available methods and their usage visit the [official documentation](https://keras.io/api/keras_tuner/hyperparameters/).
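
To get a feel for what each of these definitions covers, here is a small pure-Python sketch that lists the candidate values for the ranges used in this post. It does not call Keras Tuner. The helpers `int_range` and `float_range` are hypothetical stand-ins, and the sketch assumes both the minimum and maximum values are included in the range.

```python
def int_range(min_value, max_value, step):
    """Inclusive integer grid, standing in for an hp.Int() search space."""
    return list(range(min_value, max_value + 1, step))


def float_range(min_value, max_value, step):
    """Inclusive float grid, standing in for an hp.Float() search space."""
    count = int(round((max_value - min_value) / step)) + 1
    # Rounding hides floating-point noise such as 0.30000000000000004
    return [round(min_value + i * step, 10) for i in range(count)]


unit_options = int_range(32, 512, 32)          # units per hidden layer
dropout_options = float_range(0.0, 0.3, 0.1)   # dropout rate per layer
learning_rate_options = [1e-2, 1e-3, 1e-4]     # discrete choices

print(len(unit_options), unit_options[:3], unit_options[-1])  # 16 [32, 64, 96] 512
print(dropout_options)                                        # [0.0, 0.1, 0.2, 0.3]
print(learning_rate_options)
```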

In [10]:
def build_model(hp):
    """
    Builds model and sets up hyperparameter space to search.
    
    Parameters
    ----------
    hp : HyperParameter object
        Configures hyperparameters to tune.
        
    Returns
    -------
    model : keras model
        Compiled model with hyperparameters to tune.
    """
    # Initialize sequential API and start building model.
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28,28)))
    
    # Tune the number of hidden layers and the units in each.
    # "num_layers" ranges over 2-6, so range(1, num_layers) adds 1-5 hidden layers.
    # Units per layer: 32-512 in steps of 32.
    for i in range(1, hp.Int("num_layers", 2, 6)):
        model.add(
            keras.layers.Dense(
                units=hp.Int("units_" + str(i), min_value=32, max_value=512, step=32),
                activation="relu")
            )
        
        # Tune dropout layer with values from 0 - 0.3.
        model.add(keras.layers.Dropout(hp.Float("dropout_" + str(i), 0, 0.3, step=0.1)))
    
    # Add output layer.
    model.add(keras.layers.Dense(units=10, activation="softmax"))
    
    # Tune learning rate for Adam optimizer with values from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
    
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    
    return model
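
It is worth pausing on how large this search space is. The sketch below is a back-of-the-envelope count, not something Keras Tuner reports. It assumes 16 unit options and 4 dropout options per hidden layer, 3 learning rates, and 1 to 5 hidden layers, and it counts every distinct combination once.

```python
UNIT_OPTIONS = 16        # 32, 64, ..., 512
DROPOUT_OPTIONS = 4      # 0.0, 0.1, 0.2, 0.3 (assumes inclusive bounds)
LEARNING_RATES = 3       # 1e-2, 1e-3, 1e-4
MAX_HIDDEN_LAYERS = 5


def count_configurations(max_layers=MAX_HIDDEN_LAYERS):
    """Count distinct (architecture, learning rate) combinations."""
    per_layer = UNIT_OPTIONS * DROPOUT_OPTIONS  # choices for one hidden layer
    architectures = sum(per_layer ** n for n in range(1, max_layers + 1))
    return architectures * LEARNING_RATES


total = count_configurations()
print(f"{total:,}")  # 3,272,356,032
```

More than three billion combinations for a small dense network is a good illustration of why an exhaustive manual search is off the table.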

### Define the Search Algorithm
After building our model builder function, we can instantiate the tuner and specify a search strategy. For our use case we will use the Hyperband algorithm. Hyperband is a novel bandit-based approach made specifically for hyperparameter optimization. The [research paper](https://jmlr.org/papers/v18/16-558.html) was published in 2018 and details a process that quickly converges on a high-performing model through adaptive resource-allocation and early-stopping.

The idea is simple: Hyperband runs a sports-championship-style bracket. It begins by randomly sampling a large number of models with random hyperparameter combinations from the search space. Each model is trained for a few epochs, and only the top performers (roughly the best one in every `factor` models) move on to the next round, where they are trained for longer.
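
The tournament idea can be sketched in a few lines of plain Python. This is a toy illustration of a single elimination bracket, not Keras Tuner's implementation: the function names, the `quality` field, and the scoring function are all made up for the example, and real Hyperband runs several such brackets with different starting budgets.

```python
import random


def successive_halving(configs, score_fn, factor=3, min_epochs=1, max_epochs=20):
    """Toy bracket: score survivors, keep the top 1/factor, train longer."""
    survivors = list(configs)
    epochs = min_epochs
    history = []  # (epochs used this round, number of models in the round)
    while True:
        ranked = sorted(survivors, key=lambda cfg: score_fn(cfg, epochs), reverse=True)
        history.append((epochs, len(ranked)))
        if len(ranked) == 1 or epochs >= max_epochs:
            return ranked[0], history
        keep = max(1, len(ranked) // factor)
        survivors = ranked[:keep]
        epochs = min(max_epochs, epochs * factor)


def toy_score(cfg, epochs):
    """Stand-in for validation accuracy: improves with more epochs."""
    return cfg["quality"] * (1 - 0.5 ** epochs)


rng = random.Random(0)
configs = [{"id": i, "quality": rng.random()} for i in range(27)]

best, history = successive_halving(configs, toy_score)
print(history)  # [(1, 27), (3, 9), (9, 3), (20, 1)]
print(best["id"])
```

Most of the 27 candidates only ever receive a single epoch of training, which is where the savings over training every model to completion come from.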

To instantiate our tuner, we will need to pass the following arguments:
* Our hypermodel (built by our model builder function).
* `objective`: the metric to optimize. The direction (min or max) is inferred automatically for built-in metrics. For custom metrics we can use `kerastuner.Objective`.
* `factor` and `max_epochs`: used to calculate the number of models in each bracket as 1 + $\log_{factor}(max\_epochs)$, rounded up to the nearest integer.
* `hyperband_iterations`: controls the resource budget you're willing to allocate to hypertuning. It is the number of times the entire Hyperband algorithm is run.
* `directory`: where logs and checkpoints for each trial are saved during the search, which lets us pick up the search where we left off. You can disable this behavior by passing the additional argument `overwrite=True`.
* `project_name`: distinguishes this run from other runs and is used as a subdirectory under `directory`.

Please refer to the [official documentation](https://keras.io/api/keras_tuner/tuners/hyperband/) for a list of all available arguments.
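
As a quick worked example, the bracket-size formula from the list above can be evaluated for the values we are about to use (`max_epochs=20`, `factor=3`). The helper name `models_per_bracket` is hypothetical and only mirrors the formula as stated in this post.

```python
import math


def models_per_bracket(max_epochs, factor):
    """Evaluate 1 + log_factor(max_epochs), rounded up to an integer."""
    value = 1 + math.log(max_epochs, factor)
    # Round first so floating-point noise on exact powers does not bump the result
    return math.ceil(round(value, 10))


print(models_per_bracket(max_epochs=20, factor=3))  # 4
```

Here $\log_3(20) \approx 2.73$, so the formula gives $1 + 2.73 = 3.73$, which rounds up to 4.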

In [20]:
# Instantiate the tuner
tuner = kt.Hyperband(build_model,
                     objective="val_accuracy",
                     max_epochs=20,
                     factor=3,
                     hyperband_iterations=1,
                     directory="kt_dir",
                     project_name="kt_hyperband",
                     overwrite=True)

We can see the search space summary with:

In [21]:
# Display search space summary
tuner.search_space_summary()

We can set callbacks like early stopping to stop training early when metrics aren't improving. 

Let's start the search.

In [22]:
# This cell takes a long time to run when hyperband_iterations is large

stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

tuner.search(X_train, y_train, epochs=NUM_EPOCHS, validation_split=0.2, callbacks=[stop_early], verbose=0)

INFO:tensorflow:Oracle triggered exit


After the search is finished, we can get the best hyperparameters and retrain the model.

In [23]:
# Get the optimal hyperparameters from the results
best_hps = tuner.get_best_hyperparameters()[0]

In [24]:
h_model = tuner.hypermodel.build(best_hps)
h_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 192)               49344     
_________________________________________________________________
dropout_2 (Dropout)          (None, 192)               0

In [25]:
# Train the hypertuned model
h_model.fit(X_train, y_train, epochs=NUM_EPOCHS, validation_split=0.2, callbacks=[stop_early], verbose=2)

Epoch 1/20
1500/1500 - 3s - loss: 0.6831 - accuracy: 0.7709 - val_loss: 0.4512 - val_accuracy: 0.8426
Epoch 2/20
1500/1500 - 3s - loss: 0.4436 - accuracy: 0.8457 - val_loss: 0.3972 - val_accuracy: 0.8586
Epoch 3/20
1500/1500 - 3s - loss: 0.3924 - accuracy: 0.8605 - val_loss: 0.3679 - val_accuracy: 0.8693
Epoch 4/20
1500/1500 - 3s - loss: 0.3615 - accuracy: 0.8697 - val_loss: 0.3576 - val_accuracy: 0.8710
Epoch 5/20
1500/1500 - 3s - loss: 0.3368 - accuracy: 0.8808 - val_loss: 0.3339 - val_accuracy: 0.8807
Epoch 6/20
1500/1500 - 3s - loss: 0.3180 - accuracy: 0.8856 - val_loss: 0.3547 - val_accuracy: 0.8696
Epoch 7/20
1500/1500 - 3s - loss: 0.3038 - accuracy: 0.8898 - val_loss: 0.3289 - val_accuracy: 0.8824
Epoch 8/20
1500/1500 - 3s - loss: 0.2934 - accuracy: 0.8930 - val_loss: 0.3240 - val_accuracy: 0.8835
Epoch 9/20
1500/1500 - 3s - loss: 0.2799 - accuracy: 0.8968 - val_loss: 0.3135 - val_accuracy: 0.8860
Epoch 10/20
1500/1500 - 3s - loss: 0.2702 - accuracy: 0.9006 - val_loss: 0.3076 - 

<tensorflow.python.keras.callbacks.History at 0x7fc7ce2529a0>

And then we'll evaluate our hypertuned model on the test set!

In [26]:
hyper_df = evaluate_model(h_model, X_test, y_test)

hyper_df.index = ["Hypertuned"]

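# DataFrame.append returns a copy (and was removed in pandas 2.0), so reassign with concat
results = pd.concat([results, hyper_df])
results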



|            | loss     | accuracy |
|------------|----------|----------|
| Baseline   | 0.356434 | 0.8869   |
| Hypertuned | 0.335769 | 0.8888   |


Our hypertuned model performed slightly better on the test set (88.88% vs. 88.69% accuracy) despite having roughly 100,000 fewer parameters.
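
The parameter counts in the two model summaries can be reproduced by hand, since a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below does that arithmetic. Note the assumption: the hypertuned summary above is cut off after `dropout_2`, so the code assumes the only remaining layer is the 10-unit output layer from `build_model`. The helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def mlp_params(n_inputs, hidden_units, n_outputs):
    """Total trainable parameters of a stack of dense layers."""
    total = 0
    for units in hidden_units:
        total += dense_params(n_inputs, units)
        n_inputs = units  # this layer's output feeds the next layer
    return total + dense_params(n_inputs, n_outputs)


baseline_total = mlp_params(784, [512], 10)
# Assumption: three hidden layers (256, 256, 192) followed by the output layer
tuned_total = mlp_params(784, [256, 256, 192], 10)

print(baseline_total)                # 407050
print(tuned_total)                   # 318026
print(baseline_total - tuned_total)  # 89024
```

Under that assumption the difference comes out at about 89,000 parameters, in line with the rough figure quoted above.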

## HyperResNet
In addition to letting us define our own hypermodels, Keras Tuner provides two predefined tunable models, HyperXception and HyperResNet. These models search over the following architectures and hyperparameters:
* The version of the model
* Depth of convolutional layers
* Pooling
* Learning rate
* Optimization algorithm

Let's see how we can use these models with our tuner.

In [27]:
hypermodel = kt.applications.HyperResNet(input_shape=(28, 28, 1), classes=10)

tuner = kt.tuners.BayesianOptimization(
    hypermodel,
    objective='val_accuracy',
    max_trials=3,
    directory="kt_dir",
    project_name="kt_bayes_resnet")

We have to pass the input shape and number of classes to our HyperResNet model. This time we'll use Bayesian Optimization as our search algorithm, which uses the results of earlier trials to focus the search on promising regions of the hyperparameter space.

In [28]:
tuner.search_space_summary()

Next, we have to preprocess our data to match HyperResNet's requirements. As a convolutional network, it expects image inputs with an explicit channel axis, so each 28x28 image is reshaped to (28, 28, 1). It also expects one-hot encoded labels.

Finally, we can begin the search.

In [None]:
# HyperResNet expects inputs with an explicit channel axis: (samples, 28, 28, 1)
X_train_res = X_train.reshape(len(X_train), 28, 28, 1)
X_test_res = X_test.reshape(len(X_test), 28, 28, 1)

# ResNet expects one-hot encoded labels
y_train_res = keras.utils.to_categorical(y_train)
y_test_res = keras.utils.to_categorical(y_test)

tuner.search(X_train_res, y_train_res, epochs=NUM_EPOCHS, validation_split=0.2, verbose=2)

Epoch 1/20
1500/1500 - 3679s - loss: 1.0165 - accuracy: 0.6395 - val_loss: 0.6298 - val_accuracy: 0.7672
Epoch 2/20
1500/1500 - 1751s - loss: 0.6291 - accuracy: 0.7702 - val_loss: 0.5368 - val_accuracy: 0.8087
Epoch 3/20
1500/1500 - 1807s - loss: 0.5420 - accuracy: 0.8022 - val_loss: 0.4885 - val_accuracy: 0.8203
Epoch 4/20
1500/1500 - 1817s - loss: 0.4926 - accuracy: 0.8202 - val_loss: 0.4420 - val_accuracy: 0.8402
Epoch 5/20
1500/1500 - 9060s - loss: 0.4442 - accuracy: 0.8384 - val_loss: 0.4131 - val_accuracy: 0.8487
Epoch 6/20
1500/1500 - 28919s - loss: 0.4142 - accuracy: 0.8496 - val_loss: 0.4021 - val_accuracy: 0.8524
Epoch 7/20
1500/1500 - 1735s - loss: 0.3858 - accuracy: 0.8591 - val_loss: 0.3953 - val_accuracy: 0.8581
Epoch 8/20
1500/1500 - 1740s - loss: 0.3652 - accuracy: 0.8666 - val_loss: 0.3693 - val_accuracy: 0.8648
Epoch 9/20
1500/1500 - 3160s - loss: 0.3449 - accuracy: 0.8752 - val_loss: 0.3575 - val_accuracy: 0.8684
Epoch 10/20
1500/1500 - 1762s - loss: 0.3258 - accurac

Same as above, we can get the best hyperparameters and retrain the model.

In [None]:
# Get the optimal hyperparameters from the results
best_hps = tuner.get_best_hyperparameters()[0]

In [None]:
resnet_model = tuner.hypermodel.build(best_hps)
resnet_model.summary()

In [None]:
# Train the hypertuned model
resnet_model.fit(X_train_res, y_train_res, epochs=NUM_EPOCHS, validation_split=0.2, callbacks=[stop_early], verbose=2)

In [None]:
resnet_df = evaluate_model(resnet_model, X_test_res, y_test_res)

resnet_df.index = ["HyperResNet"]

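# DataFrame.append returns a copy (and was removed in pandas 2.0), so reassign with concat
results = pd.concat([results, resnet_df])
results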

## Wrap-Up
Hypertuning is an essential part of a machine learning pipeline. In this post, we trained a baseline model to show why manually searching for optimal hyperparameters is hard. We explored Keras Tuner in depth and saw how it automates the hyperparameter search. Finally, we hypertuned a predefined HyperResNet model.

Thanks for reading!