# HyperTuning with KerasTuner and TensorFlow
---

Building machine learning models is an iterative process that involves optimizing the model's performance and compute resources. The settings that you adjust during each iteration are called *hyperparameters*. They govern the training process and are held constant during training. 

The process of searching for optimal hyperparameters is called *hyperparameter tuning* or *hypertuning*, and is essential in any machine learning project. Hypertuning helps boost performance and reduces model complexity by removing unnecessary parameters (e.g., number of units in a dense layer).
There are two types of hyperparameters:
1. *Model hyperparameters* that influence model architecture (e.g., number and width of hidden layers in a DNN)
2. *Algorithm hyperparameters* that influence the speed and quality of training (e.g., learning rate and choice of optimizer)

The number of hyperparameter combinations, even in a shallow DNN, can grow very large, making a manual search for the optimal set neither feasible nor scalable.
This post will introduce you to KerasTuner, a library made to automate the hyperparameter search. We'll build a deep learning model and train it on the [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) with:
* Pre-selected hyperparameters
* Optimized hyperparameters with KerasTuner
* Optimized pre-trained Xception and ResNet models

Let's begin!

## Imports and Preprocessing

In [1]:
import tensorflow as tf
import kerastuner as kt

from tensorflow import keras

print(f"TensorFlow Version: {tf.__version__}")
print(f"KerasTuner Version: {kt.__version__}")

TensorFlow Version: 2.5.0
KerasTuner Version: 1.0.1


In [2]:
# Load and split data into train and test sets
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [3]:
# Normalize pixels to values between 0 and 1
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
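
To see what this scaling does, here is a small sketch on a tiny synthetic array standing in for the image data. The `fake_images` array is made up for illustration and is not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for 8-bit grayscale pixels (values 0-255)
fake_images = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Same transformation as above: cast to float32, then divide by 255
scaled = fake_images.astype('float32') / 255.0

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Casting before dividing keeps the result in `float32` rather than NumPy's default `float64`, which halves the memory the images take up.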

## Baseline Performance
Baseline performance will be judged by training a neural network with pre-selected hyperparameters:
* `1` hidden layer with `512` neurons
* `Adam` optimizer with learning rate of `0.001`
* Dropout layer with a rate of `0.2`

In [5]:
# Build baseline model with Sequential API
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28,28)))
model.add(keras.layers.Dense(units=512, activation='relu', name='dense_1'))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(10, activation='softmax'))

# Print model summary
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
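
The `Param #` column can be checked by hand: a dense layer has one weight per input-output pair plus one bias per unit. A minimal sketch of that arithmetic (the helper name `dense_params` is made up for illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened = 28 * 28                     # Flatten turns each 28x28 image into 784 features
hidden = dense_params(flattened, 512)   # dense_1
output = dense_params(512, 10)          # final softmax layer

print(hidden, output, hidden + output)  # 401920 5130 407050
```

The `Flatten` and `Dropout` layers have no weights of their own, which is why they show `0` in the summary.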




Notice how we hardcoded each hyperparameter. These include the number and width of hidden layers, the activation function, and the dropout rate.

We will now set the optimizer, learning rate, and loss function.

In [6]:
# Set training parameters
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
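
`SparseCategoricalCrossentropy` fits here because the Fashion MNIST labels are integer class ids rather than one-hot vectors. As a rough sketch of what this loss computes for a single sample, here is a plain-Python version with made-up probabilities. It is an illustration of the formula, not the Keras implementation.

```python
import math

def sparse_categorical_crossentropy(true_class, probs):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_class])

# Hypothetical softmax output over 3 classes (the real model has 10)
probs = [0.1, 0.7, 0.2]

good_guess = sparse_categorical_crossentropy(1, probs)  # true class was given 0.7
bad_guess = sparse_categorical_crossentropy(0, probs)   # true class was given 0.1

print(round(good_guess, 3), round(bad_guess, 3))  # 0.357 2.303
```

The loss grows quickly as the probability assigned to the correct class shrinks, so confident mistakes are penalized the most.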



With our model's settings defined, we are ready to train!

In [10]:
# Number of epochs
NUM_EPOCHS = 20

# Stop early if val_loss has not improved for 5 consecutive epochs
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

# Train model
model.fit(X_train, y_train, epochs=NUM_EPOCHS, validation_split=0.2, callbacks=[stop_early], verbose=2)

Epoch 1/20
1500/1500 - 4s - loss: 0.1425 - accuracy: 0.9457 - val_loss: 0.4141 - val_accuracy: 0.8940
Epoch 2/20
1500/1500 - 4s - loss: 0.1391 - accuracy: 0.9463 - val_loss: 0.3898 - val_accuracy: 0.8983
Epoch 3/20
1500/1500 - 4s - loss: 0.1370 - accuracy: 0.9471 - val_loss: 0.3882 - val_accuracy: 0.8964
Epoch 4/20
1500/1500 - 4s - loss: 0.1401 - accuracy: 0.9466 - val_loss: 0.4148 - val_accuracy: 0.8958
Epoch 5/20
1500/1500 - 4s - loss: 0.1389 - accuracy: 0.9461 - val_loss: 0.4015 - val_accuracy: 0.8978
Epoch 6/20
1500/1500 - 4s - loss: 0.1363 - accuracy: 0.9492 - val_loss: 0.4282 - val_accuracy: 0.8961
Epoch 7/20
1500/1500 - 4s - loss: 0.1311 - accuracy: 0.9503 - val_loss: 0.4158 - val_accuracy: 0.8968
Epoch 8/20
1500/1500 - 4s - loss: 0.1293 - accuracy: 0.9505 - val_loss: 0.4085 - val_accuracy: 0.8907


<tensorflow.python.keras.callbacks.History at 0x7f9ee9c03ac0>
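
Training stopped at epoch 8 rather than running all 20 epochs. The patience rule behind that can be sketched in a few lines of plain Python. This is a simplified model of the idea (it ignores options such as `min_delta` and `restore_best_weights`), not the actual Keras callback code, and `stopping_epoch` is a made-up helper name.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1             # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# val_loss values copied from the training log above
val_losses = [0.4141, 0.3898, 0.3882, 0.4148, 0.4015, 0.4282, 0.4158, 0.4085]

print(stopping_epoch(val_losses, patience=5))  # 8
```

The best `val_loss` (0.3882) came at epoch 3, and the next five epochs never beat it, so the counter reached the patience limit at epoch 8.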

We'll create a helper function that evaluates a model and returns the results as a dataframe, which makes it easy to compare models later on.

In [23]:
import pandas as pd

def evaluate_model(model, X_test, y_test):
    """
    Evaluate a model on the test set and return the results as a DataFrame.

    model : keras model
        Trained Keras model.
    X_test : numpy array
        Features of holdout set.
    y_test : numpy array
        Labels of holdout set.
    """
    # return_dict=True yields a dict such as {'loss': ..., 'accuracy': ...}
    eval_dict = model.evaluate(X_test, y_test, return_dict=True)

    # One row per model; a flat list of keys avoids creating MultiIndex columns
    display_df = pd.DataFrame([list(eval_dict.values())], columns=list(eval_dict.keys()))

    return display_df



In [25]:
baseline_df = evaluate_model(model, X_test, y_test)

baseline_df.index = ['baseline']

baseline_df.head()



| | loss | accuracy |
|---|---|---|
| baseline | 0.463813 | 0.8873 |


Those are the results for a single set of hyperparameters. Imagine trying out different learning rates, dropout rates, numbers of hidden layers, and numbers of neurons in each hidden layer. As you can see, manual hypertuning is neither feasible nor scalable. In the next section, you'll see how KerasTuner addresses this by automating the process and searching the hyperparameter space efficiently.
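
To put a rough number on this, here is a small sketch that counts the combinations in a modest grid. The candidate values are hypothetical, chosen only to illustrate the growth, and the grid assumes every hidden layer shares the same width.

```python
from math import prod

# Hypothetical candidate values, chosen only to illustrate the growth
search_space = {
    'learning_rate': [1e-2, 1e-3, 1e-4],
    'dropout_rate': [0.1, 0.2, 0.3, 0.4, 0.5],
    'num_hidden_layers': [1, 2, 3],
    'units_per_layer': list(range(32, 513, 32)),  # 32, 64, ..., 512
}

# A full grid search trains one model per combination
n_combinations = prod(len(values) for values in search_space.values())

# Rough cost, assuming each run takes as long as the baseline above:
# 8 epochs at about 4 seconds each
seconds_per_run = 8 * 4
total_hours = n_combinations * seconds_per_run / 3600

print(n_combinations)         # 720
print(round(total_hours, 1))  # 6.4
```

Even this coarse grid would take several hours on a small model, and letting each layer have its own width would multiply the count further.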

## Keras Tuner