# Introduction to Keras Tuner
**Hyperparameters** are the variables that govern the training process and the topology of an ML model. They remain constant during training and directly affect the performance of the model.

The process of finding the optimal set of hyperparameters is called *hyperparameter tuning* or *hypertuning*, and it is an essential part of a machine learning pipeline.

Hyperparameters are of two types:
1. *Model hyperparameters*, which influence model selection, such as the number and width of hidden layers.

2. *Algorithm hyperparameters*, which influence the speed and quality of the learning algorithm, such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbors for a k-Nearest Neighbors (KNN) classifier.

For more complex models, the number of hyperparameters can increase dramatically and tuning them manually can be quite challenging.

In [1]:
from tensorflow import keras

In [2]:
# Download the Fashion MNIST dataset
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


For preprocessing, normalize the pixel values to make the training converge faster.

In [3]:
# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0

## Baseline Performance

First, establish a baseline using arbitrarily handpicked hyperparameters, so that the tuned results can be compared against it later.

We will be building a shallow **Dense Neural Network (DNN)**.

In [4]:
# Build the baseline model using the Sequential API
b_model = keras.Sequential()
b_model.add(keras.layers.Flatten(input_shape=(28, 28)))
b_model.add(keras.layers.Dense(units=512, activation='relu', name='dense_1')) # Will tune this layer later
b_model.add(keras.layers.Dropout(0.2))
b_model.add(keras.layers.Dense(10, activation='softmax'))

# Print model summary
b_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense_1 (Dense)             (None, 512)               401920    
                                                                 
 dropout (Dropout)           (None, 512)               0         
                                                                 
 dense (Dense)               (None, 10)                5130      
                                                                 
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________


As shown, we hardcoded all the hyperparameters when declaring the layers. These include the number of hidden units, activation, and dropout.
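
The parameter counts in the summary above can be verified by hand: a Dense layer has one weight per input-output pair plus one bias per output unit. The sketch below is a standalone check in plain Python; the helper name `dense_params` is introduced here for illustration and is not part of Keras.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened = 28 * 28                      # Flatten turns each 28x28 image into 784 values
hidden = dense_params(flattened, 512)    # dense_1
output = dense_params(512, 10)           # final softmax layer

print(hidden, output, hidden + output)   # 401920 5130 407050
```

Because the first Dense layer accounts for almost all of the parameters, its width is a natural hyperparameter to tune.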

Next, let's set up the loss, metrics, and optimizer. The learning rate is also a hyperparameter that can be tuned automatically, but for now let's set it to `0.001`.

In [5]:
# Set up the training parameters
b_model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                loss=keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])

In [6]:
# Number of training epochs.
NUM_EPOCHS = 10

# Train the model
b_model.fit(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa1677c2090>
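
With `validation_split=0.2`, Keras holds out the last 20% of the samples passed to `fit` (taken before any shuffling) and trains on the rest. The following is a quick arithmetic sketch of what that means for the 60,000 Fashion MNIST training images, assuming the default batch size of 32. The variable names are illustrative only.

```python
import math

n_samples = 60_000          # size of the Fashion MNIST training set
validation_split = 0.2
batch_size = 32             # Keras default when batch_size is not passed to fit()

n_val = round(n_samples * validation_split)
n_train = n_samples - n_val
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)   # 48000 12000 1500
```

The validation metrics reported during training and tuning are therefore computed on 12,000 held-out images, while the test set stays untouched until the final evaluation.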

In [7]:
# Evaluate model on the test set
b_eval_dict = b_model.evaluate(img_test, label_test, return_dict=True)



Define a helper function for displaying the results so that they are easier to compare later. It prints the values of the hyperparameters being tuned and the results of the model evaluation. Its arguments are:
- model (Model) - Keras model to evaluate
- model_name (string) - arbitrary string to be used in identifying the model
- eval_dict (dict) -  results of model.evaluate

In [9]:
def print_results(model, model_name, eval_dict):

    print(f'\n{model_name}:')

    print(f'number of units in 1st Dense layer: {model.get_layer("dense_1").units}')
    print(f'learning rate for the optimizer: {model.optimizer.learning_rate.numpy()}')

    for key,value in eval_dict.items():
        print(f'{key}: {value}')

# Print results for baseline model
print_results(b_model, 'BASELINE MODEL', b_eval_dict)


BASELINE MODEL:
number of units in 1st Dense layer: 512
learning rate for the optimizer: 0.0010000000474974513
loss: 0.34340524673461914
accuracy: 0.8791000247001648


## Keras Tuner
To perform hypertuning with Keras Tuner, we need to:

* Define the model
* Select which hyperparameters to tune
* Define their search space
* Define the search strategy

In [10]:
# Install Keras Tuner
!pip install -q -U keras-tuner



In [11]:
import tensorflow as tf
import keras_tuner as kt

  


### Define the Model
The model we set up for hypertuning is called a *hypermodel*. When we build this model, we define the hyperparameter search space in addition to the model architecture. 

We can define a hypermodel through two approaches:

* By using a model builder function
* By [subclassing the `HyperModel` class](https://keras-team.github.io/keras-tuner/#you-can-use-a-hypermodel-subclass-instead-of-a-model-building-function) of the Keras Tuner API

Here, we will be using the first approach; we will use a model builder function to define the image classification model. This function returns a compiled model and uses hyperparameters you define inline to hypertune the model. 

The function below builds essentially the same model we used earlier. The difference is that two hyperparameters are set up for tuning:

* the number of hidden units of the first Dense layer
* the learning rate of the Adam optimizer

For this exercise, we will:

* use the `hp.Int()` method to define the search space for the Dense units. This allows you to set a minimum and maximum value, as well as the step size when incrementing between these values.

* use the `hp.Choice()` method for the learning rate. This allows you to define discrete values to include in the search space when hypertuning.
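
To get a feel for the size of this search space, the candidate values used in the model builder function below (units from 32 to 512 in steps of 32, and three learning rates) can be enumerated in plain Python. This sketch assumes that `Int` includes both endpoints when the step lands on them, which matches the 32 to 512 range stated in the code comments. It does not call Keras Tuner itself.

```python
from itertools import product

unit_choices = list(range(32, 512 + 1, 32))   # 32, 64, ..., 512
learning_rates = [1e-2, 1e-3, 1e-4]

# Every (units, learning_rate) pair the tuner could pick from
combinations = list(product(unit_choices, learning_rates))

print(len(unit_choices), len(learning_rates), len(combinations))   # 16 3 48
```

Even this small example has 48 possible configurations, which is why a search strategy that avoids fully training every one of them is useful.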

All available methods and their sample usage can be found in the [official documentation](https://keras-team.github.io/keras-tuner/documentation/hyperparameters/#hyperparameters).

In [12]:
# Builds the model and sets up the hyperparameters to tune.
# hp - Keras Tuner HyperParameters object used to define the search space
# returns a compiled model with hyperparameters to tune
def model_builder(hp):

    # Initialize the Sequential API and start stacking the layers
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))

    # Tune the number of units in the first Dense layer
    # Choose an optimal value between 32-512
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(keras.layers.Dense(units=hp_units, activation='relu', name='dense_1'))

    # Add next layers
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Dense(10, activation='softmax'))

    # Tune the learning rate for the optimizer
    # Choose an optimal value from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])

    return model

### Instantiate the Tuner and Perform Hypertuning
Now that we have the model builder, we can define how the tuner searches for the optimal set of hyperparameters, also called the **search strategy**. Keras Tuner has [four tuners](https://keras-team.github.io/keras-tuner/documentation/tuners/) available with built-in strategies: `RandomSearch`, `Hyperband`, `BayesianOptimization`, and `Sklearn`.

Here, we will use the **Hyperband tuner**. Hyperband is an algorithm specifically developed for hyperparameter optimization. It uses adaptive resource allocation and early stopping to quickly converge on a high-performing model. This is done using a sports-championship-style bracket, wherein the algorithm trains a large number of models for a few epochs and carries forward only the top-performing fraction of them (roughly 1/`factor`) to the next round. The intuition behind the algorithm can be found in section 3 of [this paper](https://arxiv.org/pdf/1603.06560.pdf).

Hyperband determines the number of models to train in a bracket by computing 1 + log<sub>`factor`</sub>(`max_epochs`) and rounding it up to the nearest integer.
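
For the settings used in this notebook (`max_epochs=10`, `factor=3`), the expression above can be evaluated directly. This is only a worked evaluation of the formula as stated, not a call into Keras Tuner.

```python
import math

max_epochs = 10
factor = 3

raw = 1 + math.log(max_epochs, factor)   # 1 + log base 3 of 10
rounded_up = math.ceil(raw)

print(round(raw, 3), rounded_up)   # 3.096 4
```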

To instantiate the tuner, we specify:

* the hypermodel (built by our model builder function)
* the `objective` to optimize (e.g. validation accuracy)
* a `directory` to save logs and checkpoints for every trial (model configuration) run during the hyperparameter search. If you re-run the hyperparameter search, the Keras Tuner uses the existing state from these logs to resume the search. To disable this behavior, pass an additional `overwrite=True` argument while instantiating the tuner.
* the `project_name` to differentiate this search from other runs. This will be used as a subdirectory name under the `directory`.

In [14]:
# Instantiate the tuner
tuner = kt.Hyperband(model_builder, 
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='kt_dir',
                     project_name='kt_hyperband')

In [15]:
# Display hypertuning settings/summary
tuner.search_space_summary()

Search space summary
Default search space size: 2
units (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': None}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}


We can pass in a callback to stop training early when a metric is not improving. Below, we define an [EarlyStopping](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) callback to monitor the validation loss and stop training if it's not improving after 5 epochs.

In [16]:
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
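
The patience rule can be illustrated without Keras. The sketch below is a simplified emulation, not the actual `EarlyStopping` implementation: the function name `stopping_epoch` and the sample loss values are made up for illustration, and it assumes that an epoch counts as an improvement only when the validation loss is strictly lower than the best value seen so far.

```python
def stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0                      # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 2
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.30]
print(stopping_epoch(losses))                 # 7
print(stopping_epoch([0.5, 0.4, 0.3, 0.2]))   # None
```

In the first call, epochs 3 through 7 fail to beat the loss from epoch 2, so training stops at epoch 7 and the later improvement at epoch 8 is never reached.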

We can now run the hyperparameter search. The `search` method accepts the same arguments as `tf.keras.Model.fit`, so we also pass in the callback defined above. This will take around 10 minutes to run.

In [17]:
# Perform hypertuning
tuner.search(img_train, label_train, epochs=NUM_EPOCHS,
             validation_split=0.2, callbacks=[stop_early])

Trial 30 Complete [00h 00m 52s]
val_accuracy: 0.8493333458900452

Best val_accuracy So Far: 0.8899166584014893
Total elapsed time: 00h 07m 43s
INFO:tensorflow:Oracle triggered exit


Get the optimal hyperparameters from the results using the [get_best_hyperparameters()](https://keras-team.github.io/keras-tuner/documentation/tuners/#get_best_hyperparameters-method) method.

In [18]:
best_hps = tuner.get_best_hyperparameters()[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first 
densely-connected layer is {best_hps.get('units')} and the optimal learning rate 
for the optimizer is {best_hps.get('learning_rate')}.""")


The hyperparameter search is complete. The optimal number of units in the first 
densely-connected layer is 192 and the optimal learning rate 
for the optimizer is 0.001.


## Build and Train Model
Now that we have the best set of hyperparameters, we can rebuild the hypermodel with these values and retrain it.

In [20]:
# Build the model with the optimal hyperparameters
h_model = tuner.hypermodel.build(best_hps)
h_model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_2 (Flatten)         (None, 784)               0         
                                                                 
 dense_1 (Dense)             (None, 192)               150720    
                                                                 
 dropout_2 (Dropout)         (None, 192)               0         
                                                                 
 dense_2 (Dense)             (None, 10)                1930      
                                                                 
Total params: 152,650
Trainable params: 152,650
Non-trainable params: 0
_________________________________________________________________


In [21]:
# Train the hypertuned model
h_model.fit(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa157622590>

In [22]:
h_eval_dict = h_model.evaluate(img_test, label_test, return_dict=True)



In [23]:
# Print results of the baseline and hypertuned model
print_results(b_model, 'BASELINE MODEL', b_eval_dict)
print_results(h_model, 'HYPERTUNED MODEL', h_eval_dict)


BASELINE MODEL:
number of units in 1st Dense layer: 512
learning rate for the optimizer: 0.0010000000474974513
loss: 0.34340524673461914
accuracy: 0.8791000247001648

HYPERTUNED MODEL:
number of units in 1st Dense layer: 192
learning rate for the optimizer: 0.0010000000474974513
loss: 0.35585683584213257
accuracy: 0.876800000667572


We have **reduced the model size** (fewer units in the first Dense layer) and saved compute resources while keeping roughly the same test accuracy (0.8768 for the hypertuned model versus 0.8791 for the baseline).
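
To put numbers on that trade-off, the figures printed earlier in this notebook can be compared directly. This is a small arithmetic sketch using the values from the two model summaries and evaluation results above; the variable names are illustrative.

```python
baseline_params = 407_050   # total params from b_model.summary()
tuned_params = 152_650      # total params from h_model.summary()

baseline_acc = 0.8791000247001648   # test accuracy of the baseline model
tuned_acc = 0.876800000667572       # test accuracy of the hypertuned model

reduction = 1 - tuned_params / baseline_params
acc_drop = baseline_acc - tuned_acc

print(f"parameters reduced by {reduction:.1%}")       # parameters reduced by 62.5%
print(f"test accuracy changed by {-acc_drop:+.4f}")   # test accuracy changed by -0.0023
```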

Official document: [Keras Tuner Reference](https://keras.io/guides/keras_tuner/getting_started/#the-search-space-may-contain-conditional-hyperparameters)