
# Intro to Keras Tuner

## Download and prepare the dataset

In [1]:
# Import keras
from tensorflow import keras

In [2]:
# Download the dataset and split into train and test sets
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


Normalize the pixel values to make the training converge faster.

In [3]:
# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0

## Baseline Performance

In [4]:
# Build the baseline model using the Sequential API
b_model = keras.Sequential()
b_model.add(keras.layers.Flatten(input_shape=(28, 28)))
b_model.add(keras.layers.Dense(units=512, activation='relu', name='dense_1')) # Will tune this layer later
b_model.add(keras.layers.Dropout(0.2))
b_model.add(keras.layers.Dense(10, activation='softmax'))

# Print model summary
b_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense_1 (Dense)             (None, 512)               401920    
                                                                 
 dropout (Dropout)           (None, 512)               0         
                                                                 
 dense (Dense)               (None, 10)                5130      
                                                                 
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________


In [5]:
# Setup the training parameters
b_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001), # Will tune learning rate later
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)

In [6]:
# Number of training epochs.
NUM_EPOCHS = 10

# Train the model
b_model.fit(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f45e042b6d0>

In [7]:
# Evaluate model on the test set
b_eval_dict = b_model.evaluate(img_test, label_test, return_dict=True)



In [8]:
# Define helper function
def print_results(model, model_name, layer_name, eval_dict):
    '''
    Prints the values of the hyperparameters to tune, and the results of model evaluation

    Args:
        model (Model) - Keras model to evaluate
        model_name (string) - arbitrary string to be used in identifying the model
        layer_name (string) - name of the layer to tune
        eval_dict (dict) -  results of model.evaluate
    '''
    print(f'\n{model_name}:')

    print(f'number of units in 1st Dense layer: {model.get_layer(layer_name).units}')
    print(f'learning rate for the optimizer: {model.optimizer.learning_rate.numpy()}')

    for key,value in eval_dict.items():
        print(f'{key}: {value}')

# Print results for baseline model
print_results(b_model, 'BASELINE MODEL', 'dense_1', b_eval_dict)


BASELINE MODEL:
number of units in 1st Dense layer: 512
learning rate for the optimizer: 0.0010000000474974513
loss: 0.35893580317497253
accuracy: 0.878600001335144


## Keras Tuner

To perform hypertuning with Keras Tuner, you need to:

* Define the model
* Select which hyperparameters to tune
* Define the search space
* Define the search strategy

### Install and import packages

In [9]:
# Install Keras Tuner
!pip install -q -U keras-tuner

In [10]:
# Import required packages
import tensorflow as tf
import keras_tuner as kt

### Define the model

The model you set up for hypertuning is called a *hypermodel*. When you build it, you define the hyperparameter search space in addition to the model architecture.

Two approaches to define a hypermodel:

* By using a model builder function
* By [subclassing the HyperModel class](https://keras-team.github.io/keras-tuner/#you-can-use-a-hypermodel-subclass-instead-of-a-model-building-function) of the Keras Tuner API


Below we use the first approach: a model builder function defines the image classification model. This function returns a compiled model and uses hyperparameters defined inline to hypertune the model.

Two hyperparameters are set up for tuning:

* the number of hidden units of the first Dense layer
* the learning rate of the Adam optimizer

The `HyperParameters` object (`hp`) configures each hyperparameter:

* use `Int()` to define the search space for the Dense units

* use `Choice()` for the learning rate
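
To get a feel for the size of this search space, the candidate values can be enumerated with plain Python. This is only a sketch: it assumes the ranges used in the model builder in this notebook (units from 32 to 512 in steps of 32, and three learning rates) and does not call Keras Tuner itself. The names `unit_choices`, `learning_rates`, and `search_space` are illustrative.

```python
from itertools import product

# Values that hp.Int('units', min_value=32, max_value=512, step=32) can take
# (both ends inclusive), written out with a plain range.
unit_choices = list(range(32, 512 + 1, 32))

# Values listed for hp.Choice('learning_rate', ...)
learning_rates = [1e-2, 1e-3, 1e-4]

# Every (units, learning_rate) pair the tuner could try
search_space = list(product(unit_choices, learning_rates))

print(len(unit_choices))   # 16
print(len(search_space))   # 48
```

Even this small model has 48 possible configurations, which is why a search strategy that discards weak candidates early is useful.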

In [11]:
def model_builder(hp):
    '''
    Builds the model and sets up the hyperparameters to tune.

    Args:
        hp (kt.HyperParameters) - object used to define the hyperparameters to tune

    Returns:
        model with hyperparameters to tune
    '''
    
    # Initialize the Sequential API and start stacking the layers
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))

    # Tune the number of units in the first Dense layer
    # Choose an optimal value between 32-512
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(keras.layers.Dense(units=hp_units, activation='relu', name='tuned_dense_1'))

    # Add next layers
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Dense(10, activation='softmax'))

    # Tune the learning rate for the optimizer
    # Choose an optimal value from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
        loss=keras.losses.SparseCategoricalCrossentropy(),
        metrics=['accuracy']
    )

    return model

## Instantiate the Tuner and perform hypertuning

Keras Tuner has four tuners available with built-in strategies: `RandomSearch`, `Hyperband`, `BayesianOptimization`, and `Sklearn`.

Here we use the Hyperband tuner. Like a sports championship bracket, the algorithm trains a large number of models for a few epochs and carries forward only the top-performing fraction of them (roughly one in every `factor`) to the next round.
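
The elimination idea can be illustrated with a toy, pure-Python sketch. It is not Keras Tuner's implementation: the scores are made-up numbers standing in for validation accuracy, `successive_halving` is a hypothetical helper, and each round simply keeps the best one in every `factor` of the remaining candidates.

```python
def successive_halving(scores, factor=3):
    """Toy elimination rounds: keep the best 1/factor candidates each round.

    scores: dict mapping a candidate name to a made-up validation score.
    Returns the list of survivors at the start and after each round.
    """
    survivors = sorted(scores, key=scores.get, reverse=True)
    rounds = [survivors]
    while len(survivors) > 1:
        keep = max(1, len(survivors) // factor)
        survivors = survivors[:keep]
        rounds.append(survivors)
    return rounds


# Nine hypothetical configurations with invented scores
toy_scores = {f"config_{i}": 0.80 + 0.01 * i for i in range(9)}

toy_rounds = successive_halving(toy_scores)
for round_index, survivors in enumerate(toy_rounds):
    print(round_index, len(survivors), survivors[0])
```

In a real search the scores change between rounds because the surviving models are trained for more epochs. The toy version keeps them fixed so that only the bracket structure (9, then 3, then 1 candidates) is visible.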

Hyperband determines the number of models to train in a bracket by computing 1 + log<sub>`factor`</sub>(`max_epochs`) and rounding it up to the nearest integer.
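
As a worked instance of that formula with the settings used in this notebook (`factor=3`, `max_epochs=10`), the arithmetic can be checked with the standard library. This only evaluates the expression quoted above; it does not query Keras Tuner.

```python
import math

factor = 3
max_epochs = 10

# 1 + log base `factor` of `max_epochs`, then rounded up
raw_value = 1 + math.log(max_epochs, factor)
rounded = math.ceil(raw_value)

print(round(raw_value, 3))  # 3.096
print(rounded)              # 4
```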

The `directory` argument sets where logs and checkpoints are saved for every trial (model configuration) run during the hyperparameter search. If you re-run the hyperparameter search, Keras Tuner uses the existing state from these logs to resume the search. To disable this behavior, pass an additional `overwrite=True` argument when instantiating the tuner.

In [12]:
# Instantiate the tuner
tuner = kt.Hyperband(
    model_builder, # the hypermodel
    objective='val_accuracy',
    max_epochs=10,
    factor=3,
    directory='kt_dir',
    project_name='kt_hyperband'
)

In [13]:
# Display hypertuning settings
tuner.search_space_summary()

Search space summary
Default search space size: 2
units (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}


In [14]:
# Pass in an EarlyStopping callback to stop training early when a metric is not improving
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

In [15]:
# Perform hypertuning
tuner.search(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2, callbacks=[stop_early])

Trial 30 Complete [00h 01m 23s]
val_accuracy: 0.8921666741371155

Best val_accuracy So Far: 0.8921666741371155
Total elapsed time: 00h 13m 51s


In [16]:
# Get the optimal hyperparameters from the results
best_hps = tuner.get_best_hyperparameters()[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")


The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 512 and the optimal learning rate for the optimizer
is 0.001.



## Build and train the model

Now that you have the best set of hyperparameters, you can rebuild the hypermodel with these values and retrain it.

In [17]:
# Build the model with the optimal hyperparameters
h_model = tuner.hypermodel.build(best_hps)
h_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 784)               0         
                                                                 
 tuned_dense_1 (Dense)       (None, 512)               401920    
                                                                 
 dropout_1 (Dropout)         (None, 512)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                5130      
                                                                 
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________


In [18]:
# Train the hypertuned model
h_model.fit(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f453b6371f0>

In [19]:
# Evaluate the hypertuned model against the test set
h_eval_dict = h_model.evaluate(img_test, label_test, return_dict=True)



In [20]:
# Print results of the baseline and hypertuned model
print_results(b_model, 'BASELINE MODEL', 'dense_1', b_eval_dict)
print_results(h_model, 'HYPERTUNED MODEL', 'tuned_dense_1', h_eval_dict)


BASELINE MODEL:
number of units in 1st Dense layer: 512
learning rate for the optimizer: 0.0010000000474974513
loss: 0.35893580317497253
accuracy: 0.878600001335144

HYPERTUNED MODEL:
number of units in 1st Dense layer: 512
learning rate for the optimizer: 0.0010000000474974513
loss: 0.3539847731590271
accuracy: 0.878600001335144
