<a href="https://colab.research.google.com/github/luigiselmi/dl_tensorflow/blob/main/best_practices.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Best practices
In this notebook we'll see how to improve a model's performance by tuning its architecture-level hyperparameters such as:

* the number of layers
* the number of units or filters in a layer
* the activation function of a layer
* the amount of dropout
* batch normalization layers
* the optimizer and its learning rate

Tuning is a search in the hyperparameter space, and it is much better done automatically and systematically than by hand. Keras provides a tool, [KerasTuner](https://keras.io/keras_tuner/getting_started/), to search for the best hyperparameters. Instead of a single value, we give the tool a range or a set of candidate values for each hyperparameter. To use it, we define a function that builds our model from the hyperparameter values chosen by the tool. We start by installing the package.

In [1]:
!pip install keras-tuner -q


In [2]:
import keras
from keras import layers

## The model-building function
We define a function that takes a single argument, `hp`, which KerasTuner uses to pass in the hyperparameter values for the model. The function creates, compiles, and returns the model, which is then fit to the data. As a first example, we define a function that builds a model with two fully connected layers and

* a variable number of units in the first layer, between 16 and 64, with a step of 16, that is 16, 32, 48, and 64
* two different optimizers: rmsprop and adam

In [3]:
def build_model(hp):
    units = hp.Int(name="units", min_value=16, max_value=64, step=16)
    model = keras.Sequential([
        layers.Dense(units, activation="relu"),
        layers.Dense(10, activation="softmax")
    ])

    optimizer = hp.Choice(name="optimizer", values=["rmsprop", "adam"])

    model.compile(
        optimizer=optimizer,
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])

    return model

We can achieve the same result in a more modular way by subclassing the `HyperModel` class and overriding its _build()_ method.

In [4]:
import keras_tuner as kt

class SimpleMLP(kt.HyperModel):
    def __init__(self, num_classes):
        super().__init__()  # let HyperModel set up its own attributes
        self.num_classes = num_classes

    def build(self, hp):
        units = hp.Int(name="units", min_value=16, max_value=64, step=16)
        model = keras.Sequential([
            layers.Dense(units, activation="relu"),
            layers.Dense(self.num_classes, activation="softmax")
        ])

        optimizer = hp.Choice(name="optimizer", values=["rmsprop", "adam"])

        model.compile(
            optimizer=optimizer,
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

        return model

hypermodel = SimpleMLP(num_classes=10)

## The tuner
The hyperparameter search space for our model has 4 * 2 = 8 possible configurations. KerasTuner automatically builds, fits, and evaluates a model for each configuration it tries, up to `max_trials` trials, and saves the results of the search in a directory. In our example the tuner ranks the models by validation accuracy. The tuner can use different algorithms to explore the hyperparameter space: random search, grid search, Bayesian optimization, and others.
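The count of 8 can be checked by enumerating the grid by hand. The following is a minimal sketch in plain Python, independent of KerasTuner; the variable names are illustrative.

```python
from itertools import product

# Values produced by an integer range from 16 to 64 with a step of 16
unit_values = list(range(16, 64 + 1, 16))
# Candidate optimizers
optimizer_values = ["rmsprop", "adam"]

# Every (units, optimizer) pair is one point of the search space
search_space = list(product(unit_values, optimizer_values))

print(unit_values)        # [16, 32, 48, 64]
print(len(search_space))  # 8
for units, optimizer in search_space:
    print(units, optimizer)
```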

In [5]:
tuner = kt.BayesianOptimization(
    build_model,
    objective="val_accuracy",
    max_trials=10,  # e.g. 100 for a longer search
    executions_per_trial=2,
    directory="mnist_kt_test",
    overwrite=True,
)
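As a rough guide to the cost of this search, the settings above bound the number of training runs. The sketch below is plain arithmetic on the values passed to the tuner and on the 20 epochs used in the `search()` call of this notebook; early stopping can only lower the epoch count.

```python
max_trials = 10           # configurations the tuner may try
executions_per_trial = 2  # each configuration is trained this many times
num_epochs = 20           # epochs per training run, as passed to search()

# Upper bounds on the work done by the search
max_fits = max_trials * executions_per_trial
max_epochs_total = max_fits * num_epochs

print(max_fits)          # 20
print(max_epochs_total)  # 400
```

Training each configuration more than once reduces the effect of random weight initialization on the measured validation accuracy, at the price of a longer search.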

In [6]:
tuner.search_space_summary()

Search space summary
Default search space size: 2
units (Int)
{'default': None, 'conditions': [], 'min_value': 16, 'max_value': 64, 'step': 16, 'sampling': 'linear'}
optimizer (Choice)
{'default': 'rmsprop', 'conditions': [], 'values': ['rmsprop', 'adam'], 'ordered': False}


## Looking for the best model using the MNIST dataset
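We look for the best hyperparameter settings for a model that classifies the digits of the MNIST dataset. We hold out the last 10,000 training samples as a validation set and keep a reference to the full training set for the final retraining.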

In [8]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x_train.reshape((-1, 28 * 28)).astype("float32") / 255
x_test = x_test.reshape((-1, 28 * 28)).astype("float32") / 255
x_train_full = x_train[:]
y_train_full = y_train[:]

num_val_samples = 10000
x_train, x_val = x_train[:-num_val_samples], x_train[-num_val_samples:]
y_train, y_val = y_train[:-num_val_samples], y_train[-num_val_samples:]

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
]

num_epochs = 20

tuner.search(
    x_train, y_train,
    batch_size=128,
    epochs=num_epochs,  # e.g. 100 for a longer search
    validation_data=(x_val, y_val),
    callbacks=callbacks,
    verbose=2,
)

Trial 100 Complete [00h 00m 46s]
val_accuracy: 0.9749000072479248

Best val_accuracy So Far: 0.9772000014781952
Total elapsed time: 01h 18m 24s
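The `EarlyStopping` callback passed to the search ends a training run once `val_loss` has failed to improve for `patience` consecutive epochs. The function below is a simplified pure-Python sketch of that rule, not the Keras implementation: it ignores options such as `min_delta` and `restore_best_weights`, and the loss values are made up for illustration.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # Patience never ran out: training ends at the last epoch
    return len(val_losses)


# Hypothetical validation losses, one per epoch
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38]

print(early_stop_epoch(losses, patience=2))  # 5
print(early_stop_epoch(losses, patience=5))  # 6
```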


## Best hyperparameters configuration
After the hyperparameter search is complete we can query the tuner for the best configurations, ranked by the objective. Here we select the top 4 sets of hyperparameters.

In [9]:
top_n = 4
best_hps = tuner.get_best_hyperparameters(top_n)

The first set, at index 0, is the one that achieved the highest validation accuracy. We can inspect its values.

In [18]:
best_hp = best_hps[0]
best_hp.values

{'units': 64, 'optimizer': 'rmsprop'}
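Conceptually, the tuner ranks the trials by the objective and returns the first `top_n`. The sketch below mimics that ranking in plain Python; the trial scores are invented for illustration and are not the results of the search above.

```python
# Hypothetical (hyperparameters, val_accuracy) pairs, one per trial
trials = [
    ({"units": 16, "optimizer": "adam"}, 0.951),
    ({"units": 64, "optimizer": "rmsprop"}, 0.977),
    ({"units": 32, "optimizer": "adam"}, 0.966),
    ({"units": 48, "optimizer": "rmsprop"}, 0.972),
    ({"units": 64, "optimizer": "adam"}, 0.975),
]

top_n = 4
# Higher validation accuracy is better, so sort in descending order
ranked = sorted(trials, key=lambda trial: trial[1], reverse=True)
top_hps = [hps for hps, _ in ranked[:top_n]]

print(top_hps[0])  # {'units': 64, 'optimizer': 'rmsprop'}
```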

## Best model retraining
We can retrain one or more of the best models from their hyperparameters, using a higher number of epochs and early stopping to halt training when the model starts to overfit. The function below trains a model on the training subset and returns the epoch with the lowest validation loss.

In [14]:
def get_best_epoch(hp):
    model = build_model(hp)

    callbacks=[
        keras.callbacks.EarlyStopping(
            monitor="val_loss", mode="min", patience=10)
    ]

    history = model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=num_epochs,  # e.g. 100 for a longer run
        batch_size=128,
        callbacks=callbacks)

    val_loss_per_epoch = history.history["val_loss"]
    best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1
    print(f"Best epoch: {best_epoch}")
    return best_epoch
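The index arithmetic at the end of `get_best_epoch` can be tried on its own. In this sketch the list of validation losses is hypothetical; in the function, `history.history["val_loss"]` holds one such value per epoch.

```python
# Hypothetical validation losses, one per epoch (epoch 1 first)
val_loss_per_epoch = [0.24, 0.17, 0.15, 0.13, 0.12, 0.11, 0.12, 0.13]

# list.index returns the 0-based position of the first minimum,
# so we add 1 to get the 1-based epoch number
best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1
print(best_epoch)  # 6
```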

In [15]:
best_epoch = get_best_epoch(best_hp)
best_epoch

Epoch 1/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.8111 - loss: 0.7125 - val_accuracy: 0.9317 - val_loss: 0.2355
Epoch 2/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9362 - loss: 0.2288 - val_accuracy: 0.9515 - val_loss: 0.1739
Epoch 3/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9512 - loss: 0.1720 - val_accuracy: 0.9580 - val_loss: 0.1481
Epoch 4/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9593 - loss: 0.1398 - val_accuracy: 0.9639 - val_loss: 0.1275
Epoch 5/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9662 - loss: 0.1143 - val_accuracy: 0.9638 - val_loss: 0.1214
Epoch 6/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9715 - loss: 0.0988 - val_accuracy: 0.9672 - val_loss: 0.1089
Epoch 7/20
391/391

17

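When retraining the best models we use the full training set, without holding out a validation set. Since we train on more data, we also train for slightly longer: 20% more epochs than the best epoch found with the validation set.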

In [11]:
def get_best_trained_model(hp):
    best_epoch = get_best_epoch(hp)
    model = build_model(hp)
    model.fit(
        x_train_full, y_train_full,
        batch_size=128, epochs=int(best_epoch * 1.2))
    return model
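The factor 1.2 matches the growth of the training data in this notebook: the full MNIST training set has 60,000 samples against the 50,000 left after the validation split. The sketch below works through the numbers, using the best epoch of 17 reported earlier; reading the factor as a data-size ratio is an interpretation, not something the code enforces.

```python
import math

full_size = 60_000   # MNIST training samples
val_size = 10_000    # samples held out for validation
train_size = full_size - val_size
batch_size = 128

# The full set is 20% larger than the training subset
print(full_size / train_size)  # 1.2

best_epoch = 17
retrain_epochs = int(best_epoch * 1.2)
print(retrain_epochs)  # 20

# Batches per epoch on the training subset and on the full set
print(math.ceil(train_size / batch_size))  # 391
print(math.ceil(full_size / batch_size))   # 469
```

The 391 batches per epoch agree with the `391/391` progress counter in the training logs.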

We retrain the best models and evaluate each of them on the test set.

In [25]:
best_models_retrained = []
for hp in best_hps:
    model = get_best_trained_model(hp)
    model.evaluate(x_test, y_test)
    best_models_retrained.append(model)

Epoch 1/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8197 - loss: 0.6804 - val_accuracy: 0.9386 - val_loss: 0.2289
Epoch 2/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9373 - loss: 0.2225 - val_accuracy: 0.9499 - val_loss: 0.1778
Epoch 3/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9499 - loss: 0.1733 - val_accuracy: 0.9603 - val_loss: 0.1494
Epoch 4/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9595 - loss: 0.1377 - val_accuracy: 0.9637 - val_loss: 0.1339
Epoch 5/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9674 - loss: 0.1153 - val_accuracy: 0.9658 - val_loss: 0.1237
Epoch 6/20
391/391 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9712 - loss: 0.0999 - val_accuracy: 0.9654 - val_loss: 0.1157
Epoch 7/20
391/391

In [26]:
best_model_retrained = best_models_retrained[0]
best_model_retrained.summary()

If we do not want to retrain the best models, we can simply reload them from the tuner, with the weights saved during the search.

In [27]:
best_models = tuner.get_best_models(top_n)

