<a id='top'></a>
<a name="top"></a><!--Need for Colab-->
# Extra: KerasTuner (deep learning models)

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/gbih/ml-notes/blob/main/book_hands_on/20_01_kerastuner_deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
</table>

1. [Setup](#setup)
2. [Overview](#2.0)
3. [Examples](#3.0)
    * 3.1 [Keras Tuner with Hyperband](#3.1)
    * 3.2 [Keras Tuner with BayesianOptimization](#3.2)
    * 3.3 [Keras Tuner with RandomSearch](#3.3)
    * 3.4 [neptune.ai article](#3.4)

---
<a name="setup"></a>
# 1. Setup
<a href="#top">[back to top]</a>

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pprint
import sklearn
import sys
import tensorflow as tf
from tensorflow import keras
from pathlib import Path

# global seed
tf.random.set_seed(42)
np.random.seed(42)

pp = pprint.PrettyPrinter(indent=4)

# Need to install keras_tuner on colab
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    print("Installing Keras Tuner")
    !pip install -q -U keras-tuner 

import keras_tuner as kt
    
def HR():
    print("-"*40)    

print("Loaded libraries..")

Installing Keras Tuner
Loaded libraries..


In [9]:
DATA_ROOT = 'data_chp20_kt'
data_dir = Path() / DATA_ROOT
data_dir.mkdir(parents=True, exist_ok=True)

---
<a name="2.0"></a>
# 2. Overview
<a href="#top">[back to top]</a>

The Keras Tuner is a library that helps you pick the optimal set of hyperparameters for your TensorFlow program. The process of selecting the right set of hyperparameters for your machine learning (ML) application is called hyperparameter tuning or hypertuning.

Hyperparameters are the variables that govern the training process and the topology of an ML model. These variables remain constant over the training process and directly impact the performance of your ML program. Hyperparameters are of two types:

1. **Model hyperparameters**, which influence model selection, such as the number and width of hidden layers.
2. **Algorithm hyperparameters**, which influence the speed and quality of the learning algorithm, such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbors for a k-Nearest Neighbors (KNN) classifier.

In this tutorial, you will use the Keras Tuner to perform hypertuning for an image classification application.
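
To make the two categories concrete, here is a minimal sketch that writes a search space down as plain Python dicts and counts how many distinct configurations it contains. The dict layout and the `count_configurations` helper are hypothetical illustrations, not part of the KerasTuner API; the ranges mirror the ones used by `model_builder` in Section 3.

```python
# Hypothetical plain-Python description of a search space
# (an illustration only, not a KerasTuner API).
MODEL_HPARAMS = {
    "num_layers": list(range(1, 6)),    # 1..5 hidden layers
    "units": list(range(32, 513, 32)),  # width of each hidden layer
    "dropout": [True, False],           # add a Dropout layer or not
}
ALGORITHM_HPARAMS = {
    "learning_rate": [0.01, 0.001, 0.0001],
}


def count_configurations(model_hp, algo_hp):
    """Count distinct configurations when every layer picks its own width."""
    widths = len(model_hp["units"])
    # A network with n hidden layers has widths**n possible width assignments.
    architectures = sum(widths ** n for n in model_hp["num_layers"])
    architectures *= len(model_hp["dropout"])
    return architectures * len(algo_hp["learning_rate"])


total = count_configurations(MODEL_HPARAMS, ALGORITHM_HPARAMS)
print(f"{total:,} distinct configurations")
```

Even this small space is far too large to train exhaustively, which is why a tuner samples only a subset of it.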


---
<a name="3.0"></a>
# 3. Examples
<a href="#top">[back to top]</a>

A simple framework that loads the data, runs the search with the tuner passed in, then fits and evaluates the best model.

In [6]:
# Define the hypermodel for standard MLP model

# Part 1:

def model_builder(hp):
    print("== Calling model_builder.")
    print("hp.values:")
    pp.pprint(hp.values)

    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28,28)))

    for i in range(hp.Int('num_layers', 1, 5)):
        model.add(
            keras.layers.Dense(
                units=hp.Int(f"units_{i}", min_value=32, max_value=512, step=32),
                activation='relu'
            )
        )

    if hp.Boolean("dropout"):
        model.add(keras.layers.Dropout(rate=0.25))
    model.add(keras.layers.Dense(10))

    model.compile(
        # Tune the learning rate for the optimizer
        optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', [0.01, 0.001, 0.0001])),
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )
    print("== Exiting model_builder.")
    HR()
    return model



def kt_framework(tuner):
    
    # Download and prepare the dataset
    (img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()
        
    # Labels are integer class ids, so the model loss is
    # 'SparseCategoricalCrossentropy'.
    # Number of distinct classes (10 for Fashion MNIST); not used below.
    label_choices = len(np.unique(label_train))

    # Normalize pixel values between 0 and 1
    img_train = img_train.astype('float32') / 255.0
    img_test = img_test.astype('float32') / 255.0

    # Tuner passed as parameter

    HR()

    # Callback that stops training once val_loss has not improved for 5 consecutive epochs
    stop_early = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5
    )
    
    # Run the hyperparameter search.
    # tuner.search() takes the same arguments as model.fit()
    print("Run the hyperparameter search.")
    tuner.search(
        img_train, 
        label_train,
        epochs=50,
        #epochs=20,
        validation_split=0.2,
        callbacks=[stop_early],
        verbose=2
    )
    HR()
    
    ####################################

    # Part 2:
    
    # Get the optimal hyperparameters
    best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

    print("===== The hyperparameter search is complete =====")
    pp.pprint(best_hps.values)
        
    # Find the optimal number of epochs: build a model with the best
    # hyperparameters, train it for 50 epochs, and record the epoch with
    # the highest validation accuracy.
    print("Build the model with the optimal hyperparameters and train it on the data for n-epochs.")
    model = tuner.hypermodel.build(best_hps)
    
    history = model.fit(
        img_train,
        label_train,
        epochs=50,
        #epochs=2,
        validation_split=0.2,
        verbose=2
    )
    
    val_acc_per_epoch = history.history['val_accuracy']
    best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
    print(f"Best epoch: {best_epoch}")
    HR()
    
    
    # Re-instantiate the hypermodel and train it with the optimal number of epochs from above.
    print("Re-instantiate the hypermodel and train it with the optimal number of epochs from above.")
    hypermodel = tuner.hypermodel.build(best_hps)

    # Retrain the model
    print("Retrain the model:\n")
    hypermodel.fit(
        img_train, 
        label_train, 
        epochs=best_epoch, 
        validation_split=0.2,
        verbose=2
    )
    HR()

    # summary() prints the table itself and returns None, so do not wrap it in print()
    hypermodel.summary()
    HR() 

    # Evaluate the hypermodel on the test data.
    print("Evaluate the hypermodel on the test data.")
    eval_result = hypermodel.evaluate(img_test, label_test)
    print("[test loss, test accuracy]:", eval_result)
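
The best-epoch step in `kt_framework` can be checked in isolation. The sketch below uses a hypothetical `best_epoch` helper and made-up metric values (not results from this notebook) to show that the selection returns a 1-based epoch number and that ties go to the earliest epoch. The `mode` argument is an added assumption for picking by `val_loss` instead of `val_accuracy`.

```python
def best_epoch(values, mode="max"):
    """Return the 1-based epoch with the best metric value.

    mode="max" suits metrics such as val_accuracy,
    mode="min" suits metrics such as val_loss.
    """
    pick = max if mode == "max" else min
    # list.index returns the first match, so ties go to the earliest epoch.
    return values.index(pick(values)) + 1


# Made-up per-epoch values, for illustration only.
fake_val_accuracy = [0.81, 0.86, 0.88, 0.87, 0.88]
fake_val_loss = [0.50, 0.40, 0.45]

print(best_epoch(fake_val_accuracy))         # highest accuracy
print(best_epoch(fake_val_loss, mode="min"))  # lowest loss
```

Because the tuners in this notebook optimize `val_loss`, selecting the epoch with `mode="min"` on `history.history['val_loss']` would be a consistent alternative to the accuracy-based choice.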

In [7]:
# Quickly test that the model builds successfully.
# kt.HyperParameters() is a container for both the hyperparameter space
# and the current values. An instance can be passed to
# HyperModel.build(hp) as an argument to build a model.

result = model_builder(kt.HyperParameters())
print(type(result)) # keras.engine.sequential.Sequential

== Calling model_builder.
hp.values:
{}
== Exiting model_builder.
----------------------------------------
<class 'keras.engine.sequential.Sequential'>


<a name="3.1"></a>
## 3.1 Keras Tuner with Hyperband
<a href="#top">[back to top]</a>

In [5]:
tuner_hyperband = kt.Hyperband(
    model_builder,
    objective='val_loss',
    max_epochs=10,
    factor=3,
    directory=f"{data_dir}/kt_hyperband",
    project_name='kt_hyperband_project',
)

kt_framework(tuner_hyperband)

Trial 30 Complete [00h 00m 38s]
val_loss: 0.444957971572876

Best val_loss So Far: 0.3163602948188782
Total elapsed time: 00h 10m 08s
INFO:tensorflow:Oracle triggered exit
----------------------------------------
===== The hyperparameter search is complete =====
{   'dropout': True,
    'learning_rate': 0.001,
    'num_layers': 4,
    'tuner/bracket': 0,
    'tuner/epochs': 10,
    'tuner/initial_epoch': 0,
    'tuner/round': 0,
    'units_0': 352,
    'units_1': 384,
    'units_2': 128,
    'units_3': 96,
    'units_4': 64}
Build the model with the optimal hyperparameters and train it on the data for n-epochs.
== Calling model_builder.
hp.values:
{   'dropout': True,
    'learning_rate': 0.001,
    'num_layers': 4,
    'tuner/bracket': 0,
    'tuner/epochs': 10,
    'tuner/initial_epoch': 0,
    'tuner/round': 0,
    'units_0': 352,
    'units_1': 384,
    'units_2': 128,
    'units_3': 96,
    'units_4': 64}
== Exiting model_builder.
----------------------------------------
Epoch 1/5

<a name="3.2"></a>
## 3.2 Keras Tuner with BayesianOptimization
<a href="#top">[back to top]</a>

In [10]:
tuner_bayes_optimization = kt.BayesianOptimization(
    model_builder,
    #objective='val_accuracy',
    objective='val_loss',
    max_trials=10,
    seed=42,
    directory=f'{data_dir}/kt_bayes_optimization',
    project_name='kt_bayes_optimization_project',
    #overwrite=True
)

kt_framework(tuner_bayes_optimization)

Trial 10 Complete [00h 01m 52s]
val_loss: 0.2962571084499359

Best val_loss So Far: 0.2794955372810364
Total elapsed time: 00h 23m 31s
INFO:tensorflow:Oracle triggered exit
----------------------------------------
===== The hyperparameter search is complete =====
{   'dropout': True,
    'learning_rate': 0.0001,
    'num_layers': 1,
    'units_0': 512,
    'units_1': 32,
    'units_2': 32,
    'units_3': 32,
    'units_4': 32}
Build the model with the optimal hyperparameters and train it on the data for n-epochs.
== Calling model_builder.
hp.values:
{   'dropout': True,
    'learning_rate': 0.0001,
    'num_layers': 1,
    'units_0': 512,
    'units_1': 32,
    'units_2': 32,
    'units_3': 32,
    'units_4': 32}
== Exiting model_builder.
----------------------------------------
Epoch 1/50
1500/1500 - 4s - loss: 0.6764 - accuracy: 0.7740 - val_loss: 0.4817 - val_accuracy: 0.8397 - 4s/epoch - 3ms/step
Epoch 2/50
1500/1500 - 3s - loss: 0.4670 - accuracy: 0.8407 - val_loss: 0.4256 - val_a

<a name="3.3"></a>
## 3.3 Keras Tuner with RandomSearch
<a href="#top">[back to top]</a>

In [11]:
tuner_random_search = kt.RandomSearch(
    model_builder,
    #objective='val_accuracy',
    objective='val_loss',
    max_trials=10,
    seed=42,
    directory=f'{data_dir}/kt_random_search',
    project_name='kt_random_search_project',
)

kt_framework(tuner_random_search)

Trial 10 Complete [00h 01m 41s]
val_loss: 0.31762030720710754

Best val_loss So Far: 0.2984068691730499
Total elapsed time: 00h 14m 06s
INFO:tensorflow:Oracle triggered exit
----------------------------------------
===== The hyperparameter search is complete =====
{   'dropout': True,
    'learning_rate': 0.0001,
    'num_layers': 3,
    'units_0': 480,
    'units_1': 64,
    'units_2': 320,
    'units_3': 128,
    'units_4': 192}
Build the model with the optimal hyperparameters and train it on the data for n-epochs.
== Calling model_builder.
hp.values:
{   'dropout': True,
    'learning_rate': 0.0001,
    'num_layers': 3,
    'units_0': 480,
    'units_1': 64,
    'units_2': 320,
    'units_3': 128,
    'units_4': 192}
== Exiting model_builder.
----------------------------------------
Epoch 1/50
1500/1500 - 4s - loss: 0.6731 - accuracy: 0.7699 - val_loss: 0.4469 - val_accuracy: 0.8450 - 4s/epoch - 3ms/step
Epoch 2/50
1500/1500 - 4s - loss: 0.4291 - accuracy: 0.8490 - val_loss: 0.3937 

<a name="3.4"></a>
## 3.4 neptune.ai article
<a href="#top">[back to top]</a>

https://neptune.ai/blog/keras-tuner-tuning-hyperparameters-deep-learning-model

High-level overview of available tuners

How can we get the most out of our model using Keras Tuner? First, note that Keras Tuner provides several tuners, each using a different algorithm for the hyperparameter search. The article lists them under the older `kerastuner` module paths; in this notebook they are used as `kt.Hyperband`, `kt.BayesianOptimization` and `kt.RandomSearch`:

1. `kerastuner.tuners.hyperband.Hyperband` for the Hyperband-based algorithm;
2. `kerastuner.tuners.bayesian.BayesianOptimization` for the Gaussian process-based algorithm;
3. `kerastuner.tuners.randomsearch.RandomSearch` for the random search tuner.

As a first intuition, RandomSearch is the least efficient of the three approaches. It does not learn from previously tested combinations and simply samples hyperparameter combinations at random from the search space.
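
The random sampling described above can be sketched in a few lines of plain Python. This is an illustration of the idea, not KerasTuner's implementation: `SEARCH_SPACE`, `toy_val_loss` and `random_search` are hypothetical names, and the toy objective is a made-up stand-in for "train a model and return its validation loss".

```python
import random

# Hypothetical discrete search space, loosely modelled on this notebook.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4, 5],
    "learning_rate": [0.01, 0.001, 0.0001],
    "dropout": [True, False],
}


def toy_val_loss(config):
    """Made-up objective standing in for a real training run."""
    loss = 0.1 * abs(config["num_layers"] - 3)
    loss += 10 * abs(config["learning_rate"] - 0.001)
    loss += 0.0 if config["dropout"] else 0.05
    return loss


def random_search(space, objective, max_trials, seed):
    """Sample configurations independently; earlier trials are never consulted."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        config = {name: rng.choice(choices) for name, choices in space.items()}
        trials.append((objective(config), config))
    best = min(trials, key=lambda trial: trial[0])
    return best, trials


best, trials = random_search(SEARCH_SPACE, toy_val_loss, max_trials=10, seed=42)
print("best loss:", best[0])
print("best config:", best[1])
```

Each trial is drawn without looking at the scores of earlier trials, which is exactly the property that makes random search simple but wasteful.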

**BayesianOptimization** is similar to RandomSearch in that both sample a subset of hyperparameter combinations. The key difference is that BayesianOptimization does not sample combinations randomly; it follows a probabilistic approach under the hood. This approach takes already tested combinations into account and uses that information to choose the next combination to test.

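Hyperband is an optimized version of RandomSearch in terms of search time and, therefore, resource allocation. It trains many randomly sampled configurations for only a few epochs and continues training only the most promising ones.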
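
The resource saving comes from successive halving, the building block that Hyperband repeats with different starting budgets. The sketch below is a simplified, single-bracket illustration under stated assumptions: `successive_halving`, `CONFIGS` and `toy_val_loss` are hypothetical names, the objective is made up, and KerasTuner's actual bracket schedule differs in detail.

```python
def successive_halving(configs, evaluate, min_epochs=1, factor=3):
    """Keep the best 1/factor of the configurations each round,
    multiplying the per-configuration epoch budget by factor."""
    survivors = list(configs)
    epochs = min_epochs
    schedule = []  # (number of configurations trained, epochs each)
    while len(survivors) > 1:
        schedule.append((len(survivors), epochs))
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, epochs))
        keep = max(1, len(survivors) // factor)
        survivors = ranked[:keep]
        epochs *= factor
    return survivors[0], schedule


# Nine hypothetical configurations: learning rate = 10 ** lr_exp.
CONFIGS = [{"lr_exp": e} for e in range(-9, 0)]


def toy_val_loss(config, epochs):
    """Made-up loss: best near lr = 1e-3, improving with more epochs."""
    return abs(config["lr_exp"] + 3) + 1.0 / epochs


winner, schedule = successive_halving(CONFIGS, toy_val_loss)
total_epochs = sum(n * e for n, e in schedule)
print("winner:", winner)
print("schedule (configs, epochs each):", schedule)
print("total epochs trained:", total_epochs)
```

With nine configurations and `factor=3`, the rounds train 9 configurations for 1 epoch and then 3 configurations for 3 epochs, 18 epochs in total, compared with 81 epochs if all nine were trained for 9 epochs each.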

To learn more about Random Search, Bayesian Optimization and HyperBand, see this article: https://neptune.ai/blog/hyperband-and-bohb-understanding-state-of-the-art-hyperparameter-optimization-algorithms