# **Tuning Hyperparameters of an Artificial Neural Network Using Keras Tuner**

Hyperparameter tuning is the process of searching for the set of hyperparameters that gives the best model performance. Finding that set by hand is difficult, so search algorithms are used to automate it. Grid search performs an exhaustive search over every combination, which is time-consuming by nature. Random search instead samples the hyperparameter search space at random; it is cheaper, but it doesn't guarantee a globally optimal solution. Bayesian optimization, Hyperband, and genetic algorithms use the results of earlier trials or adaptive resource allocation to search more efficiently, although none of them guarantees a global optimum either.
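
To make the cost difference concrete, here is a minimal sketch that enumerates a grid and draws a random sample from the same space. The search space itself (`units`, `learning_rate`, `activation`) is a hypothetical example chosen for illustration, not the one tuned later in this notebook.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
search_space = {
    "units": list(range(32, 513, 16)),      # 31 candidate layer widths
    "learning_rate": [1e-2, 1e-3, 1e-4],    # 3 candidate learning rates
    "activation": ["relu", "tanh", "sigmoid"],
}

# Grid search: enumerate every combination (the Cartesian product).
names = list(search_space)
grid = [dict(zip(names, combo)) for combo in itertools.product(*search_space.values())]
print(len(grid))  # 31 * 3 * 3 = 279 models to train

# Random search: sample a fixed budget of combinations, independent of grid size.
rng = random.Random(42)
n_trials = 10
random_trials = [
    {name: rng.choice(values) for name, values in search_space.items()}
    for _ in range(n_trials)
]
print(len(random_trials))  # 10 models to train
```

The grid grows multiplicatively with every hyperparameter added, while the random-search budget stays whatever you set it to.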

**Hyperparameters of an Artificial Neural Network are:**

*   Number of layers to choose.
*   Number of neurons in a layer to choose.
*   Choice of the optimization function.
*   Choice of the learning rate for optimization function.
*   Choice of the loss function.
*   Choice of metrics.
*   Choice of activation function.
*   Choice of layer weight initialization.

## **KerasTuner**

[**KerasTuner**](https://keras.io/keras_tuner/) is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search. Easily configure your search space with a define-by-run syntax, then leverage one of the available search algorithms to find the best hyperparameter values for your models. KerasTuner comes with Bayesian Optimization, Hyperband, and Random Search algorithms built-in, and is also designed to be easy for researchers to extend in order to experiment with new search algorithms.

> [Introduction to the Keras Tuner](https://www.tensorflow.org/tutorials/keras/keras_tuner)


Keras Tuner is an open-source Python library developed primarily for tuning the hyperparameters of artificial neural networks. Keras Tuner currently supports four types of tuners or algorithms:

*   **Bayesian Optimization**
*   **Hyperband**
*   **Sklearn**
*   **Random Search**

In [None]:
!pip install keras-tuner

In [None]:
# Import Library.
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

TRAIN_DATA_PATH = "/content/sample_data/california_housing_train.csv"
TEST_DATA_PATH = "/content/sample_data/california_housing_test.csv"
TARGET_NAME = "median_house_value"

# Load Dataset.
train_data = pd.read_csv(TRAIN_DATA_PATH)
test_data = pd.read_csv(TEST_DATA_PATH)

# Split the data into features and target sets.
X_train, y_train = train_data.drop(TARGET_NAME, axis=1), train_data[TARGET_NAME]
X_test, y_test = test_data.drop(TARGET_NAME, axis=1), test_data[TARGET_NAME]

# Feature Scaling.
scaler = StandardScaler()

X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)

**Let's tune the model with Keras Tuner. The tuner below is defined with a model-builder function, `build_model`, which receives a hyperparameter object `hp` and returns a compiled model.**

In [None]:
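# The `keras-tuner` package is imported as `keras_tuner`; `kerastuner` is the deprecated legacy name.
import keras_tuner as kt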


def build_model(hp):
    model = tf.keras.Sequential()

    # Tune the number of units in each of the three hidden Dense layers (32 to 512, in steps of 16).
    hp_units1 = hp.Int("units1", min_value=32, max_value=512, step=16)
    hp_units2 = hp.Int("units2", min_value=32, max_value=512, step=16)
    hp_units3 = hp.Int("units3", min_value=32, max_value=512, step=16)
    model.add(tf.keras.layers.Dense(units=hp_units1, activation="relu"))
    model.add(tf.keras.layers.Dense(units=hp_units2, activation="relu"))
    model.add(tf.keras.layers.Dense(units=hp_units3, activation="relu"))
    model.add(tf.keras.layers.Dense(1, kernel_initializer="normal", activation="linear"))

    # Tune the learning rate for the optimizer. Choose an optimal value from 0.01, 0.001, or 0.0001.
    hp_learning_rate = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])

    # Compile the Model.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
        loss="MeanSquaredLogarithmicError",
        metrics=["MeanSquaredLogarithmicError"],
    )

    return model


# HyperBand Algorithm from Keras Tuner.
tuner = kt.Hyperband(
    build_model,
    objective="val_mean_squared_logarithmic_error",
    max_epochs=10,
    directory="keras_tuner_dir",
    project_name="keras_tuner_demo",
)

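# Hyperband chooses the number of epochs for each trial itself (up to `max_epochs`),
# so `epochs` is not passed here. `validation_split` is dropped because Keras
# ignores it when `validation_data` is given.
# Note: the test set doubles as the validation set, so later test scores are optimistic.
tuner.search(
    X_train_scaled,
    y_train,
    validation_data=(X_test_scaled, y_test),
)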

Trial 30 Complete [00h 00m 41s]
val_mean_squared_logarithmic_error: 0.09914283454418182

Best val_mean_squared_logarithmic_error So Far: 0.09738881886005402
Total elapsed time: 00h 05m 42s
INFO:tensorflow:Oracle triggered exit
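
The objective reported above is the mean squared logarithmic error (MSLE), which compares predictions and targets on a log scale, so it penalizes relative rather than absolute errors. The sketch below computes it by hand; the house values are made-up numbers, and any clipping Keras may apply to non-positive inputs is ignored.

```python
import math


def msle(y_true, y_pred):
    """Mean of (log(1 + y_true) - log(1 + y_pred))^2 over all samples."""
    errors = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


# Hypothetical house values and predictions, for illustration only.
y_true = [100_000.0, 250_000.0, 400_000.0]
y_pred = [110_000.0, 200_000.0, 400_000.0]
print(msle(y_true, y_pred))
```

Because the error is taken on logarithms, predicting 110,000 for a 100,000 house costs about as much as predicting 440,000 for a 400,000 one.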


In [None]:
# The best hyper-parameters can be fetched using the method `get_best_hyperparameters()` in the tuner instance.
for h_param in [f"units{i}" for i in range(1, 4)] + ["learning_rate"]:
    print(h_param, tuner.get_best_hyperparameters()[0].get(h_param))

units1 160
units2 464
units3 432
learning_rate 0.01


In [None]:
# Select the Best Model which is saved in the tuner instance.
best_model = tuner.get_best_models()[0]
best_model.build(X_train_scaled.shape)
best_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (17000, 160)              1440      
                                                                 
 dense_1 (Dense)             (17000, 464)              74704     
                                                                 
 dense_2 (Dense)             (17000, 432)              200880    
                                                                 
 dense_3 (Dense)             (17000, 1)                433       
                                                                 
Total params: 277,457
Trainable params: 277,457
Non-trainable params: 0
_________________________________________________________________
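
The parameter counts in the summary can be checked by hand: a Dense layer with `n_inputs` inputs and `n_units` units has `n_inputs * n_units` weights plus `n_units` biases. The sketch below assumes 8 input features, which is what the first layer's count of 1440 implies for 160 units.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input width (assumed 8 features) followed by the units of each Dense layer.
layer_sizes = [8, 160, 464, 432, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [1440, 74704, 200880, 433]
print(sum(counts))  # 277457
```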


In [None]:
# Fit the Best Model.
# `validation_split` is omitted: Keras ignores it when `validation_data` is passed.
best_model.fit(
    X_train_scaled,
    y_train,
    validation_data=(X_test_scaled, y_test),
    epochs=20,
    batch_size=64,
)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f5c7582d3d0>

**Alternatively, we can define the hypermodel by subclassing the `HyperModel` class in Keras Tuner.**

**`HyperModel` is a Keras Tuner class that lets you define a model together with its searchable hyperparameter space and build it.**

In [None]:
# Create a class that inherits from keras_tuner.HyperModel.
from keras_tuner import HyperModel


class RegressionHyperModel(HyperModel):
    def __init__(self, input_shape):
        # Initialise the HyperModel base class before storing our own attributes.
        super().__init__()
        self.input_shape = input_shape

    def build(self, hp):
        model = tf.keras.Sequential()

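        # First hidden layer. Each hyperparameter needs a unique name: reusing a name
        # returns the value already registered under it instead of defining a new one.
        model.add(
            tf.keras.layers.Dense(
                units=hp.Int("units_1", min_value=8, max_value=64, step=4, default=8),
                activation=hp.Choice(
                    "dense_activation",
                    values=["relu", "tanh", "sigmoid"],
                    default="relu",
                ),
                input_shape=self.input_shape,
            )
        )

        # Second hidden layer. The "dense_activation" name is reused, so both
        # hidden layers share a single activation choice.
        model.add(
            tf.keras.layers.Dense(
                units=hp.Int("units_2", min_value=16, max_value=64, step=4, default=16),
                activation=hp.Choice(
                    "dense_activation",
                    values=["relu", "tanh", "sigmoid"],
                    default="relu",
                ),
            )
        )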

        model.add(
            tf.keras.layers.Dropout(
                hp.Float("dropout", min_value=0.0, max_value=0.1, default=0.005, step=0.01)
            )
        )

        model.add(tf.keras.layers.Dense(1))

        # Tune the learning rate for the optimizer. Choose an optimal value from 0.01, 0.001, or 0.0001.
        hp_learning_rate = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])

        # Compile the Model.
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
            loss="mse",
            metrics=["mse"],
        )

        return model

In [None]:
# Instantiate HyperModel.
hypermodel = RegressionHyperModel(input_shape=(X_train_scaled.shape[1],))

### **Random Search**

Random Search is a hyperparameter tuning method which randomly tries a combination of hyperparameters from a given search space. To use this method in keras tuner, let's define a tuner using one of the available Tuners. Here's a full list of [Tuners](https://keras.io/api/keras_tuner/tuners/).

In [None]:
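# RandomSearch Algorithm from Keras Tuner.
# Note: objective="mse" ranks trials by training MSE; "val_mse" would rank them
# by MSE on the validation data instead.
tuner_rs = kt.RandomSearch(
    hypermodel, objective="mse", max_trials=10, executions_per_trial=2, seed=42
)

# Run the random search tuner using the search method.
# `validation_split` is omitted: Keras ignores it when `validation_data` is passed.
tuner_rs.search(
    X_train_scaled,
    y_train,
    validation_data=(X_test_scaled, y_test),
    epochs=20,
)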

# Select the best model the tuner found and evaluate it on the test set.
# `evaluate` returns [loss, mse]; the two values match because the loss is MSE.
best_model = tuner_rs.get_best_models(num_models=1)[0]
best_model.evaluate(X_test_scaled, y_test)

Trial 10 Complete [00h 00m 38s]
mse: 48688111616.0

Best mse So Far: 4101500032.0
Total elapsed time: 00h 06m 29s
INFO:tensorflow:Oracle triggered exit


[4365881344.0, 4365881344.0]

### **Hyperband**

Hyperband is based on the algorithm by Li et al. It speeds up random search through adaptive resource allocation and early stopping. Hyperband first trains many randomly sampled hyperparameter configurations for only a few epochs. It then keeps the configurations that perform best, discards the rest, and continues training the survivors with a larger budget, repeating until only the best performers remain.
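
The keep-the-best-and-grow-the-budget loop at the heart of Hyperband is known as successive halving. The sketch below illustrates that idea in plain Python under stated assumptions: `toy_loss` is a hypothetical stand-in for "train for `budget` epochs and return the validation loss", and the function names and reduction factor of 3 are illustrative choices. It is not KerasTuner's implementation, which also runs several such brackets with different starting budgets.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, factor=3):
    """Keep the best 1/factor of the configs each round while growing the budget."""
    budget = min_budget
    survivors = list(configs)
    history = []  # (budget, number of configs evaluated) per round
    while len(survivors) > 1:
        # Score every surviving config with the current budget (lower is better).
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        history.append((budget, len(survivors)))
        survivors = scored[: max(1, len(scored) // factor)]
        budget *= factor
    return survivors[0], history


def toy_loss(cfg, budget):
    # Hypothetical loss: best near learning_rate = 0.003, improving with more budget.
    return (cfg["learning_rate"] - 0.003) ** 2 + 1.0 / budget


rng = random.Random(0)
configs = [{"learning_rate": 10 ** rng.uniform(-4, -1)} for _ in range(27)]
best, history = successive_halving(configs, toy_loss)
print(history)  # [(1, 27), (3, 9), (9, 3)]
print(best)
```

Most of the 27 configurations only ever receive the smallest budget, which is where the savings over plain random search come from.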

In [None]:
# Hyperband Algorithm from Keras Tuner.
tuner_hb = kt.Hyperband(
    hypermodel,
    max_epochs=10,
    objective="mse",
    seed=42,
    executions_per_trial=2,
    directory="tuner_dir",
    project_name="hyperband_tuner_demo",
)

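# Run the Hyperband tuner using the search method.
# Hyperband sets the number of epochs per trial itself (up to `max_epochs`), so
# `epochs` is not passed; `validation_split` is ignored when `validation_data` is given.
tuner_hb.search(
    X_train_scaled,
    y_train,
    validation_data=(X_test_scaled, y_test),
)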

# Select the best model the tuner found and evaluate it on the test set.
# `evaluate` returns [loss, mse]; the two values match because the loss is MSE.
best_model = tuner_hb.get_best_models(num_models=1)[0]
best_model.evaluate(X_test_scaled, y_test)

Trial 30 Complete [00h 00m 19s]
mse: 50970540032.0

Best mse So Far: 4313776896.0
Total elapsed time: 00h 05m 05s
INFO:tensorflow:Oracle triggered exit


[4558466048.0, 4558466048.0]

### **Bayesian Optimization**

Bayesian Optimization builds a probabilistic surrogate model that maps hyperparameter values to a predicted score on the objective function. Unlike Random Search and Hyperband, which sample new configurations at random, Bayesian Optimization keeps track of its past evaluation results and uses them to update the model and choose the next hyperparameters to try.
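
One common way to turn such a model into a decision is a confidence-bound rule. As a sketch, assume the model fitted to past trials gives a predicted mean $\mu(x)$ and an uncertainty $\sigma(x)$ for every candidate $x$ in the search space $\mathcal{X}$, and that the objective is minimized. The next trial is then

$$
x_{\text{next}} = \arg\min_{x \in \mathcal{X}} \; \bigl( \mu(x) - \beta \, \sigma(x) \bigr)
$$

Here $\beta \ge 0$ is an exploration weight: with $\beta = 0$ the search always picks the point the model currently predicts to be best, while larger values favor regions the model is still unsure about. `kt.BayesianOptimization` accepts a `beta` argument that plays this kind of role, though the exact rule it applies may differ from this sketch.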

In [None]:
# Bayesian Optimization from Keras Tuner.
tuner_bo = kt.BayesianOptimization(
    hypermodel,
    objective="mse",
    max_trials=10,
    seed=42,
    executions_per_trial=2,
    directory="tuner_dir",
    project_name="bayesian_tuner_demo",
)

# Run the Bayesian Optimization using the search method.
# `validation_split` is omitted: Keras ignores it when `validation_data` is passed.
tuner_bo.search(
    X_train_scaled,
    y_train,
    validation_data=(X_test_scaled, y_test),
    epochs=20,
)

# Select the best model the tuner found and evaluate it on the test set.
# `evaluate` returns [loss, mse]; the two values match because the loss is MSE.
best_model = tuner_bo.get_best_models(num_models=1)[0]
best_model.evaluate(X_test_scaled, y_test)

Trial 10 Complete [00h 00m 42s]
mse: 3989895552.0

Best mse So Far: 3989895552.0
Total elapsed time: 00h 06m 37s
INFO:tensorflow:Oracle triggered exit


[4333397504.0, 4333397504.0]