# Hyperparameter tuning with TensorFlow Keras Tuner

In neural networks, hyperparameters are the configuration settings used to control the training process. Unlike model parameters (such as weights and biases), hyperparameters are set before training begins. They include settings such as learning rate, number of layers and neurons, batch size, epochs and more.

Keras Tuner is a library that helps find a well-performing set of hyperparameters for a model. It automates the search through the hyperparameter space, so we do not have to try configurations by hand.

**Here is how it works:**
1. Define a hypermodel and the search space: Create a model-building function that specifies which hyperparameters to tune and their possible values. This function should take hyperparameters as input and return a compiled model.
2. Choose a tuning strategy and initialize the tuner: Keras Tuner provides several search algorithms to explore the hyperparameter space efficiently.
3. Perform the search: The tuner evaluates multiple models with different hyperparameter settings and finds the best combination.
4. Retrieve the best model: Once the tuning is complete, we can retrieve the model with the optimal hyperparameters.

**Types of tuners:**
1. Random search: This tuner randomly samples the hyperparameter space and builds the model for each sample. It is useful for exploring a wide hyperparameter space quickly.
2. Bayesian optimization: This tuner uses a probabilistic model to choose hyperparameters, aiming to find the best values in fewer iterations compared to a random search.
3. Hyperband: This tuner uses a bandit-based approach to tune hyperparameters. It explores a large hyperparameter space and can be more efficient by stopping poorly performing trials early.
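
To make the random search idea concrete, here is a minimal pure-Python sketch of the loop it performs: sample a configuration, score it, and keep the best one. The search space and the `toy_objective` function are hypothetical stand-ins for "build and train a model, then return its validation loss". This is not Keras Tuner's implementation.

```python
import random

# Hypothetical search space: each name maps to its candidate values.
SEARCH_SPACE = {
    "units": list(range(32, 513, 32)),
    "dropout_rate": [0.1, 0.2, 0.3, 0.4, 0.5],
    "activation": ["relu", "tanh", "sigmoid"],
}


def toy_objective(config):
    """Stand-in for 'train the model and return val_loss' (lower is better)."""
    penalty = {"relu": 0.0, "tanh": 0.05, "sigmoid": 0.1}[config["activation"]]
    return (abs(config["units"] - 256) / 512
            + abs(config["dropout_rate"] - 0.2)
            + penalty)


def random_search(space, objective, max_trials, seed=0):
    """Sample `max_trials` random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(max_trials):
        # One trial: pick a random value for every hyperparameter.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(SEARCH_SPACE, toy_objective, max_trials=20)
print(best_config, round(best_score, 4))
```

Bayesian optimization and Hyperband keep the same outer loop but change how the next configuration is chosen or how long each one is trained.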

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam, SGD
from keras_tuner import RandomSearch, GridSearch, BayesianOptimization, Hyperband

In [2]:
# Generate dummy data
X_train = np.random.rand(100, 10)
y_train = np.random.rand(100, 1)

### Step 1: Define the model-building function and the search space
The model-building function in Keras Tuner is a key component of the hyperparameter tuning process. It allows us to define the architecture of our neural network and specify which hyperparameters should be tuned. The function-based approach allows Keras Tuner to build multiple models with different hyperparameter configurations. By defining the model architecture in a function, we provide a flexible template that Keras Tuner can use to explore various combinations of hyperparameters.

We will define tunable hyperparameters using the `hp` object, specifying ranges or options for each. This includes integers (e.g., number of units), floats (e.g., learning rate), and choices (e.g., activation functions).
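
As a rough illustration of what such ranges expand to, the sketch below enumerates the candidate values for the integer, float, and choice ranges used in the next cell. The helper names `int_range` and `float_range` are hypothetical, and the sketch assumes that both ends of each range are included.

```python
def int_range(min_value, max_value, step):
    """Candidate integers from min_value to max_value (both included)."""
    return list(range(min_value, max_value + 1, step))


def float_range(min_value, max_value, step):
    """Candidate floats from min_value to max_value, rounded to avoid drift."""
    count = int(round((max_value - min_value) / step)) + 1
    return [round(min_value + i * step, 10) for i in range(count)]


units = int_range(32, 512, 32)               # like hp.Int(..., 32, 512, step=32)
dropout_rates = float_range(0.1, 0.5, 0.1)   # like hp.Float(..., 0.1, 0.5, step=0.1)
activations = ["relu", "tanh", "sigmoid"]    # like hp.Choice(...)

print(len(units), units[:3], units[-1])      # 16 [32, 64, 96] 512
print(dropout_rates)                         # [0.1, 0.2, 0.3, 0.4, 0.5]

# Number of distinct combinations of just these three hyperparameters.
print(len(units) * len(dropout_rates) * len(activations))  # 240
```

Even three hyperparameters already give 240 combinations, which is why a tuner samples the space instead of trying every combination.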

In [3]:
def build_model(hp):
    model = Sequential()
    # Adding the first dense layer with variable units
    model.add(Dense(units=hp.Int('units_layer1', min_value=32, max_value=512, step=32), 
                    activation='relu', 
                    input_shape=(10,)))
    
    # Adding dropout layer and tuning the dropout rate to control overfitting
    model.add(Dropout(rate=hp.Float('dropout_rate', min_value=0.1, max_value=0.5, step=0.1)))

    # Adding a second dense layer with its own tunable units and activation
    model.add(Dense(units=hp.Int('units', min_value=32, max_value=512, step=32), 
                    activation=hp.Choice('activation', ['relu', 'tanh', 'sigmoid'])))
    
    # Adding 1 to 3 extra dense layers, each with its own tunable number of units
    for i in range(hp.Int('num_additional_layers', 1, 3)):
        model.add(Dense(units=hp.Int('units_' + str(i), min_value=2, max_value=10, step=2), 
                        activation='relu'))
    
    # Adding the output layer with a single unit for regression
    model.add(Dense(1, activation='linear'))

    # Compiling the model with the Adam optimizer and a tunable learning rate
    model.compile(optimizer=Adam(hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='LOG')), 
                  loss='mean_squared_error')
    return model

**Explanation of general syntax and concepts**
- **Model-building function**:
    ```python
    def build_model(hp):
        # Create and configure the model using 'hp' for hyperparameters
        return compiled_model
    ```

    This function is called by the tuner during the search process. It receives a `HyperParameters` object (`hp`) that provides methods to define and manage which hyperparameters should be tuned. This function returns a compiled model that can be trained.
- **Defining integer hyperparameters** (`hp.Int('name', min_value, max_value, step)`) - Allows the tuner to explore different integer values for a hyperparameter.
    - **Parameters:** 
        - `'name'`: Unique identifier for the hyperparameter.
        - `min_value`: Minimum value for the hyperparameter.
        - `max_value`: Maximum value.
        - `step`: Incremental step between values.
    - For example, `hp.Int('units_layer1', min_value=32, max_value=512, step=32)` defines an integer hyperparameter named `'units_layer1'`. The range for this hyperparameter is from 32 to 512, in steps of 32. This enables the tuner to test different layer sizes (numbers of neurons in the layer).
- **Defining floating-point hyperparameters** (`hp.Float('name', min_value, max_value, step)`) - Allows exploration of continuous values within a specified range.
    - **Parameters:** 
        - `'name'`: Identifier for the hyperparameter.
        - `min_value`: Minimum floating-point value.
        - `max_value`: Maximum value.
        - `step`: Incremental step for values.
    - For example, `hp.Float('dropout_rate', min_value=0.1, max_value=0.5, step=0.1)` defines a floating-point hyperparameter for the dropout rate. This range and step size allow the tuner to explore various levels of dropout.
- **Choice hyperparameters** (`hp.Choice('name', ['option1', 'option2', ...])`) - Enables selection from a predefined set of options.
    - **Parameters:** 
        - `'name'`: Name of the hyperparameter.
        - A list of possible values (e.g., activation functions).
    - For example, `hp.Choice('activation', ['relu', 'tanh', 'sigmoid'])` defines a choice hyperparameter, allowing the tuner to select the activation function for the second dense layer from a list of options.
- **Conditional hyperparameters**:
    ```python
    if hp.Boolean('condition'):
        # Conditional code based on boolean hyperparameter
        pass
    ```

    This allows dynamic changes in the model architecture based on the value of other hyperparameters. For example, adding or removing layers based on a boolean condition.
    - For example, the loop over `hp.Int('num_additional_layers', 1, 3)` in `build_model` uses an integer hyperparameter to decide how many additional layers are added (between 1 and 3). The `units_i` hyperparameter of each extra layer only affects the model when that layer exists, although the reported best hyperparameters may still list values for layers that were not built (see `units_1` and `units_2` in the random search output in Step 4).
- **Sampling on a logarithmic scale** (`sampling='LOG'`) - Specifies that the learning rate values should be sampled on a logarithmic scale. This allows for more efficient exploration of a wide range of values, especially useful for learning rates that can span several orders of magnitude.
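
The effect of logarithmic sampling can be seen in a small pure-Python sketch. It compares drawing learning rates uniformly between `1e-4` and `1e-2` with drawing them uniformly in log space. The helpers `sample_linear`, `sample_log`, and `share_below` are hypothetical names for this illustration, not Keras Tuner functions.

```python
import math
import random


def sample_linear(rng, low, high):
    """Draw uniformly between low and high."""
    return rng.uniform(low, high)


def sample_log(rng, low, high):
    """Draw uniformly in log space, then map back to the original scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def share_below(samples, threshold):
    """Fraction of samples that are smaller than `threshold`."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)
low, high = 1e-4, 1e-2
linear_samples = [sample_linear(rng, low, high) for _ in range(10_000)]
log_samples = [sample_log(rng, low, high) for _ in range(10_000)]

# Linear sampling rarely lands in the decade between 1e-4 and 1e-3,
# while log sampling gives each decade roughly equal weight.
print(f"linear: {share_below(linear_samples, 1e-3):.2f} of samples below 1e-3")
print(f"log:    {share_below(log_samples, 1e-3):.2f} of samples below 1e-3")
```

With linear sampling only about 9% of the draws fall below `1e-3`, whereas log sampling places about half of them there, so small learning rates are explored as thoroughly as large ones.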

### Step 2: Initialize the tuner
Choose a tuning strategy and initialize the tuner. Each tuner has different strengths and use cases.

In [4]:
# Random search tuner
tuner_random = RandomSearch(
    build_model,
    objective='val_loss',
    max_trials=5,  # Maximum number of hyperparameter combinations to try
    executions_per_trial=3,  # Number of times to train each model for reliability
    directory='tuning_dir',
    project_name='random_search')

# Bayesian optimization tuner
tuner_bayesian = BayesianOptimization(
    build_model,
    objective='val_loss',
    max_trials=5,
    executions_per_trial=2,
    directory='tuning_dir',
    project_name='bayesian_optimization')

# Hyperband tuner
tuner_hyperband = Hyperband(
    build_model,
    objective='val_loss',
    max_epochs=5,  # Maximum number of epochs for training
    factor=3,  # Factor to reduce the number of trials
    directory='tuning_dir',
    project_name='hyperband')

**Explanation of general syntax and concepts**

1. **Tuner initialization:**
   ```python
   tuner = TunerClass(
       build_model, 
       objective='objective_metric',
       max_trials=number_of_trials,
       executions_per_trial=number_of_executions,
       directory='directory_name',
       project_name='project_name'
   )
   ```
   - **Purpose:** Initializes a tuner with a specific strategy to find the best hyperparameters.
   - **Parameters:**
     - `TunerClass`: The specific tuner class to use (e.g., `RandomSearch`, `BayesianOptimization`, `Hyperband`).
     - `build_model`: The model-building function defined in Step 1.
     - `objective`: The metric to optimize, typically a validation loss or accuracy.
     - `max_trials`: The maximum number of hyperparameter combinations to try.
     - `executions_per_trial`: The number of times each hyperparameter configuration is trained from scratch. Repeating a trial reduces the influence of random weight initialization on the reported score.
     - `directory`: The directory where tuning results are saved.
     - `project_name`: A name for the current tuning project, used to distinguish results.

2. **Tuning strategies:**
   - **Random search:** `RandomSearch(build_model, ...)`
   - **Bayesian optimization:** `BayesianOptimization(build_model, ...)`
   - **Hyperband:** `Hyperband(build_model, max_epochs=number_of_epochs, factor=reduction_factor, ...)`
     - **Parameters:**
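       - `max_epochs`: The maximum number of epochs to train any single model. Hyperband sets the number of epochs for each trial itself, so an `epochs` argument passed to `search` is overridden.
       - `factor`: Reduction factor for the number of epochs and the number of models per round, keeping only the top-performing configurations.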
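
The role of `max_epochs` and `factor` can be sketched with a simplified, single-bracket version of successive halving, the idea behind Hyperband. This is a minimal pure-Python illustration, not Keras Tuner's actual bracket schedule. The function names, the nine candidate configurations, and the values `max_epochs=9` and `factor=3` are hypothetical.

```python
def toy_val_loss(config_quality, epochs):
    """Stand-in for training: loss shrinks with epochs toward the config's floor."""
    return config_quality + 1.0 / epochs


def successive_halving(configs, score_fn, min_epochs, max_epochs, factor):
    """Train all configs briefly, keep the best 1/factor, train those longer."""
    survivors = list(configs)
    epochs = min_epochs
    schedule = []  # (epochs, number of configs trained) for each round
    while True:
        ranked = sorted(survivors, key=lambda c: score_fn(c, epochs))
        schedule.append((epochs, len(ranked)))
        if epochs >= max_epochs or len(ranked) == 1:
            return ranked[0], schedule
        # Drop the poorly performing configs and give the rest more epochs.
        survivors = ranked[: max(1, len(ranked) // factor)]
        epochs = min(max_epochs, epochs * factor)


# Each number is the (hypothetical) best loss a configuration could reach.
configs = [0.30, 0.12, 0.45, 0.08, 0.27, 0.51, 0.19, 0.36, 0.22]
best, schedule = successive_halving(configs, toy_val_loss,
                                    min_epochs=1, max_epochs=9, factor=3)

print(best)      # 0.08
print(schedule)  # [(1, 9), (3, 3), (9, 1)]

# Epochs spent if every round trains from scratch, versus training all
# nine configurations for the full nine epochs.
print(sum(e * n for e, n in schedule), "vs", len(configs) * 9)  # 27 vs 81
```

Most of the budget goes to the few configurations that look promising early, which is where the efficiency gain over training every model for `max_epochs` comes from.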
       
### Step 3: Search for the best hyperparameters
Start the search with the tuner, passing the training data and any other training arguments. The `search` method accepts the same arguments as `model.fit` and uses them to train the model built for each trial.

In [5]:
print("Search process for random search tuner:")
tuner_random.search(X_train, y_train, epochs=10, validation_split=0.2)

Trial 5 Complete [00h 00m 07s]
val_loss: 0.09554338206847508

Best val_loss So Far: 0.09554338206847508
Total elapsed time: 00h 00m 43s


In [6]:
print("Search process for Bayesian optimization tuner:")
tuner_bayesian.search(X_train, y_train, epochs=10, validation_split=0.2)

Trial 5 Complete [00h 00m 07s]
val_loss: 0.3189740628004074

Best val_loss So Far: 0.08553072065114975
Total elapsed time: 00h 00m 31s


In [7]:
print("Search process for hyperband tuner:")
tuner_hyperband.search(X_train, y_train, epochs=10, validation_split=0.2)

Trial 10 Complete [00h 00m 02s]
val_loss: 0.11265049874782562

Best val_loss So Far: 0.08507603406906128
Total elapsed time: 00h 00m 24s


**Explanation of general syntax and concepts**

- **Search method** (`tuner.search(X, y, epochs=number_of_epochs, validation_split=split_fraction)`) - Starts the hyperparameter search using the defined model and search space.
   - **Parameters:**
     - `X`: The input features for training the model.
     - `y`: The target values for the model to learn.
     - `epochs`: The number of epochs each model will be trained for during the search.
     - `validation_split`: Fraction of the training data to be used as validation data. It helps evaluate model performance on unseen data.
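
To illustrate what `validation_split=0.2` does to the 100 dummy samples, the sketch below mimics the split in plain Python. It assumes the Keras behaviour of holding out the last fraction of the samples, before any shuffling. The helper `split_like_keras` is a hypothetical name, and it works on any sliceable sequence, including NumPy arrays.

```python
def split_like_keras(X, y, validation_split):
    """Hold out the last `validation_split` fraction of samples, unshuffled."""
    split_at = int(len(X) * (1.0 - validation_split))
    return (X[:split_at], y[:split_at]), (X[split_at:], y[split_at:])


# Stand-in data: 100 sample indices instead of real feature vectors.
X = list(range(100))
y = [2 * i for i in X]

(X_fit, y_fit), (X_val, y_val) = split_like_keras(X, y, validation_split=0.2)

print(len(X_fit), len(X_val))  # 80 20
print(X_val[0], X_val[-1])     # 80 99
```

Because the held-out samples are always the last ones, data that is sorted (for example by class or by time) should be shuffled before calling `search`.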

### Step 4: Get the best model
Now we can extract the best-performing models and their corresponding hyperparameters. After retrieving the best model, we can further train it on more data, evaluate its performance on a test set, or deploy it for predictions.

In [8]:
# Random search best model
best_model_random = tuner_random.get_best_models(num_models=1)[0]
best_hp_random = tuner_random.get_best_hyperparameters()[0]
print(f"Best hyperparameters from random search: {best_hp_random.values}")

# Bayesian optimization best model
best_model_bayesian = tuner_bayesian.get_best_models(num_models=1)[0]
best_hp_bayesian = tuner_bayesian.get_best_hyperparameters()[0]
print(f"\nBest hyperparameters from Bayesian optimization: {best_hp_bayesian.values}")

# Hyperband search best model
best_model_hyperband = tuner_hyperband.get_best_models(num_models=1)[0]
best_hp_hyperband = tuner_hyperband.get_best_hyperparameters()[0]
print(f"\nBest hyperparameters from hyperband search: {best_hp_hyperband.values}")

Best hyperparameters from random search: {'units_layer1': 256, 'dropout_rate': 0.4, 'units': 128, 'activation': 'relu', 'num_additional_layers': 1, 'units_0': 4, 'learning_rate': 0.0012052065865684864, 'units_1': 4, 'units_2': 10}

Best hyperparameters from Bayesian optimization: {'units_layer1': 448, 'dropout_rate': 0.1, 'units': 320, 'activation': 'tanh', 'num_additional_layers': 2, 'units_0': 6, 'learning_rate': 0.0005437534080408972, 'units_1': 8}

Best hyperparameters from hyperband search: {'units_layer1': 96, 'dropout_rate': 0.2, 'units': 224, 'activation': 'relu', 'num_additional_layers': 3, 'units_0': 6, 'learning_rate': 0.007815055091465812, 'units_1': 4, 'units_2': 10, 'tuner/epochs': 5, 'tuner/initial_epoch': 2, 'tuner/bracket': 1, 'tuner/round': 1, 'tuner/trial_id': '0002'}


**Explanation of general syntax and concepts**

1. **Get best models** (`best_models = tuner.get_best_models(num_models=n)`) - Retrieves the top `n` models based on the objective metric, with the weights saved during the search already loaded.
    - `num_models`: The number of top models to retrieve.
2. **Get best hyperparameters** (`best_hyperparameters = tuner.get_best_hyperparameters(num_trials=n)`) - Obtains the hyperparameter sets of the top-performing trials.
    - `num_trials`: The number of top hyperparameter sets to retrieve.
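
Instead of reusing the model saved during the search, a common follow-up is to build a fresh model from the best hyperparameters and train it again, for example for more epochs. The helper below is a minimal sketch of that pattern. The name `rebuild_and_retrain` is hypothetical, and it assumes the tuner exposes `get_best_hyperparameters` and `hypermodel.build` as Keras Tuner tuners do.

```python
def rebuild_and_retrain(tuner, X, y, epochs, **fit_kwargs):
    """Build a fresh model from the best hyperparameters and train it."""
    # Best hyperparameter set found during the search.
    best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
    # Calls the model-building function again, so the weights start untrained.
    model = tuner.hypermodel.build(best_hp)
    # Extra keyword arguments (e.g. validation_split) are passed to model.fit.
    history = model.fit(X, y, epochs=epochs, **fit_kwargs)
    return model, best_hp, history
```

With the tuners from this notebook, a call could look like `rebuild_and_retrain(tuner_random, X_train, y_train, epochs=20, validation_split=0.2)`.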