# Deep Learning LSTM Model Hyperparameter Tuning
This notebook runs a Bayesian optimization search to find the best hyperparameters for three different LSTM model architectures.

In [1]:
import sys
sys.path.append('..')

In [2]:
import numpy as np
import random
import tensorflow
import keras_tuner

from tensorflow import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, TimeDistributed, RepeatVector
from keras.optimizers import Adam
from lib.read_data import read_and_join_output_file
from lib.deeplearning import get_train_test_datasets, get_sets_shapes

In [3]:
RANDOM_SEED = 31
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
tensorflow.random.set_seed(RANDOM_SEED)

## Preparing the Dataset
The dataset is prepared as explained in the /ml/deeplearning_training.ipynb notebook. Please refer to it for more details. In summary:
* The train and test sets are split by Township-Range, i.e. all the data of a given Township-Range are either entirely in the train set or entirely in the test set.
* The target is the 2021 value of the target variable (`GSE_GWE` in this notebook).
* Missing data are imputed using a custom pipeline.
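
The group-wise split described above can be sketched as follows. This is a minimal illustration on toy data, not the code of `get_train_test_datasets` (whose implementation is not shown in this notebook); the function name `split_by_group` and the toy dimensions are assumptions made for the example.

```python
import numpy as np

def split_by_group(group_ids, test_size, seed):
    """Return boolean (train, test) row masks so that each group is in exactly one set."""
    rng = np.random.default_rng(seed)
    shuffled_groups = rng.permutation(np.unique(group_ids))
    n_test = int(np.ceil(test_size * len(shuffled_groups)))
    test_mask = np.isin(group_ids, shuffled_groups[:n_test])
    return ~test_mask, test_mask

# Toy long-format data: 20 hypothetical Township-Ranges x 7 years x 3 features,
# with rows sorted by group and then by year.
n_groups, n_years, n_features = 20, 7, 3
group_ids = np.repeat(np.arange(n_groups), n_years)
values = np.random.default_rng(0).normal(size=(n_groups * n_years, n_features))

train_mask, test_mask = split_by_group(group_ids, test_size=0.15, seed=31)
# Each group contributes n_years contiguous rows, so a plain reshape gives
# (samples, time, features) arrays.
X_train_toy = values[train_mask].reshape(-1, n_years, n_features)
X_test_toy = values[test_mask].reshape(-1, n_years, n_features)
print(X_train_toy.shape, X_test_toy.shape)
```

Splitting on group identifiers rather than on rows is what prevents years of the same Township-Range from leaking between the two sets.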

The resulting train and test sets are of shape [number of Township-Ranges, 7 years (2014-2020), the number of features].
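We do not create a separate validation dataset. Instead we rely on the `validation_split` argument of the Keras `fit` method, which holds out a fraction of the training data points (i.e., the Township-Ranges) for validation. Note that Keras takes this fraction from the end of the training arrays, before any shuffling, and uses the same held-out samples at every epoch; `shuffle=True` only reorders the remaining training samples between epochs. This is a single hold-out split, not a cross-validation.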
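
To give an idea of how small the validation set can get with 406 training samples, the sketch below computes the sizes of the hold-out split for the `validation_split` values explored later in this notebook. It assumes that Keras keeps the first `floor(n * (1 - validation_split))` samples for training and the rest for validation, which matches the documented behaviour of taking the validation data from the last samples; the helper name is hypothetical.

```python
import math

def validation_split_sizes(n_samples, validation_split):
    """Return (n_train, n_validation) for a hold-out taken from the end of the data."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    return split_at, n_samples - split_at

for fraction in (0.05, 0.1, 0.15, 0.2):
    n_train, n_val = validation_split_sizes(406, fraction)
    print(f"validation_split={fraction}: {n_train} train / {n_val} validation")
```

With `validation_split=0.05` the validation metric is computed on only about twenty Township-Ranges, so scores obtained with different `validation_split` values are measured on different, and sometimes very small, sample sets.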

In [4]:
test_size = 0.15
target_variable = "GSE_GWE"
# Load the data from the ETL output files
X = read_and_join_output_file()
X.drop(["SHORTAGE_COUNT"], inplace=True, axis=1)
# Split the input pandas DataFrame into training and test datasets, apply the impute pipeline
# transformation and reshape the datasets into 3D (samples, time, features) numpy arrays
X_train, X_test, y_train, y_test, _, _, _ = get_train_test_datasets(
    X, target_variable=target_variable, test_size=test_size, random_seed=RANDOM_SEED
)
nb_features = X_train.shape[-1]
get_sets_shapes(X_train, X_test)

| | nb_items | nb_timestamps | nb_features |
|---|---|---|---|
| training dataset | 406 | 7 | 80 |
| test dataset | 72 | 7 | 80 |


## Hyperparameters Tuning
For each of the three LSTM model architectures (from simplest to most complex), we use the Keras Tuner `BayesianOptimization` tuner to estimate the best values for the following hyperparameters:
* the number of units of each *LSTM* and *Dense* layer
* the activation function (*sigmoid*, *tanh*, *relu*) of the hidden *Dense* layer; in the code below the *LSTM* layers are fixed to a *sigmoid* activation and the output layer to a *linear* activation
* the dropout rate, for the models that include a *Dropout* layer
* the learning rate
* the fraction of the training data held out for validation
* the batch size
* the maximum number of epochs (training can stop earlier because of the early stopping callback)

## Simple Model Hyper-parameter Tuning
![Simple LSTM Model](../doc/images/lstm_architecture_1.jpg)

In [5]:
class Model1(keras_tuner.HyperModel):
    def build(self, hp):
        model = Sequential()
        hp_units = hp.Int("units", min_value=10, max_value=300, step=10)
        model.add(LSTM(units=hp_units, activation="sigmoid", input_shape=(7, nb_features)))
        model.add(Dense(1, activation="linear"))
        hp_learning_rate = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
        model.compile(loss="mse", optimizer=Adam(learning_rate=hp_learning_rate), metrics=[keras.metrics.RootMeanSquaredError()])
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            validation_split=hp.Choice("validation_split", values=[0.05, 0.1, 0.15, 0.2]),
            batch_size=hp.Int("batch_size", min_value=32, max_value=192, step=16),
            epochs=hp.Int("epochs", min_value=30, max_value=500, step=5),
            shuffle=True,
            **kwargs,
        )
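
To put the number of trials in perspective, the following sketch counts the distinct hyperparameter combinations defined by `Model1` above. The ranges are copied from the `hp.Int` and `hp.Choice` calls; the helper `n_int_values` is a hypothetical name introduced for the example.

```python
from math import prod

def n_int_values(min_value, max_value, step):
    """Number of values in an inclusive integer range sampled every `step`."""
    return (max_value - min_value) // step + 1

model1_space = {
    "units": n_int_values(10, 300, 10),
    "learning_rate": 3,        # [1e-2, 1e-3, 1e-4]
    "validation_split": 4,     # [0.05, 0.1, 0.15, 0.2]
    "batch_size": n_int_values(32, 192, 16),
    "epochs": n_int_values(30, 500, 5),
}
n_combinations = prod(model1_space.values())
print(model1_space)
print(f"{n_combinations} combinations, 250 trials cover {250 / n_combinations:.4%}")
```

A search limited to a few hundred trials therefore visits only a very small part of this grid, which is the situation a model-based search such as Bayesian optimization is meant to handle.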

In [6]:
stop_early = tensorflow.keras.callbacks.EarlyStopping(monitor='val_root_mean_squared_error', patience=10, verbose=1)
tuner = keras_tuner.BayesianOptimization(Model1(),
                             objective=keras_tuner.Objective("val_root_mean_squared_error", direction="min"),
                             max_trials=250,
                             beta=3.2,
                             seed=RANDOM_SEED,
                             overwrite=True,
                             directory="keras_tuner",
                             project_name="model1_tuner")
tuner.search(X_train, y_train, callbacks=[stop_early])

Trial 250 Complete [00h 00m 07s]
val_root_mean_squared_error: 0.06851499527692795

Best val_root_mean_squared_error So Far: 0.053719695657491684
Total elapsed time: 00h 41m 12s
INFO:tensorflow:Oracle triggered exit
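
The `beta` argument used above is described in the Keras Tuner documentation as the balancing factor between exploration and exploitation, larger values being more explorative. The toy sketch below illustrates the general idea of a confidence-bound acquisition rule on made-up numbers. It is not Keras Tuner's implementation, and the predicted means and standard deviations are purely hypothetical.

```python
import numpy as np

def pick_next_candidate(predicted_mean, predicted_std, beta):
    """Index of the candidate minimising the lower confidence bound mean - beta * std."""
    bound = np.asarray(predicted_mean) - beta * np.asarray(predicted_std)
    return int(np.argmin(bound))

# Hypothetical surrogate predictions of the validation RMSE for three candidates
mean = [0.06, 0.07, 0.08]
std = [0.001, 0.010, 0.002]

print(pick_next_candidate(mean, std, beta=0.0))  # pure exploitation: lowest predicted mean
print(pick_next_candidate(mean, std, beta=3.2))  # uncertainty makes the second candidate attractive
```

With `beta=0` the rule always picks the candidate with the best predicted score, while a larger `beta` favours candidates whose score is still uncertain.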


### Best Model Hyperparameters

In [7]:
# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"""
The hyperparameter search is complete.
validation_split: {best_hps.get('validation_split')}
lstm_units: {best_hps.get('units')}
learning_rate: {best_hps.get('learning_rate')}
batch_size: {best_hps.get('batch_size')}
epochs: {best_hps.get('epochs')}
""")


The hyperparameter search is complete.
validation_split: 0.05
lstm_units: 10
learning_rate: 0.01
batch_size: 32
epochs: 500
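
Note that the `epochs` value reported here is the maximum passed to `fit`. Since the search is run with an `EarlyStopping` callback using `patience=10`, a trial can stop well before that maximum. The sketch below is a simplified re-implementation of the patience logic (no `min_delta`, no weight restoration) on a made-up loss curve, intended only to illustrate the mechanism.

```python
def epochs_actually_run(val_losses, patience):
    """Number of epochs run before a patience-based early stop (simplified logic)."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation curve: improves for 3 epochs, then plateaus
toy_losses = [0.5, 0.4, 0.3] + [0.35] * 497
print(epochs_actually_run(toy_losses, patience=10))
```

In this toy case training stops 10 epochs after the last improvement, even though 500 epochs were requested.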



### Hyperparameters Tuning Summary

In [8]:
tuner.results_summary()

Results summary
Results in keras_tuner\model1_tuner
Showing 10 best trials
<keras_tuner.engine.objective.Objective object at 0x0000017EBF26CE80>
Trial summary
Hyperparameters:
units: 10
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 500
Score: 0.053719695657491684
Trial summary
Hyperparameters:
units: 10
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 500
Score: 0.054562896490097046
Trial summary
Hyperparameters:
units: 10
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 500
Score: 0.05479402840137482
Trial summary
Hyperparameters:
units: 10
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 500
Score: 0.05566919222474098
Trial summary
Hyperparameters:
units: 10
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 30
Score: 0.0573287308216095
Trial summary
Hyperparameters:
units: 10
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 500
Score: 0.05777304619550705
[output truncated]

### Results
Across the BayesianOptimization run above and other hyperparameter tuning jobs and model tests, the best results on the validation set were obtained with a set of hyperparameters different from the one found by the BayesianOptimization run above:
* lstm_units: 160
* lstm_activation: "sigmoid"
* learning_rate: 0.001
* validation_split: 0.1
* batch_size: 128
* epochs: 270
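
To compare the capacity of the two configurations (10 LSTM units found by the tuner versus the 160 units retained), the sketch below counts trainable parameters using the standard formulas for a Keras `LSTM` layer with biases, `4 * (units * (units + n_features) + units)`, and for a `Dense` layer, `units * n_inputs + units`. The helper names are hypothetical.

```python
def lstm_params(units, n_features):
    """Trainable parameters of an LSTM layer with biases (4 gates)."""
    return 4 * (units * (units + n_features) + units)

def dense_params(units, n_inputs):
    """Trainable parameters of a Dense layer with biases."""
    return units * n_inputs + units

n_features = 80  # number of features reported for the training set
for lstm_units in (10, 160):
    total = lstm_params(lstm_units, n_features) + dense_params(1, lstm_units)
    print(f"lstm_units={lstm_units}: {total} trainable parameters")
```

The model with 160 units has roughly forty times more parameters than the one with 10 units, for the same 406 training samples.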

## Model2 Hyper-parameter Tuning
![LSTM Model With Dense Layer](../doc/images/lstm_architecture_2.jpg)

In [9]:
class Model2(keras_tuner.HyperModel):
    def build(self, hp):
        model = Sequential()
        lstm_units = hp.Int("lstm_units", min_value=10, max_value=300, step=10)
        model.add(LSTM(units=lstm_units, activation="sigmoid", input_shape=(7, nb_features)))
        dense_units = hp.Int("dense_units", min_value=11, max_value=101, step=2)
        dense_activation = hp.Choice("dense_activation", values=["relu", "tanh", "sigmoid"])
        model.add(Dense(dense_units, activation=dense_activation))
        hp_dropout = hp.Float("dropout_rate", min_value=0.05, max_value=0.25, step=0.05)
        model.add(Dropout(hp_dropout))
        model.add(Dense(1, activation="linear"))
        hp_learning_rate = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
        model.compile(loss="mse", optimizer=Adam(learning_rate=hp_learning_rate), metrics=[keras.metrics.RootMeanSquaredError()])
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            validation_split=hp.Choice("validation_split", values=[0.05, 0.1, 0.15, 0.2]),
            batch_size=hp.Int("batch_size", min_value=32, max_value=192, step=16),
            epochs=hp.Int("epochs", min_value=30, max_value=500, step=5),
            shuffle=True,
            **kwargs,
        )

In [10]:
stop_early = tensorflow.keras.callbacks.EarlyStopping(monitor="val_root_mean_squared_error", patience=10, verbose=1)
tuner = keras_tuner.BayesianOptimization(Model2(),
                              objective=keras_tuner.Objective("val_root_mean_squared_error", direction="min"),
                              max_trials=400,
                              beta=3.2,
                              seed=RANDOM_SEED,
                              overwrite=True,
                              directory="keras_tuner",
                              project_name="model2_tuner")
tuner.search(X_train, y_train, callbacks=[stop_early])

Trial 400 Complete [00h 00m 06s]
val_root_mean_squared_error: 0.08698336780071259

Best val_root_mean_squared_error So Far: 0.06801072508096695
Total elapsed time: 03h 21m 41s
INFO:tensorflow:Oracle triggered exit


### Best Model Hyperparameters

In [11]:
# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"""
The hyperparameter search is complete.
validation_split: {best_hps.get('validation_split')}
lstm_units: {best_hps.get('lstm_units')}
dense_units: {best_hps.get('dense_units')}
dense_activation: {best_hps.get('dense_activation')}
dropout_rate: {best_hps.get('dropout_rate')}
learning_rate: {best_hps.get('learning_rate')}
batch_size: {best_hps.get('batch_size')}
epochs: {best_hps.get('epochs')}
""")


The hyperparameter search is complete.
validation_split: 0.05
lstm_units: 10
dense_units: 11
dense_activation: relu
dropout_rate: 0.15000000000000002
learning_rate: 0.01
batch_size: 32
epochs: 30
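
The value `0.15000000000000002` reported for `dropout_rate` is a floating-point artifact, not a distinct setting. The sketch below assumes the grid is built by repeatedly adding the step to the minimum value (how Keras Tuner computes it internally is not shown here) and reproduces the same kind of rounding error.

```python
min_value, step = 0.05, 0.05
dropout_grid = [min_value + i * step for i in range(5)]

print([repr(v) for v in dropout_grid])      # raw binary floating-point values
print([round(v, 2) for v in dropout_grid])  # values as intended by the search space
```

Rounding to two decimals recovers the intended values 0.05, 0.1, 0.15, 0.2 and 0.25.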



### Hyperparameters Tuning Summary

In [12]:
tuner.results_summary()

Results summary
Results in keras_tuner\model2_tuner
Showing 10 best trials
<keras_tuner.engine.objective.Objective object at 0x0000017EC57F3FA0>
Trial summary
Hyperparameters:
lstm_units: 10
dense_units: 11
dense_activation: relu
dropout_rate: 0.15000000000000002
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 30
Score: 0.06801072508096695
Trial summary
Hyperparameters:
lstm_units: 10
dense_units: 11
dense_activation: relu
dropout_rate: 0.15000000000000002
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 30
Score: 0.06806003302335739
Trial summary
Hyperparameters:
lstm_units: 10
dense_units: 11
dense_activation: relu
dropout_rate: 0.1
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 30
Score: 0.06849660724401474
Trial summary
Hyperparameters:
lstm_units: 10
dense_units: 11
dense_activation: relu
dropout_rate: 0.15000000000000002
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 30
Score: 0.06947013735771179
[output truncated]

### Results
Across the BayesianOptimization run above and other hyperparameter tuning jobs and model tests, the best results on the validation set were obtained with a set of hyperparameters different from the one found by the BayesianOptimization run above:
* validation_split: 0.1
* lstm_units: 100
* lstm_activation: sigmoid
* dense_units: 11
* dense_activation: tanh
* dropout_rate: 0.1
* learning_rate: 0.0001
* batch_size: 32
* epochs: 200

## Model3 Hyper-parameter Tuning
![Encoder-Decoder LSTM Model](../doc/images/lstm_architecture_3.jpg)

In [13]:
class Model3(keras_tuner.HyperModel):
    def build(self, hp):
        model = Sequential()
        lstm_units = hp.Int("lstm_units", min_value=10, max_value=300, step=10)
        model.add(LSTM(units=lstm_units, activation="sigmoid", input_shape=(7, nb_features)))
        model.add(RepeatVector(1))
        lstm_units_2 = hp.Int("2nd_lstm_units", min_value=10, max_value=300, step=10)
        model.add(LSTM(units=lstm_units_2, activation="sigmoid", return_sequences=True))
        dense_units = hp.Int("dense_units", min_value=11, max_value=101, step=2)
        dense_activation = hp.Choice("dense_activation", values=["relu", "tanh", "sigmoid"])
        model.add(TimeDistributed(Dense(dense_units, activation=dense_activation)))
        hp_dropout = hp.Float("dropout_rate", min_value=0.05, max_value=0.25, step=0.05)
        model.add(Dropout(hp_dropout))
        model.add(Dense(1, activation="linear"))
        hp_learning_rate = hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
        model.compile(loss="mse", optimizer=Adam(learning_rate=hp_learning_rate), metrics=[keras.metrics.RootMeanSquaredError()])
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            validation_split=hp.Choice("validation_split", values=[0.05, 0.1, 0.15, 0.2]),
            batch_size=hp.Int("batch_size", min_value=32, max_value=192, step=16),
            epochs=hp.Int("epochs", min_value=30, max_value=500, step=5),
            shuffle=True,
            **kwargs,
        )
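
The way tensor shapes flow through the encoder-decoder defined in `Model3` can be traced with plain NumPy. The sketch below does not run any LSTM computation: the recurrent layers are replaced by zero arrays of the shape they would output, `RepeatVector(1)` is mimicked with `np.repeat`, and the `Dense` layers by a matrix product on the last axis. The unit counts are arbitrary example values.

```python
import numpy as np

batch, timesteps, n_features = 4, 7, 80
lstm_units, lstm_units_2, dense_units = 300, 140, 21

x = np.zeros((batch, timesteps, n_features))
# Encoder LSTM without return_sequences: only the last output is kept -> (batch, lstm_units)
encoded = np.zeros((x.shape[0], lstm_units))
# RepeatVector(1): add a time axis of length 1 -> (batch, 1, lstm_units)
repeated = np.repeat(encoded[:, np.newaxis, :], 1, axis=1)
# Decoder LSTM with return_sequences=True: one output per time step -> (batch, 1, lstm_units_2)
decoded = np.zeros((repeated.shape[0], repeated.shape[1], lstm_units_2))
# TimeDistributed(Dense): the same weights are applied at each time step
hidden = decoded @ np.zeros((lstm_units_2, dense_units))
# Final Dense(1) also acts on the last axis -> (batch, 1, 1)
output = hidden @ np.zeros((dense_units, 1))

for name, array in [("input", x), ("encoded", encoded), ("repeated", repeated),
                    ("decoded", decoded), ("hidden", hidden), ("output", output)]:
    print(f"{name:>8}: {array.shape}")
```

The model therefore outputs one value per Township-Range with two trailing axes of length 1, which is worth keeping in mind when comparing predictions with the target array.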

In [14]:
stop_early = tensorflow.keras.callbacks.EarlyStopping(monitor="val_root_mean_squared_error", patience=10, verbose=1)
tuner = keras_tuner.BayesianOptimization(Model3(),
                              objective=keras_tuner.Objective("val_root_mean_squared_error", direction="min"),
                              max_trials=400,
                              beta=3.2,
                              seed=RANDOM_SEED,
                              overwrite=True,
                              directory="keras_tuner",
                              project_name="model3_tuner")
tuner.search(X_train, y_train, callbacks=[stop_early])

Trial 400 Complete [00h 00m 08s]
val_root_mean_squared_error: 0.09001205861568451

Best val_root_mean_squared_error So Far: 0.061044275760650635
Total elapsed time: 03h 25m 35s
INFO:tensorflow:Oracle triggered exit


### Best Model Hyperparameters

In [15]:
# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"""
The hyperparameter search is complete.
validation_split: {best_hps.get('validation_split')}
lstm_units: {best_hps.get('lstm_units')}
2nd_lstm_units: {best_hps.get('2nd_lstm_units')}
dense_units: {best_hps.get('dense_units')}
dense_activation: {best_hps.get('dense_activation')}
dropout_rate: {best_hps.get('dropout_rate')}
learning_rate: {best_hps.get('learning_rate')}
batch_size: {best_hps.get('batch_size')}
epochs: {best_hps.get('epochs')}
""")


The hyperparameter search is complete.
validation_split: 0.05
lstm_units: 300
2nd_lstm_units: 300
dense_units: 11
dense_activation: sigmoid
dropout_rate: 0.05
learning_rate: 0.01
batch_size: 32
epochs: 500



### Hyperparameters Tuning Summary

In [16]:
tuner.results_summary()

Results summary
Results in keras_tuner\model3_tuner
Showing 10 best trials
<keras_tuner.engine.objective.Objective object at 0x0000017EBF298850>
Trial summary
Hyperparameters:
lstm_units: 300
2nd_lstm_units: 300
dense_units: 11
dense_activation: sigmoid
dropout_rate: 0.05
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 500
Score: 0.061044275760650635
Trial summary
Hyperparameters:
lstm_units: 300
2nd_lstm_units: 180
dense_units: 57
dense_activation: sigmoid
dropout_rate: 0.25
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 360
Score: 0.07467430830001831
Trial summary
Hyperparameters:
lstm_units: 300
2nd_lstm_units: 300
dense_units: 11
dense_activation: sigmoid
dropout_rate: 0.25
learning_rate: 0.01
validation_split: 0.05
batch_size: 32
epochs: 155
Score: 0.07538601011037827
Trial summary
Hyperparameters:
lstm_units: 300
2nd_lstm_units: 300
dense_units: 11
dense_activation: sigmoid
dropout_rate: 0.05
learning_rate: 0.01
validation_split: 0.05
[output truncated]

### Results
Across the BayesianOptimization run above and other hyperparameter tuning jobs and model tests, the best results on the validation set were obtained with a set of hyperparameters different from the one found by the BayesianOptimization run above:
* validation_split: 0.1
* lstm_units: 300
* lstm_activation: sigmoid
* 2nd_lstm_units: 140
* 2nd_lstm_activation: sigmoid
* dense_units: 21
* dense_activation: tanh
* dropout_rate: 0.2
* learning_rate: 0.001
* batch_size: 32
* epochs: 200