**Group 22**

Name  | Surname | Email  
---------|-------------------|---------
Julio|Vigueras|20220661@novaims.unl.pt 
Ariel|Pérez|20220662@novaims.unl.pt
Miguelanguel|Mayuare|20220665@novaims.unl.pt
Ayotunde|Aribo|20221012@novaims.unl.pt

# Hyperparameter Tuning
-----

In [None]:
# Make the imports
from tensorflow import keras
from tensorflow.keras import layers, initializers
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import image_dataset_from_directory

import plotly.express as px
from plotly.subplots import make_subplots
import pandas as pd
import pathlib

import keras_tuner as kt

We used Bayesian optimization to search for the hyperparameters.

Bayesian optimization is a technique for optimizing hyperparameters (here, those of a CNN) by modeling the objective function with a surrogate, typically a Gaussian process. It uses this model to decide which hyperparameters to sample next, balancing exploration and exploitation of the hyperparameter space, which can lead to better hyperparameters in fewer iterations.

*Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951-2959).*
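To make the idea concrete, here is a minimal sketch of the loop described above, written in plain Python rather than with Keras Tuner. Everything in it is an assumption made for illustration: the one-dimensional `toy_loss` stands in for a validation loss, the surrogate is a zero-mean Gaussian process with an RBF kernel, and the next point is chosen with a lower-confidence-bound rule. It is not the algorithm Keras Tuner implements internally.

```python
import math
import random


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs with Gaussian elimination and partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(x_obs)] for i, a in enumerate(x_obs)]
    k_star = [rbf(a, x_new) for a in x_obs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, y_obs)))
    var = 1.0 - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def toy_loss(x):
    """Hypothetical stand-in for a validation loss, minimal at x = 0.3."""
    return (x - 0.3) ** 2


def bayesian_minimize(objective, n_init=3, n_iter=10, kappa=2.0, seed=0):
    rng = random.Random(seed)
    candidates = [i / 100 for i in range(101)]
    # Start from a few random evaluations of the objective.
    x_obs = [rng.uniform(0.0, 1.0) for _ in range(n_init)]
    y_obs = [objective(x) for x in x_obs]
    for _ in range(n_iter):
        # Lower confidence bound: a low mean favours exploitation,
        # a high standard deviation favours exploration.
        def lcb(x):
            mean, std = gp_posterior(x_obs, y_obs, x)
            return mean - kappa * std

        x_next = min((c for c in candidates if c not in x_obs), key=lcb)
        x_obs.append(x_next)
        y_obs.append(objective(x_next))
    best = min(range(len(y_obs)), key=y_obs.__getitem__)
    return x_obs[best], y_obs[best]


best_x, best_y = bayesian_minimize(toy_loss)
print(f"best x = {best_x:.2f}, loss = {best_y:.4f}")
```

Each iteration refits the surrogate on all observations so far, which is the "takes previous observations into account" property that motivated our choice.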

Keras Tuner provides several search algorithms for this task, so we had to choose one.

We chose Bayesian optimization for two reasons. First, it balances exploration and exploitation of the hyperparameter space, which can lead to better hyperparameters in fewer iterations. Second, it uses previous observations to update its model of the objective function, which makes the search more efficient and adaptive to the specific problem.

This does not mean that the other algorithms are worse; based on our research and the limited time available for the project, we had to choose one.

The tuned model, while reaching performance comparable to D, is less complex and trains faster, which makes it the better model.

In [None]:
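def model_builder(hp):
    """Build and compile a CNN whose hyperparameters are sampled from `hp`."""
    # `blocks` sets the filter widths, 2**(blocks + 1) up to 2**(blocks + 4);
    # the network always has four convolutional blocks.
    blocks = hp.Int('blocks', min_value=3, max_value=4, step=1)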
    data_augmentation = keras.Sequential([
        layers.RandomRotation(hp.Float('rotation', min_value=0.05, max_value=0.2, step=0.05)),
        layers.RandomFlip(),
        layers.RandomContrast(hp.Float('contrast', min_value=0.05, max_value=0.2, step=0.05)),
        layers.RandomBrightness(hp.Float('brightness', min_value=0.05, max_value=0.2, step=0.05)),
        layers.RandomZoom(hp.Float('zoom', min_value=0, max_value=0.2, step=0.05)),
    ])

    inputs = keras.Input(shape=(224, 224, 3))
    x = data_augmentation(inputs)
    x = layers.Rescaling(1./255)(x)
    for i in range(blocks + 1, blocks + 5):
        x = layers.Conv2D(filters=2**i, kernel_size=3,
                          kernel_initializer=initializers.GlorotNormal(seed=123), 
                          activation="relu")(x)
        x = layers.Conv2D(filters=2**i, kernel_size=3, use_bias=False,
                          kernel_initializer=initializers.GlorotNormal(seed=123))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
        x = layers.Dropout(hp.Float('dropout', min_value=0, max_value=0.5, step=0.1))(x)
    x = layers.Conv2D(filters=256, kernel_size=3, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(hp.Float('dropout', min_value=0, max_value=0.5, step=0.1))(x)
    outputs = layers.Dense(30, activation="softmax")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)

    learning_rate = hp.Choice('learning_rate', values=[0.0001, 0.001, 0.01])
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  metrics=["accuracy"])
    return model


The next cell uses the BayesianOptimization class of the keras-tuner package to search for the hyperparameters of the model.

**model_builder** is the function that defines the machine learning model to be tuned. The function takes an argument hp, which is an object used to define the possible values of the hyperparameters and their ranges.

**objective** is the quantity to be optimized. In this case, we minimize the validation loss (val_loss).

**max_trials** is the maximum number of hyperparameter combinations (trials) to evaluate during the search.

**overwrite** is a boolean value indicating whether to overwrite the previous hyperparameter search results or not. If set to True, the previous results will be overwritten.
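To put max_trials in perspective, the sketch below counts the distinct combinations in the search space defined in model_builder. The ranges are copied from that function; the helper `n_values` is ours and assumes each range is sampled on a regular grid that includes both endpoints, and that the `dropout` hyperparameter, which is registered under a single name, counts once.

```python
def n_values(min_value, max_value, step):
    """Number of grid points from min_value to max_value, endpoints included."""
    return round((max_value - min_value) / step) + 1


# Count of distinct values per hyperparameter, using the ranges in model_builder.
search_space = {
    "blocks": n_values(3, 4, 1),
    "rotation": n_values(0.05, 0.2, 0.05),
    "contrast": n_values(0.05, 0.2, 0.05),
    "brightness": n_values(0.05, 0.2, 0.05),
    "zoom": n_values(0, 0.2, 0.05),
    "dropout": n_values(0, 0.5, 0.1),
    "learning_rate": 3,  # hp.Choice with three values
}

total = 1
for count in search_space.values():
    total *= count

max_trials = 15
print(f"{total} combinations; {max_trials} trials cover {max_trials / total:.2%}")
```

With so few trials relative to the size of the grid, an exhaustive search is out of reach, which is why a guided search is attractive here.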

In [None]:
tuner = kt.BayesianOptimization(
            model_builder,
            objective='val_loss',
            max_trials=15,
            overwrite=True)

In [None]:
dataset_path = pathlib.Path("moths")
input_shape = (224, 224, 3)
batch_size = 64

In [None]:
# Load the train, validation and test splits from their directories
train_dataset = image_dataset_from_directory(
    dataset_path / "train",
    image_size=input_shape[:2],
    batch_size=batch_size)
validation_dataset = image_dataset_from_directory(
    dataset_path / "valid",
    image_size=input_shape[:2],
    batch_size=batch_size)
test_dataset = image_dataset_from_directory(
    dataset_path / "test",
    image_size=input_shape[:2],
    batch_size=batch_size)

Found 3558 files belonging to 30 classes.
Found 445 files belonging to 30 classes.
Found 408 files belonging to 30 classes.


In [None]:
tuner.search(x=train_dataset, epochs=20, validation_data=validation_dataset)

best_model = tuner.get_best_models(num_models=1)[0]
best_hyperparameters = tuner.get_best_hyperparameters(num_trials=1)[0]

Trial 15 Complete [00h 02m 19s]
val_loss: 0.7281265258789062

Best val_loss So Far: 0.7281265258789062
Total elapsed time: 00h 42m 52s
INFO:tensorflow:Oracle triggered exit


In [None]:
tuner.results_summary()

Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")

Trial 14 summary
Hyperparameters:
blocks: 3
rotation: 0.1
contrast: 0.15000000000000002
brightness: 0.15000000000000002
zoom: 0.2
dropout: 0.2
learning_rate: 0.001
Score: 0.7281265258789062

Trial 10 summary
Hyperparameters:
blocks: 3
rotation: 0.15000000000000002
contrast: 0.2
brightness: 0.1
zoom: 0.1
dropout: 0.2
learning_rate: 0.001
Score: 0.8204528093338013

Trial 07 summary
Hyperparameters:
blocks: 3
rotation: 0.1
contrast: 0.2
brightness: 0.15000000000000002
zoom: 0.2
dropout: 0.1
learning_rate: 0.001
Score: 0.8333144187927246

Trial 13 summary
Hyperparameters:
blocks: 4
rotation: 0.05
contrast: 0.2
brightness: 0.2
zoom: 0.05
dropout: 0.0
learning_rate: 0.0001
Score: 0.8442155122756958

Trial 02 summary
Hyperparameters:
blocks: 4
rotation: 0.05
contrast: 0.15000000000000002
brightness: 0.1
zoom: 0.15000000000000002
dropout: 0.2
learning_rate: 0.001
Score: 0.974988281

In [None]:
best_model.build(input_shape=(None, 224, 224, 3))  # leading None is the batch dimension
best_model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 sequential (Sequential)     (None, 224, 224, 3)       0         
                                                                 
 rescaling (Rescaling)       (None, 224, 224, 3)       0         
                                                                 
 conv2d (Conv2D)             (None, 222, 222, 16)      448       
                                                                 
 conv2d_1 (Conv2D)           (None, 220, 220, 16)      2304      
                                                                 
 batch_normalization (BatchN  (None, 220, 220, 16)     64        
 ormalization)                                                   
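The parameter counts in the first rows of this summary can be reproduced by hand, which is a useful sanity check on the architecture. The sketch below is plain Python with helper functions of our own (they are not Keras APIs). It assumes the best trial's blocks = 3, the default "valid" padding with stride 1, and a bias term unless use_bias=False.

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Trainable weights of a square-kernel Conv2D layer."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels


def valid_conv_size(size, kernel_size):
    """Output width/height of a stride-1 convolution with 'valid' padding."""
    return size - kernel_size + 1


blocks = 3                          # value of the best trial
first_filters = 2 ** (blocks + 1)   # first iteration of the loop in model_builder

conv2d = conv2d_params(3, 3, first_filters)                      # RGB input, with bias
conv2d_1 = conv2d_params(3, first_filters, first_filters, use_bias=False)
batch_norm = batchnorm_params(first_filters)

size_after_conv2d = valid_conv_size(224, 3)
size_after_conv2d_1 = valid_conv_size(size_after_conv2d, 3)

print(f"conv2d:   {size_after_conv2d}x{size_after_conv2d}x{first_filters}, {conv2d} params")
print(f"conv2d_1: {size_after_conv2d_1}x{size_after_conv2d_1}x{first_filters}, {conv2d_1} params")
print(f"batch_normalization: {batch_norm} params")
```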
                                                             

In [None]:
model_best_hps = model_builder(best_hyperparameters)

In [None]:
# Callbacks and train model

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="saved_models/model_tuned.keras",
        save_best_only=True,
        monitor="val_loss"
    ),
    keras.callbacks.EarlyStopping(
        patience=30,
        monitor='val_loss'
    )
]

# The datasets are already batched, so batch_size is not passed to fit().
history = model_best_hps.fit(
    train_dataset,
    epochs=200,
    validation_data=validation_dataset,
    callbacks=callbacks,
)

Epoch 1/200


2023-04-07 06:37:47.745564: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inmodel_1/dropout_5/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer


Epoch 2/200
...
Epoch 78/200
Epoch 7

In [None]:
# Visualization
hist_df = pd.DataFrame(history.history)
loss = px.scatter(hist_df['loss'])
val_loss = px.line(hist_df['val_loss'])
accuracy = px.scatter(hist_df['accuracy'])
val_accuracy = px.line(hist_df['val_accuracy'])

fig = make_subplots(cols=2, rows=1, subplot_titles=("Loss", "Accuracy"))
fig.add_trace(loss.data[0], col=1, row=1)
fig.add_trace(val_loss.data[0], col=1, row=1)
fig.add_trace(accuracy.data[0], col=2, row=1)
fig.add_trace(val_accuracy.data[0], col=2, row=1)
fig.update_layout(height=600)

fig.show()

![Accuracy and loss](https://www.dropbox.com/s/j0obaviyp8si4pb/hypersearch.png?raw=1)

We compare the tuned model against models C and E: C because it performed well with a simple architecture, and E because it performed better.

In [None]:
# Load models

model_C = load_model("saved_models/model_handcrafted_C.keras")
model_D = load_model("saved_models/model_handcrafted_D.keras")
model_tuned = load_model("saved_models/model_tuned.keras")


_, model_C_acc = model_C.evaluate(test_dataset)
_, model_D_acc = model_D.evaluate(test_dataset)
_, model_tuned_acc = model_tuned.evaluate(test_dataset)

print(
    f"Model C: {model_C_acc * 100:.2f}% of accuracy\n"
    f"Model E: {model_D_acc * 100:.2f}% of accuracy\n"
    f"Model tuned: {model_tuned_acc * 100:.2f}% of accuracy"
      )

Model C: 82.60% of accuracy
Model E: 86.27% of accuracy
Model tuned: 85.29% of accuracy


The tuned model, while reaching comparable performance, is less complex and trains faster than D, which makes it the better model.
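To gauge how close these accuracies are, they can be converted back into counts of correctly classified images, using the 408 test files reported when the dataset was loaded. The sketch below does that and adds a rough standard error for the tuned model's accuracy. The standard error uses a binomial approximation, which is our assumption (it treats test images as independent) and not something the experiments above measured.

```python
import math

n_test = 408  # test files reported by image_dataset_from_directory

# Accuracies as printed by the evaluation cell.
accuracies = {"Model C": 0.8260, "Model E": 0.8627, "Model tuned": 0.8529}

# Number of correctly classified test images implied by each accuracy.
correct = {name: round(acc * n_test) for name, acc in accuracies.items()}
gap = correct["Model E"] - correct["Model tuned"]

# Binomial standard error of the tuned model's accuracy (approximation).
p = accuracies["Model tuned"]
std_error = math.sqrt(p * (1 - p) / n_test)

for name, count in correct.items():
    print(f"{name}: {count}/{n_test} correct")
print(f"Gap between E and tuned: {gap} images")
print(f"Standard error of tuned accuracy: {std_error:.4f}")
```

Under this approximation, the difference between the two best models amounts to a handful of images and is smaller than one standard error, which supports treating their performance as comparable.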