# Learning TensorFlow
This notebook covers two specific TensorFlow topics:
- Different hyperparameter tuners (from KerasTuner)
- Different types of callbacks and their uses

---
## Libraries and data
---

In [2]:
import tensorflow as tf
from tensorflow import keras

import keras_tuner as kt

Using TensorFlow backend


In [3]:
# Load the Fashion MNIST dataset
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()

# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0

---
## Tuners
---

In [6]:
def model_builder(hp):
  model = keras.Sequential()
  model.add(keras.layers.Flatten(input_shape=(28, 28)))

  # Tune the number of units in the first Dense layer
  # Choose an optimal value between 32-512
  hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
  model.add(keras.layers.Dense(units=hp_units, activation='relu'))
  model.add(keras.layers.Dense(10))

  # Tune the learning rate for the optimizer
  # Choose an optimal value from 0.01, 0.001, or 0.0001
  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

  return model

### Grid Search
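Standard grid search: it evaluates every combination of the hyperparameter values defined by the user. The space in `model_builder` has 16 values for `units` (32 to 512 in steps of 32) and 3 learning rates, i.e. 48 combinations. With `max_trials=5` in the cell below, only 5 of them are actually trained.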
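To make the size of this grid concrete, the sketch below enumerates the same search space that `model_builder` defines, using plain Python. It only illustrates exhaustive grid enumeration and a `max_trials`-style cap; it is not KerasTuner's internal code, and KerasTuner's visiting order may differ from `itertools.product`.

```python
from itertools import product

# Search space copied from model_builder:
# hp.Int('units', 32, 512, step=32) and hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4]).
units_values = list(range(32, 512 + 1, 32))
learning_rates = [1e-2, 1e-3, 1e-4]

# Exhaustive grid: every (units, learning_rate) pair.
grid = list(product(units_values, learning_rates))
print(len(grid))  # 16 * 3 = 48 combinations

# A cap like max_trials=5 means only 5 of the 48 pairs are ever trained.
max_trials = 5
tried = grid[:max_trials]
print(tried)
```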

In [20]:
tuner = kt.GridSearch(model_builder,
                      objective='val_accuracy',
                      max_trials=5,
                      directory='grid_search',
                      project_name='intro_to_kt')

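# Note: the searches below train for at most 5 epochs, so patience=5 can never be
# reached and this callback does not actually stop any trial early here.
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)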

In [21]:
tuner.search(img_train, label_train, epochs=5, validation_split=0.2, callbacks=[stop_early])

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

Trial 5 Complete [00h 00m 20s]
val_accuracy: 0.8756666779518127

Best val_accuracy So Far: 0.8756666779518127
Total elapsed time: 00h 01m 44s

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 64 and the optimal learning rate for the optimizer
is 0.001.



### Hyperband
Hyperband is a hyperparameter search algorithm built on _successive halving_: many configurations are trained for a small number of epochs, the worst ones are discarded, and the survivors keep training for more epochs, repeating until a single model remains. Because ranking models after very few epochs is noisy, Hyperband runs several such rounds (called _brackets_), each with a different trade-off between the number of configurations and the epochs each one gets. The `max_epochs` parameter sets the maximum number of epochs any single model is trained for, and `factor` sets the reduction rate (roughly 1/`factor` of the models survive each step, each trained about `factor` times longer); together they determine how many models are trained in total.
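The core of successive halving fits in a few lines of plain Python. This is a simplified single-bracket illustration, not KerasTuner's implementation: `toy_score` is a hypothetical stand-in for "train this configuration for `epochs` epochs and return its validation accuracy". With `factor=3`, 9 configurations become 3 after 1 epoch and 1 after 3 epochs.

```python
import math
import random

def toy_score(config, epochs):
    """Hypothetical stand-in for 'train for `epochs` epochs, return val_accuracy'.
    Score rises with training time and is highest for learning_rate = 1e-3."""
    units, lr = config
    lr_penalty = abs(math.log10(lr) + 3)   # 0 when lr == 1e-3
    return 1 - 1 / (1 + epochs) - 0.05 * lr_penalty + units / 1e5

def successive_halving(configs, min_epochs=1, factor=3):
    """One bracket: keep the best 1/factor of the configs, train them factor times longer."""
    survivors = list(configs)
    epochs = min_epochs
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: toy_score(c, epochs), reverse=True)
        survivors = ranked[:max(1, len(ranked) // factor)]
        print(f"{len(ranked)} configs trained for {epochs} epoch(s), {len(survivors)} kept")
        epochs *= factor
    return survivors[0]

rng = random.Random(0)
units_space = list(range(32, 513, 32))
lr_space = [1e-2, 1e-3, 1e-4]
configs = [(rng.choice(units_space), rng.choice(lr_space)) for _ in range(9)]
winner = successive_halving(configs)
print(winner)
```

In this toy score the ranking never changes with the number of epochs; in real training it can, and that uncertainty is what Hyperband's multiple brackets hedge against.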

In [13]:
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=5,
                     factor=3,
                     directory='hyperband',
                     project_name='tuner_tests')

In [14]:
tuner.search(img_train, label_train, epochs=5, validation_split=0.2, callbacks=[stop_early])

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

Trial 10 Complete [00h 00m 22s]
val_accuracy: 0.8511666655540466

Best val_accuracy So Far: 0.8770833611488342
Total elapsed time: 00h 03m 53s

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 192 and the optimal learning rate for the optimizer
is 0.001.



### RandomSearch
RandomSearch samples hyperparameter values at random from the search space defined by the user and trains one model per sample; each trial is independent of the previous ones. The number of models trained is set by `max_trials`, the maximum number of trials (models) to run.
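A minimal sketch of random search over the same space as `model_builder`, in plain Python. `evaluate` is a hypothetical placeholder for training a model and returning its validation accuracy; the point is only that each trial samples independently and the best of `max_trials` samples is kept.

```python
import random

def sample_hyperparameters(rng):
    # Same space as model_builder: units in {32, 64, ..., 512}, three learning rates.
    return {
        "units": rng.randrange(32, 513, 32),
        "learning_rate": rng.choice([1e-2, 1e-3, 1e-4]),
    }

def evaluate(hps):
    # Hypothetical objective standing in for "train the model and return val_accuracy".
    return hps["units"] / 512 - abs(hps["learning_rate"] - 1e-3) * 10

def random_search(max_trials, seed=42):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(max_trials):
        hps = sample_hyperparameters(rng)   # independent of earlier trials
        score = evaluate(hps)
        if score > best_score:
            best, best_score = hps, score
    return best, best_score

rs_best, rs_score = random_search(max_trials=5)
print(rs_best, rs_score)
```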

In [15]:
tuner = kt.RandomSearch(
    model_builder,
    objective='val_accuracy',
    max_trials=5,
    directory='random_search',
    project_name='tuner_tests')

In [17]:
tuner.search(img_train, label_train, epochs=5, validation_split=0.2, callbacks=[stop_early])

best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

Trial 5 Complete [00h 00m 31s]
val_accuracy: 0.8778333067893982

Best val_accuracy So Far: 0.8785833120346069
Total elapsed time: 00h 02m 51s

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 160 and the optimal learning rate for the optimizer
is 0.001.



### Bayesian Optimization
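Bayesian optimization chooses each new set of hyperparameters based on the results of the previous trials. In KerasTuner it fits a probabilistic model of the objective (a Gaussian process) to the completed trials and picks the next configuration with an acquisition function (upper confidence bound) that balances exploring uncertain regions against exploiting promising ones. As with RandomSearch, `max_trials` sets the maximum number of models trained. The cell below also sets `executions_per_trial=3`, so each configuration is trained three times and the results are averaged, which is why this search took much longer (about 12 minutes, versus under 4 minutes for each of the other tuners).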
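KerasTuner's `BayesianOptimization` uses a Gaussian-process surrogate with an upper-confidence-bound (UCB) acquisition. The sketch below is a from-scratch illustration of that idea in plain Python, not KerasTuner's code. Assumptions: a single continuous hyperparameter (log10 of the learning rate, unlike the discrete `hp.Choice` in `model_builder`), a hypothetical `objective`, and an arbitrary kernel length scale, `beta`, and candidate grid.

```python
import math

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel on log10(learning_rate) (assumed length scale)."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))

def cholesky(matrix):
    """Lower-triangular L with L @ L.T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def solve_lower(lower, b):
    """Forward substitution: solve lower @ x == b."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(lower[i][k] * x[k] for k in range(i))) / lower[i][i])
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and std at x_new of a GP fitted to the (centered) observations."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    lower = cholesky(K)
    v = solve_lower(lower, [rbf(x, x_new) for x in xs])
    w = solve_lower(lower, [y - y_mean for y in ys])
    mean = y_mean + sum(vi * wi for vi, wi in zip(v, w))
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # rbf(x, x) == 1
    return mean, math.sqrt(var)

def ucb(x, xs, ys, beta=2.0):
    """Upper confidence bound: optimistic estimate that favors uncertain points."""
    mean, std = gp_posterior(xs, ys, x)
    return mean + beta * std

def objective(log_lr):
    """Hypothetical stand-in for val_accuracy, peaking at log10(lr) = -3."""
    return 0.88 - 0.05 * (log_lr + 3) ** 2

candidates = [round(-4 + 0.1 * i, 1) for i in range(21)]  # log10(lr) in [-4, -2]
observed_x = [-4.0, -2.0]                                  # initial points
observed_y = [objective(x) for x in observed_x]

for _ in range(5):  # 5 Bayesian-optimization steps
    remaining = [c for c in candidates if c not in observed_x]
    next_x = max(remaining, key=lambda c: ucb(c, observed_x, observed_y))
    observed_x.append(next_x)
    observed_y.append(objective(next_x))

best_x = observed_x[observed_y.index(max(observed_y))]
print(f"best log10(lr) = {best_x}, lr = {10 ** best_x:.0e}")
```

Unlike random search, each `next_x` depends on every earlier observation, which is also why these trials are inherently sequential.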

In [18]:
tuner = kt.BayesianOptimization(
    model_builder,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='bayesian_optimization',
    project_name='tuner_tests')

In [19]:
tuner.search(img_train, label_train, epochs=5, validation_split=0.2, callbacks=[stop_early])

best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

Trial 5 Complete [00h 04m 11s]
val_accuracy: 0.8696944316228231

Best val_accuracy So Far: 0.882111112276713
Total elapsed time: 00h 12m 20s

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 160 and the optimal learning rate for the optimizer
is 0.001.



### Notes
- There is also a fifth tuner class, `SklearnTuner`, used to tune scikit-learn models. Since we do not use scikit-learn models here, it is not covered.
- A common confusion: `max_trials` is the maximum number of models (hyperparameter configurations) that will be trained, while `max_epochs` is the maximum number of epochs a single model can be trained for. `max_trials` is used by GridSearch, RandomSearch, and Bayesian optimization, whereas `max_epochs` is used by Hyperband.
- A suggestion: Random Search is useful for finding promising regions of the parameter space. After that, Grid Search is useful if only a few parameters are being analyzed. If many parameters are analyzed, checking every combination for every epoch can take very long, and Hyperband will be more efficient. Bayesian Optimization is an alternative to Random Search, but it is more complex and can take longer.

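### Building the model

At this point `tuner` is the last tuner created (the `BayesianOptimization` one), so `best_hps` holds the best configuration from that search.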

In [22]:
# Build the model with the optimal hyperparameters and train it on the data for 50 epochs
model = tuner.hypermodel.build(best_hps)
history = model.fit(img_train, label_train, epochs=50, validation_split=0.2)

val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Best epoch: 21


In [None]:
hypermodel = tuner.hypermodel.build(best_hps)

# Rebuild the model from scratch and retrain it for the best number of epochs found above
hypermodel.fit(img_train, label_train, epochs=best_epoch, validation_split=0.2)

In [None]:
eval_result = hypermodel.evaluate(img_test, label_test)
print("[test loss, test accuracy]:", eval_result)

---
## Callbacks
---

### Different types of callbacks
Documentation: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks

`BackupAndRestore`: Callback to back up and restore the training state.

`BaseLogger`: Callback that accumulates epoch averages of metrics.

**`CSVLogger`: Callback that streams epoch results to a CSV file.**

**`Callback`: Abstract base class used to build new callbacks.**

`CallbackList`: Container abstracting a list of callbacks.

**`EarlyStopping`: Stop training when a monitored metric has stopped improving.**

`History`: Callback that records events into a History object.

`LambdaCallback`: Callback for creating simple, custom callbacks on-the-fly.

**`LearningRateScheduler`: Learning rate scheduler.**

**`ModelCheckpoint`: Callback to save the Keras model or model weights at some frequency.**

`ProgbarLogger`: Callback that prints metrics to stdout.

**`ReduceLROnPlateau`: Reduce learning rate when a metric has stopped improving.**

`RemoteMonitor`: Callback used to stream events to a server.

`SidecarEvaluatorModelExport`: Callback to save the best Keras model.

`TensorBoard`: Enable visualizations for TensorBoard.

**`TerminateOnNaN`: Callback that terminates training when a NaN loss is encountered.**


*The callbacks in bold are the ones covered here.*  
*The others either are not relevant to our problems (`RemoteMonitor`), deal with very specific topics I am not familiar with (`TensorBoard`), or do not work in the latest version of TensorFlow (`ProgbarLogger`).*

In [4]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])
optimizer = keras.optimizers.Adam(learning_rate=0.01)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy'])

### Callback: CSVLogger

In [None]:
csv_logger = tf.keras.callbacks.CSVLogger(
    'csvlogger_model.csv', separator=',', append=False
)

In [None]:
model.fit(img_train, label_train, epochs=10, validation_split=0.2, callbacks=[csv_logger])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x26190989d10>
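`CSVLogger` writes one row per epoch: an `epoch` column followed by the entries of the `logs` dict (loss, accuracy, val_loss, val_accuracy). The sketch below imitates that layout with the standard `csv` module so the file format is easy to see. `MiniCSVLogger` is a hypothetical class, not part of Keras, and the metric values are made up.

```python
import csv
import io

class MiniCSVLogger:
    """Hypothetical, simplified imitation of keras.callbacks.CSVLogger."""

    def __init__(self, stream, separator=","):
        self.stream = stream
        self.separator = separator
        self.writer = None

    def on_epoch_end(self, epoch, logs):
        # 'epoch' first, then the metric columns in sorted order.
        row = {"epoch": epoch, **{key: logs[key] for key in sorted(logs)}}
        if self.writer is None:
            # The header is written once, from the first epoch's keys.
            self.writer = csv.DictWriter(self.stream, fieldnames=list(row),
                                         delimiter=self.separator)
            self.writer.writeheader()
        self.writer.writerow(row)

# Made-up metric values, only to show the resulting file layout.
buffer = io.StringIO()
logger = MiniCSVLogger(buffer)
logger.on_epoch_end(0, {"accuracy": 0.81, "loss": 0.52, "val_accuracy": 0.84, "val_loss": 0.45})
logger.on_epoch_end(1, {"accuracy": 0.86, "loss": 0.38, "val_accuracy": 0.85, "val_loss": 0.41})
print(buffer.getvalue())
```

The real `csvlogger_model.csv` produced above can be loaded the same way, for example with `pandas.read_csv('csvlogger_model.csv')`, to plot the training curves.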

### Callback: Callback

In [11]:
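# Example callback that prints the epoch index at the end of each epoch.
# Keras passes a zero-based index, and end='\r' overwrites the previous message,
# so only the last one ("Epoch 9 finished" for 10 epochs) stays visible.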
class PrintEpochCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    print(f'Epoch {epoch} finished', end='\r')

print_epoch_callback = PrintEpochCallback()

In [12]:
model.fit(img_train, label_train, epochs=10, validation_split=0.2, callbacks=[print_epoch_callback], verbose=0)

Epoch 9 finished

<keras.src.callbacks.History at 0x22de639b510>
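To see when hooks such as `on_epoch_end` fire, here is a toy training loop in plain Python that calls callbacks roughly the way `fit` does at the training and epoch level. `ToyCallback`, `RecordingCallback`, and `run_toy_fit` are hypothetical names; real Keras also calls batch-level hooks and fills `logs` with the actual metrics.

```python
class ToyCallback:
    """Same epoch/training hook names as tf.keras.callbacks.Callback; defaults do nothing."""
    def on_train_begin(self, logs=None):
        pass

    def on_epoch_begin(self, epoch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_train_end(self, logs=None):
        pass

class RecordingCallback(ToyCallback):
    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append("train_begin")

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f"epoch_end {epoch}")

    def on_train_end(self, logs=None):
        self.events.append("train_end")

def run_toy_fit(epochs, callbacks):
    for cb in callbacks:
        cb.on_train_begin()
    for epoch in range(epochs):              # zero-based, as in Keras
        for cb in callbacks:
            cb.on_epoch_begin(epoch)
        logs = {"loss": 1.0 / (epoch + 1)}   # made-up metric
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs)
    for cb in callbacks:
        cb.on_train_end()

recorder = RecordingCallback()
run_toy_fit(epochs=3, callbacks=[recorder])
print(recorder.events)
```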

### Callback: EarlyStopping

In [21]:
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

In [27]:
model.fit(img_train, label_train, epochs=1000, validation_split=0.2, callbacks=[early_stopping])

Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000
Epoch 9/1000


<keras.src.callbacks.History at 0x2619396fe90>
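The patience rule can be reproduced in a few lines. Assuming the defaults `min_delta=0` and "lower is better" for `val_loss`, training stops once `patience` consecutive epochs pass without a new best. In the run above, stopping after 9 epochs with `patience=5` therefore means the best `val_loss` was reached at epoch 4. Since `restore_best_weights` defaults to `False`, the model keeps the weights from epoch 9, not from epoch 4. `epochs_until_stop` is a hypothetical helper and the loss curve is made up.

```python
def epochs_until_stop(val_losses, patience=5, min_delta=0.0):
    """Return how many epochs run before an EarlyStopping-style patience rule triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch          # training stops after this epoch
    return len(val_losses)            # patience never ran out

# Made-up val_loss curve: best value at epoch 4, no improvement afterwards.
losses = [0.50, 0.42, 0.39, 0.37, 0.38, 0.40, 0.39, 0.41, 0.43, 0.44]
print(epochs_until_stop(losses, patience=5))  # 9
```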

### Callback: LearningRateScheduler

In [30]:
learning_rate_scheduler = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20),  # starts at 1e-8 and grows tenfold every 20 epochs
    verbose=1
)

In [32]:
model.fit(img_train, label_train, epochs=10, validation_split=0.2, callbacks=[learning_rate_scheduler])


Epoch 1: LearningRateScheduler setting learning rate to 1e-08.
Epoch 1/10

Epoch 2: LearningRateScheduler setting learning rate to 1.1220184543019634e-08.
Epoch 2/10

Epoch 3: LearningRateScheduler setting learning rate to 1.2589254117941673e-08.
Epoch 3/10

Epoch 4: LearningRateScheduler setting learning rate to 1.4125375446227544e-08.
Epoch 4/10

Epoch 5: LearningRateScheduler setting learning rate to 1.5848931924611136e-08.
Epoch 5/10

Epoch 6: LearningRateScheduler setting learning rate to 1.7782794100389228e-08.
Epoch 6/10

Epoch 7: LearningRateScheduler setting learning rate to 1.9952623149688796e-08.
Epoch 7/10

Epoch 8: LearningRateScheduler setting learning rate to 2.2387211385683395e-08.
Epoch 8/10

Epoch 9: LearningRateScheduler setting learning rate to 2.51188643150958e-08.
Epoch 9/10

Epoch 10: LearningRateScheduler setting learning rate to 2.8183829312644537e-08.
Epoch 10/10


<keras.src.callbacks.History at 0x26195fab7d0>
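The logged values can be checked by hand. A log that starts at 1e-08 and reaches about 2.8e-08 after 10 epochs matches `lr(epoch) = 1e-8 * 10**(epoch / 20)`, where Keras passes a zero-based `epoch` to the schedule (so "Epoch 1" uses `epoch=0`). The learning rate grows tenfold every 20 epochs, which is the kind of exponential sweep typically used to see where the loss starts to drop; rates this small barely train the model. The function name `schedule` is just for illustration.

```python
def schedule(epoch):
    """Schedule matching the logged values above; `epoch` is zero-based."""
    return 1e-8 * 10 ** (epoch / 20)

for epoch in range(10):
    print(f"Epoch {epoch + 1}: learning rate {schedule(epoch):.3e}")

# Ratio after 20 epochs: 10 ** (20 / 20), i.e. a tenfold increase.
print(schedule(20) / schedule(0))
```

The scheduler writes each new value into the model's optimizer, so later `fit` calls on the same `model` start from the last learning rate it set (about 2.8e-08 here) unless the model is recompiled.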

### Callback: ModelCheckpoint

In [33]:
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath='model_checkpoint.weights.h5',  # Keras 3 requires this suffix when save_weights_only=True
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True
)

In [34]:
model.fit(img_train, label_train, epochs=10, validation_split=0.2, callbacks=[model_checkpoint])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x261986a2a10>
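With `save_best_only=True` and `mode='max'`, a file is written only at epochs where `val_accuracy` beats the best value seen so far, and each save overwrites the previous one because `filepath` has no placeholder such as `{epoch:02d}`. The hypothetical helper below sketches that rule on made-up accuracies; it does not touch the file system.

```python
import operator

def checkpoint_epochs(values, mode="max"):
    """Return the 1-based epochs at which a save_best_only checkpoint would be written."""
    better = operator.gt if mode == "max" else operator.lt
    best = None
    saved = []
    for epoch, value in enumerate(values, start=1):
        if best is None or better(value, best):   # strict improvement only
            best = value
            saved.append(epoch)                   # overwrites the single checkpoint file
    return saved

# Made-up validation accuracies for 10 epochs.
accs = [0.84, 0.86, 0.85, 0.87, 0.87, 0.88, 0.86, 0.88, 0.89, 0.88]
print(checkpoint_epochs(accs))  # [1, 2, 4, 6, 9]
```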

### Callback: ReduceLROnPlateau

In [54]:
reduce_LR_on_plateau = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_accuracy',
    factor=0.5,
    patience=2,
    min_lr=0.0001
)

In [56]:
model.fit(img_train, label_train, epochs=10, validation_split=0.2, callbacks=[reduce_LR_on_plateau])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x2619a929d10>
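A simplified sketch of the reduction rule with the settings above (`factor=0.5`, `patience=2`, `min_lr=0.0001`), assuming the defaults `cooldown=0` and `min_delta=1e-4`, and "higher is better" because the monitored metric is an accuracy. The starting value 0.01 is the rate passed to `Adam` when the model was compiled. `lr_per_epoch` is a hypothetical helper and the accuracies are made up. Note that if the optimizer's learning rate is already at or below `min_lr`, the rule leaves it unchanged.

```python
def lr_per_epoch(val_accuracies, initial_lr=0.01, factor=0.5, patience=2,
                 min_lr=1e-4, min_delta=1e-4):
    """Learning rate in effect after each epoch (simplified, cooldown=0)."""
    lr = initial_lr
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_accuracies:
        if acc > best + min_delta:          # higher is better for an accuracy metric
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)   # never go below min_lr
                wait = 0
        lrs.append(lr)
    return lrs

# Made-up validation accuracies for 10 epochs.
val_accs = [0.84, 0.85, 0.85, 0.85, 0.86, 0.86, 0.86, 0.86, 0.86, 0.87]
print(lr_per_epoch(val_accs))
```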

### Callback: TerminateOnNaN

In [57]:
# Stops training as soon as the loss becomes NaN (or infinite)
terminate_on_nan = tf.keras.callbacks.TerminateOnNaN()

In [68]:
# Custom loss that produces NaN: the final Dense(10) layer outputs unbounded
# logits, and the log of a negative value is NaN.
class CustomLoss(tf.keras.losses.Loss):
  def call(self, y_true, y_pred):
    return tf.math.log(y_pred)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])
optimizer = keras.optimizers.Adam(learning_rate=0.01)
loss = CustomLoss()

model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy'])

In [69]:
model.fit(img_train, label_train, epochs=10, validation_split=0.2, callbacks=[terminate_on_nan])

Epoch 1/10
Batch 0: Invalid loss, terminating training


<keras.src.callbacks.History at 0x261d8f4da50>
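`TerminateOnNaN` checks the loss after every batch and stops as soon as it is NaN or infinite; here the very first batch already produced a NaN, so training ended at batch 0. A plain-Python sketch of that check, with a hypothetical helper name and made-up batch losses (plain `math.log` raises `ValueError` on negative input, so NaN is written explicitly instead):

```python
import math

def first_invalid_batch(batch_losses):
    """Return the index of the first NaN/inf loss, or None if every loss is finite."""
    for batch, loss in enumerate(batch_losses):
        if math.isnan(loss) or math.isinf(loss):
            print(f"Batch {batch}: Invalid loss, terminating training")
            return batch
    return None

# Made-up per-batch losses: the third one is NaN.
print(first_invalid_batch([2.3, 1.9, float("nan"), 1.5]))  # stops at batch 2
print(first_invalid_batch([2.3, 1.9, 1.7]))                # None: never stops
```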