# Call me back

... aka *Keras callbacks* ...

Keras provides a powerful mechanism for implementing callbacks, i.e., functions that are called at specific points of the training process. We can write our own callbacks (a topic for another meetup) or use some of the predefined Keras callbacks.
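To make the idea concrete, here is a minimal sketch of the mechanism in plain Python. It is not how Keras is implemented: `ToyCallback`, `LossLogger`, and `toy_fit` are hypothetical names, and the loss values are made up. The "training loop" simply calls a hook on every callback at the end of each epoch.

```python
class ToyCallback:
    """Hypothetical base class: the hook does nothing by default."""

    def on_epoch_end(self, epoch, logs):
        pass


class LossLogger(ToyCallback):
    """Records the loss reported at the end of each epoch."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, logs):
        self.history.append((epoch, logs["loss"]))


def toy_fit(fake_losses, callbacks):
    """Stand-in for a training loop: one made-up loss per epoch."""
    for epoch, loss in enumerate(fake_losses):
        # A real loop would train for one epoch here.
        for cb in callbacks:
            cb.on_epoch_end(epoch, {"loss": loss})


logger = LossLogger()
toy_fit([0.9, 0.5, 0.3], callbacks=[logger])
print(logger.history)
```

Real Keras callbacks have the same shape: they subclass `keras.callbacks.Callback` and override hooks such as `on_epoch_end(epoch, logs)`.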

Get some data and define a model!

In [2]:
import keras
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, MaxPool2D, Conv2D, Flatten
from keras.optimizers import Adam


# Prepare the data: add a channel axis, scale pixels to [0, 1], one-hot encode the labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape((-1, 28, 28, 1)) / 255.0
x_test = x_test.reshape((-1, 28, 28, 1)) / 255.0
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)

model = Sequential()
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(128, (5, 5), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Let's define some callbacks:

In [3]:
!mkdir -p models
!mkdir -p logs

In [10]:
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard

# EarlyStopping stops the training once the monitored quantity (here val_loss)
# has not improved for `patience` consecutive epochs.
# If you use the test set as the validation set (as this demo does), you are cheating!!!
stopper = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=2, verbose=1, mode='min')

# Save the full model after every epoch (save_best_only=False keeps all of them)
checkpoint = keras.callbacks.ModelCheckpoint('models/weights.{epoch:02d}.keras', monitor='val_loss', verbose=1, save_best_only=False,
                                             save_weights_only=False)

tensorboard = keras.callbacks.TensorBoard(log_dir='./logs', histogram_freq=1)
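The `patience` and `min_delta` arguments of `EarlyStopping` are easier to grasp with a small simulation. The sketch below is a simplified re-implementation of the idea in plain Python, not the Keras source; `stopping_epoch` is a hypothetical helper and the loss values are invented.

```python
def stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, not taken from a real run.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.30]
print(stopping_epoch(losses, patience=2))  # stops at epoch 5
print(stopping_epoch(losses, patience=3))  # never stops: epoch 6 improves
```

With `patience=2` the run ends before the improvement at epoch 6 is ever seen, which is the usual trade-off: a small patience saves time but may stop too early on a noisy validation loss.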


In [11]:
results = model.fit(x_train, y_train, batch_size=256, epochs=30,
          verbose=2, validation_data=(x_test, y_test), callbacks=[stopper, checkpoint, tensorboard])


Epoch 1/30




Epoch 1: saving model to models/weights.01.keras
235/235 - 9s - 38ms/step - accuracy: 0.9410 - loss: 0.1959 - val_accuracy: 0.9859 - val_loss: 0.0466
Epoch 2/30

Epoch 2: saving model to models/weights.02.keras
235/235 - 1s - 5ms/step - accuracy: 0.9856 - loss: 0.0473 - val_accuracy: 0.9885 - val_loss: 0.0334
Epoch 3/30

Epoch 3: saving model to models/weights.03.keras
235/235 - 1s - 5ms/step - accuracy: 0.9902 - loss: 0.0313 - val_accuracy: 0.9895 - val_loss: 0.0313
Epoch 4/30

Epoch 4: saving model to models/weights.04.keras
235/235 - 1s - 5ms/step - accuracy: 0.9929 - loss: 0.0229 - val_accuracy: 0.9911 - val_loss: 0.0269
Epoch 5/30

Epoch 5: saving model to models/weights.05.keras
235/235 - 1s - 5ms/step - accuracy: 0.9948 - loss: 0.0163 - val_accuracy: 0.9914 - val_loss: 0.0253
Epoch 6/30

Epoch 6: saving model to models/weights.06.keras
235/235 - 1s - 5ms/step - accuracy: 0.9958 - loss: 0.0138 - val_accuracy: 0.9862 - val_loss: 0.0484
Epoch 7/30

Epoch 7: saving model to models/
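The run above validates on the test set, which the code comment itself calls cheating. One way around that is to hold out part of the training data instead. The sketch below assumes NumPy arrays shaped like the MNIST tensors used here; `holdout_split` is a hypothetical helper, and the demo arrays are synthetic stand-ins, not the real dataset.

```python
import numpy as np


def holdout_split(x, y, val_fraction=0.1, seed=0):
    """Shuffle the sample indices once, then split into train and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])


# Synthetic stand-in data with MNIST-like shapes.
x_demo = np.zeros((100, 28, 28, 1))
y_demo = np.zeros((100, 10))
(x_tr, y_tr), (x_val, y_val) = holdout_split(x_demo, y_demo, val_fraction=0.1)
print(x_tr.shape, x_val.shape)
```

The held-out pair could then be passed as `validation_data=(x_val, y_val)`, keeping `x_test` for a final evaluation only. `model.fit` also accepts a `validation_split` argument, which holds out the last fraction of the training arrays.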

In [10]:
!ls models

weights.02-0.03.hdf5,  weights.04-0.02.hdf5,
weights.02-0.04.hdf5,  weights.06-0.02.hdf5,
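The checkpoint path is an ordinary Python format string whose placeholders are filled from the epoch number and the logged metrics. The file names in the listing above follow a different pattern from the template used in this notebook, presumably because they come from an earlier run whose template also embedded the validation loss. The sketch below expands both kinds of template with `str.format`; the second template is an assumption for illustration, not taken from this notebook.

```python
# Template used in this notebook: zero-padded epoch number.
template = "models/weights.{epoch:02d}.keras"
print(template.format(epoch=1))

# Assumed variant that also embeds a logged metric, rounded to two decimals.
with_loss = "models/weights.{epoch:02d}-{val_loss:.2f}.keras"
print(with_loss.format(epoch=2, val_loss=0.0334))
```

Embedding the metric makes it easy to spot the best checkpoint from a directory listing, at the cost of less predictable file names.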


Start TensorBoard!



In [None]:
!python3 -m tensorboard.main --logdir logs


![TensorBoard screenshot 1](tensor1.png)
![TensorBoard screenshot 2](tensor2.png)