In [2]:
# necessary imports
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from tensorflow.keras.datasets import mnist

In [3]:
# define a neural network for Image Classification
def get_mnist_model():
    inputs = keras.Input(shape=(28 * 28,))
    features = keras.layers.Dense(512, activation="relu")(inputs)
    outputs = keras.layers.Dense(10, activation="softmax")(features)

    return keras.Model(inputs, outputs)

In [4]:
# load and pre-process images
(images, labels), (test_images, test_labels) = mnist.load_data()
images = images.reshape((60000, 28 * 28)).astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28)).astype("float32") / 255
train_images, val_images = images[10000:], images[:10000]
train_labels, val_labels = labels[10000:], labels[:10000]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [5]:
train_labels

array([3, 8, 7, ..., 5, 6, 8], dtype=uint8)

Metrics measure a model's performance during training and validation, and they are usually easier to interpret than loss functions.

Unlike model weights, a metric's state is not updated through backpropagation; the metric updates it directly in `update_state()`.

Keras lets users define custom metrics by subclassing `keras.metrics.Metric`.

In [87]:
class RootMeanSquaredError(keras.metrics.Metric):
    def __init__(self, name="rmse", **kwargs):
        super().__init__(name=name, **kwargs)
        # State variables that persist across batches within an epoch.
        self.mse_sum = self.add_weight(name="mse_sum", initializer="zeros")
        self.total_samples = self.add_weight(name="total_samples", initializer="zeros", dtype=tf.int32)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # One-hot encode the integer labels with as many classes as y_pred has columns.
        y_true = tf.one_hot(y_true, tf.shape(y_pred)[1])
        # Accumulate the summed squared error and the number of samples seen.
        mse = tf.reduce_sum(tf.square(y_true - y_pred))
        self.mse_sum.assign_add(mse)
        self.total_samples.assign_add(tf.shape(y_pred)[0])

    def result(self):
        # Square root of the accumulated squared error divided by the sample count.
        return tf.sqrt(self.mse_sum / tf.cast(self.total_samples, tf.float32))

    def reset_state(self):
        # Clear the accumulated state.
        self.mse_sum.assign(0.)
        self.total_samples.assign(0)

`reset_state()`: Called at the start of each epoch to clear the accumulated state.

`result()`: Called whenever Keras needs the current metric value (for the progress bar and at the end of each epoch); it computes that value from the accumulated state.

`update_state()`: Called for each batch to update the state with the new targets and predictions.
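The call order described above can be tried by hand on a built-in metric. This is a minimal sketch, assuming a TensorFlow version whose metrics expose `reset_state()`; `tf.keras.metrics.Mean` stands in for the custom class, and the batch values are made up for illustration.

```python
import tensorflow as tf

# Built-in stateful metric used as a stand-in for a custom Metric subclass.
metric = tf.keras.metrics.Mean(name="running_mean")

# Hypothetical per-batch values; during fit() Keras would supply the batches.
batches = [tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0, 5.0])]

for batch in batches:
    metric.update_state(batch)  # accumulate state, one call per batch

epoch_value = float(metric.result())  # computed from the accumulated state
metric.reset_state()                  # clear the state before the next epoch
value_after_reset = float(metric.result())

print(epoch_value, value_after_reset)
```

The value read before the reset covers all five numbers across both batches, not just the last batch, which is the point of keeping state between `update_state()` calls.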



In [91]:
model = get_mnist_model()

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy", RootMeanSquaredError()])

model.fit(train_images, train_labels, epochs=25,
          validation_data=(val_images, val_labels),
          # EarlyStopping monitors the training "rmse" metric and stops training once it has not improved for 3 consecutive epochs.
          # ModelCheckpoint checks "val_loss" at the end of every epoch and, with save_best_only=True, overwrites the saved model only when it improves.
          callbacks=[keras.callbacks.EarlyStopping(monitor="rmse", patience=3, verbose=1),
                     keras.callbacks.ModelCheckpoint(filepath="mymodel.h5", verbose=1, monitor="val_loss", save_best_only=True)]
          )

Epoch 1/25
Epoch 1: val_loss improved from inf to 0.13382, saving model to mymodel.h5
Epoch 2/25
  35/1563 [..............................] - ETA: 7s - loss: 0.1007 - accuracy: 0.9714 - rmse: 7.3784



Epoch 2: val_loss improved from 0.13382 to 0.09437, saving model to mymodel.h5
Epoch 3/25
Epoch 3: val_loss improved from 0.09437 to 0.09247, saving model to mymodel.h5
Epoch 4/25
Epoch 4: val_loss improved from 0.09247 to 0.08646, saving model to mymodel.h5
Epoch 4: early stopping


<keras.src.callbacks.History at 0x786d627e81c0>
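A sanity check on the logged `rmse` of 7.3784: with one-hot targets and softmax outputs, each sample contributes at most 2 to the squared-error sum, so `sqrt(sum / samples)` cannot exceed √2 ≈ 1.41. A value above 7 suggests the sum is being inflated. One plausible cause, which is an assumption and not something the log confirms, is that `y_true` reaches `update_state()` with shape `(batch, 1)` instead of `(batch,)`. In that case `tf.one_hot` returns a rank-3 tensor and the subtraction broadcasts across the batch. The sketch below reproduces the effect on made-up tensors and shows a flattening step that avoids it.

```python
import tensorflow as tf

num_classes = 3
# Made-up predictions: 4 samples, uniform probabilities over 3 classes.
y_pred = tf.fill([4, num_classes], 1.0 / num_classes)
# Assumed label layout: a column vector of shape (4, 1).
labels = tf.constant([[0], [1], [2], [0]])

# one_hot on (4, 1) gives (4, 1, 3); subtracting (4, 3) broadcasts to (4, 4, 3).
bad = tf.one_hot(labels, num_classes) - y_pred

# Flattening the labels first keeps the difference at (4, 3).
flat_labels = tf.reshape(labels, [-1])
good = tf.one_hot(flat_labels, num_classes) - y_pred

bad_sum = float(tf.reduce_sum(tf.square(bad)))
good_sum = float(tf.reduce_sum(tf.square(good)))
print(bad.shape, good.shape, bad_sum, good_sum)
```

If this is what happens in the metric above, adding `y_true = tf.reshape(y_true, [-1])` before the `tf.one_hot` call would be one way to keep the squared-error sum per sample.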

While coding the custom metric class, I noticed that `y_pred.shape[0]` and `tf.shape(y_pred)[0]` often appear to yield the same result. There is a subtle but important difference between them, especially when TensorFlow traces the code into a computation graph.

**Static vs. Dynamic Shapes**

- **`y_pred.shape[0]`:** This reads the static shape of `y_pred`, which is fixed when the graph is built. When the size of the first dimension (typically the batch size) is known beforehand, it returns a plain Python integer; when it is not known, it returns `None`.
- **`tf.shape(y_pred)[0]`:** This uses TensorFlow's `tf.shape` function, which runs as an operation inside the computation graph and returns a tensor. It therefore handles both static and *dynamic* shapes. A dynamic shape means the size of a tensor (like the batch size) is not known until runtime.
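The difference shows up once the batch dimension is left unspecified. The following is a minimal sketch, assuming TensorFlow 2; the function name `dynamic_batch_size` and the list `static_sizes_seen` are hypothetical names chosen for illustration.

```python
import tensorflow as tf

static_sizes_seen = []  # filled in while TensorFlow traces the function

# The None in the signature leaves the batch dimension unknown at trace time.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 10], dtype=tf.float32)])
def dynamic_batch_size(y_pred):
    # Static shape: evaluated once, during tracing, as a Python value.
    static_sizes_seen.append(y_pred.shape[0])
    # Dynamic shape: a graph operation evaluated on every call.
    return tf.shape(y_pred)[0]

size_a = int(dynamic_batch_size(tf.zeros([32, 10])))
size_b = int(dynamic_batch_size(tf.zeros([16, 10])))

print(static_sizes_seen, size_a, size_b)
```

The static lookup records `None` once, while `tf.shape` reports the actual batch size on each call. This is why the metric's `assign_add(tf.shape(y_pred)[0])` keeps working when the last batch of an epoch is smaller than the others.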

In [92]:
test_set_performance = model.evaluate(test_images, test_labels)



In [None]:
# Reference: Deep Learning with Python by François Chollet