# Deep learning example
* Author: Rodrigo Ventura
* Date: June 2024

This notebook shows an example of using a neural network to train a classifier for the [MNIST digit dataset](https://en.wikipedia.org/wiki/MNIST_database). The neural network is implemented using the [Keras package](https://keras.io/). This example is based on two Keras tutorials: [image classification from scratch](https://keras.io/examples/vision/image_classification_from_scratch/) and [a simple MNIST convnet](https://keras.io/examples/vision/mnist_convnet/).

## Import required packages

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt

# Default seems to be tensorflow on macOS
# To change, set:
#os.environ["KERAS_BACKEND"] = "jax"
# Note that Keras should only be imported after the backend
# has been configured. The backend cannot be changed once the
# package is imported.

import keras

## Load the MNIST dataset and prepare the data

1. load the dataset
2. normalize pixels to the [0,1] interval
3. reshape dataset to arrays [index, height, width, channel=1]

In [None]:
# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
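
To see what the scaling and reshaping steps do in isolation, the sketch below applies the same two operations to a small synthetic array instead of the real dataset. The array `fake_images` and its size are made up for illustration; only NumPy is assumed.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 random 28x28 images with uint8 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same preprocessing as above: scale to [0, 1], then add a channel axis
scaled = fake_images.astype("float32") / 255
with_channel = np.expand_dims(scaled, -1)

print(fake_images.dtype, fake_images.shape)    # uint8 (4, 28, 28)
print(with_channel.dtype, with_channel.shape)  # float32 (4, 28, 28, 1)
print(with_channel.min() >= 0.0, with_channel.max() <= 1.0)
```

The trailing axis of size 1 is the channel dimension that the convolutional layers expect for grayscale images.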

## Show a few images from the dataset

Each image is labeled with the true class (i.e., ground truth).

In [None]:
plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(np.array(x_train[i,:,:,0]), cmap="Greys_r")
    plt.title(int(y_train[i]))
    plt.axis("off")

## Create neural network model

First, set up the basic parameters:
* number of classes (0, ..., 9)
* shape of each data point (height, width, channels=1)

In [None]:
# Model parameters
num_classes = 10
input_shape = (28, 28, 1)

Below are a few models to experiment with. Each cell assigns to `model`, so the last one you run is the one that gets trained. Check [this page](https://keras.io/api/layers/) for documentation on the available layers.

This is the original model from the Keras tutorial:

In [None]:
model = keras.Sequential(
    [
        keras.layers.Input(shape=input_shape),
        keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        keras.layers.MaxPooling2D(pool_size=(2, 2)),
        keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
        keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(num_classes, activation="softmax"),
    ]
)
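
As a rough check on the size of this network, the following sketch traces the feature-map shape and the number of trainable parameters layer by layer in plain Python. It assumes the Keras defaults for these layers (stride 1 and `"valid"` padding for `Conv2D`, pooling stride equal to the pool size); the helper names `conv2d` and `max_pool` are made up for illustration.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count of a 'valid' convolution with stride 1."""
    h, w, c = shape
    params = kernel * kernel * c * filters + filters  # weights + biases
    return (h - kernel + 1, w - kernel + 1, filters), params

def max_pool(shape, pool=2):
    """Output shape of max pooling; pooling has no trainable parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

layers = [
    ("Conv2D(64)", lambda s: conv2d(s, 64)),
    ("Conv2D(64)", lambda s: conv2d(s, 64)),
    ("MaxPooling2D", max_pool),
    ("Conv2D(128)", lambda s: conv2d(s, 128)),
    ("Conv2D(128)", lambda s: conv2d(s, 128)),
]

shape = (28, 28, 1)
total = 0
for name, layer in layers:
    shape, params = layer(shape)
    total += params
    print(f"{name:14s} -> {shape}  params={params}")

# Global average pooling keeps one value per channel; dropout adds no parameters
features = shape[2]
dense_params = features * 10 + 10
total += dense_params
print(f"Dense(10)      -> (10,)  params={dense_params}")
print("total trainable parameters:", total)  # 260298
```

The total should agree with what `model.summary()` reports for this model.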

This is a single-layer network with one perceptron per class, the simplest neural network:

In [None]:
model = keras.Sequential(
    [
        keras.layers.Input(shape=input_shape),
        keras.layers.Flatten(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ]
)
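
What this model computes can be written out in a few lines of NumPy: flatten each image, apply one affine map, then a softmax. The sketch below is only illustrative; the weights `W` and `b` are random placeholders rather than trained values, and the input `x` is synthetic.

```python
import numpy as np

num_classes = 10
num_pixels = 28 * 28  # Flatten turns each (28, 28, 1) image into 784 values

# Randomly initialised weights, for illustration only (not trained values)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(num_pixels, num_classes))
b = np.zeros(num_classes)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

x = rng.random((2, 28, 28, 1))  # two hypothetical images with pixels in [0, 1)
probs = softmax(x.reshape(2, -1) @ W + b)

print(probs.shape)        # (2, 10)
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
print(W.size + b.size)    # 7850 trainable parameters
```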

## Show structure of the created model

In [None]:
model.summary()

## Training the model

First, compile the model, defining:
1. the loss function (sparse categorical cross-entropy, since the labels are integer class indices)
2. the optimizer (Adam)
3. the performance metric (sparse categorical accuracy)

In [None]:
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    metrics=[
        keras.metrics.SparseCategoricalAccuracy(name="acc"),
    ],
)
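
To make the loss and metric chosen above concrete, here is a minimal NumPy sketch of how both can be computed from predicted probabilities and integer labels. The probabilities and labels are hypothetical, and the code illustrates the definitions rather than the Keras implementation.

```python
import numpy as np

# Hypothetical predicted probabilities for 3 samples and 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])
labels = np.array([0, 1, 0])  # integer class indices, as in y_train

# Sparse categorical cross-entropy: mean of -log(probability of the true class)
true_class_probs = probs[np.arange(len(labels)), labels]
loss = -np.log(true_class_probs).mean()

# Sparse categorical accuracy: fraction of samples whose argmax equals the label
accuracy = (probs.argmax(axis=1) == labels).mean()

print(f"loss = {loss:.4f}")
print(f"accuracy = {accuracy:.4f}")
```

The "sparse" variants take labels as integers; the non-sparse variants expect one-hot encoded labels instead.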

Next, perform the training itself, configured by:
1. batch size -- how many data points are used for each gradient step
2. number of epochs -- how many passes are made over the training data (one epoch means going through all data points once)
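
As a small worked example of how these two settings interact, the sketch below counts the gradient steps per epoch for the values used in the training cell (batch size 128, 20 epochs, 15% of the 60000 training images held out for validation). The variable names are chosen for illustration.

```python
import math

n_samples = 60000        # size of the MNIST training set
validation_split = 0.15  # fraction held out for validation
batch_size = 128
epochs = 20

n_val = round(n_samples * validation_split)
n_train = n_samples - n_val
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(n_train, n_val)            # 51000 9000
print(steps_per_epoch)           # 399
print(last_batch)                # 56
print(steps_per_epoch * epochs)  # 7980 (upper bound; early stopping may end sooner)
```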

In [None]:
batch_size = 128
epochs = 20

callbacks = [
    keras.callbacks.ModelCheckpoint(filepath="model_at_epoch_{epoch}.keras"),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
]

model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.15,
    callbacks=callbacks,
)
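
The `EarlyStopping` callback above may end training before all 20 epochs have run. The following is a simplified sketch of a patience-based stopping rule, not the Keras implementation; the function `epochs_run` and the validation losses are hypothetical.

```python
def epochs_run(val_losses, patience=2):
    """Return how many epochs run before a patience-based stop (simplified sketch)."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses, one per epoch
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.30]
print(epochs_run(val_losses))  # 5
```

In this made-up run the loss stops improving after epoch 3, so training ends after epoch 5 and the later improvement is never reached.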

## Load/save model

(Optional) Use the cells below to save the model to disk or to load it back.

In [None]:
model.save("final_model.keras")

In [None]:
model = keras.saving.load_model("final_model.keras")

## Evaluate the performance of the model on the test set

* the value of the loss function on the test set
* the accuracy on the test set (1 = 100%)

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

## Show the classification of a few test data points

In [None]:
plt.figure(figsize=(10, 10))
predictions = model.predict(x_test[0:9])
y = predictions.argmax(axis=1)
for i, p in enumerate(y):
    ax = plt.subplot(3, 3, i + 1)
    # Use the same grayscale colormap as the training-set preview
    plt.imshow(np.array(x_test[i,:,:,0]), cmap="Greys_r")
    plt.title(f"true={y_test[i]} pred={p}")
    plt.axis("off")

This finishes the example. Now go back and play with different network architectures. Have fun!