# Train a CNN on the MNIST dataset

The MNIST dataset is a collection of handwritten digits, each labelled with the digit it represents. We want to use a convolutional neural network (CNN) to recognize handwritten digits. The intuition behind a convolutional layer is to slide a set of filters over the input image, each one encoding information about a specific feature. These filters respond to spatial patterns in the image, much like neurons in the visual cortex.

Furthermore, we will use a specific architecture that alternates convolutional layers with pooling layers and ends with fully connected layers, the last of which uses a softmax activation function.
![Picture title](https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/MnistExamples.png/320px-MnistExamples.png)
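
To make the sliding-filter intuition concrete, here is a minimal NumPy sketch of a single convolution followed by max pooling. The helper names `conv2d_valid` and `max_pool2d`, the toy 6x6 image and the hand-written edge filter are all assumptions made for illustration; in the actual network the filters are learned and the layers come from Keras.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (hypothetical; a CNN learns its own filters).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]], dtype=np.float32)

feature_map = conv2d_valid(image, kernel)  # shape (4, 4), large values where the edge is
pooled = max_pool2d(feature_map)           # shape (2, 2), a coarser summary of the response
print(feature_map)
print(pooled)
```

The filter responds only where the window straddles the edge, and pooling keeps the strongest response in each 2x2 neighbourhood while halving the spatial resolution.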


## Import libraries and define symbolic constants

### Install wandb to keep track of model performance

In [1]:
!pip install --upgrade wandb
!wandb login WANDB_KEY

You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


### Import and initialize parameters

In [2]:
import tensorflow as tf
import numpy as np
from tensorflow import keras

import wandb
from wandb.keras import WandbCallback

wandb.init(project="digit-recognition-leNet")
#data-related constants
IMG_ROWS, IMG_COLUMNS = 28, 28
INPUT_SHAPE = (IMG_ROWS, IMG_COLUMNS, 1)
NB_CLASSES = 10 # we have 10 classes in our dataset, hence 10 neurons in the last layer
VERBOSE = 1 # make it loud

#hyperparameters
EPOCHS = 20  # number of full passes over the training data; weights and biases are updated throughout each pass
BATCH_SIZE = 128 # number of training instances processed before each optimizer update
N_FILTERS1 = 20 # filters of the first convolution
N_FILTERS2 = 80 # filters of the second convolution
FILTER_SHAPE = (5,5) # shape of the filters in the convolutional layers
POOL_SHAPE = (2,2) # shape of the pooling window in the max-pooling layers
POOL_STRIDES = (2,2) # strides of the pooling process
N_DENSE = 800 # neurons in the dense layer before the softmax
VALIDATION_SPLIT = 0.90 # fraction of the training set held out for validation (0.90 holds out 90%, so only 10% is used for fitting)
ACTIVATION_FUNCTION_HIDDEN = 'relu' # activation function for the hidden layers
ACTIVATION_FUNCTION_FINAL = 'softmax' # activation function for the output layer
OPTIMIZER = 'adam' # optimizer, this is how we search for the minimum of the loss function
LOSS_FUNCTION = 'categorical_crossentropy' # loss function, this is what is optimized
METRICS = ['accuracy'] # metric reported on the training and validation sets during training, and on the test set at evaluation

# record the hyperparameters on the run created by wandb.init above
wandb.config.update({
  "epochs": EPOCHS,
  "batch_size": BATCH_SIZE,
  "n_hidden": N_DENSE,
  "validation_split": VALIDATION_SPLIT,
  'activation_function_hidden': ACTIVATION_FUNCTION_HIDDEN,
  'activation_function_final': ACTIVATION_FUNCTION_FINAL,
  'optimizer': OPTIMIZER,
  'loss_function': LOSS_FUNCTION,
  'metric': METRICS,
})


2022-11-06 16:30:08.058815: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-06 16:30:08.186737: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-06 16:30:08.192488: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-11-06 16:30:08.192509: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if yo

## Load demo dataset from Keras


In [3]:
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

## Reshape data and encode labels (one-hot)

In [4]:
X_train = X_train.reshape(60000, IMG_ROWS, IMG_COLUMNS, 1).astype("float32")/255
X_test = X_test.reshape(10000, IMG_ROWS, IMG_COLUMNS, 1).astype("float32")/255

# use a one-hot representation for the digits
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)
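
As an illustration of what one-hot encoding does to the labels, here is a small NumPy sketch. The helper `one_hot` is a hypothetical stand-in written for this example, not the Keras implementation, and the three labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4])          # example digit labels (made up)
encoded = one_hot(labels, 10)         # one row per label, one column per class
decoded = encoded.argmax(axis=1)      # argmax recovers the original digit

print(encoded)
print(decoded)
```

Each row has a 1 in the column of the true digit and 0 elsewhere, which is the target format that the categorical cross-entropy loss expects.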

## Build the model

- The model stacks two blocks of convolution, ReLU and max pooling

- These are followed by flattening, a dense layer and a softmax output layer with NB_CLASSES units (multinomial logistic regression on the learned features)

In [5]:
class LeNet():
    @staticmethod
    def build(input_shape, number_of_classes):
        model = tf.keras.models.Sequential()
        # conv -> relu -> pool
        model.add( 
            keras.layers.Convolution2D(#convolution
            N_FILTERS1,# number of filters in the first convolution (20)
            FILTER_SHAPE,
            activation=ACTIVATION_FUNCTION_HIDDEN,# with relu activation function
            input_shape=input_shape
            )
        )
        model.add(
            keras.layers.MaxPooling2D(
                pool_size=POOL_SHAPE,#pooling
                strides=POOL_STRIDES,
            )
        )
        # conv -> relu -> pool
        model.add( 
            keras.layers.Convolution2D( #convolution
            N_FILTERS2,#more filters in the innermost layer, this is common practice in CNNs
            FILTER_SHAPE,
            activation=ACTIVATION_FUNCTION_HIDDEN,# with relu activation function
            )
        )
        model.add(
            keras.layers.MaxPooling2D(
                pool_size=POOL_SHAPE, #pooling
                strides=POOL_STRIDES,
            )
        )
        # flatten -> dense (relu) -> softmax
        model.add(keras.layers.Flatten())
        model.add(
            keras.layers.Dense(
            N_DENSE,
            activation=ACTIVATION_FUNCTION_HIDDEN,
            )
        )
        model.add(#this is the softmax classifier (multinomial logistic regression)
            keras.layers.Dense(
            number_of_classes,
            activation=ACTIVATION_FUNCTION_FINAL,
            )
        )
        return model

model = LeNet.build(
    input_shape=INPUT_SHAPE,
    number_of_classes=NB_CLASSES
    )

2022-11-06 16:30:16.535845: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-11-06 16:30:16.535872: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-06 16:30:16.535886: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (p-ba4822a4-198a-4cdb-8280-0ca8d044b999): /proc/driver/nvidia/version does not exist
2022-11-06 16:30:16.536108: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Compile the model

- We use the Adam optimizer, an adaptive variant of stochastic gradient descent

- The loss function is categorical cross-entropy, which is well suited to multi-class problems with one-hot encoded labels

- We use accuracy to evaluate the performance of the model
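
To see why categorical cross-entropy pairs naturally with one-hot labels, here is a minimal NumPy sketch of softmax followed by the loss. The function names and the example logits are assumptions made for illustration; Keras computes the same quantity internally, with its own numerical safeguards.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: minus the log-probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Two made-up samples with 3 classes; the true class is class 0 for both.
logits = np.array([[2.0, 1.0, 0.1],    # model favours class 0 (correct)
                   [0.1, 1.0, 2.0]])   # model favours class 2 (wrong)
y_true = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])

probs = softmax(logits)
losses = categorical_crossentropy(y_true, probs)
print(probs)
print(losses)
```

Because the label is one-hot, only the probability of the true class contributes to the sum, so the loss is small when the model is confident and right, and large when it is confident and wrong.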

In [6]:
model.compile(
    optimizer=OPTIMIZER,
    loss=LOSS_FUNCTION,
    metrics=METRICS
)

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 24, 24, 20)        520       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 12, 12, 20)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 8, 8, 80)          40080     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 4, 4, 80)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 1280)              0         
                                                                 
 dense (Dense)               (None, 800)               1
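
The output shapes and parameter counts reported by `model.summary()` can be checked by hand. The sketch below redoes the arithmetic for the hyperparameters defined earlier (5x5 filters, 2x2 pooling with stride 2, no padding); the helper names are hypothetical, and the values for the two dense layers are computed here rather than read from the output above.

```python
def conv_output_size(size, kernel, stride=1):
    """Output side length of a convolution or pooling window without padding."""
    return (size - kernel) // stride + 1

def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (inputs + 1) * units

side = 28                                      # MNIST images are 28x28
side = conv_output_size(side, 5)               # first convolution  -> 24
conv1 = conv_params(5, 5, 1, 20)               # 520 parameters
side = conv_output_size(side, 2, stride=2)     # first pooling      -> 12
side = conv_output_size(side, 5)               # second convolution -> 8
conv2 = conv_params(5, 5, 20, 80)              # 40080 parameters
side = conv_output_size(side, 2, stride=2)     # second pooling     -> 4
flat = side * side * 80                        # flatten            -> 1280
dense1 = dense_params(flat, 800)               # hidden dense layer
dense2 = dense_params(800, 10)                 # softmax output layer
total = conv1 + conv2 + dense1 + dense2

print(side, flat, conv1, conv2, dense1, dense2, total)
```

Most of the parameters sit in the first dense layer, which is typical for this kind of architecture: the convolutions share their weights across all image positions, while the dense layer has one weight per input-unit pair.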

## Train the model

We are now ready to train the model. We need to define the number of epochs and the batch size. 

- Epochs are the number of times the model is exposed to the training dataset. During each epoch, the optimizer (Adam) updates the weights to reduce the loss function.

- The batch size is the number of instances that the optimizer observes before updating the weights and biases. There are many batches per epoch.

- We hold out a fraction VALIDATION_SPLIT of the training data for validation. With the value 0.90 set above, about 10% of the data is used for fitting and 90% for validation; a smaller validation fraction such as 0.2 is more common. The validation set is used to compute the metric and tune hyperparameters, which helps detect overfitting.

- Early stopping is a common addition: monitoring the loss on the validation set with a patience of N epochs stops the optimization if the loss does not go down for N consecutive epochs. The `fit` call below does not include it; only the `WandbCallback` is passed.
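
The patience rule behind early stopping can be sketched in plain Python. This is a simplified illustration, not Keras' implementation: the function name and the validation losses are made up. In Keras, the corresponding callback is `keras.callbacks.EarlyStopping(monitor="val_loss", patience=N)`, which would be added to the `callbacks` list.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # new best validation loss: reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: a short plateau after epoch 3, then further improvement.
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.30]

stop_2 = early_stopping_epoch(val_losses, patience=2)
stop_3 = early_stopping_epoch(val_losses, patience=3)
print(stop_2, stop_3)
```

With a patience of 2 the run stops during the plateau and misses the later improvement, while a patience of 3 lets it continue, which is the trade-off to keep in mind when choosing N.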

In [7]:
history = model.fit(
    X_train, 
    Y_train, 
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    verbose=VERBOSE,
    validation_split=VALIDATION_SPLIT,
    callbacks=[
        WandbCallback(),
        ],
    )

Epoch 1/20
INFO:tensorflow:Assets written to: /work/wandb/run-20221106_163015-37cc3uiz/files/model-best/assets
INFO:tensorflow:Assets written to: /work/wandb/run-20221106_163015-37cc3uiz/files/model-best/assets
wandb: Adding directory to artifact (/work/wandb/run-20221106_163015-37cc3uiz/files/model-best)... Done. 0.1s
Epoch 2/20
INFO:tensorflow:Assets written to: /work/wandb/run-20221106_163015-37cc3uiz/files/model-best/assets
INFO:tensorflow:Assets written to: /work/wandb/run-20221106_163015-37cc3uiz/files/model-best/assets
wandb: Adding directory to artifact (/work/wandb/run-20221106_163015-37cc3uiz/files/model-best)... Done. 0.1s
Epoch 3/20
INFO:tensorflow:Assets written to: /work/wandb/run-20221106_163015-37cc3uiz/files/model-best/assets
INFO:tensorflow:Assets written to: /work/wandb/run-20221106_163015-37cc3uiz/files/model-best/assets
wandb: Adding directory to artifact (/work/wandb/run-20221106_163015-37cc3uiz/files/model-best)... Done. 0.1

## Test the model on unseen data

In [8]:
test_loss, test_accuracy = model.evaluate(X_test, Y_test, verbose = VERBOSE)
#track test results on wandb
wandb.log({
    "test_loss": test_loss, 
    "test_accuracy": test_accuracy
})


