# Train a NN on the MNIST dataset

The MNIST dataset is a collection of handwritten digits, each labelled with the digit it represents. We want to train an MLP (a very simple feedforward, or sequential, network) to recognize handwritten digits.
![Examples of MNIST digits](https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/MnistExamples.png/320px-MnistExamples.png)


## Import libraries and define symbolic constants

### Install wandb to keep track of model performance

In [1]:
!pip install --upgrade wandb
!wandb login ff97f4ffa6b4b35ec56fc229fc572b0ba72ac1fb

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


### Import and initialize parameters

In [2]:
import tensorflow as tf
import numpy as np
from tensorflow import keras

import wandb
from wandb.keras import WandbCallback

wandb.init(project="digit-recognition")

EPOCHS = 50  # number of passes over the training set; weights and biases are updated throughout each pass
BATCH_SIZE = 64  # number of training instances processed before each optimizer update
VERBOSE = 1  # print progress during training
NB_CLASSES = 10  # we have 10 classes in our dataset, hence 10 neurons in the last layer
N_HIDDEN = 128  # neurons in each hidden layer
VALIDATION_SPLIT = 0.2  # hold out 20% of the training set for validation (to detect overfitting)
RESHAPED = 784  # each 28x28 image is flattened into a vector of 784 elements
DROPOUT = 0.3  # fraction of a layer's outputs randomly set to zero during training
ACTIVATION_FUNCTION_HIDDEN = 'relu'  # activation function for the hidden layers
ACTIVATION_FUNCTION_FINAL = 'softmax'  # activation function for the output layer
OPTIMIZER = 'SGD'  # optimizer: how we search for a minimum of the loss function
LOSS_FUNCTION = 'categorical_crossentropy'  # loss function: the quantity that is minimized
METRICS = ['accuracy']  # metric reported on the training and validation sets, and later on the test set

# Record the hyperparameters on the active run
# (assigning a plain dict to wandb.config would replace the config object instead).
wandb.config.update({
    "epochs": EPOCHS,
    "batch_size": BATCH_SIZE,
    "n_hidden": N_HIDDEN,
    "validation_split": VALIDATION_SPLIT,
    "activation_function_hidden": ACTIVATION_FUNCTION_HIDDEN,
    "activation_function_final": ACTIVATION_FUNCTION_FINAL,
    "optimizer": OPTIMIZER,
    "loss_function": LOSS_FUNCTION,
    "metric": METRICS,
})



## Load demo dataset from Keras

In [3]:
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

## Reshape data and encode labels (one-hot)

In [4]:
X_train = X_train.reshape(60000, RESHAPED).astype("float32")/255
X_test = X_test.reshape(10000, RESHAPED).astype("float32")/255

# use a one-hot representation for the digit labels
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)
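
To make the two transformations above concrete, here is a minimal NumPy sketch on a tiny made-up batch: two blank 28x28 images with a single white pixel, labelled 3 and 7 (none of this is MNIST data). It assumes that indexing an identity matrix gives the same one-hot layout as `to_categorical` for integer labels.

```python
import numpy as np

# Hypothetical stand-in for two grayscale images with pixel values 0..255.
images = np.zeros((2, 28, 28), dtype="uint8")
images[0, 0, 0] = 255  # one fully white pixel in the first image
labels = np.array([3, 7])

# Flatten each 28x28 image into a 784-element vector and scale to [0, 1].
flat = images.reshape(len(images), 28 * 28).astype("float32") / 255

# One-hot encode: row i of the identity matrix is the encoding of class i.
one_hot = np.eye(10, dtype="float32")[labels]

print(flat.shape)   # (2, 784)
print(one_hot[0])   # 1.0 at index 3, 0.0 elsewhere
```

Scaling to the [0, 1] range keeps the inputs small, which generally makes gradient-based training better behaved.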

## Build the model

- The model will be dense, meaning that every neuron in layer L takes as input the outputs of all neurons in layer L-1.

- The output layer uses softmax as its activation function, which turns the 10 raw output scores into probabilities that sum to 1. The hidden layers use ReLU.
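
As a quick illustration of what softmax does, the sketch below computes it by hand for three made-up scores (arbitrary values, not model outputs). Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # arbitrary example scores
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest score gets the largest probability, and the ordering of the scores is preserved.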

In [5]:
model = tf.keras.models.Sequential()

# First hidden layer: takes the flattened 784-element image as input
model.add(
    keras.layers.Dense(
        N_HIDDEN,
        input_shape=(RESHAPED,),
        name='dense_layer',
        activation=ACTIVATION_FUNCTION_HIDDEN,
        kernel_regularizer=keras.regularizers.L2(0.01),
        activity_regularizer=keras.regularizers.L2(0.01),
    )
)
model.add(keras.layers.Dropout(DROPOUT))
# Second hidden layer
model.add(
    keras.layers.Dense(
        N_HIDDEN,
        name='dense_layer_2',
        activation=ACTIVATION_FUNCTION_HIDDEN,
        kernel_regularizer=keras.regularizers.L2(0.01),
        activity_regularizer=keras.regularizers.L2(0.01),
    )
)
model.add(keras.layers.Dropout(DROPOUT))
# Output layer: one neuron per digit class
model.add(
    keras.layers.Dense(
        NB_CLASSES,
        name='dense_layer_3',
        activation=ACTIVATION_FUNCTION_FINAL,
        kernel_regularizer=keras.regularizers.L2(0.01),
        activity_regularizer=keras.regularizers.L2(0.01),
    )
)
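
As a sanity check on the architecture, the number of trainable parameters can be worked out by hand: a dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases, and dropout layers add none. The helper name `dense_params` is made up for this sketch; the total should match what `model.summary()` reports.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per neuron."""
    return n_in * n_out + n_out

# Layer sizes taken from the constants above: 784 -> 128 -> 128 -> 10.
sizes = [784, 128, 128, 10]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]

print(per_layer)       # [100480, 16512, 1290]
print(sum(per_layer))  # 118282
```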



## Compile the model

- We use stochastic gradient descent (SGD) as the optimizer.

- The loss function is categorical cross-entropy, which is well suited to multi-class problems with one-hot encoded labels.

- We use accuracy to evaluate the performance of the model.
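
For a single example, categorical cross-entropy is the negative sum of `y_i * log(p_i)` over the classes. With a one-hot label only the true class contributes, so the loss reduces to `-log(p_true)`. A small sketch with made-up probabilities (not model output):

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0.0, 1.0, 0.0]     # one-hot label: the true class is 1
confident = [0.1, 0.8, 0.1]  # high probability on the true class
unsure = [0.4, 0.2, 0.4]     # low probability on the true class

print(round(categorical_crossentropy(y_true, confident), 3))  # 0.223
print(round(categorical_crossentropy(y_true, unsure), 3))     # 1.609
```

The loss grows as the probability assigned to the true class shrinks, which is what pushes the network towards confident, correct predictions.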

In [6]:
model.compile(
    optimizer=OPTIMIZER,
    loss=LOSS_FUNCTION,
    metrics=METRICS
)
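
To illustrate what the optimizer does, here is a hand-rolled sketch of plain gradient descent on a one-parameter toy loss, `(w - 3) ** 2`, rather than on the network itself. The learning rate of 0.01 matches the Keras default for `'SGD'`; the toy loss, starting point and number of steps are invented for the example. In real SGD the gradient is estimated from one mini-batch at a time instead of being computed exactly.

```python
def grad(w):
    """Gradient of the toy loss (w - 3) ** 2."""
    return 2 * (w - 3)

learning_rate = 0.01  # Keras default for SGD
w = 0.0               # arbitrary starting point

for _ in range(1000):
    w -= learning_rate * grad(w)  # step against the gradient

print(round(w, 3))  # 3.0
```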

## Train the model

We are now ready to train the model. We need to define the number of epochs and the batch size.

- The number of epochs is the number of times the model is exposed to the full training dataset. During each epoch, the optimizer (SGD) repeatedly updates the weights and biases to reduce the loss function.

- `BATCH_SIZE` is the number of instances that the optimizer processes before each update of the weights and biases. There are many batches per epoch.

- We split the training data into 80% for training and 20% for validation. Keras makes this split once, before training starts, and evaluates the loss and the metric on the validation set at the end of every epoch. The validation results are used to tune hyperparameters and to detect overfitting.

- Early stopping on the validation loss, with a patience of N epochs, stops the optimization if the validation loss does not go down for N consecutive epochs. The `model.fit` call below passes only `WandbCallback`, so an early-stopping callback has to be added to `callbacks` for this to take effect.
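
The patience rule can be sketched in plain Python. In Keras the equivalent is the `keras.callbacks.EarlyStopping(monitor="val_loss", patience=N)` callback. The function name `stopping_epoch` and the loss values below are hypothetical, chosen only to show the logic, and the sketch ignores refinements such as a minimum improvement threshold.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: the best value so far is reached at epoch 3.
history = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
print(stopping_epoch(history, patience=3))  # 6
```

With `patience=3` training stops at epoch 6 and never sees the improvement at epoch 7, so a larger patience trades extra training time for a lower risk of stopping too early.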

In [7]:
model.fit(
    X_train, 
    Y_train, 
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    verbose=VERBOSE,
    validation_split=VALIDATION_SPLIT,
    callbacks=[
        WandbCallback(),
        ],
    )

Epoch 1/50
wandb: Adding directory to artifact (/work/wandb/run-20221105_150854-3s9o3tse/files/model-best)... Done. 0.0s
Epoch 2/50
wandb: Adding directory to artifact (/work/wandb/run-20221105_150854-3s9o3tse/files/model-best)... Done. 0.0s
Epoch 3/50
wandb: Adding directory to artifact (/work/wandb/run-20221105_150854-3s9o3tse/files/model-best)... Done. 0.0s
Epoch 4/50
wandb: Adding directory to artifact (/work/wandb/run-20221105_150854-3s9o3tse/files/model-best)... Done. 0.0s
Epoch 5/50
wandb: Adding directory to artifact (/work/wandb/run-20221105_150854-3s9o3tse/files/model-best)... Done. 0.0s
Epoch 6/50
wandb: Adding directory to artifact (/work/wandb/run-20221105_150854-3s9o3tse/files/model-best)... Done. 0.0s
Epoch 7/50
wandb: Adding directory to artifact (/work/wandb/run-20221105_150854-3s9o3tse/files/model-best)... Done. 0.0s
Epoch 8/50
wandb: Adding directory to artifact (/

<keras.callbacks.History at 0x7f256864cb50>
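
With the settings used here, the size of each split and the number of optimizer updates can be worked out by hand. This is simple arithmetic on the constants defined earlier; the total assumes that all 50 epochs run.

```python
import math

n_samples = 60000       # size of the MNIST training set
validation_split = 0.2
batch_size = 64
epochs = 50

n_val = int(n_samples * validation_split)  # held out for validation
n_train = n_samples - n_val                # actually used for training
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val)            # 48000 12000
print(steps_per_epoch)           # 750
print(steps_per_epoch * epochs)  # 37500 weight updates over 50 epochs
```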

## Test the model on unseen data

In [8]:
test_loss, test_accuracy = model.evaluate(X_test, Y_test)
# track the test results on wandb
wandb.log({
    "test_loss": test_loss, 
    "test_accuracy": test_accuracy
})
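
The accuracy returned by `model.evaluate` is the fraction of test images whose most probable predicted class equals the true class. A minimal sketch of that computation, using made-up probabilities for three classes (hypothetical values, not produced by the model):

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical predicted probabilities and one-hot labels for 4 samples.
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.2, 0.6, 0.2]]
y_true = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]

correct = sum(argmax(p) == argmax(t) for p, t in zip(y_pred, y_true))
accuracy = correct / len(y_true)
print(accuracy)  # 0.75
```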


