# Train a NN on the MNIST dataset

The MNIST dataset is a collection of hand-written digits, labelled with their corresponding true digit representation. We want to train a MLP (very simple feedforward or sequential network), to recognize hand-written digits. 
![Picture title](https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/MnistExamples.png/320px-MnistExamples.png)


## Import libraries and define symbolic constants

### Install wandb to keep track of model performance

In [1]:
!pip install --upgrade wandb
!wandb login ff97f4ffa6b4b35ec56fc229fc572b0ba72ac1fb

You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.[0m[33m
[0m[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


### Import and initialize parameters

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import wandb
from wandb.keras import WandbCallback
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten

INPUT_SHAPE = (28, 28)
ACTIVATION_FUNCTION = 'sigmoid'
N_CLASSES = 10
OPTIMIZER = 'adam'
METRICS = ['accuracy']
EPOCHS = 50
VERBOSE = 1
VALIDATION_SPLIT = 0.2
WANDB_PROJECT = 'logistic-classification'

wandb.init(project=WANDB_PROJECT)
wandb.config = {
'input_shape': INPUT_SHAPE,
'activation_function': ACTIVATION_FUNCTION,
'number_of_classes': N_CLASSES,
'otpimizer': OPTIMIZER,
'metrics': METRICS,
'epochs': EPOCHS,
'validation_split': VALIDATION_SPLIT
}

2022-11-05 20:32:39.268811: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-05 20:32:39.498951: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-11-05 20:32:39.498983: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-05 20:32:39.542697: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-11-05 20:32:42.155320: W tensorflow/stream_executor/pla

## Load MNIST dataset from Keras

In [3]:
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

## Normalize pixel values to be 0-1

In [4]:
X_train = X_train.astype("float32")/255
X_test = X_test.astype("float32")/255
Y_train = Y_train.astype('int32')
Y_test = Y_test.astype('int32')


## Build the model

- We use the sequential class to create a linear stack of feedforward layers. In this we have:

- A layer that flattens the images, creating a 1D array from the 2D matrix

- A layer that takes this array and passes it through a sigmoid function with 10 outputs (multiclass logistic regression)

In [5]:
model = keras.Sequential(
    [
        Flatten(input_shape=INPUT_SHAPE),
        Dense(
            N_CLASSES,
            activation=ACTIVATION_FUNCTION
            )
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 10)                7850      
                                                                 
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
2022-11-05 20:32:51.368833: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-11-05 20:32:51.368871: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-05 20:32:51.368893: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does

## Compile the model

- We use the adam optimizer

- The loss function is sparse categorical cross-entropy, this is particularly well-suited for multi-class problems with a one-hot encoding 

- We use accuracy to evaluate the performance of the model

In [6]:
model.compile(
    optimizer=OPTIMIZER,
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=METRICS
)

## Train the model

We are now ready to train the model. We need to define the number of epochs and the batch size. 

- Epochs are the number of times the model is exposed to the training dataset. Each time, it will run the optimizer and try to minimize the loss function. 

- We split the training data in an 80% training and 20% validation per epoch. The validation set is used to compute the metric and tune hyperparameters, to avoid overfitting.

In [7]:
history = model.fit(
    x=X_train, 
    y=Y_train, 
    epochs=EPOCHS,
    validation_split=VALIDATION_SPLIT,
    callbacks=[
        WandbCallback(),
        ],
    )

Epoch 1/50
  output, from_logits = _get_logits(
[34m[1mwandb[0m: Adding directory to artifact (/work/wandb/run-20221105_203250-11kmlk7n/files/model-best)... Done. 0.0s
Epoch 2/50
[34m[1mwandb[0m: Adding directory to artifact (/work/wandb/run-20221105_203250-11kmlk7n/files/model-best)... Done. 0.0s
Epoch 3/50
[34m[1mwandb[0m: Adding directory to artifact (/work/wandb/run-20221105_203250-11kmlk7n/files/model-best)... Done. 0.0s
Epoch 4/50
[34m[1mwandb[0m: Adding directory to artifact (/work/wandb/run-20221105_203250-11kmlk7n/files/model-best)... Done. 0.0s
Epoch 5/50
[34m[1mwandb[0m: Adding directory to artifact (/work/wandb/run-20221105_203250-11kmlk7n/files/model-best)... Done. 0.0s
Epoch 6/50
[34m[1mwandb[0m: Adding directory to artifact (/work/wandb/run-20221105_203250-11kmlk7n/files/model-best)... Done. 0.0s
Epoch 7/50
[34m[1mwandb[0m: Adding directory to artifact (/work/wandb/run-20221105_203250-11kmlk7n/files/model-best)... Done. 0.0s
Epoch 8/50
Epoch 9/50
[3

## Test the model on unseen data

In [8]:
test_loss, test_accuracy = model.evaluate(X_test, Y_test)
#track test results on wandb
wandb.log({
    "test_loss": test_loss, 
    "test_accuracy": test_accuracy
})

  output, from_logits = _get_logits(


<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=ba4822a4-198a-4cdb-8280-0ca8d044b999' target="_blank">
 </img>
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>