# Lecture 2 – MNIST dataset and classification via MLP

# Requirements:

- [Google Colab](https://colab.research.google.com/) 

If running locally:

- [NumPy](https://numpy.org/doc/stable/user/quickstart.html)
- [Pandas](https://pandas.pydata.org/docs/user_guide/10min.html)
- [TensorFlow 2](https://www.tensorflow.org/tutorials)
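- [Matplotlib](https://matplotlib.org/stable/tutorials/index.html)
- [Scikit-learn](https://scikit-learn.org/stable/tutorial/index.html) (used for the evaluation metrics)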

# Objectives:

- Getting familiar with the MNIST dataset.
- Implementing an MLP via TensorFlow.
- Training using the gradient descent algorithm.
- Overfitting, early stopping, regularization, etc.

## Tutorials

Some Python libraries are required to complete the tasks assigned in this homework. If you would like to work through a tutorial first, feel free to do so:

*   [Scikit-learn Tutorials](https://scikit-learn.org/stable/tutorial/index.html)
*   [TensorFlow Tutorials](https://www.tensorflow.org/tutorials)
*   [Matplotlib Tutorials](https://matplotlib.org/stable/tutorials/index.html)

# Introduction

In this exercise we will load the MNIST dataset and train an MLP model to classify handwritten digits. We will use the gradient descent algorithm to optimize the parameters of the model during training. Further, we study how different hyperparameters (learning rate, batch size, activation function, number of hidden layers, and number of neurons) influence the performance of the model.

## Imports


In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

## System checks

Is there any GPU available?

In [None]:
gpus = tf.config.list_physical_devices('GPU')
cpus = tf.config.list_physical_devices('CPU')
print(gpus)
print(cpus)

Choose your device for computation: the CPU or one of your CUDA devices.

In [None]:
tf.config.set_visible_devices(gpus, 'GPU')

# 1. Load the data

Let us load the raw data from Keras.

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [None]:
print(x_train.shape)
print(y_train.shape)

In [None]:
print("Number of original training examples:", len(x_train))
print("Number of original test examples:", len(x_test))


In [None]:
# Scale pixel values from the integer range [0, 255] to floats in [0, 1].
x_train, x_test = x_train/255.0, x_test/255.0
print("minimum of x_train:", x_train.min(), "maximum of x_train:", x_train.max())
print("minimum of y_train:", y_train.min(), "maximum of y_train:", y_train.max())
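The effect of the division by 255 can be checked on a tiny array. This is a minimal NumPy sketch, assuming the raw images are `uint8` arrays with values from 0 to 255; the 2x2 array `raw` is a made-up stand-in for an image, not MNIST data.

```python
import numpy as np

# Hypothetical 2x2 "image" with the dtype assumed for raw pixels (uint8).
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Dividing by a Python float promotes the result to a floating-point array.
scaled = raw / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

After scaling, all values lie in the interval [0, 1], which keeps the inputs of the first layer in a small, comparable range.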

Show the first example:

In [None]:
print("Label:", y_train[0])
plt.imshow(x_train[0])
plt.colorbar()

# 2. Implementing an MLP from scratch in TensorFlow

We need a so-called **Dense** layer. Let us implement it as a Python class.

In [None]:
class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation

        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)

        b_shape = (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    def __call__(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    @property
    def weights(self):
        return [self.W, self.b]
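To make the shapes concrete, the forward pass of such a layer can be sketched in plain NumPy. This is only an illustration of the computation `activation(inputs @ W + b)`; the helper names `relu` and `dense_forward` and the random batch are assumptions of this sketch, not part of the class above.

```python
import numpy as np

def relu(z):
    # Element-wise max(z, 0).
    return np.maximum(z, 0.0)

def dense_forward(inputs, W, b, activation):
    # Same computation as NaiveDense.__call__: activation(inputs @ W + b).
    return activation(inputs @ W + b)

rng = np.random.default_rng(0)
batch = rng.random((4, 784)).astype(np.float32)   # 4 flattened 28x28 images
W = rng.uniform(0.0, 0.1, size=(784, 32)).astype(np.float32)
b = np.zeros(32, dtype=np.float32)                # broadcast over the batch axis

out = dense_forward(batch, W, b, relu)
print(out.shape)  # (4, 32)
```

A batch of shape `(batch, input_size)` multiplied by `W` of shape `(input_size, output_size)` gives `(batch, output_size)`; the bias is added to every row.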

Now, let’s create a **NaiveSequential** class to chain these layers. It wraps a list of layers
and exposes a `__call__()` method that simply calls the underlying layers on the
inputs, in order. It also features a `weights` property to easily keep track of the `layers`’
parameters.

In [None]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights
    
    def compile(self, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer
    
    def train_step(self, data):
        x_batch, y_batch = data

        with tf.GradientTape() as tape:
            predictions = self(x_batch)
            loss = self.loss(y_batch, predictions)

        gradients = tape.gradient(loss, self.weights)

        # Let the optimizer apply the update (for plain gradient descent: w <- w - lr * g).
        self.optimizer.apply_gradients(zip(gradients, self.weights))
        
        return {"loss": loss}
    
    def test_step(self, data):
        x_batch, y_batch = data

        predictions = self(x_batch)
        loss = self.loss(y_batch, predictions)
        
        return {"loss": loss}
    
    def fit(self, train_dataset, epochs, test_dataset):

        history = {'loss':[], 'val_loss': []}

        for epoch in range(epochs):        
            # Train loop
            train_loss = tf.keras.metrics.Mean()
            for x_batch, y_batch in train_dataset:
                train_loss(self.train_step((x_batch, y_batch))["loss"])
            history['loss'].append(train_loss.result())

            # Test loop
            test_loss = tf.keras.metrics.Mean()
            for x_batch, y_batch in test_dataset:
                test_loss(self.test_step((x_batch, y_batch))["loss"])
            history['val_loss'].append(test_loss.result())

            # Print progress
            print(f"Epoch {epoch + 1}: train loss = {history['loss'][-1]:.4f}, test loss = {history['val_loss'][-1]:.4f}")
        
        return history

Using this **NaiveDense** class and this **NaiveSequential** class, we can create a mock
Keras model:

In [None]:
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=32, activation=tf.nn.relu),
    NaiveDense(input_size=32, output_size=10, activation=tf.nn.softmax)
])
assert len(model.weights) == 4

In the `compile` method, we specify the loss function and the optimizer; the learning rate is a parameter of the optimizer.

In [None]:
class GD:
    """Plain gradient descent: w <- w - learning_rate * gradient."""

    def __init__(self, learning_rate=0.001):
        self.lr = learning_rate

    def apply_gradients(self, zip_grads_weights):
        # Update each tf.Variable in place.
        for g, w in zip_grads_weights:
            w.assign_sub(g * self.lr)

optimizer = GD()
# optimizer = tf.keras.optimizers.SGD()
model.compile(tf.keras.losses.SparseCategoricalCrossentropy(), optimizer)
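The loss passed to `compile` compares integer class labels with predicted probability vectors. A minimal NumPy sketch of that idea follows, assuming the predictions are already probabilities (the output of a softmax) and that the loss is averaged over the batch; the function name, the clipping constant `eps`, and the example values are assumptions of this sketch.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # Clip to avoid log(0); eps is an assumption of this sketch.
    probs = np.clip(probs, eps, 1.0)
    # Pick the probability assigned to the true class of each example.
    picked = probs[np.arange(len(labels)), labels]
    # Mean negative log-likelihood over the batch.
    return float(np.mean(-np.log(picked)))

labels = np.array([0, 2])                 # integer class labels
probs = np.array([[0.5, 0.25, 0.25],      # made-up predictions for 2 examples
                  [0.1, 0.1, 0.8]])

loss = sparse_categorical_crossentropy(labels, probs)
print(loss)
```

The loss is small when the model assigns high probability to the correct class and grows without bound as that probability approaches zero.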

Finally, let's create batches of the data using `tf.data.Dataset`.

In [None]:
batch_size = 128

x_train = x_train.reshape(-1, 28*28).astype(np.float32)
x_test = x_test.reshape(-1, 28*28).astype(np.float32)

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)
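With a batch size of 128, the last batch of each dataset is smaller than the others, because neither 60000 nor 10000 is a multiple of 128. The following pure-Python sketch computes the batch sizes, assuming incomplete batches are kept rather than dropped; `batch_sizes` is a hypothetical helper.

```python
def batch_sizes(num_examples, batch_size):
    # Full batches, followed by one smaller batch if there is a remainder.
    full, remainder = divmod(num_examples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

train = batch_sizes(60000, 128)
test = batch_sizes(10000, 128)

print(len(train), train[-1])  # 469 96
print(len(test), test[-1])    # 79 16
```

One epoch over the training data therefore consists of 469 parameter updates.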

# 3. Training loop
An epoch of training simply consists of repeating the training step for each batch in
the training data, and the full training loop is simply the repetition of one epoch:

In [None]:
num_epochs = 10

history = model.fit(train_dataset, num_epochs, test_dataset)
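The structure of this loop (epochs on the outside, mini-batches on the inside, one gradient step per batch) can be seen in isolation on a toy problem. The sketch below swaps the MLP for a linear least-squares model in NumPy so the gradient can be written by hand; the data, `true_w`, and all hyperparameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((256, 3))
true_w = np.array([1.0, -2.0, 0.5])   # hypothetical target weights
y = x @ true_w                        # noiseless targets

w = np.zeros(3)
lr, batch_size, epochs = 0.1, 32, 20
epoch_losses = []

for epoch in range(epochs):
    batch_losses = []
    for start in range(0, len(x), batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        err = xb @ w - yb
        batch_losses.append(np.mean(err ** 2))   # mean squared error on the batch
        grad = 2.0 * xb.T @ err / len(xb)        # gradient of the batch loss w.r.t. w
        w -= lr * grad                           # gradient descent update
    epoch_losses.append(float(np.mean(batch_losses)))

print(epoch_losses[0], epoch_losses[-1])
```

As in `fit`, the reported epoch loss is the mean of the batch losses, and it should shrink from the first epoch to the last.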

### Evaluating the model
We can evaluate the model by taking the `argmax` of its predictions over the test images,
and comparing it to the expected labels:

In [None]:
# x_test was already flattened and cast to float32 when the datasets were built.
y_pred = np.argmax(model(x_test).numpy(), axis=1)
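As a small illustration of the `argmax` step, here it is applied to a made-up array of class probabilities for three examples and three classes.

```python
import numpy as np

# Hypothetical model outputs: one row per example, one column per class.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])

# axis=1 picks, for each row, the index of the largest probability.
pred = np.argmax(probs, axis=1)
print(pred)  # [1 0 2]
```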

Let us report the prediction for one of the sample images in the test data.

In [None]:
n = 0
print("Label:", y_test[n], ",  Prediction:", y_pred[n])
plt.imshow(x_test[n].reshape((28, 28)))
plt.colorbar()

Now, we can report accuracy, precision, and recall using `sklearn.metrics`.

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

# scikit-learn metrics expect the true labels first, then the predictions.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')

print('accuracy: ', accuracy, '\nprecision: ', precision, '\nrecall: ', recall)
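To see what the macro-averaged scores measure, they can be computed by hand from a confusion matrix. This is a NumPy sketch on made-up labels; `macro_scores` is a hypothetical helper, and classes that are never predicted (or never occur) are assigned a score of 0 here, which is an assumption of this sketch. It also shows why the argument order matters: scikit-learn's metric functions take the true labels first and the predictions second, and swapping them exchanges precision and recall.

```python
import numpy as np

def macro_scores(y_true, y_pred, num_classes):
    # cm[t, p] counts examples with true class t and predicted class p.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # how often each class was predicted
    actual = cm.sum(axis=1)      # how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean()

y_true = [0, 0, 1, 1, 2, 2]   # made-up labels
y_pred = [0, 1, 1, 1, 2, 0]   # made-up predictions

acc, prec, rec = macro_scores(y_true, y_pred, 3)
acc_s, prec_s, rec_s = macro_scores(y_pred, y_true, 3)   # arguments swapped

print(acc, prec, rec)
print(acc_s, prec_s, rec_s)
```

Accuracy is unchanged by the swap, but the macro precision and recall trade places.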
