# Tutorial 11 - Deep learning intro: Multilayer perceptron

Before we begin, make sure you have updated your venv with `tutorial11.yml`.

## Dataset

MNIST is a widely used dataset for image classification. It consists of a training set of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images. It is a subset of a larger set available from NIST. More information can be found at the [MNIST homepage](http://yann.lecun.com/exdb/mnist/).

## Main ML topic:
Supervised learning. We will introduce the basics of deep learning and show the capabilities of the *multilayer perceptron (MLP)*. This architecture belongs to a large family of neural networks called *feed-forward neural networks*.

## Our mission
This tutorial's mission is to train a neural network that can identify the handwritten digits of your ID number. For this task, write down your ID number on white paper using a dark pen (blue or black). The digits should be separated from each other and about 2x2 cm in size. Take a picture of your ID number from a distance of about 0.5 m with your default focus. Make sure the image is clear and well lit. Upload the image to your computer and crop each digit using the "Snipping Tool". Make sure the digits are saved in PNG format. Name the first digit "1.PNG", the second "2.PNG", etc. Save your 9 images in the same folder as your notebook.

## Theory reminders
The perceptron is a model we have already seen under the name *logistic regression*. The main limitation of this model is that it is **linear**: its decision boundary is a hyperplane. It therefore often fails to classify real-life data, where the relation between the data and the label is highly non-linear. There is even a famous task that is extremely simple and yet defeats this model. It is known as the "*XOR problem*".


However, the perceptron itself is really similar to a neuron as shown below:
<center><img src="Images/1.PNG"></center>

The connectivity among many such neurons, combined with non-linear activation functions between them, was assumed to be the key to solving complicated tasks, as shown here:
<center><img src="Images/mlp.png"></center>
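
As a small illustration of why a hidden layer with a non-linearity helps, the sketch below solves XOR with a tiny 2-2-1 network. The weights are hand-picked for the example rather than learned, and the names (`W_hidden`, `b_hidden`, `w_out`) are chosen here for illustration only.

```python
import numpy as np

# The four XOR inputs and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# Hand-picked (not learned) weights for a network with 2 hidden units.
W_hidden = np.array([[1.0, 1.0], [1.0, 1.0]])
b_hidden = np.array([0.0, -1.0])
w_out = np.array([1.0, -2.0])

def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)

hidden = relu(X @ W_hidden + b_hidden)  # non-linear hidden representation
xor_output = hidden @ w_out             # linear read-out of the hidden layer
print(xor_output)                       # prints [0. 1. 1. 0.]
```

Without the `relu` call the two layers would collapse into a single linear map, which cannot reproduce these four labels.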

An artificial neural network whose connections between nodes do not form a cycle is called a *feedforward* neural network. If, in addition, every neuron in one layer is connected to every neuron in the next layer and weights are not shared, it is called a *dense* or *fully-connected* neural network.

Due to the non-linear activation functions, the network cannot be written explicitly as a single matrix multiplication, and the loss function becomes highly non-convex. Thus, numerical methods such as *gradient descent* have to be applied.

The gradients we calculate are still the derivatives of the loss function w.r.t. every weight, and in order to calculate them through the layers we use the *chain rule*.
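
To make the chain rule concrete, here is a minimal sketch for a toy network with one input, one `tanh` hidden unit and one output, trained with a squared loss. The network, the loss and all names are assumptions made for this example. The analytic gradients are compared against finite differences as a sanity check.

```python
import numpy as np

def forward(w1, w2, x):
    h = np.tanh(w1 * x)   # hidden activation
    return h, w2 * h      # hidden value and network output

def loss(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def chain_rule_grads(w1, w2, x, y):
    h, y_hat = forward(w1, w2, x)
    dL_dyhat = y_hat - y                # derivative of the loss w.r.t. the output
    dL_dw2 = dL_dyhat * h               # output layer: d(y_hat)/d(w2) = h
    dL_dh = dL_dyhat * w2               # propagate back to the hidden unit
    dL_dw1 = dL_dh * (1 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)^2
    return dL_dw1, dL_dw2

w1, w2, x, y = 0.5, -1.5, 2.0, 1.0
analytic = chain_rule_grads(w1, w2, x, y)

# Central finite differences as an independent check.
eps = 1e-6
numeric = (
    (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps),
    (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps),
)
print(analytic, numeric)
```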

The high number of examples in our training set usually does not allow us to hold all of it in the computer's memory at once. Thus, we divide the training set into *batches*. Due to the linearity of the gradient operator, we can first accumulate the loss over all of the examples in the batch and only then calculate the gradient for updating the weights.
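
The linearity argument can be checked numerically. The sketch below assumes a plain linear model with a squared loss (chosen only because its gradient is easy to write by hand) and verifies that the gradient of the summed batch loss equals the sum of the per-example gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
X_batch = rng.normal(size=(8, 3))   # 8 examples, 3 features
y_batch = rng.normal(size=8)
w = rng.normal(size=3)

def example_grad(x_i, y_i, w):
    """Gradient of 0.5 * (w . x_i - y_i)^2 with respect to w."""
    return (x_i @ w - y_i) * x_i

# Sum of the gradients, one example at a time.
sum_of_grads = sum(example_grad(x_i, y_i, w) for x_i, y_i in zip(X_batch, y_batch))

# Gradient of the accumulated (summed) batch loss, computed in one shot.
grad_of_sum = X_batch.T @ (X_batch @ w - y_batch)

print(np.allclose(sum_of_grads, grad_of_sum))  # prints True
```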

For proper training, we should "go through" the whole training set several times, in what we call "*epochs*".

Let's define some terms:
* **Epoch**: One full pass of the model over the whole training set.
* **Batch**: A subset of the training set over which the loss is accumulated. The gradient is calculated and the weights are updated only after accumulating (summing) all of the loss terms in the batch.
* **Batch size**: The number of examples within a batch.
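
The relation between these terms can be sketched as a simple loop. The helper `iterate_batches` is hypothetical and written for this example, and the figure of 51,000 examples assumes an 85% share of the 60,000 training images. No model is trained here; the loop only counts weight updates.

```python
import math
import numpy as np

def iterate_batches(n_examples, batch_size, rng):
    """Yield index arrays that cover a shuffled dataset exactly once (one epoch)."""
    order = rng.permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]

n_examples, batch_size, n_epochs = 51_000, 128, 2
rng = np.random.default_rng(0)

# The last batch of an epoch may be smaller than batch_size.
steps_per_epoch = math.ceil(n_examples / batch_size)

for epoch in range(n_epochs):
    batches = list(iterate_batches(n_examples, batch_size, rng))
    n_seen = sum(len(b) for b in batches)
    print(f"epoch {epoch + 1}: {len(batches)} weight updates, {n_seen} examples seen")
```

With these numbers each epoch consists of 399 updates, the last of which uses only 56 examples.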

## Data loading

In [None]:
import numpy as np
import itertools
from tqdm import tqdm
import pickle
import sys
import pandas as pd
import matplotlib as mpl
import seaborn as sns
import matplotlib.pyplot as plt
mpl.style.use(['ggplot']) 
%matplotlib inline
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from IPython.display import display, clear_output
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='3'

Now we will import `tensorflow` and `keras`, which are widely used Python platforms for deep learning.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras import utils

We will start with checking whether or not an accessible GPU exists on our computer.

In [None]:
if tf.config.list_physical_devices('GPU'):
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Either GPU does not exist or simply not accessible (cuda is not installed).")

You can also check whether CUDA is installed by opening the URL `chrome://gpu/` in your Chrome browser and searching (`Ctrl+F`) for `cuda`. If `GPU CUDA compute capability major version` is 0, then CUDA is not installed for your GPU.

If a `TensorFlow` operation has both CPU and GPU implementations, by default the GPU devices will be given priority when the operation is assigned to a device.

If you would like a particular operation to run on a device of your choice instead of what's automatically selected for you, you can use `with tf.device` to create a device context, and all the operations within that context will run on the same designated device.

In [None]:
# Place tensors on the CPU
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Run on the GPU (if exists)
c = tf.matmul(a, b)
print(c)


Now let's load MNIST from Keras datasets:

In [None]:
from tensorflow.keras.datasets import mnist
data  = mnist.load_data()

## Specific task:

In [None]:
(X_train_orig, Y_train_orig), (x_test, y_test) = mnist.load_data() #60,000 for training, 10000 for testing
x_train, x_val, y_train, y_val = train_test_split(X_train_orig, Y_train_orig, stratify=Y_train_orig, test_size=0.15, random_state=336546)

First of all, let's take a look at the MNIST dataset:

In [None]:
fig, axes = plt.subplots(3, 3, figsize=(8,8))
for i, ax in enumerate(axes.flatten()): 
    ax.imshow(x_train[i], cmap='gray', interpolation='none')
    ax.set_title("Digit: {}".format(y_train[i]))
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()

Print the shapes of all of the datasets (x and y of training, validation and testing). Make sure that the shapes of the data make sense to you.

In [None]:
# C_1
#------------------------------Implement your code here------------------------

#------------------------------------------------------------------------------

Reshape one of the training examples as a 1-rank array and plot its histogram.

In [None]:
#C2
#------------------------------Implement your code here------------------------

#------------------------------------------------------------------------------

Now reshape all of the X data (including `X_train_orig`) to the shape `(num of examples, 784)`. Then convert it to `float32` using the array method `astype()` and normalize it by dividing by 255.

In [None]:
#C3
#------------------------------Implement your code here------------------------

#------------------------------------------------------------------------------

Plot the histogram of the same example that you chose to make sure you applied normalization.

In [None]:
#C5
#------------------------------Implement your code here------------------------

#------------------------------------------------------------------------------

Now, we should encode our labels as one-hot vectors. We can do it easily by using `utils` of `keras`.

In [None]:
# one-hot encoding using keras' numpy-related utilities
n_classes = 10
print("Shape before one-hot encoding: ", y_train.shape)
Y_train_orig = utils.to_categorical(Y_train_orig, n_classes)
Y_train = utils.to_categorical(y_train, n_classes)
Y_val = utils.to_categorical(y_val, n_classes)
Y_test = utils.to_categorical(y_test, n_classes)
print("Shape after one-hot encoding: ", Y_train.shape)

Let's see if we got what we expected to have:

In [None]:
print(y_train[0])
print(Y_train[0,:])
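
For intuition about what the encoding does, here is a minimal NumPy sketch of one-hot encoding. The function `one_hot` is written for this example and is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Row i is all zeros except for a 1 at column labels[i]."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, n_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example_labels = np.array([3, 0, 9])
encoded = one_hot(example_labels, 10)
print(encoded.shape)           # one row per label, one column per class
print(encoded.argmax(axis=1))  # argmax recovers the original labels
```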

Finally, we can build our first neural network!

Let's see how we can build a simple but powerful neural network composed of two hidden layers with *ReLU* activation and 512 neurons in each layer. Since it is a feedforward network in which each layer feeds the next one, we can use the `Sequential` class. Our output should be activated with *softmax* since this is a multiclass task (each image has exactly one label) rather than a *multilabel* one. For regularization, we will also add *dropout* with a probability of 0.2.
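
The difference between a multiclass and a multilabel output can be sketched in NumPy. The functions below are illustrative implementations written for this example, not the Keras ones: softmax turns the scores into probabilities that compete with each other, whereas an element-wise sigmoid scores every class independently.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum avoids overflow and does not change the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)
print(probs, probs.sum(axis=-1))  # each row sums to 1: exactly one class is expected
print(sigmoid(logits))            # independent scores: several classes may be "on"
```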

It is useful to name our models and some of their activations:

In [None]:
model_1 = Sequential(name="MLP_1")
model_1.add(Dense(512, input_shape=(784,)))
model_1.add(Activation('relu', name='Relu_1'))                            
model_1.add(Dropout(0.2))

model_1.add(Dense(512))
model_1.add(Activation('relu', name='Relu_2'))
model_1.add(Dropout(0.2))

model_1.add(Dense(10))
model_1.add(Activation('softmax'))


That's it, you have built your first neural network! Let's have a look at its summary:

In [None]:
model_1.summary()

Look at how many parameters are learned within this network. This tells us how complex the model is compared to other models we saw in the course, and yet this is considered a very simple model.
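
The parameter count can be reproduced by hand: a dense layer with `n_inputs` inputs and `n_units` neurons has one weight per input-neuron pair plus one bias per neuron. The helper `dense_params` below is written for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input of 784 pixels, two hidden layers of 512 neurons, 10 output classes.
layer_sizes = [784, 512, 512, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)  # prints [401920, 262656, 5130] 669706
```

Activation and dropout layers add no trainable parameters, so this total should match the one reported by `summary()` for the architecture described above.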

The next thing to do is to choose our loss function, the metrics that we would like to calculate during the iterations, and the optimizer to use. We do this with the `compile` method.

In [None]:
model_1.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
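
For reference, the categorical cross-entropy of a single example is the negative log of the probability that the model assigns to the true class. The sketch below is a plain NumPy version written for this example; the Keras implementation may differ in details such as clipping and averaging over the batch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Per-example cross-entropy between one-hot labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)

# The true class is the third one in both rows.
y_true = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
# First prediction is confident and right, second is confident and wrong.
y_prob = np.array([[0.1, 0.2, 0.7], [0.7, 0.2, 0.1]])

losses = categorical_crossentropy(y_true, y_prob)
print(losses)  # the wrong prediction receives the larger loss
```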

This model is now ready to train and has its own initialized weights. We will first train with a validation set and only later retrain on the complete training set, and `keras` models **do not** re-initialize their weights when `fit` is called again, so we should save these initial weights for later use.

In [None]:
if not("results" in os.listdir()):
    os.mkdir("results")
save_dir = "results/"
model_name = "init_weigths_1.h5"
model_path = os.path.join(save_dir, model_name)
model_1.save(model_path)
print('Saved initialized model at %s ' % model_path)

Now we can train our model using the regular `fit` method we are so familiar with.

In [None]:
history = model_1.fit(x_train, Y_train,
          batch_size=128, epochs=20,
          verbose=2,
          validation_data=(x_val, Y_val))

In order to check what has occurred during the iterations, we can use the `history` attribute of the object returned by `fit`:

In [None]:
history.history.keys()

We can plot the training loss and accuracy against the validation loss and accuracy:

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(12,5))

axs[0].plot(history.history['accuracy'])
axs[0].plot(history.history['val_accuracy'])
axs[0].set_title('model accuracy')
axs[0].set_ylabel('accuracy')
axs[0].set_xlabel('epoch')
axs[0].legend(['train', 'val'], loc='lower right')


axs[1].plot(history.history['loss'])
axs[1].plot(history.history['val_loss'])
axs[1].set_title('model loss')
axs[1].set_ylabel('loss')
axs[1].set_xlabel('epoch')
axs[1].legend(['train', 'val'], loc='upper right')

plt.tight_layout()
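
Besides plotting, the same dictionary can be inspected numerically, for example to find the epoch with the lowest validation loss. The values below are hypothetical numbers made up for illustration; in the notebook you would use `history.history` instead of `example_history`.

```python
import numpy as np

# Hypothetical values, made up for illustration only.
example_history = {
    "loss":     [0.30, 0.15, 0.10, 0.07, 0.05],
    "val_loss": [0.20, 0.12, 0.10, 0.11, 0.13],
}

# Epochs are counted from 1, list indices from 0.
best_epoch = int(np.argmin(example_history["val_loss"])) + 1

# A growing gap between validation and training loss hints at overfitting.
gap = np.array(example_history["val_loss"]) - np.array(example_history["loss"])
print(best_epoch)
print(gap)
```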


Note: for much deeper insights and visualizations, you might want to have a look at [TensorBoard](https://www.tensorflow.org/tensorboard).

Build another **simple** model that you think would work and save the initialized weights.

In [None]:
#C_6
model_2 = Sequential(name="MLP_2")
#------------------------------Implement your code here------------------------

#---------------------------------------------------------------------------------
model_2.add(Dense(10))
model_2.add(Activation('softmax'))

In [None]:
model_2.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

save_dir = "results/"
model_name = "init_weigths_2.h5"
model_path = os.path.join(save_dir, model_name)
model_2.save(model_path)
print('Saved initialized model at %s ' % model_path)

Train your model. You may try different hyperparameters such as batch size or number of epochs. Name the history of the model `history_2`.

In [None]:
#C7
#------------------------------Implement your code here------------------------

#---------------------------------------------------------------------------------


In [None]:
fig, axs = plt.subplots(1, 2, figsize=(12,5))

axs[0].plot(history_2.history['accuracy'])
axs[0].plot(history_2.history['val_accuracy'])
axs[0].set_title('model accuracy')
axs[0].set_ylabel('accuracy')
axs[0].set_xlabel('epoch')
axs[0].legend(['train', 'val'], loc='lower right')


axs[1].plot(history_2.history['loss'])
axs[1].plot(history_2.history['val_loss'])
axs[1].set_title('model loss')
axs[1].set_ylabel('loss')
axs[1].set_xlabel('epoch')
axs[1].legend(['train', 'val'], loc='upper right')

plt.tight_layout()

We should now train our chosen model (`model_1` by default) on the complete training set. Before we do so, we need to reset the model to its initial weights.

In [None]:
model_1 = load_model("results/init_weigths_1.h5") # Initializing weights before total run

In [None]:
history = model_1.fit(X_train_orig, Y_train_orig,
          batch_size=128, epochs=20,
          verbose=2)
# saving the model
save_dir = "results/"
model_name = "final_weights.h5"
model_path = os.path.join(save_dir, model_name)
model_1.save(model_path)
print('Saved trained model at %s ' % model_path)

Now we can evaluate our model's performance on the test set:

In [None]:
mnist_model = load_model("results/final_weights.h5")
loss_and_metrics = mnist_model.evaluate(x_test, Y_test, verbose=2)

print("Test Loss is {:.2f} ".format(loss_and_metrics[0]))
print("Test Accuracy is {:.2f} %".format(100*loss_and_metrics[1]))

For the most interesting part of this tutorial, we will see whether or not our neural network can figure out our handwritten digits. In order to do so, we will use the `pillow` package and apply some preprocessing: converting RGB to grayscale, taking the "negative" so that the digit is bright on a dark background as in MNIST, resizing the images to 28x28, and stretching the contrast with a hard threshold.

In [None]:
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
orig = []
fig, axes = plt.subplots(1, 9, figsize=(15,15))
for i, ax in enumerate(axes.flatten()):
    image = Image.open(str(i+1) + '.PNG')
    gray = image.convert('L')
    new_image = gray.resize((28, 28))
    data = np.asarray(new_image).astype('float32')
    data = (255-data)
    data /= 255
    ax.imshow(data, cmap=plt.get_cmap('gray'), vmin=0, vmax=1)
    ax.set_xticks([])
    ax.set_yticks([])
    orig.append(data)

In [None]:
plt.hist(orig[8].reshape(784))

In [None]:
fig, axes = plt.subplots(1, 9, figsize=(15,15))
thresh_img = []
for i, ax in enumerate(axes.flatten()):
    temp = orig[i].copy()
    temp[temp<np.percentile(temp, 85)]=0
    temp[temp>np.percentile(temp, 90)] *=2
    temp[temp>1] = 1
    thresh_img.append(temp)
    ax.imshow(temp, cmap=plt.get_cmap('gray'), vmin=0, vmax=1)
    ax.set_xticks([])
    ax.set_yticks([])
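
To see what the percentile thresholding does, the sketch below applies the same steps to a synthetic image: a dim, noisy background with a brighter square standing in for a pen stroke. The function name `stretch_contrast` and the synthetic image are assumptions made for this example.

```python
import numpy as np

def stretch_contrast(img, low_pct=85, high_pct=90):
    """Zero the darkest pixels and saturate the brightest ones."""
    out = img.copy()
    out[out < np.percentile(out, low_pct)] = 0.0    # drop everything below the 85th percentile
    out[out > np.percentile(out, high_pct)] *= 2.0  # boost the brightest remaining pixels
    return np.clip(out, 0.0, 1.0)                   # keep values in [0, 1]

rng = np.random.default_rng(0)
synthetic = rng.uniform(0.1, 0.3, size=(28, 28))                # dim, noisy background
synthetic[8:20, 8:20] = rng.uniform(0.5, 0.7, size=(12, 12))    # brighter "stroke"

stretched = stretch_contrast(synthetic)
print(synthetic.min(), synthetic.max())
print(stretched.min(), stretched.max())
print((stretched > 0).mean())  # fraction of pixels that survive the threshold
```

Note that the thresholds are relative: they keep roughly the brightest 15% of the pixels, whatever their absolute values are, so the result depends on how much of the image the digit occupies.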

Your ID should appear below:

In [None]:
pred = ""
for img in thresh_img:
    img = img.reshape(1,28*28)
    pred += str(np.argmax(mnist_model.predict(img), axis=-1).item())
print("Your ID is: {}".format(pred))

In [None]:
plt.hist(thresh_img[8].reshape(784))

#### *This tutorial was written by [Moran Davoodi](mailto:morandavoodi@gmail.com) with the assistance of [Yuval Ben Sason](mailto:yuvalbse@gmail.com) & Kevin Kotzen*