In this second part, we will create and train a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN) on the MNIST benchmark dataset.

In [None]:
import keras
from keras.layers import Dense
from keras.models import Sequential
import tensorflow as tf

import numpy as np

from matplotlib import pyplot as plt

Since MNIST is a common benchmark, it is available in the `datasets` module of Keras. In some versions it is only reachable through TensorFlow (`tf.keras.datasets`), so try removing the `tf.` prefix in the line below to see whether `keras.datasets.mnist` works in your setup.

In [None]:
(mnist_train_images, mnist_train_labels), (mnist_validation_images, mnist_validation_labels) = tf.keras.datasets.mnist.load_data()
mnist_train_images.shape, mnist_train_labels.shape, mnist_validation_images.shape, mnist_validation_labels.shape
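If you want the loading code to work whether or not standalone Keras exposes the dataset, one option is to try both import paths. This is a minimal sketch; the helper name `get_mnist_module` is hypothetical, and it only locates the module without downloading anything.

```python
def get_mnist_module():
    """Return the MNIST dataset module, preferring standalone Keras."""
    try:
        from keras.datasets import mnist  # standalone Keras
    except ImportError:
        from tensorflow.keras.datasets import mnist  # Keras bundled with TensorFlow
    return mnist


mnist = get_mnist_module()
# (x_train, y_train), (x_val, y_val) = mnist.load_data()  # downloads the data on first call
```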

**Question**: what does this variable assignment mean?

```python
(mnist_train_images, mnist_train_labels), (mnist_validation_images, mnist_validation_labels) = ...
```
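As a hint, here is the same pattern on a toy function. The name `load_toy` and its contents are made up for illustration; only the nesting of its return value mirrors `load_data()`.

```python
def load_toy():
    # Same nesting as load_data(): ((x_train, y_train), (x_test, y_test))
    return ([[0, 1], [1, 0]], [0, 1]), ([[1, 1]], [1])


# Nested unpacking: the outer pair is split first, then each inner pair
(train_x, train_y), (test_x, test_y) = load_toy()

# Equivalent, step by step
train, test = load_toy()
train_x2, train_y2 = train

print(train_y, test_x)  # [0, 1] [[1, 1]]
```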

Let's start by inspecting the dataset.

We use matplotlib to plot the image here for two reasons:

1. We can add a title to the plot (e.g., the label)
2. PIL shows images at a 1:1 scale, so a 28 x 28 image is too small to be seen well on a modern monitor. Matplotlib, instead, scales images up so that they are always displayed at a visible size

In [None]:
mnist_first_image = mnist_train_images[0]
mnist_first_label = mnist_train_labels[0]

plt.imshow(mnist_first_image)
plt.title(f'Label: {mnist_first_label}')

When plotting grayscale images (as is the case with MNIST), we should pass the `"gray"` colormap to matplotlib. Otherwise matplotlib applies its default colormap (viridis), which renders pixel intensities as a purple-to-yellow gradient that is not a faithful representation of the image.

In [None]:
plt.imshow(mnist_first_image, cmap="gray")
plt.title(f'Label: {mnist_first_label}')
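Being able to title each plot is most useful when looking at several samples at once. A minimal sketch, assuming a hypothetical helper `show_grid` that places images on a grid of subplots, each titled with its label and drawn with the gray colormap:

```python
import numpy as np
import matplotlib.pyplot as plt


def show_grid(images, labels, n_cols=5):
    """Plot images in a grid, titling each subplot with its label."""
    n = len(images)
    n_rows = int(np.ceil(n / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(2 * n_cols, 2 * n_rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, and leave any unused subplots blank
    for ax, image, label in zip(axes.flat, images, labels):
        ax.imshow(image, cmap="gray")
        ax.set_title(f"Label: {label}")
    fig.tight_layout()
    return fig
```

For example, `show_grid(mnist_train_images[:10], mnist_train_labels[:10])` shows the first ten training digits with their labels.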

We have to normalize the data. **DIY**

In [None]:
# normalize data

# your code here

We can finally build the neural network. We use the `Sequential` API, which works by stacking layers one after the other.

**Note that `Sequential` only works when information flows through the network as a single chain of layers. If there is branching (e.g., multiple inputs or outputs, or skip connections), we need another paradigm, such as the functional API.**
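For comparison, here is a sketch of the same MLP as the `Sequential` one in this notebook, written with the functional API: each layer is called on the output of the previous one, and the model is defined by its inputs and outputs. Because connections are explicit, this style can also express branching. The variable name `mlp_functional` is just illustrative.

```python
import keras
from keras import layers

inputs = keras.Input(shape=(28, 28))           # one 28 x 28 image per sample
x = layers.Flatten()(inputs)                   # -> vector of 784 values
x = layers.Dense(32, activation="relu")(x)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(10, activation="softmax")(x)  # one probability per digit

mlp_functional = keras.Model(inputs=inputs, outputs=outputs, name="mlp_functional")
mlp_functional.summary()
```

Its `summary()` should report the same number of trainable parameters as the `Sequential` version.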

In [None]:
mlp = Sequential()

mlp.add(keras.layers.Flatten(input_shape=(28, 28)))
mlp.add(Dense(32, activation='relu'))  # input size (784) is inferred from Flatten
mlp.add(Dense(32, activation='relu'))
mlp.add(Dense(10, activation='softmax'))

mlp.summary()

Before training, we need to **compile** the model by specifying an optimizer and a loss function. Optionally, we can also list evaluation metrics to monitor during training.
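The compile calls in this notebook use `'sparse_categorical_crossentropy'` because MNIST labels are integers (0 to 9) rather than one-hot vectors. As a sketch on three made-up labels and uniform predictions, the snippet checks that the sparse loss on integer labels matches plain categorical cross-entropy on the one-hot version of the same labels. It also shows that `'adam'` can be replaced by an explicit optimizer object if you want to change settings such as the learning rate.

```python
import numpy as np
import keras

labels = np.array([3, 0, 7])                                  # integer class ids, like MNIST
one_hot = keras.utils.to_categorical(labels, num_classes=10)  # same labels, one-hot encoded
probs = np.full((3, 10), 0.1, dtype="float32")                # hypothetical uniform predictions

sparse_loss = keras.losses.SparseCategoricalCrossentropy()(labels, probs)
dense_loss = keras.losses.CategoricalCrossentropy()(one_hot, probs)
print(float(sparse_loss), float(dense_loss))  # both equal -ln(0.1), approximately 2.3026

# The string 'adam' is shorthand for Adam with default settings (learning rate 1e-3);
# building the object yourself lets you change them.
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
```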

In [None]:
mlp.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
mlp.fit(mnist_train_images, mnist_train_labels, epochs=5)

The metrics printed by `fit` refer only to the training set. To get the validation performance, we call `evaluate`:

In [None]:
mlp.evaluate(mnist_validation_images, mnist_validation_labels)
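Alternatively, `fit` can compute validation metrics at the end of every epoch if you pass `validation_data`. To keep the sketch self-contained, it uses small random arrays as hypothetical stand-ins for the MNIST images and labels, so the accuracy values themselves are meaningless; only the structure of the returned history matters.

```python
import numpy as np
import keras

# Hypothetical random stand-ins for the MNIST arrays, just to make the sketch runnable
rng = np.random.default_rng(0)
x_train = rng.random((256, 28, 28), dtype=np.float32)
y_train = rng.integers(0, 10, size=256)
x_val = rng.random((64, 28, 28), dtype=np.float32)
y_val = rng.integers(0, 10, size=64)

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Validation loss/accuracy are recorded after each epoch under "val_*" keys
history = model.fit(x_train, y_train, epochs=2, validation_data=(x_val, y_val), verbose=0)
print(sorted(history.history.keys()))
```

With the real data you would pass `validation_data=(mnist_validation_images, mnist_validation_labels)`; `fit` also accepts `validation_split=0.1` to hold out a fraction of the training set instead.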

**Question**: the method above requires both images and labels. At deployment time, however, we might not have labels. What can we do in this case?

Experiment with it...

In [None]:
# your code here

Next, we build a CNN to train on MNIST. This is the architecture we will reproduce:

In [None]:
cnn = Sequential()

cnn.add(keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
cnn.add(keras.layers.MaxPooling2D((2, 2)))

cnn.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
cnn.add(keras.layers.MaxPooling2D((2, 2)))

cnn.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))

cnn.add(keras.layers.GlobalAveragePooling2D())
cnn.add(keras.layers.Dense(10, activation='softmax'))

cnn.summary()
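To see where the shapes and parameter counts in `cnn.summary()` come from, this sketch redoes the arithmetic by hand. It assumes Keras defaults (`padding='valid'` and stride 1 for `Conv2D`; strides equal to the pool size for `MaxPooling2D`). The helper names are hypothetical.

```python
def conv2d_valid(size, kernel=3):
    """Spatial size after a Conv2D with padding='valid' and stride 1."""
    return size - kernel + 1


def maxpool(size, pool=2):
    """Spatial size after MaxPooling2D with strides=pool and padding='valid'."""
    return (size - pool) // pool + 1


def conv2d_params(kernel, in_ch, out_ch):
    """Weights (kernel x kernel x in_ch per filter) plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch


# Walk through the CNN layer by layer (spatial side length only)
s = conv2d_valid(28)  # 26
s = maxpool(s)        # 13
s = conv2d_valid(s)   # 11
s = maxpool(s)        # 5
s = conv2d_valid(s)   # 3; GlobalAveragePooling2D then averages each 3 x 3 map

params = [
    conv2d_params(3, 1, 32),   # first Conv2D: 320
    conv2d_params(3, 32, 64),  # second Conv2D: 18496
    conv2d_params(3, 64, 64),  # third Conv2D: 36928
    64 * 10 + 10,              # Dense(10) on the 64 averaged channels: 650
]
print(s, sum(params))  # 3 56394
```

Pooling layers have no parameters, which is why only the convolutions and the final `Dense` layer contribute.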

In [None]:
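cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Conv2D expects a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
cnn_train_images = np.expand_dims(mnist_train_images, -1)
cnn_validation_images = np.expand_dims(mnist_validation_images, -1)
cnn.fit(cnn_train_images, mnist_train_labels, epochs=2)
cnn.evaluate(cnn_validation_images, mnist_validation_labels)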

We can also experiment with this model on data we create ourselves:

In [None]:
from PIL import Image

testimg = Image.open("mnist/2.jpg")
testimg = testimg.convert("L")

print(testimg.size)
plt.imshow(testimg, cmap="gray")

**DIY**

* convert the image to a numpy array
* put it in the right format for the CNN to evaluate (tip: batches)
* generate a prediction

In [None]:
# your code here

**Question**: what is the format of the prediction? How can we convert it to a number?

Now that we have finished training the model, we can save it. **TIPS**:

1. Remember to save a model after training
2. Be aware that the `save` method **overwrites existing files by default**. If you are training multiple models, give each one a distinct file name
3. **Check that the folders for saving a model exist before training**. If the folder does not exist, the saving will fail

In [None]:
cnn.save("mnist_cnn.h5")

**DIY**: implement a residual layer using the functional interface

![](imgs/residual.png)

