In [None]:
import keras 
from keras.datasets import fashion_mnist
from keras.utils import to_categorical

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping

import matplotlib.pyplot as plt

import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report



Fashion MNIST is a dataset from the online fashion retailer Zalando. It consists of a training set of 60,000 examples and a test set of 10,000 examples, each belonging to one of 10 classes:

| Label | Description |
|-------|-------------|
| 0     | T-shirt/top |
| 1     | Trouser     |
| 2     | Pullover    |
| 3     | Dress       |
| 4     | Coat        |
| 5     | Sandal      |
| 6     | Shirt       |
| 7     | Sneaker     |
| 8     | Bag         |
| 9     | Ankle boot  |

Each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels. Each pixel has a single value indicating its lightness or darkness: an integer between 0 and 255.
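As a quick check of the arithmetic above, the sketch below builds a stand-in 28x28 array and flattens it into a vector. The array name `image` and its random contents are assumptions for illustration, not real Fashion MNIST pixels.

```python
import numpy as np

# Stand-in for one grayscale image: 28 rows x 28 columns of integers in [0, 255].
# The values are random placeholders, not real Fashion MNIST pixels.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Row-major flattening: the 28 rows are laid end to end in a single vector.
flat = image.reshape(-1)

print(image.shape)  # (28, 28)
print(flat.shape)   # (784,)
```

Element `flat[28]` is the first pixel of the second row, which is the same layout a flatten step produces for each example.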

The goal is to predict the clothing category of an image from its pixel values.

The dataset can be loaded directly with a function from the Keras library.



In [None]:
(X_train,y_train), (X_test,y_test) = fashion_mnist.load_data()

Let's check the size of the training and testing sets.


In [None]:
print('X training shape:',X_train.shape)
print('X test shape:',X_test.shape)
print('y training shape:',y_train.shape)
print('y test shape:',y_test.shape)

Each example is composed of 28 rows and 28 columns that hold the 784 pixel values of an image. For example, these are the first two rows of a single example.



In [None]:
X_train[1][:2]

The y arrays are one-dimensional vectors with the class labels.

In [None]:
y_train

We can display the first images in the training data to get an idea.

In [None]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal',      'Shirt',   'Sneaker',  'Bag',   'Ankle boot']
plt.figure(figsize=[10,10])

for i in range(5):
  plt.subplot(1, 5, i + 1)  # 1 row, 5 columns, panel i + 1
  plt.imshow(X_train[i], cmap='gray')
  # Look up the name by the image's label, not by its position in the loop
  plt.title("Class : {}\n {}".format(y_train[i], class_names[y_train[i]]))


We scale the data into the 0-1 range.

In [None]:
X_train = X_train/255.0
X_test = X_test/255.0

The network cannot work directly with categorical labels, so we one-hot encode them: each label becomes a vector with one 0/1 column per class.

In [None]:
y_train_enc = to_categorical(y_train)
y_test_enc = to_categorical(y_test)

In [None]:
y_train_enc
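To make the encoding concrete, here is a minimal NumPy sketch that mimics the result of `to_categorical` on a few example labels. It is an illustration of the idea, not the Keras implementation, and the names `labels` and `one_hot` are assumptions.

```python
import numpy as np

labels = np.array([9, 0, 3, 0])   # example integer class labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)   # (4, 10)
print(one_hot[0])      # 1.0 in position 9, zeros elsewhere

# argmax along the class axis recovers the original integer labels.
recovered = np.argmax(one_hot, axis=1)
print(recovered)       # [9 0 3 0]
```

The same `argmax` step is what turns the softmax output of a network back into a predicted class label.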

Now, we split the training data into a training and a validation set using the `train_test_split` function from `sklearn` as usual.



In [None]:
x_train,x_val,y_train,y_val = train_test_split(X_train, y_train_enc, test_size=0.2, random_state=35, stratify = y_train_enc)
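The effect of `stratify` is easier to see on a small synthetic example. The sketch below uses placeholder arrays (the names `features` and `labels` are assumptions, and the data is not Fashion MNIST) to show that an 80/20 stratified split keeps the class proportions in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, balanced labels: 10 classes x 100 examples each (placeholder data).
labels = np.repeat(np.arange(10), 100)
features = np.zeros((labels.size, 28, 28), dtype="float32")

f_train, f_val, l_train, l_val = train_test_split(
    features, labels, test_size=0.2, random_state=35, stratify=labels
)

print(f_train.shape, f_val.shape)   # (800, 28, 28) (200, 28, 28)
print(np.bincount(l_train))         # 80 examples of every class
print(np.bincount(l_val))           # 20 examples of every class
```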

Let's check the shapes of the training and validation datasets.

In [None]:
print('X training shape:',x_train.shape)
print('X validation shape:',x_val.shape)
print('y training shape:',y_train.shape)
print('y validation shape:',y_val.shape)

## 2-layer Multilayer Perceptron

---

**Build a keras model that represents a two-layer MLP, i.e., an input layer, a single hidden layer and an output layer.**

* **The input layer is a Flatten layer to convert the input into a single vector:**

> `Flatten(input_shape=(28,28,1))`

* **Add a Dense layer with 128 units and ReLU activation function as the hidden layer.**

* **Add a Dense layer with 10 units (one per class label) and softmax activation function as the output layer.**

**Compile the model and display the summary.**
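One possible shape for this model is sketched below. The optimizer (`adam`) and the loss (`categorical_crossentropy`, chosen because the labels are one-hot encoded) are assumptions, since the exercise does not name them, and the variable name `mlp` is only illustrative.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

mlp = Sequential([
    Flatten(input_shape=(28, 28, 1)),   # 28 * 28 * 1 = 784 inputs
    Dense(128, activation='relu'),      # hidden layer
    Dense(10, activation='softmax'),    # one probability per class
])

# Assumed settings: the exercise does not specify an optimizer or a loss.
# categorical_crossentropy matches one-hot encoded labels.
mlp.compile(optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy'])

mlp.summary()

# Parameter count by hand: (784*128 + 128) + (128*10 + 10) = 101770
print(mlp.count_params())
```

Comparing the hand-computed parameter count with the summary is a simple way to confirm the layers are wired as intended.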

---

---

**Train (fit) the model using a batch size of 256, 15 epochs and using the validation sets obtained above (x_val,y_val) as validation data.**

---

---

**Plot the loss and accuracy for the training and validation data**

**You can use the following code. It assumes that the object returned by `fit` is named `model_history`; change the name if yours differs.**

**Study the plots. Does the model overfit the data? Check if the behavior suggests that training should stop earlier or more epochs are needed. In that case, modify the number of epochs and train the model again.**

---

In [None]:
plt.figure(figsize=[12,4])

# Per-epoch metrics recorded by fit()
accuracy = model_history.history['accuracy']
val_accuracy = model_history.history['val_accuracy']
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

# Left panel: training vs. validation loss
plt.subplot(1, 2, 1)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.plot(loss, label='Training loss')
plt.plot(val_loss, label='Validation loss')
plt.legend(loc='upper left')
plt.grid()

# Right panel: training vs. validation accuracy
plt.subplot(1, 2, 2)
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.plot(accuracy, label='Training acc')
plt.plot(val_accuracy, label='Validation acc')
plt.legend(loc='upper left')
plt.grid()
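Besides reading the plots, the epoch with the lowest validation loss can be located numerically. The sketch below uses made-up loss values purely for illustration (they are not results from this notebook); with a real run, the dictionary would be `model_history.history`.

```python
import numpy as np

# Made-up losses for illustration only (not results from this notebook).
example_history = {
    'loss':     [0.60, 0.45, 0.38, 0.33, 0.29, 0.26],
    'val_loss': [0.50, 0.42, 0.39, 0.38, 0.40, 0.43],
}

best_epoch = int(np.argmin(example_history['val_loss']))   # 0-based index
print(best_epoch + 1)   # 4: validation loss is lowest after the 4th epoch

# After that point the training loss keeps falling while the validation
# loss rises, which is the usual signature of overfitting.
```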


---

**Evaluate your final model on the test data using the `evaluate` method. Check the loss and the accuracy.**

---

---

**Use the ``classification_report`` function from ``scikit-learn`` to get the classification metrics on each class and see which classes are most frequently misclassified.**

**You can use the following code, replacing the `model` variable with the one containing your fitted model.**

---

In [None]:
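# Predicted class = index of the highest softmax probability
predict_test = model.predict(X_test)
y_pred = np.argmax(predict_test, axis=1)
# y_test still holds the original integer labels (it was never one-hot encoded)
print(classification_report(y_test, y_pred, target_names=class_names))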

## Convolutional Neural Networks

---

**Create a CNN model with the following layers:**

* **A convolutional layer with 32 filters, a 3x3 kernel size, a ReLU activation function and `padding='same'`**

* **A max-pooling layer with a 2x2 pool size**

* **A flatten layer**

* **A dense layer with 128 units and a ReLU activation function**

* **A dense layer with 10 units and a softmax activation function**

**Since the convolutional layer is the first, you must specify the size of the input as a parameter with**

> `input_shape=(28, 28, 1)`

**Compile the model and display the summary.**
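The `input_shape=(28, 28, 1)` above includes a channel axis, while the arrays loaded earlier have shape `(samples, 28, 28)`. Whether Keras adds the missing axis for you may depend on the version, so adding it explicitly is a safe option. The sketch below shows one way to do it on a placeholder array; the name `batch` is an assumption.

```python
import numpy as np

# Placeholder batch with the same layout as x_train: (samples, height, width).
batch = np.zeros((5, 28, 28), dtype="float32")

# Append a channel axis of size 1 to get (samples, height, width, channels).
batch_4d = np.expand_dims(batch, axis=-1)

print(batch.shape)     # (5, 28, 28)
print(batch_4d.shape)  # (5, 28, 28, 1)
```

The same call can be applied to the training, validation and test arrays before passing them to `fit`, `evaluate` or `predict`.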

---

---

**Train (fit) the model using a batch size of 256, 25 epochs and using the validation sets obtained above (x_val,y_val) as validation data.**

---

---

**Plot the loss and accuracy for the training and validation data.**

**Study the plots. Does the model overfit the data?**

---

### Early stopping

---

**If you think that training should have stopped earlier to avoid overfitting, you can apply early stopping by adding the following to the fit call:**

> `callbacks=[EarlyStopping(monitor='val_loss',patience=3)]`

**Plot the loss and accuracy for the training and validation data for the new model.**
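The following sketch shows the early stopping callback in a complete `fit` call. To keep it fast it uses a tiny dense model and random stand-in data, both of which are assumptions for illustration and not the CNN or the dataset of this exercise; the optimizer and loss are also assumed.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.callbacks import EarlyStopping

# Tiny random stand-in data so the sketch runs in seconds (not Fashion MNIST).
rng = np.random.default_rng(0)
x_demo = rng.random((200, 28, 28, 1)).astype("float32")
y_demo = np.eye(10, dtype="float32")[rng.integers(0, 10, size=200)]

demo_model = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(16, activation='relu'),
    Dense(10, activation='softmax'),
])
demo_model.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])

# Stop once val_loss has not improved for 3 consecutive epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=3)

demo_history = demo_model.fit(
    x_demo[:160], y_demo[:160],
    validation_data=(x_demo[160:], y_demo[160:]),
    batch_size=32, epochs=25,
    callbacks=[early_stop],   # callbacks are passed as a list
    verbose=0,
)

# Number of epochs actually run: at most 25, fewer if training stopped early.
epochs_run = len(demo_history.history['val_loss'])
print(epochs_run)
```

The length of the recorded history tells you how many epochs ran before the callback stopped training.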

---

---

**Select your final model, with or without early stopping, and apply it to the test data. Check the loss and the accuracy.**

**Print a classification report.**

---

### Adding dropout to avoid overfitting

---

**One way to reduce overfitting is to add dropout layers. Add a dropout layer with a 0.25 rate to the previous model, placed after the max-pooling layer. Add another one after the first dense layer.**

**Train (fit) the model using a batch size of 256, 25 epochs and using the validation sets obtained above (x_val,y_val) as validation data.**

**Plot the loss and accuracy for the training and validation data for the new model.**

**You can also check how the dropout works with and without early stopping.**
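To see what a 0.25 dropout rate does to a layer's output during training, here is a minimal NumPy sketch of inverted dropout. It illustrates the idea rather than reproducing the Keras layer, and the array `activations` is a placeholder.

```python
import numpy as np

rate = 0.25                      # fraction of activations to drop
rng = np.random.default_rng(0)
activations = np.ones((1000, 128), dtype="float32")   # placeholder layer output

# Training-time behaviour: zero a random 25% of the units and scale the
# survivors by 1 / (1 - rate) so the expected total stays the same.
keep_mask = rng.random(activations.shape) >= rate
dropped = activations * keep_mask / (1.0 - rate)

zero_fraction = float((dropped == 0).mean())
mean_value = float(dropped.mean())
print(zero_fraction)   # approximately 0.25
print(mean_value)      # approximately 1.0
```

At prediction time dropout is inactive, so the layer passes its input through unchanged.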

---



---

**Train (fit) the model using a batch size of 256, 25 epochs and using the validation sets obtained above (x_val,y_val) as validation data.**

---

---

**Select the model, with or without early stopping (but with dropout), and evaluate it on the test data.**

**Print the classification report.**

---

---

**Now create a model with the following layers:**

* **A convolutional layer with 32 filters and a 3x3 kernel size, a ReLU activation function and `padding='same'` (input layer)**

* **A max-pooling layer with a 2x2 pooling size**

* **A convolutional layer with 64 filters and a 3x3 kernel size, a ReLU activation function and `padding='same'`**

* **A max-pooling layer with a 2x2 pooling size**

* **A dropout layer with a 0.3 rate**

* **A convolutional layer with 128 filters and a 3x3 kernel size, a ReLU activation function and `padding='same'`**

* **A convolutional layer with 128 filters and a 3x3 kernel size, a ReLU activation function and `padding='same'`**

* **A max-pooling layer with a 2x2 pooling size**

* **A dropout layer with a 0.4 rate**

* **A flatten layer**

* **A dense layer with 512 units**

* **A dropout layer with a 0.25 rate**

* **A dense layer with 10 units and a softmax activation function**
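Before building this model, it can help to trace the tensor shapes and parameter counts by hand so that the summary can be checked against them. The sketch below does that arithmetic in plain Python. It assumes a 28x28x1 input and max-pooling with the default settings (stride equal to the pool size, no padding), and the helper names are hypothetical.

```python
def conv_params(kernel, in_ch, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_ch * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units

size, channels, total = 28, 1, 0

total += conv_params(3, channels, 32)    # output 28x28x32 ('same' padding)
channels = 32
size //= 2                               # pooling -> 14x14x32
total += conv_params(3, channels, 64)    # output 14x14x64
channels = 64
size //= 2                               # pooling -> 7x7x64
total += conv_params(3, channels, 128)   # output 7x7x128
channels = 128
total += conv_params(3, channels, 128)   # output 7x7x128
size //= 2                               # pooling -> 3x3x128 (7 // 2 = 3)

# Dropout and flatten layers add no parameters.
flat = size * size * channels
total += dense_params(flat, 512)
total += dense_params(512, 10)

print(flat)    # 1152
print(total)   # 835722
```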

---



**Compile the model**

**Train (fit) the model using a batch size of 256, 25 epochs and using the validation sets obtained above (x_val,y_val) as validation data.**

**Plot the loss and accuracy for the training and validation data for the new model.**

**Evaluate the model on the test set**

**To complete the assignment, you can try adding early stopping to the model, modifying other parameters (number of filters, activation functions, dropout rate) or even adding new layers.**
