# **Session 5 Practice: Fashion MNIST with different Nets <font color="red"> SOLUTION </font>**


<img src="https://do3z7e6uuakno.cloudfront.net/uploads/event/logo/1112702/595053a7143adafce285b2e39ca04f1a.jpeg" width="300">


### Example adapted by AI Saturdays Euskadi.

The objective of this practice is to understand how Neural Networks are modelled with a given dataset.

In particular, the dataset is the  __*Fashion MNIST*__, which was inspired by the famous [MNIST (created by Yann LeCun et al.)](http://yann.lecun.com/exdb/mnist/), but instead of classifying digits from 0 to 9, you'll classify __clothes__.

For this, the dataset contains the following labels:

| Label | Description |
| :-: | :- |
| 0 | T-shirt / Top |
| 1 | Trousers |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |


**Instructions:**

- You'll be using Python 3.
- You'll use Python's libraries: Pandas, Numpy, Keras.

**By completing the exercise, you'll learn to:**
- Better use and understand Python notebooks.
- Use Python functions and additional libraries.
- Correctly apply the NN algorithm.
- Improve the predictions using hyperparameter tuning.
- Compare with other NN Architectures, adjusting parameters.

Let's get started!

### 0. Importing the libraries

Since we'll be working with Keras, we need to install the libraries it depends on. We recommend installing TensorFlow directly: it is the package, developed by Google, that contains the Keras library. [Tensorflow documentation](https://www.tensorflow.org/learn)

In [2]:
!pip install tensorflow

Collecting tensorflow
  Downloading tensorflow-2.8.0-cp38-cp38-win_amd64.whl (438.0 MB)
Collecting libclang>=9.0.1
  Downloading libclang-13.0.0-py2.py3-none-win_amd64.whl (13.9 MB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1
  Downloading tensorflow_io_gcs_filesystem-0.24.0-cp38-cp38-win_amd64.whl (1.5 MB)
Collecting keras<2.9,>=2.8.0rc0
  Downloading keras-2.8.0-py2.py3-none-any.whl (1.4 MB)
Collecting tensorboard<2.9,>=2.8
  Downloading tensorboard-2.8.0-py3-none-any.whl (5.8 MB)
Collecting tf-estimator-nightly==2.8.0.dev2021122109
  Downloading tf_estimator_nightly-2.8.0.dev2021122109-py2.py3-none-any.whl (462 kB)
Installing collected packages: libclang, tensorflow-io-gcs-filesystem, keras, tensorboard, tf-estimator-nightly, tensorflow
  Attempting uninstall: tensorboard
    Found existing installation: tensorboard 2.5.0
    Uninstalling tensorboard-2.5.0:
      Successfully uninstalled tensorboard-2.5.0
Successfully installed keras-2.8.0 libclang-13.0.0 tensorboard-2.8.0 tenso

In [1]:
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

## Data Analysis

### 1. Import the dataset.

This dataset is integrated within Keras. In particular within ```tensorflow.keras.datasets.fashion_mnist```. Hence we'll directly use the method ```load_data()``` to fetch the Train and Test sets.

Before processing the dataset, take a look at the following [link](https://knowyourdata-tfds.withgoogle.com/#dataset=fashion_mnist&tab=STATS&select=kyd%2Ffashion_mnist%2Flabel&expanded_groups=cloud_vision,fashion_mnist)  to better understand the dataset. Play around with it to see train / test distribution, and other interesting insights.

__What is the proportion of Train / Test?__
- *Your answer*

__How many pictures have faces?__
- *Your answer*

In [2]:
#One-liner.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

In [3]:
# Visualize the dataset
print(x_train[0][9]) # We'll leave this here for you...

[  0   0   0   0   0   0   0   0   0   0   0   0   0 183 225 216 223 228
 235 227 224 222 224 221 223 245 173   0]


### 2. Make the dataset 1-dimensional.

As in the original MNIST, the files contain pictures. Each one is a matrix of 28x28 pixels.

We are going to apply different models, but the first one is the 1-layer Perceptron. For this model the data must be in long format, that is, each picture flattened into a single vector.

We need to reshape the training and test images, and then convert the labels into categorical (one-hot) variables.

__Tips: ```reshape()``` and ```to_categorical()```.__

Before reshaping, keep in mind that the intensity of each pixel is expressed on a scale from 0 to 255...
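To make the reshaping step concrete, here is a minimal sketch on a small synthetic batch (the `images` array below is random stand-in data, not the real dataset). It shows how `reshape()` turns 28x28 matrices into vectors of 784 values and how dividing by 255.0 rescales the pixels to the range 0 to 1.

```python
import numpy as np

# Hypothetical stand-in for a batch of pictures: 3 images of 28x28 pixels,
# filled with random integer intensities between 0 and 255.
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Keep the first axis (one row per image) and let NumPy infer the rest:
# 28 * 28 = 784 values per image.
flat = images.reshape(images.shape[0], -1)

# Dividing by 255.0 converts to floats and rescales every pixel into [0, 1].
scaled = flat / 255.0

print(flat.shape)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

The `-1` simply asks NumPy to work out the remaining dimension, so the same line works for the training and the test set regardless of how many images each contains.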

In [4]:
# Four lines of code

x_train = x_train.reshape(x_train.shape[0], -1) / 255.0
x_test = x_test.reshape(x_test.shape[0], -1) / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

In [5]:
#Print out the sizes of the target variables

print(y_train.shape)
print(y_test.shape)

(60000, 10)
(10000, 10)
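The second dimension of 10 comes from the one-hot encoding of the labels. As a rough illustration of what that encoding looks like, the sketch below uses a hypothetical helper, `one_hot`, written in plain NumPy; it only mimics the effect of ```to_categorical()``` on a few sample labels and is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample_labels = [9, 0, 3]  # Ankle boot, T-shirt / Top, Dress
encoded = one_hot(sample_labels, num_classes=10)

print(encoded.shape)           # one row per label, one column per class
print(encoded.argmax(axis=1))  # argmax recovers the original labels
```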


### 3. Model 1: Perceptron.

Without getting too much into detail about the activation functions, we leave this "Golden Rule" for you to write down:

1. Use ReLU ('relu') whenever you can, for the neurons in each hidden layer.
2. Use Softmax ('softmax') when you want to make a classification and your output has more than two categories.
3. Use Sigmoid ('sigmoid') when your output has two categories.

As you'll see next, models are built from the following parts:

* ```Sequential()```: Tells Keras that you'll add a sequence of layers.
* ```add()```: Adds a layer with the details you need. In the first layer you define, you need to specify the input's shape (```input_dim```). It's not necessary in the following layers.
* ```compile()```: Defines how the Net is going to be trained (loss function, optimizer and the metric to report).

Having stated this, let's create the first model based on the Perceptron to solve the Classification problem.
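Before building the model, it may help to see what the three activation functions from the Golden Rule above actually compute. The following is a minimal NumPy sketch, not the Keras implementation; the function names and the `scores` vector are chosen here purely for illustration.

```python
import numpy as np

def relu(z):
    """Keep positive values, replace negative ones with 0."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

scores = np.array([-2.0, 0.0, 3.0])  # hypothetical raw outputs of a layer

print(relu(scores))
print(sigmoid(scores))
print(softmax(scores))
```

Softmax is the natural choice for the output layer here because the ten clothing categories are mutually exclusive, so the outputs can be read as one probability per class.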

In [6]:
# Model Definition

model = Sequential()
model.add(Dense(10, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Now we will __train__ the model. As you can see, we need to specify the number of *epochs*, which are the complete iterations over the dataset, as well as the train and validation split. For the split, a 0.1 indicates a 10% Validation set.
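The arithmetic behind that split can be sketched as follows. This is only an illustration of the numbers involved, assuming the Keras default batch size of 32; Keras takes the validation samples from the end of the arrays, before any shuffling.

```python
import math

num_samples = 60000      # size of the Fashion MNIST training set
validation_split = 0.1   # fraction held out for validation
batch_size = 32          # Keras default when fit() receives NumPy arrays

# The validation samples are the tail of the arrays; the rest is used to train.
num_val = round(num_samples * validation_split)
num_train = num_samples - num_val

# Number of weight updates (batches) in one epoch.
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_train, num_val, steps_per_epoch)
```

So each of the 10 epochs goes through 54,000 training pictures in 1,688 batches, and the remaining 6,000 pictures are only used to report validation metrics.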

In [7]:
# Training the model

model.fit(x_train, y_train, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1c18dec2e80>

Once the model is trained, let's __evaluate__ its accuracy on the test set:

In [8]:
# Model Evaluation

_, test_acc = model.evaluate(x_test, y_test)
print(test_acc)

0.8482999801635742


__Model 1__: We've achieved an accuracy close to __85%__... Let's see how we can improve this!
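For reference, the accuracy metric is simply the fraction of samples whose most probable predicted class matches the true class. The sketch below reproduces that calculation in NumPy on made-up values (the `probs` and `y_true` arrays are hypothetical, with 3 classes instead of 10 to keep it short).

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes (each row sums to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.2, 0.6],
])

# One-hot encoded true labels for the same 4 samples.
y_true = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
])

predicted = probs.argmax(axis=1)  # most probable class per sample
actual = y_true.argmax(axis=1)    # position of the 1 in each one-hot row
accuracy = np.mean(predicted == actual)

print(predicted, actual, accuracy)
```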

### 4. Model 2: Perceptron with more neurons.

We'll follow the same steps as for Model 1, this time with 50 neurons in the hidden layer.

In [9]:
# Model Definition

model2 = Sequential()
model2.add(Dense(50, input_dim=784, activation='relu'))
model2.add(Dense(10, activation='softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [10]:
# Training the model

model2.fit(x_train, y_train, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1c18f5d1f70>

In [11]:
# Model Evaluation

_, test_acc = model2.evaluate(x_test, y_test)
print(test_acc)

0.8711000084877014


__Model 2__: We have now achieved an accuracy of __87%__, it has clearly increased! Let's see if we can improve it further.

### 5. Model 3: Multilayer Perceptron.

Now, we'll add a __new layer to the Perceptron__. 

Ideally, we will obtain a better result: a deeper net is expected to fit the data better and, with some luck, to generalise better too, although this is not guaranteed.
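One way to get a feel for how much each variant grows is to count trainable parameters. The sketch below uses two hypothetical helpers, `dense_params` and `mlp_params`, and assumes fully connected layers with one bias per neuron, as in a Keras ```Dense``` layer with default settings.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def mlp_params(layer_sizes):
    """Total parameters of a stack of dense layers, given input and layer sizes."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

model1_params = mlp_params([784, 10, 10])      # Model 1: 10 hidden neurons
model2_params = mlp_params([784, 50, 10])      # Model 2: 50 hidden neurons
model3_params = mlp_params([784, 50, 50, 10])  # Model 3: two hidden layers of 50

print(model1_params, model2_params, model3_params)
```

Widening the hidden layer from 10 to 50 neurons multiplies the parameter count roughly by five, whereas adding a second hidden layer of 50 neurons adds comparatively few parameters.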

In [12]:
# Model Definition

model3 = Sequential()
model3.add(Dense(50, input_dim=784, activation='relu'))
model3.add(Dense(50, activation='relu'))
model3.add(Dense(10, activation='softmax'))
model3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [14]:
# Training the model

model3.fit(x_train, y_train, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1c1b687d4f0>

In [15]:
# Model Evaluation

_, test_acc = model3.evaluate(x_test, y_test)
print(test_acc)

0.8787999749183655


__Model 3__: We have got an accuracy of approximately __88%__... Has it improved? Yes, though not much.

What about trying a __different architecture__?

### 6. Model 4: Convolutional Neural Network (CNN)

Without getting too much into detail - this is the scope of another of our courses ;) - a  __Convolutional Neural Network (CNN)__ allows for better image recognition than the Perceptron, due to the mathematical operations that drive it internally.

Therefore, we leave an example of how they are typically used. Notice there are more imports now:

* ```Conv2D```: Allows 2-dimensional convolution operations.
* ```MaxPooling2D```: Allows 2-dimensional pooling operations.
* ```Flatten```: Makes it easy to get the data into Long format (flattening the data).

They are usually used in a sequential way, mainly because of how the Convolution works.
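To give an intuition for the pooling and flattening steps, here is a small NumPy sketch on a made-up 4x4 feature map. The helper `max_pool_2x2` is hypothetical and only handles a single 2-dimensional array with even sides; the Keras layers work on whole batches with several channels.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling over a (height, width) array with even sides."""
    h, w = feature_map.shape
    # Split the array into 2x2 blocks, then keep the maximum of each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])

pooled = max_pool_2x2(feature_map)  # halves each side: 4x4 -> 2x2
flat = pooled.reshape(-1)           # flattening: 2x2 -> vector of 4 values

print(pooled)
print(flat)
```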

In [23]:
# Importing libraries

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.utils import model_to_dot
import numpy as np

# Train / Test Split

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train[:,:,:,np.newaxis] / 255.0
x_test = x_test[:,:,:,np.newaxis] / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
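The indexing with ```np.newaxis``` in the cell above adds a trailing axis of size 1, which plays the role of the single grayscale channel that the convolutional layer expects. A minimal sketch on a synthetic batch (the `batch` array is a stand-in, not the real data):

```python
import numpy as np

# Hypothetical batch of 5 grayscale pictures, 28x28 pixels each.
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Add a channel axis at the end and rescale to [0, 1].
with_channel = batch[:, :, :, np.newaxis] / 255.0

# np.expand_dims is an equivalent, slightly more explicit way to do the same.
same_result = np.expand_dims(batch, axis=-1) / 255.0

print(batch.shape, with_channel.shape)
```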

In [20]:
# Model Definition

model4 = Sequential()
model4.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28,28, 1))) 
model4.add(MaxPooling2D(pool_size=2))
model4.add(Flatten())
model4.add(Dense(10, activation='softmax'))
model4.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [21]:
# Represent the Model's architecture

model4.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 28, 28, 64)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 14, 14, 64)       0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 12544)             0         
                                                                 
 dense_7 (Dense)             (None, 10)                125450    
                                                                 
Total params: 125,770
Trainable params: 125,770
Non-trainable params: 0
_________________________________________________________________
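The numbers in this summary can be checked by hand. The sketch below redoes the arithmetic in plain Python, using the layer settings from the model definition (2x2 kernels, 1 input channel, 64 filters, 'same' padding, pooling size 2); the variable names are chosen here for illustration.

```python
kernel_size = 2
in_channels = 1   # grayscale images
filters = 64
num_classes = 10

# Each filter has kernel_size * kernel_size * in_channels weights plus one bias.
conv_params = kernel_size * kernel_size * in_channels * filters + filters

# 'same' padding keeps the 28x28 size; pooling with size 2 halves each side.
pooled_side = 28 // 2
flat_units = pooled_side * pooled_side * filters

# The final dense layer connects every flattened value to every class.
dense_params = flat_units * num_classes + num_classes

total_params = conv_params + dense_params

print(conv_params, flat_units, dense_params, total_params)
```

Almost all of the parameters sit in the final dense layer; the convolution itself needs only 320, because the same small filters are reused across the whole image.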


In [40]:
model_to_dot(model4, show_shapes=True, show_layer_names=True) #https://graphviz.gitlab.io/download/

You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model/model_to_dot to work.


In [19]:
# Training the model

model4.fit(x_train, y_train, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1a1786b8100>

In [20]:
# Model Evaluation

_, test_acc = model4.evaluate(x_test, y_test)
print(test_acc)

0.8988000154495239


Wow! It performs better than the previous architectures, with an accuracy of almost __90%__.

This was somewhat expected: __CNNs__ are well suited to processing images thanks to the convolution operation.

However, it is not part of the Machine Learning curriculum to cover how wonderful these are.