# Image recognition on the Fashion-MNIST dataset

## The project


The aim of the project is to use different deep learning techniques to predict, as precisely as possible, the type of clothing shown in each image of the Fashion-MNIST dataset.

The dataset is composed of 60,000 training images and 10,000 test images, each a 28x28 grayscale picture.

<img src="img/Fashion-MNIST-Dataset-Images-with-Labels-and-Description.png">

There are 10 different classes. For a given image, the neural network has to predict which class it belongs to.

## The code

### Linear model

We begin with our simplest model: a linear model.

First, we import the packages we need:

```python

import tensorflow as tf
import tensorflow.keras as keras
import matplotlib.pyplot as plt

```

Then we create our linear model using Keras:

```python

def linear_model(x, y, val_x, val_y, opt, loss_func, epochs, batch_size):
    model = keras.Sequential([
        # convert each two-dimensional 28x28 image into a vector of 784 values
        keras.layers.Flatten(),
        # a single dense layer: one output per class, turned into probabilities by softmax
        keras.layers.Dense(10, activation=keras.activations.softmax),
    ])

    # Keras expects `metrics` to be given as a list
    model.compile(optimizer=opt, loss=loss_func, metrics=[keras.metrics.categorical_accuracy])

    # `scheduler` (defined below) adjusts the learning rate at the start of each epoch
    logs = model.fit(x, y, validation_data=(val_x, val_y), epochs=epochs, batch_size=batch_size,
                     callbacks=[keras.callbacks.LearningRateScheduler(scheduler)])
    model.summary()

    return logs

```

The function takes as parameters:

* The training and validation data (here the test set is used for validation)
* The optimizer
* The loss function
* The number of epochs (how many times the neural network processes the entire dataset)
* The batch size (the number of examples processed before the neural network updates its weights)

We use softmax as the activation function of the output layer because its outputs sum to 1, which suits a classification problem: each output can be read as the probability the model assigns to the image belonging to that class.
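To make the "outputs sum to 1" property concrete, here is a minimal NumPy sketch of the softmax function. The scores below are made-up values for a toy 3-class example, not outputs of the model above; the real output layer applies the same formula to 10 scores.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores (logits) into probabilities that sum to 1."""
    # subtracting the max avoids overflow in exp() and does not change the result
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# hypothetical raw scores for a toy 3-class problem
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.round(3))
print(probs.sum())
```

The highest score gets the highest probability, and the predicted class is simply the index of the largest output.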

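When we fit the model, we pass a `callbacks` argument. What does it do? The `keras.callbacks.LearningRateScheduler` callback calls the `scheduler` function at the beginning of every epoch, passing it the epoch index and the current learning rate, and uses the returned value as the new learning rate: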

```python

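def scheduler(epoch, lr):
    # keep the initial learning rate for the first 150 epochs
    if epoch < 150:
        return lr
    else:
        # afterwards, shrink the current learning rate by 1.25% at every epoch
        return lr * 0.9875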

```

This function keeps the learning rate unchanged for the first 150 epochs (indices 0 to 149), then multiplies it by 0.9875 at every following epoch. Lowering the learning rate makes the weight updates smaller, which lets the network settle more precisely into a minimum from epoch 150 onwards.
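Because the callback hands the optimizer's *current* learning rate back to `scheduler`, the 0.9875 factor compounds from one epoch to the next. The sketch below replays that loop without training anything, assuming the initial rate of 0.05 used in the main function and 300 epochs; `simulate_schedule` is a hypothetical helper written for illustration, not part of Keras or of the project.

```python
def scheduler(epoch, lr):
    # same schedule as above
    if epoch < 150:
        return lr
    else:
        return lr * 0.9875

def simulate_schedule(initial_lr, epochs):
    """Replay the learning rates LearningRateScheduler would set, epoch by epoch."""
    lrs = []
    lr = initial_lr
    for epoch in range(epochs):
        # the callback feeds the previous epoch's rate back in, so the decay compounds
        lr = scheduler(epoch, lr)
        lrs.append(lr)
    return lrs

lrs = simulate_schedule(0.05, 300)
print(lrs[149], lrs[150], lrs[-1])
```

After 150 decayed epochs the rate has dropped to 0.05 × 0.9875^150, roughly 0.0076, i.e. about 15% of its starting value.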


In the main function:


```python

if __name__ == "__main__":
    # number of times the model goes through the whole training set
    epochs = 300
    # number of images propagated through the network (forward pass) before the network
    # averages their errors and performs a single backpropagation / weight update.
    # A large batch size increases the available computational parallelism and converges
    # faster to a local optimum, but finds the global minimum less easily than a small one
    batch_size = 1024

    # load the training and test sets of the Fashion-MNIST dataset
    (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

    # pixels have values from 0 to 255: scale them to [0, 1]
    x_train = x_train / 255.0
    x_test = x_test / 255.0

    # one-hot encode the labels (integers from 0 to 9) into vectors of length 10
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test = keras.utils.to_categorical(y_test, 10)

    all_logs = []
    # `learning_rate` replaces the deprecated `lr` argument of SGD
    log = linear_model(x_train, y_train, x_test, y_test,
                       keras.optimizers.SGD(learning_rate=0.05, momentum=0.95),
                       keras.losses.categorical_crossentropy, epochs=epochs, batch_size=batch_size)

    all_logs.append(log)

    # plot_log is the project's plotting helper (not shown here)
    plot_log(all_logs)

```
We first set the hyperparameters `epochs` and `batch_size` to 300 and 1024 respectively.
A large batch size lets the network process the data much faster, but there is a risk that it converges to a local (rather than the global) optimum.
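The speed difference comes from the number of weight updates per epoch: one per mini-batch. The arithmetic below is a small sketch for the 60,000 training images; `updates_per_epoch` is a hypothetical helper, and 32 is shown because it is the default batch size of Keras' `fit`.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates (one per mini-batch) in one pass over the data."""
    # the last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(n_samples / batch_size)

n_train = 60_000  # Fashion-MNIST training images
for bs in (32, 256, 1024):
    print(bs, updates_per_epoch(n_train, bs))
```

With `batch_size = 1024` the network makes only 59 updates per epoch, against 1875 with batches of 32: each epoch is cheaper and better parallelized, but the weights move fewer times.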

For the loss function, we use categorical cross-entropy, which pairs well with a softmax output: it heavily penalizes predictions that assign a low probability to the correct class.
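A minimal NumPy sketch of categorical cross-entropy for a single example, assuming a one-hot label like the ones produced by `to_categorical`. The probability vectors are made-up values for a toy 3-class case; clipping to avoid `log(0)` is a common safeguard, not necessarily exactly what Keras does internally.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted probability vector."""
    # clip so that a predicted probability of exactly 0 does not give log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])           # the image belongs to class 1
confident_right = np.array([0.1, 0.8, 0.1])
confident_wrong = np.array([0.8, 0.1, 0.1])

loss_right = categorical_crossentropy(y_true, confident_right)  # equals -ln(0.8)
loss_wrong = categorical_crossentropy(y_true, confident_wrong)  # equals -ln(0.1)
print(loss_right, loss_wrong)
```

Only the probability given to the true class matters: about 0.22 when it is 0.8, but about 2.30 when it drops to 0.1.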

The `plot_log` function (not shown here) displays the loss and accuracy of our models.
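Since `plot_log` itself is not shown, here is a hypothetical stand-in written for illustration, not the project's actual implementation. It assumes each element of `all_logs` is the `History` object returned by `model.fit()`, whose `.history` dict holds one list per metric, and that the metric was compiled as `categorical_accuracy` (so the keys are `categorical_accuracy` and `val_categorical_accuracy`). The `labels` parameter is an assumption too.

```python
import matplotlib.pyplot as plt

def plot_log(all_logs, labels=None):
    """Hypothetical stand-in: plot train/test loss and accuracy for each model."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
    for i, log in enumerate(all_logs):
        name = labels[i] if labels else f"model {i}"
        h = log.history  # one list per metric, one value per epoch
        ax_loss.plot(h["loss"], label=f"{name} train")
        ax_loss.plot(h["val_loss"], label=f"{name} test")
        # key names follow the metric passed to compile()
        ax_acc.plot(h["categorical_accuracy"], label=f"{name} train")
        ax_acc.plot(h["val_categorical_accuracy"], label=f"{name} test")
    ax_loss.set(title="Loss", xlabel="epoch")
    ax_acc.set(title="Accuracy", xlabel="epoch")
    ax_loss.legend()
    ax_acc.legend()
    return fig
```

Usage: `fig = plot_log(all_logs, labels=["linear"])` followed by `plt.show()` to display the figure.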

After 70 epochs, here are our results:

<img src="img/plot_1_1.png">

<img src="img/plot_1_2.png">

<img src="img/plot_1_3.png">

<img src="img/plot_1_4.png">

As these graphs show, the loss drops sharply in the first epochs and then decreases only slightly.
On the training data, the loss keeps decreasing until the end, but on the test data it stabilizes and even increases at the end. Let's look at the data more carefully:

<img src="img/plot_1_5.png">

<img src="img/plot_1_6.png">

<img src="img/plot_1_7.png">

As we can see, the loss on the test set was at its lowest at the 38th epoch. After that, the test loss keeps increasing slightly, while the training loss keeps decreasing over time.

This suggests that the model is starting to overfit: it over-learns the training data and no longer generalizes well to unseen images.
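Spotting the epoch where the test loss bottoms out can be done directly from the `History` object returned by `fit`. A minimal sketch, assuming a list of per-epoch validation losses such as `log.history["val_loss"]`; `best_epoch` is a hypothetical helper and the curve below is made up for illustration.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss, and that loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1, val_losses[best_index]

# toy curve: falls, bottoms out, then rises again as the model overfits
val_loss = [0.60, 0.48, 0.44, 0.43, 0.45, 0.47]
print(best_epoch(val_loss))  # (4, 0.43)
```

Applied to a real run it would be called as `best_epoch(log.history["val_loss"])`. Keras also offers `keras.callbacks.EarlyStopping(monitor="val_loss", restore_best_weights=True)` to stop training automatically once the validation loss stops improving.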


Let's try a multi-layer perceptron (MLP).



### Multi Layer Perceptron


The only difference from the previous code is that we add two hidden layers of 60 ReLU units to the model:

```python3

def multi_layer_perceptron(x, y, val_x, val_y, opt, loss_func, epochs, batch_size):
    model = keras.Sequential([
        # convert a two dimensional matrix into a vector
        keras.layers.Flatten(),
        keras.layers.Dense(60, activation=keras.activations.relu),
        keras.layers.Dense(60, activation=keras.activations.relu),
        keras.layers.Dense(10, activation=keras.activations.softmax),
    ])

    model.compile(optimizer=opt, loss=loss_func, metrics=[keras.metrics.categorical_accuracy])

    logs = model.fit(x, y, validation_data=(val_x, val_y), epochs=epochs, batch_size=batch_size,
                     callbacks=[keras.callbacks.LearningRateScheduler(scheduler)])
    model.summary()

    return logs

```
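Adding these layers greatly increases the number of trainable parameters that `model.summary()` reports. The sketch below reproduces that count by hand, assuming 28x28 inputs flattened to 784 values; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a Dense layer: one weight per input/output pair plus one bias per output."""
    return n_in * n_out + n_out

n_pixels = 28 * 28  # Flatten turns each 28x28 image into a 784-long vector

linear = dense_params(n_pixels, 10)
mlp = dense_params(n_pixels, 60) + dense_params(60, 60) + dense_params(60, 10)
print(linear, mlp)  # 7850 51370
```

Most of the MLP's parameters sit in the first hidden layer (47,100 of 51,370), which connects every pixel to each of the 60 units.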

I reduce the number of epochs to 50, since there is no point in training longer if the model overfits before the end.

I also change the scheduler function:

```python3

def scheduler(epoch, lr):
    if epoch < 30:
        return lr
    else:
        return lr * 0.98

```

Let's look at the results:


<img src="img/plot_2_1.png">

<img src="img/plot_2_2.png">

<img src="img/plot_2_3.png">

<img src="img/plot_2_4.png">


As the plots show, the MLP performs much better than the linear model.
For this example I used the ReLU activation function. What would happen with other activation functions, and which one works best here? Let's find out:


<img src="img/plot_3_1.PNG">

<img src="img/plot_3_2.PNG">

<img src="img/plot_3_3.PNG">

<img src="img/plot_3_4.PNG">


The results are interesting: the ELU, ReLU, SELU and tanh activation functions perform quite similarly. The sigmoid function, on the other hand, performs poorly in the early epochs but ends up surpassing the others.
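For reference, the five activation functions compared above can be written in a few lines of NumPy. This is a sketch of the standard formulas: ELU uses `alpha=1.0` (the Keras default), and SELU uses its usual fixed constants. It only shows the shape of each function, not how Keras implements them. ReLU, ELU and SELU are unbounded for positive inputs, whereas sigmoid and tanh saturate.

```python
import numpy as np

# standard SELU constants
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # identity for x > 0, smooth exponential curve towards -alpha for x < 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    # a scaled ELU with fixed alpha and scale
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def sigmoid(x):
    # squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any input into (-1, 1)
    return np.tanh(x)

x = np.linspace(-3, 3, 7)
for f in (relu, elu, selu, sigmoid, tanh):
    print(f.__name__, np.round(f(x), 3))
```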


By comparing the training and test errors, we can see that the model overfits the data with every activation function except sigmoid.
There are three ways to fight overfitting:
* reduce the model complexity
* add more data
* add regularization techniques

Adding more data is not possible here, so of the two remaining options I chose to add regularization techniques.
Let's see what happens if I add dropout.
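As a reminder of what dropout does, here is a minimal NumPy sketch of "inverted" dropout, the scheme described in the Keras `Dropout` layer documentation: during training, a random fraction `rate` of the inputs is set to 0 and the remaining ones are scaled by 1 / (1 - rate); at inference time the inputs pass through unchanged. The function name and signature are hypothetical, chosen for illustration only.

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """Zero a random fraction `rate` of activations during training.

    Surviving activations are scaled by 1 / (1 - rate) so their expected value
    is unchanged; outside training the input is returned untouched.
    """
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones(10)
print(inverted_dropout(activations, 0.5, training=True, rng=rng))
print(inverted_dropout(activations, 0.5, training=False, rng=rng))
```

Because a different random subset of units is dropped at every step, the network cannot rely on any single unit, which tends to reduce overfitting.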