## Dropout

Dropout is a regularization method that randomly sets the output of some neurons to zero during training.
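To make the mechanism concrete, here is a minimal NumPy sketch of "inverted" dropout, the variant in which surviving activations are rescaled by 1 / (1 - rate) so that their expected value is unchanged. The function name `inverted_dropout` and the toy input are assumptions made for illustration; this is not Keras's own implementation.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_mask = rng.random(activations.shape) >= rate  # True where the unit is kept
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones((4, 5))  # toy layer output, illustration only
dropped = inverted_dropout(activations, rate=0.2, rng=rng)
print(dropped)  # each entry is either zeroed or rescaled by 1 / (1 - 0.2)
```

Because of the rescaling, the average activation stays close to its original value, which is why nothing needs to be changed at inference time.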

You can think of it as training with a handicap: if you can shoot a bow with one eye closed, you may well be better with both eyes open.

Keras provides a `Dropout` layer that applies dropout to the output of any layer you choose.
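The short sketch below, assuming TensorFlow 2.x is installed, shows that the Keras `Dropout` layer only drops values when it is called with `training=True`. With `training=False` (the mode used by `predict` and `evaluate`) it passes its input through unchanged. The all-ones input tensor and the seed are illustrative choices.

```python
import tensorflow as tf

layer = tf.keras.layers.Dropout(rate=0.5, seed=0)
x = tf.ones((2, 8))  # toy input, illustration only

# Training mode: each entry is either dropped (0.0) or rescaled to 1 / (1 - 0.5) = 2.0.
train_out = layer(x, training=True).numpy()

# Inference mode: dropout is disabled and the input comes back unchanged.
infer_out = layer(x, training=False).numpy()

print(train_out)
print(infer_out)
```

During `fit`, Keras sets the training flag automatically, so the layer can be added to a model without any extra handling.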

Reuse the network from the previous exercise, with 2 convolutional layers and 2 dense layers.

Add a dropout layer between the two convolutional layers with p=0.05.
Train the network.

Display the accuracy on the training and test sets and compare it with a network without dropout.
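One possible way to make that comparison is to read the last-epoch values from the `history` dictionary of the object returned by `fit`. The helper below is a sketch with a hypothetical name, `summarize_history`. It assumes the keys `accuracy` and `val_accuracy`, which is what TensorFlow 2.x records when the model is compiled with `metrics=['accuracy']` and `validation_data` is passed to `fit`.

```python
def summarize_history(history_dict):
    """Return the last-epoch train accuracy, validation accuracy and their gap."""
    train_accuracy = history_dict["accuracy"][-1]
    val_accuracy = history_dict["val_accuracy"][-1]
    return {
        "train_accuracy": train_accuracy,
        "val_accuracy": val_accuracy,
        # A large positive gap suggests the model is overfitting the training set.
        "gap": train_accuracy - val_accuracy,
    }
```

In this notebook it could be called as `summarize_history(bigger_model_history.history)` after each training run, and the resulting dictionaries compared between the runs with and without dropout.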

Where should Dropout layers be placed?
- https://stats.stackexchange.com/questions/240305/where-should-i-place-dropout-layers-in-a-neural-network/317313#317313
- There is no firm consensus yet, but one of the most common practices is to apply them to the Dense layers.

In [None]:
from tensorflow.keras.datasets.cifar10 import load_data

data = load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [None]:
train, test = data

In [None]:
X_train, y_train = train
X_test, y_test = test

In [None]:
images_count, image_height, image_width, color_count = X_train.shape

In [None]:
X_train = X_train / 255
X_test = X_test / 255

In [None]:
import numpy as np
class_values = np.unique(y_train)  # equivalently: set(y_train.ravel())
class_count = len(class_values)
class_count, class_values

(10, array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8))

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout

### Without dropout

In [None]:
bigger_model = Sequential()
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3]))
bigger_model.add(Flatten())
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they look like a probability distribution with Sum(output_i) = 1

In [None]:
bigger_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded
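The difference between the two losses mentioned in the comments above is only the label format. The NumPy sketch below, with hand-written helper functions and made-up probabilities (not the Keras implementations), shows that both give the same value when the one-hot targets encode the same integer labels.

```python
import numpy as np

def categorical_crossentropy(one_hot_targets, probabilities):
    """Cross-entropy for one-hot targets of shape (n_samples, n_classes)."""
    return -np.sum(one_hot_targets * np.log(probabilities), axis=1)

def sparse_categorical_crossentropy(integer_targets, probabilities):
    """Cross-entropy for integer class labels of shape (n_samples,)."""
    rows = np.arange(len(integer_targets))
    return -np.log(probabilities[rows, integer_targets])

# Illustrative softmax outputs for 2 samples and 3 classes (made-up values).
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.3, 0.6]])
integer_targets = np.array([0, 2])
one_hot_targets = np.eye(3)[integer_targets]

print(sparse_categorical_crossentropy(integer_targets, probabilities))
print(categorical_crossentropy(one_hot_targets, probabilities))
```

The CIFAR-10 labels loaded here are integers, which is why the sparse variant is the one used in `compile`.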

In [None]:
bigger_model_history = bigger_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
bigger_model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f1265bae690>

### With dropout (p = 0.05)

In [None]:
bigger_model = Sequential()
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
bigger_model.add(Dropout(rate=0.05))
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3]))
bigger_model.add(Flatten())
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they look like a probability distribution with Sum(output_i) = 1

In [None]:
bigger_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded

In [None]:
bigger_model_history = bigger_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
bigger_model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f11edbbaad0>

A dropout of 0.05 seems to improve the model's performance very slightly in terms of loss and accuracy.

Train several networks in a row, gradually increasing the dropout rate: p = 0.1, 0.2, 0.5, 0.8.

What do you observe about how the model's performance on the training and test sets evolves?

What do you conclude about the choice of the dropout rate?
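Since the same architecture is rebuilt for every rate, the experiment can also be written as a small factory function plus a loop. The sketch below is one way to do it, not the notebook's own code: `build_model` and `run_sweep` are hypothetical names, and an explicit `Input` layer replaces the `input_shape` argument used in the cells of this notebook.

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten

def build_model(conv_dropout, input_shape=(32, 32, 3), class_count=10):
    """Same architecture as in this notebook, with a configurable dropout rate."""
    model = Sequential([
        Input(shape=input_shape),
        Conv2D(32, kernel_size=(3, 3), activation="relu"),
        Dropout(rate=conv_dropout),  # dropout between the two convolutional layers
        Conv2D(32, kernel_size=(3, 3), activation="relu"),
        Flatten(),
        Dense(300, activation="relu"),
        Dense(300, activation="relu"),
        Dense(class_count, activation="softmax"),
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def run_sweep(rates, X_train, y_train, X_test, y_test, class_count, epochs=10):
    """Train one fresh model per dropout rate and collect the training histories."""
    histories = {}
    for rate in rates:
        model = build_model(rate, input_shape=X_train.shape[1:], class_count=class_count)
        fit_result = model.fit(X_train, y_train, epochs=epochs,
                               validation_data=(X_test, y_test))
        histories[rate] = fit_result.history
    return histories
```

A call such as `run_sweep([0.1, 0.2, 0.5, 0.8], X_train, y_train, X_test, y_test, class_count)` would then return one history dictionary per rate. Building a fresh model for each rate matters, because otherwise the later runs would start from already trained weights.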


### With dropout (p = 0.1)

In [None]:
bigger_model = Sequential()
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
bigger_model.add(Dropout(rate=0.1))
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3]))
bigger_model.add(Flatten())
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they look like a probability distribution with Sum(output_i) = 1

In [None]:
bigger_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded

In [None]:
bigger_model_history = bigger_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
bigger_model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f11ed9d9290>

A dropout of 0.1 improves performance a little more than a dropout of 0.05.

### With dropout (p = 0.2)

In [None]:
bigger_model = Sequential()
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
bigger_model.add(Dropout(rate=0.2))
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3]))
bigger_model.add(Flatten())
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they look like a probability distribution with Sum(output_i) = 1

In [None]:
bigger_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded

In [None]:
bigger_model_history = bigger_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
bigger_model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f11ed862810>

A dropout of 0.2 seems to keep improving the model's performance.

### With dropout (p = 0.5)

In [None]:
bigger_model = Sequential()
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
bigger_model.add(Dropout(rate=0.5))
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3]))
bigger_model.add(Flatten())
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they look like a probability distribution with Sum(output_i) = 1

In [None]:
bigger_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded

In [None]:
bigger_model_history = bigger_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
bigger_model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f11ed739990>

A dropout of 0.5, on the other hand, starts to degrade the model's performance.

### With dropout (p = 0.8)

In [None]:
bigger_model = Sequential()
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
bigger_model.add(Dropout(rate=0.8))
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3]))
bigger_model.add(Flatten())
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they look like a probability distribution with Sum(output_i) = 1

In [None]:
bigger_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded

In [None]:
bigger_model_history = bigger_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
bigger_model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f11ec26d950>

A dropout of 0.8 degrades the model's performance further (which seems logical if 80% of the first convolutional layer's outputs are dropped).

A dropout of 1 would shut the model down entirely, since the previous layer would be completely disconnected from the next one. We can, however, experiment to find the highest dropout rate at which the model's performance improves the most while overfitting is limited as much as possible.
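Once a validation accuracy has been recorded for each rate, the selection step can be automated. The helper below is a simple sketch with a hypothetical name, `pick_best_rate`. It keeps the rate with the highest validation accuracy, which is only one possible criterion and does not look at the train/test gap.

```python
def pick_best_rate(val_accuracy_by_rate):
    """Return the dropout rate with the highest validation accuracy.

    `val_accuracy_by_rate` maps each dropout rate to the validation accuracy
    measured for it. On ties, the smallest rate is returned.
    """
    if not val_accuracy_by_rate:
        raise ValueError("no results to compare")
    # Iterating in ascending order makes max() return the smallest rate on ties.
    return max(sorted(val_accuracy_by_rate), key=lambda rate: val_accuracy_by_rate[rate])
```

The input dictionary would be filled from the last `val_accuracy` value of each training run in this notebook.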

Bonus: set p=0.05 again for the dropout after the convolutional layer and also add dropout to the dense layers. Use somewhat larger values than for the convolutional layer.

What are the results?

### With dropout (p = 0.05 after the convolutional layer, p = 0.15 after the dense layers)

In [None]:
bigger_model = Sequential()
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
bigger_model.add(Dropout(rate=0.05))
bigger_model.add(Conv2D(32, activation='relu', kernel_size=[3,3]))
bigger_model.add(Flatten())
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dropout(rate=0.15))
bigger_model.add(Dense(units=300, activation='relu'))
bigger_model.add(Dropout(rate=0.15))
bigger_model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they look like a probability distribution with Sum(output_i) = 1

In [None]:
bigger_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded

In [None]:
bigger_model_history = bigger_model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
bigger_model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f11ed51de50>

