# Efficiency of dropout

In this notebook, I implement three neural networks in Keras: one without dropout and two with dropout at different rates (0.2 and 0.5).

* Environment: Google Colab, Keras
* Dataset used: MNIST digit
* Last updated: 2021-11-08


In [1]:
#import libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

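import tensorflow as tf
from tensorflow import keras
#import everything through tensorflow.keras so the standalone keras package is not mixed in
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

In [2]:
#import dataset
from tensorflow.keras.datasets import mnist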
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [3]:
#one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)
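
To make the one-hot step concrete, here is a minimal NumPy sketch of the same transformation on three example labels. The helper name `one_hot` is my own and is not part of Keras. It only illustrates the shape and content of what `to_categorical` produces.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1 in the column given by each label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9])
print(example.shape)           # (3, 10)
print(example.argmax(axis=1))  # [3 0 9]
```

Taking `argmax` along the class axis recovers the original integer labels, which is handy when inspecting predictions later.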

In [4]:
#normalize data (0~1)
X_train = X_train / 255.0
X_test = X_test / 255.0

#reshape data
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)
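
The effect of the scaling and reshaping can be checked on a small stand-in array. The sketch below assumes random `uint8` pixels in place of the real MNIST images. Only the shapes and value range matter here.

```python
import numpy as np

# Stand-in batch with the MNIST layout: 5 grayscale images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

scaled = images / 255.0                     # uint8 -> float64 in [0, 1]
flat = scaled.reshape(scaled.shape[0], -1)  # (5, 28, 28) -> (5, 784)

print(flat.shape)  # (5, 784)
print(flat.dtype)  # float64
```

The flattened width of 784 is what the first `Dense` layer expects through `input_shape=(784,)`.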

In [5]:
#create models


#model1 = no dropout
model1 = Sequential([
  Dense(units=128, input_shape=(784,), activation='relu'),
  Dense(units=64, activation='relu'),
  Dense(units=10, activation='softmax')
])


#compile model

model1.summary()
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
_________________________________________________________________
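
The parameter counts in the summary above can be reproduced by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a name I introduce for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Input width followed by the three Dense layer widths used in this notebook.
layer_sizes = [784, 128, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [100480, 8256, 650]
print(sum(per_layer))  # 109386
```

Dropout layers have no weights, which is why the two dropout models report the same total of 109,386 parameters.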


In [6]:
#create models


#model2 = dropout proportion 0.2
model2 = Sequential([
  Dense(units=128, input_shape=(784,), activation='relu'),
  Dropout(0.2),
  Dense(units=64, activation='relu'),
  Dropout(0.2),
  Dense(units=10, activation='softmax')
])


#compile model

model2.summary()
model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 64)                8256      
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                650       
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
_________________________________________________________________
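
To show what a `Dropout` layer does to its input, here is a rough NumPy sketch of inverted dropout: during training, each activation is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`. At inference time the input passes through unchanged. The function `inverted_dropout` is my own illustration, not the Keras implementation.

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the rest."""
    if not training or rate == 0.0:
        return activations  # inference: no units are dropped
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 128))  # dummy activations from a 128-unit layer
for rate in (0.2, 0.5):
    out = inverted_dropout(x, rate, rng)
    # Fraction of zeroed units should be close to `rate`; the mean stays close to 1.
    print(rate, float((out == 0).mean()), float(out.mean()))
```

The rescaling keeps the expected activation the same in training and inference, so no correction is needed when the model is evaluated.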


In [7]:
#create models


#model3 = dropout proportion 0.5
model3 = Sequential([
  Dense(units=128, input_shape=(784,), activation='relu'),
  Dropout(0.5),
  Dense(units=64, activation='relu'),
  Dropout(0.5),
  Dense(units=10, activation='softmax')
])


#compile model

model3.summary()
model3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 128)               100480    
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 64)                8256      
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_8 (Dense)              (None, 10)                650       
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
_________________________________________________________________


In [8]:
#train each model

model1.fit(x=X_train, y=y_train, batch_size=64, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fabfc2bb750>

In [9]:
model2.fit(x=X_train, y=y_train, batch_size=64, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fabf7aab6d0>

In [10]:
model3.fit(x=X_train, y=y_train, batch_size=64, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fabfc12d510>
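
Each `fit` call returns a `History` object whose `.history` attribute is a dict of per-epoch metric lists (including `val_loss` and `val_accuracy` when `validation_split` is set). The cells above discard it. If it were stored, for example as `hist1 = model1.fit(...)`, a small helper like the one below could report the best validation epoch. The names `best_epoch` and `toy_history` are hypothetical, and the numbers are made up purely to show the call.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch, value) where `metric` is lowest."""
    values = history_dict[metric]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Made-up values, not results from this notebook.
toy_history = {"val_loss": [0.30, 0.20, 0.25]}
print(best_epoch(toy_history))  # (2, 0.2)
```

With real runs, the call would be `best_epoch(hist1.history)`, and comparing the three models epoch by epoch would show whether dropout slows or delays overfitting.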

In [12]:
#evaluate models
test_loss1, test_acc1 = model1.evaluate(x=X_test, y=y_test)
print('Test loss: {}, Test accuracy: {}'.format(test_loss1,test_acc1))
test_loss2, test_acc2 = model2.evaluate(x=X_test, y=y_test)
print('Test loss: {}, Test accuracy: {}'.format(test_loss2,test_acc2))
test_loss3, test_acc3 = model3.evaluate(x=X_test, y=y_test)
print('Test loss: {}, Test accuracy: {}'.format(test_loss3,test_acc3))

Test loss: 0.0820424035191536, Test accuracy: 0.9771000146865845
Test loss: 0.0742267295718193, Test accuracy: 0.978600025177002
Test loss: 0.10025808960199356, Test accuracy: 0.9717000126838684


Overall, the three models differ only slightly in test accuracy and loss. Model 2 (dropout 0.2) achieved the highest accuracy (0.9786) and the lowest loss (0.0742), ahead of model 1 without dropout (0.9771 and 0.0820). Model 3 (dropout 0.5) was the worst on both (0.9717 and 0.1003), which suggests that too high a dropout rate can hurt a network of this size.
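
To put the accuracy gaps in perspective, the reported test accuracies can be converted into misclassified-image counts, using the 10,000 images in the MNIST test split. This is plain arithmetic on the numbers printed above. The dictionary keys are labels I chose for readability.

```python
# Test accuracies copied from the evaluation output above.
test_accuracy = {
    "model1 (no dropout)": 0.9771000146865845,
    "model2 (dropout 0.2)": 0.978600025177002,
    "model3 (dropout 0.5)": 0.9717000126838684,
}
n_test = 10_000  # size of the MNIST test split

# Number of test images each model got wrong.
errors = {name: round((1 - acc) * n_test) for name, acc in test_accuracy.items()}
print(errors)
# {'model1 (no dropout)': 229, 'model2 (dropout 0.2)': 214, 'model3 (dropout 0.5)': 283}
```

Model 2 misclassifies 15 fewer images than model 1. These numbers come from a single training run per model, so a gap that small may partly reflect run-to-run variation.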