# Dropout: a regularization technique to reduce overfitting and improve generalization

- A simple and powerful regularization technique to __avoid overfitting__ and help the model __generalize__ is <font color=red>Dropout</font>
- __Dropout__ is a technique where randomly selected neurons are ignored during training
- Dropped-out neurons do not participate in the forward pass or the backward pass for that training step
- Dropout is active only during training; at evaluation and prediction time all neurons are used
- It can be applied to __Input__ layers and / or __Hidden__ layers
- Applying dropout is believed to let the model __represent patterns better__, because neurons cannot rely on the presence of specific other neurons
- Applying dropout makes the model __less sensitive to the specific weights__ of individual neurons
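
The behaviour described above can be sketched in plain Python. This is a minimal illustration, not the Keras implementation: the function name `dropout` and the sample activations are made up for the example. It assumes the common "inverted dropout" variant, in which the surviving activations are scaled by `1 / (1 - rate)` during training so that nothing needs rescaling at prediction time.

```python
import random


def dropout(activations, rate, training, rng=None):
    """Apply inverted dropout to a list of activations (illustrative sketch)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At prediction time every neuron is kept and nothing is rescaled.
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Each unit is dropped independently with probability `rate`.
    # Survivors are scaled by 1 / keep_prob so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


# Hypothetical activations of one hidden layer for a single sample.
hidden = [0.5, 1.2, 0.0, 2.0, 0.7, 1.1, 0.3, 0.9, 1.5, 0.4]

train_out = dropout(hidden, rate=0.2, training=True, rng=random.Random(0))
test_out = dropout(hidden, rate=0.2, training=False)

print("training:  ", train_out)
print("prediction:", test_out)
```

A dropped unit outputs exactly zero, so it also passes no gradient back for that sample, which is why it takes no part in the backward pass.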

<font color=red>Tips</font>
- Use a dropout rate of 20% to 50% of neurons
- A __low dropout__ rate may have __little or no impact__
- A __high dropout__ rate may cause the model to __under-learn__ (underfit)
- Dropout is most useful for larger, more complex networks
- Apply dropout on __input__ layers as well as __hidden__ layers
- Use a large learning rate with decay and a large momentum. Increase your learning rate by a factor of 10 to 100 and use a high momentum value of 0.9 or 0.99.
- Constrain the size of network weights. A large learning rate can result in very large network weights. Imposing a constraint on the size of network weights such as max-norm regularization with a size of 4 or 5 has been shown to improve results.
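
The last two tips can be illustrated with a small sketch in plain Python. Both helpers are hypothetical names written for this example. The decay formula is an assumption (a simple time-based schedule); the tips above do not prescribe a particular schedule. The max-norm helper rescales the incoming weight vector of a single unit whenever its L2 norm exceeds the limit, which is the idea behind a max-norm constraint.

```python
import math


def time_based_decay(initial_lr, decay, epoch):
    """Learning rate after `epoch` epochs: initial_lr / (1 + decay * epoch)."""
    return initial_lr / (1.0 + decay * epoch)


def apply_max_norm(incoming_weights, max_value):
    """Rescale one unit's incoming weights so their L2 norm is at most max_value."""
    norm = math.sqrt(sum(w * w for w in incoming_weights))
    if norm <= max_value:
        # Already inside the allowed ball: leave the weights untouched.
        return list(incoming_weights)
    scale = max_value / norm
    return [w * scale for w in incoming_weights]


# A large starting learning rate that shrinks as training proceeds.
lr_schedule = [time_based_decay(0.1, decay=0.01, epoch=e) for e in (0, 100, 300)]
print(lr_schedule)

small = apply_max_norm([1.0, 2.0, 2.0], max_value=4.0)   # norm 3, left unchanged
large = apply_max_norm([3.0, 4.0, 12.0], max_value=4.0)  # norm 13, scaled down to norm 4
print(small)
print(large)
```

The constraint only changes the length of a weight vector, never its direction, so it limits how far a large learning rate can push the weights without changing what the unit responds to.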

In [1]:
import pandas as pd
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
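# Note: this wrapper module was removed in recent TensorFlow releases;
# the separate SciKeras package provides a similar KerasClassifier
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier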
from tensorflow.keras.optimizers import SGD
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Baseline performance __before applying Dropout__

In [2]:
dataframe = pd.read_csv('Data/sonar.all-data', header=None)
dataset = dataframe.values

X = dataset[:, 0:60].astype(float)
y = dataset[:, 60]

encoder = LabelEncoder()
encoder.fit(y)
encoded_y = encoder.transform(y)

def create_baseline():
    model = Sequential()
    model.add(Dense(60, input_dim=60, activation='relu'))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
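    # Pass the configured optimizer object; the string 'sgd' would use default
    # settings and silently ignore the learning rate and momentum set here
    sgd = SGD(learning_rate=0.01, momentum=0.8)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])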
    
    return model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)

# Stratified 10-fold cross-validation; with shuffle=True and no random_state, scores vary from run to run
kfold = StratifiedKFold(n_splits=10, shuffle=True)

results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 87.98% (3.86%)


Performance after applying __Dropout on Input layer__

In [3]:
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import max_norm

def create_baseline():
    model = Sequential()
    model.add(Dropout(0.2, input_shape=(60, )))
    model.add(Dense(60, activation='relu', kernel_constraint=max_norm(3)))
    model.add(Dense(30, activation='relu', kernel_constraint=max_norm(3)))
    model.add(Dense(1, activation='sigmoid'))
    
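    # Pass the configured optimizer object; the string 'sgd' would use default
    # settings and silently ignore the learning rate and momentum set here
    sgd = SGD(learning_rate=0.1, momentum=0.9)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])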
    
    return model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, epochs=300, batch_size=16, verbose=0)))

pipeline = Pipeline(estimators)

# Stratified 10-fold cross-validation; with shuffle=True and no random_state, scores vary from run to run
kfold = StratifiedKFold(n_splits=10, shuffle=True)

results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print("Performance on applying dropout on input layer: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Performance on applying dropout on input layer: 85.05% (8.04%)


Notice the <font color=red>drop in performance</font> after applying dropout on the <font color=red>input layer</font>:
- Baseline:    87.98% (3.86%)
- Input Layer: 85.05% (8.04%)

Performance after applying __Dropout on Hidden layers__

In [4]:
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import max_norm

def create_baseline():
    model = Sequential()
    model.add(Dense(60, input_dim=60, activation='relu', kernel_constraint=max_norm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(30, activation='relu', kernel_constraint=max_norm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))
    
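    # Pass the configured optimizer object; the string 'sgd' would use default
    # settings and silently ignore the learning rate and momentum set here
    sgd = SGD(learning_rate=0.1, momentum=0.9)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])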
    
    return model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, epochs=300, batch_size=16, verbose=0)))

pipeline = Pipeline(estimators)

# Stratified 10-fold cross-validation; with shuffle=True and no random_state, scores vary from run to run
kfold = StratifiedKFold(n_splits=10, shuffle=True)

results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print("Performance on applying dropout on hidden layers: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Performance on applying dropout on hidden layers: 85.17% (6.11%)


Notice the <font color=red>drop in performance</font> after applying dropout on the <font color=red>hidden layers</font>:
- Baseline:    87.98% (3.86%)
- Input Layer: 85.05% (8.04%)
- Hidden Layers: 85.17% (6.11%)
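
To put these gaps in context, the sketch below compares each drop in mean accuracy with the spread across folds. The numbers are copied from the three cell outputs printed above (the `Baseline:` line and the two `Performance on applying dropout ...` lines). Comparing a gap against one standard deviation is only a rough heuristic chosen for this illustration, not a formal significance test, and the variable names are made up for the example.

```python
# (mean accuracy %, standard deviation %) as printed by the cells above
reported = {
    "Baseline": (87.98, 3.86),
    "Input layer dropout": (85.05, 8.04),
    "Hidden layer dropout": (85.17, 6.11),
}

baseline_mean, baseline_std = reported["Baseline"]

gaps = {}
for name, (mean, std) in reported.items():
    if name == "Baseline":
        continue
    gap = baseline_mean - mean
    # Rough check: is the drop smaller than the larger of the two fold-to-fold spreads?
    within_spread = gap <= max(std, baseline_std)
    gaps[name] = (round(gap, 2), within_spread)
    print("%s: %.2f points below baseline, within one std: %s" % (name, gap, within_spread))
```

Both drops are smaller than the fold-to-fold standard deviation, so repeated runs with different shuffles may rank the three configurations differently.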

Summary:
- For this problem and the chosen network configuration, using __dropout__ in either the input layer or the hidden layers __did not lift performance__. In fact, mean accuracy was lower than the baseline.
- It is possible that additional training epochs or further tuning of the learning rate is required.