Title: Adding Dropout    
Slug: adding_dropout    
Summary: How to add dropout to a neural network model in Python.    
Date: 2017-09-25 12:00  
Category: Deep Learning - Keras  
Tags: Basics
Authors: Kabir Khan

# What is Dropout?

Dropout is a widely used regularization technique for deep neural networks. It applies a random mask to a layer, removing a subset of that layer's nodes and their connections for a single training pass.

### Example

If you apply dropout with a probability of 0.2 to a layer, then each node in that layer has a 0.2 chance of being removed for a given training pass.

For each iteration, a different subset of your network is trained, creating an ensemble effect. This technique has been shown to reduce overfitting and is widely used in network architectures today.
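To make the 0.2 example concrete, here is a minimal NumPy sketch of drawing a dropout mask. It is an illustration only, not how Keras implements dropout internally; the helper `sample_dropout_mask`, the 10,000-unit layer, and the seed are assumptions made for the demonstration.

```python
import numpy as np

# Seed chosen arbitrarily so the sketch is reproducible
rng = np.random.default_rng(0)


def sample_dropout_mask(n_units, rate, rng):
    """Return a 0/1 mask where each unit is dropped independently with probability `rate`."""
    return (rng.random(n_units) >= rate).astype(np.float32)


# Stand-in for the outputs of a layer with 10,000 units (hypothetical size)
activations = np.ones(10_000, dtype=np.float32)

# Two training iterations draw two independent masks
mask_a = sample_dropout_mask(activations.size, rate=0.2, rng=rng)
mask_b = sample_dropout_mask(activations.size, rate=0.2, rng=rng)

# Share of units removed in the first iteration (close to 0.2)
dropped_fraction = 1.0 - mask_a.mean()

# The second iteration trains a different subset of the layer
masks_differ = bool(np.any(mask_a != mask_b))

print(f"fraction of units dropped: {dropped_fraction:.3f}")
print(f"masks differ between iterations: {masks_differ}")
```

Because each iteration samples a fresh mask, every pass effectively trains a slightly different sub-network, which is where the ensemble effect comes from.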

<img src="http://cs231n.github.io/assets/nn2/dropout.jpeg">
> Image Source: http://cs231n.github.io/neural-networks-2/#reg

## Packages

In [1]:
# Load libraries
import numpy as np
from keras.datasets import imdb
from keras.preprocessing.text import Tokenizer
from keras import models
from keras import layers

# Set random seed
np.random.seed(0)

Using TensorFlow backend.


## Load IMDB Movie Review Data

In [2]:
# Set the number of features we want
number_of_features = 1000

# Load data and target vector from movie review data
(train_data, train_target), (test_data, test_target) = imdb.load_data(num_words=number_of_features)

# Convert movie review data to a binary (multi-hot) feature matrix:
# one row per review, one column per word, 1 if the word appears
tokenizer = Tokenizer(num_words=number_of_features)
train_features = tokenizer.sequences_to_matrix(train_data, mode='binary')
test_features = tokenizer.sequences_to_matrix(test_data, mode='binary')

Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
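To show what `sequences_to_matrix` with `mode='binary'` produces, here is a small NumPy approximation. The helper `sequences_to_binary_matrix` and the toy reviews are hypothetical and only mimic the general behavior; this is not the Keras source code.

```python
import numpy as np


def sequences_to_binary_matrix(sequences, num_words):
    """Build a (n_sequences, num_words) matrix with 1 where a word index occurs."""
    matrix = np.zeros((len(sequences), num_words), dtype=np.float32)
    for row, sequence in enumerate(sequences):
        for index in sequence:
            # Ignore indices that fall outside the vocabulary
            if 0 <= index < num_words:
                matrix[row, index] = 1.0
    return matrix


# Two made-up "reviews", each a list of word indices
toy_reviews = [[1, 3, 3], [0, 2]]
toy_features = sequences_to_binary_matrix(toy_reviews, num_words=5)

print(toy_features)
```

Repeated words still produce a single 1, so each row records which words appear in a review, not how often.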


## Functions to train and evaluate model

In [19]:
def train_and_evaluate(model):
    # Compile neural network
    model.compile(loss='binary_crossentropy', # Cross-entropy
                optimizer='adam', # Adam Optimizer
                metrics=['accuracy']) # Accuracy performance metric
    
    # Train neural network
    history = model.fit(train_features, # Features
                      train_target, # Target vector
                      epochs=10, # Number of epochs
                      verbose=1, # Print a progress bar for each epoch
                      batch_size=100, # Number of observations per batch
                      validation_data=(test_features, test_target)) # Data for evaluation
    
    return history

## Neural Network Architecture

We'll start with a standard network architecture that doesn't use dropout.

In [20]:
# Start neural network
network = models.Sequential()

# Add fully connected layer with a ReLU activation function for input layer
network.add(layers.Dense(units=16, activation='relu', input_shape=(number_of_features,)))

# Add fully connected layer with a ReLU activation function
network.add(layers.Dense(units=16, activation='relu'))

# Add fully connected layer with a sigmoid activation function
network.add(layers.Dense(units=1, activation='sigmoid'))

In [21]:
train_and_evaluate(network)

Train on 25000 samples, validate on 25000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x11643df98>

## Neural Network Architecture With Dropout

In Keras, we can implement dropout by adding `Dropout` layers to our network architecture. Each `Dropout` layer drops a user-defined proportion of the units in the previous layer for every batch.
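As a rough sketch of what a dropout layer computes, the NumPy function below zeroes each unit with probability `rate` during training and scales the surviving units by `1 / (1 - rate)` so that the expected activation is unchanged, which is the behavior the Keras documentation describes for `Dropout`. At inference time the inputs pass through untouched. The function name `dropout_forward` and the batch shape are hypothetical, chosen only for illustration.

```python
import numpy as np


def dropout_forward(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        # Inference: activations pass through unchanged
        return x
    keep_mask = rng.random(x.shape) >= rate
    # Scale kept units so the expected value of each activation is preserved
    return np.where(keep_mask, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)

# Hypothetical batch: 100 observations, 16 units, all activations equal to 1
batch = np.ones((100, 16))

train_out = dropout_forward(batch, rate=0.5, training=True, rng=rng)
test_out = dropout_forward(batch, rate=0.5, training=False, rng=rng)

print("values seen during training:", np.unique(train_out))
print("mean activation during training:", train_out.mean())
print("unchanged at inference:", np.array_equal(test_out, batch))
```

The rescaling is why no adjustment is needed when the trained model is used for prediction: the average activation seen by the next layer is roughly the same in both modes.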

Remember that in Keras the input layer is implicit: it is not added with the `add` method but is defined by the `input_shape` of the first layer we add. Therefore, to apply dropout to the input layer, the first layer we add is a `Dropout` layer. It carries both the proportion of the input layer's units to drop (`0.2`) and the `input_shape` of the observation data. We then add a `Dropout` layer with a rate of `0.5` after each hidden layer.

In [22]:
# Start neural network with dropout
network_dropout = models.Sequential()

# Add a dropout layer for input layer
network_dropout.add(layers.Dropout(0.2, input_shape=(number_of_features,)))

# Add fully connected layer with a ReLU activation function
network_dropout.add(layers.Dense(units=16, activation='relu'))

# Add a dropout layer for previous hidden layer
network_dropout.add(layers.Dropout(0.5))

# Add fully connected layer with a ReLU activation function
network_dropout.add(layers.Dense(units=16, activation='relu'))

# Add a dropout layer for previous hidden layer
network_dropout.add(layers.Dropout(0.5))

# Add fully connected layer with a sigmoid activation function
network_dropout.add(layers.Dense(units=1, activation='sigmoid'))

In [23]:
train_and_evaluate(network_dropout)

Train on 25000 samples, validate on 25000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1170d7fd0>

## Results

Without dropout, the network reaches a much higher training accuracy over 10 epochs, but its validation accuracy starts to decline, which is a sign of overfitting. With dropout, training accuracy rises more slowly and validation accuracy stays consistently higher.

**Validation Accuracy Comparison**:

<table>
    <tr>
        <td></td>
        <th>Network</th>
        <th>Network With Dropout</th>
    </tr>
    <tr>
        <th>Validation accuracy</th>
        <td>84.5%</td>
        <td>86.04%</td>
    </tr>
</table>
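Validation accuracies like the ones compared above can be read from the `History` object that `train_and_evaluate` returns, provided the return value is captured (for example `history = train_and_evaluate(network)`). The sketch below works on the plain dictionary stored in `history.history`. The helper `final_validation_accuracy` is hypothetical, and the numbers in `example_history` are placeholders for illustration, not results from this post.

```python
def final_validation_accuracy(history_dict):
    """Return the last-epoch validation accuracy from a Keras-style history dict."""
    # Older Keras versions log 'val_acc'; newer ones log 'val_accuracy'
    for key in ("val_accuracy", "val_acc"):
        if key in history_dict:
            return history_dict[key][-1]
    raise KeyError("no validation accuracy found in history")


# Placeholder values for illustration only
example_history = {"val_acc": [0.80, 0.85, 0.84]}

print(final_validation_accuracy(example_history))
```

With real training runs, you would pass `history.history` for each model and compare the two returned values.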