<h1 style="text-align:center">Deep Learning   </h1>
<h1 style="text-align:center"> Lab Session 2 - 1.5 Hours </h1>
<h1 style="text-align:center"> Convolutional Neural Network (CNN) for Handwritten Digits Recognition</h1>

<b> Group name:</b> # Paolo Moriello, Giuseppe Coccia
 
 
The aim of this session is to practice with Convolutional Neural Networks. Each group should fill and run appropriate notebook cells. 


Generate your final report (export as HTML) and upload it on the submission website http://bigfoot-m1.eurecom.fr/teachingsub/login (using your deeplearnXX/password). Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed and submitted by May 30th 2018 (23:59:59 CET).

# Introduction

In the previous Lab Session, you built a Multilayer Perceptron for recognizing hand-written digits from the MNIST dataset. The best accuracy achieved on the test data was about 97%. Can you do better using a deep CNN?
In this Lab Session, you will build, train and optimize in TensorFlow one of the early Convolutional Neural Networks, **LeNet-5**, to reach more than 99% accuracy.






# Load MNIST Data in TensorFlow
Run the cell below to load the MNIST data that comes with TensorFlow. You will use this data in **Section 1** and **Section 2**.

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
from sklearn.utils import shuffle

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
X_train, y_train           = mnist.train.images, mnist.train.labels
X_validation, y_validation = mnist.validation.images, mnist.validation.labels
X_test, y_test             = mnist.test.images, mnist.test.labels
print("Image Shape: {}".format(X_train[0].shape))
print("Training Set:   {} samples".format(len(X_train)))
print("Validation Set: {} samples".format(len(X_validation)))
print("Test Set:       {} samples".format(len(X_test)))

epsilon = 1e-10 # this is a parameter you will use later



Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Image Shape: (784,)
Training Set:   55000 samples
Validation Set: 5000 samples
Test Set:       10000 samples


# Section 1 : My First Model in TensorFlow

Before starting with CNN, let's train and test in TensorFlow the example
**y=softmax(Wx+b)** seen in the first lab. 

This model reaches an accuracy of about 92 %.
You will also learn how to launch the TensorBoard https://www.tensorflow.org/get_started/summaries_and_tensorboard to visualize the computation graph, statistics and learning curves. 
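
As a rough illustration of what **y=softmax(Wx+b)** and its clipped cross-entropy loss compute, here is a minimal sketch in plain Python (no TensorFlow). The function names and the 3-class toy logits are assumptions made for this example, not part of the lab code; the `eps` default mirrors the `epsilon` value defined when loading the data.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot, eps=1e-10):
    """Cross-entropy -sum(y * log(p)), with p clipped to [eps, 1]."""
    return -sum(t * math.log(min(max(p, eps), 1.0))
                for p, t in zip(probs, one_hot))

# Hypothetical logits for a 3-class toy example (not MNIST data).
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, [1.0, 0.0, 0.0])
print(probs, loss)
```

The clipping only matters when a predicted probability is exactly 0 for the true class: without it, the logarithm would be undefined and the loss would become infinite.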

<b> Part 1 </b> : Read carefully the code in the cell below. Run it to perform training. 

In [2]:
#STEP 1

# Parameters
learning_rate = 0.01
training_epochs = 40
batch_size = 128
display_step = 1
logs_path = 'log_files/'  # useful for tensorboard

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

# Set model weights
W = tf.Variable(tf.zeros([784, 10]), name='Weights')
b = tf.Variable(tf.zeros([10]), name='Bias')

# Construct model and encapsulating all ops into scopes, making Tensorboard's Graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    # We use tf.clip_by_value to avoid having too low numbers in the log function
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

#STEP 2 

# Launch the graph for training
with tf.Session() as sess:
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=(i==0))
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")
    summary_writer.flush()

    # Test model
    # Calculate accuracy
    print("Accuracy:", acc.eval({x: mnist.test.images, y: mnist.test.labels}))

Epoch:  01   =====> Loss= 1.288857968
Epoch:  02   =====> Loss= 0.732793764
Epoch:  03   =====> Loss= 0.600320643
Epoch:  04   =====> Loss= 0.536872040
Epoch:  05   =====> Loss= 0.497822908
Epoch:  06   =====> Loss= 0.471331935
Epoch:  07   =====> Loss= 0.451140417
Epoch:  08   =====> Loss= 0.436008948
Epoch:  09   =====> Loss= 0.423498562
Epoch:  10   =====> Loss= 0.412933110
Epoch:  11   =====> Loss= 0.404503482
Epoch:  12   =====> Loss= 0.396694267
Epoch:  13   =====> Loss= 0.390278287
Epoch:  14   =====> Loss= 0.384612805
Epoch:  15   =====> Loss= 0.379301537
Epoch:  16   =====> Loss= 0.374702650
Epoch:  17   =====> Loss= 0.370602037
Epoch:  18   =====> Loss= 0.366252979
Epoch:  19   =====> Loss= 0.362944725
Epoch:  20   =====> Loss= 0.359408229
Epoch:  21   =====> Loss= 0.356740769
Epoch:  22   =====> Loss= 0.353887012
Epoch:  23   =====> Loss= 0.351285579
Epoch:  24   =====> Loss= 0.348614941
Epoch:  25   =====> Loss= 0.346529221
Epoch:  26   =====> Loss= 0.344402499
Epoch:  27  

<b> Part 2 </b>: Using TensorBoard, we can now visualize the created graph, which gives you an overview of your architecture and of how all the major components are connected. You can also inspect and analyse the learning curves.

To launch TensorBoard:
- Open a Terminal and run the command line **"tensorboard --logdir=lab_2/log_files/"**
- Click on "Tensorboard web interface" in Zoe  


Enjoy It !! 


# Section 2 : The 99% MNIST Challenge !

<b> Part 1 </b> : LeNet5 implementation

You are now more familiar with **TensorFlow** and **TensorBoard**. In this section, you will build, train and test the baseline [LeNet-5](http://yann.lecun.com/exdb/lenet/) model for the MNIST digit recognition problem.

Then, you will make some optimizations to get more than 99% of accuracy.

For more information, have a look at this list of results: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html


<img src="lenet.png" width="800" height="600" align="center">
<center><span>Figure 1: LeNet-5 </span></center>





The LeNet architecture takes a 28x28xC image as input, where C is the number of color channels. Since MNIST images are grayscale, C is 1 in this case.

--------------------------
**Layer 1 - Convolution (5x5):** The output shape should be 28x28x6. **Activation:** ReLU. **MaxPooling:** The output shape should be 14x14x6.

**Layer 2 - Convolution (5x5):** The output shape should be 10x10x16. **Activation:** ReLU. **MaxPooling:** The output shape should be 5x5x16.

**Flatten:** Flatten the output shape of the final pooling layer such that it's 1D instead of 3D.  You may need to use tf.reshape.

**Layer 3 - Fully Connected:** This should have 120 outputs. **Activation:** ReLU.

**Layer 4 - Fully Connected:** This should have 84 outputs. **Activation:** ReLU.

**Layer 5 - Fully Connected:** This should have 10 outputs. **Activation:** softmax.
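
The shapes listed above can be checked with a little arithmetic. The sketch below assumes TensorFlow's usual `'SAME'` / `'VALID'` padding rules and 2x2 pooling with stride 2; `conv_output_size` is a hypothetical helper written for this check only. Under those rules, Layer 1 needs `'SAME'` padding to keep 28x28, while Layer 2 uses `'VALID'` to go from 14x14 to 10x10.

```python
import math

def conv_output_size(size, kernel, stride=1, padding="VALID"):
    """Spatial output size of a convolution or pooling window,
    following the 'SAME' / 'VALID' padding conventions."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

size = 28
size = conv_output_size(size, kernel=5, padding="SAME")  # Layer 1 conv: 28
size = conv_output_size(size, kernel=2, stride=2)        # Layer 1 pool: 14
size = conv_output_size(size, kernel=5)                  # Layer 2 conv: 10
size = conv_output_size(size, kernel=2, stride=2)        # Layer 2 pool: 5
flat = size * size * 16                                  # flattened: 400
print(size, flat)
```

With `'VALID'` padding in Layer 1 the first convolution would produce 24x24 instead, and the flattened vector would have 4*4*16 = 256 entries rather than 400.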


<b> Question 2.1.1 </b>  Implement the Neural Network architecture described above.
For that, you will use classes and functions from https://www.tensorflow.org/api_docs/python/tf/nn.

We give you some helper functions for weight and bias initialization. You can also refer to Section 1.


In [3]:
# Functions for weight and bias initialization
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0., shape=shape)
  return tf.Variable(initial)

In [4]:
from tensorflow.contrib.layers import flatten

def LeNet5_Model(image):    
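    # your implementation goes here

    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    # 'SAME' padding keeps the 28x28 spatial size; with 'VALID' the output
    # would be 24x24x6 and the flattened size below would no longer be 400.
    conv1_W = weight_variable(shape=(5, 5, 1, 6))
    conv1_b = bias_variable(shape = [6])
    conv1_output = tf.nn.conv2d(image, conv1_W, strides=[1, 1, 1, 1], padding='SAME') + conv1_b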
    # Activation.
    conv1_output = tf.nn.relu(conv1_output)
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    conv1_output = tf.nn.max_pool(conv1_output, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # Layer 2: Convolutional. Input = 14x14x6. Output = 10x10x16.
    conv2_W = weight_variable(shape=(5, 5, 6, 16)) 
    conv2_b = bias_variable(shape = [16])
    conv2_output   = tf.nn.conv2d(conv1_output, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    # Activation.
    conv2_output = tf.nn.relu(conv2_output)
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2_output = tf.nn.max_pool(conv2_output, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # Flatten. Input = 5x5x16. Output = 400.
    fc0_output = flatten(conv2_output)  # same result as tf.reshape(conv2_output, [-1, 400])

    # Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1_W = weight_variable(shape=(400, 120)) 
    fc1_b = bias_variable(shape = [120])
    fc1_output = tf.matmul(fc0_output, fc1_W) + fc1_b
    # Activation.
    fc1_output = tf.nn.relu(fc1_output)

    # Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2_W = weight_variable(shape=(120, 84)) 
    fc2_b = bias_variable(shape = [84])
    fc2_output = tf.matmul(fc1_output, fc2_W) + fc2_b
    # Activation.
    fc2_output = tf.nn.relu(fc2_output)

    # Layer 5: Fully Connected. Input = 84. Output = 10.
    fc3_W = weight_variable(shape=(84, 10))
    fc3_b = bias_variable(shape = [10])
    fc3_output = tf.matmul(fc2_output, fc3_W) + fc3_b

    return fc3_output

<b> Question 2.1.2. </b>  Calculate the number of parameters of this model 

In [5]:
# conv l1
conv1 = 5*5*1*6 # filter_height * filter_width * channels_in * num_feature_maps
# conv l2 (its input has 6 channels: the feature maps of layer 1)
conv2 = 5*5*6*16 # filter_height * filter_width * channels_in * num_feature_maps
# fcl1
fcl1 = 5*5*16*120 # fcl_input_size * fcl_output_size
# fcl2
fcl2 = 84*120 # fcl_input_size * fcl_output_size
# fcl3
fcl3 = 84*10 # fcl_input_size * fcl_output_size
# biases
bias = 6+16+120+84+10

total = bias + fcl1 + fcl2 + fcl3 + conv2 + conv1
print(total)

61706


<b> Question 2.1.3. </b>  Define your model, its accuracy and the loss function according to the following parameters (you can look at Section 1 to see what is expected):

     Learning rate: 0.001
     Loss Function: Cross-entropy
     Optimizer: tf.train.GradientDescentOptimizer
     Number of epochs: 40
     Batch size: 128

In [6]:
tf.reset_default_graph() # reset the default graph before defining a new model

# Parameters
learning_rate = 0.001
training_epochs = 40
batch_size = 128
logs_path = 'log_files/'
display_step = 10

# Model, loss function and accuracy

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, (None, 28, 28, 1), name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.int32, [None, 10], name='LabelData')

# Construct model and encapsulating all ops into scopes, making Tensorboard's Graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = LeNet5_Model(x)
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=pred)
    cost = tf.reduce_mean(cross_entropy)
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))

<b> Question 2.1.4. </b>  Implement the evaluation function for accuracy computation 

In [7]:
VALIDATION_SIZE = 128
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

def evaluate(images, labels):
    # images are the model inputs (shape [N, 28, 28, 1]), labels are the
    # one-hot vectors corresponding to the actual labels
    # images and labels are numpy arrays
    # this function returns the accuracy of the model, computed batch by batch
    num_examples = len(images)
    total_accuracy = 0
    sess = tf.get_default_session()
    
    for offset in range(0, num_examples, VALIDATION_SIZE):
        batch_x, batch_y = images[offset:offset+VALIDATION_SIZE], labels[offset:offset+VALIDATION_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        # weight each batch by its size, since the last batch may be smaller
        total_accuracy += (accuracy * len(batch_x))
        
    return total_accuracy / num_examples
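
Batch accuracies are weighted by batch size because the last batch is usually smaller (5000 validation samples in batches of 128 leave a final batch of 8). A small sketch with made-up accuracy values shows how this differs from a plain mean; `weighted_accuracy` is a hypothetical helper, not part of the lab code.

```python
def weighted_accuracy(batch_accuracies, batch_sizes):
    """Overall accuracy from per-batch accuracies, weighted by batch size."""
    correct = sum(a * n for a, n in zip(batch_accuracies, batch_sizes))
    return correct / sum(batch_sizes)

# Made-up values: one full batch of 128 and one final batch of 8.
accuracies = [1.0, 0.5]
sizes = [128, 8]
overall = weighted_accuracy(accuracies, sizes)
plain_mean = sum(accuracies) / len(accuracies)
print(overall, plain_mean)
```

The plain mean would let the 8 samples of the last batch count as much as the 128 samples of a full one.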

<b> Question 2.1.5. </b>  Implement training pipeline and run the training data through it to train the model.

- Before each epoch, shuffle the training set. 
- Print the loss per mini batch and the training/validation accuracy per epoch. (Display results every 100 epochs)
- Save the model after training
- Print after training the final testing accuracy 
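
One way to picture the per-epoch shuffling and mini-batch slicing is the following sketch in plain Python. It uses toy lists in place of the MNIST arrays, and `shuffled_batches` is a hypothetical helper; in a notebook like this one the same effect can be obtained with `sklearn.utils.shuffle` followed by array slicing.

```python
import random

def shuffled_batches(features, labels, batch_size, rng):
    """Yield (features, labels) mini-batches in a fresh random order."""
    order = list(range(len(features)))
    rng.shuffle(order)  # one permutation shared by features and labels
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [features[i] for i in idx], [labels[i] for i in idx]

# Toy data standing in for (X_train, y_train): label i belongs to feature i.
features = list(range(10))
labels = [f * 10 for f in features]
rng = random.Random(0)
batches = list(shuffled_batches(features, labels, batch_size=4, rng=rng))
for batch_x, batch_y in batches:
    print(batch_x, batch_y)
```

Shuffling indices rather than the two lists separately keeps every sample paired with its own label, and the last batch is simply shorter when the dataset size is not a multiple of the batch size.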



In [8]:
X_train = np.reshape(X_train, (55000, 28, 28, 1))
X_validation = np.reshape(X_validation, (5000, 28, 28, 1))
X_test = np.reshape(X_test, (10000, 28, 28, 1))

In [9]:
def train(init, sess, logs_path, n_epochs, batch_size, optimizer, cost, merged_summary_op):
    # optimizer and cost are the same kinds of objects as in Section 1
    # Train your model
    global X_train, y_train, X_validation, y_validation
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    # Training cycle
    for epoch in range(n_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size) if mnist.train.num_examples%batch_size == 0 else int(mnist.train.num_examples/batch_size)+1
        # Loop over all batches
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, len(X_train), batch_size):
            batch_xs, batch_ys = X_train[offset:offset+batch_size], y_train[offset:offset+batch_size]
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration (global step = number of batches seen so far)
            summary_writer.add_summary(summary, epoch * total_batch + offset // batch_size)
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch: ", '%02d' % (epoch+1), "  ==> Loss:", "{:.9f}".format(avg_cost),
                  " ==> Training Accuracy:", evaluate(X_train, y_train),
                 " ==> Validation Accuracy:", evaluate(X_validation, y_validation))

    print("Optimization Finished!")
    summary_writer.flush()

In [10]:
import time

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_SGD", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_SGD", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()
    
with tf.Session() as sess:
    t0 = time.time()
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)
    t1 = time.time()
    
    # saving model
    saver.save(sess, './LeNet_SGD')
    print("Model saved")
    
    print("Training time:", t1-t0)
    
    # Test model
    # Print the accuracy on testing data
    print("Test Accuracy:", acc.eval({x: X_test, y: y_test}))

Epoch:  10   ==> Loss: 0.389845026  ==> Training Accuracy: 0.8919636363636364  ==> Validation Accuracy: 0.8954
Epoch:  20   ==> Loss: 0.244048110  ==> Training Accuracy: 0.9277818181731484  ==> Validation Accuracy: 0.9328
Epoch:  30   ==> Loss: 0.189173302  ==> Training Accuracy: 0.9438909090475602  ==> Validation Accuracy: 0.9492
Epoch:  40   ==> Loss: 0.155106402  ==> Training Accuracy: 0.953818181774833  ==> Validation Accuracy: 0.9578
Optimization Finished!
Model saved
Training time: 615.5931944847107
Test Accuracy: 0.9546


Here we printed the statistics every 10 epochs instead of every 100: with only 40 epochs in total, nothing would have been displayed otherwise. The results improve steadily: at every checkpoint the loss is lower, and the training and validation accuracies are higher, than at the previous one.

The final accuracy on the test set with this first LeNet implementation is 95.46%, which is already an improvement over the roughly 92% of the softmax model in Section 1. This suggests that the LeNet architecture is well suited to this task.

Now let's see what happens with a more optimized implementation.

<b> Question 2.1.6 </b> : Use TensorBoard to visualise and save loss and accuracy curves. 
You will save figures in the folder **"lab_2/MNIST_figures"** and display them in your notebook.

<img src="MNIST_figures/acc.png" width="800" height="600" align="center">

<img src="MNIST_figures/loss.png" width="800" height="600" align="center">

<b> Part 2 </b> : LeNet-5 Optimization


<b> Question 2.2.1 </b>

- Retrain your network with AdamOptimizer and then fill the table below:


| Optimizer            |  Gradient Descent  |    AdamOptimizer    |
|----------------------|--------------------|---------------------|
| Testing Accuracy     |       95.46%       |        99.29%       |
| Training Time        |       615.6s       |        622.6s       |

- Which optimizer gives the best accuracy on test data?

**Your answer:** Adam. With the Adam optimizer the network reaches 99.29% accuracy on the test data, against 95.46% with plain gradient descent. The training time is roughly the same in both cases.


In [11]:
tf.reset_default_graph()

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, (None, 28, 28, 1), name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.int32, [None, 10], name='LabelData')

# Construct model and encapsulating all ops into scopes, making Tensorboard's Graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = LeNet5_Model(x)
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=pred)
    cost = tf.reduce_mean(cross_entropy)
with tf.name_scope('Adam'):
    # Adam optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))
    

correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_Adam", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_Adam", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    t0 = time.time()
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)
    t1 = time.time()
    
    # saving model
    saver.save(sess, './LeNet_Adam')
    print("Model saved")
    
    print("Training time:", t1-t0)
    
    # Test model
    # Print the accuracy on testing data
    print("Test Accuracy:", acc.eval({x: X_test, y: y_test}))

Epoch:  10   ==> Loss: 0.018782492  ==> Training Accuracy: 0.9947272727446123  ==> Validation Accuracy: 0.9884
Epoch:  20   ==> Loss: 0.007352517  ==> Training Accuracy: 0.9988727272727272  ==> Validation Accuracy: 0.9918
Epoch:  30   ==> Loss: 0.005515630  ==> Training Accuracy: 0.9987636363636364  ==> Validation Accuracy: 0.9924
Epoch:  40   ==> Loss: 0.003077110  ==> Training Accuracy: 0.9996727272727273  ==> Validation Accuracy: 0.9906
Optimization Finished!
Model saved
Training time: 622.5887453556061
Test Accuracy: 0.9929


Here we have reached our goal: ~99.3% accuracy!

Adam differs from classical stochastic gradient descent. Stochastic gradient descent maintains a single learning rate (termed alpha) for all weight updates, and this learning rate does not change during training.

Adam, instead, maintains a learning rate for each network weight (parameter) and adapts it separately as learning unfolds: "the method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients".
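
To make the contrast concrete, here is a minimal sketch of a single SGD step and a single Adam step on scalar parameters, written from the update rule in the Adam paper. The helper names and the toy gradients are assumptions for illustration; the default hyperparameters match those documented for `tf.train.AdamOptimizer`, whose internal implementation is arranged somewhat differently.

```python
import math

def sgd_step(w, grad, lr=0.001):
    """Plain gradient descent: the step is proportional to the gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at time step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Two toy parameters whose gradients differ by four orders of magnitude.
for grad in (100.0, 0.01):
    w_sgd = sgd_step(0.0, grad)
    w_adam, _, _ = adam_step(0.0, grad, m=0.0, v=0.0, t=1)
    print(grad, abs(w_sgd), abs(w_adam))
```

On the first step SGD moves the two parameters by very different amounts, whereas Adam moves both by roughly the learning rate, because each gradient is rescaled by the estimate of its own magnitude.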

So, thanks to the AdamOptimizer, we are able to exceed 99% accuracy on the test set!

The effectiveness of this optimizer is also visible in the results obtained during training. At the first print (epoch 10), the loss and the accuracies are already much better than the ones obtained at the end of training with SGD. The training accuracy is above 99% at every checkpoint (99.97% after 40 epochs), while the validation accuracy is 98.84% at epoch 10 and stays above 99% from epoch 20 onwards.

<b> Question 2.2.2</b> Try to add dropout (keep_prob = 0.75) before the first fully connected layer. You will use tf.nn.dropout for that purpose. What accuracy do you achieve on testing data?

**Accuracy achieved on testing data:** 98.86%

Dropout is a technique where randomly selected neurons are ignored during training: they are “dropped out” at random. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass.

As a neural network learns, neuron weights settle into their context within the network. Weights of neurons are tuned for specific features, providing some specialization. Neighboring neurons come to rely on this specialization, which, if taken too far, can result in a fragile model too specialized to the training data. This reliance on context for a neuron during training is referred to as complex co-adaptation.

If neurons are randomly dropped out of the network during training, other neurons will have to step in and handle the representation required to make predictions for the missing neurons. This is believed to result in multiple independent internal representations being learned by the network.

The effect is that **the network becomes less sensitive to the specific weights of neurons**. This in turn results in a network that is capable of **better generalization** and is **less likely to overfit** the training data.
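
The mechanism can be sketched in a few lines of plain Python. This is a simplified model of inverted dropout (the scaling scheme that `tf.nn.dropout` also uses: kept values are scaled by `1 / keep_prob`); the `training` flag and the helper itself are assumptions for illustration, since `tf.nn.dropout` has no such flag.

```python
import random

def dropout(values, keep_prob, training, rng):
    """Inverted dropout: during training, zero each value with probability
    1 - keep_prob and scale the survivors by 1 / keep_prob; otherwise
    return the values unchanged."""
    if not training:
        return list(values)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000  # toy activations, all equal to 1
train_out = dropout(activations, keep_prob=0.75, training=True, rng=rng)
eval_out = dropout(activations, keep_prob=0.75, training=False, rng=rng)

kept_fraction = sum(1 for v in train_out if v != 0.0) / len(train_out)
mean_train = sum(train_out) / len(train_out)
print(kept_fraction, mean_train)
```

The scaling keeps the expected activation the same in both modes. At evaluation time dropout should be disabled; with `tf.nn.dropout` in TensorFlow 1.x this is commonly done by feeding `keep_prob` through a placeholder and setting it to 1.0 when evaluating.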

In [12]:
def LeNet5_Model_Dropout(image):    
    # Layer 1: Convolutional. Input = 28x28x1. Output = 28x28x6.
    conv1_W = weight_variable(shape=(5, 5, 1, 6))
    conv1_b = bias_variable(shape = [6])
    conv1_output = tf.nn.conv2d(image, conv1_W, strides=[1, 1, 1, 1], padding='SAME') + conv1_b
    # Activation.
    conv1_output = tf.nn.relu(conv1_output)
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    conv1_output = tf.nn.max_pool(conv1_output, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # Layer 2: Convolutional. Input = 14x14x6. Output = 10x10x16.
    conv2_W = weight_variable(shape=(5, 5, 6, 16)) 
    conv2_b = bias_variable(shape = [16])
    conv2_output   = tf.nn.conv2d(conv1_output, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    # Activation.
    conv2_output = tf.nn.relu(conv2_output)
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2_output = tf.nn.max_pool(conv2_output, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # Flatten. Input = 5x5x16. Output = 400.
    fc0_output = flatten(conv2_output)  # same result as tf.reshape(conv2_output, [-1, 400])
    
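    # Dropout before the first fully connected layer.
    # Note: keep_prob is hard-coded, so dropout is also active whenever the
    # graph is run for evaluation (validation and test accuracy).
    fc0_output = tf.nn.dropout(fc0_output, keep_prob=0.75)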

    # Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1_W = weight_variable(shape=(400, 120)) 
    fc1_b = bias_variable(shape = [120])
    fc1_output = tf.matmul(fc0_output, fc1_W) + fc1_b
    # Activation.
    fc1_output = tf.nn.relu(fc1_output)

    # Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2_W = weight_variable(shape=(120, 84)) 
    fc2_b = bias_variable(shape = [84])
    fc2_output = tf.matmul(fc1_output, fc2_W) + fc2_b
    # Activation.
    fc2_output = tf.nn.relu(fc2_output)

    # Layer 5: Fully Connected. Input = 84. Output = 10.
    fc3_W = weight_variable(shape=(84, 10))
    fc3_b = bias_variable(shape = [10])
    fc3_output = tf.matmul(fc2_output, fc3_W) + fc3_b

    return fc3_output

In [13]:
tf.reset_default_graph()

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, (None, 28, 28, 1), name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.int32, [None, 10], name='LabelData')

# Construct model and encapsulating all ops into scopes, making Tensorboard's Graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = LeNet5_Model_Dropout(x)
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=pred)
    cost = tf.reduce_mean(cross_entropy)
with tf.name_scope('Adam'):
    # Adam optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))
    
    
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_Adam_Dropout", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_Adam_Dropout", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    t0 = time.time()
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)
    t1 = time.time()
    
    # saving model
    saver.save(sess, './LeNet_Adam_Dropout')
    print("Model saved")
    
    print("Training time:", t1-t0)
    
    # Test model
    # Print the accuracy on testing data
    print("Test Accuracy:", acc.eval({x: X_test, y: y_test}))

Epoch:  10   ==> Loss: 0.035443074  ==> Training Accuracy: 0.9878727272900668  ==> Validation Accuracy: 0.9826
Epoch:  20   ==> Loss: 0.019718802  ==> Training Accuracy: 0.9935636363636363  ==> Validation Accuracy: 0.9872
Epoch:  30   ==> Loss: 0.014319925  ==> Training Accuracy: 0.9962545454545455  ==> Validation Accuracy: 0.9888
Epoch:  40   ==> Loss: 0.011600548  ==> Training Accuracy: 0.9965272727619517  ==> Validation Accuracy: 0.9892
Optimization Finished!
Model saved
Training time: 625.9656348228455
Test Accuracy: 0.9886


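In this case too we obtain a very good result, with a final test accuracy of 98.86%. Although it is lower than before, it is still close to 99%. Note that keep_prob is hard-coded to 0.75 in the model, so dropout remains active when the validation and test accuracies are computed, which may account for part of the gap.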