<h1 style="text-align:center">Deep Learning   </h1>
<h1 style="text-align:center"> Lab Session 2 - 3 Hours </h1>
<h1 style="text-align:center"> Convolutional Neural Network (CNN) for Handwritten Digits Recognition</h1>

<b> Student 1:</b> Collura  
<b> Student 2:</b> Spano
 
 
The aim of this session is to practice with Convolutional Neural Networks. Answers and experiments should be made by groups of one or two students. Each group should fill and run appropriate notebook cells. 


Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an pdf document using print as PDF (Ctrl+P). Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed by May 29th 2017.

Send you pdf file to benoit.huet@eurecom.fr and olfa.ben-ahmed@eurecom.fr using **[DeepLearning_lab2]** as Subject of your email.

# Introduction

In the last Lab Session, you built a Multilayer Perceptron for recognizing hand-written digits from the MNIST data-set. The best achieved accuracy on testing data was about 97%.  Can  you do better than these results using a deep CNN ?
In this Lab Session, you will build, train and optimize in TensorFlow one of the early Convolutional Neural Networks:  **LeNet-5** to go to  more than 99% of accuracy. 






# Load MNIST Data in TensorFlow
Run the cell above to load the MNIST data that comes  with TensorFlow. You will use this data in **Section 1** and **Section 2**.

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
X_train, y_train           = mnist.train.images, mnist.train.labels
X_validation, y_validation = mnist.validation.images, mnist.validation.labels
X_test, y_test             = mnist.test.images, mnist.test.labels

print("Image Shape: {}".format(X_train[0].shape))
print("Training Set:   {} samples".format(len(X_train)))
print("Validation Set: {} samples".format(len(X_validation)))
print("Test Set:       {} samples".format(len(X_test)))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Image Shape: (784,)
Training Set:   55000 samples
Validation Set: 5000 samples
Test Set:       10000 samples


# Section 1 : My First Model in TensorFlow

Before starting with CNN, let's train and test in TensorFlow the example :
$y=softmax(Wx+b)$ seen in the DeepLearing course last week. 

This model reaches an accuracy of about 92 %.
You will also learn how to launch the tensorBoard https://www.tensorflow.org/get_started/summaries_and_tensorboard to  visualize the computation graph, statistics and learning curves. 

<b> Part 1 </b> : Read carefully the code in the cell below. Run it to perform training. 

In [2]:
from __future__ import print_function
import tensorflow as tf

# STEP 1

# Parameters
learning_rate = 0.01
training_epochs = 100
batch_size = 128
display_step = 1
logs_path = 'log_files/'  # useful for tensorboard

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

# Set model weights
W = tf.Variable(tf.zeros([784, 10]), name='Weights')
b = tf.Variable(tf.zeros([10]), name='Bias')

# Construct model and encapsulating all ops into scopes, making Tensorboard's Graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()


# STEP 2 


# Launch the graph for training
with tf.Session() as sess:
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    # Calculate accuracy
    print("Accuracy:", acc.eval({x: mnist.test.images, y: mnist.test.labels}))

Epoch:  01   =====> Loss= 1.286880525
Epoch:  02   =====> Loss= 0.731891140
Epoch:  03   =====> Loss= 0.600213391
Epoch:  04   =====> Loss= 0.536838975
Epoch:  05   =====> Loss= 0.497630020
Epoch:  06   =====> Loss= 0.471007370
Epoch:  07   =====> Loss= 0.451256786
Epoch:  08   =====> Loss= 0.436003850
Epoch:  09   =====> Loss= 0.423365174
Epoch:  10   =====> Loss= 0.412768778
Epoch:  11   =====> Loss= 0.404437468
Epoch:  12   =====> Loss= 0.396879767
Epoch:  13   =====> Loss= 0.390305265
Epoch:  14   =====> Loss= 0.384293149
Epoch:  15   =====> Loss= 0.379190461
Epoch:  16   =====> Loss= 0.374597114
Epoch:  17   =====> Loss= 0.370546950
Epoch:  18   =====> Loss= 0.366352998
Epoch:  19   =====> Loss= 0.362873820
Epoch:  20   =====> Loss= 0.359487665
Epoch:  21   =====> Loss= 0.356536449
Epoch:  22   =====> Loss= 0.353737314
Epoch:  23   =====> Loss= 0.351218547
Epoch:  24   =====> Loss= 0.348216219
Epoch:  25   =====> Loss= 0.346612365
Epoch:  26   =====> Loss= 0.344329399
Epoch:  27  

<b> Part 2  </b>: Using Tensorboard, we can  now visualize the created graph, giving you an overview of your architecture and how all of the major components  are connected. You can also see and analyse the learning curves. 

To launch tensorBoard: 
- Go to the **TP2** folder, 
- Open a Terminal and run the command line **"tensorboard --logdir= log_files/"**, it will generate an http link ,ex http://666.6.6.6:6006,
- Copy this  link into your web browser 


Enjoy It !! 


# Section 2 : The 99% MNIST Challenge !

<b> Part 1 </b> : LeNet5 implementation

One you are now familar with **tensorFlow** and **tensorBoard**, you are in this section to build, train and test the baseline [LeNet-5](http://yann.lecun.com/exdb/lenet/)  model for the MNIST digits recognition problem.  

In more advanced step you will make some optimizations to get more than 99% of accuracy. The best model can get to over 99.7% accuracy! 

For more information, have a look at this list of results : http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html


<img src="lenet.png",width="800" height="600" align="center">
<center><span>Figure 1: Lenet 5 </span></center>





The LeNet architecture accepts a 32x32xC image as input, where C is the number of color channels. Since MNIST images are grayscale, C is 1 in this case.

--------------------------
**Layer 1: Convolutional.** The output shape should be 28x28x6 **Activation.** sigmoid **Pooling.** The output shape should be 14x14x6.

**Layer 2: Convolutional.** The output shape should be 10x10x16. **Activation.** sigmoid **Pooling.** The output shape should be 5x5x16.

**Flatten.** Flatten the output shape of the final pooling layer such that it's 1D instead of 3D.  You may need to use **flatten** `from tensorflow.contrib.layers import flatten`

**Layer 3: Fully Connected.** This should have 120 outputs. **Activation.** sigmoid

**Layer 4: Fully Connected.** This should have 84 outputs. **Activation.** sigmoid

**Layer 5: Fully Connected.** This should have 10 outputs. **Activation.** SoftMax

<b> Question 2.1.1 </b>  Implement the Neural Network architecture described above.
For that, your will use classes and functions from  https://www.tensorflow.org/api_docs/python/tf/nn. 

We give you some helper functions for weigths and bias initilization. Also you can refer to section 1. 


In [3]:
# Helper functions  for weigths and bias initilization 

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

In [67]:
from tensorflow.contrib.layers import flatten

def LeNet5_Model(data, keep_prob, activation=tf.nn.sigmoid):    
    conv_1 = tf.nn.conv2d(data, weight_variable([5, 5, 1, 6]), [1, 1, 1, 1], 'VALID', name='conv_1')
    conv_1 = tf.nn.dropout(activation(conv_1 + bias_variable([24, 24, 6]), name='activation_conv_1'),
                           keep_prob=keep_prob, name='dropout_conv_1')
    conv_1 = tf.nn.max_pool(conv_1, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
        
    conv_2 = tf.nn.conv2d(conv_1, weight_variable([5, 5, 6, 16]), [1, 1, 1, 1], 'VALID', name='conv_2')
    conv_2 = tf.nn.dropout(activation(conv_2 + bias_variable([8, 8, 16]), name='activation_conv_2'),
                           keep_prob=keep_prob, name='dropout_conv_2')
    conv_2 = tf.nn.max_pool(conv_2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
    
    flat = flatten(conv_2)
    
    layer_3 = tf.nn.dropout(activation(tf.matmul(flat, weight_variable([256, 120])) + bias_variable([1, 120]),
                                       name='activation_layer_3'),
                            keep_prob=keep_prob, name='dropout_layer_3')
    layer_4 = tf.nn.dropout(activation(tf.matmul(layer_3, weight_variable([120, 84])) + bias_variable([1, 84]),
                                       name='activation_layer_4'),
                            keep_prob=keep_prob, name='dropout_layer_4')
    layer_5 = tf.matmul(layer_4, weight_variable([84, 10])) + bias_variable([1, 10])
    return tf.nn.softmax(layer_5, name='output_layer')

<b> Question 2.1.2. </b>  Calculate the number of parameters of this model 

 Your answer goes here in details 
 
 https://www.tensorflow.org/api_docs/python/tf/nn/conv2d
 https://www.tensorflow.org/api_docs/python/tf/nn/max_pool

<b> Question 2.1.3. </b>  Start the training with the parameters cited below:

     Learning rate = 0.1
     Loss Function: Cross entropy
     Optimisateur: SGD
     Number of training iterations = 10000
     The batch size = 128

In [69]:
# Training parameters 
learning_rate = 0.1
batch_size = 128
training_epochs = 50

<b> Question 2.1.4. </b>  Implement the evaluation function for accuracy computation 

In [15]:
def evaluate(model, y):
    # Accuracy
    acc = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))
    return acc

<b> Question 2.1.5. </b>  Implement training pipeline and run the training data through it to train the model.

- Before each epoch, shuffle the training set. 
- Print the loss per mini batch and the training/validation accuracy per epoch. (Display results every 100 epochs)
- Save the model after training
- Print after training the final testing accuracy 



In [70]:
import time

def train(learning_rate=0.1, optimizer_function=tf.train.GradientDescentOptimizer, batch_size=128, 
          training_epochs=50, drop_out=1.0, activation=tf.nn.sigmoid):
    tf.reset_default_graph()

    # tf Graph Input:  mnist data image of shape 28*28=784
    x = tf.placeholder(tf.float32, [None, 28, 28, 1], name='InputData')
    # 0-9 digits recognition,  10 classes
    y = tf.placeholder(tf.float32, [None, 10], name='LabelData')
    keep_prob = tf.placeholder(tf.float32)

    # Construct model and encapsulating all ops into scopes, making Tensorboard's Graph visualization more convenient
    with tf.name_scope('Model'):
        # Model
        model = LeNet5_Model(x, keep_prob, activation)
    with tf.name_scope('LeNet_Loss'):
        # Minimize error using cross entropy
        cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(model), reduction_indices=1))
    with tf.name_scope('SGD'):
        # Gradient Descent
        optimizer = optimizer_function(learning_rate).minimize(cost)
    with tf.name_scope('LeNet_Accuracy'):
        acc = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
        acc = tf.reduce_mean(tf.cast(acc, tf.float32))

    # Initializing the variables
    init = tf.global_variables_initializer()
    # Create a summary to monitor cost tensor
    tf.summary.scalar("LeNet_Loss", cost)
    # Create a summary to monitor accuracy tensor
    tf.summary.scalar("LeNet_Accuracy", acc)
    # Merge all summaries into a single op
    merged_summary_op = tf.summary.merge_all()
    
    losses = []
    
    # Initializing the session 
    print ("Start Training!")
    t0 = time.time()
    
    with tf.Session() as sess:
        sess.run(init)
        # op to write logs to Tensorboard
        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        # Training cycle
        for epoch in range(training_epochs):
            avg_cost = 0.
            total_batch = int(mnist.train.num_examples / batch_size)
            # Loop over all batches
            for i in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
                batch_xs = batch_xs.reshape([-1, 28, 28, 1])
                # Run optimization op (backprop), cost op (to get loss value)
                # and summary nodes
                _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                         feed_dict={x: batch_xs, y: batch_ys, keep_prob: drop_out})
                # Write logs at every iteration
                summary_writer.add_summary(summary, epoch * total_batch + i)
                # Compute average loss
                avg_cost += c / total_batch
                losses.append(avg_cost)
            # Display logs per epoch step
            if (epoch + 1) % display_step == 0:
                print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss =", "{:.9f}".format(avg_cost))

        acc = acc.eval({x: mnist.test.images.reshape([-1, 28, 28]), y: mnist.test.labels, drop_out: 1.0})
        print("Accuracy:", acc)

    t = time.time() - t0
    print ("Training Finished!")
    return model, losses, acc, t

In [71]:
train(learning_rate, tf.train.GradientDescentOptimizer, batch_size, training_epochs)

Start Training!
Epoch:  01   =====> Loss = 2.307026547
Epoch:  02   =====> Loss = 2.305450683
Epoch:  03   =====> Loss = 2.304581747
Epoch:  04   =====> Loss = 2.304813800
Epoch:  05   =====> Loss = 2.303856222
Epoch:  06   =====> Loss = 2.303084319
Epoch:  07   =====> Loss = 2.302399401
Epoch:  08   =====> Loss = 2.300197180
Epoch:  09   =====> Loss = 2.295746867
Epoch:  10   =====> Loss = 2.273548650
Epoch:  11   =====> Loss = 2.039176574
Epoch:  12   =====> Loss = 1.525231023
Epoch:  13   =====> Loss = 0.894500645
Epoch:  14   =====> Loss = 0.573350070
Epoch:  15   =====> Loss = 0.437685168
Epoch:  16   =====> Loss = 0.355271042
Epoch:  17   =====> Loss = 0.302755189
Epoch:  18   =====> Loss = 0.264507526
Epoch:  19   =====> Loss = 0.239131950
Epoch:  20   =====> Loss = 0.219421898
Epoch:  21   =====> Loss = 0.193814208
Epoch:  22   =====> Loss = 0.185264922
Epoch:  23   =====> Loss = 0.171963959
Epoch:  24   =====> Loss = 0.160759799
Epoch:  25   =====> Loss = 0.147750343
Epoch:  2

ValueError: Cannot feed value of shape (10000, 28, 28) for Tensor 'InputData:0', which has shape '(?, 28, 28, 1)'

<b> Question 2.1.6 </b> : Use tensorBoard to visualise and save the LeNet5 Graph and all learning curves. 
Save all obtained figures in the folder **"TP2/MNIST_99_Challenge_Figures"**

In [None]:
#  insert your obtained figure here 

In [None]:
# your answer goas here

<b> Part 2 </b> : LeNET 5 Optimization


<b> Question 2.2.1 </b>  Change the sigmoid function with a Relu :

- Retrain your network with SGD and AdamOptimizer and then fill the table above  :


| Optimizer            |  Gradient Descent         |AdamOptimizer |
| -------------        |: -------------: | ---------:   
| Validation Accuracy  |         |    |      
| Testing Accuracy     |           |    |       
| Training Time        |           |        |  |  


- Try with different learning rates for each Optimizer (0.0001 and 0.001 ) and different Batch sizes (50 and 128) for 20000 Epochs. 

- For each optimizer, plot (on the same curve) the **testing accuracies** function to **(learning rate, batch size)** 



- Did you reach the 99% accuracy ? What are the optimal parametres that gave you the best results? 








In [52]:
optimizers = [tf.train.GradientDescentOptimizer, tf.train.AdamOptimizer]
learning_rates = [0.1, 0.001, 0.0001]
batch_sizes = [128, 50]
epochs = 20000

results = []

for optimizer in optimizers:
    for learning_rate in learning_rates:
        for batch_size in batch_sizes:
            model, losses, accuracy, t = train(learning_rate, optimizer, batch_size, epochs, activation=tf.nn.relu)
            results.append((optimizer, learning_rate, batch_size, model, losses, accuracy, t))

Start Training!


TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("InputData:0", shape=(?, 28, 28, 1), dtype=float32) is not an element of this graph.

<b> Question 2.2.2 </b>  What about applying a dropout layer on the Fully conntected layer and then retraining the model with the best Optimizer and parameters(Learning rate and Batsh size) obtained in *Question 2.2.1*  ? (probability to keep units=0.75). For this stage ensure that the keep prob is set to 1.0 to evaluate the 
performance of the network including all nodes.

In [None]:

model, (losses, accuracy) = create_model(learning_rate, optimizer, batch_size, epochs, 0.75)

In [None]:
Your comments go here