# Techforum : Deep Learning (part 1/3)

## Simple Softmax regression with Tensorflow

Objective:
- Discover Tensorflow
    - Classify handwritten digits 0-9 (MNIST dataset)
    - Use a simple neural Network (Softmax regression)
- Introduce Tensorboard

Note : this toy-example is ok to run on a CPU laptop :)
    
Next : Using a High Level API (Keras) to do the same job in fewer lines of code

Notebook inspired by : https://www.tensorflow.org/get_started/mnist/beginners

### About this example :

#### MNIST
The MNIST database of handwritten digits, available from http://yann.lecun.com/exdb/mnist/, has a training set of 60,000 examples and a test set of 10,000 examples.
 <img  src="assets/mnist_digits.png" width="400px"> 
 
#### Dataset of images 
Each image has a size of 28x28 pixels and has already been pre-processed (image centered, grey scale, normalization...)

 <img src="assets/MNIST-Matrix.png" width="600px">
 
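In the MNIST dataset, each image has been reshaped into a vector of size [1, 784] so that all the images fit into a matrix [55000, 784] (training set). The count is 55,000 rather than 60,000 because the Tensorflow helper used below holds out 5,000 training images as a validation set.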

<img src="assets/mnist-train-xs.png" width="300px">
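To make the reshaping concrete, here is a minimal pure-Python sketch, assuming the rows of the image are simply concatenated one after the other (row-major order). The `flatten` and `unflatten` helpers and the toy `image` are illustrative names, not part of the MNIST helper.

```python
ROWS, COLS = 28, 28

def flatten(image):
    """Concatenate the rows of a 2-D list into one flat list (row-major order)."""
    return [pixel for row in image for pixel in row]

def unflatten(flat, rows=ROWS, cols=COLS):
    """Rebuild the 2-D image from its flat representation."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypothetical image: each pixel stores a number derived from its (row, col) position
image = [[r * 100 + c for c in range(COLS)] for r in range(ROWS)]
flat = flatten(image)

print(len(flat))                          # 784 values for one image
print(flat[5 * COLS + 7] == image[5][7])  # pixel (5, 7) sits at index 5 * 28 + 7
```

Stacking one such flat vector per image gives the [55000, 784] matrix described above.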
 
The target labels (supervised learning) have been "one-hot" encoded: the label of one image is a vector of size [1, 10], and the labels of the whole training set form a matrix [55000, 10]

<img src="assets/mnist-train-ys.png" width="300px">
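As an illustration of the "one-hot" encoding, the sketch below builds the 10-value vector for a digit and recovers the digit from it. The `one_hot` and `decode` functions are hypothetical helpers written for this example; the notebook itself relies on the `one_hot=True` option of the loader.

```python
def one_hot(label, num_classes=10):
    """Return a list of num_classes zeros with a single 1.0 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

def decode(vector):
    """Return the index of the largest value, i.e. the encoded digit."""
    return max(range(len(vector)), key=lambda i: vector[i])

encoded = one_hot(3)
print(encoded)          # the 1.0 is at index 3
print(decode(encoded))  # back to the digit
```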

#### Model 
In this notebook we will implement a simple Softmax Regression model to classify the handwritten images into 10 classes, corresponding to the 0-9 digits. 

The softmax function is a generalization of the logistic function that "squashes" a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range (0, 1) that add up to 1.

 <img  src="assets/softmax-regression-scalargraph.png" width="400px">
 
 <img  src="assets/network_diagram.png" width="400px">
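The softmax definition above can be sketched in a few lines of pure Python. This is only an illustration of the formula, not the Tensorflow implementation; subtracting the maximum score before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(scores):
    """Map a list of arbitrary real scores to positive values that sum to 1."""
    shift = max(scores)                           # avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)       # the largest score gets the largest probability
print(sum(probabilities))  # adds up to 1 (up to floating point rounding)
```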


### Import Python libraries

In [1]:
import tensorflow as tf

import os
import timeit

# Use Tensorflow tutorial's helper to load/prepare the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data

### Load the MNIST dataset

In [2]:
# Import data (Thanks to helpers provided in Tensorflow tutorials !)
# Data are already pre-processed and ready to use
# Moreover, the training labels are already "one-hot" encoded
mnist = input_data.read_data_sets('./', one_hot=True)

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


The result is that mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.

In [3]:
# Not important : just a counter to keep the log directories of successive training experiments separate
experiments = 1

#### If you want to perform another experiment, restart from here in Jupyter

In [4]:
# Reset the Tensorflow graph (allows multiple experiments in Jupyter)
tf.reset_default_graph()

### Define some Hyperparameters for the network

In [5]:
# How fast the network will learn: it scales the size of each update made during training
#    too low, and the network will take too much time to learn
#    too high, and the network might never converge to a solution
learning_rate = 0.5 

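# Number of training iterations (one mini-batch per iteration)
# Note: despite the variable name, this is not a number of epochs in the usual
# sense (an epoch being one full pass over the training set)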
epoch = 3000

# Number of images to process per batch iteration
batch_size = 100

# Path to home of the Tensorboard logs and Training Checkpoints
logs_path = "./logs/mnist/softmaxReg" 
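To illustrate the comment on `learning_rate` above, here is a toy sketch that minimizes f(w) = w² with plain gradient descent. The function, the starting point and the three rates are assumptions chosen for the demonstration; they say nothing about which rate suits the MNIST model.

```python
def run_gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w

too_low = run_gradient_descent(0.001)   # moves toward 0, but very slowly
reasonable = run_gradient_descent(0.1)  # gets very close to the minimum at 0
too_high = run_gradient_descent(1.1)    # each step overshoots: w grows in magnitude

print(too_low, reasonable, too_high)
```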



### Define variables

In Tensorflow there are 3 types of variables (Tensors, which are n-dimensional arrays):
- tf.placeholder : the entry point used to feed the dataset
- tf.Variable : a value that can be updated during the execution
- tf.constant : self explanatory !

***keep in mind*** : To access the value of a variable, you must evaluate the variable within a session (sess.run(xxx) or xxx.eval()), otherwise you will just get a tensor object 

In [6]:
# Specifying a name_scope is not necessary, but it makes the Tensorflow graph display nicely in Tensorboard
with tf.name_scope('feed-dict'):
    
    # Create a Tensor to feed the input data  (images)
    #   An image is 28x28 pixels, but it is stored as a row of 1 x 784 pixels
    #   The number of images that will be fed when the session is executed
    #   is not known when building the graph of operations : thus 'None' for the number of rows
    inputs = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='inputs')

    # Create a Tensor to feed the real output values that the network will be trained on
    targets = tf.placeholder(dtype= tf.float32, shape=[None, 10], name='targets')


In [7]:
# Create the Tensors for the weights and bias parameters of the Neural Network
# Here we initialize these tensors with zero values

#with tf.name_scope('weights'):
Weights = tf.Variable(tf.zeros(shape=[784, 10]), name='weights')

#with tf.name_scope('biases'):    
biases = tf.Variable(tf.zeros(shape = [10]), name='biases')

In [8]:
# Optional : allows inspecting 3 images of the dataset in Tensorboard
with tf.name_scope('Tensorboard-img'):    

    inputs_image = tf.reshape(inputs, [-1, 28, 28, 1])
    tf.summary.image('input', inputs_image, 3)

### Build and Evaluate the accuracy of the model

In [9]:
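with tf.name_scope('predictions'):  
    # Raw class scores (logits), one per digit: the softmax itself is applied
    # inside the cross-entropy loss function used in the next cell
    predictions = tf.matmul(inputs, Weights) + biases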


In machine learning we typically define **what it means for a model to be bad. We call this the cost, or the loss**.
We try to minimize that error: the smaller the loss, the better our model is.
One very common, very **nice function to determine the loss of a model is called "cross-entropy"**.
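As a rough illustration of cross-entropy, the sketch below computes the loss for a single image from a one-hot target and a vector of predicted probabilities. The `cross_entropy` helper and the two probability vectors are made up for the example; the `eps` guard against log(0) is an assumption of this sketch.

```python
import math

def cross_entropy(target, probabilities, eps=1e-12):
    """Return -sum(t * log(p)); with a one-hot target only the true class counts."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probabilities))

target = [0.0, 0.0, 1.0]               # the true class is index 2

confident_right = [0.05, 0.05, 0.90]   # most of the probability on the true class
confident_wrong = [0.90, 0.05, 0.05]   # most of the probability on a wrong class

loss_right = cross_entropy(target, confident_right)
loss_wrong = cross_entropy(target, confident_wrong)
print(loss_right, loss_wrong)          # the wrong prediction is penalized much more
```

The loss only goes to 0 when the model puts all the probability on the correct class, which is why minimizing it pushes the model toward correct and confident predictions.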

In [10]:

with tf.name_scope('cross-entropy'):
    # Use cross-entropy to determine the loss/cost (error of our model)
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets,
                                                                           logits=predictions))

    # Tensorboard : summary.scalar collects summary information for this
    # node so that its evolution during training can be plotted as a graph in Tensorboard
    tf.summary.scalar("loss", cross_entropy)

Evaluate the training **accuracy of the model** 

In [11]:
                                                                       
with tf.name_scope('accuracy'): 
    
    # Argmax returns the index with the largest value across axes of a tensor.
    correct_prediction = tf.equal(tf.argmax(predictions, 1), tf.argmax(targets, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    
    # Tensorboard : summary.scalar collects summary information for this
    # node so that its evolution during training can be plotted as a graph in Tensorboard
    tf.summary.scalar("accuracy", accuracy)
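The accuracy computation above (argmax on both sides, compare, then average) can be mimicked in pure Python. The `argmax` and `accuracy` helpers and the three sample rows are illustrative assumptions, not part of the notebook.

```python
def argmax(values):
    """Return the index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, targets):
    """Fraction of rows where the predicted class matches the one-hot target."""
    hits = [argmax(p) == argmax(t) for p, t in zip(predictions, targets)]
    return sum(hits) / len(hits)

# Three hypothetical samples with 3 classes each (raw scores, not probabilities)
scores = [[2.0, 0.1, -1.0],   # predicts class 0
          [0.3, 0.2, 1.5],    # predicts class 2
          [0.9, 1.1, 0.0]]    # predicts class 1
labels = [[1.0, 0.0, 0.0],    # true class 0 -> correct
          [0.0, 0.0, 1.0],    # true class 2 -> correct
          [1.0, 0.0, 0.0]]    # true class 0 -> wrong

result = accuracy(scores, labels)
print(result)                 # 2 correct predictions out of 3
```

Taking the argmax of the raw scores gives the same class as taking it after softmax, since softmax preserves the ordering of its inputs.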

### Define the Training Optimizer

Because TensorFlow knows the entire graph of the computations, it can **automatically use the backpropagation algorithm** 
to efficiently determine how the weight variables affect the loss you ask it to minimize. Then it can apply a chosen
optimization algorithm to modify the variables and reduce the loss.

In [12]:

with tf.name_scope('train'): 
    train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy) 
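To give an idea of what one training step does, here is a hand-written sketch of a single gradient descent update for a tiny softmax regression (2 inputs, 3 classes). It is not how Tensorflow works internally (Tensorflow derives the gradients automatically from the graph); the sketch uses the known closed-form gradient for this particular model, where the gradient of the loss with respect to the class scores is `probabilities - target`. All names and values are assumptions made for the example.

```python
import math

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, weights, biases):
    """Compute the class probabilities: softmax(x . W + b)."""
    scores = [sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
              for j in range(len(biases))]
    return softmax(scores)

def loss(x, target, weights, biases):
    """Cross-entropy between the one-hot target and the predicted probabilities."""
    probabilities = forward(x, weights, biases)
    return -sum(t * math.log(p) for t, p in zip(target, probabilities))

def gradient_descent_step(x, target, weights, biases, learning_rate):
    """Update weights and biases in place, moving against the gradient."""
    probabilities = forward(x, weights, biases)
    grad_scores = [p - t for p, t in zip(probabilities, target)]
    for i in range(len(x)):
        for j in range(len(biases)):
            weights[i][j] -= learning_rate * x[i] * grad_scores[j]
    for j in range(len(biases)):
        biases[j] -= learning_rate * grad_scores[j]

x = [1.0, 2.0]                           # one toy input with 2 features
target = [0.0, 1.0, 0.0]                 # its true class is index 1
weights = [[0.0] * 3 for _ in range(2)]  # zero initialization, as in the notebook
biases = [0.0] * 3

loss_before = loss(x, target, weights, biases)
gradient_descent_step(x, target, weights, biases, learning_rate=0.5)
loss_after = loss(x, target, weights, biases)
print(loss_before, loss_after)           # the loss decreases after the update
```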

### Train the model

In [13]:
# Open a Session in order to execute the computations
# on the graph of operations defined above
with tf.Session() as sess:
    
    
    #### Prepare to write Summaries for Tensorboard (if you want to use Tensorboard)
    
    # Create 2 log writers to log the summaries for Tensorboard at training and at test steps.
    # That's a tricky way (not well documented) to get both training and test graphs in the 
    # same chart in Tensorboard
    train_writer = tf.summary.FileWriter(str(logs_path + "-train-" +str(experiments)))
    test_writer = tf.summary.FileWriter(str(logs_path + "-test-" +str(experiments)))  
    
    # Merge all summaries together so it's easier to manage
    summary_op = tf.summary.merge_all()
    
    # Allows displaying the graph of computations in Tensorboard
    train_writer.add_graph(sess.graph)
    test_writer.add_graph(sess.graph)
    
    # Will allow saving checkpoints regularly (save/reload the model, ...)
    saver = tf.train.Saver()
    
     
    
    #####  Before Starting, Always initialize the variables defined in the computation graph
    init = tf.global_variables_initializer()
    sess.run(init)
    
    # Monitor execution time
    start_time = timeit.default_timer()
    
    
    ##### Train the model
    for iteration in range(epoch):
        
        #### Get a batch of images on which to train
    
        # We use the Tensorflow MNIST tutorial helpers to get a 
        # batch of images on which to perform the training iteration
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Define a variable for convenience to store the Feed dictionary for training
        train_feed_dict = { inputs: batch_xs, targets: batch_ys }    

        
        #### This line actually performs the Training (by evaluating train_op, which is the optimizer in the graph)
        sess.run(train_op, feed_dict= train_feed_dict)
        
        
        #### Every 100 iterations, collect the Summary logs for Tensorboard and 
        #### evaluate the Accuracy of the current model against a TEST dataset (instead of the Training dataset)
        #### and print progress info  
    
        if iteration % 100 == 0:
            [train_accuracy, summary] = sess.run([accuracy, summary_op], feed_dict = train_feed_dict)
            train_writer.add_summary(summary, iteration)
                 
            # Now evaluate the Test Accuracy using the test dataset
            [test_accuracy, summary] = sess.run([accuracy, summary_op], feed_dict = {inputs: mnist.test.images, targets: mnist.test.labels})
            test_writer.add_summary(summary, iteration)
            
            print("iteration: ", iteration, "  training accuracy=", train_accuracy, "  test accuracy=", test_accuracy)
            
    
    
        #### Every 1000 iterations write a checkpoint for our model
        if iteration % 1000 == 0:
            saver.save(sess, os.path.join(str(logs_path +"-train-" + str(experiments)), "model.ckpt"), iteration)
        

   
    #### Training is done !
    print("Execution time= %.4f sec" % (timeit.default_timer() - start_time)) 
          
    
    # Not important : increment our counter to avoid mixing up the logs of successive experiments in Jupyter
    experiments +=1

    #### Gently close the opened writers and session
    train_writer.close()
    test_writer.close()
    sess.close()
            

iteration:  0   training accuracy= 0.57   test accuracy= 0.4075
iteration:  100   training accuracy= 0.96   test accuracy= 0.8948
iteration:  200   training accuracy= 0.97   test accuracy= 0.9031
iteration:  300   training accuracy= 0.97   test accuracy= 0.9074
iteration:  400   training accuracy= 0.88   test accuracy= 0.9037
iteration:  500   training accuracy= 0.92   test accuracy= 0.9125
iteration:  600   training accuracy= 0.93   test accuracy= 0.917
iteration:  700   training accuracy= 0.91   test accuracy= 0.9149
iteration:  800   training accuracy= 0.95   test accuracy= 0.9138
iteration:  900   training accuracy= 0.96   test accuracy= 0.915
iteration:  1000   training accuracy= 0.95   test accuracy= 0.9157
iteration:  1100   training accuracy= 0.94   test accuracy= 0.9196
iteration:  1200   training accuracy= 0.93   test accuracy= 0.9194
iteration:  1300   training accuracy= 0.95   test accuracy= 0.9174
iteration:  1400   training accuracy= 0.96   test accuracy= 0.9211
iteration

### Visualize the training summaries in Tensorboard

- in a terminal launch
```
tensorboard --logdir ./logs
``` 
- open tensorboard in a browser : http://localhost:6006/

Now have a look at :
- the image section : you can use this section to visualize some images used in this experiment
- the graphs : you can visualize the Tensorflow graph (useful to look for errors)
- the Scalar section : here you can visualize the accuracy and loss (cross-entropy) and see how well or not your model is performing : 
    - Does it learn ? Does it learn fast ?
    - Does it Underfit ? Does it Overfit ?
    - Is it an accurate model ?
    - How do several models/hyper-parameter variations compare to each other ?
    - And much more !
