# Getting started with TensorFlow: creating summary metrics and model checkpoints


Applying a simple two-layer feed-forward neural network to predict digits in the [MNIST dataset](http://yann.lecun.com/exdb/mnist/)

This notebook uses the MNIST example that ships with TensorFlow to build a feed-forward network with two hidden layers and ReLU activations. The model is kept deliberately simple, because the main purpose is to show how to generate and write summary metrics during model training.

This notebook is based on the TensorFlow tutorials and is intended for practice and demonstration purposes only.
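To make the architecture described above concrete, here is a minimal pure-Python sketch of a forward pass through two ReLU hidden layers followed by a linear output layer. It is not the code used by the notebook, which delegates to `mnist.inference`; the helper names, the toy layer sizes, and the random initialisation are all assumptions chosen for illustration.

```python
import random


def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def forward(pixels, layers):
    """Apply ReLU after every hidden layer and return raw logits from the last one."""
    activations = pixels
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return dense(activations, weights, biases)


rng = random.Random(0)
# Hypothetical toy sizes: 6 inputs -> 4 hidden -> 3 hidden -> 2 classes.
sizes = [6, 4, 3, 2]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
logits = forward([0.5] * sizes[0], layers)
print(len(logits))
```

In the notebook the same structure is built with 784 inputs, the two hidden sizes set in the hyperparameter cell, and 10 output classes.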

Author: [@santteegt](https://santteegt.github.io/)

### Dependencies

The following dependencies can be installed with Anaconda Navigator:

* TensorFlow 1.1
  * TensorFlow MNIST example (`tensorflow.examples.tutorials.mnist`)

In [10]:
import time
import os
from six.moves import xrange  # pylint: disable=redefined-builtin

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import mnist

### Model hyperparameter definition

In [2]:
BATCH_SIZE = 100
HIDDEN_1_UNITS = 128
HIDDEN_2_UNITS = 32
MAX_STEPS = 2000
LEARNING_RATE = 0.01
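As a rough sanity check on these values, the sketch below works out how many trainable parameters the network has and how many passes over the training data `MAX_STEPS` amounts to. It assumes 784 input pixels (28 x 28), 10 output classes, one bias per unit, and the 55000 training examples reported in the evaluation output further down; `dense_params` is a hypothetical helper.

```python
BATCH_SIZE = 100
HIDDEN_1_UNITS = 128
HIDDEN_2_UNITS = 32
MAX_STEPS = 2000

IMAGE_PIXELS = 28 * 28   # flattened MNIST image
NUM_CLASSES = 10         # digits 0-9
TRAIN_EXAMPLES = 55000   # training split size, as printed by the eval output


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit of a fully connected layer."""
    return n_in * n_out + n_out


total_params = (
    dense_params(IMAGE_PIXELS, HIDDEN_1_UNITS)
    + dense_params(HIDDEN_1_UNITS, HIDDEN_2_UNITS)
    + dense_params(HIDDEN_2_UNITS, NUM_CLASSES)
)
examples_seen = MAX_STEPS * BATCH_SIZE
epochs = examples_seen / float(TRAIN_EXAMPLES)

print('trainable parameters: %d' % total_params)
print('examples seen: %d (about %.2f epochs)' % (examples_seen, epochs))
```

Under these assumptions the run covers fewer than four passes over the training set, which is worth keeping in mind when reading the loss values logged later.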

### Import & Download MNIST dataset

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
# fake_data is False, so the real dataset is downloaded and read.
# one_hot is left at its default (False): labels stay as integer digits (0-9)
# instead of being converted to one-hot vectors.
data_sets = input_data.read_data_sets('MNIST_data', fake_data=False)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
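The difference between the two label encodings can be shown without TensorFlow. The sketch below converts integer digit labels to one-hot vectors and back; the `one_hot` helper and the sample labels are made up for illustration. The notebook keeps the integer form, which matches the `tf.int32` label placeholder of shape `[batch_size]` defined in the next section.

```python
def one_hot(label, num_classes=10):
    """Return a vector with 1.0 at index `label` and 0.0 everywhere else."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# Integer ("sparse") labels: one number per image, as used in this notebook.
sparse_labels = [7, 2, 1]

# One-hot labels: one vector of length 10 per image.
dense_labels = [one_hot(label) for label in sparse_labels]

# The integer label is simply the position of the 1.0.
recovered = [vector.index(1.0) for vector in dense_labels]

print(dense_labels[0])
print(recovered)
```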


### Helper functions to create the NN model & evaluate the training process

In [4]:
def placeholder_inputs(batch_size):
    images_placeholder = tf.placeholder(tf.float32, shape=[batch_size, mnist.IMAGE_PIXELS])
    labels_placeholder = tf.placeholder(tf.int32, shape=[batch_size])
    return images_placeholder, labels_placeholder

In [8]:
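def fill_feed_dict(data_set, images_pl, labels_pl):
    # Fetch the next BATCH_SIZE examples. The second argument is fake_data;
    # it is False here because fake data is only meant for unit tests.
    images_feed, labels_feed = data_set.next_batch(BATCH_SIZE, False)
    # Map each placeholder to the batch that feeds it.
    feed_dict = {images_pl: images_feed, labels_pl: labels_feed}
    return feed_dict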
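To illustrate what repeated calls to `next_batch` provide, here is a simplified, hypothetical stand-in for the dataset object. It is not TensorFlow's implementation: the class name, the shuffling policy, and the toy data are assumptions. It only shows the idea of handing out consecutive mini-batches and starting a new shuffled pass once the data runs out.

```python
import random


class ToyDataSet(object):
    """Hypothetical, simplified stand-in for the object behind `data_sets.train`."""

    def __init__(self, images, labels, seed=0):
        self._examples = list(zip(images, labels))
        self._rng = random.Random(seed)
        self._index = 0
        self.epochs_completed = 0

    @property
    def num_examples(self):
        return len(self._examples)

    def next_batch(self, batch_size):
        # Start a new shuffled pass when fewer than batch_size examples remain.
        if self._index + batch_size > len(self._examples):
            self.epochs_completed += 1
            self._rng.shuffle(self._examples)
            self._index = 0
        batch = self._examples[self._index:self._index + batch_size]
        self._index += batch_size
        images, labels = zip(*batch)
        return list(images), list(labels)


# Ten toy "images", each labelled with its own index.
data = ToyDataSet(images=[[i] for i in range(10)], labels=list(range(10)))
for _ in range(3):
    images_feed, labels_feed = data.next_batch(4)

print(data.epochs_completed, len(images_feed))
```

Images and labels are shuffled together, so each image keeps its label from one epoch to the next.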

In [15]:
def do_eval(sess, eval_correct, images_placeholder, labels_placeholder, data_set):
    # Run one epoch of evaluation.
    true_count = 0  # Counts the number of correct predictions.
    steps_per_epoch = data_set.num_examples // BATCH_SIZE
    # Examples left over after the last full batch are ignored.
    num_examples = steps_per_epoch * BATCH_SIZE
    for step in xrange(steps_per_epoch):
        feed_dict = fill_feed_dict(data_set, images_placeholder, labels_placeholder)
        true_count += sess.run(eval_correct, feed_dict=feed_dict)

        # Running fraction of correct predictions relative to the whole epoch,
        # so it only reaches the final score on the last step.
        precision = float(true_count) / num_examples

        print('Step: %d  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
              (step, num_examples, true_count, precision))
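The metric computed by `do_eval` can be sketched in plain Python. The version below assumes that a prediction counts as correct when the highest logit sits at the label's index, which is how the "Precision @ 1" figure is read here, and it reports the score once after all batches instead of printing a running figure after every batch. The function names and the toy logits are hypothetical.

```python
def count_correct(logits_batch, labels_batch):
    """Count examples whose highest-scoring class equals the integer label."""
    correct = 0
    for logits, label in zip(logits_batch, labels_batch):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        correct += int(predicted == label)
    return correct


def evaluate(batches, batch_size):
    """Accumulate correct predictions over all batches, then score once."""
    true_count = 0
    steps = 0
    for logits_batch, labels_batch in batches:
        true_count += count_correct(logits_batch, labels_batch)
        steps += 1
    num_examples = steps * batch_size
    return true_count, num_examples, float(true_count) / num_examples


# Two toy batches of two examples each, with two classes.
batches = [
    ([[0.1, 0.9], [0.8, 0.2]], [1, 0]),  # both predictions correct
    ([[0.3, 0.7], [0.6, 0.4]], [0, 0]),  # first wrong, second correct
]
true_count, num_examples, accuracy = evaluate(batches, batch_size=2)
print('Num examples: %d  Num correct: %d  Precision @ 1: %0.04f'
      % (num_examples, true_count, accuracy))
```

Scoring after the loop produces a single line per dataset, which is easier to compare across the training, validation, and test splits.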

## Building the model in TensorFlow

### Creating summary metrics and model checkpoints

In [16]:
# Tell TensorFlow that the model will be built into the default Graph.
with tf.Graph().as_default():
    images_placeholder, labels_placeholder = placeholder_inputs(BATCH_SIZE)
    
    # Build a Graph that computes predictions from the inference model.
    logits = mnist.inference(images_placeholder, HIDDEN_1_UNITS, HIDDEN_2_UNITS)
    
    # Add to the Graph the Ops for loss calculation.
    loss = mnist.loss(logits, labels_placeholder)
    
    # Add to the Graph the Ops that calculate and apply gradients.
    train_op = mnist.training(loss, LEARNING_RATE)
    
    # Add the Op to compare the logits to the labels during evaluation.
    eval_correct = mnist.evaluation(logits, labels_placeholder)
    
    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all()
    
    # Add the variable initializer Op.
    init = tf.global_variables_initializer()

    # Create a saver for writing training checkpoints.
    saver = tf.train.Saver()

    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # Instantiate a SummaryWriter to output summaries and the Graph.
    summary_writer = tf.summary.FileWriter('logs/', sess.graph)

    # And then after everything is built:

    # Run the Op to initialize the variables.
    sess.run(init)
    
    
    # Start the training loop.
    for step in xrange(MAX_STEPS):
        
        start_time = time.time()

        # Fill a feed dictionary with the actual set of images and labels
        # for this particular training step.
        feed_dict = fill_feed_dict(data_sets.train, images_placeholder, labels_placeholder)

        # Run one step of the model.  The return values are the activations
        # from the `train_op` (which is discarded) and the `loss` Op.  To
        # inspect the values of your Ops or variables, you may include them
        # in the list passed to sess.run() and the value tensors will be
        # returned in the tuple from the call.
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)

        duration = time.time() - start_time

        # Write the summaries and print an overview fairly often.
        if step % 100 == 0:
            # Print status to stdout.
            print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
            # Update the events file.
            summary_str = sess.run(summary, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)
            summary_writer.flush()

        # Save a checkpoint and evaluate the model periodically.
        if (step + 1) % 1000 == 0 or (step + 1) == MAX_STEPS:
            checkpoint_file = os.path.join('logs/', 'model.ckpt')
            saver.save(sess, checkpoint_file, global_step=step)
            # Evaluate against the training set.
            print('Training Data Eval:')
            
            do_eval(sess, eval_correct, images_placeholder, labels_placeholder, data_sets.train)
            
            # Evaluate against the validation set.
            print('Validation Data Eval:')
            do_eval(sess, eval_correct, images_placeholder, labels_placeholder, data_sets.validation)
            
            # Evaluate against the test set.
            print('Test Data Eval:')
            do_eval(sess, eval_correct, images_placeholder, labels_placeholder, data_sets.test)
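The two `if` conditions in the training loop decide when summaries and checkpoints are written. The sketch below replays just those conditions, without TensorFlow, to list the affected steps for `MAX_STEPS = 2000`. The helper names are hypothetical; the `-<step>` suffix reflects how `tf.train.Saver.save` names checkpoints when `global_step` is passed.

```python
MAX_STEPS = 2000


def summary_steps(max_steps, every=100):
    """Steps at which the loop prints the loss and writes a summary."""
    return [step for step in range(max_steps) if step % every == 0]


def checkpoint_steps(max_steps, every=1000):
    """Steps at which the loop saves a checkpoint and runs the evaluations."""
    return [
        step for step in range(max_steps)
        if (step + 1) % every == 0 or (step + 1) == max_steps
    ]


def checkpoint_prefix(step, base='logs/model.ckpt'):
    """Saver.save appends '-<global_step>' to the path it is given."""
    return '%s-%d' % (base, step)


summaries = summary_steps(MAX_STEPS)
checkpoints = checkpoint_steps(MAX_STEPS)

print('%d summaries, first at step %d, last at step %d'
      % (len(summaries), summaries[0], summaries[-1]))
print([checkpoint_prefix(step) for step in checkpoints])
```

Because the checkpoint condition tests `step + 1`, the saved step numbers are 999 and 1999 rather than 1000 and 2000.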

Step 0: loss = 2.30 (0.025 sec)
Step 100: loss = 2.14 (0.003 sec)
Step 200: loss = 1.84 (0.005 sec)
Step 300: loss = 1.51 (0.003 sec)
Step 400: loss = 1.20 (0.240 sec)
Step 500: loss = 0.91 (0.002 sec)
Step 600: loss = 0.83 (0.005 sec)
Step 700: loss = 0.74 (0.002 sec)
Step 800: loss = 0.75 (0.005 sec)
Step 900: loss = 0.66 (0.002 sec)
Training Data Eval:
Step: 0  Num examples: 55000  Num correct: 90  Precision @ 1: 0.0016
Step: 1  Num examples: 55000  Num correct: 181  Precision @ 1: 0.0033
Step: 2  Num examples: 55000  Num correct: 266  Precision @ 1: 0.0048
Step: 3  Num examples: 55000  Num correct: 357  Precision @ 1: 0.0065
Step: 4  Num examples: 55000  Num correct: 449  Precision @ 1: 0.0082
Step: 5  Num examples: 55000  Num correct: 531  Precision @ 1: 0.0097
Step: 6  Num examples: 55000  Num correct: 613  Precision @ 1: 0.0111
Step: 7  Num examples: 55000  Num correct: 705  Precision @ 1: 0.0128
Step: 8  Num examples: 55000  Num correct: 792  Precision @ 1: 0.0144
Step: 9  Num 
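The first logged loss of 2.30 has a simple interpretation, assuming the loss is a softmax cross-entropy averaged over the batch: an untrained network that spreads probability evenly over 10 classes scores ln(10), roughly 2.30. The sketch below checks this in plain Python; `softmax_cross_entropy` is a hypothetical helper, not the notebook's `mnist.loss`.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of the softmax of `logits` against an integer label."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum - logits[label]


# Equal logits give a uniform prediction over the 10 digits.
uniform_loss = softmax_cross_entropy([0.0] * 10, label=3)

# A confident, correct prediction gives a loss close to zero.
confident_logits = [10.0 if i == 3 else 0.0 for i in range(10)]
confident_loss = softmax_cross_entropy(confident_logits, label=3)

print('uniform: %.4f  ln(10): %.4f' % (uniform_loss, math.log(10)))
print('confident: %.4f' % confident_loss)
```

Seen this way, the drop from 2.30 to 0.66 over the first 900 steps shows the network moving away from uniform guessing.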

## Viewing experiment results in TensorBoard

Run the following command in a terminal:

- `$ tensorboard --logdir ${REPO_HOME}/tensorflow/logs`
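If TensorBoard shows no data, it can help to check that the log directory actually contains event files before launching it. The sketch below is a small helper for that, assuming the notebook was run from the directory that holds `logs/` and that the writer's files start with `events.out.tfevents.`; both function names are hypothetical.

```python
import os


def find_event_files(logdir):
    """Return the event files found under `logdir`, sorted by path."""
    found = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if name.startswith('events.out.tfevents.'):
                found.append(os.path.join(root, name))
    return sorted(found)


def tensorboard_command(logdir):
    """Build the command line with an absolute log directory."""
    return ['tensorboard', '--logdir', os.path.abspath(logdir)]


logdir = 'logs'  # same relative directory the notebook writes to
print(' '.join(tensorboard_command(logdir)))
print('%d event file(s) found' % len(find_event_files(logdir)))
```

A count of zero usually means the command is pointing at a different directory than the one the notebook wrote to.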