# See what's going on with `tf.Summary` and the `TensorBoard`

When you are training your model, you want to keep track of how things are going: how the weights of your model are evolving, whether gradients are exploding or vanishing (hopefully neither), what the accuracy of your model is, and so on.

The naive approach would be to build your own way to store, read and visualize such data, but `TensorFlow` offers a better solution: with `tf.summary` you can easily save these values (as `protobufs`), and with `TensorBoard` you can read and visualize them, together with the graph structure and many other features.

First of all, of course, the imports, a clean default graph and a fixed random seed:

```python
import tensorflow as tf  # this notebook uses the TensorFlow 1.x graph API

tf.reset_default_graph()
tf.set_random_seed(23)  # graph-level seed, for reproducible random ops
```

# The Task
For this tutorial, we need a reference task. It is deliberately simple, so that we can focus on *how* we are doing things without paying too much attention to *what* we are doing. The task is the following:

> given a vector of N=1024 components, label it with `1` if the sum of its components is `>=0`, otherwise label it with `0`.
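To make the task concrete, here is a plain-Python sketch (no TensorFlow) of how a single example and its label could be generated. The helper names `label` and `make_example` are hypothetical; the components are drawn from a standard normal distribution, as in the TensorFlow input function used later in this notebook.

```python
import random

N = 1024  # number of components per vector, as in the task definition


def label(vector):
    """Return 1 if the components sum to a non-negative value, else 0."""
    return 1 if sum(vector) >= 0 else 0


def make_example(n=N, rng=random):
    """Draw a vector of n standard-normal components and label it."""
    vector = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return vector, label(vector)


if __name__ == "__main__":
    vector, y = make_example(rng=random.Random(23))
    print(len(vector), y)  # 1024, followed by 0 or 1
```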
  
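The model we are implementing is a two-layer MLP, where the first layer is [1024, 1024] with a ReLU activation and the second is [1024, 1] with a sigmoid. We use the sigmoid cross-entropy as loss function and plain gradient descent (`tf.train.GradientDescentOptimizer`) with a learning rate of 0.1 as learning algorithm.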
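The forward pass of this architecture can be sketched in plain Python. To keep the arithmetic readable, the sketch assumes tiny layers (2 → 2 → 1 instead of 1024 → 1024 → 1); the function names and the example weights are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(inputs, weights, biases):
    """Compute inputs @ weights + biases for one example.

    weights[i][j] connects input i to output unit j.
    """
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(len(biases))
    ]


def mlp_forward(x, w1, b1, w2, b2):
    """Two-layer MLP: ReLU hidden layer, then one logit and its sigmoid."""
    hidden = [relu(z) for z in dense(x, w1, b1)]  # first layer + ReLU
    logit = dense(hidden, w2, b2)[0]              # second layer, one unit
    return logit, sigmoid(logit)


logit, prob = mlp_forward(
    x=[1.0, -2.0],
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[2.0], [3.0]], b2=[-1.0],
)
print(logit, prob)  # logit 1.0, probability roughly 0.731
```

The hidden unit fed by `-2.0` is zeroed by the ReLU, so only the first hidden unit contributes to the logit.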
  
First of all, we create a non-trainable `global_step` variable whose purpose is to count the steps of the training process. This is a common practice: helpers such as `tf.train.get_global_step()` look it up in the `tf.GraphKeys.GLOBAL_STEP` collection (where `tf.train.create_global_step()` would put it) or, failing that, by its name `global_step`. Then, we create the placeholders for the input data and the target labels.

```python
gs = tf.Variable(0, name='global_step', trainable=False, dtype=tf.int32)

with tf.variable_scope('Placeholder'):
    input = tf.placeholder(dtype=tf.float32, shape=[None, 1024])
    labels = tf.placeholder(dtype=tf.int32, shape=[None, 1])
```

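While building the two layers of the network, we attach summaries to all the weights and biases, to the pre-activations (potentials) and logits, and to the activations. We use the function `summarize()`, which takes a `Tensor` as input and attaches some summaries to it: its mean, standard deviation, maximum and minimum as scalars, plus a histogram of its values.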

```python
def summarize(var, scope='summaries'):
    """Attach scalar and histogram summaries to `var` under name scope `scope`."""
    with tf.name_scope(scope):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            # Population standard deviation: sqrt(mean((var - mean)^2)).
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)
```
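For reference, here are the scalar statistics that `summarize()` records, computed in plain Python on a list of numbers (the histogram is not reproduced). `summary_stats` is a hypothetical helper; like the TensorFlow code, it uses the population standard deviation, i.e. the mean of the squared deviations with no Bessel correction.

```python
import math


def summary_stats(values):
    """Return the scalars that summarize() attaches to a tensor."""
    mean = sum(values) / float(len(values))
    # Population standard deviation: sqrt(mean((x - mean)^2)), as in summarize().
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {"mean": mean, "stddev": stddev, "max": max(values), "min": min(values)}


print(summary_stats([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5.0, stddev 2.0, max 9, min 2
```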

We build the first layer, `Layer1`, and invoke `summarize()` on each of the tensors it creates, one by one.

```python
with tf.variable_scope('Layer1') as scope:
    w1 = tf.get_variable('w', shape=[1024, 1024])
    b1 = tf.get_variable('b', shape=[1024])
    z1 = tf.add(tf.matmul(input, w1), b1, name='potential')  # pre-activation
    y1 = tf.nn.relu(z1, name='activation')

    # One summarize() call per tensor, each under a name scope
    # named after the tensor's op (e.g. 'Layer1/w').
    summarize(w1, w1.op.name)
    summarize(b1, b1.op.name)
    summarize(z1, z1.op.name)
    summarize(y1, y1.op.name)
```

Now that what is going on is clearer, let's collect the variables of the second layer in a smarter way, filtering the `GLOBAL_VARIABLES` collection by the name of the current scope:

```python
with tf.variable_scope('Layer2') as scope:
    w2 = tf.get_variable('w', shape=[1024, 1])
    b2 = tf.get_variable('b', shape=[1])
    logits = tf.add(tf.matmul(y1, w2), b2, name='logits')
    # logit > 0 is equivalent to sigmoid(logit) > 0.5
    predicts = tf.cast(logits > 0, tf.int32)

    # Keep only the variables created inside this scope; the trailing '/'
    # prevents matching a sibling scope such as 'Layer20'.
    scope_vars = [v for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
                  if v.name.startswith(scope.name + '/')]
    for var in scope_vars:
        summarize(var, var.op.name)
```
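A subtlety of prefix-based filtering: comparing variable names against the bare scope name (`'Layer2'`) would also match a scope named `Layer20`, while appending the `/` separator restricts the match to variables inside the scope. The plain-Python sketch below uses made-up variable names and a hypothetical helper `vars_in_scope` to show the difference.

```python
def vars_in_scope(names, scope):
    """Keep only the names that live inside `scope`."""
    prefix = scope.rstrip("/") + "/"
    return [n for n in names if n.startswith(prefix)]


names = ["Layer2/w:0", "Layer2/b:0", "Layer20/w:0", "Layer1/w:0"]

naive = [n for n in names if n.startswith("Layer2")]
print(naive)                           # also contains 'Layer20/w:0'
print(vars_in_scope(names, "Layer2"))  # ['Layer2/w:0', 'Layer2/b:0']
```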

We also track the loss and the gradients. Since the loss is a scalar, a single `tf.summary.scalar` is enough and we don't need `summarize()`; each gradient tensor, instead, gets the full set of summaries.

```python
with tf.variable_scope('Loss'):
    losses = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.cast(labels, tf.float32), logits=logits)
    loss_op = tf.reduce_mean(losses, name='loss')
    loss_summary_op = tf.summary.scalar(tensor=loss_op, name='loss_value')

with tf.variable_scope('BackProp'):
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    trainable_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
    # Split minimize() into its two steps so that we can summarize the gradients.
    grads_and_vars = optimizer.compute_gradients(loss=loss_op, var_list=trainable_vars)
    # Passing global_step makes each run of train_op increment `gs`.
    train_op = optimizer.apply_gradients(
        grads_and_vars=grads_and_vars, global_step=gs, name='train_op')
    for grad, _ in grads_and_vars:
        summarize(grad, grad.op.name)
```
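`tf.nn.sigmoid_cross_entropy_with_logits` is documented as computing, for a logit `x` and a label `z`, the numerically stable form `max(x, 0) - x * z + log(1 + exp(-|x|))`. The plain-Python sketch below mirrors that formula, compares it with the textbook definition, and adds the derivative with respect to the logit, `sigmoid(x) - z`, which is where backpropagation starts for this loss. All function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_ce(x, z):
    """Stable sigmoid cross-entropy for logit x and label z in {0, 1}."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def naive_sigmoid_ce(x, z):
    """Textbook definition; fails for large |x| (overflow or log of 0)."""
    p = 1.0 / (1.0 + math.exp(-x))
    return -(z * math.log(p) + (1 - z) * math.log(1.0 - p))


def sigmoid_ce_grad(x, z):
    """Derivative of the loss with respect to the logit: sigmoid(x) - z."""
    return sigmoid(x) - z


# The two formulas agree for moderate logits...
print(sigmoid_ce(-3.0, 1), naive_sigmoid_ce(-3.0, 1))
# ...but only the stable one copes with extreme logits.
print(sigmoid_ce(-1000.0, 1))  # 1000.0
```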


We can also use TF summaries to track evaluation metrics. In our case, we attach a scalar summary to the accuracy that we measure.

```python
with tf.variable_scope('Accuracy'):
    accuracy_op = tf.reduce_mean(tf.cast(tf.equal(predicts, labels), tf.float32), name='accuracy')
    accuracy_summary_op = tf.summary.scalar(tensor=accuracy_op, name='accuracy_value')
```
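Thresholding the logit at zero, as `predicts` does, is the same as thresholding the sigmoid probability at 0.5, since the sigmoid is increasing and equals 0.5 at zero. A plain-Python sketch of the resulting accuracy computation, with hypothetical helper names and made-up numbers:

```python
def predict(logits):
    """Label 1 when the logit is positive (sigmoid > 0.5), else 0."""
    return [1 if x > 0 else 0 for x in logits]


def accuracy(predictions, labels):
    """Fraction of positions where prediction and label agree."""
    matches = sum(1 for p, y in zip(predictions, labels) if p == y)
    return matches / float(len(labels))


logits = [2.3, -0.7, 0.1, -1.5]
labels = [1, 0, 0, 0]
print(accuracy(predict(logits), labels))  # 0.75
```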

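Before moving on to the actual training, let's inspect the `tf.GraphKeys.SUMMARIES` collection: every `tf.summary.*` op we have created so far was automatically added to it. Note the doubled prefixes such as `Layer1/Layer1/w`: `summarize()` opens a name scope named after the tensor's op name (`Layer1/w`) while we are already inside the `Layer1` scope. Similarly, the standard deviation summary is named `stddev_1`, because the name `stddev` was already taken by the name scope opened inside `summarize()`. Finally, `tf.summary.merge_all()` merges all of them into a single op that, when run, returns one serialized `Summary` protobuf.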

```python
for var in tf.get_collection(tf.GraphKeys.SUMMARIES):
    print(var.name)
print('')

# Merge every summary in the SUMMARIES collection into a single op.
summary_op = tf.summary.merge_all()
print('Summary Op: ' + str(summary_op))
```

```
Layer1/Layer1/w/mean:0
Layer1/Layer1/w/stddev_1:0
Layer1/Layer1/w/max:0
Layer1/Layer1/w/min:0
Layer1/Layer1/w/histogram:0
Layer1/Layer1/b/mean:0
Layer1/Layer1/b/stddev_1:0
Layer1/Layer1/b/max:0
Layer1/Layer1/b/min:0
Layer1/Layer1/b/histogram:0
Layer1/Layer1/potential/mean:0
Layer1/Layer1/potential/stddev_1:0
Layer1/Layer1/potential/max:0
Layer1/Layer1/potential/min:0
Layer1/Layer1/potential/histogram:0
Layer1/Layer1/activation/mean:0
Layer1/Layer1/activation/stddev_1:0
Layer1/Layer1/activation/max:0
Layer1/Layer1/activation/min:0
Layer1/Layer1/activation/histogram:0
Layer2/Layer2/w/mean:0
Layer2/Layer2/w/stddev_1:0
Layer2/Layer2/w/max:0
Layer2/Layer2/w/min:0
Layer2/Layer2/w/histogram:0
Layer2/Layer2/b/mean:0
Layer2/Layer2/b/stddev_1:0
Layer2/Layer2/b/max:0
Layer2/Layer2/b/min:0
Layer2/Layer2/b/histogram:0
Loss/loss_value:0
BackProp/BackProp/gradients/Layer1/MatMul_grad/tuple/control_dependency_1/mean:0
BackProp/BackProp/gradients/Layer1/MatMul_grad/tuple/control_dependency_1/stddev_1:0
...
```

We define a function `get_batch_tensors(batch_size=128)` that returns two tensors: a batch of `batch_size` random input vectors and their target labels. Since these are tensors, each `sess.run()` on them draws a fresh batch.

```python
def get_batch_tensors(batch_size=128):
    # Standard-normal input vectors of 1024 components each.
    data = tf.random_normal([batch_size, 1024], mean=0, stddev=1)
    # Label 1 if the components sum to a non-negative value, else 0.
    labels = tf.cast(tf.reduce_sum(data, axis=1, keep_dims=True) >= 0, tf.int32)
    return data, labels
```

To save the graph structure and the summaries we attached to our tensors, we create a `tf.summary.FileWriter` with a log directory and a `tf.Graph` object.

```python
import os
import shutil

LOGDIR = '/tmp/TF-102-04'
# Start from an empty log directory, so that old runs don't show up in TensorBoard.
if os.path.isdir(LOGDIR):
    shutil.rmtree(LOGDIR)
os.mkdir(LOGDIR)

writer = tf.summary.FileWriter(logdir=LOGDIR, graph=tf.get_default_graph())
```

During training, each time we evaluate the summary op we pass its result to the `writer` together with the global step, so that we can follow how the monitored values evolve over time.

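```python
STEPS = 100
EVERY_STEPS = 10

# Build the random-batch ops once, outside the loop, instead of adding
# new ops to the graph at every step.
actual_input_tensor, actual_label_tensor = get_batch_tensors()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for i in range(STEPS):
        actual_input, actual_labels = sess.run([actual_input_tensor, actual_label_tensor])

        feed_dict = {
            input: actual_input,
            labels: actual_labels
        }

        # NOTE: train_op is not among the fetches, so the weights are never
        # updated here; add it to `fetches` to actually train the model.
        summary, loss, accuracy = sess.run(
            fetches=[summary_op, loss_op, accuracy_op],
            feed_dict=feed_dict)
        writer.add_summary(summary=summary, global_step=i)
        if i % EVERY_STEPS == 0:
            print('step %d: loss: %f, accuracy: %f' % (i, loss, accuracy))

# Make sure all pending events are written to disk for TensorBoard.
writer.flush()
```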

```
step 0: loss: 0.812866, accuracy: 0.476562
step 10: loss: 0.809555, accuracy: 0.445312
step 20: loss: 0.819426, accuracy: 0.515625
step 30: loss: 0.787941, accuracy: 0.515625
step 40: loss: 0.784940, accuracy: 0.531250
step 50: loss: 0.742575, accuracy: 0.492188
step 60: loss: 0.823784, accuracy: 0.453125
step 70: loss: 0.808247, accuracy: 0.507812
step 80: loss: 0.854534, accuracy: 0.460938
step 90: loss: 0.752069, accuracy: 0.507812
```


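Now we can run TensorBoard on the log directory we are writing to, either from a terminal or from a `%%bash` notebook cell, and open it in a browser (by default it listens on port 6006).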

```bash
tensorboard --logdir=/tmp/TF-102-04
```
