This walkthrough shows how to track and evaluate model training progress in real time. We use TensorFlow's logging capabilities and the Monitor API to audit the in-progress training of a neural network classifier that categorizes irises.
tf.contrib.learn offers a Monitor API designed to help us log metrics and evaluate our model while training is in progress. We'll enable logging in TensorFlow, set up a ValidationMonitor to do streaming evaluations, and visualize our metrics using TensorBoard.
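
Before the full example, it may help to see the idea of evaluating during training, with early stopping, in isolation. The sketch below is plain Python and is not the library's implementation: `train_with_validation` and `validation_loss` are hypothetical names introduced only for illustration, and the parameter names merely mirror the ValidationMonitor arguments used later. It evaluates every `every_n_steps` steps and stops once the loss has gone `early_stopping_rounds` steps without improving.

```python
def train_with_validation(validation_loss, total_steps=2000,
                          every_n_steps=50, early_stopping_rounds=200):
    """Simulate periodic validation with early stopping.

    validation_loss is a hypothetical callable that maps a step number to
    the validation loss at that step (a stand-in for a real evaluation).
    Returns (last_step_run, best_step, best_loss).
    """
    best_loss = float("inf")
    best_step = 0
    for step in range(1, total_steps + 1):
        # A real training step would run here.
        if step % every_n_steps != 0:
            continue  # only evaluate every `every_n_steps` steps
        loss = validation_loss(step)
        if loss < best_loss:
            # New best value: remember it and the step it occurred at.
            best_loss, best_step = loss, step
        elif step - best_step >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` steps: stop early.
            return step, best_step, best_loss
    return total_steps, best_step, best_loss


# A made-up loss curve that bottoms out at step 300 and then rises again.
result = train_with_validation(lambda step: abs(step - 300) / 100.0 + 0.1)
# result == (500, 300, 0.1): training stops 200 steps after the best loss.
```

With a loss that keeps improving, the same function runs all 2000 steps, which is the behaviour we would expect when early stopping never triggers.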

In [4]:
#Import 
import os
import urllib

import numpy as np
import tensorflow as tf

#Enable logging in TensorFlow. It uses five levels for log messages. In order of ascending severity, they are
#DEBUG, INFO, WARN, ERROR and FATAL.
#By default, TensorFlow logs at the WARN level.
tf.logging.set_verbosity(tf.logging.INFO)

#Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

def main():
    #If datasets aren't stored locally, download them.
    if not os.path.exists(IRIS_TRAINING):
        raw = urllib.urlopen(IRIS_TRAINING_URL).read()
        with open(IRIS_TRAINING, "w") as f:
            f.write(raw)
    if not os.path.exists(IRIS_TEST):
        raw = urllib.urlopen(IRIS_TEST_URL).read()
        with open(IRIS_TEST, "w") as f:
            f.write(raw)
    
    #Load datasets.
    #Datasets in tf.contrib.learn are named tuples; we can access feature data and target values via the data and
    #target fields.
    training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
                filename = IRIS_TRAINING,
                target_dtype = np.int,
                features_dtype = np.float32)
    test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
                filename = IRIS_TEST,
                target_dtype = np.int,
                features_dtype = np.float32)
    
    #Specify that all features have real-valued data.
    #feature_columns defines the model's feature columns, which specify the data type for the features in the
    #data set.
    feature_columns = [tf.contrib.layers.real_valued_column("", dimension = 4)]
    
    #Build a 3-layer DNN with 10, 20, and 10 units respectively.
    #tf.contrib.learn offers a variety of predefined models, called Estimators, which we can use "out of the box"
    #to run training and evaluation operations on our data.
    #model_dir is the path where checkpoint data is stored during model training.
    #The ValidationMonitor below relies on saved checkpoints to perform evaluation operations, so we pass a
    #tf.contrib.learn.RunConfig with save_checkpoints_secs=1 to save a checkpoint every second.
    classifier = tf.contrib.learn.DNNClassifier(feature_columns = feature_columns,
                                                hidden_units = [10, 20, 10],
                                                n_classes = 3,
                                                model_dir = "/localdisk/tmp/iris_model",
                                                config = tf.contrib.learn.RunConfig(save_checkpoints_secs=1))
    
    #Define the training inputs
    def get_train_inputs():
        x = tf.constant(training_set.data)
        y = tf.constant(training_set.target)
        return x,y
    
        
    #tf.contrib.learn provides several high-level Monitors that we can attach to our fit operations to further
    #track metrics and/or debug lower-level TensorFlow operations during model training, including:
    #CaptureVariable: Saves a specified variable's values into a collection every n steps of training;
    #PrintTensor: Logs a specified tensor's values every n steps of training;
    #SummarySaver: Saves tf.Summary protocol buffers for a given tensor using a tf.summary.FileWriter every
    #n steps of training;
    
    #ValidationMonitor: Logs a specified set of evaluation metrics every n steps of training and, if desired,
    #implements early stopping under certain conditions.
    #While logging training loss, we also evaluate against test data at the same time to see how well the model
    #is generalizing. We do this by configuring a ValidationMonitor with the test data (test_set.data and
    #test_set.target) and setting how often to evaluate with every_n_steps. The default value of every_n_steps
    #is 100; here, we set every_n_steps to 50.
    
    #By default, if no evaluation metrics are specified, ValidationMonitor logs both loss and accuracy, but we
    #can customize the list of metrics that run every 50 steps. To specify the exact metrics, we add a
    #metrics param to the ValidationMonitor constructor. metrics takes a dict of key/value pairs, where each key
    #is the name we'd like logged for the metric, and the corresponding value is a MetricSpec object.
    #The MetricSpec constructor accepts four parameters:
    #a. metric_fn: the function that calculates and returns the value of a metric. It can be a predefined function
    ##available in tf.contrib.metrics, such as 'tf.contrib.metrics.streaming_precision' or
    ##'tf.contrib.metrics.streaming_recall'; we can also define our own custom metric function, which must take
    ##predictions and labels tensors as arguments. The function must return the value of the metric in one of two
    ##formats: a single tensor, or a pair of ops (value_op, update_op), where value_op returns the metric value
    ##and update_op performs a corresponding operation to update internal model state.
    #b. prediction_key: the key of the tensor containing the predictions returned by the model.
    #c. label_key: the key of the tensor containing the labels returned by the model, as specified by the model's
    ##input_fn.
    #d. weights_key: the key of the tensor (returned by the input_fn) containing weights inputs for the metric_fn.
    validation_metrics = {
        "accuracy":
        tf.contrib.learn.MetricSpec(
            tf.contrib.metrics.streaming_accuracy,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
        "precision":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_precision,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
        "recall":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_recall,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES)        
    }
    
    validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
        test_set.data,
        test_set.target,
        every_n_steps=50,
        metrics=validation_metrics,
        #The settings below mean that if "loss" does not decrease over a period of 200 steps, model training
        #stops at that point instead of completing the full 2000 steps specified in 'fit'.
        early_stopping_metric="loss",
        early_stopping_metric_minimize=True,
        early_stopping_rounds=200)
    
    #Fit model.
    #The tf.contrib.learn API uses input functions, which create the TensorFlow operations that generate data
    #for the model.
    #With the DNN classifier configured, we fit it to the Iris training data using the fit method.
    #The state of the model is preserved in the classifier, which means we can train iteratively if we like.
    #We attach the ValidationMonitor to track the model while it trains.
    classifier.fit(input_fn = get_train_inputs, steps = 2000, monitors = [validation_monitor])
    
    #Define the test inputs
    def get_test_inputs():
        x = tf.constant(test_set.data)
        y = tf.constant(test_set.target)
        
        return x,y
    
    #Evaluate accuracy
    #Once the model has been fit, we can use evaluate to check its accuracy on the test data.
    accuracy_score = classifier.evaluate(input_fn = get_test_inputs, steps = 1)["accuracy"]
    
    print("\nTest Accuracy: {0:f}\n".format(accuracy_score))
    
    #Classify two new flower samples.
    def new_samples():
        return np.array(
            [[6.4, 3.2, 4.5, 1.5],
             [5.8, 3.1, 5.0, 1.7]])
    
    #We can also use the predict method to classify new samples.
    predictions = list(classifier.predict(input_fn = new_samples))
    print("New Samples, Class Predictions: {}\n".format(predictions))

main()

INFO:tensorflow:Using config: {'_model_dir': None, '_save_checkpoints_secs': 1, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x694fe90>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': ''}
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Cr


Test Accuracy: 0.966667

Instructions for updating:
Please switch to predict_classes, or set `outputs` argument.
INFO:tensorflow:Restoring parameters from /localdisk/tmp/iris_model/model.ckpt-4701
New Samples, Class Predictions: [1, 2]
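
The metric functions passed to MetricSpec above are "streaming" metrics: they return a (value_op, update_op) pair, so a result can be accumulated batch by batch. The sketch below illustrates that pattern in plain Python, with no TensorFlow involved. The class `StreamingPrecisionRecall` and its methods are hypothetical names introduced here, not part of any library. It also assumes that inputs are treated as booleans (any nonzero class counts as positive). As far as we understand, the TensorFlow precision and recall metrics cast predictions and labels to bool in the same way, so on the three-class iris labels they would treat class 0 as negative and classes 1 and 2 as positive; that is worth verifying before relying on those two numbers.

```python
class StreamingPrecisionRecall(object):
    """Accumulate true/false positive counts across batches (illustrative only)."""

    def __init__(self):
        self.true_pos = 0
        self.false_pos = 0
        self.false_neg = 0

    def update(self, predictions, labels):
        """Plays the role of update_op: fold one batch into the running counts."""
        for pred, label in zip(predictions, labels):
            # Assumption: any nonzero class id counts as "positive".
            pred, label = bool(pred), bool(label)
            if pred and label:
                self.true_pos += 1
            elif pred and not label:
                self.false_pos += 1
            elif label and not pred:
                self.false_neg += 1

    def value(self):
        """Plays the role of value_op: compute metrics from the current counts."""
        predicted_pos = self.true_pos + self.false_pos
        actual_pos = self.true_pos + self.false_neg
        precision = self.true_pos / float(predicted_pos) if predicted_pos else 0.0
        recall = self.true_pos / float(actual_pos) if actual_pos else 0.0
        return precision, recall


metric = StreamingPrecisionRecall()
metric.update(predictions=[0, 1, 2], labels=[0, 1, 1])  # first batch
metric.update(predictions=[1, 0, 2], labels=[0, 2, 2])  # second batch
precision, recall = metric.value()  # (0.75, 0.75) over both batches
```

Keeping the update and the read-out separate is what lets an evaluation run over many batches without holding all predictions in memory.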

