# TensorFlow Mechanics 101

#### Changes Chong Min made

* Optimizer: from Gradient Descent to Adam
* batch size: set to 10
* size of hidden layer: 8
* weight initializer: random_normal_initializer
* max steps: 20000

Among the above changes, `batch size`, `optimizer`, and `max steps` values were critical.
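To make the optimizer change concrete, here is a minimal pure-Python sketch of the two update rules on a toy one-dimensional problem. The quadratic objective, the starting point, and the helper names (`grad`, `run_gradient_descent`, `run_adam`) are illustrative assumptions and are not part of `utils`. The Adam constants are the commonly used defaults, and the learning rate matches the notebook's `0.01`.

```python
import math


def grad(x):
    # Gradient of the toy objective f(x) = (x - 3)^2 (an illustrative assumption).
    return 2.0 * (x - 3.0)


def run_gradient_descent(x, learning_rate, steps):
    # Plain gradient descent: step against the raw gradient.
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def run_adam(x, learning_rate, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: keep running averages of the gradient (m) and squared gradient (v),
    # correct their start-up bias, and scale the step by sqrt(v_hat).
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_gd = run_gradient_descent(0.0, learning_rate=0.01, steps=2000)
x_adam = run_adam(0.0, learning_rate=0.01, steps=2000)
print("gradient descent:", x_gd)
print("adam:", x_adam)
```

Both runs end near the minimum at `x = 3`. The toy problem only illustrates how the per-step update differs; it says nothing about which optimizer suits the e-rater data.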

Because batches are generated randomly, the accuracies fluctuate between runs. This should be fixed.
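One possible way to make runs comparable is to seed the batch sampler. The sketch below is a stand-alone illustration using only the standard library. `make_batch_sampler` is a hypothetical helper, not the actual sampling code inside `DataSet`, and the sizes (4000 examples, batches of 10) are taken from this notebook's settings and output.

```python
import random


def make_batch_sampler(num_examples, batch_size, seed):
    # A private Random instance keeps the sampling independent of any other
    # code that uses the global `random` state.
    rng = random.Random(seed)

    def next_batch_indices():
        # Draw `batch_size` distinct example indices for one training step.
        return rng.sample(range(num_examples), batch_size)

    return next_batch_indices


# Two samplers built with the same seed produce the same sequence of batches.
sampler_a = make_batch_sampler(4000, 10, seed=42)
sampler_b = make_batch_sampler(4000, 10, seed=42)
first_a = sampler_a()
first_b = sampler_b()
print(first_a == first_b)
```

Seeding the sampler only fixes the order of the batches. The random weight initialization would also need a fixed seed (in TensorFlow 1.x, for example, a graph-level seed set with `tf.set_random_seed`) before two runs could be expected to match.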

- This tutorial is meant as a companion to the code [here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials/mnist/)
- The goal of this tutorial is to show how to use TensorFlow to train and evaluate a simple feed-forward neural network for handwritten digit classification using the (classic) MNIST data set. 

- [`mnist.py`](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist.py), the code for making a fully-connected MNIST model
- [`fully_connected_feed.py`](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/fully_connected_feed.py), the main code to train the built MNIST model against the downloaded dataset using a feed dictionary.
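As a rough illustration of what "using a feed dictionary" means, the sketch below builds a dictionary that maps each input slot to the next slice of data. `ToyDataSet` and `fill_feed_dict_sketch` are hypothetical stand-ins written for this example; they are not the `DataSet` and `fill_feed_dict` imported from `utils`. In TensorFlow 1.x the keys would be placeholder tensors, whereas plain strings are used here so the example runs without TensorFlow.

```python
class ToyDataSet:
    """Hypothetical dataset that hands out consecutive batches."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels
        self._cursor = 0

    def next_batch(self, batch_size):
        start = self._cursor
        end = start + batch_size
        # Wrap around to the beginning once the data is exhausted.
        if end > len(self.features):
            start, end = 0, batch_size
        self._cursor = end
        return self.features[start:end], self.labels[start:end]


def fill_feed_dict_sketch(data_set, inputs_key, labels_key, batch_size):
    # Pair each input slot with the values it should take for this step.
    features, labels = data_set.next_batch(batch_size)
    return {inputs_key: features, labels_key: labels}


toy = ToyDataSet(features=[[0.1 * i, 0.2 * i] for i in range(6)],
                 labels=[i % 3 for i in range(6)])
feed = fill_feed_dict_sketch(toy, "inputs_placeholder", "labels_placeholder",
                             batch_size=2)
print(feed["labels_placeholder"])
```

A fresh dictionary is built on every training step, so each call to `sess.run` sees a different batch.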

In [1]:
import sys
import math
import time
import random
from os import getcwd
from os.path import join, dirname

import tensorflow as tf
import numpy as np

sys.path.append(join(dirname(getcwd()), "src"))
from utils import (read_data, DataSet, inference, loss, training_adam,
                   training_gradient_descent, evaluation, fill_feed_dict,
                   do_eval)

In [2]:
data_path = join(dirname(getcwd()), "data", "test_data_revised")

## Using a Custom e-rater Data-set

In [3]:
# Read in data

# NOTE: Download test_data_revised.zip (in email since it can't be shared)
# and save it somewhere, preferably in the "tutorial_notebooks/data"
# directory. If it is somewhere else, just make sure to pass in the path when
# this function is used.

# Choose "micro" or "macro". This changes the set of features we use:
# there are 220 "micro" features in total, while there are only 9 "macro"
# features.
dataset_type = "macro"
# dataset_type = "micro"

(train_ids, train_features, train_labels,
 test_ids, test_features, test_labels,
 dev_ids, dev_features, dev_labels) = read_data(data_path,
                                                macro_or_micro=dataset_type,
                                                dev_set=False)
#random_sampler = False
random_sampler = True
train_data = DataSet(train_ids, train_features, train_labels, random_=random_sampler)
test_data = DataSet(test_ids, test_features, test_labels)
if dev_labels is not None:
    dev_data = DataSet(dev_ids, dev_features, dev_labels)

### Data and Data Shape

In [4]:
show_data = False
#show_data = True

In [5]:
if show_data:
    print("Shape of data:\n\tTraining: {}\n\t{}Test: {}"
          .format(train_features.shape,
                  "" if dev_features is None
                     else "Development: {}\n\t".format(dev_features.shape),
                  test_features.shape))
    print("Shape of labels data:\n\tTraining: {}\n\t{}Test: {}"
          .format(train_labels.shape,
                  "" if dev_labels is None
                     else "Development: {}\n\t".format(dev_labels.shape),
                  test_labels.shape))

In [6]:
# What the features look like
if show_data:
    # A bare expression inside an `if` block is not displayed, so print it.
    print(train_features.head())

In [7]:
if show_data:
    # A bare expression inside an `if` block is not displayed, so print it.
    print(test_features.head())

In [8]:
# Labels are on a 0 to 5 scale (scores 1 to 6)
if show_data:
    # A bare expression inside an `if` block is not displayed, so print it.
    print(train_labels[:10])

In [9]:
if show_data:
    train_labels = np.array(train_labels, dtype=np.float32)
    test_labels = np.array(test_labels, dtype=np.float32)

In [10]:
if show_data:
    if dev_labels is not None:
        print(dev_labels.value_counts())

In [None]:
# Define some parameters
log_dir_path = join(getcwd(), "logs")
max_steps = 20000
optimizer_type = "adam"
#optimizer_type = "gradient descent"
if dataset_type == "macro":
    learning_rate = 0.01
    hidden1 = 8
    hidden2 = 8
    hidden3 = None
    NUM_FEATURES = 9
    batch_size = 10
else:
    learning_rate = 0.01
    hidden1 = 512
    hidden2 = 128
    hidden3 = 16
    NUM_FEATURES = 220
    batch_size = 200
NUM_CLASSES = 6
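The two configurations above differ greatly in model size. The sketch below counts trainable parameters under the assumption that `inference` stacks fully connected layers with one bias per unit (hidden layers followed by a linear output layer, as in the original MNIST tutorial). That assumption is not verified against `utils`, and `count_dense_parameters` is a hypothetical helper.

```python
def count_dense_parameters(layer_sizes):
    # Each dense layer has fan_in * fan_out weights plus fan_out biases.
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


# Input size, hidden sizes, and output size taken from the parameter cell.
macro_layers = [9, 8, 8, 6]
micro_layers = [220, 512, 128, 16, 6]

macro_params = count_dense_parameters(macro_layers)
micro_params = count_dense_parameters(micro_layers)
print("macro:", macro_params)  # 206
print("micro:", micro_params)  # 180982
```

Under that assumption the "micro" network has roughly 900 times as many parameters as the "macro" one, which is one reason the two settings use different batch sizes and layer widths.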

In [None]:
# Tell TensorFlow that the model will be built into the default Graph.
with tf.Graph().as_default():

    # Generate placeholders for the input feature data and labels.
    inputs_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                           NUM_FEATURES))
    labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size,))

    # Build a Graph that computes predictions from the inference model.
    logits = inference(inputs_placeholder,
                       NUM_FEATURES,
                       NUM_CLASSES,
                       hidden1,
                       hidden2,
                       hidden3_units=hidden3)

    # Add to the Graph the Ops for loss calculation.
    loss_ = loss(logits, labels_placeholder)

    # Add to the Graph the Ops that calculate and apply gradients.
    if optimizer_type == "adam":
        train_op = training_adam(loss_, learning_rate)
    elif optimizer_type == "gradient descent":
        train_op = training_gradient_descent(loss_, learning_rate)
    else:
        raise ValueError("Choose either \"adam\" or \"gradient descent\" for "
                         "`optimizer_type`.")

    # Add the Op to compare the logits to the labels during evaluation.
    eval_correct = evaluation(logits, labels_placeholder)

    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all()

    # Add the variable initializer Op.
    init = tf.global_variables_initializer()

    # Create a saver for writing training checkpoints.
    saver = tf.train.Saver()

    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # Instantiate a SummaryWriter to output summaries and the Graph.
    summary_writer = tf.summary.FileWriter(log_dir_path, sess.graph)

    # And then after everything is built:

    # Run the Op to initialize the variables.
    sess.run(init)

    # Start the training loop.
    for step in range(max_steps):
        start_time = time.time()

        # Fill a feed dictionary with the actual batch of features and labels
        # for this particular training step.
        feed_dict = fill_feed_dict(train_data,
                                   inputs_placeholder,
                                   labels_placeholder,
                                   batch_size)

        # Run one step of the model.  The return values are the activations
        # from the `train_op` (which is discarded) and the `loss` Op.  To
        # inspect the values of your Ops or variables, you may include them
        # in the list passed to sess.run() and the value tensors will be
        # returned in the tuple from the call.
        _, loss_value = sess.run([train_op, loss_],
                                 feed_dict=feed_dict)

        duration = time.time() - start_time

        # Write the summaries and print an overview fairly often.
        if step % 100 == 0:

            # Print status to stdout.
            print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
            # Update the events file.
            summary_str = sess.run(summary, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)
            summary_writer.flush()
        
        # Save a checkpoint and evaluate the model periodically.
        if (step + 1) % 1000 == 0 or (step + 1) == max_steps:
            checkpoint_file = join(log_dir_path, 'model.ckpt')
            saver.save(sess, checkpoint_file, global_step=step)

            # Evaluate against the training set.
            print('Train Data Eval:')
            do_eval(sess,
                    eval_correct,
                    inputs_placeholder,
                    labels_placeholder,
                    train_data,
                    logits,
                    batch_size)

            # Evaluate against the development set.
            if dev_labels is not None:
                print('Development Data Eval:')
                do_eval(sess,
                        eval_correct,
                        inputs_placeholder,
                        labels_placeholder,
                        dev_data,
                        logits,
                        batch_size)

            # Evaluate against the test set.
            print('Test Data Eval:')
            do_eval(sess,
                    eval_correct,
                    inputs_placeholder,
                    labels_placeholder,
                    test_data,
                    logits,
                    batch_size)

Step 0: loss = 1.79 (0.005 sec)
Step 100: loss = 1.62 (0.001 sec)
Step 200: loss = 0.90 (0.001 sec)
Step 300: loss = 0.87 (0.001 sec)
Step 400: loss = 1.54 (0.001 sec)
Step 500: loss = 1.61 (0.001 sec)
Step 600: loss = 0.88 (0.001 sec)
Step 700: loss = 0.86 (0.001 sec)
Step 800: loss = 1.51 (0.001 sec)
Step 900: loss = 1.59 (0.001 sec)
Train Data Eval:
  Num examples: 4000  Num correct: 2086  Accuracy @ 1: 0.5215
Test Data Eval:
  Num examples: 2750  Num correct: 1395  Accuracy @ 1: 0.5073
Step 1000: loss = 0.89 (0.002 sec)
Step 1100: loss = 0.85 (0.002 sec)
Step 1200: loss = 1.42 (0.002 sec)
Step 1300: loss = 1.45 (0.001 sec)
Step 1400: loss = 0.85 (0.001 sec)
Step 1500: loss = 0.64 (0.001 sec)
Step 1600: loss = 1.21 (0.002 sec)
Step 1700: loss = 1.25 (0.001 sec)
Step 1800: loss = 0.67 (0.001 sec)
Step 1900: loss = 0.53 (0.001 sec)
Train Data Eval:
  Num examples: 4000  Num correct: 2437  Accuracy @ 1: 0.6092
Test Data Eval:
  Num examples: 2750  Num correct: 1694  Accuracy @ 1: 0.616
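
Two numbers in this log can be checked by hand. The reported accuracy is simply the number of correct predictions divided by the number of examples. The step-0 loss of about 1.79 is what a classifier that assigns equal probability to all 6 classes would score, assuming `loss` computes a mean softmax cross-entropy (an assumption about `utils.loss`, not something shown in this notebook). `accuracy_at_1` is a hypothetical helper for illustration.

```python
import math


def accuracy_at_1(num_correct, num_examples):
    # Fraction of examples whose top-scoring class equals the label.
    return num_correct / num_examples


# Counts copied from the evaluation after step 1999 in the log above.
train_acc = accuracy_at_1(2437, 4000)
test_acc = accuracy_at_1(1694, 2750)

# Cross-entropy of a uniform guess over NUM_CLASSES = 6 classes.
chance_loss = math.log(6)

print("train accuracy: %.5f" % train_acc)
print("test accuracy: %.5f" % test_acc)
print("chance-level loss: %.2f" % chance_loss)
```

The per-step losses swing between roughly 0.5 and 1.6 because each value is computed on a single batch of 10 examples, so individual readings are noisy.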