# Classifying Handwritten Digits with Neural Networks

![img](https://www.tensorflow.org/versions/r0.11/images/MNIST.png)

In this exercise, we will classify handwritten digits using the classic MNIST data set. Our goal is to map each input image to the correct numeric digit. We will create a neural network (NN) with a few hidden layers and a softmax layer at the top to select the winning class.
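
As a rough illustration of what a softmax layer does, the NumPy sketch below turns a vector of raw scores into class probabilities and picks the most likely class. The function name `softmax` and the example scores are made up for illustration; this is not TensorFlow's implementation.

```python
import numpy as np

def softmax(logits):
    """Converts raw scores into probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical raw scores for the ten digit classes 0-9.
example_logits = np.array([1.0, 0.5, 0.1, 3.2, 0.0, 0.3, 0.2, 0.9, 1.1, 0.4])
probabilities = softmax(example_logits)

# The "winning class" is the one with the highest probability.
predicted_digit = int(np.argmax(probabilities))
print(predicted_digit)
```

Because softmax preserves the ordering of the scores, the winning class is the same as the class with the largest raw score; the probabilities matter mainly for computing the loss.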

First, let's import TensorFlow and other utilities, and load the data set. Note that this data is a sample of the original MNIST training data; we've taken 20,000 rows at random.

In [None]:
!wget https://storage.googleapis.com/ml_universities/mnist_train_small.csv -O /tmp/mnist_train_small.csv

In [None]:
import io
import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import metrics
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

mnist_dataframe = pd.read_csv(
  io.open("/tmp/mnist_train_small.csv", "r"),
  sep=",",
  header=None)

mnist_dataframe = mnist_dataframe.reindex(np.random.permutation(mnist_dataframe.index))
mnist_dataframe.head()

The first column contains the class label. The remaining columns contain the feature values, one for each of the `28×28=784` pixels. Most of these 784 pixel values are zero; you may want to take a minute to confirm that they're not *all* zero.
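
One possible way to run that check is sketched below on a small stand-in frame. The name `toy_dataframe` and its contents are hypothetical; in the notebook you would apply the same expressions to `mnist_dataframe`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mnist_dataframe: 3 rows, label in column 0,
# followed by 784 pixel columns that are mostly zero.
pixels = np.zeros((3, 784), dtype=int)
pixels[0, 100] = 255
pixels[2, 400] = 128
toy_dataframe = pd.DataFrame(np.column_stack([[7, 2, 1], pixels]))

# Select the pixel columns by label (1 through 784, inclusive).
pixel_columns = toy_dataframe.loc[:, 1:784]

fraction_zero = (pixel_columns == 0).values.mean()
has_nonzero = bool((pixel_columns != 0).values.any())
print("Fraction of zero pixels: %.4f" % fraction_zero)
print("Any non-zero pixel: %s" % has_nonzero)
```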

![img](https://www.tensorflow.org/versions/r0.11/images/MNIST-Matrix.png)

These examples are relatively low-resolution, high-contrast images of handwritten numbers. The ten digits `0-9` are each represented, with a unique class label for each possible digit. Thus, this is a multi-class classification problem with 10 classes.

Now, let's parse out the labels and features and look at a few examples as a sanity check. Note the use of `loc`, which selects columns by label. Because this data set has no header row, pandas labels the columns with integers starting at 0, so each label matches the column's original position.
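
The difference between label-based and position-based slicing is easy to miss, so here is a minimal sketch on a made-up five-column frame (`toy` is a hypothetical name, not part of the exercise).

```python
import pandas as pd

# Hypothetical frame with integer column labels 0..4, as produced by header=None.
toy = pd.DataFrame([[5, 0, 10, 20, 30],
                    [3, 1, 11, 21, 31]])

# Label-based slice: both endpoints (labels 1 and 3) are included.
by_label = toy.loc[:, 1:3]

# Position-based slice: the end position (3) is excluded.
by_position = toy.iloc[:, 1:3]

print(list(by_label.columns))
print(list(by_position.columns))
```

This is why a `loc` slice of `1:784` covers all 784 pixel columns, while an `iloc` slice with the same bounds would drop the last one.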

In [None]:
def parse_labels_and_features(dataset):
  """Extracts labels and features.
  
  This is a good place to scale or transform the features if needed.
  
  Args:
    dataset: A Pandas `DataFrame`, containing the label in the first column and
      monochrome pixel values in the remaining columns, in row-major order.
  Returns:
    A `tuple` `(labels, features)`:
      labels: A Pandas `Series`.
      features: A Pandas `DataFrame`.
  """
  labels = dataset[0]

  # DataFrame.loc index ranges are inclusive at both ends.
  features = dataset.loc[:,1:784]
  # Scale the data to [0, 1] by dividing out the max value, 255.
  features = features / 255

  return labels, features

In [None]:
training_targets, training_examples = parse_labels_and_features(mnist_dataframe.head(15000))
training_examples.describe()

In [None]:
validation_targets, validation_examples = parse_labels_and_features(mnist_dataframe.tail(5000))
validation_examples.describe()

Show a random example and its corresponding label.

In [None]:
rand_example = np.random.choice(training_examples.index)
_, ax = plt.subplots()
# Use label-based .loc; the older .ix indexer is deprecated in pandas.
ax.matshow(training_examples.loc[rand_example].values.reshape(28, 28))
ax.set_title("Label: %i" % training_targets.loc[rand_example])
ax.grid(False)

### Task 1: Build a linear model for MNIST.

First, let's create a baseline model to compare against. The `LinearClassifier` provides a set of *k* one-vs-all classifiers, one for each of the *k* classes.

You'll notice that in addition to reporting accuracy and plotting log-loss over time, we also show a [**confusion matrix**](https://en.wikipedia.org/wiki/Confusion_matrix). The confusion matrix shows which classes were misclassified as other classes. Which digits get confused for each other?
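
To make the idea concrete, the sketch below builds a confusion matrix by hand for a made-up 3-class problem and normalizes it by row, which is the same normalization used for the plot in this exercise. The helper `confusion_matrix` here is a simplified illustration, not scikit-learn's implementation.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Counts how often each true class (row) was predicted as each class (column)."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true, predicted] += 1
    return matrix

# Hypothetical labels for a 3-class problem.
true_labels = [0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 2, 2, 0, 2]

counts = confusion_matrix(true_labels, predicted_labels, num_classes=3)

# Divide each row by its total so that every row sums to 1.
normalized = counts.astype("float") / counts.sum(axis=1)[:, np.newaxis]
print(counts)
print(normalized)
```

Entries on the diagonal are correct predictions; a large off-diagonal entry in row *i*, column *j* means class *i* is often mistaken for class *j*.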

Also note that we track the model's error using the `log_loss` function. This is not to be confused with the loss function internal to `LinearClassifier` that is used for training.
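
For intuition about the reported number, multi-class log loss is the mean negative log of the probability the model assigned to the correct class. The sketch below is a simplified NumPy version with made-up predictions; the name `multiclass_log_loss` is hypothetical, and it does not reproduce every detail of `metrics.log_loss`.

```python
import numpy as np

def multiclass_log_loss(true_labels, predicted_probabilities, epsilon=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    # Clip to avoid taking log(0) for overconfident wrong predictions.
    probabilities = np.clip(predicted_probabilities, epsilon, 1 - epsilon)
    row_indices = np.arange(len(true_labels))
    true_class_probabilities = probabilities[row_indices, true_labels]
    return float(-np.mean(np.log(true_class_probabilities)))

# Hypothetical predictions for three examples over three classes.
true_labels = np.array([0, 2, 1])
predicted_probabilities = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
])
loss = multiclass_log_loss(true_labels, predicted_probabilities)
print("%.4f" % loss)
```

Lower is better: a model that puts probability close to 1 on the correct class for every example has a log loss close to 0.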

In [None]:
def create_training_input_fn(features, labels, batch_size):
  """A custom input_fn for sending mnist data to the estimator for training.

  Args:
    features: The training features.
    labels: The training labels.
    batch_size: Batch size to use during training.

  Returns:
    A function that returns batches of training features and labels during
    training.
  """
  def _input_fn():
    raw_features = tf.constant(features.values)
    raw_targets = tf.constant(labels.values)
    dataset_size = len(features)

    return tf.train.shuffle_batch(
        [raw_features, raw_targets],
        batch_size=batch_size,
        enqueue_many=True,
        capacity=2 * dataset_size,  # Must be greater than min_after_dequeue.
        min_after_dequeue=dataset_size)  # Important to ensure uniform randomness.

  return _input_fn

def create_predict_input_fn(features, labels):
  """A custom input_fn for sending mnist data to the estimator for predictions.

  Args:
    features: The features to base predictions on.
    labels: The labels of the prediction examples.

  Returns:
    A function that returns features and labels for predictions.
  """
  def _input_fn():
    raw_features = tf.constant(features.values)
    raw_targets = tf.constant(labels.values)
    return tf.train.limit_epochs(raw_features, 1), raw_targets

  return _input_fn

def train_linear_classification_model(
    learning_rate,
    steps,
    batch_size,
    training_examples,
    training_targets,
    validation_examples,
    validation_targets):
  """Trains a linear classification model for the MNIST digits dataset.
  
  In addition to training, this function also prints training progress information,
  a plot of the training and validation loss over time, and a confusion
  matrix.
  
  Args:
    learning_rate: A `float`, the learning rate to use.
    steps: A non-zero `int`, the total number of training steps. A training step
      consists of a forward and backward pass using a single batch.
    batch_size: A non-zero `int`, the batch size.
    training_examples: A `DataFrame` containing the training features.
    training_targets: A `Series` containing the training labels.
    validation_examples: A `DataFrame` containing the validation features.
    validation_targets: A `Series` containing the validation labels.
      
  Returns:
    The trained `LinearClassifier` object.
  """

  periods = 10
  # Integer division keeps steps_per_period an int under Python 2 and 3.
  steps_per_period = steps // periods
  
  # Create the input functions.
  predict_training_input_fn = create_predict_input_fn(
    training_examples, training_targets)
  predict_validation_input_fn = create_predict_input_fn(
    validation_examples, validation_targets)
  training_input_fn = create_training_input_fn(
    training_examples, training_targets, batch_size)

  # Create a linear classifier object.
  feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(
      training_examples)
  classifier = tf.contrib.learn.LinearClassifier(
      feature_columns=feature_columns,
      n_classes=10,
      optimizer=tf.train.AdagradOptimizer(learning_rate=learning_rate),
      gradient_clip_norm=5.0
  )

  # Train the model, but do so inside a loop so that we can periodically assess
  # loss metrics.
  print("Training model...")
  print("LogLoss error (on validation data):")
  training_errors = []
  validation_errors = []
  for period in range(periods):
    # Train the model, starting from the prior state.
    classifier.fit(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    # Take a break and compute predictions.
    training_predictions = list(classifier.predict_proba(input_fn=predict_training_input_fn))
    validation_predictions = list(classifier.predict_proba(input_fn=predict_validation_input_fn))
    # Compute training and validation errors.
    training_log_loss = metrics.log_loss(training_targets, training_predictions)
    validation_log_loss = metrics.log_loss(validation_targets, validation_predictions)
    # Occasionally print the current loss.
    print("  period %02d : %0.2f" % (period, validation_log_loss))
    # Add the loss metrics from this period to our list.
    training_errors.append(training_log_loss)
    validation_errors.append(validation_log_loss)
  print("Model training finished.")

  # Calculate final predictions (not probabilities, as above).
  final_predictions = list(classifier.predict(input_fn=predict_validation_input_fn))
  accuracy = metrics.accuracy_score(validation_targets, final_predictions)
  print("Final accuracy (on validation data): %0.2f" % accuracy)

  # Output a graph of loss metrics over periods.
  plt.ylabel("LogLoss")
  plt.xlabel("Periods")
  plt.title("LogLoss vs. Periods")
  plt.plot(training_errors, label="training")
  plt.plot(validation_errors, label="validation")
  plt.legend()
  plt.show()
  
  # Output a plot of the confusion matrix. The variable is named `confusion`
  # rather than `cm` to avoid shadowing the imported matplotlib `cm` module.
  confusion = metrics.confusion_matrix(validation_targets, final_predictions)
  # Normalize the confusion matrix by row (i.e., by the number of samples
  # in each class).
  confusion_normalized = confusion.astype("float") / confusion.sum(axis=1)[:, np.newaxis]
  ax = sns.heatmap(confusion_normalized, cmap="bone_r")
  ax.set_aspect(1)
  plt.title("Confusion matrix")
  plt.ylabel("True label")
  plt.xlabel("Predicted label")
  plt.show()

  return classifier

**Spend 5 minutes seeing how well you can do on accuracy with a linear model of this form. For this exercise, limit yourself to experimenting with the hyperparameters for batch size, learning rate and steps.**

Stop if you get anything above about 0.9 accuracy.
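
When adjusting `steps` and `batch_size`, it can help to work out how much of the training data the model actually sees. The sketch below does that arithmetic for the starting values used in this exercise; the helper names are hypothetical, and the epoch count is only approximate because batches are sampled at random.

```python
def examples_seen(steps, batch_size):
    """Number of examples processed across all training steps."""
    return steps * batch_size

def epochs_completed(steps, batch_size, num_training_examples):
    """Approximate number of passes over the training set."""
    return float(steps * batch_size) / num_training_examples

seen = examples_seen(steps=100, batch_size=10)
epochs = epochs_completed(steps=100, batch_size=10, num_training_examples=15000)
print("Examples seen: %d" % seen)
print("Epochs: %.3f" % epochs)
```

With 100 steps and a batch size of 10, the model sees only 1,000 of the 15,000 training examples, which suggests there is room to increase either value.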

In [None]:
_ = train_linear_classification_model(
    learning_rate=0.02,
    steps=100,
    batch_size=10,
    training_examples=training_examples,
    training_targets=training_targets,
    validation_examples=validation_examples,
    validation_targets=validation_targets)

### Task 2: Replace the Linear Classifier with a Neural Network.

**Replace the LinearClassifier above with a [DNNClassifier](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/estimators/dnn.py#L298) and find a parameter combination that gives 0.95 or better accuracy.**

You may wish to experiment with additional regularization methods, such as dropout. These additional regularization methods are documented in the comments for the `DNNClassifier` class.
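
For readers unfamiliar with dropout, here is a minimal NumPy sketch of one common formulation, often called inverted dropout. The function `apply_dropout` and its arguments are hypothetical, and this is not necessarily how `DNNClassifier` implements dropout internally.

```python
import numpy as np

def apply_dropout(activations, drop_probability, rng):
    """Zeroes a random subset of activations and rescales the survivors."""
    keep_probability = 1.0 - drop_probability
    # Each unit is kept independently with probability keep_probability.
    keep_mask = rng.random_sample(activations.shape) < keep_probability
    # Rescaling keeps the expected value of each activation unchanged.
    return activations * keep_mask / keep_probability

rng = np.random.RandomState(0)
hidden_activations = np.ones((4, 5))
dropped = apply_dropout(hidden_activations, drop_probability=0.5, rng=rng)
print(dropped)
```

Dropout is applied only during training; at prediction time all units are kept, which is why the surviving activations are scaled up during training.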

In [None]:
#
# Your code here: Replace the linear classifier with a neural network.
#

Once you have a good model, double-check that you didn't overfit the validation set by evaluating on the test data that we'll load below.
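
Accuracy is simply the fraction of predictions that match the true labels. A minimal sketch with made-up labels follows; the function `accuracy` and the example values are hypothetical, and in the notebook the inputs would be the test targets and the trained classifier's predictions.

```python
import numpy as np

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels == predicted_labels))

# Hypothetical labels standing in for test targets and model predictions.
example_targets = [7, 2, 1, 0, 4]
example_predictions = [7, 2, 1, 0, 9]
example_accuracy = accuracy(example_targets, example_predictions)
print("Accuracy: %.2f" % example_accuracy)
```

A test accuracy noticeably lower than the validation accuracy would suggest that the hyperparameters were tuned too closely to the validation set.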


In [None]:
!wget https://storage.googleapis.com/ml_universities/mnist_test.csv -O /tmp/mnist_test.csv

In [None]:
mnist_test_dataframe = pd.read_csv(
  io.open("/tmp/mnist_test.csv", "r"),
  sep=",",
  header=None)

test_targets, test_examples = parse_labels_and_features(mnist_test_dataframe)
test_examples.describe()

In [None]:
#
# Your code here: Calculate accuracy on the test set.
#

### Task 3: Visualize the weights of the first hidden layer.

Let's take a few minutes to dig into our neural network and see what it has learned by accessing the `weights_` attribute of our model.

The input layer of our model has `784` inputs, one for each pixel of the `28×28` input images. The first hidden layer will have `784×N` weights, where `N` is the number of nodes in that layer. We can turn those weights back into `28×28` images by *reshaping* each of the `N` `1×784` arrays of weights into an array of size `28×28`.
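
The reshaping step can be sketched in isolation with a stand-in weight matrix. Here `toy_weights` and the layer size of 3 are made up for illustration; the real matrix comes from the trained model.

```python
import numpy as np

num_nodes = 3  # Hypothetical size of the first hidden layer (N).

# Stand-in for the learned 784 x N weight matrix.
toy_weights = np.arange(784 * num_nodes, dtype=float).reshape(784, num_nodes)

# Transpose so that each row holds the 784 incoming weights of one node,
# then reshape each row into a 28 x 28 image.
weight_images = toy_weights.T.reshape(num_nodes, 28, 28)
print(weight_images.shape)
```

Each `28×28` slice lines up with the pixel grid of the input, so bright and dark regions show which pixels push that node's activation up or down.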

Run the following cell to plot the weights. Note that this cell requires that a `DNNClassifier` named `classifier` has already been trained.

In [None]:
weights0 = classifier.weights_[0]

print("weights0 shape: %s" % (weights0.shape,))

num_nodes = weights0.shape[1]
num_rows = int(math.ceil(num_nodes / 10.0))
fig, axes = plt.subplots(num_rows, 10, figsize=(20, 2 * num_rows))
for coef, ax in zip(weights0.T, axes.ravel()):
    # The weights in coef are reshaped from 1x784 to 28x28.
    ax.matshow(coef.reshape(28, 28), cmap=plt.cm.pink)
    ax.set_xticks(())
    ax.set_yticks(())

plt.show()

The first hidden layer of the neural network should be modeling some pretty low-level features, so visualizing the weights will probably just show some fuzzy blobs or possibly a few parts of digits. You may also see some neurons that are essentially noise; these are either unconverged or are being ignored by higher layers.

It can be interesting to stop training at different numbers of iterations and see the effect.

**Train the classifier for 10, 100, and 1000 steps, respectively. Then run this visualization again after each run.**

What differences do you see visually for the different levels of convergence?