# Exercise: putting everything together

In this exercise you will write code for a model that learns to classify MNIST digits. You will use TensorFlow and track training progress with matplotlib.

For each sub-exercise, you have already seen an example solution in one of the colabs leading up to this one.

In [0]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import random
import seaborn as sns
import numpy as np
import tensorflow as tf
import sonnet as snt  # Needed for snt.BatchFlatten in the network exercise.
import datetime

from matplotlib import pyplot as plt
from google.colab import files
from scipy.stats import multivariate_normal
from IPython.display import clear_output, Image, display, HTML

sns.set_style('ticks')


In [0]:
tf.reset_default_graph()

In [3]:
# Fetch the mnist data from tf.keras.datasets.mnist.

mnist_train, mnist_test = tf.keras.datasets.mnist.load_data()

# Check what the data is like:
print('Training dataset:')
train_input, train_label = mnist_train
print('* input shape:', train_input.shape)
print('* input min, mean, max:', train_input.min(), train_input.mean(), train_input.max())
print('* input dtype:', train_input.dtype)
print('* label shape:', train_label.shape)
print('* label min, mean, max:', train_label.min(), train_label.mean(), train_label.max())
print('* label dtype:', train_label.dtype)

test_input, test_label = mnist_test
print('Number of test examples:', test_input.shape[0])

Training dataset:
* input shape: (60000, 28, 28)
* input min, mean, max: 0 33.318421449829934 255
* input dtype: uint8
* label shape: (60000,)
* label min, mean, max: 0 4.4539333333333335 9
* label dtype: uint8
Number of test examples: 10000


Normalize the data into the \[0, 1\] interval. It's also a good idea to check the class distribution, but here we already know that the classes are roughly balanced.



In [0]:
# Normalize both train_input and test_input so that they are in [0, 1].
#
# Also ensure the following data types:
#
# * train_input and test_input need to be np.float32.
# * the labels need to be converted to np.int32.
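
One way this step might look is sketched below. It uses small synthetic `uint8` arrays as stand-ins for the MNIST arrays loaded above, and the names `normalize_images`, `convert_labels`, `fake_input` and `fake_label` are hypothetical, so the snippet runs on its own. In the notebook the same helpers would be applied to `train_input`, `test_input` and the two label arrays.

```python
import numpy as np


def normalize_images(images):
    """Scales uint8 pixel values in [0, 255] to np.float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0


def convert_labels(labels):
    """Casts integer class labels to np.int32."""
    return labels.astype(np.int32)


# Synthetic stand-ins for the MNIST arrays (hypothetical data, same dtypes).
rng = np.random.RandomState(0)
fake_input = rng.randint(0, 256, size=(8, 28, 28)).astype(np.uint8)
fake_label = rng.randint(0, 10, size=(8,)).astype(np.uint8)

normalized = normalize_images(fake_input)
labels = convert_labels(fake_label)

# Re-run the same kind of checks that were printed for the raw data.
print('input dtype:', normalized.dtype)
print('input min, max:', normalized.min(), normalized.max())
print('label dtype:', labels.dtype)
```

Dividing by 255 works here because the raw pixel range printed above is exactly 0 to 255.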


In [0]:
# We can visualize the first few training examples using plt.imshow()
# in combination with the gallery function we defined.
#
# Copy the gallery function into this cell.
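
The gallery function from the earlier colab is not reproduced in this notebook. As a reminder of the idea, a minimal stand-in might look like the following. The signature and exact behavior here are assumptions and may differ from the version you defined, so prefer your own copy if you have it.

```python
import numpy as np


def gallery(images, ncols):
    """Tiles a stack of equally sized 2D images into one 2D grid image.

    Hypothetical stand-in for the gallery function from the earlier colab.

    Args:
      images: array of shape [num_images, height, width].
      ncols: number of columns in the grid; must divide num_images.
    """
    num_images, height, width = images.shape
    if num_images % ncols != 0:
        raise ValueError('ncols must divide the number of images.')
    nrows = num_images // ncols
    # [nrows, ncols, H, W] -> [nrows, H, ncols, W] -> [nrows * H, ncols * W]
    grid = images.reshape(nrows, ncols, height, width)
    grid = grid.swapaxes(1, 2)
    return grid.reshape(nrows * height, ncols * width)


# Six dummy 28x28 "images", each filled with its own index.
dummy = np.stack([np.full((28, 28), i, dtype=np.float32) for i in range(6)])
grid = gallery(dummy, ncols=6)
print(grid.shape)  # (28, 168)
```

A grid like this can be passed to `plt.imshow(grid, cmap='gray')` to display the images side by side in grayscale.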



In [0]:
# Show the first 6 training images on a 1x6 grid.
# Remember to use grayscale plotting.
# Also print their corresponding labels in the same order.


In [0]:
# Write a function that turns the data into tensorflow datasets and into
# tensors corresponding to batches of examples, returning these tensors.
#
# The train data should be
#
# * shuffled across the full dataset
# * repeated indefinitely
# * batched at size 64.
#
# Simply batch the test data.
#
# IMPORTANT: Add a final (singleton) axis to the inputs; the conv nets that
# we will use will expect this.
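
To make the requested behavior concrete, here is a NumPy-only sketch of the training pipeline. It is not the `tf.data` solution the exercise asks for; it only mirrors the same steps (shuffle across the full dataset, repeat indefinitely, batch at size 64, add a final singleton axis) so that the expected shapes are easy to check. The name `train_batches` and the synthetic arrays are hypothetical.

```python
import numpy as np


def train_batches(inputs, labels, batch_size=64, seed=0):
    """Yields shuffled (inputs, labels) batches forever.

    A fresh permutation of the full dataset is drawn for every pass, and a
    final singleton (channel) axis is added to the inputs.
    """
    rng = np.random.RandomState(seed)
    inputs = inputs[..., np.newaxis]  # [N, 28, 28] -> [N, 28, 28, 1]
    num_examples = inputs.shape[0]
    while True:  # Repeat indefinitely.
        order = rng.permutation(num_examples)  # Shuffle across the full set.
        # Note: the last batch of each pass is smaller when batch_size does
        # not divide num_examples.
        for start in range(0, num_examples, batch_size):
            idx = order[start:start + batch_size]
            yield inputs[idx], labels[idx]


# Synthetic stand-ins for the normalized MNIST arrays (hypothetical data).
fake_inputs = np.zeros((200, 28, 28), dtype=np.float32)
fake_labels = np.arange(200, dtype=np.int32) % 10

batches = train_batches(fake_inputs, fake_labels)
batch_inputs, batch_labels = next(batches)
print(batch_inputs.shape, batch_labels.shape)  # (64, 28, 28, 1) (64,)
```

In the TensorFlow version, the batch tensors returned by your function should have these same shapes and dtypes.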




In [0]:
# Create a function that returns a network with the following structure:
#
# 1. Conv2D with 16 filters, kernel shape 3, stride 1, padding 'SAME'
# 2. Max pooling with window_shape [3, 3], strides [2, 2], padding 'SAME'
# 3. ReLU
# 4. Conv2D with 16 filters, kernel shape 3, stride 1, padding 'SAME'
# 5. Flatten the final conv features using snt.BatchFlatten
# 6. A Dense layer with output_size = 10, the number of classes.
#
# Make sure you use variable scoping to be able to share the underlying
# variables.
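
Before building the network it can help to trace the shapes that each layer in the list above should produce. The sketch below does this in plain Python, using the fact that with 'SAME' padding the spatial output size is the input size divided by the stride, rounded up, regardless of the kernel size. The helper names are hypothetical, and the shapes exclude the batch axis.

```python
import math


def same_padding_output_size(input_size, stride):
    """Spatial output size of a conv or pooling layer with 'SAME' padding."""
    return int(math.ceil(input_size / float(stride)))


def trace_shapes(height=28, width=28):
    """Returns (layer name, output shape) pairs, excluding the batch axis."""
    shapes = []
    # Conv2D, 16 filters, stride 1, 'SAME': spatial size is unchanged.
    height, width = (same_padding_output_size(s, 1) for s in (height, width))
    channels = 16
    shapes.append(('conv 1', (height, width, channels)))
    # Max pooling, strides [2, 2], 'SAME': spatial size is halved, rounded up.
    height, width = (same_padding_output_size(s, 2) for s in (height, width))
    shapes.append(('max pool', (height, width, channels)))
    # ReLU is elementwise and leaves the shape unchanged.
    shapes.append(('relu', (height, width, channels)))
    # Second Conv2D, 16 filters, stride 1, 'SAME'.
    shapes.append(('conv 2', (height, width, channels)))
    # Flatten everything except the batch axis.
    shapes.append(('flatten', (height * width * channels,)))
    # Dense layer with one logit per class.
    shapes.append(('dense', (10,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Comparing these shapes with the static shapes of your TensorFlow tensors is a quick sanity check on the strides and padding.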




In [0]:
tf.reset_default_graph()
(train_inputs, train_labels), (test_inputs, test_labels) = get_tf_data()


In [0]:
# * Get the output of the network on the training data,
# * and the output of the *same* network with the same weights on the test data.
# * Use the `tf.nn.sparse_softmax_cross_entropy_with_logits` op to define the loss
# * Define the train_op that minimizes the loss (averaged over the batch)
#   using the `GradientDescentOptimizer`. Set the learning rate to 0.01.
# * Get the initialization op.
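
`tf.nn.sparse_softmax_cross_entropy_with_logits` returns one loss value per example, which is why the loss still has to be averaged over the batch. The NumPy sketch below illustrates the quantity being computed. It is a reference illustration under that reading, not a replacement for the TensorFlow op, and the function name and toy values are hypothetical.

```python
import numpy as np


def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between softmax(logits) and integer labels."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability assigned to the correct class.
    return -log_probs[np.arange(labels.shape[0]), labels]


# Toy batch: 2 examples, 3 classes.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]], dtype=np.float32)
labels = np.array([0, 2], dtype=np.int32)

per_example = sparse_softmax_cross_entropy(logits, labels)
loss = per_example.mean()  # The scalar that the train_op would minimize.
print(per_example, loss)
```

The second example has uniform logits over three classes, so its loss is log(3); this is a handy reference point when checking an untrained network, whose loss on 10 classes should start near log(10).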


In [0]:
# Write a function that takes a list of losses and plots them.



In [0]:
# Run the training loop, keeping track of losses and potentially the accuracy
# on the training set. Plot the loss curve intermittently.
#
# The simplest solution would add a new plot with each plotting call. You
# can play with the frequency of plotting (and recording) a bit in order
# to find something that works.
#
# Based on the loss curves, decide how to set your total number of training
# iterations. Once you are satisfied, add some code that evaluates your
# prediction accuracy (not loss!) on the test set.
#
# Note that the outputs from the network are logits; for prediction accuracy
# we can pick the most likely label and see if it is correct.

# The accuracy (on the training set) you should expect:
#
# * Roughly 90% after 1000 training steps.
# * 96-97% after 8k training steps.
#
# First iterate with 1k steps; if that works, train for 8k. 8k steps will
# take roughly 8 minutes on CPU.
#
# The final test accuracy should also be ~96%.
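
Picking the most likely label from the logits, as described above, could be sketched in NumPy as follows. The softmax is monotonic, so the argmax of the logits is also the argmax of the class probabilities. The function name and the toy values are hypothetical.

```python
import numpy as np


def accuracy_from_logits(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = np.argmax(logits, axis=1)
    return np.mean(predictions == labels)


# Toy batch: 4 examples, 3 classes. The third example is misclassified.
logits = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.0, 0.5],
                   [0.2, 0.1, 0.0],
                   [-1.0, 0.0, 1.0]], dtype=np.float32)
labels = np.array([1, 0, 2, 2], dtype=np.int32)

print(accuracy_from_logits(logits, labels))  # 0.75
```

Because the test data arrives in batches, one option is to accumulate the number of correct predictions over all test batches and divide by the total number of test examples at the end.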
