Deep Learning
=============

Assignment 2
------------

Previously in `1_notmnist.ipynb`, we created a pickle with formatted datasets for training, development and testing on the [notMNIST dataset](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html).

The goal of this assignment is to progressively train deeper and more accurate models using TensorFlow.

In [5]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

First reload the data we generated in `1_notmnist.ipynb`.

In [6]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a shape that's more adapted to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In [7]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
    # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
print(train_labels[0])

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
[ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
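
The one-hot step in `reformat` relies on NumPy broadcasting: `labels[:, None]` has shape `(N, 1)` and is compared against `np.arange(num_labels)` of shape `(num_labels,)`, which yields an `(N, num_labels)` boolean matrix. Here is a minimal sketch on a toy label vector; the three labels are made up for illustration and are not taken from notMNIST.

```python
import numpy as np

num_labels = 10

# Hypothetical toy labels (assumption for illustration only).
labels = np.array([8, 0, 3])

# labels[:, None] has shape (3, 1); np.arange(num_labels) has shape (10,).
# Broadcasting the == comparison gives a (3, 10) boolean matrix with a
# single True per row, in the column equal to the label.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)
print(one_hot[0])

# Taking argmax along axis 1 recovers the original integer labels.
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

The same `argmax` round trip is what the `accuracy` helper later in the notebook uses to compare predictions against one-hot labels.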


We're first going to train a multinomial logistic regression using simple gradient descent.

TensorFlow works like this:
* First, you describe the computation you want performed: what the inputs, the variables, and the operations look like. These are created as nodes in a computation graph. The entire description lives inside the block below:

      with graph.as_default():
          ...

* Then you can run the operations on this graph as many times as you want by calling `session.run()`, passing it the outputs you want fetched from the graph; it returns their values. This runtime phase is all contained in the block below:

      with tf.Session(graph=graph) as session:
          ...

Let's load all the data into TensorFlow and build the computation graph corresponding to our training:

In [8]:
# With gradient descent training, even this much data is prohibitive.
# Subset the training data for faster turnaround.
train_subset = 10000

graph = tf.Graph()
with graph.as_default():

    # Input data.
    # Load the training, validation and test data into constants that are
    # attached to the graph.
    # TensorFlow API: tf.constant()
    tf_train_dataset = tf.constant(train_dataset[:train_subset, :])
    tf_train_labels = tf.constant(train_labels[:train_subset])
    
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    # These are the parameters that we are going to be training. The weight
    # matrix will be initialized using random values following a (truncated)
    # normal distribution. The biases get initialized to zero.
    # X * W: 1x784 x 784x10 = 1x10
    # TensorFlow API: tf.truncated_normal(), tf.Variable(), tf.zeros()
    
    weights = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    # We multiply the inputs with the weight matrix, and add biases. We compute
    # the softmax and cross-entropy (it's one operation in TensorFlow, because
    # it's very common, and it can be optimized). We take the average of this
    # cross-entropy across all training examples: that's our loss.
    # TensorFlow API: tf.matmul(), tf.nn.softmax_cross_entropy_with_logits(), tf.reduce_mean()
    
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    # Optimizer.
    # We are going to find the minimum of this loss using gradient descent.
    # TensorFlow API: tf.train.GradientDescentOptimizer()
    
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    # These are not part of training, but merely here so that we can report
    # accuracy figures as we train.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
    test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
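
To make the loss concrete, here is a minimal NumPy sketch of what the fused softmax and cross-entropy step computes, followed by the average over examples. The helper names `softmax` and `softmax_cross_entropy` and the toy logits are assumptions for illustration; TensorFlow's own implementation is more heavily optimized.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical
    # stability; this does not change the result.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy(logits, one_hot_labels):
    # Per-example cross-entropy: -sum_k labels_k * log(softmax(logits)_k),
    # computed through log-sum-exp instead of taking log(softmax(...)).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1)

# Hypothetical toy batch: 2 examples, 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

per_example = softmax_cross_entropy(logits, labels)
loss = per_example.mean()  # counterpart of tf.reduce_mean in the graph
print(per_example, loss)
```

The second row has uniform logits, so its loss is the log of the number of classes: the cost of a prediction that carries no information.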

Let's run this computation and iterate:

In [13]:
num_steps = 801

def accuracy(predictions, labels):
    # numpy.argmax(a, axis=None, out=None): Returns the indices of the maximum values along an axis.
    
    # array = [[1,3,5,7,9],[10,8,6,4,2]]
    # labels = [[0,0,0,0,1],[1,0,0,0,0]]
    
    # np.argmax(array, 1) = [4, 0]
    # np.argmax(labels, 1) = [4, 0]
    # np.argmax(array, 1) == np.argmax(labels, 1): [True, True]
    # np.sum([True, True]) = 2; np.sum([True, False]) = 1
    
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])

with tf.Session(graph=graph) as session:
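    # This is a one-time operation which ensures the parameters get initialized as
    # we described in the graph: random weights for the matrix, zeros for the biases.
    # Note: tf.initialize_all_variables() is deprecated (hence the warning in the
    # output); tf.global_variables_initializer() is the replacement.
    tf.initialize_all_variables().run()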
    print('Initialized')
    for step in range(num_steps):
        # Run the computations. We tell .run() that we want to run the optimizer,
        # and get the loss value and the training predictions returned as numpy
        # arrays.
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % 100 == 0):
            print('Loss at step %d: %f' % (step, l))
            print('Training accuracy: %.1f%%' % accuracy(predictions, train_labels[:train_subset, :]))
            # Calling .eval() on valid_prediction is basically like calling run(), but
            # just to get that one numpy array. Note that it recomputes all its graph
            # dependencies.
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Instructions for updating:
Use `tf.global_variables_initializer` instead.
Initialized
Loss at step 0: 18.436390
Training accuracy: 6.5%
Validation accuracy: 8.9%
Loss at step 100: 2.253838
Training accuracy: 71.9%
Validation accuracy: 70.3%
Loss at step 200: 1.814669
Training accuracy: 74.8%
Validation accuracy: 72.8%
Loss at step 300: 1.577829
Training accuracy: 76.1%
Validation accuracy: 73.7%
Loss at step 400: 1.416602
Training accuracy: 76.8%
Validation accuracy: 74.4%
Loss at step 500: 1.296365
Training accuracy: 77.5%
Validation accuracy: 74.6%
Loss at step 600: 1.202043
Training accuracy: 78.0%
Validation accuracy: 74.7%
Loss at step 700: 1.125529
Training accuracy: 78.5%
Validation accuracy: 74.9%
Loss at step 800: 1.061988
Training accuracy: 79.0%
Validation accuracy: 75.0%
Test accuracy: 82.8%
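
For readers who want to see roughly what the optimizer step amounts to, here is a NumPy sketch of full-batch gradient descent for softmax regression. It runs on small synthetic data rather than notMNIST. The function name `train_softmax_regression`, the data generator, the feature standardization, and the hyperparameters are all assumptions for illustration, not what TensorFlow executes internally.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def accuracy(predictions, labels):
    # Same definition as the notebook's accuracy helper.
    return 100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0]

def train_softmax_regression(X, Y, learning_rate=0.1, num_steps=200, seed=0):
    rng = np.random.RandomState(seed)
    n, d = X.shape
    k = Y.shape[1]
    W = rng.normal(size=(d, k))  # plain normal here, not truncated normal
    b = np.zeros(k)
    losses = []
    for _ in range(num_steps):
        probs = softmax(np.dot(X, W) + b)
        # Mean cross-entropy over all examples.
        losses.append(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))
        # Gradient of the mean loss with respect to the logits.
        grad_logits = (probs - Y) / n
        W -= learning_rate * np.dot(X.T, grad_logits)
        b -= learning_rate * grad_logits.sum(axis=0)
    return W, b, losses

# Synthetic data (assumption): three Gaussian blobs in five dimensions.
rng = np.random.RandomState(42)
num_classes, dim, per_class = 3, 5, 100
means = rng.normal(scale=2.0, size=(num_classes, dim))
X = np.vstack([means[c] + rng.normal(size=(per_class, dim))
               for c in range(num_classes)])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
y = np.repeat(np.arange(num_classes), per_class)
Y = (np.arange(num_classes) == y[:, None]).astype(np.float64)

W, b, losses = train_softmax_regression(X, Y)
print('first loss %.3f, last loss %.3f' % (losses[0], losses[-1]))
print('training accuracy: %.1f%%' % accuracy(softmax(np.dot(X, W) + b), Y))
```

The sketch standardizes the features and uses a smaller learning rate than the notebook's 0.5 so that each step reliably lowers the loss on this toy problem.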


In [4]:
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def variable_summaries(var):
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

def define_input(image_size=28, number_labels=10):
    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [None, image_size * image_size], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, number_labels], name='y-input')

    with tf.name_scope('input_reshape'):
        image_shaped_input = tf.reshape(x, [-1, image_size, image_size, 1])
        tf.summary.image('input', image_shaped_input, number_labels)
    return x, y_

def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
    with tf.name_scope(layer_name):
        with tf.name_scope('weights'):
            #--------------------------------------------------------------------
            # create the weight matrix of shape [input_dim, output_dim]
            
            # weights = ????????
            
            #--------------------------------------------------------------------
            variable_summaries(weights)

        with tf.name_scope('biases'):
            #--------------------------------------------------------------------
            # create the bias vector of length output_dim
            
            # biases =  ????????
            
            #--------------------------------------------------------------------
            variable_summaries(biases)

        with tf.name_scope('Wx_plus_b'):
            #--------------------------------------------------------------------
            # linear computation: X * W + b (X is input_tensor, N x input_dim)
            
            # preactivate = ???????
            
            #--------------------------------------------------------------------
            tf.summary.histogram('pre_activations', preactivate)

        #--------------------------------------------------------------------
        # activate output of linear computation via activation function: act

        # activations = ???????

        #--------------------------------------------------------------------
        
        tf.summary.histogram('activations', activations)
    return activations



# Input: N x 784
# Hidden_nodes = 1024(N x 1024) => hidden_weight = 784 x 1024
# output: N x 10 => output_weight = 1024 x 10
def main(learning_rate=0.05, max_steps=3001, batch_size=128):
    sess = tf.InteractiveSession()

    
    #---------------------------------------------------------------------------------
    # define the inputs x (training data) and y_ (training labels): the image size is 28 and there are 10 labels

    # x, y_ = ????

    #---------------------------------------------------------------------------------
    
    
    #--------------------------------------------------------------------------------
    # define first layer called 'layer1' with 1024 neurons

    # hidden1 = ???????

    #--------------------------------------------------------------------------------
    

    #################################################################
    # with tf.name_scope('dropout'):
    #     keep_prob = tf.placeholder(tf.float32)
    #     tf.summary.scalar('dropout_keep_probability', keep_prob)
    #     droped = tf.nn.dropout(hidden1, keep_prob)
    #################################################################

    
    #--------------------------------------------------------------------------------
    # define second layer called 'layer2' with 10 neurons, 
    # and the activation is tf.identity

    # y = ??????

    #--------------------------------------------------------------------------------
    

    with tf.name_scope('cross_entropy'):
        #--------------------------------------------------------------------------------
        # define loss function: tf.nn.softmax_cross_entropy_with_logits(labels=?, logits=?)
        # and take the mean over the batch: tf.reduce_mean(per_loss)

        # cross_entropy = ??????

        #--------------------------------------------------------------------------------
    tf.summary.scalar('cross_entropy', cross_entropy)

    with tf.name_scope('train'):
        #--------------------------------------------------------------------------------
        # minimize cross_entropy with an optimizer

        # train_step = ??????

        #--------------------------------------------------------------------------------
        

    with tf.name_scope('accuracy'):
        with tf.name_scope('correct_prediction'):
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        with tf.name_scope('accuracy'):
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar('accuracy', accuracy)

    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter('./summary/train', sess.graph)

    tf.global_variables_initializer().run()

    for step in range(max_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size)]
        batch_labels = train_labels[offset:(offset + batch_size)]
        feed_dict = {x: batch_data, y_: batch_labels}

        if step % 500 == 99:
            run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
            run_metadata = tf.RunMetadata()
            summary, _, acc = sess.run([merged, train_step, accuracy],
                                       feed_dict=feed_dict,
                                       options=run_options,
                                       run_metadata=run_metadata)
            train_writer.add_run_metadata(run_metadata, 'step%03d' % step)
            train_writer.add_summary(summary, step)
            print('Adding run metadata for %s and the accuracy is %s' % (step, acc))
        else:
            #--------------------------------------------------------------------------------
            # run merged, train_step, and accuracy on this batch

            # summary, _, acc = ???

            #--------------------------------------------------------------------------------
            
            
            
            train_writer.add_summary(summary, step)

            
        if (step % 500 == 0):
            summary, acc = sess.run([merged, accuracy], feed_dict={x: valid_dataset, y_: valid_labels})

            train_writer.add_summary(summary, step)
            print('Accuracy at step %s: %s' % (step, acc))

    summary, acc = sess.run([merged, accuracy], feed_dict={x: test_dataset, y_: test_labels})

    train_writer.add_summary(summary, step + 1)
    print('Total Test Accuracy at step %s: %s' % (step + 1, acc))

    train_writer.close()

main(0.02, 3001, 128)


Accuracy at step 0: 0.3577
Adding run metadata for 99 and the accuracy is 0.773438
Accuracy at step 500: 0.7683
Adding run metadata for 599 and the accuracy is 0.75
Accuracy at step 1000: 0.7868
Adding run metadata for 1099 and the accuracy is 0.734375
Accuracy at step 1500: 0.7895
Adding run metadata for 1599 and the accuracy is 0.820312
Accuracy at step 2000: 0.8029
Adding run metadata for 2099 and the accuracy is 0.820312
Accuracy at step 2500: 0.8036
Adding run metadata for 2599 and the accuracy is 0.867188
Accuracy at step 3000: 0.8143
Total Test Accuracy at step 3001: 0.8859
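
The shape bookkeeping in the comments before `main` (N x 784 input, 784 x 1024 hidden weights, 1024 x 10 output weights) and the wrap-around batch offset in the training loop can both be checked without TensorFlow. The NumPy sketch below uses random stand-in data and a hypothetical helper `dense_layer`. It mirrors the description of `nn_layer` but is not a completion of the blanks in the assignment.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dense_layer(inputs, weights, biases, act=None):
    # Linear step X * W + b, followed by an optional activation.
    preactivate = np.dot(inputs, weights) + biases
    return preactivate if act is None else act(preactivate)

rng = np.random.RandomState(0)
image_size, num_labels, hidden_nodes = 28, 10, 1024
batch_size, num_examples = 128, 1000

# Stand-in data (random values, not notMNIST).
data = rng.rand(num_examples, image_size * image_size).astype(np.float32)

# Weights drawn from a normal distribution with stddev 0.1 (not truncated);
# biases set to 0.1, as in bias_variable.
w1 = (0.1 * rng.randn(image_size * image_size, hidden_nodes)).astype(np.float32)
b1 = np.full(hidden_nodes, 0.1, dtype=np.float32)
w2 = (0.1 * rng.randn(hidden_nodes, num_labels)).astype(np.float32)
b2 = np.full(num_labels, 0.1, dtype=np.float32)

for step in (0, 1, 6, 7):
    # Same offset formula as the training loop: the modulo wraps the window
    # back toward the start, so offset + batch_size never runs past the data.
    offset = (step * batch_size) % (num_examples - batch_size)
    batch = data[offset:(offset + batch_size)]
    hidden1 = dense_layer(batch, w1, b1, act=relu)  # N x 1024
    logits = dense_layer(hidden1, w2, b2)           # N x 10
    print(step, offset, batch.shape, hidden1.shape, logits.shape)
```

With 1000 examples and a batch size of 128, the offset wraps at step 7, which is how the loop can run for more steps than there are distinct batches.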
