Deep Learning
=============

Assignment 3
------------

Previously in `2_fullyconnected.ipynb`, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
import math
from six.moves import cPickle as pickle

First reload the data we generated in `1_notmnist.ipynb`.

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  dedup_valid_dataset = save['dedup_valid_dataset']
  dedup_valid_labels = save['dedup_valid_labels']
  dedup_test_dataset = save['dedup_test_dataset']
  dedup_test_labels = save['dedup_test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)
  print('Deduplicated validation set', dedup_valid_dataset.shape, dedup_valid_labels.shape)
  print('Deduplicated test set', dedup_test_dataset.shape, dedup_test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)
Deduplicated validation set (8093, 28, 28) (8093,)
Deduplicated test set (7741, 28, 28) (7741,)


Reformat the data into a shape that's better suited to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.
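
As a rough illustration of the two steps above, here is a minimal NumPy sketch on a tiny made-up batch. The names `one_hot`, `toy_images` and `toy_labels` are hypothetical and exist only for this example; the broadcasting trick is the same one the notebook's `reformat` relies on.

```python
import numpy as np

num_labels = 10  # same value as in the notebook

def one_hot(labels, num_classes=num_labels):
    # labels[:, None] has shape (N, 1); np.arange(num_classes) has shape (num_classes,).
    # Broadcasting the comparison yields an (N, num_classes) boolean matrix
    # with exactly one True per row, which is then cast to float32.
    return np.equal(np.arange(num_classes), labels[:, None]).astype(np.float32)

# Hypothetical toy batch: three blank 28x28 "images" with labels 0, 3 and 9.
toy_images = np.zeros((3, 28, 28), dtype=np.float32)
toy_labels = np.array([0, 3, 9])

flat = toy_images.reshape((-1, 28 * 28))  # one row of 784 pixels per image
encoded = one_hot(toy_labels)

print(flat.shape)     # (3, 784)
print(encoded.shape)  # (3, 10)
print(encoded[1])     # 1.0 at index 3, 0.0 elsewhere
```

Float labels are convenient here because the softmax cross-entropy loss compares them directly against the predicted probability vector.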

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.equal(np.arange(num_labels), labels[:,None])).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
orig_valid_dataset, orig_valid_labels = reformat(valid_dataset, valid_labels)
orig_test_dataset, orig_test_labels = reformat(test_dataset, test_labels)
dedup_valid_dataset, dedup_valid_labels = reformat(dedup_valid_dataset, dedup_valid_labels)
dedup_test_dataset, dedup_test_labels = reformat(dedup_test_dataset, dedup_test_labels)

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', orig_valid_dataset.shape, orig_valid_labels.shape)
print('Test set', orig_test_dataset.shape, orig_test_labels.shape)
print('Deduplicated validation set', dedup_valid_dataset.shape, dedup_valid_labels.shape)
print('Deduplicated test set', dedup_test_dataset.shape, dedup_test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
Deduplicated validation set (8093, 784) (8093, 10)
Deduplicated test set (7741, 784) (7741, 10)


In [4]:
# Evaluate on the deduplicated validation/test sets.
valid_dataset = dedup_valid_dataset
valid_labels = dedup_valid_labels
test_dataset = dedup_test_dataset
test_labels = dedup_test_labels

In [5]:
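# Evaluate on the original validation/test sets instead. When the cells are
# run in order, this overrides the selection made in the previous cell.
valid_dataset = orig_valid_dataset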
valid_labels = orig_valid_labels
test_dataset = orig_test_dataset
test_labels = orig_test_labels

In [6]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.equal(np.argmax(predictions, 1), np.argmax(labels, 1)))
          / predictions.shape[0])

---
Problem 1
---------

Introduce and tune L2 regularization for both the logistic regression and the neural network models. Remember that L2 regularization amounts to adding a penalty on the squared norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor `t` using `tf.nn.l2_loss(t)`. The right amount of regularization should improve your validation / test accuracy.

---
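
To make the penalty concrete, the following is a small NumPy sketch of the regularized loss, not the notebook's TensorFlow graph. It assumes the documented behaviour of `tf.nn.l2_loss`, which returns `sum(t ** 2) / 2`; the helper names `l2_loss` and `regularized_loss` and the toy weight matrix are hypothetical.

```python
import numpy as np

def l2_loss(t):
    # Mirrors the documented behaviour of tf.nn.l2_loss: sum(t ** 2) / 2.
    return np.sum(np.square(t)) / 2.0

def regularized_loss(data_loss, weight_matrices, coeff):
    # Total loss = data term (e.g. mean cross-entropy) + coeff * sum of L2 penalties.
    penalty = sum(l2_loss(w) for w in weight_matrices)
    return data_loss + coeff * penalty

# Hypothetical 2x2 weight matrix and data loss, chosen so the arithmetic is easy to check.
w = np.array([[1.0, -2.0], [0.5, 0.0]])

print(l2_loss(w))                         # (1 + 4 + 0.25 + 0) / 2 = 2.625
print(regularized_loss(0.7, [w], 0.005))  # approximately 0.7 + 0.005 * 2.625
```

Because the gradient of `coeff * l2_loss(w)` with respect to `w` is `coeff * w`, each gradient step also shrinks the weights slightly toward zero, which is why this penalty is often called weight decay. Biases are usually left out of the penalty.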

In [30]:
batch_size = 128
l2_loss_coeff = 0.005

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels]))
  biases = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  logits = tf.matmul(tf_train_dataset, weights) + biases
  loss = (tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    + l2_loss_coeff * tf.nn.l2_loss(weights))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(
    tf.matmul(tf_valid_dataset, weights) + biases)
  test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

In [31]:
num_steps = 10001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 35.214622
Minibatch accuracy: 8.6%
Validation accuracy: 8.4%
Minibatch loss at step 500: 1.889750
Minibatch accuracy: 75.8%
Validation accuracy: 75.6%
Minibatch loss at step 1000: 0.780084
Minibatch accuracy: 82.0%
Validation accuracy: 80.3%
Minibatch loss at step 1500: 0.932671
Minibatch accuracy: 81.2%
Validation accuracy: 80.1%
Minibatch loss at step 2000: 0.596656
Minibatch accuracy: 83.6%
Validation accuracy: 79.5%
Minibatch loss at step 2500: 0.626066
Minibatch accuracy: 83.6%
Validation accuracy: 79.6%
Minibatch loss at step 3000: 0.829144
Minibatch accuracy: 79.7%
Validation accuracy: 79.9%
Minibatch loss at step 3500: 0.869050
Minibatch accuracy: 73.4%
Validation accuracy: 79.9%
Minibatch loss at step 4000: 0.671679
Minibatch accuracy: 81.2%
Validation accuracy: 79.3%
Minibatch loss at step 4500: 0.849229
Minibatch accuracy: 78.9%
Validation accuracy: 79.0%
Minibatch loss at step 5000: 0.654628
Minibatch accuracy: 85.9%
Validation accuracy

In [32]:
batch_size = 128
hidden_size = 1024
l2_loss_coeff = 0.005

graph = tf.Graph()

with graph.as_default():
    train_set = tf.placeholder(tf.float32, shape = (batch_size, image_size * image_size))
    train_labs = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    valid_set = tf.constant(valid_dataset)
    test_set = tf.constant(test_dataset)
    
    h1_weights = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_size]))
    h2_weights = tf.Variable(tf.truncated_normal([hidden_size, num_labels]))
    h1_biases = tf.Variable(tf.zeros([hidden_size]))
    h2_biases = tf.Variable(tf.zeros([num_labels]))
    
    def compute_logits(dataset):
        h1_res = tf.nn.relu(tf.matmul(dataset, h1_weights) + h1_biases)
        return tf.matmul(h1_res, h2_weights) + h2_biases
        
    logits = compute_logits(train_set)
    
    loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_labs, logits=logits))
            + l2_loss_coeff * (tf.nn.l2_loss(h1_weights) + tf.nn.l2_loss(h2_weights)))
    
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    
    train_predictions = tf.nn.softmax(logits)
    valid_predictions = tf.nn.softmax(compute_logits(valid_set))
    test_predictions = tf.nn.softmax(compute_logits(test_set))

In [34]:
num_steps = 2001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        offset = step * batch_size % (len(train_dataset) - batch_size)
        batch_set = train_dataset[offset:(offset+batch_size)]
        batch_labs = train_labels[offset:(offset+batch_size)]
        
        feed_dict = { train_set : batch_set, train_labs : batch_labs }
        _, l, train_p = session.run([optimizer, loss, train_predictions], feed_dict=feed_dict)
        
        if (step % 100 == 0):
            print('Accuracy after step %d' % step)
            print('Loss: %f' % l)
            print('Train accuracy: %.1f%%' % accuracy(train_p, batch_labs))
            print('Valid accuracy: %.1f%%' % accuracy(valid_predictions.eval(), valid_labels))
        
    print('Final test accuracy: %.1f%%' % accuracy(test_predictions.eval(), test_labels))

Initialized
Accuracy after step 0
Loss: 1849.760010
Train accuracy: 11.7%
Valid accuracy: 29.6%
Accuracy after step 100
Loss: 970.021240
Train accuracy: 76.6%
Valid accuracy: 76.5%
Accuracy after step 200
Loss: 583.892761
Train accuracy: 76.6%
Valid accuracy: 77.5%
Accuracy after step 300
Loss: 350.385864
Train accuracy: 78.9%
Valid accuracy: 77.1%
Accuracy after step 400
Loss: 210.571640
Train accuracy: 80.5%
Valid accuracy: 78.6%
Accuracy after step 500
Loss: 127.205711
Train accuracy: 81.2%
Valid accuracy: 79.6%
Accuracy after step 600
Loss: 76.992065
Train accuracy: 82.8%
Valid accuracy: 82.0%
Accuracy after step 700
Loss: 46.914280
Train accuracy: 81.2%
Valid accuracy: 82.7%
Accuracy after step 800
Loss: 28.659807
Train accuracy: 82.0%
Valid accuracy: 83.7%
Accuracy after step 900
Loss: 17.560654
Train accuracy: 85.2%
Valid accuracy: 83.9%
Accuracy after step 1000
Loss: 10.885038
Train accuracy: 85.2%
Valid accuracy: 83.6%
Accuracy after step 1100
Loss: 6.858540
Train accuracy: 89

---
Problem 2
---------
Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

---
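
One simple way to restrict the training data is to cycle over the first few minibatches only. The sketch below is plain Python with no TensorFlow; it counts how many distinct training examples such a loop ever touches. The helper `batch_offset` and the choice of 5 batches are assumptions made for illustration.

```python
batch_size = 128
num_batches = 5   # assumed number of distinct minibatches to cycle over
num_steps = 1001

def batch_offset(step):
    # Cycle through the first `num_batches` minibatches instead of the whole set.
    return (step % num_batches) * batch_size

seen = set()
for step in range(num_steps):
    start = batch_offset(step)
    seen.update(range(start, start + batch_size))

print(len(seen))  # 640 distinct examples, out of 200000 available
```

With so few examples and a 1024-unit hidden layer, the network has far more parameters than data points, so it can simply memorize the minibatches it keeps revisiting.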

In [43]:
batch_size = 128
hidden_size = 1024
l2_loss_coeff = 0.0

graph = tf.Graph()

with graph.as_default():
    train_set = tf.placeholder(tf.float32, shape = (batch_size, image_size * image_size))
    train_labs = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    valid_set = tf.constant(valid_dataset)
    test_set = tf.constant(test_dataset)
    
    h1_weights = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_size]))
    h2_weights = tf.Variable(tf.truncated_normal([hidden_size, num_labels]))
    h1_biases = tf.Variable(tf.zeros([hidden_size]))
    h2_biases = tf.Variable(tf.zeros([num_labels]))
    
    def compute_logits(dataset):
        h1_res = tf.nn.relu(tf.matmul(dataset, h1_weights) + h1_biases)
        return tf.matmul(h1_res, h2_weights) + h2_biases
        
    logits = compute_logits(train_set)
    
    loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_labs, logits=logits))
            + l2_loss_coeff * (tf.nn.l2_loss(h1_weights) + tf.nn.l2_loss(h2_weights)))
    
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    
    train_predictions = tf.nn.softmax(logits)
    valid_predictions = tf.nn.softmax(compute_logits(valid_set))
    test_predictions = tf.nn.softmax(compute_logits(test_set))

In [44]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        offset = (step % 5) * batch_size
        batch_set = train_dataset[offset:(offset+batch_size)]
        batch_labs = train_labels[offset:(offset+batch_size)]
        
        feed_dict = { train_set : batch_set, train_labs : batch_labs }
        _, l, train_p = session.run([optimizer, loss, train_predictions], feed_dict=feed_dict)
        
        if (step % 100 == 0):
            print('Accuracy after step %d' % step)
            print('Loss: %f' % l)
            print('Train accuracy: %.1f%%' % accuracy(train_p, batch_labs))
            print('Valid accuracy: %.1f%%' % accuracy(valid_predictions.eval(), valid_labels))
        
    print('Final test accuracy: %.1f%%' % accuracy(test_predictions.eval(), test_labels))

Initialized
Accuracy after step 0
Loss: 333.970337
Train accuracy: 10.9%
Valid accuracy: 24.3%
Accuracy after step 100
Loss: 0.000003
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 200
Loss: 0.000002
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 300
Loss: 0.000002
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 400
Loss: 0.000002
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 500
Loss: 0.000001
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 600
Loss: 0.000001
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 700
Loss: 0.000001
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 800
Loss: 0.000001
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 900
Loss: 0.000001
Train accuracy: 100.0%
Valid accuracy: 74.2%
Accuracy after step 1000
Loss: 0.000001
Train accuracy: 100.0%
Valid accuracy: 74.2%
Final test accuracy: 81.2%


---
Problem 3
---------
Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be applied during training, not evaluation; otherwise your evaluation results would be stochastic as well. TensorFlow provides `tf.nn.dropout()` for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?

---
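
The train/evaluation distinction can be sketched in NumPy as follows. This is an illustrative "inverted dropout" implementation, not TensorFlow's code: the function name `dropout`, its `training` flag and the toy activations are assumptions. It follows the same convention as TensorFlow 1.x's `tf.nn.dropout(x, keep_prob)`, which scales the kept activations by `1 / keep_prob`.

```python
import numpy as np

def dropout(activations, keep_prob, rng, training):
    # At evaluation time the layer is the identity, so predictions are deterministic.
    if not training:
        return activations
    # Keep each unit independently with probability keep_prob...
    mask = rng.random(activations.shape) < keep_prob
    # ...and rescale the survivors so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 100), dtype=np.float32)  # hypothetical hidden activations

train_out = dropout(h, 0.5, rng, training=True)
eval_out = dropout(h, 0.5, rng, training=False)

print(np.unique(train_out))  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
print(train_out.mean())      # close to 1.0, the mean of the original activations
print(np.array_equal(eval_out, h))  # True
```

The rescaling is what lets the same weights be used at evaluation time without any correction factor.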

In [54]:
batch_size = 128
hidden_size = 1024
dropout_rate = 0.25

graph = tf.Graph()

with graph.as_default():
    train_set = tf.placeholder(tf.float32, shape = (batch_size, image_size * image_size))
    train_labs = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    valid_set = tf.constant(valid_dataset)
    test_set = tf.constant(test_dataset)
    
    h1_weights = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_size]))
    h2_weights = tf.Variable(tf.truncated_normal([hidden_size, num_labels]))
    h1_biases = tf.Variable(tf.zeros([hidden_size]))
    h2_biases = tf.Variable(tf.zeros([num_labels]))
    
    def compute_logits(dataset, training=False):
        h1_res = tf.nn.relu(tf.matmul(dataset, h1_weights) + h1_biases)
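        if training:
            # NOTE: in TF 1.x the second positional argument of tf.nn.dropout is
            # keep_prob, so despite its name, dropout_rate = 0.25 keeps 25% of
            # the activations and drops the other 75%.
            h1_res = tf.nn.dropout(h1_res, dropout_rate)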
        return tf.matmul(h1_res, h2_weights) + h2_biases
        
    logits = compute_logits(train_set, training=True)
    
    loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_labs, logits=logits)))
    
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    
    train_predictions = tf.nn.softmax(logits)
    valid_predictions = tf.nn.softmax(compute_logits(valid_set))
    test_predictions = tf.nn.softmax(compute_logits(test_set))

In [55]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        offset = (step % 10) * batch_size
        batch_set = train_dataset[offset:(offset+batch_size)]
        batch_labs = train_labels[offset:(offset+batch_size)]
        
        feed_dict = { train_set : batch_set, train_labs : batch_labs }
        _, l, train_p = session.run([optimizer, loss, train_predictions], feed_dict=feed_dict)
        
        if (step % 100 == 0):
            print('Accuracy after step %d' % step)
            print('Loss: %f' % l)
            print('Train accuracy: %.1f%%' % accuracy(train_p, batch_labs))
            print('Valid accuracy: %.1f%%' % accuracy(valid_predictions.eval(), valid_labels))
        
    print('Final test accuracy: %.1f%%' % accuracy(test_predictions.eval(), test_labels))

Initialized
Accuracy after step 0
Loss: 649.715881
Train accuracy: 7.0%
Valid accuracy: 40.7%
Accuracy after step 100
Loss: 64.285126
Train accuracy: 78.1%
Valid accuracy: 77.4%
Accuracy after step 200
Loss: 36.085220
Train accuracy: 87.5%
Valid accuracy: 78.1%
Accuracy after step 300
Loss: 6.643983
Train accuracy: 95.3%
Valid accuracy: 78.3%
Accuracy after step 400
Loss: 30.625895
Train accuracy: 86.7%
Valid accuracy: 77.8%
Accuracy after step 500
Loss: 17.820826
Train accuracy: 90.6%
Valid accuracy: 78.4%
Accuracy after step 600
Loss: 6.656956
Train accuracy: 96.1%
Valid accuracy: 79.0%
Accuracy after step 700
Loss: 6.210125
Train accuracy: 94.5%
Valid accuracy: 79.0%
Accuracy after step 800
Loss: 11.989132
Train accuracy: 92.2%
Valid accuracy: 79.2%
Accuracy after step 900
Loss: 3.325966
Train accuracy: 97.7%
Valid accuracy: 78.6%
Accuracy after step 1000
Loss: 4.547933
Train accuracy: 94.5%
Valid accuracy: 78.9%
Final test accuracy: 85.3%


---
Problem 4
---------

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is [97.1%](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html?showComment=1391023266211#c8758720086795711595).

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
---
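
For intuition about the decay schedule, here is a pure-Python sketch of the formula documented for `tf.train.exponential_decay`: `learning_rate * decay_rate ** (global_step / decay_steps)`, with integer division when `staircase=True`. The function below is a stand-in written for this example, and the arguments 0.5, 500 and 0.9 are illustrative values.

```python
def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=False):
    if staircase:
        # Decay in discrete jumps every `decay_steps` steps.
        exponent = global_step // decay_steps
    else:
        # Decay smoothly at every step.
        exponent = global_step / float(decay_steps)
    return base_lr * decay_rate ** exponent

for step in (0, 500, 1000, 20000):
    print(step, exponential_decay(0.5, step, 500, 0.9))
# Step 0 gives 0.5 and step 500 gives about 0.45 (one decay period);
# by step 20000 (40 periods) the rate has fallen below 0.01.
```

A schedule like this allows large steps early in training and progressively smaller ones later, which tends to help the loss settle instead of oscillating.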


In [9]:
batch_size = 100
hidden_sizes = [1500, 500, 150, 50]
dropout_keep_prob = 0.5
l2_loss_coeff = 0.000
num_steps = 20001

graph = tf.Graph()

with graph.as_default():
    train_set = tf.placeholder(tf.float32, shape = (batch_size, image_size * image_size))
    train_labs = tf.placeholder(tf.float32, shape = (batch_size, num_labels))
    valid_set = tf.constant(valid_dataset)
    test_set = tf.constant(test_dataset)
    
    padded_hidden_sizes = [image_size * image_size] + hidden_sizes + [num_labels]
    weights = []
    biases = []
    for i in range(1, len(padded_hidden_sizes)):
        stddev = math.sqrt(2.0 / padded_hidden_sizes[i - 1])
        weights.append(tf.Variable(tf.truncated_normal([padded_hidden_sizes[i - 1], padded_hidden_sizes[i]], stddev=stddev)))
        biases.append(tf.Variable(tf.zeros([padded_hidden_sizes[i]])))
    
    def compute_logits(dataset, training=False):
        res = dataset
        for i in range(len(hidden_sizes)):
            res = tf.nn.relu(tf.matmul(res, weights[i]) + biases[i])
            if training:
                res = tf.nn.dropout(res, dropout_keep_prob)
        return tf.matmul(res, weights[-1]) + biases[-1]
        
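    # NOTE: training=False means dropout is not applied here, so
    # dropout_keep_prob is unused in this run; pass training=True to train
    # with dropout.
    logits = compute_logits(train_set, training=False)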
    
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_labs, logits=logits))
    for weight in weights:
        loss = loss + l2_loss_coeff * tf.nn.l2_loss(weight)
    
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(0.5, global_step, 500, 0.9)
    #learning_rate = tf.train.polynomial_decay(0.5, global_step, num_steps)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    
    train_predictions = tf.nn.softmax(logits)
    valid_predictions = tf.nn.softmax(compute_logits(valid_set))
    test_predictions = tf.nn.softmax(compute_logits(test_set))

In [10]:
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    
    prev_valid_acc = 0.0
    prev_test_acc = 0.0
    final_test_acc = None
    
    for step in range(num_steps):
        offset = step * batch_size % (len(train_dataset) - batch_size)
        batch_set = train_dataset[offset:(offset+batch_size)]
        batch_labs = train_labels[offset:(offset+batch_size)]
        
        feed_dict = { train_set : batch_set, train_labs : batch_labs }
        _, l, train_p = session.run([optimizer, loss, train_predictions], feed_dict=feed_dict)
    
        valid_acc = None
        if (step % 100 == 0):
            valid_acc = accuracy(valid_predictions.eval(), valid_labels)
            
            print('Accuracy after step %d' % step)
            print('Loss: %f' % l)
            print('Train accuracy: %.1f%%' % accuracy(train_p, batch_labs))
            print('Valid accuracy: %.1f%%' % valid_acc)
        
        if (step % 2000 == 0):
            if valid_acc < prev_valid_acc:
                print('stopping early after %d steps' % step)
                final_test_acc = prev_test_acc
                # break  # left disabled: training continues, but the test accuracy recorded earlier is reported
            else:
                prev_valid_acc = valid_acc
                prev_test_acc = accuracy(test_predictions.eval(), test_labels)
                
    if final_test_acc is None:
        final_test_acc = accuracy(test_predictions.eval(), test_labels)
    print('Final test accuracy: %.1f%%' % final_test_acc)

Initialized
Accuracy after step 0
Loss: 2.338303
Train accuracy: 16.0%
Valid accuracy: 25.2%
Accuracy after step 100
Loss: 0.588078
Train accuracy: 83.0%
Valid accuracy: 79.3%
Accuracy after step 200
Loss: 0.520857
Train accuracy: 84.0%
Valid accuracy: 83.5%
Accuracy after step 300
Loss: 0.591900
Train accuracy: 85.0%
Valid accuracy: 83.1%
Accuracy after step 400
Loss: 0.618255
Train accuracy: 81.0%
Valid accuracy: 84.6%
Accuracy after step 500
Loss: 0.620643
Train accuracy: 79.0%
Valid accuracy: 85.1%
Accuracy after step 600
Loss: 0.607025
Train accuracy: 82.0%
Valid accuracy: 85.6%
Accuracy after step 700
Loss: 0.379936
Train accuracy: 90.0%
Valid accuracy: 86.2%
Accuracy after step 800
Loss: 0.653869
Train accuracy: 82.0%
Valid accuracy: 85.5%
Accuracy after step 900
Loss: 0.496533
Train accuracy: 85.0%
Valid accuracy: 86.4%
Accuracy after step 1000
Loss: 0.542236
Train accuracy: 82.0%
Valid accuracy: 87.0%
Accuracy after step 1100
Loss: 0.428788
Train accuracy: 83.0%
Valid accuracy

Accuracy after step 9800
Loss: 0.179338
Train accuracy: 96.0%
Valid accuracy: 90.8%
Accuracy after step 9900
Loss: 0.192461
Train accuracy: 95.0%
Valid accuracy: 90.8%
Accuracy after step 10000
Loss: 0.155192
Train accuracy: 94.0%
Valid accuracy: 90.8%
Accuracy after step 10100
Loss: 0.082076
Train accuracy: 98.0%
Valid accuracy: 90.8%
Accuracy after step 10200
Loss: 0.146969
Train accuracy: 96.0%
Valid accuracy: 91.2%
Accuracy after step 10300
Loss: 0.145314
Train accuracy: 95.0%
Valid accuracy: 91.1%
Accuracy after step 10400
Loss: 0.081979
Train accuracy: 98.0%
Valid accuracy: 90.9%
Accuracy after step 10500
Loss: 0.239343
Train accuracy: 94.0%
Valid accuracy: 91.0%
Accuracy after step 10600
Loss: 0.118128
Train accuracy: 96.0%
Valid accuracy: 90.8%
Accuracy after step 10700
Loss: 0.251653
Train accuracy: 91.0%
Valid accuracy: 90.9%
Accuracy after step 10800
Loss: 0.153650
Train accuracy: 95.0%
Valid accuracy: 91.0%
Accuracy after step 10900
Loss: 0.086768
Train accuracy: 97.0%
Vali

Accuracy after step 19500
Loss: 0.079373
Train accuracy: 98.0%
Valid accuracy: 91.1%
Accuracy after step 19600
Loss: 0.072904
Train accuracy: 97.0%
Valid accuracy: 91.0%
Accuracy after step 19700
Loss: 0.078256
Train accuracy: 98.0%
Valid accuracy: 91.0%
Accuracy after step 19800
Loss: 0.048843
Train accuracy: 98.0%
Valid accuracy: 91.1%
Accuracy after step 19900
Loss: 0.106828
Train accuracy: 98.0%
Valid accuracy: 91.0%
Accuracy after step 20000
Loss: 0.123958
Train accuracy: 97.0%
Valid accuracy: 91.1%
stopping early after 20000 steps
Final test accuracy: 96.4%
