Deep Learning
=============

Assignment 2
------------

Previously in `1_notmnist.ipynb`, we created a pickle with formatted datasets for training, development and testing on the [notMNIST dataset](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html).

The goal of this assignment is to progressively train deeper and more accurate models using TensorFlow.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range
import pickle as pkl

First reload the data we generated in `1_notmnist.ipynb`.

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f, encoding='latin1')
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a shape that's more adapted to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
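
The 1-hot encoding in `reformat` relies on NumPy broadcasting. The following is a minimal sketch of that one line on a tiny, made-up label vector; the values of `num_labels` and `labels` here are illustrative assumptions, not the notebook's data.

```python
import numpy as np

num_labels = 3                      # hypothetical small value (the notebook uses 10)
labels = np.array([0, 2, 1, 2])     # hypothetical integer class labels

# labels[:, None] has shape (4, 1) and np.arange(num_labels) has shape (3,).
# Broadcasting the comparison gives a (4, 3) boolean matrix with exactly one
# True per row, in the column whose index equals the label.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
print(one_hot)

# Taking the argmax of each row recovers the original integer labels.
recovered = np.argmax(one_hot, axis=1)
```

The same `argmax` round trip is what the accuracy computation later in the notebook uses to compare predictions against 1-hot labels.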


We're first going to train a multinomial logistic regression using simple gradient descent.

TensorFlow works like this:
* First you describe the computation that you want performed: what the inputs, the variables, and the operations look like. These are created as nodes in a computation graph. This description is all contained within the block below:

      with graph.as_default():
          ...

* Then you can run the operations on this graph as many times as you want by calling `session.run()`, passing it the outputs to fetch from the graph; their values are returned. This runtime operation is all contained in the block below:

      with tf.Session(graph=graph) as session:
          ...

Let's load all the data into TensorFlow and build the computation graph corresponding to our training:

In [5]:
# With gradient descent training, even this much data is prohibitive.
# Subset the training data for faster turnaround.
train_subset = 10000

graph = tf.Graph()
with graph.as_default():

  # Input data.
  # Load the training, validation and test data into constants that are
  # attached to the graph.
  tf_train_dataset = tf.constant(train_dataset[:train_subset, :])
  tf_train_labels = tf.constant(train_labels[:train_subset])
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  # These are the parameters that we are going to be training. The weight
  # matrix will be initialized using random values following a (truncated)
  # normal distribution. The biases get initialized to zero.
  weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels]))
  biases = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  # We multiply the inputs with the weight matrix, and add biases. We compute
  # the softmax and cross-entropy (it's one operation in TensorFlow, because
  # it's very common, and it can be optimized). We take the average of this
  # cross-entropy across all training examples: that's our loss.
  logits = tf.matmul(tf_train_dataset, weights) + biases
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
  
  # Optimizer.
  # We are going to find the minimum of this loss using gradient descent.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  # These are not part of training, but merely here so that we can report
  # accuracy figures as we train.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(
    tf.matmul(tf_valid_dataset, weights) + biases)
  test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
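
To make the training computation concrete, here is a rough NumPy sketch of what the loss and one gradient descent update amount to for this kind of model. It is not how TensorFlow implements these operations internally; the helper names (`softmax`, `loss_and_grads`) and the tiny synthetic dataset are assumptions made for illustration only.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_and_grads(x, y_one_hot, weights, biases):
    # Forward pass: affine map, softmax, then mean cross-entropy.
    probs = softmax(x @ weights + biases)
    eps = 1e-12  # guards against log(0)
    loss = -np.mean(np.sum(y_one_hot * np.log(probs + eps), axis=1))
    # Gradient of the mean cross-entropy with respect to the logits.
    dlogits = (probs - y_one_hot) / x.shape[0]
    return loss, x.T @ dlogits, dlogits.sum(axis=0)

# Tiny synthetic problem (hypothetical data, not notMNIST).
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5)).astype(np.float32)
true_w = rng.normal(size=(5, 3))
y = np.eye(3, dtype=np.float32)[np.argmax(x @ true_w, axis=1)]

weights = rng.normal(size=(5, 3))   # random initial weights
biases = np.zeros(3)                # biases start at zero
learning_rate = 0.5                 # same step size as the optimizer above

losses = []
for step in range(50):
    loss, dw, db = loss_and_grads(x, y, weights, biases)
    losses.append(loss)
    # Plain gradient descent: move each parameter against its gradient.
    weights -= learning_rate * dw
    biases -= learning_rate * db
```

In the notebook itself, `tf.nn.softmax_cross_entropy_with_logits` and `GradientDescentOptimizer(0.5).minimize(loss)` take care of the forward pass, the gradients, and the update.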

Let's run this computation and iterate:

In [8]:
num_steps = 801

def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

with tf.Session(graph=graph) as session:
  # This is a one-time operation which ensures the parameters get initialized as
  # we described in the graph: random weights for the matrix, zeros for the
  # biases. 
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    # Run the computations. We tell .run() that we want to run the optimizer,
    # and get the loss value and the training predictions returned as numpy
    # arrays.
    _, l, predictions = session.run([optimizer, loss, train_prediction])
    if (step % 100 == 0):
      print('-------------------------------')
      print('Loss at step %d: %f' % (step, l))
      print('Training accuracy: %.1f%%' % accuracy(
        predictions, train_labels[:train_subset, :]))
      # Calling .eval() on valid_prediction is basically like calling run(), but
      # just to get that one numpy array. Note that it recomputes all its graph
      # dependencies.
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
-------------------------------
Loss at step 0: 17.453949
Training accuracy: 14.3%
Validation accuracy: 16.0%
-------------------------------
Loss at step 100: 2.355172
Training accuracy: 72.0%
Validation accuracy: 70.5%
-------------------------------
Loss at step 200: 1.901910
Training accuracy: 74.9%
Validation accuracy: 72.5%
-------------------------------
Loss at step 300: 1.654243
Training accuracy: 76.2%
Validation accuracy: 73.4%
-------------------------------
Loss at step 400: 1.484323
Training accuracy: 77.1%
Validation accuracy: 74.0%
-------------------------------
Loss at step 500: 1.357427
Training accuracy: 77.7%
Validation accuracy: 74.4%
-------------------------------
Loss at step 600: 1.257688
Training accuracy: 78.3%
Validation accuracy: 74.6%
-------------------------------
Loss at step 700: 1.176400
Training accuracy: 78.7%
Validation accuracy: 74.9%
-------------------------------
Loss at step 800: 1.108428
Training accuracy: 79.3%
Validation accura
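
As a quick check of how the `accuracy` helper behaves, the sketch below applies the same function to a small, hypothetical set of predictions and 1-hot labels (the arrays are invented for illustration).

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows whose highest-scoring class matches the 1-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# Hypothetical softmax outputs for four examples over three classes.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
# Hypothetical 1-hot labels; the last row disagrees with the prediction.
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=np.float32)

score = accuracy(predictions, labels)
print('Accuracy: %.1f%%' % score)
```

Only the position of the largest value in each row matters, so the function gives the same result for raw logits as for softmax probabilities.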

Let's now switch to stochastic gradient descent training instead, which is much faster.

The graph will be similar, except that instead of holding all the training data in a constant node, we create a placeholder node (`tf.placeholder`) which will be fed actual data at every call of `session.run()`.

In [16]:
batch_size = 128

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels]))
  biases = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  logits = tf.matmul(tf_train_dataset, weights) + biases
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
  
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(
    tf.matmul(tf_valid_dataset, weights) + biases)
  test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

Let's run it:

In [19]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print('-------------------------------')  
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
-------------------------------
Minibatch loss at step 0: 17.652439
Minibatch accuracy: 4.7%
Validation accuracy: 13.6%
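
The minibatch selection in the training loop can be examined on its own. The sketch below isolates the offset arithmetic; the function name `minibatch_offset` is a hypothetical helper introduced here, and the stand-in data array is synthetic.

```python
import numpy as np

def minibatch_offset(step, batch_size, num_examples):
    # Same arithmetic as the training loop: advance by one batch per step and
    # wrap around so that a slice never runs past the end of the data.
    return (step * batch_size) % (num_examples - batch_size)

batch_size = 128
num_examples = 200000  # number of training examples loaded earlier

# Offsets for the first few steps and around the first wrap-around.
steps = (0, 1, 2, 1561, 1562)
offsets = [minibatch_offset(s, batch_size, num_examples) for s in steps]
for s, o in zip(steps, offsets):
    print('step %d -> offset %d' % (s, o))

# Stand-in for the training matrix (one column instead of 784).
data = np.arange(num_examples, dtype=np.float32).reshape(-1, 1)
batch_shapes = {data[o:o + batch_size].shape for o in offsets}
```

Because the modulus is `num_examples - batch_size`, every slice is a full batch, which matters here since the placeholders are declared with a fixed `batch_size`. After the first pass the offsets no longer line up with the earlier batch boundaries, but the examples are still visited in the same stored order, which is why the code comment notes that better randomization across epochs is possible.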
offset:	128
step:	1
batch_size:	128
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	256
step:	2
batch_size:	256
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	384
step:	3
batch_size:	384
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	512
step:	4
batch_size:	512
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	640
step:	5
batch_size:	640
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	768
step:	6
batch_size:	768
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1

offset:	15104
step:	118
batch_size:	15104
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	15232
step:	119
batch_size:	15232
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	15360
step:	120
batch_size:	15360
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	15488
step:	121
batch_size:	15488
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	15616
step:	122
batch_size:	15616
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	15744
step:	123
batch_size:	15744
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	15872
step:	124
batch_size:	15872
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	16000
step:	125
batch_size:	16000
train_labels.shape[0]:	(200000, 10)
train

offset:	25472
step:	199
batch_size:	25472
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	25600
step:	200
batch_size:	25600
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	25728
step:	201
batch_size:	25728
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	25856
step:	202
batch_size:	25856
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	25984
step:	203
batch_size:	25984
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	26112
step:	204
batch_size:	26112
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	26240
step:	205
batch_size:	26240
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	26368
step:	206
batch_size:	26368
train_labels.shape[0]:	(200000, 10)
train

offset:	34816
step:	272
batch_size:	34816
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	34944
step:	273
batch_size:	34944
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	35072
step:	274
batch_size:	35072
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	35200
step:	275
batch_size:	35200
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	35328
step:	276
batch_size:	35328
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	35456
step:	277
batch_size:	35456
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	35584
step:	278
batch_size:	35584
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	35712
step:	279
batch_size:	35712
train_labels.shape[0]:	(200000, 10)
train

train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	44416
step:	347
batch_size:	44416
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	44544
step:	348
batch_size:	44544
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	44672
step:	349
batch_size:	44672
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	44800
step:	350
batch_size:	44800
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	44928
step:	351
batch_size:	44928
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	45056
step:	352
batch_size:	45056
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	45184
step:	353
batch_size:	45184
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	45312
step:	354
batch_

train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	56704
step:	443
batch_size:	56704
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	56832
step:	444
batch_size:	56832
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	56960
step:	445
batch_size:	56960
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	57088
step:	446
batch_size:	57088
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	57216
step:	447
batch_size:	57216
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	57344
step:	448
batch_size:	57344
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	57472
step:	449
batch_size:	57472
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.

offset:	68992
step:	539
batch_size:	68992
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	69120
step:	540
batch_size:	69120
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	69248
step:	541
batch_size:	69248
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	69376
step:	542
batch_size:	69376
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	69504
step:	543
batch_size:	69504
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	69632
step:	544
batch_size:	69632
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	69760
step:	545
batch_size:	69760
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	69888
step:	546
batch_size:	69888
train_labels.shape[0]:	(200000, 10)
train

offset:	78976
step:	617
batch_size:	78976
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	79104
step:	618
batch_size:	79104
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	79232
step:	619
batch_size:	79232
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	79360
step:	620
batch_size:	79360
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	79488
step:	621
batch_size:	79488
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	79616
step:	622
batch_size:	79616
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	79744
step:	623
batch_size:	79744
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	79872
step:	624
batch_size:	79872
train_labels.shape[0]:	(200000, 10)
train

train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	89216
step:	697
batch_size:	89216
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	89344
step:	698
batch_size:	89344
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	89472
step:	699
batch_size:	89472
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	89600
step:	700
batch_size:	89600
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	89728
step:	701
batch_size:	89728
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	89856
step:	702
batch_size:	89856
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	89984
step:	703
batch_size:	89984
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	90112
step:	704
batch_

step:	778
batch_size:	99584
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	99712
step:	779
batch_size:	99712
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	99840
step:	780
batch_size:	99840
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	99968
step:	781
batch_size:	99968
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	100096
step:	782
batch_size:	100096
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	100224
step:	783
batch_size:	100224
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	100352
step:	784
batch_size:	100352
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	100480
step:	785
batch_size:	100480
train_labels.shape[0]:	(200000, 10)
train_label

offset:	111360
step:	870
batch_size:	111360
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	111488
step:	871
batch_size:	111488
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	111616
step:	872
batch_size:	111616
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	111744
step:	873
batch_size:	111744
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	111872
step:	874
batch_size:	111872
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	112000
step:	875
batch_size:	112000
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	112128
step:	876
batch_size:	112128
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	112256
step:	877
batch_size:	112256
train_labels.shape[0]:	(2

train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	122880
step:	960
batch_size:	122880
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	123008
step:	961
batch_size:	123008
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	123136
step:	962
batch_size:	123136
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	123264
step:	963
batch_size:	123264
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	123392
step:	964
batch_size:	123392
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	123520
step:	965
batch_size:	123520
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	123648
step:	966
batch_size:	123648
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	123776
s

train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	131840
step:	1030
batch_size:	131840
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	131968
step:	1031
batch_size:	131968
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	132096
step:	1032
batch_size:	132096
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	132224
step:	1033
batch_size:	132224
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	132352
step:	1034
batch_size:	132352
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	132480
step:	1035
batch_size:	132480
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	132608
step:	1036
batch_size:	132608
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	1

train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	142848
step:	1116
batch_size:	142848
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	142976
step:	1117
batch_size:	142976
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	143104
step:	1118
batch_size:	143104
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	143232
step:	1119
batch_size:	143232
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	143360
step:	1120
batch_size:	143360
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	143488
step:	1121
batch_size:	143488
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	143616
step:	1122
batch_size:	143616
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0

offset:	153856
step:	1202
batch_size:	153856
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	153984
step:	1203
batch_size:	153984
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	154112
step:	1204
batch_size:	154112
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	154240
step:	1205
batch_size:	154240
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	154368
step:	1206
batch_size:	154368
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	154496
step:	1207
batch_size:	154496
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	154624
step:	1208
batch_size:	154624
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	154752
step:	1209
batch_size:	154752
train_labels.shap

offset:	168960
step:	1320
batch_size:	168960
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	169088
step:	1321
batch_size:	169088
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	169216
step:	1322
batch_size:	169216
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	169344
step:	1323
batch_size:	169344
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	169472
step:	1324
batch_size:	169472
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	169600
step:	1325
batch_size:	169600
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	169728
step:	1326
batch_size:	169728
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	169856
step:	1327
batch_size:	169856
train_labels.shap

batch_size:	179072
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	179200
step:	1400
batch_size:	179200
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	179328
step:	1401
batch_size:	179328
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	179456
step:	1402
batch_size:	179456
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	179584
step:	1403
batch_size:	179584
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	179712
step:	1404
batch_size:	179712
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	179840
step:	1405
batch_size:	179840
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	179968
step:	1406
batch_size:	179968
train_labels.shape[0]:	(200000, 10)
train_l

offset:	187904
step:	1468
batch_size:	187904
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	188032
step:	1469
batch_size:	188032
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	188160
step:	1470
batch_size:	188160
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	188288
step:	1471
batch_size:	188288
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	188416
step:	1472
batch_size:	188416
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	188544
step:	1473
batch_size:	188544
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	188672
step:	1474
batch_size:	188672
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	188800
step:	1475
batch_size:	188800
train_labels.shap

offset:	197888
step:	1546
batch_size:	197888
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	198016
step:	1547
batch_size:	198016
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	198144
step:	1548
batch_size:	198144
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	198272
step:	1549
batch_size:	198272
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	198400
step:	1550
batch_size:	198400
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	198528
step:	1551
batch_size:	198528
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	198656
step:	1552
batch_size:	198656
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	198784
step:	1553
batch_size:	198784
train_labels.shap

offset:	8000
step:	1624
batch_size:	8000
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	8128
step:	1625
batch_size:	8128
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	8256
step:	1626
batch_size:	8256
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	8384
step:	1627
batch_size:	8384
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	8512
step:	1628
batch_size:	8512
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
offset:	8640
step:	1629
batch_size:	8640
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
...
offset:	183488
step:	2995
batch_size:	183488
train_labels.shape[0]:	(200000, 10)
train_label[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
...

---
Problem
-------

Turn the logistic regression example with SGD into a 1-hidden layer neural network with rectified linear units [nn.relu()](https://www.tensorflow.org/versions/r0.7/api_docs/python/nn.html#relu) and 1024 hidden nodes. This model should improve your validation / test accuracy.

---
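
For intuition, the forward pass of such a network can be sketched in plain NumPy before building it in TensorFlow. This is a minimal illustration, not the assignment's solution: the helper names (`relu`, `softmax`, `forward`), the random inputs, and the 0.01 weight scale are assumptions made for the example.

```python
import numpy as np

image_size = 28
num_labels = 10
num_nodes = 1024  # hidden layer width from the problem statement

def relu(x):
    # Rectified linear unit: negative activations become zero.
    return np.maximum(x, 0.0)

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(x, w1, b1, w2, b2):
    hidden = relu(np.dot(x, w1) + b1)   # (batch, num_nodes)
    logits = np.dot(hidden, w2) + b2    # (batch, num_labels)
    return softmax(logits)

# Hypothetical inputs and parameters, only to exercise the shapes.
rng = np.random.RandomState(0)
x = rng.rand(4, image_size * image_size)
w1 = 0.01 * rng.randn(image_size * image_size, num_nodes)
b1 = np.zeros(num_nodes)
w2 = 0.01 * rng.randn(num_nodes, num_labels)
b2 = np.zeros(num_labels)

probs = forward(x, w1, b1, w2, b2)
print(probs.shape)  # (4, 10)
```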

In [15]:
batch_size = 128
num_nodes = 1024

graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    # Variables.
    weights1 = tf.Variable(tf.truncated_normal([image_size * image_size, num_nodes]))
    biases1 = tf.Variable(tf.zeros([num_nodes]))
    weights2 = tf.Variable(tf.truncated_normal([num_nodes, num_labels]))
    biases2 = tf.Variable(tf.zeros([num_labels]))
    
    # Training computation.
    layer1 = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
    logit2 = tf.matmul(layer1, weights2) + biases2
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logit2))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    
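    # Predictions for the training, validation, and test data.
    # The output layer applies softmax to the logits; ReLU is used only in the
    # hidden layer. train_prediction is what the training loop fetches.
    train_prediction = tf.nn.softmax(logit2)

    valid_layer1 = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_layer1, weights2) + biases2)

    test_layer1 = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
    test_prediction = tf.nn.softmax(tf.matmul(test_layer1, weights2) + biases2)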
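
A note on turning the final layer's logits into predictions: softmax is monotonic, so taking the argmax of the softmax output picks the same class as taking the argmax of the raw logits. A ReLU applied to the logits clamps every negative value to zero, which can change the predicted class when all logits in a row are negative. The small NumPy sketch below illustrates this with made-up logits (they are not values from this notebook).

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Two hypothetical rows of logits for a 3-class problem.
logits = np.array([[-3.0, -1.0, -2.0],   # all negative: the largest is class 1
                   [ 2.0, -1.0,  0.5]])  # the largest is class 0

softmax_pred = np.argmax(softmax(logits), axis=1)
relu_pred = np.argmax(np.maximum(logits, 0.0), axis=1)

print(softmax_pred)  # [1 0]
print(relu_pred)     # [0 0]  (row 0 collapses to all zeros, so argmax returns 0)
```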

In [21]:
num_steps = 3001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 500 == 0):
            print('-------------------------------')  
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
            print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
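
The offset arithmetic in the loop above can be checked in isolation. The sketch below assumes the 200000-example training set and batch size of 128 used in this notebook; `minibatch_offset` is a hypothetical helper introduced only for the illustration. Because the modulus is `num_examples - batch_size`, every slice contains exactly `batch_size` rows, which the fixed-shape placeholders require.

```python
batch_size = 128
num_train = 200000  # number of training examples reported when the data was loaded

def minibatch_offset(step, batch_size, num_examples):
    # Same formula as in the training loop: wrap around before the slice
    # would run past the end of the training set.
    return (step * batch_size) % (num_examples - batch_size)

for step in (0, 1, 1561, 1562, 3000):
    print(step, minibatch_offset(step, batch_size, num_train))
# 0 0
# 1 128
# 1561 199808
# 1562 64
# 3000 184128
```

After the first wrap-around (step 1562) the offsets are shifted by 64, so minibatches in the second pass straddle the boundaries of the first-pass minibatches rather than repeating them exactly.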

Initialized
-------------------------------
Minibatch loss at step 0: 16.377649
Minibatch accuracy: 7.8%
Validation accuracy: 11.2%
Test accuracy: 11.5%
-------------------------------
Minibatch loss at step 500: 1.203869
Minibatch accuracy: 77.3%
Validation accuracy: 75.9%
Test accuracy: 83.4%
-------------------------------
Minibatch loss at step 1000: 1.452421
Minibatch accuracy: 73.4%
Validation accuracy: 76.8%
Test accuracy: 84.3%
-------------------------------
Minibatch loss at step 1500: 0.760515
Minibatch accuracy: 83.6%
Validation accuracy: 77.1%
Test accuracy: 84.9%
-------------------------------
Minibatch loss at step 2000: 0.829501
Minibatch accuracy: 78.9%
Validation accuracy: 77.5%
Test accuracy: 85.3%
-------------------------------
Minibatch loss at step 2500: 0.960108
Minibatch accuracy: 78.1%
Validation accuracy: 78.4%
Test accuracy: 85.9%
-------------------------------
Minibatch loss at step 3000: 1.044487
Minibatch accuracy: 76.6%
Validation accuracy: 78.7%
Test 