# MNIST master

For a demo we shall solve the same digit recognition problem, but at a different scale
* images are now 28x28
* 10 different digits
* 50k samples

Before doing this homework, read some code examples written in tensorflow. There is a good repository with code examples: https://github.com/aymericdamien/TensorFlow-Examples. As we already know, we need many samples to learn :)

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
import tensorflow as tf

  from ._conv import register_converters as _register_converters


In [3]:
from mnist import load_dataset
X_train,y_train,X_val,y_val,X_test,y_test = load_dataset()

print(X_train.shape,y_train.shape)

Downloading train-images-idx3-ubyte.gz
Downloading train-labels-idx1-ubyte.gz
Downloading t10k-images-idx3-ubyte.gz
Downloading t10k-labels-idx1-ubyte.gz
(50000, 1, 28, 28) (50000,)


In [4]:
#reshaping for convolution
X_train=X_train.reshape(-1,28,28,1)
X_val=X_val.reshape(-1,28,28,1)
X_test=X_test.reshape(-1,28,28,1)

In [5]:
tf.reset_default_graph()

In [6]:
#defining placeholders for input and target
input_X = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], 
                         name="X")
target_y = tf.placeholder(tf.int32, shape=[None], 
                          name="target_Y_integer")

Defining network architecture

In [7]:
input_X.shape

TensorShape([Dimension(None), Dimension(28), Dimension(28), Dimension(1)])

In [8]:
l1 = tf.layers.max_pooling2d(tf.layers.conv2d(input_X, 45, 5, activation=tf.nn.relu), 2, 2)

l2 = tf.layers.max_pooling2d(tf.layers.conv2d(l1, 30, 4, activation=tf.nn.relu), 2, 2)

l2_1 = tf.layers.conv2d(l2, 25, 3, activation=tf.nn.relu)

l2_2 = tf.layers.batch_normalization(tf.contrib.layers.flatten(l2_1))

# l3 = tf.layers.dense(l2_2, units=100, 
#                      activation=tf.nn.relu6)

# l4 = tf.layers.dense(l2_2, units=1000, 
#                      activation=tf.nn.relu6)

l4_2 = tf.layers.batch_normalization(tf.layers.dense(l2_2, units=100, 
                     activation=tf.nn.relu6))

l4_3 = tf.layers.batch_normalization(tf.layers.dense(l4_2, units=50, 
                     activation=tf.nn.relu6))

l5 = tf.layers.dense(l4_3, units=10, activation=None)

l_out = tf.nn.softmax(l5)

y_predicted = tf.argmax(l_out, axis=-1)

Instructions for updating:
Use the retry module or similar alternatives.


In [9]:
weights = tf.trainable_variables()
weights

[<tf.Variable 'conv2d/kernel:0' shape=(5, 5, 1, 45) dtype=float32_ref>,
 <tf.Variable 'conv2d/bias:0' shape=(45,) dtype=float32_ref>,
 <tf.Variable 'conv2d_1/kernel:0' shape=(4, 4, 45, 30) dtype=float32_ref>,
 <tf.Variable 'conv2d_1/bias:0' shape=(30,) dtype=float32_ref>,
 <tf.Variable 'conv2d_2/kernel:0' shape=(3, 3, 30, 25) dtype=float32_ref>,
 <tf.Variable 'conv2d_2/bias:0' shape=(25,) dtype=float32_ref>,
 <tf.Variable 'batch_normalization/gamma:0' shape=(100,) dtype=float32_ref>,
 <tf.Variable 'batch_normalization/beta:0' shape=(100,) dtype=float32_ref>,
 <tf.Variable 'dense/kernel:0' shape=(100, 100) dtype=float32_ref>,
 <tf.Variable 'dense/bias:0' shape=(100,) dtype=float32_ref>,
 <tf.Variable 'batch_normalization_1/gamma:0' shape=(100,) dtype=float32_ref>,
 <tf.Variable 'batch_normalization_1/beta:0' shape=(100,) dtype=float32_ref>,
 <tf.Variable 'dense_1/kernel:0' shape=(100, 50) dtype=float32_ref>,
 <tf.Variable 'dense_1/bias:0' shape=(50,) dtype=float32_ref>,
 <tf.Variable 'b

### Than you could simply
* define loss function manually
* compute error gradient over all weights
* define updates
* But that's a whole lot of work and life's short
  * not to mention life's too short to wait for SGD to converge

Instead, we shall use Tensorflow builtins

In [10]:
# Mean categorical crossentropy as a loss function
# - similar to logistic loss but for multiclass targets
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_y, logits=l_out, name="softmax_loss"))

In [11]:
accuracy, update_accuracy = tf.metrics.accuracy(target_y, y_predicted)
tf.local_variables()

[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]

In [28]:
optimzer = tf.train.AdamOptimizer(learning_rate=0.0001)
train_step = optimzer.minimize(loss)

### That's all, now let's train it!
* We got a lot of data, so it's recommended that you use SGD
* So let's implement a function that splits the training sample into minibatches

In [13]:
# An auxilary function that returns mini-batches for neural network training

#Parameters
# inputs - a tensor of images with shape (many, 1, 28, 28), e.g. X_train
# outputs - a vector of answers for corresponding images e.g. Y_train
#batch_size - a single number - the intended size of each batches

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = indices[start_idx:start_idx + batchsize]
        yield inputs[excerpt], targets[excerpt]

# Training loop

Model saver.
<br>
See more:
http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/

In [14]:
model_path = "./checkpoints/model.ckpt"
saver = tf.train.Saver()#max_to_keep=3)

In [15]:
y_train

array([5, 0, 4, ..., 8, 4, 8], dtype=uint8)

In [29]:
import time

num_epochs = 100 # amount of passes through the data

batch_size = 100 # number of samples processed at each function call

with tf.Session() as sess:
    # initialize global wariables
    sess.run(tf.global_variables_initializer())
    
    saver.restore(sess, saver.last_checkpoints[-1])
    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        train_err = 0
        train_batches = 0
        start_time = time.time()

        sess.run(tf.local_variables_initializer())
        
        
        for inputs, targets in iterate_minibatches(X_train[:], y_train[:],batch_size):

            _, train_err_batch, _ = sess.run(
                [train_step, loss, update_accuracy], 
                feed_dict={input_X: inputs, target_y:targets}
            )
            train_err += train_err_batch
            train_batches += 1
        train_acc = sess.run(accuracy)

        # And a full pass over the validation data:
        sess.run(tf.local_variables_initializer())
        for inputs, targets in iterate_minibatches(X_val, y_val, batch_size):
            sess.run(update_accuracy, feed_dict={input_X: inputs, 
                                                 target_y:targets})
        val_acc = sess.run(accuracy)


        # Then we print the results for this epoch:
        print("Epoch {} of {} took {:.3f}s".format(
            epoch + 1, num_epochs, time.time() - start_time))

        print("  training loss (in-iteration):\t\t{:.6f}".format(train_err / train_batches))
        print("  train accuracy:\t\t{:.2f} %".format(
            train_acc * 100))
        print("  validation accuracy:\t\t{:.2f} %".format(
            val_acc * 100))
        
        # save model
        save_path = saver.save(sess, model_path, global_step=epoch)
        print("  Model saved in file: %s" % save_path)

INFO:tensorflow:Restoring parameters from ./checkpoints/model.ckpt-10
Epoch 1 of 100 took 44.569s
  training loss (in-iteration):		1.462350
  train accuracy:		99.88 %
  validation accuracy:		99.10 %
  Model saved in file: ./checkpoints/model.ckpt-0
Epoch 2 of 100 took 44.537s
  training loss (in-iteration):		1.462370
  train accuracy:		99.88 %
  validation accuracy:		99.11 %
  Model saved in file: ./checkpoints/model.ckpt-1
Epoch 3 of 100 took 44.515s
  training loss (in-iteration):		1.462350
  train accuracy:		99.88 %
  validation accuracy:		99.14 %
  Model saved in file: ./checkpoints/model.ckpt-2
Epoch 4 of 100 took 44.439s
  training loss (in-iteration):		1.462350
  train accuracy:		99.88 %
  validation accuracy:		99.10 %
  Model saved in file: ./checkpoints/model.ckpt-3
Epoch 5 of 100 took 44.527s
  training loss (in-iteration):		1.462350
  train accuracy:		99.88 %
  validation accuracy:		99.08 %
  Model saved in file: ./checkpoints/model.ckpt-4
Epoch 6 of 100 took 44.520s
  train

KeyboardInterrupt: 

Now we can restore saved parameters:

In [30]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    load_path = saver.restore(sess, saver.last_checkpoints[-1])
    print("Model restored from file: %s" % save_path)
    
    sess.run(tf.local_variables_initializer())
    for batch in iterate_minibatches(X_test, y_test, 500):
        inputs, targets = batch
        sess.run(update_accuracy, feed_dict={input_X: inputs, 
                                                   target_y:targets})
    test_acc = sess.run(accuracy)
    print("Final results:")
    print("  test accuracy:\t\t{:.2f} %".format(
        test_acc* 100))

    if test_acc * 100 > 99.5:
        print ("Achievement unlocked: 80lvl Warlock!")
    else:
        print ("We need more magic!")

INFO:tensorflow:Restoring parameters from ./checkpoints/model.ckpt-44
Model restored from file: ./checkpoints/model.ckpt-44
Final results:
  test accuracy:		99.38 %
We need more magic!


# Now improve it!

* Moar layers!
* Moar units!
* Different nonlinearities!