Deep Learning
=============

Assignment 3
------------

Previously in `2_fullyconnected.ipynb`, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.

In [10]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

First reload the data we generated in `1_notmnist.ipynb`.

In [11]:
pickle_file = '../../Data/Tutorial/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a shape that's more adapted to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In [12]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 1 to [0.0, 1.0, 0.0 ...], 2 to [0.0, 0.0, 1.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
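
As a rough illustration of what the label half of `reformat` produces, the broadcasting comparison `np.arange(num_labels) == labels[:,None]` can be written as an explicit loop. This is a framework-free sketch: `one_hot` is a hypothetical helper name, not part of the notebook, and the toy labels are made up.

```python
# Hypothetical helper, shown only to spell out the 1-hot encoding step by step.
def one_hot(labels, num_labels):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    return [[1.0 if j == label else 0.0 for j in range(num_labels)]
            for label in labels]

# Toy example with 4 classes instead of the notebook's 10.
encoded = one_hot([1, 2], num_labels=4)
print(encoded)  # [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
```

Each row sums to 1, which is the shape `softmax_cross_entropy_with_logits` expects for its `labels` argument.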


In [13]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

---
Problem 1
---------

Introduce and tune L2 regularization for both logistic and neural network models. Remember that L2 amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor `t` using `nn.l2_loss(t)`. The right amount of regularization should improve your validation / test accuracy.

---

---
Solution 1
---------

The idea is to collect all trainable variables, compute the L2 loss of each one with the built-in `tf.nn.l2_loss` function, and add the scaled sum to the data loss so that the optimizer minimizes it as well. Note that `tf.trainable_variables()` also returns the biases, so they are penalized along with the weights. Source: https://stackoverflow.com/a/38466108/5330223
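
To make the arithmetic concrete, here is a minimal framework-free sketch of the same idea. `tf.nn.l2_loss(t)` computes half the sum of squares of `t`. The function names, the toy values, and the `beta` parameter name are assumptions made for illustration, with `beta` playing the role of the 0.001 factor used in this solution.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes for a tensor."""
    return sum(v * v for v in values) / 2.0

def regularized_loss(data_loss, variables, beta):
    """Add beta times the summed L2 loss of every variable to the data loss."""
    return data_loss + beta * sum(l2_loss(v) for v in variables)

# Toy flattened variables (hypothetical values, not taken from the notebook).
weights = [3.0, 4.0]
biases = [0.0, 0.0]

penalty = l2_loss(weights) + l2_loss(biases)   # (9 + 16) / 2 = 12.5
total = regularized_loss(data_loss=1.0, variables=[weights, biases], beta=0.001)
print(penalty, total)
```

Because the penalty grows with the squared magnitude of every variable, large initial weights can dominate the reported loss early in training.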

In [28]:
batch_size = 128
hidden_layer_size = 250

graph = tf.Graph()

with graph.as_default() :
    
    # Layer 1
    weights_layer_1 = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_layer_size]), name="weight_layer_1")
    biases_layer_1 = tf.Variable(tf.zeros(hidden_layer_size), name="biases_layer_1")
    
    # Layer 2
    weights_layer_2 = tf.Variable(tf.truncated_normal([hidden_layer_size, num_labels]), name="weight_layer_2")
    biases_layer_2 = tf.Variable(tf.zeros(num_labels), name="biases_layer_2")
    
    def model(tf_train_dataset) : 
        out_layer_1 = tf.matmul(tf_train_dataset, weights_layer_1) + biases_layer_1
        #dense_layer = tf.layers.dense(inputs=out_layer_1, units=hidden_layer_size, activation=tf.nn.relu)
        return tf.matmul(tf.nn.relu(out_layer_1), weights_layer_2) + biases_layer_2
    
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labelset = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    logits = model(tf_train_dataset)
    
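    # L2 regularization: sum the L2 loss of every trainable variable
    # (weights and biases), scaled by a factor of 0.001.
    trainable_vars = tf.trainable_variables()
    lossL2 = tf.add_n([tf.nn.l2_loss(v) for v in trainable_vars]) * 0.001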
    
    #add l2 regularization loss to actual loss, so that optimizer finds a way to minimize it too.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labelset)) + lossL2
    
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)
    train_predictions = tf.nn.softmax(logits=logits)
    valid_predictions = tf.nn.softmax(logits=model(tf_valid_dataset))
    test_predictions = tf.nn.softmax(logits=model(tf_test_dataset))

L2 regularization improves the score slightly. The same network without regularization in assignment 2 reached about 86% on the test set (below 90%); here the test accuracy is above 90%, as the output below shows.

In [29]:
num_steps = 3001

with tf.Session(graph=graph) as sess :
    tf.global_variables_initializer().run()
    print ("Initialized...")
    
    for step in range(num_steps) :
        
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        batch_data = train_dataset[offset:offset + batch_size, : ]
        batch_labels = train_labels[offset:offset + batch_size, : ]
        
        feed_dict = {
            tf_train_dataset : batch_data
            , tf_train_labelset : batch_labels
        }
        
        _, l, predictions = sess.run([optimizer, loss, train_predictions], feed_dict)
        
        
        
        if (step % 500 == 0) :
            print ("Loss after mini batch step %d is : %f" % (step, l))
            print ("Mini batch accuracy : %f" % accuracy(predictions, batch_labels))
            print ("Validation accuracy : %f" % accuracy(valid_predictions.eval(), valid_labels))
    print ("Test accuracy : %f" % accuracy(test_predictions.eval(), test_labels))

Initialized...
Loss after mini batch step 0 is : 280.232422
Mini batch accuracy : 4.687500
Validation accuracy : 21.140000
Loss after mini batch step 500 is : 48.212273
Mini batch accuracy : 66.406250
Validation accuracy : 74.970000
Loss after mini batch step 1000 is : 28.050369
Mini batch accuracy : 78.125000
Validation accuracy : 81.090000
Loss after mini batch step 1500 is : 17.036791
Mini batch accuracy : 82.031250
Validation accuracy : 82.990000
Loss after mini batch step 2000 is : 10.930830
Mini batch accuracy : 79.687500
Validation accuracy : 83.580000
Loss after mini batch step 2500 is : 6.759264
Mini batch accuracy : 79.687500
Validation accuracy : 85.800000
Loss after mini batch step 3000 is : 4.049703
Mini batch accuracy : 88.281250
Validation accuracy : 86.060000
Test accuracy : 91.890000


---
Problem 2
---------
Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

---

In [34]:
num_steps = 3001

with tf.Session(graph=graph) as sess :
    tf.global_variables_initializer().run()
    print ("Initialized...")
    
    for step in range(num_steps) :
        
        offset = ((step % 10) * batch_size) % (train_labels.shape[0] - batch_size)
        
        batch_data = train_dataset[offset:offset + batch_size, : ]
        batch_labels = train_labels[offset:offset + batch_size, : ]
        
        feed_dict = {
            tf_train_dataset : batch_data
            , tf_train_labelset : batch_labels
        }
        
        _, l, predictions = sess.run([optimizer, loss, train_predictions], feed_dict)
        
        
        
        if (step % 500 == 0) :
            print ("Loss after mini batch step %d is : %f" % (step, l))
            print ("Mini batch accuracy : %f" % accuracy(predictions, batch_labels))
            print ("Validation accuracy : %f" % accuracy(valid_predictions.eval(), valid_labels))
    print ("Test accuracy : %f" % accuracy(test_predictions.eval(), test_labels))

Initialized...
Loss after mini batch step 0 is : 242.390121
Mini batch accuracy : 15.625000
Validation accuracy : 27.590000
Loss after mini batch step 500 is : 46.527313
Mini batch accuracy : 100.000000
Validation accuracy : 75.050000
Loss after mini batch step 1000 is : 28.218557
Mini batch accuracy : 100.000000
Validation accuracy : 75.110000
Loss after mini batch step 1500 is : 17.116915
Mini batch accuracy : 100.000000
Validation accuracy : 75.490000
Loss after mini batch step 2000 is : 10.387437
Mini batch accuracy : 100.000000
Validation accuracy : 76.110000
Loss after mini batch step 2500 is : 6.311393
Mini batch accuracy : 100.000000
Validation accuracy : 76.920000
Loss after mini batch step 3000 is : 3.845309
Mini batch accuracy : 100.000000
Validation accuracy : 77.990000
Test accuracy : 84.630000


As expected, restricting training to the first 10 batches causes overfitting: mini-batch accuracy reaches 100%, but validation and test accuracy are poor (77.99% and 84.63%, compared with 86.06% and 91.89% when training on the full set).
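
The restriction comes entirely from the offset expression in the training loop. The sketch below evaluates that expression on its own, without TensorFlow, to count how many distinct training examples the network ever sees. The variable names `num_examples`, `distinct_offsets`, and `examples_seen` are introduced here for illustration.

```python
batch_size = 128
num_examples = 200000  # training set size printed when the data was loaded
num_steps = 3001

# Same offset expression as in the Problem 2 training loop.
offsets = [((step % 10) * batch_size) % (num_examples - batch_size)
           for step in range(num_steps)]

distinct_offsets = sorted(set(offsets))
examples_seen = len(distinct_offsets) * batch_size
print(len(distinct_offsets), examples_seen)  # 10 1280
```

With only 1,280 of the 200,000 examples in play, each batch is revisited about 300 times, which is why the network can memorize them.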

---
Problem 3
---------
Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides `nn.dropout()` for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?

---

Design the network so that dropout is applied only at training time.
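
Before the TensorFlow version, here is a minimal framework-free sketch of what the `training` switch has to achieve. `tf.nn.dropout` zeroes each unit with probability `1 - keep_prob` and scales the surviving units by `1 / keep_prob`, so the evaluation path can simply skip the operation. The `dropout` function, the `rng` argument, and the toy activations below are assumptions for illustration only.

```python
import random

def dropout(activations, keep_prob, training, rng):
    """Inverted dropout: during training, zero each unit with probability
    1 - keep_prob and scale survivors by 1 / keep_prob; otherwise pass through."""
    if not training:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
hidden = [1.0, 2.0, 3.0, 4.0]  # toy hidden-layer activations

train_out = dropout(hidden, keep_prob=0.5, training=True, rng=rng)
eval_out = dropout(hidden, keep_prob=0.5, training=False, rng=rng)
print(train_out, eval_out)
```

The evaluation output is deterministic and identical to the input, which is the property the problem statement asks for.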

In [37]:
batch_size = 128
hidden_layer_size = 250

graph = tf.Graph()

with graph.as_default() :
    
    # Layer 1
    weights_layer_1 = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_layer_size]), name="weight_layer_1")
    biases_layer_1 = tf.Variable(tf.zeros(hidden_layer_size), name="biases_layer_1")
    
    # Layer 2
    weights_layer_2 = tf.Variable(tf.truncated_normal([hidden_layer_size, num_labels]), name="weight_layer_2")
    biases_layer_2 = tf.Variable(tf.zeros(num_labels), name="biases_layer_2")
    
    def model(tf_train_dataset, training = True) : 
        out_layer_1 = tf.matmul(tf_train_dataset, weights_layer_1) + biases_layer_1
        relu_layer = tf.nn.relu(out_layer_1)
        if (training == True) :
            relu_layer = tf.nn.dropout(relu_layer, keep_prob=0.5)
        return tf.matmul(relu_layer, weights_layer_2) + biases_layer_2
    
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labelset = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    logits = model(tf_train_dataset)
    
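    # L2 regularization: sum the L2 loss of every trainable variable
    # (weights and biases), scaled by a factor of 0.001.
    trainable_vars = tf.trainable_variables()
    lossL2 = tf.add_n([tf.nn.l2_loss(v) for v in trainable_vars]) * 0.001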
    
    #add l2 regularization loss to actual loss, so that optimizer finds a way to minimize it too.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labelset)) + lossL2
    
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)
    train_predictions = tf.nn.softmax(logits=logits)
    valid_predictions = tf.nn.softmax(logits=model(tf_valid_dataset, False))
    test_predictions = tf.nn.softmax(logits=model(tf_test_dataset, False))

Now train on the same 10 restricted batches as in Problem 2.

In [39]:
num_steps = 3001

with tf.Session(graph=graph) as sess :
    tf.global_variables_initializer().run()
    print ("Initialized...")
    
    for step in range(num_steps) :
        
        offset = ((step % 10) * batch_size) % (train_labels.shape[0] - batch_size)
        
        batch_data = train_dataset[offset:offset + batch_size, : ]
        batch_labels = train_labels[offset:offset + batch_size, : ]
        
        feed_dict = {
            tf_train_dataset : batch_data
            , tf_train_labelset : batch_labels
        }
        
        _, l, predictions = sess.run([optimizer, loss, train_predictions], feed_dict)
        
        
        
        if (step % 500 == 0) :
            print ("Loss after mini batch step %d is : %f" % (step, l))
            print ("Mini batch accuracy : %f" % accuracy(predictions, batch_labels))
            print ("Validation accuracy : %f" % accuracy(valid_predictions.eval(), valid_labels))
    print ("Test accuracy : %f" % accuracy(test_predictions.eval(), test_labels))

Initialized...
Loss after mini batch step 0 is : 368.047760
Mini batch accuracy : 5.468750
Validation accuracy : 36.120000
Loss after mini batch step 500 is : 46.534801
Mini batch accuracy : 96.093750
Validation accuracy : 78.720000
Loss after mini batch step 1000 is : 28.358191
Mini batch accuracy : 97.656250
Validation accuracy : 79.190000
Loss after mini batch step 1500 is : 17.160471
Mini batch accuracy : 98.437500
Validation accuracy : 79.810000
Loss after mini batch step 2000 is : 10.407832
Mini batch accuracy : 99.218750
Validation accuracy : 80.110000
Loss after mini batch step 2500 is : 6.344682
Mini batch accuracy : 99.218750
Validation accuracy : 80.350000
Loss after mini batch step 3000 is : 3.876447
Mini batch accuracy : 98.437500
Validation accuracy : 80.630000
Test accuracy : 86.790000


Dropout gives a small improvement over the previous overfitting case, even though we are still training on only the first 10 batches: validation and test accuracy rise from 77.99% and 84.63% to 80.63% and 86.79% respectively.

---
Problem 4
---------

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is [97.1%](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html?showComment=1391023266211#c8758720086795711595).

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
 ---


Use dropout and L2 regularization as in the previous problem, add a second hidden layer to make the network deeper (256 and 128 units), and add learning rate decay. The batch size stays at 128.
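
The decay schedule can be previewed without building a graph. `tf.train.exponential_decay` is documented as computing `learning_rate * decay_rate ^ (global_step / decay_steps)`, using integer division when `staircase=True`. The sketch below reimplements that formula in plain Python, so the function here is an illustrative stand-in rather than the library call. The arguments 0.1, 100000, and 0.90 are the ones passed in the graph definition for this problem.

```python
def exponential_decay(initial_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Plain-Python stand-in for the formula behind tf.train.exponential_decay."""
    if staircase:
        exponent = global_step // decay_steps          # whole decay periods only
    else:
        exponent = float(global_step) / decay_steps    # continuous decay
    return initial_rate * decay_rate ** exponent

# Learning rate at a few steps of the 300001-step run.
schedule = {step: exponential_decay(0.1, step, 100000, 0.90, staircase=True)
            for step in (0, 99999, 100000, 200000, 300000)}
for step in sorted(schedule):
    print(step, round(schedule[step], 6))
```

With `decay_steps=100000` and `staircase=True`, the rate changes only three times over the whole run (0.1, 0.09, 0.081, 0.0729), so a smaller `decay_steps` may be worth trying if a faster decay is wanted.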

In [97]:
batch_size = 128
hidden_layer_size_1 = 256
hidden_layer_size_2 = 128

graph = tf.Graph()

with graph.as_default() :
    
    # Layer 1
    weights_layer_1 = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_layer_size_1]), name="weight_layer_1")
    biases_layer_1 = tf.Variable(tf.zeros(hidden_layer_size_1), name="biases_layer_1")
    
    # Layer 2
    weights_layer_2 = tf.Variable(tf.truncated_normal([hidden_layer_size_1, hidden_layer_size_2]), name="weight_layer_2")
    biases_layer_2 = tf.Variable(tf.zeros(hidden_layer_size_2), name="biases_layer_2")
    
    # Layer 3
    weights_layer_3 = tf.Variable(tf.truncated_normal([hidden_layer_size_2, num_labels]), name="weight_layer_3")
    biases_layer_3 = tf.Variable(tf.zeros(num_labels), name="biases_layer_3")
    
    def model(tf_train_dataset, training = True) : 
        out_layer_1 = tf.matmul(tf_train_dataset, weights_layer_1) + biases_layer_1
        relu_layer1 = tf.nn.relu(out_layer_1)
        if (training == True) :
            relu_layer1 = tf.nn.dropout(relu_layer1, keep_prob=0.8)
            
        out_layer_2 = tf.matmul(relu_layer1, weights_layer_2) + biases_layer_2
        relu_layer2 = tf.nn.relu(out_layer_2)
        if (training == True) :
            relu_layer2 = tf.nn.dropout(relu_layer2, keep_prob=0.6)
        return tf.matmul(relu_layer2, weights_layer_3) + biases_layer_3
    
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labelset = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    logits = model(tf_train_dataset)
    
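    # L2 regularization: sum the L2 loss of every trainable variable
    # (weights and biases), scaled by a factor of 0.001.
    trainable_vars = tf.trainable_variables()
    lossL2 = tf.add_n([tf.nn.l2_loss(v) for v in trainable_vars]) * 0.001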
    
    #add l2 regularization loss to actual loss, so that optimizer finds a way to minimize it too.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labelset)) + lossL2
    
    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 0.1
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                               100000, 0.90, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    
    train_predictions = tf.nn.softmax(logits=logits)
    valid_predictions = tf.nn.softmax(logits=model(tf_valid_dataset, False))
    test_predictions = tf.nn.softmax(logits=model(tf_test_dataset, False))

Now train it for longer.

In [98]:
num_steps = 300001

with tf.Session(graph=graph) as sess :
    tf.global_variables_initializer().run()
    print ("Initialized...")
    
    for step in range(num_steps) :
        
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        batch_data = train_dataset[offset:offset + batch_size, : ]
        batch_labels = train_labels[offset:offset + batch_size, : ]
        
        feed_dict = {
            tf_train_dataset : batch_data
            , tf_train_labelset : batch_labels
        }
        
        _, l, lr, predictions = sess.run([optimizer, loss, learning_rate, train_predictions], feed_dict)
        
        
        
        if (step % 1000 == 0) :
            print ("Loss after mini batch step %d is : %f learning rate is : %f" % (step, l, lr))
            print ("Mini batch accuracy : %f" % accuracy(predictions, batch_labels))
            print ("Validation accuracy : %f" % accuracy(valid_predictions.eval(), valid_labels))
    print ("Test accuracy : %f" % accuracy(test_predictions.eval(), test_labels))

Initialized...
Loss after mini batch step 0 is : 1846.132568 learning rate is : 0.100000
Mini batch accuracy : 10.156250
Validation accuracy : 12.770000
Loss after mini batch step 1000 is : 79.591560 learning rate is : 0.100000
Mini batch accuracy : 12.500000
Validation accuracy : 12.720000
Loss after mini batch step 2000 is : 65.594231 learning rate is : 0.100000
Mini batch accuracy : 13.281250
Validation accuracy : 15.480000
Loss after mini batch step 3000 is : 53.917740 learning rate is : 0.100000
Mini batch accuracy : 19.531250
Validation accuracy : 18.160000
Loss after mini batch step 4000 is : 44.463432 learning rate is : 0.100000
Mini batch accuracy : 17.968750
Validation accuracy : 19.990000
Loss after mini batch step 5000 is : 37.034874 learning rate is : 0.100000
Mini batch accuracy : 25.000000
Validation accuracy : 34.740000
Loss after mini batch step 6000 is : 30.103436 learning rate is : 0.100000
Mini batch accuracy : 33.593750
Validation accuracy : 36.380000
...

Loss after mini batch step 59000 is : 0.519087 learning rate is : 0.100000
Mini batch accuracy : 86.718750
Validation accuracy : 90.090000
Loss after mini batch step 60000 is : 0.425134 learning rate is : 0.100000
Mini batch accuracy : 91.406250
Validation accuracy : 89.750000
Loss after mini batch step 61000 is : 0.490805 learning rate is : 0.100000
Mini batch accuracy : 87.500000
Validation accuracy : 89.940000
Loss after mini batch step 62000 is : 0.513245 learning rate is : 0.100000
Mini batch accuracy : 88.281250
Validation accuracy : 90.030000
Loss after mini batch step 63000 is : 0.399332 learning rate is : 0.100000
Mini batch accuracy : 91.406250
Validation accuracy : 89.930000
Loss after mini batch step 64000 is : 0.476382 learning rate is : 0.100000
Mini batch accuracy : 89.062500
Validation accuracy : 89.920000
Loss after mini batch step 65000 is : 0.382084 learning rate is : 0.100000
Mini batch accuracy : 92.187500
Validation accuracy : 90.190000
...

Loss after mini batch step 118000 is : 0.480749 learning rate is : 0.090000
Mini batch accuracy : 91.406250
Validation accuracy : 90.120000
Loss after mini batch step 119000 is : 0.473226 learning rate is : 0.090000
Mini batch accuracy : 89.843750
Validation accuracy : 90.350000
Loss after mini batch step 120000 is : 0.408854 learning rate is : 0.090000
Mini batch accuracy : 91.406250
Validation accuracy : 90.230000
Loss after mini batch step 121000 is : 0.459798 learning rate is : 0.090000
Mini batch accuracy : 92.187500
Validation accuracy : 90.330000
Loss after mini batch step 122000 is : 0.502025 learning rate is : 0.090000
Mini batch accuracy : 89.843750
Validation accuracy : 90.280000
Loss after mini batch step 123000 is : 0.470537 learning rate is : 0.090000
Mini batch accuracy : 91.406250
Validation accuracy : 90.360000
Loss after mini batch step 124000 is : 0.477596 learning rate is : 0.090000
Mini batch accuracy : 89.062500
Validation accuracy : 90.310000
...

Loss after mini batch step 177000 is : 0.466668 learning rate is : 0.090000
Mini batch accuracy : 89.843750
Validation accuracy : 90.330000
Loss after mini batch step 178000 is : 0.445436 learning rate is : 0.090000
Mini batch accuracy : 89.843750
Validation accuracy : 90.420000
Loss after mini batch step 179000 is : 0.435805 learning rate is : 0.090000
Mini batch accuracy : 89.843750
Validation accuracy : 90.430000
Loss after mini batch step 180000 is : 0.510778 learning rate is : 0.090000
Mini batch accuracy : 88.281250
Validation accuracy : 90.440000
Loss after mini batch step 181000 is : 0.413918 learning rate is : 0.090000
Mini batch accuracy : 90.625000
Validation accuracy : 90.440000
Loss after mini batch step 182000 is : 0.436806 learning rate is : 0.090000
Mini batch accuracy : 90.625000
Validation accuracy : 90.480000
Loss after mini batch step 183000 is : 0.496015 learning rate is : 0.090000
Mini batch accuracy : 86.718750
Validation accuracy : 90.300000
...

Loss after mini batch step 236000 is : 0.579356 learning rate is : 0.081000
Mini batch accuracy : 86.718750
Validation accuracy : 90.330000
Loss after mini batch step 237000 is : 0.493520 learning rate is : 0.081000
Mini batch accuracy : 88.281250
Validation accuracy : 90.530000
Loss after mini batch step 238000 is : 0.326500 learning rate is : 0.081000
Mini batch accuracy : 92.968750
Validation accuracy : 90.340000
Loss after mini batch step 239000 is : 0.425218 learning rate is : 0.081000
Mini batch accuracy : 89.843750
Validation accuracy : 90.500000
Loss after mini batch step 240000 is : 0.374700 learning rate is : 0.081000
Mini batch accuracy : 92.187500
Validation accuracy : 90.530000
Loss after mini batch step 241000 is : 0.458786 learning rate is : 0.081000
Mini batch accuracy : 90.625000
Validation accuracy : 90.300000
Loss after mini batch step 242000 is : 0.537933 learning rate is : 0.081000
Mini batch accuracy : 86.718750
Validation accuracy : 90.570000
...

Loss after mini batch step 295000 is : 0.341995 learning rate is : 0.081000
Mini batch accuracy : 93.750000
Validation accuracy : 90.480000
Loss after mini batch step 296000 is : 0.555845 learning rate is : 0.081000
Mini batch accuracy : 87.500000
Validation accuracy : 90.650000
Loss after mini batch step 297000 is : 0.480229 learning rate is : 0.081000
Mini batch accuracy : 87.500000
Validation accuracy : 90.380000
Loss after mini batch step 298000 is : 0.355616 learning rate is : 0.081000
Mini batch accuracy : 92.968750
Validation accuracy : 90.650000
Loss after mini batch step 299000 is : 0.378070 learning rate is : 0.081000
Mini batch accuracy : 91.406250
Validation accuracy : 90.470000
Loss after mini batch step 300000 is : 0.425348 learning rate is : 0.072900
Mini batch accuracy : 91.406250
Validation accuracy : 90.510000
Test accuracy : 95.260000
