# Fully connected neural network with Tensorflow for MNIST

## Introduction

Tensorflow is is a symbolic math library and one of the widely used libraries for implementing Machine learning/other algorithms involving large number of mathematical operations. Tensorflow was developed by Google and it’s open source now. It is used for both research and production at Google e.g. for implementing Machine learning in almost all applications 
- Google photos 
- Google voice search 

In this notebook we are going to build a fully connected neural network with Tensorflow

## Requirements

### Imports

In [1]:
import tensorflow as tf
from deep_teaching_commons.data.fundamentals.mnist import Mnist
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

### Loading dataset

The MNIST dataset is a classic Machine Learning dataset you can get it and more information about it from the website of Yann Lecun. MNIST contains handwrittin digits and is split into a tranings set of 60000 examples and a test set of 10000 examples. We use the ```deep_teaching_commons``` package to load the MNIST dataset in a convenient way.

In [2]:
train_images, train_labels, test_images, test_labels = Mnist().get_all_data(one_hot_enc=True, flatten=False)
train_images, test_images = train_images.reshape(60000, 28, 28, 1), test_images.reshape(10000,28,28,1)
print('train shapes:', train_images.shape, train_labels.shape)
print('test shapes:', test_images.shape, test_labels.shape)

auto download is active, attempting download
mnist data directory already exists, download aborted
train shapes: (60000, 28, 28, 1) (60000, 10)
test shapes: (10000, 28, 28, 1) (10000, 10)


### Placeholders

So far we have used numpy arrays to manage our data, but in order to build a model in tensorflow we need another structure, the placeholder. A placeholder is simply a variable that we will assign data to at a later date. It allows us to create our operations and build our computation graph, without needing the data. In TensorFlow terminology, we then feed data into the graph through these placeholders.

In [3]:
# input X: 28x28 grayscale images, the first dimension (None) will index the images in the mini-batch
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
# correct answers will go here
Y = tf.placeholder(tf.float32, [None, 10])

## Fully connected neural network for MNIST


### Initializing the weights
By initializing the weights of our neural network (the learnable parameter), we already define how our network is going to look like. We decided to use a neural network with 3 layer with a ReLU and Dropout function on top of each layer. 

In [None]:
# our neural network architecture:
#
#    · · · · · · ·           (input data, flattened pixels)               X [batch, 784]   # 784 = 28*28
#     \x/x\x/x\x/           -- fully connected layer (ReLU + Droput)      W1 [784, 256]    B3[256]
#      · · · · ·                                                          Y1 [batch, 256]
#       \x/x\x/             -- fully connected layer (ReLU + Droput)      W2 [256, 128]    B4[128]
#        · · ·                                                            Y2 [batch, 128]
#         \x/               -- fully connected layer (softmax)            W3 [128, 10]        B5[10]
#          ·                                                              Y3 [batch, 10]

W1 = tf.Variable(tf.truncated_normal([784, 256], stddev=0.1))  # 784 = 28 * 28
B1 = tf.Variable(tf.zeros([256]))
W2 = tf.Variable(tf.truncated_normal([256, 128], stddev=0.1))
B2 = tf.Variable(tf.zeros([128]))
W3 = tf.Variable(tf.truncated_normal([128, 10], stddev=0.1))
B3 = tf.Variable(tf.zeros([10]))

### Dropout

Dropout is a regularization technique which tries to prevent overfitting. Overfitting means that our network can't perform very well on images it haven't seen before which is obviously really bad.

In [None]:
# Probability of keeping a node during dropout = 1.0 at test time (no dropout) and 0.75 at training time
pkeep = tf.placeholder(tf.float32)

### Building the network

We have, as described above, a 3 layer fully connected neural network with ReLU and Dropout on top of each layer.

In [None]:
flatten = tf.reshape(X, [-1, 784])
hidden1 = tf.nn.relu(tf.matmul(flatten, W1) + B1)
dropout1 = tf.nn.dropout(hidden1, pkeep)
hidden2 = tf.nn.relu(tf.matmul(dropout1, W2) + B2)
dropout2 = tf.nn.dropout(hidden2, pkeep)
output = tf.nn.relu(tf.matmul(dropout2, W3) + B3)

In our **Use Case**, we need a kind of prediction layer on top of our output layer. We use a, so called, Softmax layer or the prediction which we put on top of the output layer. 

In [None]:
prediction = tf.nn.softmax(output)

### Cross Entropy Loss function
In general, the loss functions tells us how "good" or how "bad" our neural network is. This function is then minimized by the neural network so that the neural network gives us the best performance based on the defined loss function. For this purpose we are going to use the cross entropy loss function which is used very heavily in neural networks and seems to work very well.

**Note:** TensorFlow provides the ```softmax_cross_entropy_with_logits``` function to avoid numerical stability problems with log(0) which is NaN

In [None]:
# cross-entropy loss function (= -sum(Y_i * log(Yi)) ), normalised for batches of 100  images
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=Y)
cross_entropy = tf.reduce_mean(cross_entropy)

### Optimizer
We are going to use the gradient descent method **Adam** to minimize our loss function. We also add a learning rate with an exponential decay. In our setting we start at a learning rate of $0.003$ and exponentially reduce it to $0.00001$.

In [None]:
# step for variable learning rate
step = tf.placeholder(tf.int32)

# the learning rate is: # 0.0001 + 0.003 * (1/e)^(step/2000)
learning_rate = 0.0001 +  tf.train.exponential_decay(0.003, step, 2000, 1/np.exp(1))
optimizer = tf.train.AdamOptimizer(learning_rate)
train_step = optimizer.minimize(cross_entropy)

### Training the network

We define a ```accuracy``` so that we can see whether our network actually improves while training

In [None]:
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

**Hyperparameter**

In [None]:
epochs = 100
batch_size = 64

**Main**

In [None]:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

In [None]:
loss_history = []
for e in range(epochs):
    for batch_i in tqdm(range(0, train_images.shape[0], batch_size)):
        data, label = train_images[batch_i:batch_i + batch_size], train_labels[batch_i:batch_i + batch_size]
        # run the computational graph and calculate loss + training step
        # optimizer will not return something which is why we store it into a variable called empty
        loss, empty = sess.run([cross_entropy, train_step], feed_dict={X: data, Y: label,  pkeep: 0.75, step: e})
        # append to loss history
        loss_history.append(loss)

    train_acc = sess.run(accuracy, feed_dict={X:train_images, Y: train_labels,  pkeep: 1})
    test_acc = sess.run(accuracy, feed_dict={X:test_images, Y: test_labels,  pkeep: 1})
    print('epoch:', e, 'loss:', loss)
    print('test accuracy', test_acc, 'train accuracy', train_acc)

100%|██████████| 938/938 [00:01<00:00, 622.74it/s]
  8%|▊         | 73/938 [00:00<00:01, 727.45it/s]

epoch: 0 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 735.45it/s]
  8%|▊         | 75/938 [00:00<00:01, 742.32it/s]

epoch: 1 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 690.95it/s]
  7%|▋         | 70/938 [00:00<00:01, 695.99it/s]

epoch: 2 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 730.47it/s]
  9%|▊         | 80/938 [00:00<00:01, 798.99it/s]

epoch: 3 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 725.69it/s]
  8%|▊         | 74/938 [00:00<00:01, 738.35it/s]

epoch: 4 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 719.35it/s]
  8%|▊         | 76/938 [00:00<00:01, 755.01it/s]

epoch: 5 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 736.63it/s]
  8%|▊         | 78/938 [00:00<00:01, 774.98it/s]

epoch: 6 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 744.21it/s]
  8%|▊         | 75/938 [00:00<00:01, 749.49it/s]

epoch: 7 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 746.26it/s]
  8%|▊         | 78/938 [00:00<00:01, 774.00it/s]

epoch: 8 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 753.40it/s]
  7%|▋         | 64/938 [00:00<00:01, 639.94it/s]

epoch: 9 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 727.46it/s]
  8%|▊         | 72/938 [00:00<00:01, 716.02it/s]

epoch: 10 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 743.42it/s]
  8%|▊         | 71/938 [00:00<00:01, 706.53it/s]

epoch: 11 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 747.44it/s]
  8%|▊         | 76/938 [00:00<00:01, 756.43it/s]

epoch: 12 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 734.14it/s]
  8%|▊         | 79/938 [00:00<00:01, 783.64it/s]

epoch: 13 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 780.12it/s]
  8%|▊         | 77/938 [00:00<00:01, 766.56it/s]

epoch: 14 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 751.71it/s]
  8%|▊         | 76/938 [00:00<00:01, 751.30it/s]

epoch: 15 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 729.84it/s]
  7%|▋         | 69/938 [00:00<00:01, 687.56it/s]

epoch: 16 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 722.00it/s]
  8%|▊         | 74/938 [00:00<00:01, 732.25it/s]

epoch: 17 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 728.82it/s]
  8%|▊         | 74/938 [00:00<00:01, 730.93it/s]

epoch: 18 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 718.81it/s]
  8%|▊         | 78/938 [00:00<00:01, 770.54it/s]

epoch: 19 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 747.19it/s]
  7%|▋         | 69/938 [00:00<00:01, 683.53it/s]

epoch: 20 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 725.82it/s]
  8%|▊         | 78/938 [00:00<00:01, 770.04it/s]

epoch: 21 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 733.23it/s]
  8%|▊         | 75/938 [00:00<00:01, 741.58it/s]

epoch: 22 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 734.20it/s]
  8%|▊         | 75/938 [00:00<00:01, 747.53it/s]

epoch: 23 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 730.77it/s]
  8%|▊         | 74/938 [00:00<00:01, 734.80it/s]

epoch: 24 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 693.21it/s]
  8%|▊         | 71/938 [00:00<00:01, 708.57it/s]

epoch: 25 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 722.90it/s]
  8%|▊         | 76/938 [00:00<00:01, 752.98it/s]

epoch: 26 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 756.70it/s]
  9%|▊         | 82/938 [00:00<00:01, 814.73it/s]

epoch: 27 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 769.18it/s]
  8%|▊         | 71/938 [00:00<00:01, 706.18it/s]

epoch: 28 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 713.95it/s]
  8%|▊         | 73/938 [00:00<00:01, 722.84it/s]

epoch: 29 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 736.21it/s]
  9%|▊         | 81/938 [00:00<00:01, 807.08it/s]

epoch: 30 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 685.70it/s]
  8%|▊         | 72/938 [00:00<00:01, 716.84it/s]

epoch: 31 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 747.40it/s]
  8%|▊         | 77/938 [00:00<00:01, 769.85it/s]

epoch: 32 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 695.14it/s]
  8%|▊         | 75/938 [00:00<00:01, 743.33it/s]

epoch: 33 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 713.96it/s]
  6%|▌         | 53/938 [00:00<00:01, 525.18it/s]

epoch: 34 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 668.71it/s]
  8%|▊         | 78/938 [00:00<00:01, 778.15it/s]

epoch: 35 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 673.21it/s]
  8%|▊         | 78/938 [00:00<00:01, 777.69it/s]

epoch: 36 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 753.07it/s]
  8%|▊         | 74/938 [00:00<00:01, 736.96it/s]

epoch: 37 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 742.25it/s]
  8%|▊         | 77/938 [00:00<00:01, 766.12it/s]

epoch: 38 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 752.24it/s]
  8%|▊         | 78/938 [00:00<00:01, 772.12it/s]

epoch: 39 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 743.46it/s]
  8%|▊         | 78/938 [00:00<00:01, 773.70it/s]

epoch: 40 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 731.63it/s]
  8%|▊         | 75/938 [00:00<00:01, 747.49it/s]

epoch: 41 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 742.08it/s]
  8%|▊         | 75/938 [00:00<00:01, 742.04it/s]

epoch: 42 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 737.25it/s]
  8%|▊         | 75/938 [00:00<00:01, 748.12it/s]

epoch: 43 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 754.24it/s]
  8%|▊         | 75/938 [00:00<00:01, 745.48it/s]

epoch: 44 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 758.87it/s]
  8%|▊         | 73/938 [00:00<00:01, 724.29it/s]

epoch: 45 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 751.93it/s]
  7%|▋         | 69/938 [00:00<00:01, 685.73it/s]

epoch: 46 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 723.78it/s]
  7%|▋         | 69/938 [00:00<00:01, 688.16it/s]

epoch: 47 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 735.15it/s]
  8%|▊         | 74/938 [00:00<00:01, 738.92it/s]

epoch: 48 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 749.03it/s]
  8%|▊         | 78/938 [00:00<00:01, 774.38it/s]

epoch: 49 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 733.06it/s]
  8%|▊         | 75/938 [00:00<00:01, 742.25it/s]

epoch: 50 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 746.75it/s]
  8%|▊         | 78/938 [00:00<00:01, 770.60it/s]

epoch: 51 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 750.60it/s]
  8%|▊         | 78/938 [00:00<00:01, 777.66it/s]

epoch: 52 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 735.79it/s]
  8%|▊         | 73/938 [00:00<00:01, 723.30it/s]

epoch: 53 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 741.02it/s]
  8%|▊         | 71/938 [00:00<00:01, 707.44it/s]

epoch: 54 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 760.50it/s]
  8%|▊         | 73/938 [00:00<00:01, 728.30it/s]

epoch: 55 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 750.90it/s]
  8%|▊         | 71/938 [00:00<00:01, 706.45it/s]

epoch: 56 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 748.02it/s]
  8%|▊         | 74/938 [00:00<00:01, 732.27it/s]

epoch: 57 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 744.14it/s]
  8%|▊         | 77/938 [00:00<00:01, 763.24it/s]

epoch: 58 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 756.91it/s]
  8%|▊         | 73/938 [00:00<00:01, 726.36it/s]

epoch: 59 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 737.63it/s]
  8%|▊         | 74/938 [00:00<00:01, 739.09it/s]

epoch: 60 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 754.71it/s]
  8%|▊         | 74/938 [00:00<00:01, 736.47it/s]

epoch: 61 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 753.33it/s]
  8%|▊         | 74/938 [00:00<00:01, 733.31it/s]

epoch: 62 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 738.29it/s]
  8%|▊         | 79/938 [00:00<00:01, 780.81it/s]

epoch: 63 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 724.73it/s]
  8%|▊         | 74/938 [00:00<00:01, 731.80it/s]

epoch: 64 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 728.13it/s]
  8%|▊         | 71/938 [00:00<00:01, 701.35it/s]

epoch: 65 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 715.16it/s]
  8%|▊         | 78/938 [00:00<00:01, 772.79it/s]

epoch: 66 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 737.03it/s]
  8%|▊         | 73/938 [00:00<00:01, 721.04it/s]

epoch: 67 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 728.26it/s]
  8%|▊         | 76/938 [00:00<00:01, 752.65it/s]

epoch: 68 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 743.68it/s]
  8%|▊         | 76/938 [00:00<00:01, 759.37it/s]

epoch: 69 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 736.68it/s]
  7%|▋         | 70/938 [00:00<00:01, 690.14it/s]

epoch: 70 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 721.96it/s]
  8%|▊         | 74/938 [00:00<00:01, 733.58it/s]

epoch: 71 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 722.13it/s]
  8%|▊         | 74/938 [00:00<00:01, 739.44it/s]

epoch: 72 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 752.19it/s]
  8%|▊         | 77/938 [00:00<00:01, 762.00it/s]

epoch: 73 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 749.05it/s]
  8%|▊         | 71/938 [00:00<00:01, 706.59it/s]

epoch: 74 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 740.00it/s]
  8%|▊         | 74/938 [00:00<00:01, 739.55it/s]

epoch: 75 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 753.81it/s]
  8%|▊         | 73/938 [00:00<00:01, 726.07it/s]

epoch: 76 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 741.65it/s]
  8%|▊         | 74/938 [00:00<00:01, 731.16it/s]

epoch: 77 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 748.50it/s]
  8%|▊         | 78/938 [00:00<00:01, 771.61it/s]

epoch: 78 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 741.05it/s]
  8%|▊         | 77/938 [00:00<00:01, 766.16it/s]

epoch: 79 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 744.90it/s]
  8%|▊         | 74/938 [00:00<00:01, 730.35it/s]

epoch: 80 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 745.01it/s]
  8%|▊         | 75/938 [00:00<00:01, 748.92it/s]

epoch: 81 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 724.26it/s]
  8%|▊         | 75/938 [00:00<00:01, 741.49it/s]

epoch: 82 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 744.94it/s]
  8%|▊         | 79/938 [00:00<00:01, 788.59it/s]

epoch: 83 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 747.31it/s]
  8%|▊         | 72/938 [00:00<00:01, 718.25it/s]

epoch: 84 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 745.16it/s]
  8%|▊         | 75/938 [00:00<00:01, 742.21it/s]

epoch: 85 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 734.00it/s]
  8%|▊         | 76/938 [00:00<00:01, 756.21it/s]

epoch: 86 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


100%|██████████| 938/938 [00:01<00:00, 756.29it/s]
  8%|▊         | 73/938 [00:00<00:01, 729.69it/s]

epoch: 87 loss: 23.025852
test accuracy 0.098 train accuracy 0.09871667


 55%|█████▍    | 513/938 [00:00<00:00, 733.35it/s]

### Evaluate model
Let us look at the optimization results. Final loss tells us how far we could reduce costs during traning process. Further we can use the first loss value as a sanity check and validate our implementation of the loss function works as intended. To visulize the whole tranings process we can plot losss values from each iteration as a loss curve. 

In [None]:
# check loss after last and first iteration
print('last iteration loss:',loss_history[-1])
print('first iteration loss:',loss_history[0])
# Plot a loss curve
plt.plot(loss_history)
plt.ylabel('loss')
plt.xlabel('iterations')

Evaluation above gave us some inside about the optimization process but did not quantified our final model. One possibility is to calculate model accuracy.

In [None]:
acc = sess.run(accuracy, feed_dict={X:test_images, Y: test_labels,  pkeep: 1})
print(acc)

**Reset**

In [None]:
tf.reset_default_graph()