# Convolutional neural network with Tensorflow

## Introduction
TODO

## Requirements
### Imports

In [1]:
import tensorflow as tf
import numpy as np
from deep_teaching_commons.data.fundamentals.mnist import Mnist
import matplotlib.pyplot as plt
from tqdm import tqdm

### Data

In [2]:
train_images, train_labels, test_images, test_labels = Mnist().get_all_data(one_hot_enc=True, flatten=False)
train_images, test_images = train_images.reshape(60000, 28, 28, 1), test_images.reshape(10000,28,28,1)
print('train shapes:', train_images.shape, train_labels.shape)
print('test shapes:', test_images.shape, test_labels.shape)

auto download is active, attempting download
mnist data directory already exists, download aborted
train shapes: (60000, 28, 28, 1) (60000, 10)
test shapes: (10000, 28, 28, 1) (10000, 10)


### Placeholders

In [3]:
# input X: 28x28 grayscale images, the first dimension (None) will index the images in the mini-batch
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
# correct answers will go here
Y = tf.placeholder(tf.float32, [None, 10])

## Convolutional Neural Network for MNIST

### Initializing the weights

In [4]:
# neural network architecture:
#
# · · · · · · · · · ·      (input data, 1-deep)                 X [batch, 28, 28, 1]
#   @ @ @ @ @ @ @ @     -- conv. layer stride 2                 W1 [3, 3, 1, 64]       B1 [64]
#   ∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶                                             Y1 [batch, 14, 14, 64]
#     @ @ @ @ @ @       -- conv. layer stride 2                 W2 [3, 3, 64, 32]      B2 [32]
#     ∶∶∶∶∶∶∶∶∶∶∶                                               Y2 [batch, 7, 7, 32] => reshaped to YY [batch, 7*7*32]
#      \x/x\x\x/        -- fully connected layer (relu)         W3 [7*7*32, 256]       B3 [256]
#       · · · ·                                                 Y3 [batch, 256]
#       \x/x\x/         -- fully connected layer (softmax)      W4 [256, 10]           B4 [10]
#        · · ·                                                  Y4 [batch, 10]

# convolution layer weights
W1 = tf.Variable(tf.truncated_normal([3, 3, 1, 64], stddev=0.1)) 
B1 = tf.Variable(tf.ones([64])/10)
W2 = tf.Variable(tf.truncated_normal([3, 3, 64, 32], stddev=0.1))
B2 = tf.Variable(tf.ones([32])/10)

# fully connected weights
W3 = tf.Variable(tf.truncated_normal([7 * 7 * 32, 256], stddev=0.1))
B3 = tf.Variable(tf.ones([256])/10)
W4 = tf.Variable(tf.truncated_normal([256, 10], stddev=0.1))
B4 = tf.Variable(tf.ones([10])/10)

### Building the Network

In [5]:
# The model
conv1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 2, 2, 1], padding='SAME') + B1)
conv2 = tf.nn.relu(tf.nn.conv2d(conv1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)

# reshape the output from the third convolution for the fully connected layer
flatten = tf.reshape(conv2, shape=[-1, 7 * 7 * 32])

# fully connected
dense1 = tf.nn.relu(tf.matmul(flatten, W3) + B3)
output = tf.matmul(dense1, W4) + B4

In our **Use Case**, we need a kind of prediction layer on top of our output layer. We use a, so called, Softmax layer or the prediction which we put on top of the output layer. 

In [6]:
prediction = tf.nn.softmax(output)

### Loss function and Optimizer

In general, the loss functions tells us how "good" or how "bad" our neural network is. This function is then minimized by the neural network so that the neural network gives us the best performance based on the defined loss function. For this purpose we are going to use the cross entropy loss function which is used very heavily in neural networks and seems to work very well.

**Note:** TensorFlow provides the ```softmax_cross_entropy_with_logits``` function to avoid numerical stability problems with log(0) which is NaN

In [7]:
# cross-entropy loss function (= -sum(Y_i * log(Yi)) ), normalised for batches of 100  images
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=Y)
cross_entropy = tf.reduce_mean(cross_entropy)*10

We are going to use the gradient descent method **Adam** to minimize our loss function. We also add a learning rate with an exponential decay. In our setting we start at a learning rate of $0.003$ and exponentially reduce it to $0.00001$.

In [8]:
# step for variable learning rate
step = tf.placeholder(tf.int32)

# the learning rate is: # 0.0001 + 0.003 * (1/e)^(step/2000)
learning_rate = 0.0001 +  tf.train.exponential_decay(0.003, step, 2000, 1/np.exp(1))
optimizer = tf.train.AdamOptimizer(learning_rate)
train_step = optimizer.minimize(cross_entropy)

### Training the network

We define a ```accuracy``` so that we can see whether our network actually improves while training

In [9]:
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

**Hyperparameter**

In [10]:
epochs = 100
batch_size = 256

**Main**

In [None]:
loss_history = []
for e in range(epochs):
    for batch_i in tqdm(range(0, train_images.shape[0], batch_size)):
        data, label = train_images[batch_i:batch_i + batch_size], train_labels[batch_i:batch_i + batch_size]

        # run the computational graph and calculate loss + training step
        with tf.Session() as sess:
            init = tf.global_variables_initializer()
            sess.run(init)
            # optimizer will not return something which is why we store it into a variable called empty
            loss, empty = sess.run([cross_entropy, train_step], feed_dict={X: data, Y: label, step: e})
        # append to loss history
        loss_history.append(loss)

    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)
        train_acc = sess.run(accuracy, feed_dict={X:train_images, Y: train_labels})
        test_acc = sess.run(accuracy, feed_dict={X:test_images, Y: test_labels})
    print('epoch:', e, 'loss:', loss)
    print('test accuracy', test_acc, 'train accuracy', train_acc)

100%|██████████| 235/235 [00:40<00:00,  5.84it/s]
  0%|          | 1/235 [00:00<00:36,  6.48it/s]

epoch: 0 loss: 813.623
test accuracy 0.0729 train accuracy 0.072783336


100%|██████████| 235/235 [00:41<00:00,  5.62it/s]
  0%|          | 1/235 [00:00<00:35,  6.59it/s]

epoch: 1 loss: 1208.2542
test accuracy 0.1412 train accuracy 0.13365


100%|██████████| 235/235 [00:41<00:00,  5.71it/s]
  0%|          | 0/235 [00:00<?, ?it/s]

epoch: 2 loss: 752.2262
test accuracy 0.1732 train accuracy 0.16976666


100%|██████████| 235/235 [00:57<00:00,  4.12it/s]
  0%|          | 0/235 [00:00<?, ?it/s]

epoch: 3 loss: 1309.861
test accuracy 0.047 train accuracy 0.051916666


100%|██████████| 235/235 [01:04<00:00,  3.63it/s]
  0%|          | 0/235 [00:00<?, ?it/s]

epoch: 4 loss: 865.4551
test accuracy 0.1227 train accuracy 0.11986667


100%|██████████| 235/235 [01:10<00:00,  3.34it/s]
  0%|          | 0/235 [00:00<?, ?it/s]

epoch: 5 loss: 1136.0391
test accuracy 0.0991 train accuracy 0.0984


 76%|███████▌  | 179/235 [00:58<00:18,  3.04it/s]

### Evaluate model
Let us look at the optimization results. Final loss tells us how far we could reduce costs during traning process. Further we can use the first loss value as a sanity check and validate our implementation of the loss function works as intended. Recall loss value after first iteration should be $ log\:c$ with $c$ being number of classes. To visulize the whole tranings process we can plot losss values from each iteration as a loss curve. 

In [None]:
# check loss after last and first iteration
print('last iteration loss:',loss_history[-1])
print('first iteration loss:',loss_history[0])
# Plot a loss curve
plt.plot(loss_history)
plt.ylabel('loss')
plt.xlabel('iterations')

Evaluation above gave us some inside about the optimization process but did not quantified our final model. One possibility is to calculate model accuracy.

In [None]:
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    acc = sess.run(accuracy, feed_dict={X:test_images, Y: test_labels,  pkeep: 1})

print(acc)