# A Convolutional Neural Network in TensorFlow

[Prashant Brahmbhatt](www.github.com/hashbanger)

Classifying the MNIST handwritten-digit dataset with a convolutional neural network.  
A detailed account of each step.

_____

## Getting the data

In [0]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

In [2]:
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

In [4]:
print("Size of:")
print("- Training-set:\t\t{}".format(len(mnist.train.labels)))
print("- Test-set:\t\t{}".format(len(mnist.test.labels)))

Size of:
- Training-set:		55000
- Test-set:		10000


Next we define the hyperparameters used during training.

In [0]:
learning_rate = 0.0001
epochs = 10
batch_size = 50

Now we declare the placeholder for the training data.

x : Since the images in this dataset are 28 \* 28 pixels, each flattened image has 784 values, so the input has shape [ batch_size, 784 ]. Passing **None** as the first dimension lets the batch size vary.  

In [0]:
x = tf.placeholder(tf.float32, [None, 784])
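
As a rough illustration of what "flattened" means here, the sketch below lays a 28 \* 28 image out row by row into a list of 784 values. The helper name `flatten_image` is hypothetical, and plain Python lists stand in for the tensors; this is not how the MNIST loader is implemented.

```python
def flatten_image(image):
    """Flatten a 2-D list of pixel rows into one row-major list."""
    return [pixel for row in image for pixel in row]


# A dummy 28 x 28 "image" whose pixel value encodes its own position.
image = [[row * 28 + col for col in range(28)] for row in range(28)]
flat = flatten_image(image)

print(len(flat))  # 784
print(flat[28])   # 28: the first pixel of the second row
```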

Before we can use this data in the **TensorFlow convolution** and **pooling** functions, such as **conv2d()** and **max_pool()**, we need to reshape it, because these functions expect **4D** input.   
The format of the data to be supplied is **[i, j, k, l]** where   

**i** is the number of training samples,   

**j** is the height of the image,   

**k** is the width and   

**l** is the number of channels.  

Because we have greyscale images, **l** will always be equal to 1 (if we had **RGB images**, it would be equal to **3**).  

We don't know the size of the first dimension of x in advance, so we don't know what **i** is. However, **tf.reshape()** allows us to put -1 in place of i, and it will infer that dimension from the number of samples in each batch as training is performed. 

In [0]:
x_shaped = tf.reshape(x, [-1, 28, 28, 1])
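
To make the role of -1 more concrete, here is a minimal sketch of how a single unknown dimension can be inferred from the total number of elements. The function `infer_shape` is a hypothetical helper written for illustration; it mimics the idea behind the -1 argument, not TensorFlow's actual implementation.

```python
def infer_shape(total_elements, shape):
    """Replace the single -1 in `shape` so the element count matches."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if shape.count(-1) != 1 or total_elements % known != 0:
        raise ValueError("shape is not compatible with the element count")
    return [total_elements // known if dim == -1 else dim for dim in shape]


# A batch of 50 flattened images holds 50 * 784 values.
print(infer_shape(50 * 784, [-1, 28, 28, 1]))  # [50, 28, 28, 1]
```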

Now we declare the output data placeholder.  

y : Since the labels are one-hot encoded, there will be 10 columns, one for each digit.

In [0]:
y = tf.placeholder(tf.float32, [None, 10])

**mnist.train.next_batch()** returns the digit labels as one-hot vectors, because we loaded the data with **one_hot=True**.   
For example, a digit of “3” is represented in one-hot encoding as **[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]**.
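
A minimal sketch of one-hot encoding in plain Python is shown here. The helper `one_hot` is hypothetical and only illustrates the representation; the dataset loader produces these vectors for us.

```python
def one_hot(label, num_classes=10):
    """Return a list with a 1 at index `label` and 0 everywhere else."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


print(one_hot(3))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```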

## Setting up the Convolutional Layer

We have to implement the following: 
    
- Define the shape of the weights that determine the behaviour of the **5×5** convolutional filter. The format that the **conv2d()** function expects for the filter is **[filter_height, filter_width, in_channels, out_channels]**. The height and width of the filter are provided in the **filter_shape** variable (in this case **[5, 5]**).  

 **Input channels**: for the first layer this is 1; for later layers it depends on the previous layer's output, so it becomes 32 for the second layer.  
 

- Set up the **weights** and **biases** with the required dimensions. We initialise them from a truncated normal distribution using **tf.truncated_normal()**.  


- Now we set up the convolutional layer using **tf.nn.conv2d()**, whose first two arguments are the input data and the weights.
  The shape of the weights tensor tells TensorFlow what size the convolutional filter should be.  
  The third argument is the **strides**. For the default NHWC data layout used here, its format is **[st, sy, sx, sc]**, where  
  
  **sy**: stride along the height (y axis) of the image (using 1 here)  
  
  **sx**: stride along the width (x axis) of the image (using 1 here)  
  
  **st** and **sc**: always set to 1, otherwise we would skip over training examples or channels, which is undesirable.
 
 
-  The final parameter is the **padding**. Padding determines the output size of each channel. When it is set to **“SAME”**, the output dimensions are:  

  **out_height = ceil( float ( in_height ) / float ( strides[1] ) )  
  out_width  = ceil( float ( in_width ) / float ( strides[2] ) )**
  
  So, with a stride of 1, the output dimensions stay the same as the input. With **“VALID”** padding instead, a 28 \* 28 image and a 5 \* 5 filter would give an output of **(x-n+1,y-m+1)** = 24 \* 24.
  
  
  
- Add **bias** and **activate**.


- **Max pooling**: the first argument is the output of the convolution layer, **ksize** is the size of the pooling window, and **strides** has the same format as described above. For pooling we use strides of 2.  
 Following the formula above, **SAME** padding with a stride of 2 halves each spatial dimension.
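
The output-size arithmetic described in the list above can be checked with a small sketch. The function names are hypothetical; the **SAME** formula is the one quoted above, and the **VALID** formula is the general version of the (x-n+1) expression for a stride that may differ from 1.

```python
import math


def same_output_size(in_size, stride):
    """Output length along one dimension with SAME padding."""
    return math.ceil(in_size / stride)


def valid_output_size(in_size, filter_size, stride):
    """Output length along one dimension with VALID (no) padding."""
    return math.ceil((in_size - filter_size + 1) / stride)


print(same_output_size(28, 1))      # 28: convolution with SAME padding, stride 1
print(valid_output_size(28, 5, 1))  # 24: what a 5x5 filter would give with VALID padding
print(same_output_size(28, 2))      # 14: pooling with SAME padding, stride 2
```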

In [0]:
def create_new_conv_layer(input_data, num_input_channels, num_filters, filter_shape, pool_shape, name):
    # the filter shape expected by tf.nn.conv2d():
    # [filter_height, filter_width, in_channels, out_channels]
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_input_channels,
                      num_filters]

    # initialization of the bias and the weights for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03),
                                      name=name+'_W')
    bias = tf.Variable(tf.truncated_normal([num_filters]), name=name+'_b')

    # forming a convolution layer with stride 1 and SAME padding
    out_layer = tf.nn.conv2d(input_data, weights, [1, 1, 1, 1], padding='SAME')

    # adding the bias
    out_layer += bias

    # applying a ReLU non-linear activation
    out_layer = tf.nn.relu(out_layer)

    # the max pooling operation on the activated output layer
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    strides = [1, 2, 2, 1]
    out_layer = tf.nn.max_pool(out_layer, ksize=ksize, strides=strides, 
                               padding='SAME')

    return out_layer
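
To see what the pooling step inside this function does to a single channel, here is a small worked example of 2 \* 2 max pooling with a stride of 2. It uses plain Python lists and a hypothetical helper `max_pool_2x2`, and it assumes an even height and width so that no padding is needed.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # Collect the four values covered by the pooling window.
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [6, 7]]
```

Each output value is the largest entry of one non-overlapping 2 \* 2 window, which is why the height and width are halved.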

### Creating two convolution layers

In [0]:
layer1 = create_new_conv_layer(x_shaped, 1, 32, [5, 5], [2, 2], name='layer1')
layer2 = create_new_conv_layer(layer1, 32, 64, [5, 5], [2, 2], name='layer2')
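
As a quick sanity check on the two layers, we can count their trainable parameters from the filter shape **[filter_height, filter_width, in_channels, out_channels]** plus one bias per output channel. The helper `conv_param_count` is hypothetical and exists only for this illustration.

```python
def conv_param_count(filter_height, filter_width, in_channels, out_channels):
    """Number of weights plus biases in one convolutional layer."""
    weights = filter_height * filter_width * in_channels * out_channels
    biases = out_channels
    return weights + biases


print(conv_param_count(5, 5, 1, 32))   # 832 parameters in layer1
print(conv_param_count(5, 5, 32, 64))  # 51264 parameters in layer2
```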

### The Fully Connected Layers

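Now we first have to **flatten** the output of the second pooling layer.  
Since we pooled twice with a stride of 2, the image dimensions have been halved twice, from 28 to 14 to 7. The second layer has 64 output channels, so each sample flattens to 7 \* 7 \* 64 values.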

In [0]:
flattened = tf.reshape(layer2, [-1, 7 * 7 * 64])
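
The size 7 \* 7 \* 64 can be reproduced with a short calculation. This is only a sketch: `pooled_size` is a hypothetical helper, and it assumes **SAME** padding with a stride of 2 in every pooling layer, as in the function defined earlier.

```python
import math


def pooled_size(size, num_pool_layers, stride=2):
    """Spatial size left after repeated SAME-padded pooling."""
    for _ in range(num_pool_layers):
        size = math.ceil(size / stride)
    return size


side = pooled_size(28, num_pool_layers=2)
flattened_features = side * side * 64  # 64 channels come out of layer2

print(side, flattened_features)  # 7 3136
```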

Next we initialise the **weights** and **bias** for the fully connected layer, with 1000 nodes in the hidden layer.  
The weights are drawn from a **truncated normal distribution** with a **mean of zero** and a standard deviation of **0.03**; the bias uses a standard deviation of **0.01**.    
Then we apply the ReLU activation.

In [0]:
# setup some weights and bias values for this layer, then activate with ReLU
wd1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1000], stddev=0.03), name='wd1')
bd1 = tf.Variable(tf.truncated_normal([1000], stddev=0.01), name='bd1')
dense_layer1 = tf.matmul(flattened, wd1) + bd1
dense_layer1 = tf.nn.relu(dense_layer1)

Now we define the second dense layer.  
In TensorFlow, **logits** is the name for the unnormalised output tensor that the softmax function maps to probabilities. Here the logits are **dense_layer2**, and **y_** holds the resulting probabilities.

In [0]:
# another layer with softmax activations
wd2 = tf.Variable(tf.truncated_normal([1000, 10], stddev=0.03), name='wd2')
bd2 = tf.Variable(tf.truncated_normal([10], stddev=0.01), name='bd2')
dense_layer2 = tf.matmul(dense_layer1, wd2) + bd2
y_ = tf.nn.softmax(dense_layer2)
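
To illustrate how softmax turns logits into probabilities, here is a minimal pure-Python sketch. The function below is our own illustrative version, not TensorFlow's implementation; subtracting the largest logit first is a common trick to avoid overflow and does not change the result.

```python
import math


def softmax(logits):
    """Map a list of logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest logit gets the largest probability, so taking the argmax of the logits or of the softmax output gives the same predicted digit.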

### The Cross Entropy Cost Function

Here we use the cross-entropy cost function, defined as:

$$J = -\frac{1}{m} \sum_{i=1}^m \sum_{j=1}^n \left[ y_j^{(i)} \log\left(\hat{y}_j^{(i)}\right) + \left(1 - y_j^{(i)}\right) \log\left(1 - \hat{y}_j^{(i)}\right) \right]$$

where  
$y_j^{(i)}$ is the ith training label for output node j,   
$\hat{y}_j^{(i)}$ is the ith predicted value for output node j (**y_** in the code),  
**m** is the number of training / batch samples and **n** is the number of output nodes.

TensorFlow provides a handy function which applies softmax followed by the cross-entropy loss.  

 The function first takes the softmax of the logits (the output of the final matrix multiplication plus bias), then compares it to the training target using cross-entropy. The result is one cross-entropy value per training sample, so we need to reduce this tensor to a scalar (a single value). To do this we use **tf.reduce_mean()**, which takes the mean of the tensor.

In [0]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=dense_layer2, labels=y))
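
The following sketch mirrors this computation in plain Python for a tiny batch. It is an illustration under assumptions, not the library code: the function names are hypothetical, and it uses the categorical form of the loss, minus the sum of label times log-probability, which is what the softmax cross-entropy function computes per sample; the (1 - y) term of the general formula is not part of it.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits) for one sample."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    log_probs = [v - log_sum_exp for v in logits]  # log of the softmax output
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


def mean_cross_entropy(batch_logits, batch_labels):
    """Average the per-sample losses, like tf.reduce_mean over the batch."""
    losses = [softmax_cross_entropy(l, t)
              for l, t in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)


# Two samples with equal logits: each class gets probability 0.5, so loss = ln(2).
loss = mean_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [[1, 0], [0, 1]])
print(round(loss, 4))  # 0.6931
```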

### Training the Neural Net

We train using the **mini-batch gradient descent** approach, proceeding as follows:  


- Create an optimiser  
    We use the Adam optimizer; you can learn about it [here](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/)
- Create the correct-prediction and accuracy evaluation operations
- Initialise the variables
- Determine the number of batch runs within a training epoch
- For each epoch
  - For each batch
    - Extract the batch data
    - Run the optimiser and cross-entropy operations
    - Add to the average cost
  - Calculate the current test accuracy
  - Print out some results
- Calculate the final test accuracy and print it
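
The bookkeeping in the steps above (number of batches per epoch and the running average cost) can be sketched without TensorFlow. Here `run_epoch` and `batch_cost` are hypothetical names: `batch_cost` stands in for running the optimiser and cross-entropy operations on one mini-batch, and the batches are taken in order rather than shuffled, which is a simplification.

```python
def run_epoch(num_samples, batch_size, batch_cost):
    """Loop over the mini-batches of one epoch and average their cost."""
    total_batch = num_samples // batch_size
    avg_cost = 0.0
    for i in range(total_batch):
        start = i * batch_size
        end = start + batch_size
        # Each batch contributes an equal share to the epoch average.
        avg_cost += batch_cost(start, end) / total_batch
    return total_batch, avg_cost


# Dummy cost function that always reports 0.5 for every batch.
total_batch, avg_cost = run_epoch(55000, 50, lambda start, end: 0.5)
print(total_batch)         # 1100
print(round(avg_cost, 3))  # 0.5
```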


In [15]:
# creating an optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# operations to determine accuracy 
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# setup the initialisation operator
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    
    total_batch = int(len(mnist.train.labels) / batch_size)
    
    for epoch in range(epochs):
        
        avg_cost = 0
        
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            
            _, c = sess.run([optimizer, cross_entropy], 
                            feed_dict={x: batch_x, y: batch_y})
            #We don't care much about output from optimizer so assigned it to _
            
            avg_cost += c / total_batch
            
        test_acc = sess.run(accuracy, 
                       feed_dict={x: mnist.test.images, y: mnist.test.labels})
        
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost), "test accuracy: {:.3f}".format(test_acc))

    print("\nTraining complete!")
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

Epoch: 1 cost = 0.712 test accuracy: 0.939
Epoch: 2 cost = 0.156 test accuracy: 0.969
Epoch: 3 cost = 0.099 test accuracy: 0.978
Epoch: 4 cost = 0.072 test accuracy: 0.977
Epoch: 5 cost = 0.058 test accuracy: 0.983
Epoch: 6 cost = 0.048 test accuracy: 0.985
Epoch: 7 cost = 0.041 test accuracy: 0.987
Epoch: 8 cost = 0.035 test accuracy: 0.989
Epoch: 9 cost = 0.030 test accuracy: 0.987
Epoch: 10 cost = 0.026 test accuracy: 0.990

Training complete!
0.99
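
The accuracy value is the fraction of samples whose predicted class (the argmax of the predicted probabilities) matches the true class (the argmax of the one-hot label). A minimal plain-Python sketch of that comparison, using a hypothetical `accuracy` helper and made-up toy data, looks like this:

```python
def accuracy(one_hot_labels, predicted_probs):
    """Fraction of samples where the predicted class equals the true class."""
    def argmax(values):
        return max(range(len(values)), key=lambda i: values[i])

    correct = [argmax(t) == argmax(p)
               for t, p in zip(one_hot_labels, predicted_probs)]
    return sum(correct) / len(correct)


# Toy data with 3 classes; the third sample is misclassified.
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]
print(accuracy(labels, probs))  # 0.75
```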


So we reach a test accuracy of around 99% after 10 epochs, which is pretty cool.

### de nada!