# Convolutional Neural Networks
In Q2, we will build a convolutional neural network with two convolution layers. For a tutorial on convolutional neural networks, please check this [link](https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/).

In [1]:
# I hit several bugs that surfaced as shape errors.
# Tensor.get_shape() is a wonderful method for tracking them down.

In [2]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

## Step 1: Read in data
Use tf learn's built-in function to load MNIST data from
the folder MNIST_data.

HINT: Use `input_data.read_data_sets()` as in Q1.

In [3]:
# TODO
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


## Step 2: Define parameters for the model

In [4]:
LEARNING_RATE = 1e-4
BATCH_SIZE = 32

## Step 3: Create placeholders for data points and labels
HINT: use `tf.placeholder` as in Q1

In [5]:
x = tf.placeholder(tf.float32, [None, 784], name='x')
y = tf.placeholder(tf.int32, [None, 10], name='y')

## Step 4: Reshape x

Convolution is an operator for 2d images. Therefore, we reshape the flattened image to a 2d array. The shape of x is (batch_size, height * width). Convert `x` to `x_image` which is a 4-d tensor of the shape (batch_size, height, width, channels).

In [6]:
x_image = tf.reshape(x, [-1, 28, 28, 1])
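
To make the reshape concrete, here is a minimal pure-Python sketch of what happens to a single flattened image, assuming row-major order (the order `tf.reshape` uses). The helper name `reshape_flat_image` is hypothetical and only for illustration; the notebook itself relies on `tf.reshape`.

```python
def reshape_flat_image(flat, height=28, width=28, channels=1):
    """Reshape one flat, row-major image into nested lists [height][width][channels]."""
    if len(flat) != height * width * channels:
        raise ValueError("flat image has the wrong number of values")
    image = []
    for r in range(height):
        row = []
        for c in range(width):
            start = (r * width + c) * channels
            row.append(list(flat[start:start + channels]))
        image.append(row)
    return image


flat = list(range(784))  # stand-in for one row of mnist.train.images
image = reshape_flat_image(flat)
print(len(image), len(image[0]), len(image[0][0]))  # 28 28 1
print(image[1][0])  # [28]: the first pixel of the second row
```

The batch dimension is left out here; `-1` in the notebook's call lets TensorFlow infer it from the number of values fed in.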

## Step 5: Create the first convolution layer
Let's build our first convolution layer. [5, 5, 1, 32] stands for [filter_height, filter_width, in_channels, out_channels]. The number of `in_channels` is 1, since we have grayscale images, thus only 1 color channel. We will be computing 32 features for each 5x5 patch, hence 32 `out_channels`.

The weights are randomly initialized following a truncated normal distribution. Don't forget to add the bias term to the convolution layer. Then, we feed the output of the convolution layer into the activation function. We use a [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) activation layer. It is followed by the [max-pooling](https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer) layer.

In [7]:
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))  # truncated normal distribution to avoid extreme values
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))  # slightly positive bias works better with ReLU
h_conv1 = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1
h_activate1 = tf.nn.relu(h_conv1)
h_pool1 = tf.nn.max_pool(h_activate1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# Note: the 4 entries of strides and ksize correspond to the 4 dimensions of the input tensor
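
The convolution, bias, ReLU and max-pooling chain can be traced by hand on a tiny example. The sketch below is a simplification, not the TensorFlow implementation: it assumes a single input channel, a single filter, an odd-sized kernel, stride 1 with zero padding for the convolution, and 2x2 pooling with stride 2. Like `tf.nn.conv2d`, it computes a cross-correlation (the kernel is not flipped). All function names are hypothetical.

```python
def conv2d_same(image, kernel, bias=0.0):
    """Stride-1 cross-correlation of a 2-D image with an odd-sized kernel, zero padded."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # outside the image counts as zero
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out


def relu(feature_map):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; windows are clipped at the border."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[rr][cc]
                 for rr in range(r, min(r + 2, h))
                 for cc in range(c, min(c + 2, w)))
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 1.0],
         [2.0, 0.0, 1.0, 0.0],
         [1.0, 1.0, 0.0, 2.0]]
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]  # identity kernel, so the convolution output is image + bias
pooled = max_pool_2x2(relu(conv2d_same(image, kernel, bias=-1.0)))
print(pooled)  # [[1.0, 2.0], [1.0, 1.0]]
```

The 4x4 input becomes a 2x2 output: the padded convolution preserves the spatial size and the pooling halves it, which is the same pattern the 28x28 MNIST images follow in this step.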

## Step 6: Create the second convolution layer
We need to implement another convolution layer. The filter size is also 5x5 and the number of `out_channels` chosen is 64.

HINT: refer to Step 5

In [8]:
W_conv2 = tf.Variable(tf.truncated_normal([5,5,32,64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
# take input from the previous pooling layer:
h_conv2 = tf.nn.conv2d(h_pool1, W_conv2, strides=[1,1,1,1], padding='SAME') + b_conv2
h_activate2 = tf.nn.relu(h_conv2)
h_pool2 = tf.nn.max_pool(h_activate2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

## Step 7: Flatten h_pool2
The shape of `h_pool2` will be [None, 7, 7, 64] (`None`, as usual, stands for an arbitrary batch size, but do you understand why we get 7x7x64?).

Now, we need to flatten `h_pool2` to the shape (-1, 7\*7\*64) because it will later be passed to a fully connected layer.

HINT: use `tf.reshape`

In [9]:
# Interestingly, -1 cannot be swapped for None here, unlike in tf.placeholder
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
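
The 7x7x64 shape quoted in this step can be checked with a little arithmetic. With `'SAME'` padding, the spatial output size is `ceil(input_size / stride)`, so the stride-1 convolutions keep the size and each stride-2 pooling halves it. The sketch below just walks the four layers; the helper `same_out` and the `layers` list are illustrative names, not part of the notebook.

```python
def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride), in integer arithmetic."""
    return -(-size // stride)


h = w = 28
channels = 1
# (name, stride, out_channels); None means the channel count is unchanged (pooling).
layers = [("conv1", 1, 32), ("pool1", 2, None), ("conv2", 1, 64), ("pool2", 2, None)]
for name, stride, out_channels in layers:
    h, w = same_out(h, stride), same_out(w, stride)
    if out_channels is not None:
        channels = out_channels
    print(name, (h, w, channels))

flat_size = h * w * channels
print(flat_size)  # 3136, i.e. 7 * 7 * 64
```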

## Step 8: Fully connected layer
The fully connected layer will have 1024 units and is very similar to step 4
in Q1. However, you need to feed the output of this layer to a ReLU
activation layer rather than the softmax cross entropy.

HINT: use `tf.nn.relu` and `tf.matmul`

In [10]:
# TODO
W_fc1 = tf.Variable(tf.random_normal(shape=[7*7*64,1024], stddev=0.01), name="dense-1-weights")
b_fc1 = tf.Variable(tf.zeros([1,1024]), name="dense-1-bias")
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
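
As a rough illustration of what `tf.nn.relu(tf.matmul(...) + b)` computes, here is a pure-Python sketch of a dense layer followed by ReLU on a toy batch. The sizes (2 inputs, 3 units) and the name `dense_relu` are made up for readability; the notebook's layer maps 7\*7\*64 inputs to 1024 units.

```python
def dense_relu(inputs, weights, bias):
    """For each row x in inputs, compute relu(x @ weights + bias)."""
    outputs = []
    for row in inputs:
        out_row = []
        for j in range(len(bias)):
            z = sum(row[i] * weights[i][j] for i in range(len(row))) + bias[j]
            out_row.append(max(0.0, z))
        outputs.append(out_row)
    return outputs


x_batch = [[1.0, 2.0],
           [0.0, -1.0]]          # batch of 2 examples, 2 features each
weights = [[1.0, -1.0, 0.5],
           [0.0, 1.0, -2.0]]     # shape [2, 3]
bias = [0.0, 0.5, 1.0]           # one bias per output unit
activations = dense_relu(x_batch, weights, bias)
print(activations)  # [[1.0, 1.5, 0.0], [0.0, 0.0, 3.0]]
```

The bias is added to every row of the batch, which is what broadcasting does for the `[1, 1024]` bias in the cell above.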

## Step 9: Dropout layer 
[Dropout](https://en.wikipedia.org/wiki/Convolutional_neural_network#Dropout) is used to regularize the neural network during training. The variable `dropout` is defined as a placeholder, so we need to pass its value to the model later (the value is the probability that a neuron's output is kept after the dropout).

In [11]:
dropout = tf.placeholder(tf.float32, name='dropout')
h_fc1_drop = tf.nn.dropout(h_fc1, dropout)
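
The keep probability can be illustrated with a small sketch of "inverted" dropout: each value is kept with probability `keep_prob` and scaled by `1 / keep_prob`, otherwise it is set to zero, so the expected value is unchanged. This mirrors the documented behaviour of `tf.nn.dropout` in TensorFlow 1.x, but the function name `inverted_dropout` and the explicit random generator are assumptions made for this example.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale it by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(inverted_dropout(activations, 1.0, rng))   # keep_prob 1.0 leaves the values unchanged
print(inverted_dropout(activations, 0.75, rng))  # each value is either 0.0 or scaled by 1 / 0.75
```

This is why the placeholder is fed a value below 1.0 only while training and 1.0 when measuring accuracy.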

## Step 10: Readout layer
We add another fully connected layer. This time, the output size is 10, which is the number of classes. Don't forget to add the bias term! The output serves as the logits (do you remember the logits in Q1?).

HINT: refer to Step 8.

In [12]:
# TODO
W_fc2 = tf.Variable(tf.random_normal(shape=[1024,10], stddev=0.01), name="dense-2-weights")
b_fc2 = tf.Variable(tf.zeros([1,10]), name="dense-2-bias")
logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

## Step 11: Define the loss function
Compute the mean cross entropy loss with the logits and the labels. Notice that this final layer is exactly analogous to the single "layer" we had in Q1.

HINT: use `softmax_cross_entropy_with_logits`

In [13]:
entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y, name='loss')
loss = entropy  # left unreduced (tf.reduce_mean(entropy) is skipped), so minimize() effectively sums the per-example losses
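
For reference, the quantity behind `softmax_cross_entropy_with_logits` can be sketched in a few lines of pure Python. This is an illustrative re-derivation with a hypothetical function name, not TensorFlow's code: it uses the log-sum-exp trick for numerical stability, returns one loss value per example, and then averages them as the step description asks.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between softmax(logits) and a one-hot label for one example."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return -sum(t * (z - log_sum_exp) for z, t in zip(logits, one_hot_label))


batch_logits = [[0.0, 0.0, 0.0], [1000.0, 0.0, 0.0]]
batch_labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
per_example = [softmax_cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(per_example) / len(per_example)
print(per_example[0])  # log(3), about 1.0986: uniform prediction over 3 classes
print(mean_loss)
```

The second example shows why the shift matters: a naive `exp(1000.0)` would overflow, while the shifted version yields a loss of essentially zero for a confident, correct prediction.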

## Step 12: Optimization
To optimize the neural net, we will use Adam, a gradient-based optimization algorithm. For details, check this [link](http://ruder.io/optimizing-gradient-descent/).

HINT: use `tf.train.AdamOptimizer`

In [14]:
train_step = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)
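
To give a feel for what one Adam update does, here is a minimal scalar sketch following the textbook formulation (exponential moving averages of the gradient and its square, with bias correction). The defaults `beta1=0.9`, `beta2=0.999` and `eps=1e-8` match the documented defaults of `tf.train.AdamOptimizer`, but TensorFlow's implementation arranges the epsilon term slightly differently, so treat this as an approximation. The toy objective `(theta - 3)^2` and the name `adam_step` are assumptions for the example.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad * grad   # moving average of the squared gradient
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)  # approaches the minimum at 3.0
```

Note that the very first step has magnitude close to `lr` regardless of the gradient's scale, because `m_hat / sqrt(v_hat)` is roughly the sign of the gradient at `t = 1`.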

## Step 13: Compute the accuracy

In [15]:
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
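
The two lines above compare the arg-max of the logits with the arg-max of the one-hot labels and average the matches. A pure-Python equivalent on a toy batch, with hypothetical helper names, looks like this:

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(logits_batch, labels_batch):
    """Fraction of examples whose predicted class matches the one-hot label."""
    correct = [float(argmax(l) == argmax(y))
               for l, y in zip(logits_batch, labels_batch)]
    return sum(correct) / len(correct)


toy_logits = [[2.0, 0.1], [0.3, 0.9], [1.5, 1.4], [0.2, 0.8]]
toy_labels = [[1, 0], [0, 1], [0, 1], [0, 1]]
print(batch_accuracy(toy_logits, toy_labels))  # 0.75: three of four predictions match
```

Because accuracy is a mean over the batch, with `BATCH_SIZE = 32` it can only take values that are multiples of 1/32, for example 31/32 = 0.96875.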

## Step 14: Train the model

In [18]:
sess = tf.Session()
# initialize
sess.run(tf.global_variables_initializer())
for i in range(3000):
    batch = mnist.train.next_batch(BATCH_SIZE)
    if i % 100 == 0:  # compute training accuracy every 100 batches
        train_accuracy = sess.run(accuracy, feed_dict={
            x: batch[0], y: batch[1], dropout: 1.0})
        print('step {}, training accuracy {}'.format(i, train_accuracy))
        
    # TODO: run train_step. Don't forget to include the dropout placeholder in the feed_dict! (set it to 0.75)
    sess.run(train_step, feed_dict={
            x: batch[0], y: batch[1], dropout: 0.75})

step 0, training accuracy 0.1875
step 100, training accuracy 0.78125
step 200, training accuracy 0.78125
step 300, training accuracy 1.0
step 400, training accuracy 0.9375
step 500, training accuracy 0.84375
step 600, training accuracy 1.0
step 700, training accuracy 0.96875
step 800, training accuracy 0.875
step 900, training accuracy 0.84375
step 1000, training accuracy 0.90625
step 1100, training accuracy 0.96875
step 1200, training accuracy 0.96875
step 1300, training accuracy 1.0
step 1400, training accuracy 1.0
step 1500, training accuracy 0.90625
step 1600, training accuracy 1.0
step 1700, training accuracy 0.96875
step 1800, training accuracy 0.9375
step 1900, training accuracy 0.9375
step 2000, training accuracy 1.0
step 2100, training accuracy 0.96875
step 2200, training accuracy 0.96875
step 2300, training accuracy 0.96875
step 2400, training accuracy 1.0
step 2500, training accuracy 1.0
step 2600, training accuracy 1.0
step 2700, training accuracy 0.96875
step 2800, trainin

## Step 15: Test the model

In [19]:
try:
    print('test accuracy {}'.format(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, dropout: 1.0})))
except tf.errors.InvalidArgumentError as e:
    print(e.message)
    print(e)

test accuracy 0.981599986553
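
The cell above feeds all test images in a single `sess.run` call. If that is too large for the available memory, one option is to evaluate in chunks and combine the per-chunk accuracies, weighted by chunk size. The sketch below assumes a hypothetical callable `batch_accuracy_fn(images, labels)`; in this notebook it could wrap `sess.run(accuracy, feed_dict={x: xb, y: yb, dropout: 1.0})`. A toy stand-in is used here so the example runs without TensorFlow.

```python
def batched_accuracy(images, labels, batch_accuracy_fn, batch_size=1000):
    """Size-weighted average of per-batch accuracy over the whole data set."""
    total_correct = 0.0
    n = len(images)
    for start in range(0, n, batch_size):
        xb = images[start:start + batch_size]
        yb = labels[start:start + batch_size]
        total_correct += batch_accuracy_fn(xb, yb) * len(xb)
    return total_correct / n


def toy_accuracy_fn(xb, yb):
    """Stand-in for the session call: an 'image' counts as correct if it equals its label."""
    return sum(1.0 for a, b in zip(xb, yb) if a == b) / len(xb)


toy_images = list(range(10))
toy_labels = [0, 1, 2, 3, 4, 0, 0, 0, 0, 0]
print(batched_accuracy(toy_images, toy_labels, toy_accuracy_fn, batch_size=4))  # 0.5
```

Weighting by `len(xb)` matters because the last chunk may be smaller than the others.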
