# TensorFlow Assignment: Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: 

### Convolutional Neural Network

Build a 2-layer CNN for MNIST digit classification. Feel free to play around with the model architecture and see how the training time and performance change, but to begin, try the following:

Image -> convolution (32 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> convolution (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (256 hidden units) -> nonlinearity (ReLU) -> fully connected (10 output units) -> softmax
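
To see how the image shrinks on its way through this pipeline, the sketch below traces the tensor shape layer by layer. It assumes `"SAME"` padding, stride-1 convolutions, and stride-2 pooling, none of which the description above fixes; `same_out` and `trace_shapes` are illustrative helper names, not part of TensorFlow.

```python
import math

def same_out(size, stride):
    """Spatial size after a 'SAME'-padded conv or pool: ceil(size / stride)."""
    return math.ceil(size / stride)

# (name, stride, output channels); None keeps the channel count (pooling).
LAYERS = [
    ("conv1 5x5", 1, 32),
    ("pool1 2x2", 2, None),
    ("conv2 5x5", 1, 64),
    ("pool2 2x2", 2, None),
]

def trace_shapes(height=28, width=28, channels=1):
    """Return (name, (H, W, C)) after each layer, assuming 'SAME' padding throughout."""
    shapes = []
    for name, stride, out_channels in LAYERS:
        height = same_out(height, stride)
        width = same_out(width, stride)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, (height, width, channels)))
    return shapes

shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)

# Size of the vector fed to the first fully connected layer.
final_h, final_w, final_c = shapes[-1][1]
flat_size = final_h * final_w * final_c
print("flattened:", flat_size)  # 7 * 7 * 64 = 3136
```

Under these assumptions the first fully connected layer needs an input dimension of 7 * 7 * 64. With `"VALID"` padding the spatial sizes would be smaller, so that dimension would change.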

Some tips:
- The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.

- Since CNNs are more complex than the logistic regression and MLP models you've worked with before, you may find it helpful to use a more advanced optimizer. Your model will train faster if you use [`tf.train.AdamOptimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) instead of `tf.train.GradientDescentOptimizer`. A learning rate of 1e-4 is a good starting point.
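
For intuition about why Adam can help, here is a minimal NumPy sketch that compares one plain gradient descent step with one Adam step on the same gradient. It follows the textbook Adam update and is not TensorFlow's internal implementation. `sgd_step`, `adam_step`, and the toy gradient are illustrative assumptions.

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Plain gradient descent: the step is proportional to the raw gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new weights and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for step t (1-based)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter normalized step
    return w, m, v

# Toy gradient whose two components differ by four orders of magnitude.
w0 = np.zeros(2)
grad = np.array([100.0, 0.01])
lr = 1e-4

sgd_delta = sgd_step(w0, grad, lr) - w0
adam_w, m, v = adam_step(w0, grad, np.zeros(2), np.zeros(2), t=1, lr=lr)
adam_delta = adam_w - w0

print("SGD step: ", sgd_delta)   # scales with the gradient magnitude
print("Adam step:", adam_delta)  # close to -lr in both coordinates
```

On the first step Adam moves every parameter by roughly the learning rate regardless of gradient scale, whereas plain gradient descent takes a large step in one direction and a tiny one in the other.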

### Load Packages and Data

In [1]:
### YOUR CODE HERE ###
%matplotlib inline
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tqdm import trange 

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("official/mnist/dataset.py", one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting official/mnist/dataset.py\train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting official/mnist/dataset.py\train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting official/mnist/dataset.py\t10k-images-idx3-ubyte.gz
Extracting official/mnist/dataset.py\t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
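
The `one_hot=True` argument above means each label arrives as a length-10 indicator vector instead of a single digit. A minimal NumPy sketch of that encoding follows; `one_hot` here is an illustrative helper, not the function the loader calls.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Encode integer class labels as rows with a single 1.0 at the label index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9])
print(encoded.shape)           # (3, 10)
print(encoded.argmax(axis=1))  # [3 0 9], argmax recovers the original digits
```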


### Create the CNN

In [5]:
#--Make sure things are empty and load graph--#
tf.reset_default_graph()
g = tf.get_default_graph()

#--Image and input placeholders--#
X = tf.placeholder(tf.float32,[None,784])
y = tf.placeholder(tf.float32,[None,10])
x_cnn = tf.reshape(X, [-1, 28, 28, 1])

#--Create the weights and biases--#

#Convolution Weights
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev = 0.1))
W2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev = 0.1))

#Convolution Biases
b1 = tf.Variable(tf.zeros([32]))
b2 = tf.Variable(tf.zeros([64]))

#Connected Weights
Wfc1 = tf.Variable(tf.truncated_normal([7*7*64,256], stddev = 0.1))
Wfc2 = tf.Variable(tf.truncated_normal([256,10], stddev = 0.1))

#Connected Biases
bfc1 = tf.Variable(tf.truncated_normal([256], stddev = 0.1))
bfc2 = tf.Variable(tf.truncated_normal([10], stddev = 0.1))

#--Create and pool the convolution layers--#
conv1 = tf.nn.max_pool((tf.nn.relu(tf.nn.conv2d(
    x_cnn, W1, strides = [1, 1, 1, 1], padding = "SAME") + b1)),
    ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = "SAME")

conv2 = tf.nn.max_pool((tf.nn.relu(tf.nn.conv2d(
    conv1, W2, strides = [1, 1, 1, 1], padding = "SAME") + b2)),
    ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = "SAME")

#--Create the fully connected layers--#

#First reshape the output of pooling convolutions
flat = tf.reshape(conv2, [-1, 7*7*64])

#Create the connected layers
latentscores = tf.nn.relu(tf.matmul(flat, Wfc1) + bfc1)
scores = tf.matmul(latentscores, Wfc2) + bfc2

#--Create Loss function and training steps--#

#loss function
avg_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=scores, labels = y))

#training
train_step = tf.train.AdamOptimizer(1e-4).minimize(avg_loss)

#initialize
init_all = tf.global_variables_initializer()

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
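
To make the loss used above concrete, this NumPy sketch computes softmax cross-entropy from raw scores (logits) and one-hot labels, using the usual max-subtraction trick for numerical stability. It illustrates the math only and makes no claim about how TensorFlow implements the op. The example logits and labels are made up.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row max leaves softmax unchanged but avoids overflow in exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

# Hypothetical scores for two examples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

losses = softmax_cross_entropy(logits, labels)
mean_loss = losses.mean()  # analogous to taking tf.reduce_mean over the batch
print(losses, mean_loss)
```

The second example has uniform scores, so its loss equals ln(3), the cost of guessing uniformly among three classes.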



### Run the CNN

In [6]:
#--Create place to save weights--#
saver = tf.train.Saver()

#--Start the session--#
sess = tf.Session()
sess.run(init_all)

In [7]:
#--Run the CNN--#
for _ in trange(500):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={X: batch_xs, y: batch_ys})

#--Save the weights--#
saver.save(sess, "./checkpoints/reg_model.ckpt")

100%|████████████████████████████████████████████████████████████████████████████████| 500/500 [04:37<00:00,  1.80it/s]


'./checkpoints/reg_model.ckpt'
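
The loop above runs for a fixed 500 steps. The tip about stopping once validation performance plateaus could be sketched with a small helper like the one below. `has_plateaued`, its `patience` and `min_delta` parameters, and the accuracy values are all hypothetical and chosen for illustration.

```python
def has_plateaued(history, patience=3, min_delta=1e-3):
    """Return True if the last `patience` values failed to beat the earlier
    best by at least `min_delta`. Assumes higher values are better."""
    if len(history) <= patience:
        return False  # not enough evaluations to judge yet
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent < best_before + min_delta

# Made-up validation accuracies recorded at regular intervals.
val_acc = [0.90, 0.93, 0.95, 0.9502, 0.9501, 0.9503]
print(has_plateaued(val_acc))                                # True: gains under 0.001
print(has_plateaued([0.80, 0.85, 0.90, 0.93, 0.95, 0.96]))   # False: still improving
```

In a training loop, one could append a validation accuracy every so many steps and `break` when the helper returns `True`.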

In [8]:
#--Test trained model--#
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(scores), 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={X: mnist.test.images, y: mnist.test.labels})))

Test accuracy: 0.9607999920845032
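
The accuracy computation above compares the argmax of the predicted class probabilities with the argmax of the one-hot labels. The NumPy sketch below mirrors that logic on made-up scores. It also shows that, because softmax preserves the ordering of the scores, taking the argmax of the raw scores gives the same predictions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def accuracy(scores, one_hot_labels):
    """Fraction of rows where the highest score matches the labeled class."""
    return float(np.mean(scores.argmax(axis=1) == one_hot_labels.argmax(axis=1)))

# Hypothetical scores for four examples and three classes.
scores = np.array([[ 2.0, 0.5, -1.0],
                   [ 0.1, 0.2,  3.0],
                   [ 1.5, 1.4,  0.0],
                   [-2.0, 0.0, -1.0]])
labels = np.eye(3)[[0, 2, 1, 1]]  # true classes 0, 2, 1, 1

same_predictions = np.array_equal(scores.argmax(axis=1),
                                  softmax(scores).argmax(axis=1))
acc = accuracy(scores, labels)
print(same_predictions, acc)  # True 0.75
```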


In [9]:
#End session
sess.close()

### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

The CNN is more accurate than yesterday's MLP, but it takes much longer to train. My CNN reached 96% test accuracy after ~4.5 minutes of training, whereas yesterday's MLP reached ~90% accuracy in 12 seconds. When I increased the MLP's training iterations to 1000, which raised its run time to ~2 minutes, it reached 95% accuracy.

2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: By trainable parameters, I mean individual scalars. For example, a weight matrix that is 10x5 has 50.*

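857,738. The convolution weights contribute 5x5x1x32 = 800 and 5x5x32x64 = 51,200; the fully connected weights contribute 3136x256 = 802,816 and 256x10 = 2,560; the biases contribute 32 + 64 + 256 + 10 = 362.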
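
A parameter count for this model can be checked by multiplying out the shape of each variable defined in the "Create the CNN" cell. The sketch below copies those shapes by hand rather than reading them from the TensorFlow graph, so it needs updating if the architecture changes.

```python
from math import prod

# Shapes copied from the tf.Variable definitions in the model cell.
variable_shapes = {
    "W1":   (5, 5, 1, 32),
    "b1":   (32,),
    "W2":   (5, 5, 32, 64),
    "b2":   (64,),
    "Wfc1": (7 * 7 * 64, 256),
    "bfc1": (256,),
    "Wfc2": (256, 10),
    "bfc2": (10,),
}

# Each variable holds one scalar per entry, i.e. the product of its dimensions.
counts = {name: prod(shape) for name, shape in variable_shapes.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name:>5}: {count:,}")
print(f"total: {total:,}")
```

Almost all of the parameters sit in `Wfc1`, the first fully connected layer, not in the convolution filters.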

3\. When would you use a CNN versus a logistic regression model or an MLP?

In principle, I would choose among these models according to the complexity of the dataset: logistic regression first, then an MLP, then a CNN as the data become more complex. If I remember correctly from class, a CNN would be better suited to images with color channels (i.e., another level of complexity).