# TensorFlow Assignment: Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: 

### Convolutional Neural Network

Build a 2-layer CNN for MNIST digit classification. Feel free to play around with the model architecture and see how the training time and performance change, but to begin, try the following:

Image -> convolution (32 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> convolution (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (256 hidden units) -> nonlinearity (ReLU) -> fully connected (10 output units) -> softmax
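
To see how the tensor shapes evolve through this pipeline, here is a minimal sketch in plain Python. It assumes "SAME" padding with stride 1 for the convolutions and stride 2 for the 2x2 max pools; the description above does not fix these choices, so treat them as assumptions. The helper names `same_out` and `trace_shapes` are made up for illustration.

```python
import math

def same_out(size, stride):
    """Spatial output size under "SAME" padding: ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28):
    """Return (layer name, output shape) pairs for the suggested architecture."""
    shapes = [("input", (height, width, 1))]
    # Convolution with stride 1 and "SAME" padding keeps the spatial size.
    h, w = same_out(height, 1), same_out(width, 1)
    shapes.append(("conv1", (h, w, 32)))
    # 2x2 max pool with stride 2 halves the spatial size.
    h, w = same_out(h, 2), same_out(w, 2)
    shapes.append(("pool1", (h, w, 32)))
    h, w = same_out(h, 1), same_out(w, 1)
    shapes.append(("conv2", (h, w, 64)))
    h, w = same_out(h, 2), same_out(w, 2)
    shapes.append(("pool2", (h, w, 64)))
    # Flatten all remaining values into one vector per image.
    shapes.append(("flatten", (h * w * 64,)))
    shapes.append(("fc1", (256,)))
    shapes.append(("logits", (10,)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:8s} {shape}")
```

Under these assumptions the flattened vector fed to the first fully connected layer has 7 * 7 * 64 entries. A different padding mode (for example "VALID") would give different sizes.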

Some tips:
- The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.

- Since CNNs are more complex than the logistic regression and MLP models you've worked with before, you may find it helpful to use a more advanced optimizer. Your model will train faster if you use [`tf.train.AdamOptimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) instead of `tf.train.GradientDescentOptimizer`. A learning rate of 1e-4 is a good starting point.
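
For intuition about what Adam does differently from plain gradient descent, here is a rough NumPy sketch of a single Adam update applied to a toy one-dimensional problem. It follows the standard published update rule and is not TensorFlow's actual implementation. The function name `adam_step`, the toy objective, and the larger learning rate of 0.1 used in the demo are all illustrative assumptions.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running averages of grad and grad**2."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own step size, scaled by its gradient history.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem (hypothetical): minimize f(w) = (w - 3)^2 starting from w = 0.
w = np.array(0.0)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)

print("w after 500 steps:", float(w))
```

Plain gradient descent would use `param - lr * grad` instead, so every parameter shares one global step size. Adam's per-parameter scaling is one reason it is often less sensitive to the learning rate.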

In [1]:
### YOUR CODE HERE ###
%matplotlib inline
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("official/mnist/dataset.py", one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting official/mnist/dataset.py\train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting official/mnist/dataset.py\train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting official/mnist/dataset.py\t10k-images-idx3-ubyte.gz
Extracting official/mnist/dataset.py\t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [10]:
# tf.reset_default_graph()  # uncomment to clear the graph before re-running this cell

# Get a handle to the default graph
g = tf.get_default_graph()

# Create the two placeholders for flattened images and one-hot labels
X = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Reshape the flat input into 28x28 single-channel images for the conv layers
x_cnn = tf.reshape(X, [-1, 28, 28, 1])

# Create convolutional kernel variable with shape
# [filter height, filter width, input channels, number of filters]:
# 5x5 = filter size
# 1 = input channels
# 32 = filters
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev = 0.1))

# Create bias variable
b1 = tf.Variable(tf.zeros([32]))

# Apply convolutional layer
conv1_preact = tf.nn.conv2d(x_cnn, W1, strides = [1, 1, 1, 1], padding = "SAME") + b1
conv1 = tf.nn.relu(conv1_preact)

#Max pool 1
max_pool1 = tf.nn.max_pool(conv1, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = "SAME")


# 2nd layer variables
W2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev = 0.1))
b2 = tf.Variable(tf.zeros([64]))

# Apply 2nd convolutional layer
conv2_preact = tf.nn.conv2d(max_pool1, W2, strides = [1, 1, 1, 1], padding = "SAME") + b2
conv2 = tf.nn.relu(conv2_preact)

# Max pool 2, then flatten the 7x7x64 feature maps for the fully connected layers
max_pool2 = tf.nn.max_pool(conv2, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = "SAME")
flat = tf.reshape(max_pool2, [-1, 7*7*64])


#Fully connected layer
Wfc1 = tf.Variable(tf.truncated_normal([7*7*64,256], stddev = 0.1))
bfc1 = tf.Variable(tf.truncated_normal([256], stddev = 0.1))
Wfc2 = tf.Variable(tf.truncated_normal([256,10], stddev = 0.1))
bfc2 = tf.Variable(tf.truncated_normal([10], stddev = 0.1))



latentscores = tf.nn.relu(tf.matmul(flat, Wfc1) + bfc1)
scores = tf.matmul(latentscores, Wfc2) + bfc2

# Loss function: softmax cross-entropy on the raw scores, averaged over the batch
avg_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=scores,
                                                                  labels=y))
train_step = tf.train.AdamOptimizer(1e-4).minimize(avg_loss)

init_all = tf.global_variables_initializer()
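
The loss in the cell above passes raw scores (logits) to `tf.nn.softmax_cross_entropy_with_logits` rather than softmax probabilities. As a rough illustration of what that computes per example, here is a NumPy sketch. It is an assumption-laden re-derivation for one-hot labels, not TensorFlow's implementation, and the function name `softmax_cross_entropy` is made up.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row maximum leaves softmax unchanged but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

# Two examples with uniform scores over 10 classes and one-hot labels.
logits = np.zeros((2, 10))
labels = np.eye(10)[[3, 7]]
losses = softmax_cross_entropy(logits, labels)
# With uniform scores every class has probability 1/10, so each loss is log(10).
print(losses, np.log(10))
```

Averaging `losses` over the batch gives the scalar that the optimizer minimizes. An untrained 10-class model with near-uniform predictions would therefore start at a loss close to log(10).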


In [11]:
sess = tf.Session()
sess.run(init_all)

In [12]:
from tqdm import trange

# Run 500 training iterations with mini-batches of 100 images
for _ in trange(500):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={X: batch_xs, y: batch_ys})
    
# Test trained model
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(scores), 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={X: mnist.test.images, y: mnist.test.labels})))
sess.close()

100%|████████████████████████████████████████████████████████████████████████████████| 500/500 [03:40<00:00,  2.27it/s]


Test accuracy: 0.9610999822616577


### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

`[Your answer here]`

2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: By trainable parameters, I mean individual scalars. For example, a weight matrix that is 10x5 has 50.*
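
As a sketch of the counting convention in the note, the snippet below sums the number of scalars across a list of variable shapes. The shapes are a hypothetical toy model, not the CNN from this assignment, and the helper name `count_params` is made up for illustration.

```python
import numpy as np

def count_params(shapes):
    """Total number of scalars across variables with the given shapes."""
    return sum(int(np.prod(shape)) for shape in shapes)

# Hypothetical toy model: one 3x3 conv layer with 3 input channels and
# 8 filters (plus biases), followed by a 10x5 dense layer (plus biases).
toy_shapes = [(3, 3, 3, 8), (8,), (10, 5), (5,)]
total = count_params(toy_shapes)
print(total)
```

In a TensorFlow 1.x graph, the shapes could be read from `tf.trainable_variables()` via each variable's `get_shape().as_list()` and passed to the same helper. Remember that bias vectors count as trainable parameters too.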

`[Your answer here]`

3\. When would you use a CNN versus a logistic regression model or an MLP?

`[Your answer here]`