# TensorFlow Assignment: Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: Rachel Kositsky

### Convolutional Neural Network

Build a 2-layer CNN for MNIST digit classification. Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image -> convolution (32 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> convolution (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (256 hidden units) -> nonlinearity (ReLU) -> fully connected (10 output units) -> softmax
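
To see how the tensor shape changes through this pipeline, here is a minimal sketch in plain Python. It assumes `"SAME"` padding with stride 1 for the convolutions and stride 2 for the pooling layers, which the description above does not state; the helper names `same_out` and `trace_shapes` are made up for illustration.

```python
import math

def same_out(size, stride):
    """Spatial output size under "SAME" padding: ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28, channels=1):
    """Return (layer name, output shape) pairs for the suggested architecture."""
    shapes = [("image", (height, width, channels))]
    for index, filters in ((1, 32), (2, 64)):
        # 5x5 convolution, stride 1 (assumed): spatial size is unchanged,
        # the channel count becomes the number of filters.
        height, width = same_out(height, 1), same_out(width, 1)
        channels = filters
        shapes.append(("conv%d" % index, (height, width, channels)))
        # 2x2 max pool, stride 2 (assumed): spatial size is halved.
        height, width = same_out(height, 2), same_out(width, 2)
        shapes.append(("pool%d" % index, (height, width, channels)))
    # Flatten before the fully connected layers.
    shapes.append(("flatten", (height * width * channels,)))
    shapes.append(("fc1", (256,)))
    shapes.append(("logits", (10,)))
    return shapes

for name, shape in trace_shapes():
    print("{:8s} {}".format(name, shape))
```

Under these assumptions the second pooling layer outputs a 7x7x64 tensor, so the first fully connected layer sees 3136 inputs per image.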

Some tips:
- The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.

- Since CNNs are more complex than the logistic regression and MLP models you've worked with before, you may find it helpful to use a more advanced optimizer. Your model will train faster if you use [`tf.train.AdamOptimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) instead of `tf.train.GradientDescentOptimizer`. A learning rate of 1e-4 is a good starting point.
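
For intuition about why Adam can help, the sketch below compares one plain gradient descent step with one step of the textbook Adam update in NumPy. It is an illustration of the update rule only, not TensorFlow's implementation; the function names and the badly scaled toy objective are assumptions made for this example.

```python
import numpy as np

def sgd_step(theta, grad, lr):
    """Plain gradient descent: the step is proportional to the raw gradient."""
    return theta - lr * grad

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective (assumed): f(theta) = sum(scale * theta**2), badly scaled.
scale = np.array([100.0, 0.01])
theta0 = np.array([1.0, 1.0])
grad0 = 2 * scale * theta0

theta_sgd = sgd_step(theta0, grad0, lr=1e-4)
theta_adam, m1, v1 = adam_step(theta0, grad0, np.zeros(2), np.zeros(2), t=1)

print("SGD step sizes: ", theta0 - theta_sgd)
print("Adam step sizes:", theta0 - theta_adam)
```

On the first step Adam moves every coordinate by roughly the learning rate regardless of how large or small its gradient is, while the gradient descent step sizes differ by the same factor as the gradients.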

In [1]:
import numpy as np
import tensorflow as tf
from tqdm import trange

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
# Construct graph #

tf.reset_default_graph()

# Input X placeholder. Reshape flat image to 4D batched image.
x_flat = tf.placeholder(tf.float32, [None, 784])
x_reshaped = tf.reshape(x_flat, [-1, 28, 28, 1])

# Output y placeholder
y = tf.placeholder(tf.float32, [None, 10])

# Convolutional layer 1 (5x5, 32 filters)
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.zeros([32]))
conv1 = tf.nn.relu(tf.nn.conv2d(x_reshaped, W1, strides=[1, 1, 1, 1], padding="SAME") + b1)

# Pooling layer 1 (2x2 max pool)
max_pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

# Convolutional layer 2 (5x5, 64 filters)
W2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b2 = tf.Variable(tf.zeros([64]))
conv2 = tf.nn.relu(tf.nn.conv2d(max_pool1, W2, strides=[1, 1, 1, 1], padding="SAME") + b2)

# Pooling layer 2 (2x2 max pool)
max_pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

# flatten previous layer to a vector so you can add a fully connected layer next
max_pool2_flat = tf.reshape(max_pool2, [-1, 7*7*64])

# Fully connected layer 1
num_hidden_1 = 256
W_full_1 = tf.Variable(tf.truncated_normal([7*7*64, num_hidden_1], stddev=0.1))
b_full_1 = tf.Variable(tf.zeros([num_hidden_1]))
full_1 = tf.nn.relu(tf.matmul(max_pool2_flat, W_full_1) + b_full_1)

# Fully connected layer 2: class scores (logits), one per digit
num_scores = 10
W_scores = tf.Variable(tf.truncated_normal([num_hidden_1, num_scores], stddev=0.1))
b_scores = tf.Variable(tf.zeros([num_scores]))
scores = tf.matmul(full_1, W_scores) + b_scores

# Softmax loss calculation
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=scores, labels=y))
train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)

# Accuracy measure
# tf.argmax(y, axis=1) is the true label of each image;
# tf.argmax(scores, axis=1) is the label predicted by the CNN
correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(scores, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Variable Initializer
init_op = tf.global_variables_initializer()
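
As a sanity check on what the loss and accuracy nodes compute, here is a NumPy sketch of the same quantities on a tiny hand-made batch. The function names and toy arrays are assumptions for illustration; the graph above uses TensorFlow's own ops, and this is not their implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    # Subtract the row maximum first so the exponentials cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())

def batch_accuracy(logits, labels):
    """Fraction of rows whose highest score matches the one-hot label."""
    return float((logits.argmax(axis=1) == labels.argmax(axis=1)).mean())

# Two toy examples with three classes (made-up values).
toy_logits = np.array([[2.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])
toy_labels = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])

toy_loss = softmax_cross_entropy(toy_logits, toy_labels)
toy_acc = batch_accuracy(toy_logits, toy_labels)
print("loss:", toy_loss)
print("accuracy:", toy_acc)
```

The second row has uniform scores, so its loss is log(3) and `argmax` falls back to class 0, which counts as a miss for a label of class 1.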

In [7]:
# Train network and print accuracy #

num_epochs = 50
num_per_batch = 100
num_iters = len(mnist.train.images) // num_per_batch

with tf.Session() as sess:
    sess.run(init_op)
    
    # Train
    for epoch in trange(num_epochs):
        for i in range(num_iters):
            start = i*num_per_batch
            end = (i+1)*num_per_batch
            sess.run(train_step, 
                     feed_dict = {x_flat: mnist.train.images[start:end],
                                  y: mnist.train.labels[start:end]})
            
        # Print test accuracy after each epoch
        test_accuracy = sess.run(accuracy,
                                 feed_dict={x_flat: mnist.test.images,
                                            y: mnist.test.labels})
        print('Test accuracy: {0}'.format(test_accuracy))

  2%|▏         | 1/50 [01:12<59:20, 72.67s/it]

Test accuracy: 0.9649999737739563


  4%|▍         | 2/50 [02:23<57:30, 71.89s/it]

Test accuracy: 0.977400004863739


  6%|▌         | 3/50 [03:34<56:03, 71.57s/it]

Test accuracy: 0.9812999963760376


  8%|▊         | 4/50 [04:46<54:56, 71.67s/it]

Test accuracy: 0.9829000234603882


 10%|█         | 5/50 [05:57<53:37, 71.50s/it]

Test accuracy: 0.984000027179718


 12%|█▏        | 6/50 [07:07<52:12, 71.20s/it]

Test accuracy: 0.9851999878883362


 14%|█▍        | 7/50 [08:16<50:52, 70.98s/it]

Test accuracy: 0.9866999983787537


 16%|█▌        | 8/50 [09:24<49:24, 70.59s/it]

Test accuracy: 0.9866999983787537


 18%|█▊        | 9/50 [10:32<48:00, 70.25s/it]

Test accuracy: 0.987500011920929


 20%|██        | 10/50 [11:40<46:40, 70.00s/it]

Test accuracy: 0.9884999990463257


 22%|██▏       | 11/50 [12:47<45:21, 69.78s/it]

Test accuracy: 0.9883000254631042


 24%|██▍       | 12/50 [13:54<44:03, 69.58s/it]

Test accuracy: 0.9886999726295471


 26%|██▌       | 13/50 [15:02<42:49, 69.44s/it]

Test accuracy: 0.9886999726295471


 28%|██▊       | 14/50 [16:10<41:35, 69.31s/it]

Test accuracy: 0.9890999794006348


 30%|███       | 15/50 [17:17<40:20, 69.15s/it]

Test accuracy: 0.9884999990463257


 32%|███▏      | 16/50 [18:24<39:07, 69.04s/it]

Test accuracy: 0.9878000020980835


 34%|███▍      | 17/50 [26:45<51:55, 94.42s/it]

Test accuracy: 0.9869999885559082


 36%|███▌      | 18/50 [39:07<1:09:32, 130.41s/it]

Test accuracy: 0.9876000285148621


 38%|███▊      | 19/50 [40:21<1:05:50, 127.44s/it]

Test accuracy: 0.9873999953269958


 40%|████      | 20/50 [41:26<1:02:10, 124.34s/it]

Test accuracy: 0.9886000156402588


 42%|████▏     | 21/50 [42:31<58:43, 121.52s/it]  

Test accuracy: 0.9887999892234802


 44%|████▍     | 22/50 [43:37<55:31, 118.98s/it]

Test accuracy: 0.9891999959945679


 46%|████▌     | 23/50 [44:43<52:29, 116.66s/it]

Test accuracy: 0.989300012588501


 48%|████▊     | 24/50 [45:49<49:38, 114.56s/it]

Test accuracy: 0.9896000027656555


 50%|█████     | 25/50 [46:55<46:55, 112.63s/it]

Test accuracy: 0.9890999794006348


 52%|█████▏    | 26/50 [48:01<44:20, 110.84s/it]

Test accuracy: 0.9908000230789185


 54%|█████▍    | 27/50 [49:07<41:51, 109.18s/it]

Test accuracy: 0.9902999997138977


 56%|█████▌    | 28/50 [50:13<39:27, 107.63s/it]

Test accuracy: 0.9908000230789185


 58%|█████▊    | 29/50 [51:19<37:10, 106.21s/it]

Test accuracy: 0.991100013256073


 60%|██████    | 30/50 [52:26<34:57, 104.89s/it]

Test accuracy: 0.9914000034332275


 62%|██████▏   | 31/50 [53:32<32:49, 103.64s/it]

Test accuracy: 0.9908000230789185


 64%|██████▍   | 32/50 [54:39<30:44, 102.47s/it]

Test accuracy: 0.9908000230789185


 66%|██████▌   | 33/50 [55:45<28:43, 101.38s/it]

Test accuracy: 0.9919000267982483


 68%|██████▊   | 34/50 [56:52<26:45, 100.36s/it]

Test accuracy: 0.9908999800682068


 70%|███████   | 35/50 [57:58<24:50, 99.40s/it] 

Test accuracy: 0.9912999868392944


 72%|███████▏  | 36/50 [59:05<22:58, 98.48s/it]

Test accuracy: 0.991599977016449


 74%|███████▍  | 37/50 [1:00:12<21:09, 97.62s/it]

Test accuracy: 0.9912999868392944


 76%|███████▌  | 38/50 [1:01:19<19:21, 96.83s/it]

Test accuracy: 0.9911999702453613


 78%|███████▊  | 39/50 [1:02:27<17:36, 96.08s/it]

Test accuracy: 0.991599977016449


 80%|████████  | 40/50 [1:03:33<15:53, 95.34s/it]

Test accuracy: 0.9918000102043152


 82%|████████▏ | 41/50 [1:04:39<14:11, 94.63s/it]

Test accuracy: 0.9916999936103821


 84%|████████▍ | 42/50 [1:05:46<12:31, 93.96s/it]

Test accuracy: 0.9919000267982483


 86%|████████▌ | 43/50 [1:06:52<10:53, 93.31s/it]

Test accuracy: 0.9919000267982483


 88%|████████▊ | 44/50 [1:07:58<09:16, 92.70s/it]

Test accuracy: 0.9922000169754028


 90%|█████████ | 45/50 [1:09:05<07:40, 92.12s/it]

Test accuracy: 0.9922999739646912


 92%|█████████▏| 46/50 [1:10:12<06:06, 91.57s/it]

Test accuracy: 0.9922999739646912


 94%|█████████▍| 47/50 [1:11:18<04:33, 91.03s/it]

Test accuracy: 0.9922999739646912


 96%|█████████▌| 48/50 [1:12:24<03:01, 90.52s/it]

Test accuracy: 0.9921000003814697


 98%|█████████▊| 49/50 [1:13:31<01:30, 90.02s/it]

Test accuracy: 0.9919000267982483


100%|██████████| 50/50 [1:14:37<00:00, 89.55s/it]

Test accuracy: 0.9916999936103821





### Short answer

1\. How does the CNN compare in accuracy with yesterday's logistic regression and MLP models? How about training time?

The CNN is more accurate than the MLP: it reaches 99.17% test accuracy after 50 epochs, versus 97.31% for the MLP.

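Yesterday's MLP model took 1m20s for me to train for 50 epochs. The CNN took 1h14m37s to complete 50 epochs (4477 s vs. 80 s), making it roughly 56x slower. Epochs 17 and 18 took about 8 and 12 minutes instead of the usual 70 seconds or so, so the steady-state slowdown is somewhat lower.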

2\. How many trainable parameters are there in the CNN you built for this assignment?

*Note: By trainable parameters, I mean individual scalars. For example, a weight matrix that is 10x5 has 50.*

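There are 857,738 trainable parameters in the CNN I built for this assignment.

Breakdown by variable (each convolution kernel has shape height x width x input channels x output channels):
- `W1`: $5 \times 5 \times 1 \times 32 = 800$
- `b1`: $32$
- `W2`: $5 \times 5 \times 32 \times 64 = 51200$
- `b2`: $64$
- `W_full_1`: $7 \times 7 \times 64 \times 256 = 802816$
- `b_full_1`: $256$
- `W_scores`: $256 \times 10 = 2560$
- `b_scores`: $10$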
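
A quick way to check a count like this is to multiply out each variable's shape. The sketch below copies the shapes from the graph construction cell into a plain dictionary; the names `variable_shapes`, `param_counts`, and `total_params` are introduced here for illustration. Note that a convolution kernel has shape `[height, width, in_channels, out_channels]`, so `W2` includes the 32 input channels.

```python
from math import prod

# Shapes as declared in the graph construction cell.
variable_shapes = {
    "W1": (5, 5, 1, 32),
    "b1": (32,),
    "W2": (5, 5, 32, 64),
    "b2": (64,),
    "W_full_1": (7 * 7 * 64, 256),
    "b_full_1": (256,),
    "W_scores": (256, 10),
    "b_scores": (10,),
}

# Number of scalars in each variable is the product of its dimensions.
param_counts = {name: prod(shape) for name, shape in variable_shapes.items()}
total_params = sum(param_counts.values())

for name, count in param_counts.items():
    print("{:10s} {:>8d}".format(name, count))
print("{:10s} {:>8d}".format("total", total_params))
```

Inside the notebook, the same check could probably be done by looping over `tf.trainable_variables()` and multiplying out each variable's static shape, which avoids copying the shapes by hand.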

3\. When would you use a CNN versus a logistic regression model or an MLP?

When the additional capabilities of a CNN are required to significantly improve the accuracy. The CNN built here has many trainable parameters and takes far longer to train than the MLP. A large number of parameters can also lead to overfitting, so a CNN should be used only if enough data are available to train it. Overfitting can happen with logistic regression or an MLP as well, but it becomes more of an issue as the number of parameters grows.

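One example of this additional capability is translational invariance. A convolution applies the same filters at every image position and max pooling tolerates small shifts, so a CNN can recognize a pattern wherever it appears, while an MLP or logistic regression has separate weights for every pixel and no such built-in structure. Rotational invariance is not built into a standard CNN; it can only be learned from data that contains rotated examples.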
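
The effect of weight sharing on shifted inputs can be illustrated with a small NumPy experiment. This is a sketch on a made-up 12x12 image with a random 3x3 filter, not the network trained above; `cross_correlate_valid` is a helper written for this example.

```python
import numpy as np

def cross_correlate_valid(image, kernel):
    """Slide the kernel over the image (no padding), as a conv layer does."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))

# A small pattern on an empty background, and the same pattern moved
# 2 rows down and 3 columns right (far enough from the edges not to wrap).
image = np.zeros((12, 12))
image[3:6, 3:6] = rng.normal(size=(3, 3))
shifted = np.roll(image, shift=(2, 3), axis=(0, 1))

resp = cross_correlate_valid(image, kernel)
resp_shifted = cross_correlate_valid(shifted, kernel)

# Shared filter: the response map moves with the pattern.
moves_with_input = np.allclose(np.roll(resp, (2, 3), axis=(0, 1)), resp_shifted)
# Pooling over the whole map: the strongest response does not change.
same_pooled_max = np.isclose(resp.max(), resp_shifted.max())

# A dense layer has one weight per pixel, so its output generally changes.
dense_w = rng.normal(size=image.size)
dense_out = image.ravel() @ dense_w
dense_out_shifted = shifted.ravel() @ dense_w

print("response map moves with the input:", moves_with_input)
print("pooled maximum unchanged:", same_pooled_max)
print("dense outputs:", dense_out, dense_out_shifted)
```

The 2x2 pooling used in the assignment only pools over small neighborhoods, so it gives tolerance to small shifts rather than the full invariance of the global maximum used in this sketch.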