In [5]:
import tensorflow as tf
import numpy as np

In [6]:
from tensorflow.examples.tutorials.mnist import input_data
MNIST = input_data.read_data_sets('data/MNIST/', one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz


<img src="cnn.png">

### CNN Configurations
Study how the CNN configuration below follows from the image above.
* Assume each convolutional layer is zero-padded so that it does not reduce the width or height of the image.
* After each convolutional layer and ReLU, a 2x2 max pooling layer halves the width and height of the image. (Because ReLU is non-decreasing, applying it before or after max pooling gives the same result; the code below pools first.)
* The number of filters in a convolution equals the number of channels in its output volume.
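The configuration can be sanity-checked without TensorFlow. The following is a minimal plain-Python sketch that traces the per-example shape (batch dimension omitted) through the network, assuming 'SAME' padding with stride 1 for each convolution and 2x2 max pooling with stride 2. The helper names are hypothetical and not part of any library.

```python
def conv_same_shape(shape, num_filters):
    """'SAME' zero padding with stride 1 keeps height and width;
    the output has one channel per filter."""
    height, width, _ = shape
    return (height, width, num_filters)


def pool_shape(shape):
    """2x2 pooling with stride 2 halves height and width
    (exact here, since 28 and 14 are even)."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


def trace_shapes(img_size=28, num_channels=1, num_filters1=16,
                 num_filters2=36, fc_size=128, num_classes=10):
    shape = (img_size, img_size, num_channels)
    shapes = [("input", shape)]
    shape = pool_shape(conv_same_shape(shape, num_filters1))
    shapes.append(("conv1 + pool", shape))
    shape = pool_shape(conv_same_shape(shape, num_filters2))
    shapes.append(("conv2 + pool", shape))
    num_features = shape[0] * shape[1] * shape[2]
    shapes.append(("flatten", (num_features,)))
    shapes.append(("fc1", (fc_size,)))
    shapes.append(("fc2 (logits)", (num_classes,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:>13}: {shape}")
```

The traced values (14, 14, 16), (7, 7, 36), and 1764 are the shapes the cells below report for layer_conv1, layer_conv2, and layer_flat.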

In [7]:
# Input
img_size = 28                               # MNIST images are 28x28 pixels
img_size_flat = img_size * img_size         # Size of MNIST image in 1D array
num_channels = 1                            # 1 channel because images are grayscale
num_classes = 10                            # Number of classes, one class for each of 10 digits.

# Convolutional Layer 1.
filter_size1 = 5          # Convolution filters are 5 x 5 pixels.
num_filters1 = 16         # There are 16 of these filters.

# Convolutional Layer 2.
filter_size2 = 5          # Convolution filters are 5 x 5 pixels.
num_filters2 = 36         # There are 36 of these filters.

# Fully-connected layer.
fc_size = 128             # Number of neurons in fully-connected layer.

Fill in the following helper functions, which create new weight and bias tensors of a given shape.
* Initialize the weights with values drawn from a truncated normal distribution with a standard deviation of 0.1.
* Initialize the biases to a tensor of 0.1s. Note that new_biases() receives a scalar (the number of biases), so the shape of the vector is that scalar wrapped in a list.
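To make the initialization rules concrete, here is a plain-Python sketch of both initializers. It mimics the truncation rule documented for tf.truncated_normal() (values more than two standard deviations from the mean are dropped and re-drawn) using Python's random module; the function names are hypothetical stand-ins, not TensorFlow APIs.

```python
import random


def truncated_normal(n, mean=0.0, stddev=0.1, seed=None):
    """Draw n values from N(mean, stddev^2), re-drawing any value that lies
    more than 2 standard deviations from the mean."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:
            values.append(x)
    return values


def constant_bias(length, value=0.1):
    """new_biases() receives a scalar length, so the bias shape is [length]."""
    return [value] * length


# Weights for a 5x5 filter bank with 1 input channel and 16 filters.
w = truncated_normal(5 * 5 * 1 * 16, seed=0)
b = constant_bias(16)
print(min(w), max(w))   # both lie within [-0.2, 0.2]
print(len(b), b[0])     # 16 0.1
```

Truncation keeps initial weights small and bounded, and the small positive bias helps ReLU units start in their active region.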

In [21]:
def new_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def new_biases(shape):
    return tf.Variable(tf.constant(0.1, shape=[shape]))

### Convolutional Layer
Fill in the following function, which uses tf.layers.conv2d() to build a convolutional layer: first the convolution, then max pooling if requested, then ReLU.
* Use zero padding so that the width and height of the convolution's output match those of its input.
* Use a stride of 1 in all directions.
* If use_pooling is True, use 2x2 max pooling with tf.layers.max_pooling2d() to halve the width and height. Choose the appropriate stride for the pooling step.
    * Hint: use the formula $W_2 = \frac{W_1-F}{S} + 1$ with $W_2 = \frac{W_1}{2}$ to find the stride $S$, where $F$ is the pool size.
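The hint can be checked numerically. This plain-Python sketch (hypothetical helper names, no TensorFlow) solves the formula for the stride, applies 2x2 max pooling with that stride to a small 4x4 array, and confirms that pooling and ReLU give the same result in either order.

```python
def pooling_stride(w1, f, w2):
    """Solve W2 = (W1 - F) / S + 1 for the stride S."""
    return (w1 - f) / (w2 - 1)


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 and no padding on a 2-D list."""
    return [
        [max(image[i][j], image[i][j + 1], image[i + 1][j], image[i + 1][j + 1])
         for j in range(0, len(image[0]) - 1, 2)]
        for i in range(0, len(image) - 1, 2)
    ]


def relu(image):
    return [[max(0, v) for v in row] for row in image]


print(pooling_stride(28, 2, 14))   # 2.0 (layer 1: 28 -> 14)
print(pooling_stride(14, 2, 7))    # 2.0 (layer 2: 14 -> 7)

img = [[1, -3, 2, 0],
       [-4, 2, -1, 5],
       [-5, -1, -3, -2],
       [-2, -6, 1, 1]]
print(max_pool_2x2(img))           # [[2, 5], [-1, 1]]

# ReLU is non-decreasing, so pooling and ReLU can be applied in either order.
print(max_pool_2x2(relu(img)) == relu(max_pool_2x2(img)))  # True
```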


In [22]:
# Fill in the dots
def new_conv_layer(input_layer, filter_size, num_filters, use_pooling=True):  
    
    output = tf.layers.conv2d(inputs=..., kernel_size = ..., 
                              filters=..., padding = 'SAME', strides = ...)
    
    if use_pooling:
        output = tf.layers.max_pooling2d(inputs = ..., pool_size = ..., strides = ...)
        
    output = tf.nn.relu(output)
    return output

Repeat the function above, but this time use tf.nn.conv2d() to create the convolutional layer.
* Remember: unlike tf.layers.conv2d(), tf.nn.conv2d() takes the filter tensor itself as input rather than just the filter size.
    * To create it, use the new_weights() function defined above.
    * Add a bias for each filter after the tf.nn.conv2d() call; tf.layers.conv2d() adds a bias by default, but tf.nn.conv2d() does not.
* tf.nn.conv2d() expects strides as a list of four values, one per input dimension [batch, height, width, channels].
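The filter tensor passed to tf.nn.conv2d() has the layout [filter_height, filter_width, in_channels, out_channels]. The plain-Python sketch below (hypothetical helper names, no TensorFlow needed) builds that shape for the two layers in this notebook, assuming square filters as in the configuration above, and counts the trainable parameters including one bias per filter.

```python
def conv_filter_shape(filter_size, in_channels, num_filters):
    """Filter layout used by tf.nn.conv2d:
    [filter_height, filter_width, in_channels, out_channels]."""
    return [filter_size, filter_size, in_channels, num_filters]


def conv_param_count(filter_size, in_channels, num_filters):
    h, w, c_in, c_out = conv_filter_shape(filter_size, in_channels, num_filters)
    return h * w * c_in * c_out + c_out   # weights + one bias per filter


# A stride of 1 in every direction, in the 4-element form tf.nn.conv2d
# uses for an NHWC input: [batch, height, width, channels].
conv_strides = [1, 1, 1, 1]

print(conv_filter_shape(5, 1, 16), conv_param_count(5, 1, 16))    # [5, 5, 1, 16] 416
print(conv_filter_shape(5, 16, 36), conv_param_count(5, 16, 36))  # [5, 5, 16, 36] 14436
```

The third entry of the filter shape must match the channel count of the incoming layer, which is why new_conv_layer_nn() reads it from input_shape[3].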

In [23]:
#Fill in the dots

def new_conv_layer_nn(input_layer, filter_size, num_filters, use_pooling=True):  
    input_shape = input_layer.get_shape().as_list()
    w_shape = [..., ..., input_shape[3], ...]
    weights = new_weights(w_shape)
    biases = new_biases(...)
    
    output = tf.nn.conv2d(input=..., filter = ..., padding = 'SAME', strides = ...)
    output += biases
    
    if use_pooling:
        output = tf.layers.max_pooling2d(inputs = output, pool_size=2, strides = 2)
        
    output = tf.nn.relu(output)
    return output

Test the dimensions to make sure that both new_conv_layer functions work as expected.
* Create a placeholder for the images with shape [batch size, flattened image size].
* Reshape that placeholder into the shape [batch size, image height, image width, number of channels]. This is the input format required by both tf.layers.conv2d() and tf.nn.conv2d().
    * Hint: using -1 as the length of one dimension makes TensorFlow infer that length from the size of the input and the lengths of the other dimensions. This is useful because the batch size is not yet known.
* Create a placeholder for the labels with shape [batch size, number of classes].
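The -1 rule in the hint is simple arithmetic. The following plain-Python sketch (the function infer_shape is hypothetical, not a TensorFlow API) resolves a single -1 entry the way the hint describes: it becomes whatever length makes the total element count match.

```python
import math


def infer_shape(num_elements, shape):
    """Replace a single -1 in shape with the length implied by num_elements."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = math.prod(d for d in shape if d != -1)
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("element count does not match shape")
        return list(shape)
    if num_elements % known != 0:
        raise ValueError("element count is not divisible by the known dimensions")
    return [num_elements // known if d == -1 else d for d in shape]


img_size, num_channels = 28, 1
batch_of_64 = 64 * img_size * img_size * num_channels   # elements in a [64, 784] batch
print(infer_shape(batch_of_64, [-1, img_size, img_size, num_channels]))  # [64, 28, 28, 1]
```

The same reshape works for any batch size, which is why the batch dimension can be left unspecified.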

In [24]:
#Fill in the dots

X = tf.placeholder(...)
x_img = tf.reshape(X, [-1, img_size, img_size, num_channels])
y = tf.placeholder(...)

* Using the new_conv_layer() function created above, create layers 1 and 2 of the CNN based on the configuration and the image above.
* Check that the output shapes of layers 1 and 2 are correct.
* Replace new_conv_layer() with new_conv_layer_nn() and confirm that the output shapes are unchanged.

In [27]:
#Fill in the dots

layer_conv1 = new_conv_layer(input_layer=..., filter_size=..., num_filters=...)

In [28]:
layer_conv1

<tf.Tensor 'Relu_2:0' shape=(?, 14, 14, 16) dtype=float32>

In [10]:
#Fill in the dots

layer_conv2 = new_conv_layer(input_layer=..., filter_size=..., num_filters=...)

In [11]:
layer_conv2

<tf.Tensor 'Relu_1:0' shape=(?, 7, 7, 36) dtype=float32>

### Fully Connected Layers
Study how the following function flattens a layer of shape [batch size, ...] into shape [batch size, number of features].
* Hint: TensorShape.num_elements() returns the total number of elements in a tensor of the given shape.
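Flattening only regroups values; it does not reorder them. As a plain-Python sketch (hypothetical helpers, no TensorFlow), this flattens one tiny [height][width][channels] example in row-major order, the order tf.reshape uses, where the last axis (channels) varies fastest, and computes the feature count for the 7x7x36 output of layer 2.

```python
import math


def flatten_example(example):
    """Flatten one [height][width][channels] nested list in row-major order."""
    return [v for row in example for pixel in row for v in pixel]


def num_features(shape):
    """Product of the non-batch dimensions, like TensorShape.num_elements()."""
    return math.prod(shape)


# A tiny 2x2 "image" with 3 channels per pixel.
example = [[[1, 2, 3], [4, 5, 6]],
           [[7, 8, 9], [10, 11, 12]]]
print(flatten_example(example))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(num_features((7, 7, 36)))   # 1764
```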

In [12]:
def flatten_layer(layer):
    num_features = layer.get_shape()[1:4].num_elements()
    return tf.reshape(layer, [-1, num_features])

Create a function for a fully connected layer whose input has shape [batch size, number of input features] and whose output has shape [batch size, num_outputs].
* Define the weight and bias tensors and apply them to the input layer.
* Apply a ReLU activation if use_relu is True.
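A fully connected layer computes out = x @ W + b for every example in the batch, optionally followed by ReLU. The plain-Python sketch below (the function fc_forward and the small numbers are illustrative assumptions, not taken from the notebook) shows the computation on a batch of one example with 2 input features and 3 outputs.

```python
def fc_forward(x, w, b, use_relu=True):
    """Fully connected layer on a batch.

    x: [batch][num_inputs], w: [num_inputs][num_outputs], b: [num_outputs]
    """
    num_inputs, num_outputs = len(w), len(b)
    out = []
    for row in x:
        z = [sum(row[i] * w[i][j] for i in range(num_inputs)) + b[j]
             for j in range(num_outputs)]
        out.append([max(0.0, v) for v in z] if use_relu else z)
    return out


x = [[1.0, -2.0]]                 # batch of 1, 2 input features
w = [[0.5, -1.0, 0.0],
     [0.25, 0.5, 1.0]]            # 2 inputs -> 3 outputs
b = [0.1, 0.1, 0.1]               # bias initialized to 0.1, as in new_biases()
print(fc_forward(x, w, b, use_relu=False))  # pre-activation values (the logits)
print(fc_forward(x, w, b))                  # negative entries clipped to 0.0
```

The weight matrix shape [num_inputs, num_outputs] is what lets the matrix product map each example's features to num_outputs values.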

In [13]:
#Fill in the dots

def new_fc_layer(layer, num_outputs, use_relu=True):
    num_inputs = layer.get_shape().as_list()[1]
    w = new_weights(shape = [num_inputs, ...])
    b = new_biases(shape = ...)
    
    layer = ...
    
    if use_relu:
        layer = ...
    return layer

Flatten the last convolutional layer and check that its second dimension equals the product of the last three dimensions of layer_conv2 (7 * 7 * 36 = 1764).

In [14]:
#Fill in the dots

layer_flat = flatten_layer(...)
layer_flat

<tf.Tensor 'Reshape_1:0' shape=(?, 1764) dtype=float32>

Create a fully connected layer of the appropriate size based on the CNN configuration and check its shape.

In [15]:
#Fill in the dots

layer_fc1 = new_fc_layer(layer = ..., num_outputs = ...)

In [16]:
layer_fc1

<tf.Tensor 'Relu_2:0' shape=(?, 128) dtype=float32>

Optional Dropout:

In [17]:
#Fill in the dots (Optional)

#keep_prob = tf.placeholder(...)
#layer_fc1 = tf.nn.dropout(...)
#layer_fc1

<tf.Tensor 'dropout/mul:0' shape=(?, 128) dtype=float32>

Create the final fully connected layer and check its shape. ReLU is disabled (use_relu=False) because this layer must output the raw logits, which the loss function needs.

In [18]:
#Fill in the dots

layer_fc2 = new_fc_layer(layer = ..., num_outputs = ..., use_relu=False)

In [19]:
layer_fc2

<tf.Tensor 'add_1:0' shape=(?, 10) dtype=float32>

Apply softmax to the final fully connected layer.
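Softmax turns each row of logits into probabilities that sum to 1. This plain-Python sketch (a hypothetical helper, not tf.nn.softmax itself) uses the standard trick of subtracting the row maximum before exponentiating, which leaves the result unchanged but prevents overflow for large logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)                              # shift so the largest exponent is 0
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])   # [0.659, 0.242, 0.099]
print(softmax([1000.0, 1000.0]))      # [0.5, 0.5], no overflow
```

Softmax preserves the ordering of the logits, so taking the argmax of y_pred or of layer_fc2 yields the same predicted class.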

In [20]:
y_pred = tf.nn.softmax(layer_fc2)

The following cell computes the predicted and true classes by taking the argmax of y_pred and y along the class dimension.

In [21]:
y_pred_cls = tf.argmax(y_pred, axis=1)
y_cls = tf.argmax(y, axis=1)


### Optimization and Accuracy
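Set entropy to the softmax cross-entropy between the logits and the labels, and set loss to the mean of entropy. Because tf.nn.softmax_cross_entropy_with_logits() applies softmax internally, pass it the unscaled logits (layer_fc2) rather than y_pred. Use the Adam optimizer with a learning rate of 0.01 to minimize the loss.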
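For one example with one-hot label y and logits z, the softmax cross-entropy is $-\sum_i y_i \log \mathrm{softmax}(z)_i$, which reduces to minus the log-probability of the true class. The plain-Python sketch below (the function cross_entropy_from_logits is a hypothetical stand-in, not the TensorFlow op) computes it through log-sum-exp for numerical stability and then averages over a small made-up batch, mirroring tf.reduce_mean.

```python
import math


def cross_entropy_from_logits(logits, labels):
    """Cross-entropy between a label distribution and softmax(logits) for one example.

    Uses log-softmax via log-sum-exp, so large logits do not overflow exp().
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log softmax(z)_i = z_i - log_sum_exp
    return -sum(y * (z - log_sum_exp) for y, z in zip(labels, logits))


batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
batch_labels = [[1, 0, 0], [0, 0, 1]]   # one-hot labels

entropy = [cross_entropy_from_logits(z, y) for z, y in zip(batch_logits, batch_labels)]
loss = sum(entropy) / len(entropy)       # plays the role of tf.reduce_mean(entropy)
print(entropy, loss)
```

The second example assigns a low logit to its true class, so its entropy is much larger than the first; the mean loss is what the optimizer minimizes.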

In [22]:
#Fill in the dots

entropy = tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...)
loss = tf.reduce_mean(...)
optimizer = tf.train.AdamOptimizer(learning_rate=...).minimize(...)

Compute the accuracy by:
* Checking, for each example, whether y_pred_cls equals y_cls and casting the boolean result to a float
* Taking the mean of these values over all examples
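The two steps above can be mirrored in plain Python. In this sketch the helpers and the four-example batch are illustrative assumptions; argmax returns the first maximal index on ties, a choice made here for determinism (TensorFlow does not guarantee which index it returns on ties).

```python
def argmax(row):
    """Index of the largest value in row (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(probs, one_hot):
    pred_cls = [argmax(r) for r in probs]
    true_cls = [argmax(r) for r in one_hot]
    correct = [float(p == t) for p, t in zip(pred_cls, true_cls)]  # like tf.cast(tf.equal(...), tf.float32)
    return sum(correct) / len(correct)                              # like tf.reduce_mean(...)


probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3]]
one_hot = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]
print(accuracy(probs, one_hot))  # 0.75 (3 of 4 predictions correct)
```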

In [23]:
accuracy = tf.reduce_mean(tf.cast(tf.equal(y_pred_cls, y_cls), tf.float32))

### Running 
Create a new interactive session and initialize all variables.

In [24]:
#Fill in the dots

sess = ...
sess.run(...)

Fill in the optimize() function, which trains for the given number of epochs and adds them to the running total in total_epochs.

In [25]:
total_epochs = 0

In [26]:
def optimize(epochs):
    global total_epochs
    batch_size = 64
    num_batches = int(MNIST.train.num_examples/batch_size)
    for epoch in range(total_epochs, total_epochs+epochs):
        total_loss = 0
        for batch in range(num_batches):
            x, y_ = MNIST.train.next_batch(batch_size)
            _, l = sess.run([optimizer, loss], feed_dict = {...})
            total_loss += l
        print("Epoch {0}: {1}".format(epoch, total_loss))
    total_epochs += epochs

Train for a few epochs, then compute the accuracy on the test set.

In [27]:
optimize(3)

Epoch 0: 180.70307449204847
Epoch 1: 108.97698271943955
Epoch 2: 101.18824915075675
CPU times: user 6min 47s, sys: 43.7 s, total: 7min 31s
Wall time: 2min 25s


In [28]:
# Testing
print("Computing accuracy ...")
X_batch, y_batch = MNIST.test.next_batch(MNIST.test.num_examples)
total_accuracy = sess.run(accuracy, feed_dict={...})

print("Accuracy {0}".format(total_accuracy))

Computing accuracy ...
Accuracy 0.9800000190734863


### Extra Exercises

#### Dropout
Add a dropout layer after the first fully connected layer. Create a placeholder keep_prob and pass it to tf.nn.dropout(); because its value is fed at run time, dropout can be enabled during training and disabled during testing. When training, set keep_prob to a fraction such as 0.5; when testing, set it to 1.0.
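In TensorFlow 1.x, tf.nn.dropout() keeps each element with probability keep_prob and scales kept elements by 1 / keep_prob, so the expected activation is unchanged. The plain-Python sketch below (a hypothetical helper, not the TensorFlow op) reproduces that behavior and shows why feeding keep_prob = 1.0 at test time turns dropout off.

```python
import random


def dropout(values, keep_prob, rng=random):
    """Inverted dropout: keep each value with probability keep_prob and
    scale kept values by 1 / keep_prob; zero out the rest."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 10
train_out = dropout(activations, keep_prob=0.5, rng=rng)  # each entry is 0.0 or 2.0
test_out = dropout(activations, keep_prob=1.0, rng=rng)   # identical to the input
print(train_out)
print(test_out)
```

Because rng.random() is always below 1.0, every value survives unscaled when keep_prob is 1.0, which is why a single placeholder can serve both training and testing.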