# Convolutional Neural Networks

![title](img/convolution-schematic.gif)

The above is an example of a **convolution** with a 3x3 filter and a stride of 1 being applied to data with a range of 0 to 1. The convolution for each 3x3 section is calculated against the weight, [[1, 0, 1], [0, 1, 0], [1, 0, 1]], then a bias is added to create the convolved feature on the right. In this case, the bias is zero. In TensorFlow, this is all done using **tf.nn.conv2d()** and **tf.nn.bias_add()**.
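
To make the arithmetic concrete, here is a minimal plain-Python sketch of the same operation. The function name `convolve2d` and the 5x5 binary input are assumptions chosen for illustration; only the 3x3 weight, the stride of 1, and the zero bias come from the description above.

```python
def convolve2d(image, kernel, bias=0, stride=1):
    """Slide `kernel` over `image` with no padding and return the convolved feature."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the window element-wise with the kernel, sum, then add the bias
            total = bias
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x5 input with values in the range 0 to 1 (illustration only)
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

feature = convolve2d(image, kernel)
print(feature)
```

With a 5x5 input, a 3x3 filter, and a stride of 1, the convolved feature is 3x3.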

**Quiz Setup:**

H = height, W = width, D = depth

    We have an input of shape 32x32x3 (HxWxD)
    20 filters of shape 8x8x3 (HxWxD)
    A stride of 2 for both the height and width (S)
    Padding of size 1 (P)

Formula for calculating the new height or width:

**new_height = (input_height - filter_height + 2 * P)/S + 1**  
**new_width = (input_width - filter_width + 2 * P)/S + 1**

What's the shape of the output? The answer format is HxWxD

The answer is **14x14x20**.

We can get the new height and width with the formula resulting in:

(32 - 8 + 2 * 1)/2 + 1 = 14  
(32 - 8 + 2 * 1)/2 + 1 = 14

The new depth is equal to the number of filters, which is 20.
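
The same calculation can be sketched as a small helper. The name `conv_output_size` is an assumption for illustration, and the integer division assumes the filter positions divide evenly, as they do in this quiz.

```python
def conv_output_size(input_size, filter_size, stride, padding):
    """new_size = (input_size - filter_size + 2 * P) / S + 1"""
    return (input_size - filter_size + 2 * padding) // stride + 1


new_height = conv_output_size(32, 8, stride=2, padding=1)
new_width = conv_output_size(32, 8, stride=2, padding=1)
n_filters = 20  # the output depth equals the number of filters

print('%dx%dx%d' % (new_height, new_width, n_filters))  # 14x14x20
```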

## Question on number of parameters without parameter sharing 

Without parameter sharing, each neuron in the output layer must connect to each neuron in the filter. In addition, each neuron in the output layer must also connect to a single bias neuron.

In [4]:
# 8 * 8 * 3 is the number of weights, we add 1 for the bias. 
# Each weight is assigned to every single part of the output (14 * 14 * 20). 
# So we multiply these two numbers together and we get the final answer
print((8*8*3+1)*(14*14*20))
# That's a HUGE amount!

756560


## Question on number of parameters with parameter sharing  

With parameter sharing, each neuron in an output channel shares its weights with every other neuron in that channel. So the number of parameters is equal to the number of neurons in the filter, plus a bias neuron, all multiplied by the number of channels in the output layer.

In [6]:
print((8*8*3+1)*(20))
# That's 196 times fewer parameters!

3860


That's 3840 weights and 20 biases. This should look similar to the answer from the previous quiz. The difference being it's just 20 instead of (14 * 14 * 20). Remember, with weight sharing we use the same filter for an entire depth slice. Because of this we can get rid of 14 * 14 and be left with only 20.
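
As a rough generalization of the two quizzes, the counts can be wrapped in one function. `conv_param_count` and its argument names are hypothetical; the numbers passed in are the ones from the quiz.

```python
def conv_param_count(filter_h, filter_w, in_depth, out_h, out_w, out_depth, shared=True):
    """Count weights plus biases of a convolutional layer."""
    per_neuron = filter_h * filter_w * in_depth + 1  # filter weights + 1 bias
    if shared:
        # One filter (and one bias) per output channel
        return per_neuron * out_depth
    # A separate filter (and bias) for every output neuron
    return per_neuron * out_h * out_w * out_depth


without_sharing = conv_param_count(8, 8, 3, 14, 14, 20, shared=False)
with_sharing = conv_param_count(8, 8, 3, 14, 14, 20, shared=True)
print(without_sharing, with_sharing, without_sharing // with_sharing)
```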

# CNN in TensorFlow

In [1]:
import tensorflow as tf

In [2]:
input = tf.placeholder(tf.float32, (None, 32, 32, 3)) # batch size, image H, image W, RGB
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20))) # (height, width, input_depth, output_depth)
filter_bias = tf.Variable(tf.zeros(20))
strides = [1, 2, 2, 1] # (batch, height, width, depth)
padding = 'VALID'
conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias

Note that the output shape of **conv** will be [1, 13, 13, 20] for a batch of one image. It's 4D to account for batch size, but more importantly, it's not [1, 14, 14, 20]. This is because the padding algorithm TensorFlow uses is not exactly the same as the one above: **'VALID'** adds no padding at all, so an 8x8 filter with a stride of 2 fits only 13 times along each 32-pixel axis. An alternative is to switch **padding** from **'VALID'** to **'SAME'**, which would result in an output shape of [1, 16, 16, 20]. If you're curious how padding works in TensorFlow, read this document: https://www.tensorflow.org/api_guides/python/nn#Convolution
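
The two padding modes can be sketched with the output-size rules from the TensorFlow documentation linked above. The helper name `tf_output_size` is an assumption; it only computes sizes and does not call TensorFlow.

```python
import math


def tf_output_size(input_size, filter_size, stride, padding):
    """Output height or width for TensorFlow's 'VALID' and 'SAME' padding modes."""
    if padding == 'VALID':
        # No padding: only positions where the filter fits entirely are used
        return math.ceil((input_size - filter_size + 1) / stride)
    if padding == 'SAME':
        # Zero-padding is added as needed, so the filter size drops out
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


print(tf_output_size(32, 8, 2, 'VALID'))  # 13
print(tf_output_size(32, 8, 2, 'SAME'))   # 16
```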

## TensorFlow Convolution

Let's examine how to implement a CNN in TensorFlow.

TensorFlow provides the **tf.nn.conv2d()** and **tf.nn.bias_add()** functions to create your own convolutional layers.

In [7]:
# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(tf.float32, shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal([filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
# strides: (batch, height, width, depth)
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)

The code above uses the **tf.nn.conv2d()** function to compute the convolution with **weight** as the filter and **[1, 2, 2, 1]** for the strides. TensorFlow uses a stride for each **input** dimension, **[batch, input_height, input_width, input_channels]**. We will almost always set the stride for **batch** and **input_channels** (i.e. the first and fourth element in the **strides** array) to **1**.

You'll focus on changing **input_height** and **input_width** while setting **batch** and **input_channels** to 1. The **input_height** and **input_width** strides are for striding the filter over **input**. 

**This example code uses a stride of 2 with 5x5 filter over input.**

The **tf.nn.bias_add()** function adds a 1-d bias to the last dimension of a tensor.
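
What "adding a 1-d bias to the last dimension" means can be sketched with nested lists. `bias_add_sketch` is a hypothetical stand-in written for illustration, not the TensorFlow function itself.

```python
def bias_add_sketch(tensor, bias):
    """Add bias[c] to every element whose index along the last dimension is c."""
    if isinstance(tensor[0], list):
        # Not yet at the last dimension: recurse into each sub-tensor
        return [bias_add_sketch(sub, bias) for sub in tensor]
    if len(tensor) != len(bias):
        raise ValueError('bias length must match the size of the last dimension')
    return [value + b for value, b in zip(tensor, bias)]


# Hypothetical feature map of shape (height=2, width=2, channels=3)
feature_map = [
    [[0, 0, 0], [1, 1, 1]],
    [[2, 2, 2], [3, 3, 3]]]

biased = bias_add_sketch(feature_map, [10, 20, 30])
print(biased)
```

Every pixel receives the same three per-channel offsets, which is why the bias has one entry per output channel.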

## TensorFlow Max Pooling

![title](img/max-pooling.png)

The image above is an example of **max pooling with a 2x2 filter and stride of 2**. The four 2x2 colors represent each time the filter was applied to find the maximum value. 

Conceptually, the benefit of the max pooling operation is to reduce the size of the input, and allow the neural network to focus on only the most important elements. Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values.

TensorFlow provides the **tf.nn.max_pool()** function to apply max pooling to your convolutional layers.

In [8]:
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')

The **tf.nn.max_pool()** function performs max pooling with the **ksize** parameter as the size of the filter and the **strides** parameter as the length of the stride. **2x2 filters with a stride of 2x2 are common in practice**.

The **ksize** and **strides** parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor (**[batch, height, width, channels]**). For both **ksize** and **strides**, the batch and channel dimensions are typically set to **1**.

Max pooling is generally used to:

* decrease the size of the output
* prevent overfitting

Preventing overfitting is a consequence of reducing the output size, which in turn reduces the number of parameters in future layers.

Recently, pooling layers have fallen out of favor. Some reasons are:

* Recent datasets are so big and complex we're more concerned about underfitting.
* Dropout is a much better regularizer.
* Pooling results in a loss of information. Think about the max pooling operation as an example. We only keep the largest of n numbers, thereby disregarding n-1 numbers completely.

## Quiz Max Pooling 

H = height, W = width, D = depth

    We have an input of shape 4x4x5 (HxWxD)
    Filter of shape 2x2 (HxW)
    A stride of 2 for both the height and width (S)

Recall the formula for calculating the new height or width:

**new_height = (input_height - filter_height)/S + 1  
new_width = (input_width - filter_width)/S + 1**

NOTE: For a pooling layer the output depth is the same as the input depth. Additionally, the pooling operation is applied individually for each depth slice.

What's the shape of the output? Format is HxWxD.

In [16]:
H1=(4-2)/2+1
W1=(4-2)/2+1
print(H1,W1)
D=5

2.0 2.0


In [18]:
print('%dx%dx%d'%(H1,W1,D))

2x2x5


Here's the corresponding code:

In [19]:
input = tf.placeholder(tf.float32, (None, 4, 4, 5))
filter_shape = [1, 2, 2, 1]
strides = [1, 2, 2, 1]
padding = 'VALID'
pool = tf.nn.max_pool(input, filter_shape, strides, padding)

The output shape of **pool** will be [1, 2, 2, 5], even if **padding** is changed to **'SAME'**.

What's the result of a max pooling operation on the input:

[[[0, 1, 0.5, 10],
   [2, 2.5, 1, -8],
   [4, 0, 5, 6],
   [15, 1, 2, 3]]]

Assume the filter is 2x2 and the stride is 2 for both height and width. The output shape is 2x2x1.

The answering format will be 4 numbers, each separated by a comma, such as: 1,2,3,4.

Work from the top left to the bottom right

In [21]:
print(2.5,10,15,6)

2.5 10 15 6


What's the result of average (or mean) pooling on the same input?

In [22]:
print((0+1+2+2.5)/4,(.5+10+1-8)/4,(4+0+15+1)/4,(5+6+2+3)/4)

1.375 0.875 5.0 4.0
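
Both answers can be double-checked with a short pooling sketch in plain Python. The helper `pool2d` is a hypothetical name; it handles a single depth slice with no padding, which is all this quiz needs.

```python
def pool2d(grid, size, stride, reduce_fn):
    """Apply `reduce_fn` to each size x size window of a 2D grid (no padding)."""
    output = []
    for i in range(0, len(grid) - size + 1, stride):
        row = []
        for j in range(0, len(grid[0]) - size + 1, stride):
            window = [grid[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        output.append(row)
    return output


grid = [
    [0, 1, 0.5, 10],
    [2, 2.5, 1, -8],
    [4, 0, 5, 6],
    [15, 1, 2, 3]]

max_pooled = pool2d(grid, size=2, stride=2, reduce_fn=max)
mean_pooled = pool2d(grid, size=2, stride=2, reduce_fn=lambda w: sum(w) / len(w))
print(max_pooled)   # [[2.5, 10], [15, 6]]
print(mean_pooled)  # [[1.375, 0.875], [5.0, 4.0]]
```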


# Full convolutional network in TensorFlow

In [1]:
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('../datasets/mnist', one_hot=True, reshape=False)

Extracting ../datasets/mnist/train-images-idx3-ubyte.gz
Extracting ../datasets/mnist/train-labels-idx1-ubyte.gz
Extracting ../datasets/mnist/t10k-images-idx3-ubyte.gz
Extracting ../datasets/mnist/t10k-labels-idx1-ubyte.gz


In [2]:
# Parameters
learning_rate = 0.00001
epochs = 10
batch_size = 128

# Number of samples to calculate validation and accuracy
# Decrease this if you're running out of memory to calculate accuracy
test_valid_size = 256

# Network Parameters
n_classes = 10  # MNIST total classes (0-9 digits)
dropout = 0.75  # Dropout, probability to keep units

In [3]:
# Store layers weight & bias
weights = {
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),     # 32 filters 5x5, in:1 out:32
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),    # 64 filters 5x5, in:32 out:64
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),    # FC: flatten 7x7x64 --> 1024 units
    'out': tf.Variable(tf.random_normal([1024, n_classes]))} # FC: 1024 --> 10

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))}

In [7]:
# Convolution
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

The **tf.nn.conv2d()** function computes the convolution against weight **W** as shown above.

In TensorFlow, **strides** is an array of 4 elements; the first element in this array indicates the stride for batch and last element indicates stride for features. It's good practice to remove the batches or features you want to skip from the data set rather than use a stride to skip them. You can always set the first and last element to 1 in **strides** in order to use all batches and features.

The middle two elements are the strides for height and width respectively. I've mentioned stride as one number because you usually have a square stride where **height = width**. When someone says they are using a stride of 3, they usually mean **tf.nn.conv2d(x, W, strides=[1, 3, 3, 1])**.

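To make life easier, the code is using **tf.nn.bias_add()** to add the bias. It is a special case of **tf.add()** that requires the bias to be 1-D and to match the last dimension of the input. Plain **tf.add()** (or **+**, as used earlier) also works here because it broadcasts the bias across the other dimensions.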

In [8]:
# max pooling
def maxpool2d(x, k=2):
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')

The **tf.nn.max_pool()** function does exactly what you would expect: it performs max pooling with the **ksize** parameter as the size of the filter.

In [9]:
# Model
def conv_net(x, weights, biases, dropout):
    # Layer 1 - 28*28*1 to 14*14*32
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Layer 2 - 14*14*32 to 7*7*64
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer - 7*7*64 to 1024
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output Layer - class prediction - 1024 to 10
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

In the code above, we're creating two layers that each apply a convolution followed by max pooling, then a fully connected layer and an output layer. The transformation of each layer to new dimensions is shown in the comments. For example, the first layer shapes the images from 28x28x1 to 28x28x32 in the convolution step. The next step applies max pooling, turning each sample into 14x14x32. All the layers are applied from **conv1** to **out**, producing 10 class predictions.
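
The shape bookkeeping in the comments can be traced with a few lines of plain Python. This is a sketch only: `same_output_size` and `trace_shapes` are hypothetical helpers that assume 'SAME' padding, a convolution stride of 1, and 2x2 pooling with a stride of 2, as in the model above.

```python
import math


def same_output_size(size, stride):
    """Output height or width under 'SAME' padding."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28, depth=1):
    """Return (stage, shape) pairs for the conv/pool stack of the model above."""
    shapes = [('input', (height, width, depth))]
    for name, n_filters in (('conv1', 32), ('conv2', 64)):
        # Convolution with stride 1 keeps height and width; depth becomes n_filters
        height, width = same_output_size(height, 1), same_output_size(width, 1)
        depth = n_filters
        shapes.append((name, (height, width, depth)))
        # 2x2 max pooling with stride 2 halves height and width
        height, width = same_output_size(height, 2), same_output_size(width, 2)
        shapes.append((name + ' + maxpool', (height, width, depth)))
    # Flattened size fed into the fully connected layer
    shapes.append(('flatten', (height * width * depth,)))
    return shapes


for stage, shape in trace_shapes():
    print(stage, shape)
```

The flattened size, 7 * 7 * 64, is the first dimension of `weights['wd1']`.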

In [11]:
# Session: now let's run it
# tf Graph input
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Model
logits = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for batch in range(mnist.train.num_examples//batch_size):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: dropout})

        # Calculate batch loss and accuracy
        loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.})
        valid_acc = sess.run(accuracy, feed_dict={
                x: mnist.validation.images[:test_valid_size],
                y: mnist.validation.labels[:test_valid_size],
                keep_prob: 1.})

        print('Epoch {:>2}, Batch {:>3} -'
              'Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(epoch + 1, batch + 1, loss, valid_acc))

    # Calculate Test Accuracy
    test_acc = sess.run(accuracy, feed_dict={
        x: mnist.test.images[:test_valid_size],
        y: mnist.test.labels[:test_valid_size],
        keep_prob: 1.})
    print('Testing Accuracy: {}'.format(test_acc))


Epoch  1, Batch 429 -Loss:  1238.9299 Validation Accuracy: 0.765625
Epoch  2, Batch 429 -Loss:   933.5107 Validation Accuracy: 0.804688
Epoch  3, Batch 429 -Loss:   529.3239 Validation Accuracy: 0.835938
Epoch  4, Batch 429 -Loss:   277.0436 Validation Accuracy: 0.828125
Epoch  5, Batch 429 -Loss:   353.6198 Validation Accuracy: 0.828125
Epoch  6, Batch 429 -Loss:   211.7828 Validation Accuracy: 0.828125
Epoch  7, Batch 429 -Loss:   205.2637 Validation Accuracy: 0.839844
Epoch  8, Batch 429 -Loss:   233.9811 Validation Accuracy: 0.835938
Epoch  9, Batch 429 -Loss:   245.9134 Validation Accuracy: 0.835938
Epoch 10, Batch 429 -Loss:   162.3698 Validation Accuracy: 0.832031
Testing Accuracy: 0.828125


# Quiz: Using Convolution Layers in TensorFlow

Let's now apply what we've learned to build real CNNs in TensorFlow. In the exercise below, you'll be asked to set up the dimensions of the convolution filters, the weights, and the biases. This is in many ways the trickiest part of using CNNs in TensorFlow. Once you have a sense of how to set up the dimensions of these attributes, applying CNNs will be far more straightforward.

## Review

You should go over the TensorFlow documentation for 2D convolutions: https://www.tensorflow.org/api_guides/python/nn#Convolution. Most of the documentation is straightforward, except perhaps the padding argument. The padding might differ depending on whether you pass 'VALID' or 'SAME'.

Here are a few more things worth reviewing:

    TensorFlow Variables.
    Truncated Normal Distributions in TensorFlow (you'll want to initialize your weights with a truncated normal distribution).
    How to determine the dimensions of the output based on the input size and the filter size (shown below). You'll use this to determine what the size of your filter should be.

**new_height = (input_height - filter_height + 2 * P)/S + 1  
new_width = (input_width - filter_width + 2 * P)/S + 1**

**Instructions**

    Finish off each TODO in the conv2d function.  
    Set up the strides, padding and filter weight/bias (F_W and F_b) such that the output shape is (1, 2, 2, 3).  
    Note that the filter weight and bias should be TensorFlow variables; strides and padding are plain Python values.


In [12]:
"""
Setup the strides, padding and filter weight/bias such that
the output shape is (1, 2, 2, 3).
"""
import tensorflow as tf
import numpy as np

# `tf.nn.conv2d` requires the input be 4D (batch_size, height, width, depth)
# (1, 4, 4, 1)
x = np.array([
    [0, 1, 0.5, 10],
    [2, 2.5, 1, -8],
    [4, 0, 5, 6],
    [15, 1, 2, 3]], dtype=np.float32).reshape((1, 4, 4, 1))
X = tf.constant(x)


def conv2d(input):
    # Filter (weights and bias)
    # The shape of the filter weight is (height, width, input_depth, output_depth)
    # The shape of the filter bias is (output_depth,)
    # TODO: Define the filter weights `F_W` and filter bias `F_b`.
    # NOTE: Remember to wrap them in `tf.Variable`, they are trainable parameters after all.
    F_W = tf.Variable(tf.truncated_normal([3, 3, 1, 3])) # 3 filters 3x3, in:1 out:3
    F_b = tf.Variable(tf.zeros([3]))
    # TODO: Set the stride for each dimension (batch_size, height, width, depth)
    strides = [1, 2, 2, 1]
    # TODO: set the padding, either 'VALID' or 'SAME'.
    padding = 'SAME'
    # https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#conv2d
    # `tf.nn.conv2d` does not include the bias computation so we have to add it ourselves after.
    return tf.nn.conv2d(input, F_W, strides, padding) + F_b

out = conv2d(X)


#### proposed solution
NOTE: there's more than 1 way to get the correct output shape. Your answer might differ from mine.
def conv2d(input):
    # Filter (weights and bias)
    F_W = tf.Variable(tf.truncated_normal((2, 2, 1, 3)))
    F_b = tf.Variable(tf.zeros(3))
    strides = [1, 2, 2, 1]
    padding = 'VALID'
    return tf.nn.conv2d(input, F_W, strides, padding) + F_b
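
To see why more than one answer works, the sketch below enumerates square filter sizes, strides, and padding modes that turn a 4x4 input into a 2x2 output. It assumes TensorFlow's documented output-size rules for 'VALID' and 'SAME'; `tf_output_size` is a hypothetical helper that only computes sizes and does not call TensorFlow.

```python
import math


def tf_output_size(input_size, filter_size, stride, padding):
    """Output height or width for TensorFlow's 'VALID' and 'SAME' padding modes."""
    if padding == 'VALID':
        return math.ceil((input_size - filter_size + 1) / stride)
    return math.ceil(input_size / stride)  # 'SAME'


# Every (padding, filter_size, stride) that maps a 4x4 input to a 2x2 output
solutions = [
    (padding, filter_size, stride)
    for padding in ('VALID', 'SAME')
    for filter_size in range(1, 5)
    for stride in range(1, 5)
    if tf_output_size(4, filter_size, stride, padding) == 2]

for padding, filter_size, stride in solutions:
    print(padding, 'filter', filter_size, 'stride', stride)
```

Both the 3x3 'SAME' filter used in the cell above and the 2x2 'VALID' filter of the proposed solution appear in the list, each with a stride of 2.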


# Quiz: Using Pooling Layers in TensorFlow

In the exercise below, you'll be asked to set up the dimensions of the pooling filters and strides, as well as the appropriate padding. You should go over the TensorFlow documentation for **tf.nn.max_pool()**: https://www.tensorflow.org/api_docs/python/tf/nn/max_pool. Padding works the same as it does for a convolution.

**Instructions**

   Finish off each TODO in the maxpool function.

   Setup the **strides, padding and ksize** such that the output shape after pooling is **(1, 2, 2, 1)**.


In [13]:
"""
Set the values to `strides` and `ksize` such that
the output shape after pooling is (1, 2, 2, 1).
"""
import tensorflow as tf
import numpy as np

# `tf.nn.max_pool` requires the input be 4D (batch_size, height, width, depth)
# (1, 4, 4, 1)
x = np.array([
    [0, 1, 0.5, 10],
    [2, 2.5, 1, -8],
    [4, 0, 5, 6],
    [15, 1, 2, 3]], dtype=np.float32).reshape((1, 4, 4, 1))
X = tf.constant(x)

def maxpool(input):
    # TODO: Set the ksize (filter size) for each dimension (batch_size, height, width, depth)
    ksize = [1, 2, 2, 1]
    # TODO: Set the stride for each dimension (batch_size, height, width, depth)
    strides = [1, 2, 2, 1]
    # TODO: set the padding, either 'VALID' or 'SAME'.
    padding = 'SAME'
    # https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#max_pool
    return tf.nn.max_pool(input, ksize, strides, padding)
    
out = maxpool(X)

#### proposed solution
NOTE: there's more than 1 way to get the correct output shape. Your answer might differ from mine.  
def maxpool(input):
    ksize = [1, 2, 2, 1]  
    strides = [1, 2, 2, 1]  
    padding = 'VALID'  
    return tf.nn.max_pool(input, ksize, strides, padding)