# COMP4702/7703 Prac 7: Convolutional Neural Networks (CNNs)

If you have any interest in machine learning and AI (and if not, why are you doing this course?!), you will have heard of convolutional neural networks. They sit high on the hype curve because they have been very successful at, among other things, image processing. Here we will construct a small CNN to classify the digits in the MNIST dataset.

The network follows this structure:

[conv -> max_pool]*N -> FC -> FC

where:
* conv is a convolutional layer that applies a kernel to the previous layer
* max_pool is a pooling layer
* N is the number of conv-pool repetitions
* FC is a fully connected layer
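
To make the conv and max_pool building blocks concrete, here is a minimal pure-Python sketch of what each one computes on a tiny square input. The function names `conv2d_valid` and `max_pool` are illustrative only and are not part of the prac's support code. The sketch assumes a single channel, no padding, and no bias or activation. As in most deep learning libraries, "convolution" here slides the kernel over the image without flipping it.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide a square kernel over a square image (no padding) and
    sum the elementwise products at each position."""
    f = len(kernel)
    out = (len(image) - f) // stride + 1
    return [
        [
            sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(out)
        ]
        for i in range(out)
    ]


def max_pool(image, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out = (len(image) - size) // stride + 1
    return [
        [
            max(image[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out)
        ]
        for i in range(out)
    ]


# A 4x4 toy "image" holding the values 0..15, and a 3x3 kernel of ones
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

conv_out = conv2d_valid(image, kernel)
pool_out = max_pool(image)
print(conv_out)  # [[45, 54], [81, 90]]
print(pool_out)  # [[5, 7], [13, 15]]
```

A real conv layer applies many such kernels across all input channels and learns their values during training; the pooling layer has nothing to learn.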

Again, the activation function is ReLU by default but feel free to change it! The number of filters in each convolutional layer is given by the layer number multiplied by the 'numFilters' variable defined in the code below; i.e. with 'numFilters' = 32, the first conv layer will have 32 filters, the second 64, the third 96, and so on.
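
The filter rule above can be sketched in a couple of lines. The helper name `filters_per_layer` is hypothetical and is not taken from the prac code; it only restates the "layer number times numFilters" rule.

```python
def filters_per_layer(num_layers, num_filters):
    """Number of filters in conv layer i is i * num_filters (layers count from 1)."""
    return [i * num_filters for i in range(1, num_layers + 1)]


counts = filters_per_layer(3, 32)
print(counts)  # [32, 64, 96]
```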

In [1]:
from prac7ConvMLPModel import *
from SupportCode.Helpers import *


In [2]:
convTop = {}
convTop['convPoolLayers'] = 1 # N
# Convolutional layer parameters
convTop['filterSize'] = 3 # F
convTop['convStride'] = 1 # S
# This is equivalent to the number of features
convTop['numFilters'] = 32 # K
# Pooling parameters
convTop['poolK'] = 2 # F
convTop['poolStride'] = 2 # S
# Size of the first FC layer (Any ideas why we don't need to specify the size of the output layer? ;))
convTop['FCLayerSize'] = 512

In [3]:
# Optimisation dictionary for Gradient Descent
optDicGD = {}
optDicGD["optMethod"] = "GradientDescent"
optDicGD["learning_rate"] = 0.0001

activationFunction = tf.nn.relu

### Set up data

In [4]:
mnist = tf.keras.datasets.mnist.load_data()
[x_train, y_train], [x_test, y_test] = mnist

# Flatten input arrays from 28x28 to 784 for x_train and x_test
x_train = x_train.reshape(len(x_train), 784)
x_test = x_test.reshape(len(x_test), 784)

# Concatenate x_train and y_train in order to randomly shuffle whole dataset (VERY IMPORTANT - used for K-Fold CV)
y_train = y_train.reshape(len(y_train), 1)
train = np.concatenate((x_train, y_train), axis=1)
np.random.shuffle(train)
# Resplit x_train and y_train
x_train = train[:, :-1]
y_train = train[:, -1]

# One hot encoding for y_train to be able to train neural net
shape = (y_train.size, y_train.max() + 1)
one_hot = np.zeros(shape)
rows = np.arange(y_train.size)
one_hot[rows, y_train] = 1
y_train = one_hot

# One hot encoding for y_test so it matches the format of y_train
shape = (y_test.size, y_test.max() + 1)
one_hot = np.zeros(shape)
rows = np.arange(y_test.size)
one_hot[rows, y_test] = 1
y_test = one_hot
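
The shuffle and one-hot steps can also be written more compactly. The following is a sketch on a tiny stand-in dataset, not on MNIST itself; the fixed seed and the toy arrays are assumptions made purely for illustration. It shuffles features and labels with a shared permutation (so no concatenation is needed) and builds the one-hot matrix by indexing into an identity matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only

# Toy stand-in for MNIST: 5 samples with 4 features each, labels in 0..2
x = np.arange(20).reshape(5, 4)
y = np.array([0, 2, 1, 2, 0])

# Apply the same permutation to both arrays so each sample keeps its label
perm = rng.permutation(len(x))
x_shuffled, y_shuffled = x[perm], y[perm]

# Row k of the identity matrix is the one-hot vector for class k
num_classes = y.max() + 1
y_one_hot = np.eye(num_classes)[y_shuffled]

print(y_one_hot.shape)  # (5, 3)
```

Taking `argmax` along axis 1 of `y_one_hot` recovers the integer labels, which is a handy sanity check after encoding.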

In [6]:
data = [x_train, y_train, x_test, y_test]
prac7ConvMLPModel(data, model='convNet', convTop=convTop, optimiser=optDicGD, act=activationFunction, max_steps=100)

Accuracy at step 0: 0.1134
Accuracy at step 10: 0.308
Accuracy at step 20: 0.4579
('Adding run metadata for', 24)
Accuracy at step 30: 0.5247
Accuracy at step 40: 0.5686
('Adding run metadata for', 49)
Accuracy at step 50: 0.5884
Accuracy at step 60: 0.6214
Accuracy at step 70: 0.6529
('Adding run metadata for', 74)
Accuracy at step 80: 0.6491
Accuracy at step 90: 0.6661
('Adding run metadata for', 99)
('Accuracy on test set: ', 0.6778)




In [7]:
# openTensorBoardAtIndex("convNet", "GradientDescent", 0)

Have a look at the images of the convolutional weights in TensorBoard. Hopefully you find them interesting. [Here](http://cs231n.github.io/convolutional-networks/) is an excellent resource for more information on CNNs. Have a read of this material before continuing.

So you've implemented a CNN and that's pretty cool, but did you understand it?

## Q2

Using the link to the CS231n CNN theory given above, calculate the volume for the weight matrices of each layer for a convolutional network that has two conv-pool layers:

\[conv -> max_pool\] (Layer 1) -> \[conv -> max_pool\] (Layer 2) -> FC -> FC

Assume: 
* The input is 28x28.
* Parameter sharing.
* The number of filters in a particular layer is given by the expression: i*32, where i is the layer number.

For the conv layer, assume:
* A padding of 1 (P=1).
* A stride length of 1.
* A spatial extent of 3.

For the pooling layer, assume:
* A stride length of 1
* A spatial extent of 2

Assume that the first FC layer has 1024 neurons and the second has the number of classes in MNIST. 

Once you have done this calculate the number of parameters in this network.

### Hint:
Don't forget the biases!
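
As a rough guide, the CS231n formulas can be wrapped in three small helpers. The function names below are illustrative, and the worked numbers use a deliberately different, hypothetical network (a 32x32x3 input with 16 filters of size 5, padding 2, followed by a 2x2 pool with stride 2) rather than the Q2 network. The sketch assumes square inputs and parameter sharing.

```python
def output_size(w, f, s, p=0):
    """Spatial output size (W - F + 2P)/S + 1; pooling uses P = 0."""
    return (w - f + 2 * p) // s + 1


def conv_params(f, depth_in, k):
    """Each of the K filters has F*F*depth_in weights plus one bias."""
    return (f * f * depth_in + 1) * k


def fc_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output neuron."""
    return n_in * n_out + n_out


# Hypothetical example network (not the Q2 network)
conv_w = output_size(32, 5, 1, p=2)   # padding 2 preserves the 32x32 size
pool_w = output_size(conv_w, 2, 2)    # 2x2 pooling with stride 2 halves it

print(conv_w, pool_w)                      # 32 16
print(conv_params(5, 3, 16))               # 1216
print(fc_params(pool_w * pool_w * 16, 10)) # 40970
```

Pooling layers contribute no parameters, so only the conv and FC helpers are needed when summing up a network.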

### Set up variables (formulas used from CS231n CNN theory, also used this [link](https://towardsdatascience.com/understanding-and-calculating-the-number-of-parameters-in-convolution-neural-networks-cnns-fc88790d530d))

In [8]:
K = 1*32
P = 1
S = 1
F_conv = 3
F_pool = 2

### Volume and number of parameters for each layer (total of 4 layers)

In [9]:
# Input layer
num_input = 0
print(f"Number of parameters in input layer: {num_input}")

Number of parameters in input layer: 0


In [10]:
# First conv layer
W1 = 28
H1 = 28
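D1 = 3  # assumes 3 input channels; grayscale MNIST has depth 1, which would give (3*3*1 + 1)*32 = 320 parameters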

W2 = (W1 - F_conv + 2*P)/S + 1
H2 = W2
D2 = K
print(f"Volume for first conv layer: {W2*H2* D2}")

num_conv1 = ((F_conv*F_conv*D1) + 1)*K
print(f"Number of parameters in first conv layer: {num_conv1}")

Volume for first conv layer: 25088.0
Number of parameters in first conv layer: 896


In [11]:
# First pool layer
W1 = W2
H1 = H2
D1 = D2

W2 = (W1 - F_pool)/S + 1
H2 = W2
D2 = D1
print(f"Volume for first pool layer: {W2*H2* D2}")

num_pool1 = 0
print(f"Number of parameters in first pool layer: {num_pool1}")

Volume for first pool layer: 23328.0
Number of parameters in first pool layer: 0


In [12]:
# Second conv layer
W1 = W2
H1 = H2
D1 = D2

W2 = (W1 - F_conv + 2*P)/S + 1
H2 = W2
K = 2*32
D2 = K
print(f"Volume for second conv layer: {W2*H2* D2}")

num_conv2 = ((F_conv*F_conv*D1) + 1)*K
print(f"Number of parameters in second conv layer: {num_conv2}")

Volume for second conv layer: 46656.0
Number of parameters in second conv layer: 18496


In [13]:
# Second pool layer
W1 = W2
H1 = H2
D1 = D2

W2 = (W1 - F_pool)/S + 1
H2 = W2
D2 = D1
print(f"Volume for second pool layer: {W2*H2* D2}")

num_pool2 = 0
print(f"Number of parameters in second pool layer: {num_pool2}")

Volume for second pool layer: 43264.0
Number of parameters in second pool layer: 0


In [14]:
# First fully connected layer
prev_layer = W2*H2*D2
curr_layer = 1024
print(f"Volume for first fully connected layer: {curr_layer}")


num_fc1 = curr_layer*prev_layer + 1*curr_layer
print(f"Number of parameters in first fully connected layer: {num_fc1}")

Volume for first fully connected layer: 1024
Number of parameters in first fully connected layer: 44303360.0


In [15]:
# Second fully connected (output) layer
prev_layer = curr_layer
curr_layer = 10
print(f"Volume for second fully connected layer: {curr_layer}")


num_fc2 = curr_layer*prev_layer + 1*curr_layer
print(f"Number of parameters in second fully connected layer: {num_fc2}")

Volume for second fully connected layer: 10
Number of parameters in second fully connected layer: 10250


### Total number of parameters

In [16]:
total = num_input + num_conv1 + num_pool1 + num_conv2 + num_pool2 + num_fc1 + num_fc2
print(f"Total parameters: {total}")

Total parameters: 44333002.0


## Q3
Calculate the number of parameters in an MLP that has two hidden layers, with 1000 neurons in the first hidden layer, and 300 neurons in the second hidden layer.

Compare this number to the number of parameters that you calculated for the CNN in Q2.
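
A compact way to count MLP parameters is to sum weights plus biases over each pair of consecutive layer sizes. The helper name `mlp_params` is hypothetical. The input size is an assumption that changes the answer: flattened grayscale MNIST gives 28*28 = 784 inputs, whereas treating the input as three-channel gives 28*28*3 = 2352, so the sketch shows both.

```python
def mlp_params(layer_sizes):
    """Sum of (n_in * n_out weights + n_out biases) over consecutive layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


grayscale_total = mlp_params([28 * 28, 1000, 300, 10])
three_channel_total = mlp_params([28 * 28 * 3, 1000, 300, 10])

print(grayscale_total)      # 1088310
print(three_channel_total)  # 2656310
```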

In [17]:
# First hidden layer
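prev_layer = 28*28*3  # assumes 3 input channels; flattened grayscale MNIST has 28*28 = 784 inputs, which would give 785000 parameters here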
curr_layer = 1000

num_fc1 = curr_layer*prev_layer + 1*curr_layer
print(f"Number of parameters in first hidden layer: {num_fc1}")

Number of parameters in first hidden layer: 2353000


In [18]:
# Second hidden layer
prev_layer = curr_layer
curr_layer = 300

num_fc2 = curr_layer*prev_layer + 1*curr_layer
print(f"Number of parameters in second hidden layer: {num_fc2}")

Number of parameters in second hidden layer: 300300


In [19]:
# Final output layer
prev_layer = curr_layer
curr_layer = 10

num_fc3 = curr_layer*prev_layer + 1*curr_layer
print(f"Number of parameters in output layer: {num_fc3}")

Number of parameters in output layer: 3010


In [20]:
total = num_fc1 + num_fc2 + num_fc3
print(f"Total parameters: {total}")

Total parameters: 2656310


## Q4
Compare the performance of the MLP and the CNN that you created. 

### Instructions:

* Use a **table** to display your results and hyper-parameter choices.
* Discuss the hyper-parameter selection of your CNN in **at most** 150 words.
* Discuss the difference between the MLP and CNN in **at most** 100 words.

## Fun for Everyone 

So CNNs are super cool. They can make artwork like this:

![inceptionism-neural-network-deep-dream-art-42__605.jpg](attachment:inceptionism-neural-network-deep-dream-art-42__605.jpg)

Or they can make cool pictures like this:

![image.png](attachment:image.png)

for maximum doge effect!

Included in the assignment zip is the Deep Dream notebook from the TensorFlow tutorials (the second picture was generated with Deep Dream). If you'd like, feel free to go through the notebook and do some dreaming :).