# About
In this notebook, we will:
- Test our model from last time on unseen data
- Investigate whether a convolutional neural network can outperform the original neural network composed of Dense layers.

# Imports

In [1]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import random

from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import Adam

In [2]:
# set seeds for reproducible results
random.seed(1234)
np.random.seed(1234)  # np.random.randint is used below to sample indices
tf.random.set_seed(1234)

## Data

In [3]:
X = np.load('data/X.npy')
y = np.load('data/y.npy')

In [4]:
print(f'X has a shape of {X.shape}')
print(f'y has a shape of {y.shape}')

X has a shape of (5000, 400)
y has a shape of (5000, 1)


## Load model
We will load the model we trained in the last notebook.

In [5]:
overfitted_model = load_model('models/Denselayer.keras')

In [6]:
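m, n = X.shape
# sample 400 row indices at random (with replacement)
indices = np.random.randint(0, m, 400)
# predict the whole batch in one call, then take the most likely class per row
predictions = np.argmax(overfitted_model.predict(X[indices], verbose=0), axis=1)
num_correct_predictions = np.sum(predictions == y[indices].reshape(-1,))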

print(f'{num_correct_predictions} out of {len(indices)} digits correctly predicted \n{num_correct_predictions/len(indices)*100:.1f}% success rate')

389 out of 400 digits correctly predicted 
97.2% success rate


This model performs extremely well at predicting target values for data it was trained on, which means it may have "overfitted" the data. <br>Let's see what happens when we train the same model as last time but test it on a subset of data it has not seen before.

## Retrain Model
We will load the same model from the last notebook, saved before it was trained (with randomly initialised weights).

In [8]:
model = load_model('models/Denselayer_beforetraining.keras')

In [9]:
# you will need to run pip install scikit-learn
from sklearn.model_selection import train_test_split

In [10]:
# do an 80|20 training|test split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
print(f'X_train has shape {X_train.shape}')
print(f'y_train has shape {y_train.shape}')
print(f'X_test has shape {X_test.shape}')
print(f'y_test has shape {y_test.shape}')

X_train has shape (4000, 400)
y_train has shape (4000, 1)
X_test has shape (1000, 400)
y_test has shape (1000, 1)


We have split our data into two subsets: training and test data 
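
To make explicit what `train_test_split` does, here is a minimal NumPy-only sketch of a shuffled 80|20 split. The helper name `manual_split` and the toy arrays are hypothetical and exist only for illustration; scikit-learn's real implementation covers more cases (stratification, several arrays at once, and so on).

```python
import numpy as np

def manual_split(X, y, test_size=0.2, seed=1234):
    """Shuffle the row indices, then set aside a `test_size` fraction as test data."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(X.shape[0])
    n_test = int(round(X.shape[0] * test_size))
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# toy stand-ins with the same shapes as X and y in this notebook
X_demo = np.zeros((5000, 400))
y_demo = np.arange(5000).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = manual_split(X_demo, y_demo)
print(X_tr.shape, y_tr.shape, X_te.shape, y_te.shape)
```

Every row ends up in exactly one of the two subsets, so the test rows are never seen during training. In the notebook itself, passing `random_state=1234` to `train_test_split` would make the split repeatable between runs.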

In [11]:
# compile the model with a loss function and an optimizer
model.compile(
    loss = SparseCategoricalCrossentropy(from_logits = True),
    optimizer = Adam(0.001)
)

# train model
num_epochs = 20
history = model.fit(X_train, y_train,epochs = num_epochs)

Epoch 1/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 2.0474
Epoch 2/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: 0.9640
Epoch 3/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.5657
Epoch 4/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.4129
Epoch 5/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: 0.3483
Epoch 6/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.3103
Epoch 7/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: 0.2828
Epoch 8/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.2609
Epoch 9/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.2430
Epoch 10/20
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - lo

In [12]:
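m, n = X_test.shape
# sample 400 test-set row indices at random (with replacement)
indices = np.random.randint(0, m, 400)
# predict the whole batch in one call, then take the most likely class per row
predictions = np.argmax(model.predict(X_test[indices], verbose=0), axis=1)
num_correct_predictions = np.sum(predictions == y_test[indices].reshape(-1,))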

print(f'{num_correct_predictions} out of {len(indices)} digits correctly predicted \n{num_correct_predictions/len(indices)*100:.1f}% success rate')

370 out of 400 digits correctly predicted 
92.5% success rate


Still convincing, but at 92.5% on unseen data it is not as good as the 97.2% the overfitted model scored on data it had already seen.
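
The accuracy check used above can be sketched on its own, without a model, to show why the labels are flattened before the comparison. The function `accuracy_from_logits` and the toy logits are hypothetical; in the notebook the logits would come from `model.predict(X_test, verbose=0)`.

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose largest logit matches the true label."""
    predictions = np.argmax(logits, axis=1)      # shape (n,)
    labels = np.asarray(labels).reshape(-1)      # (n, 1) -> (n,)
    return float(np.mean(predictions == labels))

# toy example: 4 samples, 3 classes
toy_logits = np.array([
    [2.0, 0.1, -1.0],   # predicts class 0
    [0.3, 1.5,  0.2],   # predicts class 1
    [0.0, 0.2,  3.1],   # predicts class 2
    [1.2, 0.9,  0.1],   # predicts class 0
])
toy_labels = np.array([[0], [1], [2], [1]])      # same (n, 1) layout as y

print(accuracy_from_logits(toy_logits, toy_labels))   # 3 of 4 correct -> 0.75

# without the reshape, (4,) == (4, 1) broadcasts to a (4, 4) grid of comparisons
print((np.argmax(toy_logits, axis=1) == toy_labels).shape)   # (4, 4)
```

Running this function once on the training data and once on the test data gives the two numbers whose gap indicates overfitting.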

# Convolutional Neural Network
A convolutional neural network processes its inputs differently from the neural network we built in the last notebook. I found that this article by Axel Thevenot explains it with useful animations: https://towardsdatascience.com/conv2d-to-finally-understand-what-happens-in-the-forward-pass-1bbaafb0b148
![cnn](media/cnn.gif) <br>
Let's break it down:
- On the left, we have our 9×9 image, a total of 81 pixels
- The 3×3 grid in the middle represents our neuron in the convolutional neural network (CNN). Neurons in a CNN are often referred to as kernels
- On the right is the kernel's output

Still with me? You're doing great! Instead of consuming the entire input image in one go, as the neurons in our Dense layers did, the kernel slides over the image. Let's see what the kernel does in step 1. <br>
## Kernel Calculation
![cnn_step1](media/cnn_step1.jpg)<br><br>
Those 9 pixels on the left serve as the input to our kernel. Each pixel is represented by a number between 0 and 255 (if this is new to you, look up greyscale values).<br>
The kernel itself has 9 weights because 3×3=9.<br><br>
![kernel](media/kernel.jpg)<br><br>
If we perform element-wise multiplication and sum the results, we get a single number as output.<br><br>
![kernel](media/kernel_calculation.jpg) <br><br>
This number is the top-left pixel of the output. The same calculation is repeated at every position as the kernel slides across the image.
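
The sliding calculation can be sketched in plain NumPy. This is a minimal illustration, not how Keras implements it: the function name `slide_kernel`, the random image, and the kernel weights are all made up, and it assumes the kernel moves one pixel at a time with no padding and no bias term, which is what turns a 9×9 image into a 7×7 output.

```python
import numpy as np

def slide_kernel(image, kernel):
    """Slide `kernel` over `image` one pixel at a time (no padding),
    multiplying element-wise and summing at each position."""
    img_h, img_w = image.shape
    k_h, k_w = kernel.shape
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + k_h, col:col + k_w]
            output[row, col] = np.sum(patch * kernel)
    return output

# made-up values: a 9x9 greyscale image and a 3x3 kernel
rng = np.random.default_rng(1234)
image = rng.integers(0, 256, size=(9, 9)).astype(float)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

output = slide_kernel(image, kernel)
print(output.shape)   # (7, 7)
```

`output[0, 0]` is the top-left value from the worked example: the element-wise product of the first 3×3 patch and the kernel, summed. The output has 9 − 3 + 1 = 7 rows and columns because that is how many positions the kernel can occupy along each axis.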

## Activation Function
Let's assume the output of our kernel is the following 7×7 matrix <br><br>
![kernel_output_before_activation](media/kernel_output_before_activation.jpg) <br><br>
We are not done yet: each kernel has an activation function. Here we will use the Rectified Linear Unit (ReLU), which turns every negative value into zero and leaves every other value unchanged. <br> <br>
![kernel_output_after_activation](media/kernel_output_after_activation.jpg) <br><br>
And at long last, we have made it. This 2D matrix is the output of one convolutional neuron.
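
As a quick sketch, ReLU is a one-liner in NumPy. The function name `relu` and the small matrix below are invented for illustration and are not the values shown in the images.

```python
import numpy as np

def relu(x):
    """Replace negative entries with zero; leave everything else unchanged."""
    return np.maximum(x, 0)

# made-up kernel output containing a mix of negative and positive values
before_activation = np.array([[-3.0, 2.0],
                              [ 0.5, -0.1]])
after_activation = relu(before_activation)
print(after_activation)
```

The output keeps the shape of the input: only the two negative entries change, and both become zero.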