In [2]:
import numpy as np
from nn_scripts.layer import Layer
from scipy import signal

# Convolutional Neural Networks:

Created by following this video: https://www.youtube.com/watch?v=Lakz2MoHy6o 

Before looking at a convolutional neural network, we first need to look at convolutions themselves.

### Convolutions

The most intuitive way to think about a convolution is to consider an input object, say a 3 x 3 matrix with some random values, which we'll call $I$. Then imagine a smaller 2 x 2 matrix whose values are all 0.25, which we'll call $K$. If you slide $K$ over the entirety of $I$, multiply the corresponding values of the two matrices at each position, and add the products to get a new value, you get the average of each 2 x 2 square in $I$. This is essentially the idea of a convolution.
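The sliding-average idea can be checked in a few lines. This is a minimal sketch, assuming an arbitrary 3 x 3 input (the values of `I` below are made up for illustration) and using `scipy.signal.correlate2d` in `"valid"` mode so the kernel only visits positions where it fits entirely inside the input.

```python
import numpy as np
from scipy import signal

# Hypothetical 3 x 3 input; the values are arbitrary.
I = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# 2 x 2 kernel whose entries are all 0.25, so each weighted sum is an average.
K = np.full((2, 2), 0.25)

# Slide K over I; "valid" keeps only positions where K fits entirely inside I.
averages = signal.correlate2d(I, K, mode="valid")
print(averages)

# Compare one entry against the plain mean of the top-left 2 x 2 window.
print(averages[0, 0], I[:2, :2].mean())
```

The result is a 2 x 2 matrix, one entry per 2 x 2 window of `I`.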

The only other things you need to note right now are: 

- This process is technically a cross-correlation, not a convolution. To make it a convolution, you must rotate $K$ by 180 degrees (this is not the same thing as a transpose).
- Also, $K$ stands for kernel, which is generally what these matrices will be called.

Writing $\star$ for cross-correlation and $*$ for convolution:

$$\text{conv}(I,K) = I \star \text{rotate180}(K)$$

<p style="text-align: center;">or</p>

$$\text{conv}(I,K) = I * K$$
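The relationship between the two operations can be verified numerically. The sketch below assumes small made-up matrices and uses `np.rot90(K, 2)` for the 180 degree rotation; it also shows that the rotation and the transpose give different matrices for an asymmetric kernel.

```python
import numpy as np
from scipy import signal

# Hypothetical input and an asymmetric kernel, chosen only for illustration.
I = np.arange(9, dtype=float).reshape(3, 3)
K = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Rotating by 180 degrees is two successive 90 degree rotations.
rotated = np.rot90(K, 2)

conv = signal.convolve2d(I, K, mode="valid")            # I * K
corr_rot = signal.correlate2d(I, rotated, mode="valid")  # I cross-correlated with rotate180(K)

print(np.allclose(conv, corr_rot))   # the two results agree
print(rotated)                       # rotated kernel
print(K.T)                           # transposed kernel, a different matrix
```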

### Convolutions in a NN:

Inputs, denoted $X$, will be 3-dimensional objects that are almost like several pages of a book. Each page, $X_n$, is a matrix (probably a square matrix), and the number of pages can be thought of as the depth, denoted $n$. The elements of the inputs represent activations for a given input. Each element is denoted by: $$x_{i,j}^n \in X_n$$

The kernels object, denoted $K$, will be 4-dimensional. A given kernel, indexed by $d$, consists of $n$ 2-d matrices $K_{dn}$, one to apply to the matrix at each depth of the input, and a layer can hold multiple kernels. The elements of the kernels represent weights that are applied to the inputs to calculate a weighted sum. Weights are denoted by: $$w_{i,j}^d \in K_{dn}$$

Then, for each kernel the layer contains, a separate bias matrix, denoted $B_d$, is added to the resulting weighted sums for each output. Elements are denoted: $$b_{i,j}^d \in B_d$$

Finally, the output matrix of the convolution layer for kernel $d$, denoted $Y_d$, is mathematically represented as:
$$Y_d = B_d + X_1 \star K_{d1} + \dots + X_n \star K_{dn}$$

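In compact form, where $\cdot\star$ combines the inputs and kernels like a matrix product with every multiplication replaced by a cross-correlation:

$$Y = B + X \;\cdot\!\star\; K$$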

In [15]:
class Convolutional(Layer):
    def __init__(self, input_shape, kernel_size, depth):
        input_depth, input_height, input_width = input_shape
        self.depth = depth
        self.input_shape = input_shape
        self.input_depth = input_depth
        self.output_shape = (depth, input_height - kernel_size + 1, input_width - kernel_size + 1)
        self.kernel_shape = (depth, input_depth, kernel_size, kernel_size)
        self.kernels = np.random.randn(*self.kernel_shape)
        self.biases = np.random.randn(*self.output_shape)

    def forward(self, input):
        self.input = input
        self.output = np.copy(self.biases)
        for i in range(self.depth):
            for j in range(self.input_depth):
                self.output[i] += signal.correlate2d(self.input[j], self.kernels[i, j], "valid") 
        return self.output
    
    def backward(self, output_gradient, learning_rate):
        # TODO: propagate backwards
        pass

    

### Backpropagation:

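With a given output and error function, we can calculate the derivative $\frac{\partial E}{\partial K_{ij}}$ with

$$\frac{\partial E}{\partial K_{ij}} = X_j \star \frac{\partial E}{\partial Y_i}$$

Using similar calculations, we get:
$$\frac{\partial E}{\partial B_{i}} = \frac{\partial E}{\partial Y_i}$$

$$\frac{\partial E}{\partial X_{j}} = \sum_{i=1}^{d} \frac{\partial E}{\partial Y_i} *_{full} K_{ij}$$

Here $d$ is the number of kernels in the layer, and $*_{full}$ is a full convolution (not a cross-correlation), which pads the output gradient so the result has the same shape as $X_j$.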
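One way to gain confidence in these gradient formulas is a finite-difference check. The sketch below is standalone and does not use the `Layer` class: the shapes, the random values, the helper names (`forward`, `error`, `numeric_grad`), and the toy error $E = \sum Y \odot G$ are all assumptions made for illustration. With that error, $\frac{\partial E}{\partial Y} = G$ exactly, so the analytic gradients can be compared directly with numerical ones. Note that the input gradient uses a full convolution (`convolve2d`), while the kernel gradient uses a valid cross-correlation (`correlate2d`).

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
n, d, size, k = 2, 3, 5, 3  # input depth, number of kernels, input size, kernel size
X = rng.standard_normal((n, size, size))
K = rng.standard_normal((d, n, k, k))
B = rng.standard_normal((d, size - k + 1, size - k + 1))
G = rng.standard_normal(B.shape)  # plays the role of dE/dY

def forward(X, K, B):
    # Y_i = B_i + sum_j X_j (valid cross-correlation) K_ij
    Y = B.copy()
    for i in range(d):
        for j in range(n):
            Y[i] += signal.correlate2d(X[j], K[i, j], mode="valid")
    return Y

def error(X, K, B):
    # Toy error, linear in Y, so that dE/dY equals G.
    return np.sum(forward(X, K, B) * G)

# Analytic gradients from the formulas in this section.
dK = np.zeros_like(K)
dX = np.zeros_like(X)
dB = G.copy()
for i in range(d):
    for j in range(n):
        dK[i, j] = signal.correlate2d(X[j], G[i], mode="valid")
        dX[j] += signal.convolve2d(G[i], K[i, j], mode="full")

def numeric_grad(param, eps=1e-6):
    # Central differences, perturbing one element of `param` in place at a time.
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = error(X, K, B)
        param[idx] = old - eps
        minus = error(X, K, B)
        param[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

print("kernels match:", np.allclose(numeric_grad(K), dK, atol=1e-5))
print("biases match: ", np.allclose(numeric_grad(B), dB, atol=1e-5))
print("inputs match: ", np.allclose(numeric_grad(X), dX, atol=1e-5))
```

Swapping `convolve2d` for `correlate2d` in the `dX` line makes the input check fail for a generic kernel, which is a quick way to see why the 180 degree rotation matters in the backward pass.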

In [3]:
class Convolutional(Layer):
    def __init__(self, input_shape, kernel_size, depth):
        input_depth, input_height, input_width = input_shape
        self.depth = depth
        self.input_shape = input_shape
        self.input_depth = input_depth
        self.output_shape = (depth, input_height - kernel_size + 1, input_width - kernel_size + 1)
        self.kernel_shape = (depth, input_depth, kernel_size, kernel_size)
        self.kernels = np.random.randn(*self.kernel_shape)
        self.biases = np.random.randn(*self.output_shape)

    def forward(self, input):
        self.input = input
        self.output = np.copy(self.biases)
        for i in range(self.depth):
            for j in range(self.input_depth):
                self.output[i] += signal.correlate2d(self.input[j], self.kernels[i, j], "valid") 
        return self.output
    
    def backward(self, output_gradient, learning_rate):
        kernels_gradient = np.zeros(self.kernel_shape)
        input_gradient = np.zeros(self.input_shape)

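        for i in range(self.depth):
            for j in range(self.input_depth):
                # dE/dK_ij: valid cross-correlation of the input with the output gradient
                kernels_gradient[i,j] = signal.correlate2d(self.input[j], output_gradient[i], "valid")
                # dE/dX_j: full convolution (not correlation) of the output gradient with the kernel
                input_gradient[j] += signal.convolve2d(output_gradient[i], self.kernels[i,j], "full")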
        
        self.kernels -= learning_rate * kernels_gradient
        self.biases -= learning_rate * output_gradient

        return input_gradient

### Reshape Layer:

At the end of a CNN, it is very common to have one or a few dense layers to make the final predictions. A dense layer expects its input as a 1-dimensional column, so we use a reshape layer to turn our 3-d output objects into that shape.

In [4]:
class Reshape(Layer):
    def __init__(self, input_shape, output_shape):
        self.input_shape = input_shape
        self.output_shape = output_shape
    
    def forward(self, input):
        return np.reshape(input, self.output_shape)

    def backward(self, output_gradient, learning_rate):
        return np.reshape(output_gradient, self.input_shape)
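As a quick illustration of what the two `np.reshape` calls do, the sketch below uses a made-up array with the (6, 26, 26) shape that the convolutional layer produces later in this notebook. It is standalone and does not depend on the `Layer` base class.

```python
import numpy as np

# Hypothetical stack of 6 feature maps, each 26 x 26; the values are arbitrary.
feature_maps = np.arange(6 * 26 * 26, dtype=float).reshape(6, 26, 26)

# Forward direction: flatten into a single column for a dense layer.
column = np.reshape(feature_maps, (6 * 26 * 26, 1))
print(column.shape)

# Backward direction: a gradient of the column shape goes back to the 3-d shape.
restored = np.reshape(column, (6, 26, 26))
print(restored.shape, np.array_equal(restored, feature_maps))
```

Because reshaping only changes how the same values are indexed, the backward pass needs no learning rate or parameters; it simply undoes the forward reshape.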

### Full Implementation:

Bringing it all together, first we'll get the MNIST dataset from the keras.datasets module. MNIST is the classic neural network dataset, comprised of 70,000 labeled 28 x 28 images of handwritten digits 0-9 (60,000 for training and 10,000 for testing). The general problem is to treat each pixel value as the activation of one input neuron and then train a model that can accurately identify the handwritten digits.

For my implementation I chose to limit it to only 1000 training examples and 20 test examples to avoid extreme training times. Preparing the data involves first normalizing the pixel values to the range 0-1, then one-hot encoding the y values so that the model treats the problem as classification rather than regression.

In [5]:
# ! pip install keras

from keras.datasets import mnist




In [6]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

def preprocess_data(x, y, limit):
    # reshape and normalize input data
    x = x.reshape(len(x), 1, 28, 28)
    x = x.astype("float32") / 255
    # encode output which is a number in range [0,9] into a vector of size 10
    # e.g. number 3 will become [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    y = np.eye(10)[y]
    y = y.reshape(y.shape[0], 10, 1)
    return x[:limit], y[:limit]

x_train, y_train = preprocess_data(x_train, y_train, 1000)
x_test, y_test = preprocess_data(x_test, y_test, 20)
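To make the two preprocessing steps concrete without downloading MNIST, here is a small sketch on made-up values: three hypothetical pixel intensities and three hypothetical labels. It mirrors the operations used in `preprocess_data` above.

```python
import numpy as np

# Hypothetical raw pixel values in the 0-255 range used by MNIST images.
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = pixels.astype("float32") / 255
print(scaled.min(), scaled.max())

# Hypothetical digit labels.
labels = np.array([3, 0, 9])

# Row k of the 10 x 10 identity matrix is the one-hot vector for digit k.
one_hot = np.eye(10)[labels]                   # shape (3, 10)
one_hot = one_hot.reshape(len(labels), 10, 1)  # one column vector per label
print(one_hot.shape)
print(one_hot[0].ravel())

# Taking the argmax along the class axis recovers the original digits.
recovered = one_hot.argmax(axis=1).ravel()
print(recovered)
```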


Then all I need to do is add my layers to a network object. For this I chose a single convolutional layer with 6 kernels of size 3 x 3 each. The convolutional features are then flattened and passed through 2 Dense layers to produce the final answer.
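The shapes passed to `Reshape` and the first `Dense` layer follow from the output-size rule used in `Convolutional.__init__`. The helper below, `conv_output_shape`, is a hypothetical name introduced only for this sketch; it repeats that arithmetic for a (1, 28, 28) input.

```python
def conv_output_shape(input_shape, kernel_size, depth):
    # A "valid" correlation shrinks each spatial side by kernel_size - 1.
    _, height, width = input_shape
    return (depth, height - kernel_size + 1, width - kernel_size + 1)

shape = conv_output_shape((1, 28, 28), kernel_size=3, depth=6)
flat_size = shape[0] * shape[1] * shape[2]
print(shape, flat_size)
```

Changing the kernel size or the number of kernels changes both numbers, so the `Reshape` and `Dense` arguments have to be updated together.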

In [24]:
from nn_scripts.network import *
from nn_scripts.dense import Dense
from nn_scripts.activations import Sigmoid, Tanh
from nn_scripts.loss import *

network = [
    Convolutional((1, 28, 28), 3, 6),
    Sigmoid(),
    Reshape((6, 26, 26), (6 * 26 * 26, 1)),
    Dense(6 * 26 * 26, 100),
    Tanh(),
    Dense(100, 10),
    Sigmoid()
]

train(network, binary_cross_entropy, binary_cross_entropy_prime, x_train, y_train, epochs=15)

1/15, error=0.7598488239180784
2/15, error=0.4424885477765716
3/15, error=0.35647294572113924
4/15, error=0.32698257429738714
5/15, error=0.3093853603016622
6/15, error=0.3014377847669694
7/15, error=0.29522329144716214
8/15, error=0.28959494013372933
9/15, error=0.2861134354123164
10/15, error=0.28169636721567515
11/15, error=0.2766333799671089
12/15, error=0.27195295217174664
13/15, error=0.2674354607087865
14/15, error=0.2641227867620214
15/15, error=0.2607030438181997


The final training error (binary cross-entropy, not MSE) is around 0.26, which is quite decent. To make sure the model wasn't overfitting, I put the test samples through and achieved an average error of about 0.35, which I am quite happy with.

In [28]:
pred_test = [predict(network,input) for input in x_test]


In [29]:
error = 0.0
for y,y_hat in zip(y_test,pred_test):
    answer_error = binary_cross_entropy(y, y_hat)
    print('Error:',answer_error)
    error += answer_error
print('Total Average Error:',error / len(y_test))

Error: 0.16214652349750322
Error: 0.3435895208736797
Error: 0.5039025401996056
Error: 0.6014211317786741
Error: 0.1522104957956137
Error: 0.39957298504754213
Error: 0.15222141042462417
Error: 0.3625097403488424
Error: 0.35964982726072775
Error: 0.26888848285858535
Error: 0.48775162311442866
Error: 0.7277393383319422
Error: 0.25594468139299553
Error: 0.16832549869195979
Error: 0.19912678954531687
Error: 0.7791157060627018
Error: 0.1542535710251805
Error: 0.36724582101680486
Error: 0.45873283939721965
Error: 0.15224237067692975
Total Average Error: 0.3528295448670439
