In [1]:
import numpy as np

This notebook creates classes that define a layer, along with the functions that compute the forward and backward passes through it. These layers are connected later to form the neural network.

This is shown in the following image:

![neural network](neural_network.png)

In [1]:
class Layer:
    def __init__(self):
        self.input = None
        self.output = None

    # computes output y of a layer for a given input x
    def forward_pass(self, input):
        raise NotImplementedError

    # computes dE/dX for a given dE/dY (and updates the parameters)
    def backward_pass(self, output_gradient, learning_rate):
        raise NotImplementedError


Each layer of the network is made up of a dense layer followed by an activation layer. These are created separately to make it easier to understand how a layer works. The activation layer could be skipped by implementing the activation function directly inside the dense layer.

To create a dense layer, its forward and backward propagation functions must be defined.

Forward propagation simply uses Y = X·W + B to calculate the Y value of the layer (often called the Z value), where X is the input, W the weights and B the bias. This value is later passed through an activation function to find the A value.
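
As a quick shape check, here is a minimal sketch of the forward step for a single sample. It assumes the input is a row vector of shape (1, input_size); the sizes and values are purely illustrative and are not taken from the notebook's data.

```python
import numpy as np

# Illustrative sizes only: 3 input neurons, 2 output neurons.
x = np.array([[1.0, 2.0, 3.0]])   # one sample as a row vector, shape (1, 3)
w = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])         # weights, shape (3, 2)
b = np.array([[0.01, 0.02]])       # bias, shape (1, 2)

# same expression as the dense layer's forward pass
y = np.dot(x, w) + b               # shape (1, 2)
print(y.shape)
print(y)
```

Each column of `w` holds the weights feeding one output neuron, so the result has one value per output neuron.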

The following equations are used to find the derivatives needed for backward propagation:

![DERIV_EQ](deriv_eq.png)

where E is the error.
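
For reference, these derivatives can be written out in the row-vector convention used by the code in this notebook, with Y = XW + B. This is an assumption about what the image shows; the expressions below are the standard results and match the `backward_pass` code of the dense layer.

```latex
\frac{\partial E}{\partial W} = X^{T}\,\frac{\partial E}{\partial Y},
\qquad
\frac{\partial E}{\partial B} = \frac{\partial E}{\partial Y},
\qquad
\frac{\partial E}{\partial X} = \frac{\partial E}{\partial Y}\,W^{T}
```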

In [None]:
# create the dense layer where:
# input_size = number of input neurons
# output_size = number of output neurons
class Dense(Layer):
    def __init__(self, input_size, output_size):
        # initialise self.input and self.output from the base class
        super().__init__()
        self.weights = np.random.randn(input_size, output_size)
        self.bias = np.random.randn(1, output_size)
    
    # returns the output for a given input
    def forward_pass(self, input):
        self.input = input
        return np.dot(self.input, self.weights) + self.bias
    
    # computes dE/dW and dE/dB for a given output_gradient (dE/dY)
    # returns input_error = dE/dX
    def backward_pass(self, output_gradient, learning_rate):
        # derivative of the error with respect to the input (dE/dX),
        # computed before the weights are updated
        input_error = np.dot(output_gradient, self.weights.T)
        # derivative of the error with respect to the weights (dE/dW)
        weights_gradient = np.dot(self.input.T, output_gradient)
        # updating parameters; dE/dB is equal to output_gradient
        self.weights -= np.multiply(weights_gradient, learning_rate)
        self.bias -= np.multiply(output_gradient, learning_rate)
        # return the derivative of the error with respect to the input
        return input_error
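
One way to gain confidence in the backward pass is a numerical gradient check. The sketch below is not part of the original notebook: it assumes a squared-error function chosen only for this check and uses illustrative sizes. It compares the analytic dE/dW expression used in `Dense.backward_pass` with a central-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: one sample with 3 features, 2 output neurons.
x = rng.standard_normal((1, 3))
w = rng.standard_normal((3, 2))
b = rng.standard_normal((1, 2))
target = rng.standard_normal((1, 2))

def error(w_):
    # squared error of the dense output, used only for this check
    y_ = np.dot(x, w_) + b
    return np.sum((y_ - target) ** 2)

# analytic gradients, using the same expressions as Dense.backward_pass
y = np.dot(x, w) + b
output_gradient = 2 * (y - target)                # dE/dY
weights_gradient = np.dot(x.T, output_gradient)   # dE/dW
input_error = np.dot(output_gradient, w.T)        # dE/dX

# numerical dE/dW by central differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        step = np.zeros_like(w)
        step[i, j] = eps
        numeric[i, j] = (error(w + step) - error(w - step)) / (2 * eps)

print(np.allclose(weights_gradient, numeric, atol=1e-5))
```

If the two gradients disagree, the usual culprits are a missing transpose or an input that is not a 2-D row vector.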

Next, an activation layer must be created after the dense layer. Creating these separately makes the model simpler to understand. In the activation layer, each input value goes to a single, corresponding neuron (instead of each input value going to every neuron, as in a dense layer). This is simply because the same function is applied to every Y value coming from the dense layer.

The forward_pass function gives the output of the whole layer: the activation value.

We want to find the derivative of the error with respect to the input so that it can be passed back to the dense layer, where the error is minimised by changing the w and b parameters. The following equation shows this derivative:
![activ_backpass](activ_backpass.png)
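
As an illustration of the activation backward pass, here is a minimal sketch of the element-wise rule dE/dX = dE/dY * f'(X). It uses tanh purely as an example; the functions actually defined in activation_funcs.ipynb may differ, and the input values are made up.

```python
import numpy as np

# tanh is used here only as an illustration of an activation function
def tanh(x):
    return np.tanh(x)

def tanh_deriv(x):
    return 1 - np.tanh(x) ** 2

x = np.array([[0.0, 1.0, -2.0]])                 # input to the activation layer
output_gradient = np.array([[0.5, -1.0, 2.0]])   # dE/dY from the next layer

# element-wise product: dE/dX = dE/dY * f'(X)
input_error = np.multiply(output_gradient, tanh_deriv(x))
print(input_error)
```

Because the product is element-wise, the returned gradient has the same shape as the layer's input.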

In [None]:
class Activation(Layer):
    def __init__(self, activation, activation_deriv):
        self.activation = activation
        self.activation_deriv = activation_deriv
    
    # returns the activated value of the layer
    def forward_pass(self, input):
        self.input = input
        return self.activation(self.input)
    
    # returns the input_error (dE/dX) for a given output_error (dE/dY)
    # no use of learning rate since no parameters are being updated
    def backward_pass(self, output_gradient, learning_rate):
        return np.multiply(output_gradient, self.activation_deriv(self.input))

Finally, the neural network is created with a Network class, which lets the user build a network of any size using the activation functions from activation_funcs.ipynb and the mean squared error function from error_funcs.ipynb.
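
For readers without error_funcs.ipynb to hand, the sketch below shows what a mean squared error function and its derivative could look like. The names `mse` and `mse_deriv` and their `(y_true, y_pred)` argument order are assumptions made for illustration; the real notebook may define them differently.

```python
import numpy as np

# hypothetical stand-ins for the functions in error_funcs.ipynb
def mse(y_true, y_pred):
    # mean of the squared differences
    return np.mean(np.power(y_true - y_pred, 2))

def mse_deriv(y_true, y_pred):
    # derivative of the mean squared error with respect to y_pred
    return 2 * (y_pred - y_true) / np.size(y_true)

y_true = np.array([[1.0, 0.0]])
y_pred = np.array([[0.5, 0.5]])
print(mse(y_true, y_pred))        # 0.25
print(mse_deriv(y_true, y_pred))
```

A pair like this would be registered on the network with `use(loss, loss_deriv)`.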

In [None]:
class Network:
    def __init__(self):
        self.layers = []
        self.loss = None
        self.loss_deriv = None
    
    # function to allow user to add a layer to the network
    def add(self, layer):
        self.layers.append(layer)
    
    # function to set loss to use
    def use(self, loss, loss_deriv):
        self.loss = loss
        self.loss_deriv = loss_deriv
    
    # function to predict an output for a given input
    def predict(self, input_data):
        # feature dimension first: each column is one sample
        features, samples = input_data.shape
        result = []

        # run network over all the input samples
        for i in range(samples):
            # forward propagation; the sample is reshaped to a row vector (1, features)
            output = input_data[:, i].reshape(1, -1)
            for layer in self.layers:
                output = layer.forward_pass(output)
            # store only the final layer's output for this sample
            result.append(output)
        
        # return the result list that contains the output for each input
        return result

    # train the network using the training data set
    def fit(self, x_train, y_train, epochs, learning_rate):
        features, samples = x_train.shape
    
        # loop over all the epochs
        for i in range(epochs):
            error = 0
            for j in range(samples):
                # forward propagation; the sample is reshaped to a row vector (1, features)
                output = x_train[:, j].reshape(1, -1)
                for layer in self.layers:
                    output = layer.forward_pass(output)
                
                # compute loss (for display purposes only)
                # y_train[j] is taken to be the target for sample j
                error += self.loss(y_train[j], output)

                # backward propagation to update the parameters w and b
                # a separate variable keeps the accumulated loss intact
                gradient = self.loss_deriv(y_train[j], output)
                for layer in reversed(self.layers):
                    gradient = layer.backward_pass(gradient, learning_rate)
            
            # calculate the average error over all of the samples
            error /= samples
            print(f'epoch {i + 1}/{epochs}, error = {error}')


This network assumes that the input data is a matrix where each column is an individual data point and each row is a different feature for the data points.

An example of this can be seen in data.ipynb
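
The layout described above can be illustrated with a small sketch. The array below is made-up data, not the contents of data.ipynb: it has 2 features (rows) and 4 data points (columns).

```python
import numpy as np

# Illustrative data: 2 features (rows), 4 data points (columns).
x_train = np.array([[0, 0, 1, 1],
                    [0, 1, 0, 1]])

features, samples = x_train.shape
print(features, samples)

# each column is one data point; reshaping gives a row vector of shape (1, features)
for j in range(samples):
    sample = x_train[:, j].reshape(1, -1)
    print(sample)
```

Reshaping each column to a row vector lets it be multiplied by a weight matrix of shape (features, output_size).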