# Modular Neural Network

## The Layer Class

The `Layer` class is the parent class of every module in this modular neural network implementation, which is trained using stochastic gradient descent and backpropagation.  

Each layer acts as a function, with given input/output sizes when required. Each layer must include the following methods:  
* `forward`: calculates the output of the layer given input, $ x $.
* `backward`: given the derivative of the error with respect to the layer's output, $ \frac{\partial E}{\partial y} $, the backward function must calculate the derivative of the error with respect to the layer's parameters, $ \frac{\partial E}{\partial w} $, and return the derivative of the error with respect to the layer's input, $ \frac{\partial E}{\partial x} $.
* `update_params`: updates the layer's parameters based on the derivatives calculated in the `backward` method, according to the equation $ w' = w - \frac{\eta}{n}\frac{\partial E}{\partial w} $, where $ \eta $ is the learning rate and $ n $ is the mini batch size. Note that for layers which have no tunable parameters, the `update_params` method is left blank.

Example layers include the Linear and Sigmoid layers, as shown below.

In [None]:
class Layer:
    def __init__(self):
        self.input = None
        self.output = None
        
    def forward(self, x):
        pass
    
    def backward(self, nabla_out):
        pass
    
    def update_params(self, eta, n):
        pass

In [None]:
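import numpy as np

class Linear(Layer):
    def __init__(self, input_size, output_size):
        super().__init__()
        # Scale the initial weights by 1/sqrt(input_size)
        self.weights = np.random.randn(output_size, input_size) / np.sqrt(input_size)
        self.biases = np.random.randn(output_size, 1)
        # Gradient accumulators, summed over a mini batch and reset after each update
        self.nabla_w = 0
        self.nabla_b = 0
        
    def forward(self, x):
        # x is a column vector of shape (input_size, 1)
        self.input = x
        return np.dot(self.weights, self.input) + self.biases
    
    def backward(self, nabla_out):
        # Accumulate dE/dw and dE/db, then return dE/dx for the previous layer
        self.nabla_w += np.dot(nabla_out, self.input.T)
        self.nabla_b += nabla_out
        return np.dot(self.weights.T, nabla_out)
    
    def update_params(self, eta, n):
        # Gradient descent step averaged over the n accumulated data points
        self.weights -= eta * self.nabla_w / n
        self.biases -= eta * self.nabla_b / n
        self.nabla_w = 0
        self.nabla_b = 0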
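
The three quantities computed in `Linear.backward` follow from applying the chain rule to $ y = Wx + b $. As a short derivation sketch, writing $ W $ for `weights` and $ b $ for `biases`, and treating $ x $ and $ y $ as column vectors (notation assumed here to match the code above):

$$
\begin{aligned}
\frac{\partial E}{\partial W} &= \frac{\partial E}{\partial y}\, x^{T}, \\
\frac{\partial E}{\partial b} &= \frac{\partial E}{\partial y}, \\
\frac{\partial E}{\partial x} &= W^{T}\, \frac{\partial E}{\partial y}.
\end{aligned}
$$

These correspond line by line to `np.dot(nabla_out, self.input.T)`, `nabla_out`, and `np.dot(self.weights.T, nabla_out)`.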

In [None]:
class Sigmoid(Layer):
    def __init__(self):
        super().__init__()
        self.func = lambda x: 1 / (1 + np.exp(-x))
        self.func_prime = lambda x: self.func(x) * (1 - self.func(x))
        
    def forward(self, x):
        self.input = x
        return self.func(self.input)
    
    def backward(self, nabla_out):
        return np.multiply(nabla_out, self.func_prime(self.input))
    
    def update_params(self, eta, n):
        pass
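
Any other layer can be added by implementing the same three methods. As an illustration only, here is a minimal sketch of a parameter-free `ReLU` layer. The `ReLU` name and its presence in this project are assumptions, and the small `Layer` stand-in is repeated only so that the snippet runs on its own.

```python
import numpy as np


class Layer:
    """Stand-in for the base class above, repeated so this snippet is self-contained."""
    def forward(self, x):
        pass

    def backward(self, nabla_out):
        pass

    def update_params(self, eta, n):
        pass


class ReLU(Layer):
    """Hypothetical parameter-free layer: y = max(0, x), applied element-wise."""
    def forward(self, x):
        self.input = x
        return np.maximum(0, x)

    def backward(self, nabla_out):
        # dE/dx = dE/dy * dy/dx, where dy/dx is 1 for positive inputs and 0 otherwise
        return nabla_out * (self.input > 0)

    def update_params(self, eta, n):
        pass  # no tunable parameters


relu = ReLU()
x = np.array([[-1.0], [0.5], [2.0]])
y = relu.forward(x)                        # negative entries are clipped to zero
grad_in = relu.backward(np.ones_like(x))   # gradient passes only where x > 0
```

Because `backward` returns the gradient with respect to the layer's input, such a layer can be placed anywhere in the list of layers without changes to the rest of the code.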

## The Network Class

The `Network` class is initialized with a list of layer modules which define the architecture of the network. Analogously to the layers' `forward`, `backward`, and `update_params` methods, the `Network` class contains the `forward_prop`, `backward_prop`, and `update_params` methods, which sequentially call the respective method in each of the layers of the network.  

The `Network` class includes several accuracy evaluation methods such as `evaluate_percentage_onehot` and `evaluate_error`. These methods evaluate the performance of the network on a given set of test data and labels, either by calculating the percentage of correctly classified test points or by directly calculating the error as defined by the network's error function.  
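
The evaluation methods themselves are not listed in this notebook. As a rough sketch of what a one-hot accuracy check could look like, the standalone function below compares the index of the largest output with the index of the 1 in the label. The `predict` argument is a hypothetical stand-in for the network's `forward_prop`, and the actual `evaluate_percentage_onehot` method may differ.

```python
import numpy as np


def evaluate_percentage_onehot(predict, test_data, test_labels):
    """Percentage of test points whose arg-max output matches the one-hot label.

    `predict` is any callable mapping one input to one output vector
    (an assumption made for this sketch).
    """
    correct = sum(
        int(np.argmax(predict(x)) == np.argmax(label))
        for x, label in zip(test_data, test_labels)
    )
    return 100.0 * correct / len(test_data)


# Toy stand-in for a trained network: the identity function
predict = lambda x: x
test_data = np.array([[[0.9], [0.1]], [[0.2], [0.8]], [[0.6], [0.4]]])
test_labels = np.array([[[1], [0]], [[0], [1]], [[0], [1]]])

# The first two points are classified correctly, the third is not
acc = evaluate_percentage_onehot(predict, test_data, test_labels)
```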

The `mse` method implements the mean squared error function, defined as $ MSE = \frac{1}{N} \sum_{i=1}^{N}(y_i - \hat{y}_i)^2 $, which was chosen as the network's error function. The `mse_prime` method implements the derivative of the mean squared error function with respect to the actual output, $ y $. Note that this error function could be replaced with another error function of choice.  
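
The `mse` and `mse_prime` methods are not listed in this notebook. A minimal sketch consistent with the formula above is given below as standalone functions. The argument order follows the call `self.mse_prime(output, y)` in `train`, where `output` is the network's output and `y` is the label. Taking $ N $ to be the number of output components is an assumption, and the original methods may scale the result differently.

```python
import numpy as np


def mse(output, y):
    """Mean squared error between the network output and the label."""
    return np.mean((output - y) ** 2)


def mse_prime(output, y):
    """Derivative of the mean squared error with respect to the network output."""
    return 2 * (output - y) / output.size


output = np.array([[0.8], [0.1]])
y = np.array([[1.0], [0.0]])

err = mse(output, y)         # ((-0.2)**2 + 0.1**2) / 2
grad = mse_prime(output, y)  # same shape as output, ready to pass to backward_prop
```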

The `train` method tunes the network parameters using the stochastic gradient descent algorithm. In each epoch, the input data and labels are shuffled and split into mini batches. For every data point in a mini batch, the output and the error gradient are calculated, and the gradient is backpropagated through the layers, which accumulate the gradients of their parameters. Once the whole mini batch has been processed, the parameters are updated accordingly. The `train` method takes the following arguments:  
* `input_data`: the input training data used to train the model
* `labels`: the input training labels used to train the model
* `mini_batch_size`: the size of each mini batch used during stochastic gradient descent
* `eta`: the learning rate for the stochastic gradient descent algorithm
* `epochs`: the number of training epochs to run
* `epoch_disp`: the number of epochs between network evaluations (set to 0 to disable evaluation output)
* `evaluation`: the network evaluation method to use: `'error'`, `'percentage'`, or `'percentage_onehot'`
* `test_data`: the test data to use for network accuracy evaluations
* `test_labels`: the test labels to use for network accuracy evaluations
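
The shuffling and splitting step driven by `mini_batch_size` can be sketched in isolation as follows. The toy arrays are invented for illustration, and a seeded `np.random.default_rng` is used here in place of `np.random.permutation` only so that the result is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility

n, mini_batch_size = 10, 4
input_data = np.arange(n, dtype=float).reshape(n, 1, 1)  # ten toy 1x1 column vectors
labels = 2 * input_data                                  # toy labels paired with the inputs

# Apply the same permutation to inputs and labels so the pairs stay aligned
p = rng.permutation(n)
input_data, labels = input_data[p], labels[p]

# Slice into consecutive mini batches; the last one is smaller when
# n is not a multiple of mini_batch_size
mini_batch_inputs = [input_data[k:k + mini_batch_size] for k in range(0, n, mini_batch_size)]
mini_batch_labels = [labels[k:k + mini_batch_size] for k in range(0, n, mini_batch_size)]

batch_sizes = [len(batch) for batch in mini_batch_inputs]  # [4, 4, 2]
```

Reshuffling at the start of every epoch means each epoch visits the data in a different order, so the mini batches differ from one epoch to the next.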

In [None]:
class Network:
    def __init__(self, layers):
        self.layers = layers
        
    def forward_prop(self, x):
        y = x
        for layer in self.layers:
            y = layer.forward(y)
        return y
    
    def backward_prop(self, nabla):
        for layer in reversed(self.layers):
            nabla = layer.backward(nabla)
        return nabla
            
    def update_params(self, eta, n):
        for layer in self.layers:
            layer.update_params(eta, n)
    
    def train(self, input_data, labels, mini_batch_size, eta, epochs, epoch_disp, evaluation='error',
        test_data=None, test_labels=None):
        n = len(input_data)
        
        for e in range(epochs):

            p = np.random.permutation(n)
            input_data = input_data[p]
            labels = labels[p]
            
            mini_batch_inputs = [input_data[k:k+mini_batch_size] for k in range(0, n, mini_batch_size)]
            mini_batch_labels = [labels[k:k+mini_batch_size] for k in range(0, n, mini_batch_size)]
            
            for i in range(len(mini_batch_inputs)):
                
                mini_batch_input = mini_batch_inputs[i]
                mini_batch_label = mini_batch_labels[i]
                
                for x, y in zip(mini_batch_input, mini_batch_label):
                    output = self.forward_prop(x)
                    nabla = self.mse_prime(output, y)
                    self.backward_prop(nabla)
            
                # Average over the actual batch length; the last mini batch may be smaller
                self.update_params(eta, len(mini_batch_input))
            
            if test_data is not None and epoch_disp != 0:
                
                if e % epoch_disp == 0 or e == epochs - 1:

                    if evaluation == 'error':
                        err = self.evaluate_error(test_data, test_labels)
                        print(f'Error in epoch {e+1} / {epochs}: {round(err, 5)}')
                    elif evaluation == 'percentage':
                        acc = self.evaluate_percentage(test_data, test_labels)
                        print(f'Accuracy in epoch {e+1} / {epochs}: {round(acc, 2)}%')
                    elif evaluation == 'percentage_onehot':
                        acc = self.evaluate_percentage_onehot(test_data, test_labels)
                        print(f'Accuracy in epoch {e+1} / {epochs}: {round(acc, 2)}%')

## Utility Functions

The above code, along with several other implementations of the `Layer` class, is included in the `network.py` module. An additional module, `network_utils.py`, is also included to facilitate experimentation.