# Neural Network From Scratch

This notebook explains how to implement a neural network from scratch. The implementation follows this [book](http://neuralnetworksanddeeplearning.com/chap1.html).

In [1]:
# necessary imports of dependencies
import random
import numpy as np

In the constructor, we store the weights and biases of each layer along with the number of layers. The constructor accepts one argument: an array giving the number of neurons in each layer of the network.

```
>>> neuralNetwork = NeuralNetwork([10, 7, 5])
```

As a result, we've created a neural network with 10 neurons in the input layer, 7 neurons in the first hidden layer, and 5 neurons in the output layer.

```
>>> [w.shape for w in neuralNetwork.weights]
[(7, 10), (5, 7)]
>>> [b.shape for b in neuralNetwork.biases]
[(7, 1), (5, 1)]
```
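The shapes follow from the matrix product that each layer computes. Below is a minimal sketch, assuming the input is a column vector of shape `(10, 1)`; the variable names are illustrative only, and the activation function is left out because the point is just to check dimensions.

```python
import numpy as np

sizes = [10, 7, 5]

# One weight matrix per pair of adjacent layers:
# rows = neurons in the next layer, columns = neurons in the previous layer.
weights = [np.random.randn(n_out, n_in) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.random.randn(n_out, 1) for n_out in sizes[1:]]

x = np.random.randn(10, 1)                          # input as a column vector
z_hidden = np.dot(weights[0], x) + biases[0]        # (7, 10) . (10, 1) + (7, 1)
z_output = np.dot(weights[1], z_hidden) + biases[1] # (5, 7) . (7, 1) + (5, 1)

print(z_hidden.shape, z_output.shape)  # (7, 1) (5, 1)
```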

In [2]:
class NeuralNetwork(object):
    """
    This class encapsulates neural network mechanics.
    """
    
    def __init__(self, sizes):
        """
        NeuralNetwork constructor.
        
        Attributes:
        -----------
        sizes: array that represents number of neurons in different layers of the network
        """
        self.sizes = sizes
        self.number_of_layers = len(sizes)
        self.weights = [np.random.randn(y, x) for x,y in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.random.randn(x, 1) for x in sizes[1:]]

In [3]:
neuralNetwork = NeuralNetwork([10, 7, 5])

for w in neuralNetwork.weights:
    display(f'Weights: {w.shape}')
    
for b in neuralNetwork.biases:
    display(f'Biases: {b.shape}')

'Weights: (7, 10)'

'Weights: (5, 7)'

'Biases: (7, 1)'

'Biases: (5, 1)'

Since we train with SGD (stochastic gradient descent), the learning function accepts the batch size and the learning rate in addition to the training data and the number of epochs. In each epoch, we first shuffle the data and split it into mini-batches, then perform one gradient descent step per mini-batch. At the end of each epoch, if test data is provided, we report how many test examples the network classifies correctly.
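The shuffle-and-split step can be tried in isolation. This is a small sketch with placeholder data (the `(i, i % 2)` pairs are made up for illustration); note that the last batch is smaller when the batch size does not divide the data set evenly.

```python
import random

# Toy data set: 10 (x, y) pairs; the values are placeholders.
training_data = [(i, i % 2) for i in range(10)]
batch_size = 4

# random.shuffle works in place, so training_data must be a mutable list.
random.shuffle(training_data)
batches = [training_data[k:k + batch_size]
           for k in range(0, len(training_data), batch_size)]

print([len(batch) for batch in batches])  # [4, 4, 2]
```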

In [4]:
class NeuralNetwork(object):
    """
    This class encapsulates neural network mechanics.
    """
    
    def __init__(self, sizes):
        """
        NeuralNetwork constructor.
        
        Attributes:
        -----------
        sizes: array that represents number of neurons in different layers of the network
        """
        self.sizes = sizes
        self.number_of_layers = len(sizes)
        self.weights = [np.random.randn(y, x) for x,y in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.random.randn(x, 1) for x in sizes[1:]]
        
        
    def learn(self, training_data, epochs, batch_size, learning_rate, test_data=None):
        """
        Performs a neural network learning process.
        
        Attributes:
        -----------
        training_data: training data for the neural network in the form (x, y), 
        representing the training inputs and the desired outputs
        epochs: number of epochs to train the neural network
        batch_size: size of the mini batch
        learning_rate: learning rate for the steepest descent
        test_data: if available it is used to see how well the network is learning at each epoch
        """
        if test_data: 
            size_test = len(test_data)
            
        size_training = len(training_data)
        
        # train a net for a specified number of epochs
        for i in range(epochs):
            random.shuffle(training_data)
            batches = [training_data[k:k+batch_size] for k in range(0, size_training, batch_size)]
            
            # do SGD per batch
            for batch in batches:
                self._sgd_step(batch, learning_rate)
            
            # show the results of a step if test data is available
            if test_data:
                print(f"Epoch {i}: {self.evaluate(test_data)} / {size_test}")
            # otherwise, just notify that epoch is completed
            else:
                print(f"Epoch {i} complete")

To perform an SGD step, we compute the gradient for each data point in the batch with backpropagation and sum the results. In the last step, we divide the sum by the batch size to obtain the average gradient and use it to update the weights w and biases b.
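To make the averaging concrete, here is a sketch of the update for a single 1x2 weight matrix. The per-example gradients are hypothetical numbers chosen for illustration, not the output of backpropagation.

```python
import numpy as np

learning_rate = 0.5
w = np.array([[1.0, 2.0]])  # a single 1x2 weight matrix

# Hypothetical per-example gradients for a batch of two data points.
per_example_grads = [np.array([[0.2, 0.4]]), np.array([[0.6, 0.0]])]

# Accumulate the sum of the gradients over the batch.
nabla_w = np.zeros(w.shape)
for grad in per_example_grads:
    nabla_w = nabla_w + grad

# Dividing the learning rate by the batch size turns the sum into an average.
w_new = w - (learning_rate / len(per_example_grads)) * nabla_w
print(w_new)  # approximately [[0.8, 1.9]]
```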

In [5]:
    def _sgd_step(self, batch, learning_rate):
        """
        Performs one gradient descent step for the batch.
        
        Attributes:
        -----------
        batch: subset of the training data in the form (x,y)
        learning_rate: learning rate for the steepest descent
        """
        # create empty gradients
        nabla_w = [np.zeros(w_matrix.shape) for w_matrix in self.weights]
        nabla_b = [np.zeros(b_vector.shape) for b_vector in self.biases]
        
        # for each data point in the batch 
        # update the gradient according to the backpropagation algorithm
        for x,y in batch:
            # _backpropagation returns (nabla_w, nabla_b), in that order
            delta_nabla_w, delta_nabla_b = self._backpropagation(x, y)
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            
        # finally update the parameters of the network (weights and biases)
        self.weights = [w - (learning_rate/len(batch))*nw for w,nw in zip(self.weights, nabla_w)]
        self.biases = [b - (learning_rate/len(batch))*nb for b,nb in zip(self.biases, nabla_b)]

Backpropagation is the most important function in the class. It accepts a vector $x$ that represents the activations in the input layer of the network and a vector $y$ that represents the ideal output of the network for the given $x$.

We want to track the activations and `z` values (weighted inputs) of each neuron in the network. To do that, we need a forward pass through the network. Then we follow the formulas from the derivation of the backpropagation algorithm to obtain the gradients for the current training example.
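For reference, these are the formulas the code follows, written here under the assumption of a quadratic cost $C = \frac{1}{2}\lVert a^L - y \rVert^2$ (which is what a cost derivative of `output_activations - y` corresponds to). The notation is chosen for this note: $L$ is the output layer, $a^l$ and $z^l$ are the activations and weighted inputs of layer $l$, $\sigma$ is the sigmoid function, and $\odot$ is the element-wise product.

$$
\begin{aligned}
\delta^L &= (a^L - y) \odot \sigma'(z^L) \\
\delta^l &= \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \\
\frac{\partial C}{\partial b^l} &= \delta^l \\
\frac{\partial C}{\partial w^l} &= \delta^l \, (a^{l-1})^T
\end{aligned}
$$

The first line is the error in the output layer, the second propagates it backwards one layer at a time, and the last two turn the error of a layer into the gradients of its biases and weights.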

In [6]:
    def _backpropagation(self, x, y):
        """
        This function executes a backpropagation algorithm for one training example.
        
        Attributes:
        -----------
        x: activations in the input layer of a neural network
        y: desired output of a neural network
        """
        # gradient of weights and biases
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        
        # feedforward pass
        activations = [x]
        zs = []
        
        for w,b in zip(self.weights, self.biases):
            z = np.dot(w, activations[-1]) + b
            activations.append(self._sigmoid(z))
            zs.append(z)
        
        # backpropagation pass        
        delta = self._cost_derivative(activations[-1], y)*self._sigmoid_derivative(zs[-1])
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        nabla_b[-1] = delta
        
        for l in range(2, self.number_of_layers):
            z = zs[-l]
            sp = self._sigmoid_derivative(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
            
        return (nabla_w, nabla_b)

The full implementation of the neural network class is given below:

In [7]:
class NeuralNetwork(object):
    """
    This class encapsulates neural network mechanics.
    """
    
    def __init__(self, sizes):
        """
        NeuralNetwork constructor.
        
        Attributes:
        -----------
        sizes: array that represents number of neurons in different layers of the network
        """
        self.number_of_layers = len(sizes)
        self.sizes = sizes
        self.weights = [np.random.randn(y, x) for x,y in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.random.randn(x, 1) for x in sizes[1:]]
        
        
    def learn(self, training_data, epochs, batch_size, learning_rate, test_data=None):
        """
        Performs a neural network learning process.
        
        Attributes:
        -----------
        training_data: training data for the neural network in the form (x, y), 
        representing the training inputs and the desired outputs
        
        epochs: number of epochs to train the neural network
        batch_size: size of the mini batch
        learning_rate: learning rate for the steepest descent
        test_data: if available it is used to see how well the network is learning at each epoch
        """
        if test_data:
            size_test = len(test_data)
        size_training = len(training_data)
        
        # train a net for a specified number of epochs
        for i in range(epochs):
            random.shuffle(training_data)
            batches = [training_data[k:k+batch_size] for k in range(0, size_training, batch_size)]
            
            # do SGD per batch
            for batch in batches:
                self._sgd_step(batch, learning_rate)
            
            # show the results of a step if test data is available
            if test_data:
                print(f"Epoch {i}: {self.evaluate(test_data)} / {size_test}")
            # otherwise, just notify that epoch is completed
            else:
                print(f"Epoch {i} complete")
                
                
    def _sgd_step(self, batch, learning_rate):
        """
        Performs one gradient descent step for the batch.
        
        Attributes:
        -----------
        batch: subset of the training data in the form (x,y)
        learning_rate: learning rate for the steepest descent
        """
        # create empty gradients
        nabla_w = [np.zeros(w_matrix.shape) for w_matrix in self.weights]
        nabla_b = [np.zeros(b_vector.shape) for b_vector in self.biases]
        
        # for each data point in the batch 
        # update the gradient according to the backpropagation algorithm
        for x,y in batch:
            # _backpropagation returns (nabla_w, nabla_b), in that order
            delta_nabla_w, delta_nabla_b = self._backpropagation(x, y)
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            
        # finally update the parameters of the network (weights and biases)
        self.weights = [w - (learning_rate/len(batch))*nw for w,nw in zip(self.weights, nabla_w)]
        self.biases = [b - (learning_rate/len(batch))*nb for b,nb in zip(self.biases, nabla_b)]
            
    def _backpropagation(self, x, y):
        """
        This function executes a backpropagation algorithm for one training example.
        
        Attributes:
        -----------
        x: activations in the input layer of a neural network
        y: desired output of a neural network
        """
        # gradient of weights and biases
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        
        # feedforward pass
        activations = [x]
        zs = []
        
        for w,b in zip(self.weights, self.biases):
            z = np.dot(w, activations[-1]) + b
            activations.append(self._sigmoid(z))
            zs.append(z)
        
        # backpropagation pass        
        delta = self._cost_derivative(activations[-1], y)*self._sigmoid_derivative(zs[-1])
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        nabla_b[-1] = delta
        
        for l in range(2, self.number_of_layers):
            z = zs[-l]
            sp = self._sigmoid_derivative(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
            
        return (nabla_w, nabla_b)
        
        
    
    def _feedforward(self, x):
        """
        Calculates the output of a neural network by feeding forward information 
        from the input layer to the output layer through sigmoid neurons.
        
        Attributes:
        -----------
        x: activations in the input layer of a neural network. They should be represented in the form of a vector (n, 1)
        """
        for w,b in zip(self.weights, self.biases):
            x = self._sigmoid(np.dot(w, x) + b) 
            
        return x
    
    
    def _sigmoid(self, z):
        """
        Calculates sigmoid function value.
        
        Attributes:
        -----------
        z: input value for a sigmoid function
        """
        return 1.0 / (1.0 + np.exp(-z))
    
    def _sigmoid_derivative(self, z):
        """
        Calculates sigmoid derivative function value.
        
        Attributes:
        -----------
        z: input value for a sigmoid derivative function
        """
        return self._sigmoid(z)*(1 - self._sigmoid(z))
    
    def _cost_derivative(self, output_activations, y):
        """
        Vector of partial derivatives of the loss function 
        with respect to activations in the output layer.
        
        Attributes:
        -----------
        output_activations: activations in the output layer
        y: desired output of a neural network
        """
        return output_activations - y
    
    def evaluate(self, test_data):
        """Counts test examples whose most active output neuron matches the integer label y."""
        test_results = [(np.argmax(self._feedforward(x)), y) for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)
    
# Note:
# - when z is a vector, the sigmoid function is applied element-wise and returns a vector of the same shape
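The element-wise behaviour mentioned in the note can be checked directly. The sketch below uses standalone copies of the two sigmoid functions (free functions rather than methods, purely for illustration) and compares the analytic derivative against a central finite difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.array([[-2.0], [0.0], [2.0]])
a = sigmoid(z)
print(a.shape)   # (3, 1): same shape as the input
print(a[1, 0])   # 0.5, since sigmoid(0) = 1 / (1 + 1)

# A central finite difference should agree with the analytic derivative.
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(numeric, sigmoid_derivative(z)))  # True
```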

In [8]:
neuralNetwork = NeuralNetwork([100, 50, 20, 10])

We won't train the network to recognize digits here, because preparing the data would take some time to program. The goal was to gain a deeper understanding of how a neural network learns from training examples.
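For readers who want to try it anyway, the sketch below shows the data layout the class appears to expect, inferred from the code above: `_backpropagation` subtracts `y` from the output activations, so training targets should be column vectors, while `evaluate` compares `np.argmax(...)` with `y`, so test targets should be integer labels. The `one_hot` helper and the random placeholder data are assumptions made for illustration, not part of the original implementation.

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Hypothetical helper: column vector with 1.0 at index `label`."""
    v = np.zeros((num_classes, 1))
    v[label] = 1.0
    return v

rng = np.random.default_rng(0)
num_inputs = 100  # matches the input layer of NeuralNetwork([100, 50, 20, 10])

# Random placeholders stand in for real inputs and labels.
labels = [int(rng.integers(0, 10)) for _ in range(20)]
inputs = [rng.random((num_inputs, 1)) for _ in labels]

training_data = [(x, one_hot(y)) for x, y in zip(inputs, labels)]  # y as a (10, 1) vector
test_data = list(zip(inputs, labels))                              # y as an integer label

print(training_data[0][0].shape, training_data[0][1].shape)  # (100, 1) (10, 1)
```

Both collections are plain Python lists, which matters because `learn` shuffles the training data in place with `random.shuffle`.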