# Digit Classifier

This notebook uses a neural network that takes an image of a handwritten digit as input and predicts which digit it is (0 to 9).

I am building this classifier while learning the basics of deep learning and neural networks from Michael Nielsen's online book, which is linked in the README file.

## Import dependencies

In [7]:
import numpy as np
import random

## Helper functions

In [44]:
def sigmoid(z):
    """ Return sigma(z), where z = w.x + b """
    return 1/(1 + np.exp(-z))

def sigmoid_prime(z):
    """ Return sigma'(z), where z = w.x + b """
    sigma = sigmoid(z)
    return sigma * (1-sigma)
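
As a quick sanity check on the two helpers, the analytic derivative can be compared with a central-difference estimate. This is only a sketch: the sample points and the step size `h` are arbitrary values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    sigma = sigmoid(z)
    return sigma * (1 - sigma)

# Central-difference approximation of the derivative at a few sample points
z = np.array([-2.0, 0.0, 0.5, 3.0])
h = 1e-5  # step size, an arbitrary small value chosen for this sketch
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

# The two should agree up to a small numerical error
print(np.allclose(numeric, sigmoid_prime(z), atol=1e-8))
print(sigmoid_prime(0.0))  # 0.25, the maximum slope of the sigmoid
```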

## The `Network` class

The following class represents a basic neural network, with attributes and methods as follows.

### Attributes
<ul>
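    <li> `size`: number of neurons in each layer </li>
    <li> `num_layers`: number of layers </li>
    <li> `biases`: one bias vector per layer, starting from layer 1 </li>
    <li> `weights`: one weight matrix per pair of adjacent layers </li>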
</ul>

### Methods

<ul>
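    <li> `feedforward`: compute σ(wx + b) layer by layer and return the output of the network </li>
    <li> `update_mini_batch`: perform one gradient descent step on a mini-batch, using backpropagation to obtain the gradients </li>
    <li> `backdrop`: compute the gradients of the cost with respect to the weights and biases (backpropagation) </li>
    <li> `SGD`: perform stochastic gradient descent </li>
    <li> `evaluate`: test the network on the test set </li>
    <li> `cost_prime`: derivative of the cost with respect to the output activations </li>
    <li> `printElements`: print the attributes of the network </li>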
</ul>
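
As a rough illustration of what `SGD` does at the start of each epoch, the sketch below shuffles a toy training set and slices it into mini-batches. The toy data, the batch size of 4 and the fixed seed are assumptions made for this example only.

```python
import random

# Ten toy (x, y) pairs standing in for the training set
train_data = [(i, i % 2) for i in range(10)]
mini_batch_size = 4  # hypothetical value for this sketch

random.seed(0)  # fixed seed only so the sketch is repeatable
random.shuffle(train_data)

# Slice the shuffled list into consecutive chunks; the last one may be shorter
mini_batches = [train_data[i:i + mini_batch_size]
                for i in range(0, len(train_data), mini_batch_size)]

print([len(batch) for batch in mini_batches])  # [4, 4, 2]
```

Every training example lands in exactly one mini-batch per epoch, so one epoch is one full pass over the data.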

### Working

You can create a neural network by creating an instance of the following class, where `size` is a list with the number of neurons in each layer:

```
    myNetwork = Network(size)
```
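
To make the role of `size` concrete, here is a small sketch of the shapes that result for a `[784, 30, 10]` network. It repeats the initialization expressions used in the constructor outside the class; the random input vector is only a stand-in for an image.

```python
import numpy as np

size = [784, 30, 10]  # input, hidden and output layer sizes

# One bias column vector per non-input layer
biases = [np.random.randn(l, 1) for l in size[1:]]
# One weight matrix per pair of adjacent layers, shaped (next layer, previous layer)
weights = [np.random.randn(l, m) for m, l in zip(size[:-1], size[1:])]

print([b.shape for b in biases])   # [(30, 1), (10, 1)]
print([w.shape for w in weights])  # [(30, 784), (10, 30)]

# A (784, 1) input column therefore comes out as a (10, 1) activation column
a = np.random.rand(784, 1)
for w, b in zip(weights, biases):
    a = 1 / (1 + np.exp(-(np.dot(w, a) + b)))
print(a.shape)  # (10, 1)
```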

In [50]:
class Network:
    
    def __init__(self, size):
        """
        Constructor to initialize the object attributes
        Assumption: layer 0 is the input layer (so it won't have any biases)
        """
        
        self.size = size # number of neurons in each layer
        self.num_layers = len(size) # number of layers
        self.biases = [np.random.randn(l, 1) for l in size[1:]] # randomly initialize the biases of each layer, starting from layer 1
        self.weights = [np.random.randn(l, m) for m, l in zip(size[:-1], size[1:])] # randomly initialize the weights in the same way
        
    
    def feedforward(self, a):
        """ 
        Compute sigma(w.x + b), where w is list of weights for each layer, and b is the list of biases for each layer.
        Return the feedforward output after going through all layers
        """
        
        for w,b in zip(self.weights, self.biases):
            a = sigmoid(np.dot(w,a) + b)
        return a
    
    def backdrop(self, x, y):
        """
        Return the gradients nabla_b, nabla_w of the cost with respect to the biases and weights respectively, computed by backpropagation for a single example (x, y).
        """
        
        # Initialize the gradient lists with zeros
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]        
        
        # Feedforward
        act = x # activation for first layer
        acts = [x] # list to store activations for all layers
        Z = [] # list to store all z's layer-wise
        for b,w in zip(self.biases, self.weights):
            z = np.dot(w, act) + b
            Z.append(z)
            act = sigmoid(z)
            acts.append(act)
        
        # Compute the error
        delta = self.cost_prime(acts[-1], y) * sigmoid_prime(Z[-1])
        
        # Compute the final gradients
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, acts[-2].transpose())
        
        # Backpropagation
        for l in range(2, self.num_layers):
            z = Z[-l]
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sigmoid_prime(z)
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, acts[-l-1].transpose())
        
        # results
        return nabla_b, nabla_w           
        
    
    def update_mini_batch(self, mini_batch, eta):
        """
        Accumulate the gradients of the cost with respect to all weights and biases over the mini-batch, using backpropagation.
        Apply one gradient descent step with the averaged gradients, thereby updating the weights and biases of the network.
        
        Parameters
        ----------
        mini_batch: a list of tuples (x,y) randomly selected from the training set
        eta: the learning rate of the algorithm
        """
        
        # gradient arrays for weights and biases, initialized with zeroes
        nabla_w = [np.zeros(w.shape) for w in self.weights] 
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        
        for x,y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backdrop(x,y) # compute derivatives
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        
        # update weights and biases
        m = len(mini_batch)
        self.weights = [w - (eta*nw)/m for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta*nb)/m for b, nb in zip(self.biases, nabla_b)]
        
    
    def SGD(self, train_data, epochs, mini_batch_size, eta, test_data = None):
        """ Stochastic Gradient Descent
        Aim: train the neural network using stochastic gradient descent in mini-batches
        
        Parameters
        -----------
        train_data: list of (x,y) tuples listing the inputs (x) and their desired outputs (y);
        epochs: number of complete passes over the training set;
        mini_batch_size: number of elements in the mini-batch;
        eta: learning rate;
        test_data (optional): if provided, the network will be evaluated with the test data after each epoch, and progress will be printed
        """
        
        train_data = list(train_data)
        n = len(train_data)
        
        if test_data:
            test_data = list(test_data)
            m = len(test_data)
        
        # loop over the requested number of epochs; each epoch is one full pass over the training set
        for epoch in range(epochs):
            # shuffle the training set and split it into mini-batches
            random.shuffle(train_data)
            mini_batches = [train_data[i: i+mini_batch_size] for i in range(0, n, mini_batch_size)]
            
            # perform gradient descent and back-propagation in every mini-batch
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            
            # evaluate the network on the test set, if one is provided
            if test_data:
                print(f'Epoch {epoch}: {self.evaluate(test_data)} / {m}')
            else:
                print(f'Epoch {epoch} complete')
    
    
    def evaluate(self, test_data):
        """ 
        Evaluate the trained network on the test data provided.
        Return the number of test inputs for which the network outputs the correct digit.
        The network's output is the index of the neuron in the final layer with the highest activation.
        """
        
        test_results = [(np.argmax(self.feedforward(x)), y) for x, y in test_data]
        return sum(x==y for x,y in test_results)
    
    
    def cost_prime(self, a, y):
        """ Return the derivative of the quadratic cost with respect to the output activations, dC/da = (a - y), for a single training example """
        return a-y
        
    
    def printElements(self):
        """ Print the elements of the current instance of the neural network """
        
        print(f'This neural network has: \n{self.num_layers} layers, \n{self.size} neurons in the respective layers, \n{self.biases} as biases for each layer, and \n{self.weights} as weights for each layer.')
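
One way to gain confidence in the backpropagation step is a numerical gradient check on a tiny network. The sketch below is an assumption-laden illustration, not part of the class: it rewrites the same recurrences as free functions (`forward`, `cost` and `backprop` are hypothetical names), uses a `[4, 3, 2]` network with random values, and compares a single weight gradient with a central-difference estimate of the quadratic cost.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def forward(weights, biases, x):
    """Return the weighted inputs and the activations of every layer."""
    acts, zs = [x], []
    for w, b in zip(weights, biases):
        zs.append(np.dot(w, acts[-1]) + b)
        acts.append(sigmoid(zs[-1]))
    return zs, acts

def cost(weights, biases, x, y):
    """Quadratic cost for a single example: 0.5 * ||a - y||^2."""
    _, acts = forward(weights, biases, x)
    return 0.5 * np.sum((acts[-1] - y) ** 2)

def backprop(weights, biases, x, y):
    """Same recurrences as the class method, written as a free function."""
    zs, acts = forward(weights, biases, x)
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    delta = (acts[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, acts[-2].T)
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, acts[-l - 1].T)
    return nabla_b, nabla_w

# Tiny random network and example (all values are arbitrary)
rng = np.random.default_rng(0)
size = [4, 3, 2]
weights = [rng.standard_normal((l, m)) for m, l in zip(size[:-1], size[1:])]
biases = [rng.standard_normal((l, 1)) for l in size[1:]]
x = rng.standard_normal((4, 1))
y = np.array([[1.0], [0.0]])

nabla_b, nabla_w = backprop(weights, biases, x, y)

# Central-difference estimate of dC/dw for one weight in the first layer
h = 1e-6
i, j = 1, 2
weights[0][i, j] += h
c_plus = cost(weights, biases, x, y)
weights[0][i, j] -= 2 * h
c_minus = cost(weights, biases, x, y)
weights[0][i, j] += h  # restore the original weight
numeric = (c_plus - c_minus) / (2 * h)

print(abs(numeric - nabla_w[0][i, j]) < 1e-7)
```

If the two values disagreed noticeably, the most likely culprits would be an off-by-one in the layer indices or a missing transpose.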

In [37]:
# uncomment the following lines to see how zip works
# l1 = [1,2,3]
# l2 = [4,5,6]
# print(f'first zip: {list(zip(l1,l2))}')
# for i in zip(l1,l2):
#     print(i)

# l = [1,2,3,4,5]
# print(f'second zip: {list(zip(l, l[1:]))}')
# print(np.multiply(l1,l2))
# print(list(zip(l1,l2)))

# np.random.randn(1,3)

## Using the classifier to recognize handwritten digits

The class `Network` contains all the members needed to run the classifier.

**Steps.**
<ol>
    <li> First, import the helper file `mnist_loader` into the notebook </li>
    <li> Next, load the train, validation and test data using the helper </li>
    <li> Set up a network with `30` neurons in the hidden layer </li>
    <li> Perform stochastic gradient descent on it for 30 epochs, with a mini-batch size of 10 and a learning rate of 3.0 </li>
</ol>
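
The way the class uses its labels suggests two label formats: `backdrop` subtracts `y` from the output activations, so training labels need to be `(10, 1)` column vectors, while `evaluate` compares `y` with an `argmax` index, so test labels are plain integers. The sketch below shows the conversion between the two; `one_hot` is a hypothetical helper written for this example, not a function taken from `mnist_loader`.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Hypothetical helper: (num_classes, 1) column with 1.0 at index `digit`."""
    v = np.zeros((num_classes, 1))
    v[digit] = 1.0
    return v

y_train = one_hot(7)             # label form that can be subtracted from activations
print(y_train.shape)             # (10, 1)
print(int(np.argmax(y_train)))   # 7, the integer form compared in evaluate
```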

In [38]:
import mnist_loader

In [51]:
# Load the train, test and validation data from mnist_loader file
train_set, validation_set, test_set = mnist_loader.load_data_wrapper()

# Set up a network with 30 neurons in the hidden layer
myNet = Network([784, 30, 10])

# Perform stochastic gradient descent
myNet.SGD(train_set, epochs = 30, mini_batch_size = 10, eta = 3.0, test_data = test_set)

Epoch 0: 9111 / 10000
Epoch 1: 9238 / 10000
Epoch 2: 9296 / 10000
Epoch 3: 9339 / 10000
Epoch 4: 9396 / 10000
Epoch 5: 9372 / 10000
Epoch 6: 9419 / 10000
Epoch 7: 9434 / 10000
Epoch 8: 9440 / 10000
Epoch 9: 9441 / 10000
Epoch 10: 9449 / 10000
Epoch 11: 9469 / 10000
Epoch 12: 9477 / 10000
Epoch 13: 9485 / 10000
Epoch 14: 9423 / 10000
Epoch 15: 9494 / 10000
Epoch 16: 9477 / 10000
Epoch 17: 9463 / 10000
Epoch 18: 9509 / 10000
Epoch 19: 9515 / 10000
Epoch 20: 9480 / 10000
Epoch 21: 9504 / 10000
Epoch 22: 9511 / 10000
Epoch 23: 9510 / 10000
Epoch 24: 9503 / 10000
Epoch 25: 9490 / 10000
Epoch 26: 9493 / 10000
Epoch 27: 9487 / 10000
Epoch 28: 9509 / 10000
Epoch 29: 9510 / 10000


## Conclusion

This brings the notebook to an end. By the end of training, the neural network reached an accuracy of **95.10 %** on the test set (9510 of 10000 images), which is satisfactory for a first attempt.