# Fully Connected Neural Network

***This notebook contains the code for a 3-layer (input - hidden - output) neural network with backpropagation, written for my course project.***<br>
***I apply the neural network model to the MNIST dataset to test its performance.***


## Load modules

In [1]:
import numpy as np
import pandas as pd
import random 
import warnings
from tensorflow.examples.tutorials.mnist import input_data
from sklearn.preprocessing import MinMaxScaler

warnings.filterwarnings("ignore");

## Load data
Next, we load the MNIST data through the TensorFlow tutorial API, which provides separate train and test sets.

In [113]:
# extract data with tensorflow API
mnist = input_data.read_data_sets("MNIST_data/")
mnist_train_images = mnist.train.images
mnist_train_labels = mnist.train.labels
mnist_test_images = mnist.test.images
mnist_test_labels = mnist.test.labels

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [114]:
print(type(mnist_train_images))
print(mnist_train_images.shape)
print(mnist_test_images.shape)
print(mnist_train_labels.shape)
print(mnist_test_labels.shape)

<class 'numpy.ndarray'>
(55000, 784)
(10000, 784)
(55000,)
(10000,)
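
The `tensorflow.examples.tutorials` module imported above was removed in TensorFlow 2.x, so the loading cell may not run on a current install. Below is a minimal sketch of one possible replacement, assuming TensorFlow 2.x and its `tf.keras.datasets.mnist.load_data()` loader. The helper names `flatten_and_scale` and `load_mnist_keras` are hypothetical and not part of the original notebook. Note that the Keras loader returns 60000 training images as `(28, 28)` `uint8` arrays, rather than the 55000 flattened float rows shown above (the tutorial API held out 5000 images for validation).

```python
import numpy as np

def flatten_and_scale(images):
    """Turn (n, 28, 28) uint8 images into (n, 784) float32 rows in [0, 1]."""
    images = np.asarray(images)
    return images.reshape(images.shape[0], -1).astype(np.float32) / 255.0

def load_mnist_keras():
    """Hypothetical helper: assumes TensorFlow 2.x; downloads MNIST on first call."""
    import tensorflow as tf  # imported lazily so the helper above works without it
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    return flatten_and_scale(x_train), y_train, flatten_and_scale(x_test), y_test
```

With this sketch, `mnist_train_images, mnist_train_labels, mnist_test_images, mnist_test_labels = load_mnist_keras()` would stand in for the loading cell, with the larger training split in mind.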


The labels need to be one-hot encoded and the pixel values scaled to the range [0, 1].

In [115]:
# scale
scaler = MinMaxScaler(feature_range=(0,1), copy = True)
scale_input = np.concatenate([mnist_train_images,mnist_test_images],axis = 0)
minst_scale = scaler.fit_transform(scale_input)

# retrieve the data
X_train = minst_scale[:mnist_train_images.shape[0],]
X_test = minst_scale[mnist_train_images.shape[0]:,]

In [116]:
X_train = [X_train[i,] for i in range(X_train.shape[0])]
X_test = [X_test[i,] for i in range(X_test.shape[0])]
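
The scaling cell fits the scaler on the train and test images together, so test-set statistics influence the preprocessing. A common alternative is to compute the per-pixel minimum and range on the training data only and reuse them for the test data. The sketch below shows that idea with plain NumPy on toy data; the function names `fit_min_max` and `apply_min_max` are illustrative and not from the notebook.

```python
import numpy as np

def fit_min_max(X):
    """Return the per-feature minimum and range of X (rows are samples)."""
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    # Constant features (for example, border pixels that are always black)
    # have zero range; use 1 there so the division below is safe.
    x_range[x_range == 0] = 1.0
    return x_min, x_range

def apply_min_max(X, x_min, x_range):
    """Scale X with statistics that were computed elsewhere."""
    return (X - x_min) / x_range

rng = np.random.default_rng(0)
toy_train = rng.random((6, 4))
toy_train[:, 0] = 0.0          # a constant feature
toy_test = rng.random((3, 4))

x_min, x_range = fit_min_max(toy_train)      # statistics from train only
train_scaled = apply_min_max(toy_train, x_min, x_range)
test_scaled = apply_min_max(toy_test, x_min, x_range)
```

With this approach the scaled test values can fall slightly outside [0, 1], which is expected when the test data contains values the training data did not.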

In [117]:
# onehotencode of labels
y_train = []
for i in mnist_train_labels:
    encode = np.zeros(10)
    encode[i] = 1
    y_train.append(encode)

y_test = []
for i in mnist_test_labels:
    encode = np.zeros(10)
    encode[i] = 1
    y_test.append(encode)

# create train test data sets
train = [k for k in zip(X_train, y_train)]
test = [k for k in zip(X_test, y_test)]
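
The one-hot loops above can also be written without an explicit Python loop by indexing an identity matrix with the label array. This is a small sketch of that alternative; the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an (n, num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels]

toy_labels = np.array([3, 0, 9])
encoded = one_hot(toy_labels)
print(encoded.shape)
```

The rows of the result could then be paired with the inputs in place of the loop-built lists.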

## A neural network class

In [124]:
class NeuralNet(object):
    
    
    def __init__(self,size):
        """
        size is a list of layer sizes, e.g. [784, 30, 10].
        Initialize the biases uniformly at random in [0, 1) and the weights to zero
        """

        self.size = size
        self.depth = len(size)
        self.biases = [np.random.rand(x,1) for x in self.size[1:]]
        self.weights = [np.zeros([x,y]) for x,y in zip(self.size[1:], self.size[:-1])]
    
    def fit(self, train, test = None, num_epoch = 10, batch_size = 100, learn_rate = 1.0):
        """
        Shuffle the training data and split it into mini-batches each epoch,
        then update the weights and biases with mini-batch stochastic gradient descent.
        If test data is given, print the prediction accuracy on it after each epoch.
        """
        
        n_train = len(train)
        
        for epoch in range(1, num_epoch + 1):
            random.shuffle(train)
            batches = [train[k:k+batch_size] for k in range(0, n_train, batch_size)]
            
            for batch in batches:
                self.update(batch, learn_rate)
            
            # evaluation is optional, so training also runs when test is None
            if test:
                print("Prediction accuracy on test data after epoch {} is: {}".format(epoch, 
                        self.evaluate(test)/len(test)))
        
    def update(self, batch, learn_rate):
        """
        Update the parameters in each minibatch
        """
        
        batch_length = len(batch)
        weight_change = [np.zeros(weight.shape) for weight in self.weights]
        bias_change = [np.zeros(bias.shape) for bias in self.biases]
        
        # accumulate parameter gradient
        for i in range(batch_length):
            delta_weight, delta_bias = self.backpropagation(batch[i])
            weight_change = [x+y for x,y in zip(weight_change, delta_weight)]
            bias_change = [x+y for x,y in zip(bias_change, delta_bias)]
        
        # update the weights and bias for a mini-batch
        self.weights = [x-(learn_rate/batch_length) * y for x, y in zip(self.weights, weight_change)]
        self.biases = [x-(learn_rate/batch_length) * y for x, y in zip(self.biases, bias_change)]
        
    def backpropagation(self, one_sample):
        """
        This function takes in one sample and return the gradient of 
        cost function of each layer with backpropagation
        """
        
        x = one_sample[0]
        x.shape = (len(x),1)
        y = one_sample[1]
        y.shape = (len(y),1)
        
        weight_container = [np.zeros(weight.shape) for weight in self.weights]
        bias_container = [np.zeros(bias.shape) for bias in self.biases]
        
        # forward
        # hidden layer
        z1 = np.dot(self.weights[0], x) + self.biases[0]
        output1 = self.sigmoid(z1)
        
        # output layer
        z2 = np.dot(self.weights[1], output1) + self.biases[1]
        output2 = self.softmax(z2)
        
        # gradient of cost
        delta2 = output2 - y
        
        # gradients for the output layer
        bias_container[-1] = delta2
        weight_container[-1] = np.dot(delta2, output1.transpose())
        
        # backward propagated to hidden layer
        r = self.sigmoid_dev(z1)
        delta1 = np.dot(self.weights[-1].transpose(), delta2) * r
        bias_container[-2] = delta1
        weight_container[-2] = np.dot(delta1, x.transpose())
        return (weight_container, bias_container)
        
    def predict(self, x):
        """
        return vector indicating estimated class of x
        """
        
        x.shape = (len(x),1)
        z1 = np.dot(self.weights[0], x) + self.biases[0]
        output1 = self.sigmoid(z1)
        z2 = np.dot(self.weights[1], output1) + self.biases[1]
        output2 = self.softmax(z2)
        output = np.argmax(output2)
        return output
        
    def evaluate(self, test):
        """
        This function will evaluate the accuracy of prediction of Neural Network
        """
        
        output = [(self.predict(x), np.argmax(y)) for (x, y) in test]
        return sum(int(x == y) for (x, y) in output)
        
    # activation functions and derivatives
    def softmax(self, z):
        # subtract the max before exponentiating to avoid overflow; the result is unchanged
        exp_z = np.exp(z - np.max(z))
        return exp_z/np.sum(exp_z)
    
    def sigmoid(self,z):
        return 1.0/(1.0+np.exp(-z))
    
    def sigmoid_dev(self,z):
        return self.sigmoid(z)*(1-self.sigmoid(z))
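
A common way to check hand-written backpropagation is to compare it with finite-difference gradients on a tiny network. The sketch below does that with standalone functions that mirror the formulas in `backpropagation`. It assumes the cost is cross-entropy, which the notebook does not state, but which is the loss whose gradient with respect to `z2` is `output2 - y`. All function names and the toy layer sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def forward_loss(W1, b1, W2, b2, x, y):
    """Cross-entropy loss of a sigmoid-hidden, softmax-output network."""
    a1 = sigmoid(W1 @ x + b1)
    p = softmax(W2 @ a1 + b2)
    return -np.sum(y * np.log(p))

def analytic_grads(W1, b1, W2, b2, x, y):
    """Gradients computed with the same formulas as the class method."""
    a1 = sigmoid(W1 @ x + b1)
    p = softmax(W2 @ a1 + b2)
    delta2 = p - y
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)
    return delta1 @ x.T, delta1, delta2 @ a1.T, delta2

def numeric_grad(f, param, eps=1e-5):
    """Central finite differences, perturbing one entry at a time in place."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        up = f()
        param[idx] = old - eps
        down = f()
        param[idx] = old  # restore the original value
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 1))
x = rng.random((5, 1))
y = np.zeros((3, 1))
y[1] = 1.0

def loss():
    return forward_loss(W1, b1, W2, b2, x, y)

analytic = analytic_grads(W1, b1, W2, b2, x, y)
numeric = [numeric_grad(loss, p) for p in (W1, b1, W2, b2)]
max_err = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("largest analytic/numeric gap:", max_err)
```

A gap close to zero suggests the backward pass matches the assumed loss; a large gap would point to a mistake in one of the delta formulas.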

## Application

In [125]:
Network = NeuralNet([784, 30, 10])

In [126]:
Network.fit(train = train, test = test, num_epoch = 30, batch_size = 10, learn_rate = 0.1)

Prediction accuracy on test data after epoch 1 is: 0.9156
Prediction accuracy on test data after epoch 2 is: 0.9305
Prediction accuracy on test data after epoch 3 is: 0.9417
Prediction accuracy on test data after epoch 4 is: 0.9433
Prediction accuracy on test data after epoch 5 is: 0.9468
Prediction accuracy on test data after epoch 6 is: 0.9514
Prediction accuracy on test data after epoch 7 is: 0.9553
Prediction accuracy on test data after epoch 8 is: 0.9556
Prediction accuracy on test data after epoch 9 is: 0.9569
Prediction accuracy on test data after epoch 10 is: 0.9618
Prediction accuracy on test data after epoch 11 is: 0.9596
Prediction accuracy on test data after epoch 12 is: 0.9608
Prediction accuracy on test data after epoch 13 is: 0.9624
Prediction accuracy on test data after epoch 14 is: 0.9624
Prediction accuracy on test data after epoch 15 is: 0.9634
Prediction accuracy on test data after epoch 16 is: 0.9612
Prediction accuracy on test data after epoch 17 is: 0.9631
Predic

Bingo, the test accuracy reaches about 96%.<br>
Next, I will try adding more activation functions such as ReLU and rebuild the backpropagation method accordingly.
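
As a starting point for that next step, here is a minimal sketch of a ReLU activation and its derivative, written in the same style as `sigmoid` and `sigmoid_dev`. The names `relu` and `relu_dev` are hypothetical, and the derivative at exactly zero is set to 0 by convention.

```python
import numpy as np

def relu(z):
    """Element-wise max(0, z)."""
    return np.maximum(0.0, z)

def relu_dev(z):
    """Derivative of ReLU: 1 where z > 0, otherwise 0 (including z == 0)."""
    return (z > 0).astype(float)

z = np.array([[-2.0], [0.0], [3.0]])
print(relu(z).ravel())
print(relu_dev(z).ravel())
```

In the backward pass, `relu_dev(z1)` would take the place of `sigmoid_dev(z1)` when computing `delta1`, and `relu(z1)` would replace `sigmoid(z1)` in the forward pass.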