### This code was inspired by: http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/

### LAB - Neural Network From Scratch 
#### Numan SAHNOU & Matthieu ECCHER

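We define a `Multiply` class. In "forward" mode it returns the matrix product __X * W__ (we call the result __Z__). In "backward" mode it receives __dZ__, the gradient of the loss with respect to __Z__, and returns __dW__ and __dX__ (the prefix 'd' denotes the gradient of the loss with respect to that value).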

In [1]:
import numpy as np

class Multiply:
    def forward(self, W, X):
        return np.dot(X, W)

    def backward(self, W, X, dZ):
        dW = np.dot(np.transpose(X), dZ)
        dX = np.dot(dZ, np.transpose(W))
        return dW, dX
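
As a sanity check on the backward rule, the sketch below compares `dW` from `Multiply.backward` with a finite-difference estimate. The shapes (4 examples, 3 inputs, 2 outputs), the random seed, and the helper `loss` are illustrative assumptions, not part of the lab.

```python
import numpy as np

class Multiply:
    def forward(self, W, X):
        return np.dot(X, W)

    def backward(self, W, X, dZ):
        dW = np.dot(np.transpose(X), dZ)
        dX = np.dot(dZ, np.transpose(W))
        return dW, dX

rng = np.random.default_rng(0)   # arbitrary seed (assumption)
X = rng.normal(size=(4, 3))      # 4 examples, 3 input features
W = rng.normal(size=(3, 2))      # 3 inputs -> 2 outputs
dZ = rng.normal(size=(4, 2))     # upstream gradient, same shape as Z

mul = Multiply()
dW, dX = mul.backward(W, X, dZ)

# Hypothetical scalar loss L = sum(Z * dZ), chosen so that dL/dZ equals dZ.
def loss(W_, X_):
    return np.sum(mul.forward(W_, X_) * dZ)

# Central finite differences on every entry of W.
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num_dW[i, j] = (loss(Wp, X) - loss(Wm, X)) / (2 * eps)

print(dW.shape, dX.shape)                   # gradients have the shapes of W and X
print(np.allclose(dW, num_dW, atol=1e-6))   # analytic and numeric gradients agree
```

Each gradient has the same shape as the value it belongs to, which is a quick way to catch a transposed operand.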

Here we define an `Add` class. In "forward" mode it returns __X + b__ (we call the result __Z_biased__), and in "backward" mode it returns __db__ and __dX__.

In [2]:
class Add:
    def forward(self, X, b):
        return X + b

    def backward(self, X, b, dZ):
        dX = dZ * np.ones_like(X)
        db = np.dot(np.ones((1, dZ.shape[0]), dtype=np.float64), dZ)
        return db, dX
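
The bias is broadcast to every example in the batch, so its gradient is the sum of `dZ` over the batch axis. The sketch below checks, on a small made-up `dZ`, that the `np.dot` with a row of ones used above gives the same result as a plain column sum.

```python
import numpy as np

# Made-up upstream gradient: 3 examples, 2 units (illustrative values).
dZ = np.array([[1.0, 2.0],
               [3.0, 4.0],
               [5.0, 6.0]])

# Form used in Add.backward: a (1, n) row of ones times dZ.
db_notebook = np.dot(np.ones((1, dZ.shape[0]), dtype=np.float64), dZ)

# Equivalent form: sum over the batch axis, keeping the (1, units) shape.
db_sum = dZ.sum(axis=0, keepdims=True)

print(db_notebook)                           # column sums: 9 and 12
print(np.array_equal(db_notebook, db_sum))
```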

__Sigmoid__ function: in "forward" mode we apply the sigmoid to its input; in "backward" mode we multiply the incoming gradient by the sigmoid derivative to retrieve __dZ_biased__.

In [3]:
class Sigmoid:
    def forward(self, Z):
        return 1 / (1 + np.exp(-Z))

    def backward(self, Z, top_diff):
        output = self.forward(Z)
        return (1 - output) * output * top_diff
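
The backward pass relies on the identity sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)). A minimal check against a central finite difference, with an arbitrary grid of test points and step size:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-3, 3, 7)   # test points -3, -2, ..., 3 (arbitrary choice)
eps = 1e-5                  # finite-difference step (arbitrary choice)

analytic = sigmoid(z) * (1 - sigmoid(z))
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # largest disagreement between the two
```

The derivative peaks at 0.25 for z = 0 and shrinks toward zero for large |z|, which is why gradients fade when sigmoid units saturate.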

__Softmax__ is used for the output.

In [4]:
def softmax(X):
    return np.exp(X) / np.sum(np.exp(X), axis=1, keepdims=True)
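
In this notebook `softmax` only ever receives sigmoid outputs, which lie between 0 and 1, so `np.exp` cannot overflow. For general inputs, a common variant subtracts the row maximum first. The sketch below shows that variant under the name `softmax_stable`, which is introduced here for illustration only.

```python
import numpy as np

def softmax(X):
    return np.exp(X) / np.sum(np.exp(X), axis=1, keepdims=True)

def softmax_stable(X):
    # Subtracting the row-wise maximum leaves the result unchanged
    # but keeps every exponent <= 0, so np.exp cannot overflow.
    shifted = X - np.max(X, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0],
                  [0.5, 0.5, 0.5]])
print(np.allclose(softmax(small), softmax_stable(small)))

# Inputs this large would overflow in the plain version.
large = np.array([[1000.0, 1001.0, 1002.0]])
print(softmax_stable(large))
```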

## Model class

We implement our neural network as a class `Model` and initialize its parameters (weights and biases) in the `__init__` function.

First we implement the loss function (`calculate_loss`). It runs a forward pass through the network and returns the mean cross-entropy, which we use to evaluate how well the model is doing.
<br>We also implement a `predict` function to calculate the output of the network. It does forward propagation and returns the class with the highest probability.<br>
Finally, we define the function `train` to __train__ our neural network. It implements batch gradient descent using the __backpropagation algorithm__.

In [5]:
class Model:
    def __init__(self, layers_dim):
        self.b = []
        self.W = []
        for i in range(len(layers_dim)-1):
            self.W.append(np.random.randn(layers_dim[i], layers_dim[i+1]) / np.sqrt(layers_dim[i]))
            self.b.append(np.random.randn(layers_dim[i+1]).reshape(1, layers_dim[i+1]))
    
    def diff(self, X, y):
        num_examples = X.shape[0]
        probs = softmax(X)
        probs[range(num_examples), y] -= 1
        return probs

    def calculate_loss(self, X2, y):
        mul = Multiply()
        add = Add()
        sigmoid = Sigmoid()
        
        X = X2
        for i in range(len(self.W)):
            Z = mul.forward(self.W[i], X)
            Z_biased = add.forward(Z, self.b[i])
            X = sigmoid.forward(Z_biased)

        num_examples = X.shape[0]
        probs = softmax(X)
        correct_logprobs = -np.log(probs[range(num_examples), y])
        data_loss = np.sum(correct_logprobs)
        return 1/num_examples * data_loss

    # Return the class with the highest probability
    def predict(self, X2):
        mul = Multiply()
        add = Add()
        sigmoid = Sigmoid()

        X = X2
        for i in range(len(self.W)):
            Z = mul.forward(self.W[i], X)
            Z_biased = add.forward(Z, self.b[i])
            X = sigmoid.forward(Z_biased)

        probs = softmax(X)
        return np.argmax(probs, axis=1)

    def train(self, X2, y, iterations, alpha, delta, print_loss=False):
        mul = Multiply()
        add = Add()
        sigmoid = Sigmoid()

        for epoch in range(iterations):
            # Forward propagation
            X = X2
            forward = [(None, None, X)]
            for i in range(len(self.W)):
                Z = mul.forward(self.W[i], X)
                Z_biased = add.forward(Z, self.b[i])
                X = sigmoid.forward(Z_biased)
                forward.append((Z, Z_biased, X))

            # Back propagation
            dsigmoid = self.diff(forward[len(forward)-1][2], y)
            for i in range(len(forward)-1, 0, -1):
                dZ_biased = sigmoid.backward(forward[i][1], dsigmoid)
                db, dZ = add.backward(forward[i][0], self.b[i-1], dZ_biased)
                dW, dsigmoid = mul.backward(self.W[i-1], forward[i-1][2], dZ)
                # Add regularization terms
                dW += delta * self.W[i-1]
                # Gradient descent parameter update
                self.b[i-1] += -alpha * db
                self.W[i-1] += -alpha * dW

            if print_loss and epoch % 100 == 0:
                print("Loss after iteration %i: %f" %(epoch, self.calculate_loss(X2, y)))
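
The `diff` method returns the softmax probabilities with 1 subtracted at the true class, which is the gradient of the cross-entropy with respect to the softmax input. The sketch below checks this numerically on random data. The function `summed_cross_entropy`, the shapes, and the seed are assumptions made for the check.

```python
import numpy as np

def softmax(X):
    return np.exp(X) / np.sum(np.exp(X), axis=1, keepdims=True)

def summed_cross_entropy(X, y):
    # Cross-entropy summed (not averaged) over the examples.
    probs = softmax(X)
    return -np.sum(np.log(probs[range(X.shape[0]), y]))

def diff(X, y):
    # Same computation as Model.diff, written as a free function.
    probs = softmax(X)
    probs[range(X.shape[0]), y] -= 1
    return probs

rng = np.random.default_rng(0)    # arbitrary seed (assumption)
X = rng.normal(size=(5, 4))       # 5 examples, 4 classes
y = np.array([0, 3, 1, 2, 3])     # made-up labels

# Central finite differences on every entry of X.
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += eps
        Xm[i, j] -= eps
        numeric[i, j] = (summed_cross_entropy(Xp, y)
                         - summed_cross_entropy(Xm, y)) / (2 * eps)

print(np.allclose(diff(X, y), numeric, atol=1e-6))
```

The match is with the summed loss: `diff` does not divide by the number of examples, whereas `calculate_loss` reports the mean. The step size `alpha` therefore acts on a gradient that grows with the batch size.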

## Execution of the model

Here we run the model with an input layer of dimension __400__, a hidden layer of dimension __150__ and an output layer of dimension __10__.

In [6]:
import matplotlib.pyplot as plt
import numpy as np
import sklearn.datasets
import sklearn.linear_model
from sklearn.model_selection import train_test_split 

X = np.loadtxt("features.txt", delimiter = ",",dtype=float)
y = np.loadtxt("labels.txt", delimiter = ",",dtype=int)

# Remap label 10 to class 0
y[y == 10] = 0

layers_dim = [400, 150, 10]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

model = Model(layers_dim)
model.train(X_train, y_train, iterations=1000, alpha=0.001, delta=0.0001, print_loss=True)
y_predicted = model.predict(X_test)
    
def score_from_scratch(y_predicted, y_actual):
    # Fraction of predictions that match the true labels
    cpt = 0
    for i in range(len(y_predicted)):
        if y_predicted[i] == y_actual[i]:
            cpt += 1

    return cpt / len(y_predicted)

print("Accuracy of the model from scratch : ", score_from_scratch(y_predicted, y_test))

Loss after iteration 0: 2.295576
Loss after iteration 100: 1.677630
Loss after iteration 200: 1.617462
Loss after iteration 300: 1.586783
Loss after iteration 400: 1.569411
Loss after iteration 500: 1.558131
Loss after iteration 600: 1.549886
Loss after iteration 700: 1.543463
Loss after iteration 800: 1.538243
Loss after iteration 900: 1.533867
Accuracy of the model from scratch :  0.919
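
The accuracy loop can also be written as one vectorized NumPy expression. The sketch below compares both forms on a small made-up example. `score_vectorized` is a name introduced here, not part of the lab.

```python
import numpy as np

def score_from_scratch(y_predicted, y_actual):
    cpt = 0
    for i in range(len(y_predicted)):
        if y_predicted[i] == y_actual[i]:
            cpt += 1
    return cpt / len(y_predicted)

def score_vectorized(y_predicted, y_actual):
    # The mean of a boolean array is the fraction of True entries.
    return float(np.mean(np.asarray(y_predicted) == np.asarray(y_actual)))

# Made-up labels: 3 of the 5 predictions are correct.
y_pred_demo = np.array([0, 1, 2, 2, 1])
y_true_demo = np.array([0, 1, 1, 2, 0])

print(score_from_scratch(y_pred_demo, y_true_demo))
print(score_vectorized(y_pred_demo, y_true_demo))
```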


### Confusion Matrix of the model from scratch

In [7]:
from sklearn.metrics import confusion_matrix
import pandas as pd

y_predicted = pd.get_dummies(y_predicted, columns = [0])
y_test = pd.get_dummies(y_test, columns = [0])
y_train = pd.get_dummies(y_train, columns = [0])

matrix = confusion_matrix(
    y_test.to_numpy().argmax(axis=1), y_predicted.to_numpy().argmax(axis=1))

matrix

array([[114,   0,   1,   0,   0,   1,   1,   0,   1,   0],
       [  0, 100,   1,   0,   0,   0,   0,   0,   1,   0],
       [  3,   1,  89,   1,   3,   0,   0,   2,   3,   0],
       [  0,   1,   6,  88,   0,   1,   0,   0,   1,   0],
       [  1,   0,   0,   0,  95,   0,   0,   0,   1,   6],
       [  1,   2,   1,   4,   1,  89,   2,   0,   5,   0],
       [  0,   0,   1,   0,   0,   2,  96,   0,   0,   0],
       [  0,   2,   0,   0,   4,   0,   0,  96,   0,   2],
       [  0,   2,   4,   2,   1,   1,   2,   0,  70,   0],
       [  1,   0,   0,   0,   1,   0,   1,   2,   1,  82]])
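
scikit-learn's `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns. In keeping with the "from scratch" theme, the same table can be built with plain NumPy. The helper `confusion_from_scratch` and the demo labels below are illustrative assumptions.

```python
import numpy as np

def confusion_from_scratch(y_true, y_pred, num_classes):
    # matrix[i, j] counts the examples of true class i predicted as class j.
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

# Made-up labels for a 3-class problem.
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0, 2])

m = confusion_from_scratch(y_true_demo, y_pred_demo, num_classes=3)
print(m)
```

The diagonal holds the correct predictions, so `np.trace(m) / m.sum()` gives the accuracy.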

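## Comparison with TensorFlow

Note that the Keras model below is not identical to the model from scratch: it has an additional 400-unit sigmoid layer, applies no sigmoid before the softmax, and is trained with Adam on mini-batches of 400 examples.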

In [14]:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()

model.add(Dense(units=400, activation='sigmoid', input_shape=(400,)))
model.add(Dense(units=150, activation='sigmoid'))
model.add(Dense(units=10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

model.fit(X_train, y_train, batch_size = 400, epochs = 100, verbose=1)
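# predict_classes was removed in recent TensorFlow releases; take the argmax of the predicted probabilities instead
print("Model prediction classes :\n ", np.argmax(model.predict(X_test), axis=1))

score, acc = model.evaluate(X_test, y_test, batch_size=300)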

print("\n Accuracy of Tensorflow model : ", acc)

[Keras training log, epochs 1 to 78 of 100; per-epoch loss and accuracy not preserved]

### Confusion Matrix of the TensorFlow Keras model

In [15]:
from sklearn.metrics import confusion_matrix

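# predict_classes was removed in recent TensorFlow releases; use argmax over the predicted probabilities
y_pred = np.argmax(model.predict(X_test), axis=1)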

y_pred = pd.get_dummies(y_pred, columns = [0])

matrix_keras = confusion_matrix(
    y_test.to_numpy().argmax(axis=1), y_pred.to_numpy().argmax(axis=1))

matrix_keras



array([[113,   0,   0,   0,   0,   2,   2,   0,   1,   0],
       [  0, 100,   1,   0,   1,   0,   0,   0,   0,   0],
       [  2,   0,  94,   1,   2,   0,   0,   1,   2,   0],
       [  0,   0,   2,  87,   0,   6,   0,   0,   1,   1],
       [  0,   0,   0,   0,  98,   0,   0,   0,   1,   4],
       [  0,   1,   0,   3,   0,  97,   1,   0,   3,   0],
       [  0,   0,   1,   0,   0,   2,  96,   0,   0,   0],
       [  0,   1,   0,   1,   1,   0,   0,  96,   0,   5],
       [  0,   3,   1,   3,   1,   0,   0,   0,  74,   0],
       [  1,   0,   1,   0,   2,   0,   0,   3,   1,  80]])

### Comparison of the two confusion matrices

In [16]:
print("Confusion Matrix model from scratch \n",matrix, "\n Confusion Matrix keras model \n", matrix_keras)

Confusion Matrix model from scratch 
 [[114   0   1   0   0   1   1   0   1   0]
 [  0 100   1   0   0   0   0   0   1   0]
 [  3   1  89   1   3   0   0   2   3   0]
 [  0   1   6  88   0   1   0   0   1   0]
 [  1   0   0   0  95   0   0   0   1   6]
 [  1   2   1   4   1  89   2   0   5   0]
 [  0   0   1   0   0   2  96   0   0   0]
 [  0   2   0   0   4   0   0  96   0   2]
 [  0   2   4   2   1   1   2   0  70   0]
 [  1   0   0   0   1   0   1   2   1  82]] 
 Confusion Matrix keras model 
 [[113   0   0   0   0   2   2   0   1   0]
 [  0 100   1   0   1   0   0   0   0   0]
 [  2   0  94   1   2   0   0   1   2   0]
 [  0   0   2  87   0   6   0   0   1   1]
 [  0   0   0   0  98   0   0   0   1   4]
 [  0   1   0   3   0  97   1   0   3   0]
 [  0   0   1   0   0   2  96   0   0   0]
 [  0   1   0   1   1   0   0  96   0   5]
 [  0   3   1   3   1   0   0   0  74   0]
 [  1   0   1   0   2   0   0   3   1  80]]
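
To put numbers on the comparison, overall accuracy and per-class recall can be read off the two matrices printed above. The sketch below copies them in by hand and assumes the scikit-learn convention (rows are true classes, columns are predicted classes). The helper names `accuracy` and `recall_per_class` are introduced here for illustration.

```python
import numpy as np

# Confusion matrix of the model from scratch, copied from the output above.
matrix = np.array([
    [114, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    [0, 100, 1, 0, 0, 0, 0, 0, 1, 0],
    [3, 1, 89, 1, 3, 0, 0, 2, 3, 0],
    [0, 1, 6, 88, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 95, 0, 0, 0, 1, 6],
    [1, 2, 1, 4, 1, 89, 2, 0, 5, 0],
    [0, 0, 1, 0, 0, 2, 96, 0, 0, 0],
    [0, 2, 0, 0, 4, 0, 0, 96, 0, 2],
    [0, 2, 4, 2, 1, 1, 2, 0, 70, 0],
    [1, 0, 0, 0, 1, 0, 1, 2, 1, 82],
])

# Confusion matrix of the Keras model, copied from the output above.
matrix_keras = np.array([
    [113, 0, 0, 0, 0, 2, 2, 0, 1, 0],
    [0, 100, 1, 0, 1, 0, 0, 0, 0, 0],
    [2, 0, 94, 1, 2, 0, 0, 1, 2, 0],
    [0, 0, 2, 87, 0, 6, 0, 0, 1, 1],
    [0, 0, 0, 0, 98, 0, 0, 0, 1, 4],
    [0, 1, 0, 3, 0, 97, 1, 0, 3, 0],
    [0, 0, 1, 0, 0, 2, 96, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 96, 0, 5],
    [0, 3, 1, 3, 1, 0, 0, 0, 74, 0],
    [1, 0, 1, 0, 2, 0, 0, 3, 1, 80],
])

def accuracy(m):
    # Correct predictions sit on the diagonal.
    return np.trace(m) / m.sum()

def recall_per_class(m):
    # Each row sums to the number of true examples of that class.
    return np.diag(m) / m.sum(axis=1)

acc_scratch = accuracy(matrix)
acc_keras = accuracy(matrix_keras)
print("Accuracy from scratch:", acc_scratch)
print("Accuracy Keras:", acc_keras)
print("Recall per class (from scratch):", np.round(recall_per_class(matrix), 3))
print("Recall per class (Keras):", np.round(recall_per_class(matrix_keras), 3))
```

The from-scratch matrix reproduces the 0.919 accuracy printed earlier, and the Keras matrix works out to 935 correct predictions out of 1000.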
