## Title: 2-layered Neural Network

### Date: November 29, 2019
### Author: Nikhil Singh
### Details: The entire neural network is implemented from scratch using NumPy arrays and has been tested on a 3-input XOR gate.
---

### Basics of Neural Network

The output $\widehat y$ of a simple 2-layer Neural Network is:

$$\widehat y = \sigma(W_{2}\sigma(W_{1}x + b_{1}) + b_{2})$$


> Each iteration of the training process consists of the following steps:
> - Calculating the predicted output $\widehat y$, known as feedforward
> - Updating the weights and biases, known as backpropagation
    
-------    
### Feedforward
$$\widehat y = \sigma(W_{2}\sigma(W_{1}x + b_{1}) + b_{2})$$
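
As a minimal sketch of this formula, the snippet below evaluates it for one input written as a column vector. The layer sizes (3 inputs, 4 hidden units, 1 output), the random weights, and the input values are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed, illustrative only

# Hypothetical sizes: 3 inputs, 4 hidden units, 1 output
W1 = rng.uniform(size=(4, 3))
b1 = rng.uniform(size=(4, 1))
W2 = rng.uniform(size=(1, 4))
b2 = rng.uniform(size=(1, 1))

x = np.array([[0.0], [1.0], [1.0]])  # one input as a column vector

h = sigmoid(W1 @ x + b1)      # hidden activation, shape (4, 1)
y_hat = sigmoid(W2 @ h + b2)  # network output, shape (1, 1)

print(h.shape, y_hat.shape)
```

The Python code later in this notebook stores one sample per row instead, so it computes `np.dot(X, weights_hidden)` with transposed weight shapes; the two conventions describe the same network.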

-------    
### Loss Function
$$\text{Sum of Squares Error} = \sum_{i=1}^{n} (y-\widehat y)^2$$
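
A short sketch of how this error is computed with NumPy. The target and prediction values are made up for the example.

```python
import numpy as np

y = np.array([0.0, 1.0, 1.0, 0.0])      # targets (made-up values)
y_hat = np.array([0.1, 0.8, 0.6, 0.3])  # predictions (made-up values)

# Sum of squared differences: 0.01 + 0.04 + 0.16 + 0.09, approximately 0.30
sse = np.sum((y - y_hat) ** 2)
print(sse)
```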

-------    
### Backpropagation
- Backpropagation needs the derivative of the loss function w.r.t. the weights and biases. The issue is that we can't calculate this derivative directly, because the equation of the loss function doesn't contain the weights and biases explicitly; they enter only through $\widehat y$.

> Chain rule, with $z = Wx + b$ and $\widehat y = \sigma(z)$:
 $$ Loss(y,\widehat y) = \sum_{i=1}^{n}(y-\widehat y)^2$$
 $$ \frac{\partial Loss(y,\widehat y)}{\partial W} = \frac{\partial Loss(y, \widehat y)}{\partial \widehat y} \cdot \frac{\partial \widehat y}{\partial z} \cdot \frac{\partial z}{\partial W}$$
 $$ = -2(y - \widehat y) \cdot \sigma'(z) \cdot x$$
 $$ = -2(y - \widehat y) \cdot \widehat y(1-\widehat y) \cdot x$$
 
 _The derivation above is the partial derivative for a 1-layer NN, written for a single training example._
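
For a single sigmoid neuron with $\widehat y = \sigma(w \cdot x + b)$ and loss $(y - \widehat y)^2$, the chain rule gives $-2(y - \widehat y)\,\widehat y(1 - \widehat y)\,x$. The sketch below checks that expression against a finite-difference approximation. The input, target, and weight values are made up for the check, and the helper `loss` is a name introduced only for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # Squared error of a single sigmoid neuron on one example
    y_hat = sigmoid(np.dot(w, x) + b)
    return (y - y_hat) ** 2

# Made-up values for one training example
x = np.array([0.0, 1.0, 1.0])
y = 1.0
w = np.array([0.2, -0.4, 0.7])
b = 0.1

# Analytic gradient from the chain rule
y_hat = sigmoid(np.dot(w, x) + b)
grad_analytic = -2.0 * (y - y_hat) * y_hat * (1.0 - y_hat) * x

# Central finite-difference approximation, one weight at a time
eps = 1e-6
grad_numeric = np.zeros_like(w)
for i in range(w.size):
    step = np.zeros_like(w)
    step[i] = eps
    grad_numeric[i] = (loss(w + step, b, x, y) - loss(w - step, b, x, y)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-8))
```

Gradient descent subtracts this gradient from the weights. The Python code in this notebook gets the same direction by adding `error * slope`, with `error = y - output_layer_activation`, so the minus sign is absorbed into the update and the constant factor 2 into the learning rate.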

--------

## Python code for the above, written from scratch

### __Objective:__ Implement a 3-input XOR gate with a 2-layer Neural Network and get as close to 100% accuracy as possible

#### All the necessary functions are coded below:

In [12]:
import numpy as np

def sigmoid(x):
    return 1.0/(1+np.exp(-x))

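def sigmoid_derivative(x):
    # x is expected to be a sigmoid output, i.e. x = sigmoid(z), so that
    # x * (1 - x) equals the derivative of the sigmoid at z
    return x * (1.0 - x)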

def initialize_wt_bias(input_layer_neurons, hidden_layer_neurons, output_neurons):
    # for hidden layer
    weights_hidden = np.random.uniform(size = (input_layer_neurons, hidden_layer_neurons))
    bias_hidden = np.random.uniform(size = (1, hidden_layer_neurons))
    
    # for output layer (a single neuron here; the same code also works for multiple output neurons)
    weights_output = np.random.uniform(size = (hidden_layer_neurons, output_neurons))
    bias_output = np.random.uniform(size = (1, output_neurons))
    return weights_hidden, bias_hidden, weights_output, bias_output
    
def feedforward(X, weights_hidden, bias_hidden, weights_output, bias_output):
    # Forward propagation
    
    # Hidden layer: dot product of the inputs and hidden weights, plus the hidden bias
    hidden_layer_ip = np.dot(X, weights_hidden) + bias_hidden
    hidden_layer_activation = sigmoid(hidden_layer_ip)
    
    output_layer_ip = np.dot(hidden_layer_activation, weights_output) + bias_output
    output_layer_activation = sigmoid(output_layer_ip)
    return hidden_layer_activation, output_layer_activation
    
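def loss_function(y, output_layer_activation):
    # Returns the per-sample error (y - y_hat), not the summed squared loss;
    # backpropogation() uses this error term directly
    error = y - output_layer_activation
    return error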

def backpropogation(X, hidden_layer_activation, output_layer_activation, error, weights_output, bias_output, weights_hidden, bias_hidden, learning_rate = 0.1):
    # slope from output layer
    slope_output_layer = sigmoid_derivative(output_layer_activation)
    d_output = error * slope_output_layer
    
    # slope from hidden layer
    slope_hidden_layer = sigmoid_derivative(hidden_layer_activation)
    error_hidden_layer = np.dot(d_output, weights_output.T)
    d_hidden_layer = error_hidden_layer * slope_hidden_layer
    
    # Updating output weights & bias matrix
    weights_output += np.dot(hidden_layer_activation.T, d_output) * learning_rate
    bias_output += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    
    # Updating hidden layer weights & bias matrix
    weights_hidden += np.dot(X.T, d_hidden_layer) * learning_rate
    bias_hidden += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate
    
    return weights_hidden, bias_hidden, weights_output, bias_output
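
One detail worth illustrating: `sigmoid_derivative` takes the sigmoid *output* as its argument, not the pre-activation value. The sketch below redefines the two helpers so that it is self-contained and compares both usages against a finite-difference estimate. The test values in `z` are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # a is assumed to already be sigmoid(z)
    return a * (1.0 - a)

z = np.array([-2.0, 0.0, 2.0])  # arbitrary pre-activation values
a = sigmoid(z)

correct = sigmoid_derivative(a)  # derivative evaluated from the activation
wrong = sigmoid_derivative(z)    # misuse: passing the pre-activation

# Central finite difference of sigmoid as a reference
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.allclose(correct, numeric), np.allclose(wrong, numeric))
```

This is why `backpropogation` passes `output_layer_activation` and `hidden_layer_activation` to `sigmoid_derivative` rather than the values before the sigmoid is applied.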

In [13]:
np.random.seed(123)

#### The truth table for a 3-input XOR (Exclusive-OR) gate looks like this

| X1 | X2 | X3 | y |
|:--:|:--:|:--:|:-:|
|  0 |  0 |  0 | 0 |
|  0 |  0 |  1 | 1 |
|  0 |  1 |  0 | 1 |
|  0 |  1 |  1 | 0 |
|  1 |  0 |  0 | 1 |
|  1 |  0 |  1 | 0 |
|  1 |  1 |  0 | 0 |
|  1 |  1 |  1 | 1 |

> I'll use _X1, X2 & X3_ as inputs and _y_ as output
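
The table can also be generated rather than typed by hand, since a 3-input XOR outputs 1 exactly when an odd number of inputs are 1. A minimal sketch using `itertools.product`, which is one possible approach and not the one used in the cells that follow:

```python
import itertools
import numpy as np

# All 8 combinations of three binary inputs, in truth-table order
X = np.array(list(itertools.product([0, 1], repeat=3)))

# 3-input XOR is 1 when an odd number of inputs are 1 (parity)
y = (X.sum(axis=1) % 2).reshape(-1, 1)

# Each row is X1, X2, X3, y
print(np.hstack([X, y]))
```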

### Experiment - 1

#### Defining input data

In [14]:
X=np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]])
print ('\n Input:')
print(X)


 Input:
[[0 0 0]
 [0 0 1]
 [0 1 0]
 [0 1 1]
 [1 0 0]
 [1 0 1]
 [1 1 0]
 [1 1 1]]


#### Actual Output

In [15]:
y = np.array([[0],[1],[1],[0],[1],[0],[0],[1]])
print('\n Actual Output:')
print(y)


 Actual Output:
[[0]
 [1]
 [1]
 [0]
 [1]
 [0]
 [0]
 [1]]


#### Initializing basic arguments

In [16]:
epoch = 10000
learning_rate = 0.1
input_layer_neurons = X.shape[1]
hidden_layer_neurons = 8
output_neurons = 1

#### Initializing Weights & biases

In [17]:
weights_hidden, bias_hidden, weights_output, bias_output = initialize_wt_bias(input_layer_neurons, hidden_layer_neurons, output_neurons)

#### Training the neural network by calling the functions defined above

In [18]:
for i in range(epoch):
    
    if i%1000 == 0:
        print("Epoch {}".format(i))
    
    hidden_layer_activation, output_layer_activation = feedforward(X, weights_hidden, bias_hidden, weights_output, bias_output)
    error = loss_function(y, output_layer_activation)
    weights_hidden, bias_hidden, weights_output, bias_output = backpropogation(X, hidden_layer_activation, output_layer_activation, error, weights_output, bias_output, weights_hidden, bias_hidden, learning_rate)

Epoch 0
Epoch 1000
Epoch 2000
Epoch 3000
Epoch 4000
Epoch 5000
Epoch 6000
Epoch 7000
Epoch 8000
Epoch 9000


#### Output probabilities

In [19]:
print ('\n Output from the model:')
print (output_layer_activation)


 Output from the model:
[[0.09779328]
 [0.88350865]
 [0.87770153]
 [0.26781581]
 [0.88290741]
 [0.26951997]
 [0.26971992]
 [0.48599104]]


#### Model predicted output array

In [20]:
y_hat = np.where(output_layer_activation>0.5,1,0)
y_hat

array([[0],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0]])

#### Accuracy

In [21]:
accuracy = (y == y_hat).mean() * 100
print("Accuracy = {}".format(accuracy))

Accuracy = 87.5


### Experiment - 2 (epochs = 15000)

In [22]:
# Initializing basic arguments
epoch = 15000
learning_rate = 0.1
input_layer_neurons = X.shape[1]
hidden_layer_neurons = 8
output_neurons = 1

# Initializing weights and biases
weights_hidden, bias_hidden, weights_output, bias_output = initialize_wt_bias(input_layer_neurons, hidden_layer_neurons, output_neurons)

# Running the model
for i in range(epoch):
    
    if i%1000 == 0:
        print("Epoch {}".format(i))
    
    hidden_layer_activation, output_layer_activation = feedforward(X, weights_hidden, bias_hidden, weights_output, bias_output)
    error = loss_function(y, output_layer_activation)
    weights_hidden, bias_hidden, weights_output, bias_output = backpropogation(X, hidden_layer_activation, output_layer_activation, error, weights_output, bias_output, weights_hidden, bias_hidden, learning_rate)
    
    
# Output probabilities
print ('\n Output from the model:')
print (output_layer_activation)

# Creating model predicted y-matrix
y_hat = np.where(output_layer_activation>0.5,1,0)

# Calculating Accuracy
accuracy = (y == y_hat).mean() * 100
print("Accuracy = {}".format(accuracy))

Epoch 0
Epoch 1000
Epoch 2000
Epoch 3000
Epoch 4000
Epoch 5000
Epoch 6000
Epoch 7000
Epoch 8000
Epoch 9000
Epoch 10000
Epoch 11000
Epoch 12000
Epoch 13000
Epoch 14000

 Output from the model:
[[0.0178696 ]
 [0.97641074]
 [0.977834  ]
 [0.04015204]
 [0.96394376]
 [0.03314883]
 [0.03616036]
 [0.95321119]]
Accuracy = 100.0


> After running the code for _10000_ epochs, which takes no more than 2 seconds, I get an accuracy of 87.5%

> In case of _15000_ epochs, I was able to achieve 100% accuracy.
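
Note that the seed is set once, before the first experiment, so the second experiment starts from different random weights than the first. To separate the effect of the epoch count from the effect of the initialization, one option is to re-seed before every run. The sketch below is a compact re-implementation of the same 3-8-1 network under that assumption. The helper `train_xor` is a name introduced here, and the resulting accuracies are not stated because they depend on the run.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_xor(epochs, seed, hidden=8, lr=0.1):
    """Train a 3-input XOR network and return its accuracy in percent."""
    X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
                  [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
    y = np.array([[0], [1], [1], [0], [1], [0], [0], [1]])

    np.random.seed(seed)  # re-seed so every run starts from the same weights
    w_h = np.random.uniform(size=(3, hidden))
    b_h = np.random.uniform(size=(1, hidden))
    w_o = np.random.uniform(size=(hidden, 1))
    b_o = np.random.uniform(size=(1, 1))

    for _ in range(epochs):
        # Forward pass
        h = sigmoid(X @ w_h + b_h)
        out = sigmoid(h @ w_o + b_o)
        # Backward pass (same update rule as backpropogation above)
        d_out = (y - out) * out * (1.0 - out)
        d_h = (d_out @ w_o.T) * h * (1.0 - h)
        w_o += lr * (h.T @ d_out)
        b_o += lr * d_out.sum(axis=0, keepdims=True)
        w_h += lr * (X.T @ d_h)
        b_h += lr * d_h.sum(axis=0, keepdims=True)

    # Final forward pass with the trained weights
    out = sigmoid(sigmoid(X @ w_h + b_h) @ w_o + b_o)
    y_hat = np.where(out > 0.5, 1, 0)
    return (y == y_hat).mean() * 100

# Same seed for both runs, so only the number of epochs differs
results = {epochs: train_xor(epochs, seed=123) for epochs in (10000, 15000)}
for epochs, acc in results.items():
    print("epochs = {}: accuracy = {}".format(epochs, acc))
```

Repeating the comparison over several seeds would show how much of the difference between the two experiments comes from initialization rather than from training length.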

## Script Over