In [None]:
import numpy as np # linear algebra

# Continuing from Single Layer Perceptron (Part 1)...
Can we map the inputs to the output for the following data?

| No. | Input | Output |
| --- | ----- | --- |
| 1 | 0 0 1 | 0 |
| 2 | 1 1 0 | 0 |
| 3 | 1 0 1 | 1 |
| 4 | 0 1 1 | 1 |

<br>
- Looking closely, no single input column determines the output on its own
- However, the combination of the first two input columns does determine the output
- If exactly one of the first two columns is 1, the output is 1; if both are 0 or both are 1, the output is 0 (this is the XOR pattern)
- Our ANN will have to learn this pattern
- To make this happen, we assign a separate weight to every element of the input, so that every element can contribute to pattern identification
- In addition, we introduce one hidden layer this time
- This hidden layer helps us deal with the complexity of the problem
- The weights between the Input layer and the Hidden layer map patterns among the input elements
- The weights between the Hidden layer and the Output layer decide the final output based on those patterns
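
The relationship between the first two columns and the output can be checked directly. The following is a minimal sketch, assuming the four rows from the table above; the names `rows`, `targets`, and `xor_of_first_two` are introduced here for illustration only and are not used elsewhere in the notebook.

```python
import numpy as np

# The four samples from the table: input columns and expected outputs
rows = np.array([[0, 0, 1],
                 [1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 1]])
targets = np.array([0, 0, 1, 1])

# XOR of the first two columns: 1 when exactly one of them is 1
xor_of_first_two = np.logical_xor(rows[:, 0], rows[:, 1]).astype(int)
print(xor_of_first_two)                           # [0 0 1 1]
print(np.array_equal(xor_of_first_two, targets))  # True

# No single column reproduces the output on its own
single_column_matches = [np.array_equal(rows[:, col], targets)
                         for col in range(rows.shape[1])]
print(single_column_matches)                      # [False, False, False]
```

The third column is not needed to compute the output, so the network has to learn to rely on the first two columns together.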

# Let's get started

In [None]:
input = [[0,0,1],[0,1,1],[1,0,1],[1,1,0]]
output = [0, 1, 1, 0]

In [None]:
X = np.array(input)
print(X.shape)
X

In [None]:
Y = np.array(output).reshape(4,1)
print(Y.shape)
Y

In [None]:
# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(42)

In [None]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

In [None]:
def sigmoid_derivative(output):
    return output * (1 - output)
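
Note that `sigmoid_derivative` expects the value already returned by `sigmoid`, not the raw input `x`. As a sanity check, the sketch below compares it against a central finite difference; the names `x_check`, `eps`, `numeric`, and `analytic` are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(output):
    # expects output = sigmoid(x), not x itself
    return output * (1 - output)

x_check = np.linspace(-3, 3, 7)  # a few sample points
eps = 1e-6                       # finite-difference step size

# slope of sigmoid estimated numerically
numeric = (sigmoid(x_check + eps) - sigmoid(x_check - eps)) / (2 * eps)
# slope computed from the sigmoid output
analytic = sigmoid_derivative(sigmoid(x_check))

print(np.allclose(numeric, analytic, atol=1e-8))  # True
print(sigmoid_derivative(sigmoid(0.0)))           # 0.25
```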

In [None]:
# randomly initialize our weights 
synapse0 = 2 * np.random.random((3,4)) - 1 # weights between Input layer and Hidden layer
synapse0.shape

Here, 3 is the number of elements in each input sample, while 4 is the number of neurons in the Hidden layer. (It happens to equal the number of input samples, but the two are unrelated.)

In [None]:
synapse1 = 2 * np.random.random((4,1)) - 1 # weights between Hidden layer and Output layer

Here, 4 is the number of neurons in the Hidden layer (one weight per hidden neuron), and 1 is the number of output neurons.
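
To see how the weight shapes fit together, here is a small sketch that only tracks matrix shapes. All names (`n_samples`, `n_hidden`, `X_demo`, `w0_demo`, and so on) are placeholders for this illustration, and the arrays hold zeros because only the shapes matter here.

```python
import numpy as np

# n_hidden is a free design choice; it does not have to equal n_samples
n_samples, n_inputs, n_hidden, n_outputs = 4, 3, 4, 1

X_demo = np.zeros((n_samples, n_inputs))   # 4 x 3, one row per sample
w0_demo = np.zeros((n_inputs, n_hidden))   # 3 x 4, Input -> Hidden weights
w1_demo = np.zeros((n_hidden, n_outputs))  # 4 x 1, Hidden -> Output weights

hidden = X_demo @ w0_demo                  # (4 x 3) dot (3 x 4) = 4 x 4
out = hidden @ w1_demo                     # (4 x 4) dot (4 x 1) = 4 x 1
print(hidden.shape, out.shape)             # (4, 4) (4, 1)

# With 7 hidden neurons and the same 4 samples, only the second dimension changes
w0_wide = np.zeros((n_inputs, 7))
print((X_demo @ w0_wide).shape)            # (4, 7)
```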

In [None]:
for j in range(50000):

    # Feed forward
    l0 = X                             # Input layer - 4 x 3 matrix
    l1 = sigmoid(np.dot(l0,synapse0))  # Hidden layer - 4 x 3 dot 3 x 4 = 4 X 4 matrix
    l2 = sigmoid(np.dot(l1,synapse1))  # Output layer - 4 x 4 dot 4 x 1 = 4 X 1 matrix
    
    # how much did we miss the target value?
    l2_error = l2 - Y
    
    # report the mean absolute error every 10000 iterations
    if j % 10000 == 0:
        print("Error: {}".format(np.mean(np.abs(l2_error))))
    
    # error_weighted_derivative at Output layer
    error_weighted_derivative2 = l2_error * sigmoid_derivative(l2)
    synapse_derivative1 = np.dot(l1.T, error_weighted_derivative2)
              
    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = np.dot(error_weighted_derivative2, synapse1.T)
    
    # error_weighted_derivative at Hidden layer
    error_weighted_derivative1 = l1_error * sigmoid_derivative(l1)
    synapse_derivative0 = np.dot(l0.T, error_weighted_derivative1)

    # learning: gradient descent step (the learning rate is implicitly 1)
    synapse1 -= synapse_derivative1
    synapse0 -= synapse_derivative0
        
print("Output After Training:")
l2 
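
The updates in the loop above match gradient descent on half the sum of squared errors, with a learning rate of 1. The notebook does not name its loss, so treat that as an assumption of this sketch. It checks the same backpropagation formulas against numerical gradients; the helper names (`loss`, `backprop_gradients`, `numeric_gradient`) and the variables `X_chk`, `Y_chk`, `w0`, `w1` are hypothetical and kept separate from the notebook's own variables.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(w0, w1, X, Y):
    # assumed loss: half the sum of squared errors
    l1 = sigmoid(X @ w0)
    l2 = sigmoid(l1 @ w1)
    return 0.5 * np.sum((l2 - Y) ** 2)

def backprop_gradients(w0, w1, X, Y):
    # same formulas as in the training loop
    l1 = sigmoid(X @ w0)
    l2 = sigmoid(l1 @ w1)
    delta2 = (l2 - Y) * l2 * (1 - l2)         # error_weighted_derivative2
    grad_w1 = l1.T @ delta2                   # synapse_derivative1
    delta1 = (delta2 @ w1.T) * l1 * (1 - l1)  # error_weighted_derivative1
    grad_w0 = X.T @ delta1                    # synapse_derivative0
    return grad_w0, grad_w1

def numeric_gradient(f, w, eps=1e-5):
    # central differences, perturbing one weight at a time (in place)
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        original = w[idx]
        w[idx] = original + eps
        plus = f()
        w[idx] = original - eps
        minus = f()
        w[idx] = original
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.RandomState(0)
X_chk = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
Y_chk = np.array([[0], [1], [1], [0]], dtype=float)
w0 = 2 * rng.random_sample((3, 4)) - 1
w1 = 2 * rng.random_sample((4, 1)) - 1

g0, g1 = backprop_gradients(w0, w1, X_chk, Y_chk)
n0 = numeric_gradient(lambda: loss(w0, w1, X_chk, Y_chk), w0)
n1 = numeric_gradient(lambda: loss(w0, w1, X_chk, Y_chk), w1)
print(np.allclose(g0, n0, atol=1e-7), np.allclose(g1, n1, atol=1e-7))  # True True
```

A check like this is a common way to catch sign or transpose mistakes in hand-written backpropagation before running a long training loop.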

# End notes...
- Understanding the Single Layer Perceptron first made the Multi Layer Perceptron easier to follow
- There are still a lot of questions to be answered; we will take them up in upcoming posts
