# Multi-layer Neural Network
This is a very early attempt at building 2-layer and 3-layer neural networks that can easily be traced and understood.
The networks compute very simple predictions on a very small dataset, following [this](http://iamtrask.github.io/2015/07/12/basic-python-network/) tutorial by iamtrask.

## 2 Layer Neural Net

|Input  |Output|
|-------|------|
|[0,0,1]|0     |
|[1,1,1]|1     |
|[1,0,1]|1     |
|[0,1,1]|0     |

We can see that the leftmost input column is perfectly correlated with the output.
We therefore expect backpropagation to pick up on statistics like this when it trains the model.
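
As a quick sanity check of that claim, the sketch below measures the Pearson correlation between each input column and the output with `np.corrcoef`. The helper name `column_correlations` is hypothetical and not part of the tutorial. The third column is constant (it acts like a bias input), so its correlation is undefined and is reported as `nan`.

```python
import numpy as np

# Same dataset as in this section
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

def column_correlations(X, y):
    """Pearson correlation of each input column with the target (hypothetical helper)."""
    target = y.ravel()
    result = []
    for col in X.T:
        if col.std() == 0:
            # a constant column has no variance, so the correlation is undefined
            result.append(float("nan"))
        else:
            result.append(float(np.corrcoef(col, target)[0, 1]))
    return result

corrs = column_correlations(X, y)
print(corrs)
```

The first entry comes out as 1 (up to floating-point rounding) and the second as 0, which matches what the table suggests by eye.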

In [3]:
import numpy as np

# sigmoid function
# maps any value to a value between 0 and 1.
# use it to convert numbers to probabilities.
def sigmoid(x, deriv=False):
    # with deriv=True, return the slope of the sigmoid;
    # x must then already be a sigmoid output (e.g. l1)
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# Data 
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])
          
y = np.array([[0,0,1,1]]).T

# synapse zero:  weight matrix
# - Only one weight matrix, because a single layer of weights connects l0 to l1
# - Its dimension is (3,1) because we have 3 inputs and 1 output.
syn0 = 2 * np.random.random((3,1)) - 1

for _ in range(10000):

    # make a prediction
    l0 = X
    l1 = sigmoid(np.dot(l0, syn0))
    # l1 is a 4 by 1 vector holding the prediction for each input row

    # calculate a simple linear error
    l1_error = y - l1

    # multiply the error by the
    # slope of the sigmoid at the values in l1.
    # The slope is steepest when l1 is near 0.5 (low confidence),
    # so those predictions get the largest weight updates
    l1_delta = l1_error * sigmoid(l1, True)

    # update weights
    syn0 += np.dot(l0.T, l1_delta)

print("Output After Training:\n{}".format(l1))

Output After Training:
[[ 0.00966968]
 [ 0.00786504]
 [ 0.99358967]
 [ 0.99211616]]


|Variable|Definition|
|--------|----------|
|X|Input dataset matrix where each row is a training example|
|y|Output dataset matrix where each row is a training example|
|l0|First Layer of the Network, specified by the input data|
|l1|Second Layer of the Network, which is the output layer here and holds our predictions|
|syn0|First layer of weights, Synapse 0, connecting l0 to l1|
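
One detail that is easy to miss: with `deriv=True`, `sigmoid` expects a value that has already been passed through the sigmoid (such as `l1`), not the raw weighted sum. This works because the slope of the sigmoid can be written in terms of its own output, `s * (1 - s)`. The sketch below checks this numerically against a central finite difference. The test points `z` and the step `eps` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x, deriv=False):
    if deriv:
        # x is assumed to already be a sigmoid output
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 0.5, 3.0])   # arbitrary pre-activation values
out = sigmoid(z)                       # sigmoid outputs, like l1 in the loop

analytic = sigmoid(out, deriv=True)    # slope computed from the outputs
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-8))
```

Passing the raw sum `z` with `deriv=True` instead of `out` would silently give a different (wrong) slope, so the calling convention matters.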

## 3 Layer Neural Net
Our first layer of weights will combine the inputs, and our second layer will then map those combinations to the output, using the output of the first layer as its input.

|Input  |Output|
|-------|------|
|[0,0,1]|0     |
|[0,1,1]|1     |
|[1,0,1]|1     |
|[1,1,1]|0     |
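
To see why a hidden layer is needed, note that the targets used by the code in this section (`y = [0, 1, 1, 0]`) are the XOR of the first two input columns, which a single layer of weights cannot separate. The sketch below reuses the 2-layer training loop on those targets. The seed and the name `worst_miss` are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])  # XOR of the first two columns

np.random.seed(0)  # arbitrary seed, only so that runs are repeatable
syn0 = 2 * np.random.random((3, 1)) - 1

for _ in range(10000):
    l1 = sigmoid(np.dot(X, syn0))
    l1_delta = (y - l1) * sigmoid(l1, deriv=True)
    syn0 += np.dot(X.T, l1_delta)

# largest distance between a prediction and its target
worst_miss = np.max(np.abs(y - l1))
print("Largest miss:", worst_miss)
```

No matter how long this runs, at least one prediction stays 0.5 or more away from its target, and in practice the predictions tend to hover near 0.5. The extra layer is what lets the network combine the first two columns before mapping them to the output.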

In [10]:
import numpy as np

def nonlin(x, deriv=False):
    # sigmoid nonlinearity; with deriv=True, x must already be a sigmoid output
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
    
X = np.array([[0,0,1],
            [0,1,1],
            [1,0,1],
            [1,1,1]])
                
y = np.array([[0],
              [1],
              [1],
              [0]])

# randomly initialize our weights with mean 0
# syn0 maps the 3 inputs to 4 hidden units, syn1 maps the 4 hidden units to 1 output
syn0 = 2 * np.random.random((3,4)) - 1
syn1 = 2 * np.random.random((4,1)) - 1

for j in range(60000):

    # make a prediction
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))
    l2 = nonlin(np.dot(l1,syn1))

    # calculate the error
    l2_error = y - l2
    
    # print the mean absolute error every 10,000 steps to check that training is working
    if (j % 10000) == 0:
        print("Error: {}".format(np.mean(np.abs(l2_error))))
        
    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error * nonlin(l2, deriv = True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)
    
    # in what direction is the target l1?
    l1_delta = l1_error * nonlin(l1, deriv = True)

    # update both weight matrices
    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

print("\nOutput After Training:\n{}".format(l2))

Error: 0.500000030718
Error: 0.0100930169264
Error: 0.0069060210652
Error: 0.00555848815644
Error: 0.00477250290804
Error: 0.00424329637213

Output After Training:
[[ 0.00411882]
 [ 0.99619376]
 [ 0.99617695]
 [ 0.00367733]]
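
Because the weights start out random, the exact numbers above differ from run to run. A minimal sketch of a repeatable version follows. The `train` and `predict` helpers and the fixed seed are assumptions added for illustration and are not part of the original code.

```python
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def train(X, y, hidden=4, steps=60000, seed=1):
    """Train the 3-layer net and return both weight matrices (hypothetical helper)."""
    np.random.seed(seed)  # fixed seed is an assumption, not in the original code
    syn0 = 2 * np.random.random((X.shape[1], hidden)) - 1
    syn1 = 2 * np.random.random((hidden, 1)) - 1
    for _ in range(steps):
        # forward pass
        l1 = nonlin(np.dot(X, syn0))
        l2 = nonlin(np.dot(l1, syn1))
        # backward pass, same order as the loop above
        l2_delta = (y - l2) * nonlin(l2, deriv=True)
        l1_delta = l2_delta.dot(syn1.T) * nonlin(l1, deriv=True)
        syn1 += l1.T.dot(l2_delta)
        syn0 += X.T.dot(l1_delta)
    return syn0, syn1

def predict(X, syn0, syn1):
    """Forward pass only (hypothetical helper)."""
    return nonlin(np.dot(nonlin(np.dot(X, syn0)), syn1))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

syn0, syn1 = train(X, y)
print(np.round(predict(X, syn0, syn1), 3))
```

Keeping the forward pass in its own function also makes it possible to query the trained network without running the training loop again.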


|Variable|Definition|
|--------|----------|
|X|Input dataset matrix where each row is a training example|
|y|Output dataset matrix where each row is a training example|
|l0|First Layer of the Network, specified by the input data|
|l1|Second Layer of the Network, otherwise known as the hidden layer|
|l2|Final Layer of the Network, which is our hypothesis, and should approximate the correct answer as we train.|
|syn0|First layer of weights, Synapse 0, connecting l0 to l1|
|syn1|Second layer of weights, Synapse 1, connecting l1 to l2.|
|l2_error|This is the amount that the neural network "missed".|
|l2_delta|This is the error of the network scaled by the confidence, that is, l2_error multiplied by the slope of the sigmoid at l2. Outputs that are already very confident (close to 0 or 1) are muted.|
|l1_error|Weighting l2_delta by the weights in syn1, we can calculate the error in the middle/hidden layer.|
|l1_delta|This is the l1 error of the network scaled by the confidence, that is, l1_error multiplied by the slope of the sigmoid at l1. Again, confident values are muted.|
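
To make the scaling in `l2_delta` concrete, here is a small worked example. The three output values are made up for illustration, and the target is 0 in every case, so the error is simply the negated output.

```python
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# hypothetical network outputs for a target of 0: unsure, fairly sure, very sure
l2 = np.array([0.5, 0.9, 0.99])
target = 0.0

l2_error = target - l2                 # how far off each output is
slope = nonlin(l2, deriv=True)         # 0.25, 0.09, 0.0099
l2_delta = l2_error * slope            # error scaled by the slope

for out, err, d in zip(l2, l2_error, l2_delta):
    print("output={:.2f} error={:+.2f} delta={:+.4f}".format(out, err, d))
```

The unsure output (0.5) gets the largest delta, while the output at 0.99 gets the smallest one even though its error is the largest. This is the muting effect described in the table.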