![Multi layered perceptron](https://upload.wikimedia.org/wikipedia/commons/c/c2/MultiLayerNeuralNetworkBigger_english.png)

# The Multi-Layered Perceptron

1. Introduction to Multi-layer networks
2. Activation functions (introducing sigmoids)
3. Sigmoid function Theory
4. Hidden layer activation
5. Sigmoid function implementation
6. Error Functions
7. Multilayered Perceptron basic Algorithm
8. Gradient Descent
9. Output layer Delta
10. Backpropagation
11. Implementation of multi-layer Perceptron with Python & numpy

# Introduction to Multi-layer networks

The diagram at the top of the workbook shows the main concepts of a multilayer perceptron:
- A hidden layer sits between the inputs and the output.
- Each neuron in the hidden layer has its own:
    - sum function
    - activation function.
- Every input is connected, through its own weight, to every neuron in the hidden layer, so the number of connections grows quickly.
- Each neuron in the hidden layer is connected to the output-layer neuron, and each of these connections also carries a weight.
- The output-layer neuron has its own sum function and activation function, which produce the final output of the network.
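As a minimal sketch of the bullets above (assuming, as in the rest of this workbook, sigmoid activations and no bias terms), the forward pass can be written one neuron at a time. `forward_pass` is a hypothetical helper name; the weights are the ones used in the XOR example later in this workbook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_pass(inputs, hidden_weights, output_weights):
    """Hypothetical helper: one forward pass, written neuron by neuron.

    hidden_weights[i][j]: weight from input i to hidden neuron j
    output_weights[j]:    weight from hidden neuron j to the output neuron
    """
    hidden_activations = []
    for j in range(len(output_weights)):
        # each hidden neuron has its own sum function ...
        total = sum(inputs[i] * hidden_weights[i][j] for i in range(len(inputs)))
        # ... and its own activation function
        hidden_activations.append(sigmoid(total))

    # the output neuron applies its own sum and activation to the hidden results
    output_sum = sum(a * w for a, w in zip(hidden_activations, output_weights))
    return sigmoid(output_sum)

# weights taken from the XOR example used later in this workbook
hidden_weights = [[-0.424, -0.740, -0.961],
                  [0.358, -0.577, -0.469]]
output_weights = [-0.017, -0.893, 0.148]

prediction = forward_pass([0, 0], hidden_weights, output_weights)
print(round(float(prediction), 4))  # 0.4059
```

Writing the loops out makes each neuron's sum and activation visible; later cells do the same work with `np.dot()`.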

# Activation functions (Introducing sigmoids)

- `Step functions` : output either 0 or 1, i.e. the values are stepped. This is the function we saw in the single-layer perceptron. 

- `Sigmoid functions` : output a value in the **_range of 0 to 1_** and can take any value in between (the output approaches 0 and 1 but never quite reaches them). The value is given by the equation $y = \frac{1}{1 + e^{-x}}$, which maps any input `x` onto a smooth S-shaped curve.
    - if `x` is high (large and positive), the value approaches 1. 
    - if `x` is low (large and negative), the value approaches 0. 
    
If we need outputs that can be negative, we can use the `hyperbolic tangent function` (tanh):
- $y = \frac{e^{x} - e^{-x} }{e^{x} + e^{-x}}$
- to evaluate it, substitute the value under evaluation for `x`; the result lies between `-1` and `1`. 
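To compare the three activation functions side by side, here is a small sketch. The step function's threshold of 0 is an assumption (the single-layer perceptron may have used a different threshold), and `tanh_formula` is a hypothetical name for the written-out equation; NumPy's own `np.tanh` is used only as a cross-check.

```python
import numpy as np

def step(x):
    # assumed step function: 1 when x >= 0, otherwise 0
    return np.where(x >= 0, 1, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh_formula(x):
    # hyperbolic tangent written out exactly as in the equation above
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for x in xs:
    print(f"x={x:5.1f}  step={step(x)}  sigmoid={sigmoid(x):.4f}  tanh={tanh_formula(x):.4f}")

# the written-out formula agrees with NumPy's built-in np.tanh
print(np.allclose(tanh_formula(xs), np.tanh(xs)))  # True
```

The step output jumps straight from 0 to 1, while the sigmoid and tanh outputs change smoothly, which is what later makes a derivative available for training.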

# Sigmoid function Theory

The irrational number e is also known as `Euler’s number`. It is approximately 2.718281 and is the base of the natural logarithm, ln (this means that if $x = \ln y = \log_e y$, then $e^x = y$). 

**Note: For real input, exp(x) is always positive.**
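A quick NumPy check of these facts: `np.exp` computes $e^x$ and `np.log` is the natural logarithm, so applying one after the other returns the original value. The value 7.5 is an arbitrary example.

```python
import numpy as np

e = np.exp(1)           # Euler's number, e^1
print(e)                # approximately 2.718281828459045

y = 7.5
x = np.log(y)           # np.log is the natural logarithm, ln y = log_e y
print(np.exp(x))        # e^x gives back y (up to floating-point rounding)

# for real input, exp is always positive, even for very negative x
print(np.exp(-20.0) > 0)  # True
```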

In [1]:
import numpy as np

In [2]:
# sigmoid activation; see above for Euler’s number details (np.exp(-x) is e^-x)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Let's see our sigmoid in action:

In [95]:
# sample testing the sigmoid function with a range of values. 
values = [-1, 0, 1, 3, 5, 30.5, -25.5]

for val in values:
    print(f"sigmoid({val})\t is:  {sigmoid(val)}")

sigmoid(-1)	 is:  0.2689414213699951
sigmoid(0)	 is:  0.5
sigmoid(1)	 is:  0.7310585786300049
sigmoid(3)	 is:  0.9525741268224334
sigmoid(5)	 is:  0.9933071490757153
sigmoid(30.5)	 is:  0.9999999999999432
sigmoid(-25.5)	 is:  8.423463754397692e-12


# Hidden Layer activation

We will use the 'XOR' operator as our case study for the multi-layer network; the truth table below is used as a reference. We will focus on the `feed-forward` process from the input layer to the hidden layer. 

![](https://static.javatpoint.com/tutorial/coa/images/logic-gates5.png)

## Two inputs, three neurons hidden layer
In this example we have: 
- two inputs (x, y), as we have had all along
- three hidden-layer neurons, each with its own sum and activation function


#### Outcomes 
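The tables below show the weighted sum computed by each hidden neuron (one row per neuron) for the four XOR samples: `x=0, y=0, class=0` (rows 1-3), `x=0, y=1, class=1` (rows 4-6), `x=1, y=0, class=1` (rows 7-9) and `x=1, y=1, class=0` (rows 10-12).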

| | Input 1 value | Input 1 weight | Input 2 value | Input 2 weight | Result |
|-|---------|----------|---------|----------|--------|
|1|0|-0.424|0|0.358| 0 * (-0.424) + 0 * 0.358 = 0.000|
|2|0|-0.740|0|-0.577| 0 * (-0.740) + 0 * (-0.577) = 0.000|
|3|0|-0.961|0|-0.469| 0 * (-0.961) + 0 * (-0.469) = 0.000|

| | Input 1 value | Input 1 weight | Input 2 value | Input 2 weight | Result |
|-|---------|----------|---------|----------|--------|
|4|0|-0.424|1|0.358| 0 * (-0.424) + 1 * 0.358 = 0.358|
|5|0|-0.740|1|-0.577| 0 * (-0.740) + 1 * (-0.577) = -0.577|
|6|0|-0.961|1|-0.469| 0 * (-0.961) + 1 * (-0.469) = -0.469|


| | Input 1 value | Input 1 weight | Input 2 value | Input 2 weight | Result |
|-|---------|----------|---------|----------|--------|
|7|1|-0.424|0|0.358| 1 * (-0.424) + 0 * 0.358 = -0.424|
|8|1|-0.740|0|-0.577| 1 * (-0.740) + 0 * (-0.577) = -0.740|
|9|1|-0.961|0|-0.469| 1 * (-0.961) + 0 * (-0.469) = -0.961|


| | Input 1 value | Input 1 weight | Input 2 value | Input 2 weight | Result |
|-|---------|----------|---------|----------|--------|
|10|1|-0.424|1|0.358| 1 * (-0.424) + 1 * 0.358 = -0.066|
|11|1|-0.740|1|-0.577| 1 * (-0.740) + 1 * (-0.577) = -1.317|
|12|1|-0.961|1|-0.469| 1 * (-0.961) + 1 * (-0.469) = -1.430|
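The twelve sums in the tables can be reproduced with a short loop. This sketch follows the tables directly: for each sample and each hidden neuron it multiplies every input value by its weight and adds the products.

```python
import numpy as np

# hidden-layer weights from the tables: row i = input (x, y), column j = hidden neuron
weights0 = np.array([[-0.424, -0.740, -0.961],
                     [ 0.358, -0.577, -0.469]])

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
sums = {}

for x, y in samples:
    neuron_sums = []
    for j in range(weights0.shape[1]):
        # sum function of hidden neuron j: input value * weight, summed over inputs
        total = sum(value * weights0[i, j] for i, value in enumerate((x, y)))
        neuron_sums.append(total)
    sums[(x, y)] = neuron_sums
    print((x, y), "->", ", ".join(f"{s:.3f}" for s in neuron_sums))
```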


#### Calling sigmoid() on the neuron results

In [29]:
sigmoid(0), sigmoid(0), sigmoid(0)

(0.5, 0.5, 0.5)

In [31]:
np.round([sigmoid(0.358), sigmoid(-0.577), sigmoid(-0.469)], 8)

array([0.5885562 , 0.35962319, 0.38485296])

In [32]:
np.round([sigmoid(-0.424), sigmoid(-0.740), sigmoid(-0.961)], 8)

array([0.39555998, 0.32300414, 0.27667802])

In [33]:
np.round([sigmoid(-0.066), sigmoid(-1.317), sigmoid(-1.430)], 8)

array([0.48350599, 0.21131785, 0.19309868])

# Multilayer Perceptron Implementation 

We have covered the theory of passing values from the input layer to the hidden layer. We can now express it in Python code:

1. create the inputs 
2. create the outputs 
3. create `weights0`, the np.array of weights from each input to each hidden neuron
4. create `weights1`, the np.array of weights from each hidden neuron to the output neuron
5. set the number of epochs 
6. compute `sum_synapse0 = np.dot(inputs, weights0)`
7. compute the hidden layer results as `hidden_layer = sigmoid(sum_synapse0)` _(see Euler's number)_ 
8. apply the sum function to the hidden layer results, `sum_synapse1 = np.dot(hidden_layer, weights1)`, then the activation function, `output_layer = sigmoid(sum_synapse1)`
9. calculate the error with an error function

In [36]:
# create the data 
inputs = np.array([[0,0], [0,1], [1,0], [1,1]])
inputs.shape

(4, 2)

In [37]:
outputs = np.array([[0], [1], [1], [0]])
outputs.shape

(4, 1)

In [65]:
# weights0: row 0 holds the weights from input x, row 1 the weights from input y (one column per hidden neuron)
weights0 = np.array([[-0.424, -0.740, -0.961], 
                     [0.358, -0.577, -0.469]])


# weights1 (hidden layer -> output neuron) are hardcoded to the values used in the class lecture for this example.
weights1 = np.array([[-0.017], 
                     [-0.893], 
                     [0.148]])

We set the epochs limit to control the number of times we'll allow the algorithm to run. 

In [66]:
# set the epochs limit.
epochs = 100

In [68]:
for epoch in range(epochs):
    pass  # placeholder: the training steps will be placed inside this loop

Next we compute the sums passed from the input layer to the hidden layer. This is a matrix multiplication: each input row is multiplied by each column of `weights0` and the products are summed, giving one result per sample and hidden neuron. `np.dot()` does this in optimised compiled code, which is much faster than an equivalent Python `for` loop.

In [69]:
input_layer = inputs

# sum of the communication between the input layer and the hidden layer:
# a matrix multiplication of each input row with each column of weights0.
# Important: np.dot is much more optimised than an equivalent for loop.

sum_synapse0 = np.dot(input_layer, weights0)
sum_synapse0  

array([[ 0.   ,  0.   ,  0.   ],
       [ 0.358, -0.577, -0.469],
       [-0.424, -0.74 , -0.961],
       [-0.066, -1.317, -1.43 ]])
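To see that `np.dot()` computes the same numbers as the explicit loops it replaces, here is a sketch comparing the two. `dot_with_loops` is a hypothetical helper written only for this comparison.

```python
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights0 = np.array([[-0.424, -0.740, -0.961],
                     [ 0.358, -0.577, -0.469]])

def dot_with_loops(a, b):
    """Matrix product written with explicit loops (for comparison only)."""
    rows, inner = a.shape
    cols = b.shape[1]
    result = np.zeros((rows, cols))
    for r in range(rows):            # each sample
        for c in range(cols):        # each hidden neuron
            for k in range(inner):   # each input feeding that neuron
                result[r, c] += a[r, k] * b[k, c]
    return result

loop_result = dot_with_loops(inputs, weights0)
dot_result = np.dot(inputs, weights0)
print(np.allclose(loop_result, dot_result))  # True
```

Both give the same 4 x 3 array; the loop version simply makes the three levels of iteration explicit.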

In [96]:
# calculates the hidden layer values, these are the values
# returned from the sigmoid of the sum_synapse0

hidden_layer = sigmoid(sum_synapse0)
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

Now that we have the hidden layer results (the sigmoid of the weighted inputs), we apply the sum function and the activation function again, this time with `weights1`, to obtain the final outputs.

In [80]:
# for example 0 in our dataset (0,0) the hidden layer results are 0.5, 0.5, 0.5
# let's take the results and multiply by the weights in weights1
arr = [0.5, 0.5, 0.5]
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (0,0) from the dataset
result = sigmoid(calc_result)

calc_result, result

(-0.381, 0.40588573188433286)

In [85]:
# for example 1 in the dataset [0.5885562 , 0.35962319, 0.38485296]
arr = [0.5885562 , 0.35962319, 0.38485296]
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)

result = sigmoid(calc_result)

calc_result, result

(-0.27419072598999994, 0.43187856860760854)

In [86]:
# for example 2 in the dataset [0.39555998, 0.32300414, 0.27667802]
arr = [0.39555998, 0.32300414, 0.27667802]
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)

result = sigmoid(calc_result)

calc_result, result

(-0.25421886972, 0.43678536534288)

In [87]:
# for example 3 in the dataset [0.48350599, 0.21131785, 0.19309868]
arr = [0.48350599, 0.21131785, 0.19309868]
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)

result = sigmoid(calc_result)

calc_result, result

(-0.16834783724, 0.4580121586455045)

In [76]:
# create the sum_synapse1 values: the weighted sums reaching the
# output neuron for each of the items in our dataset (the final
# results come from applying the sigmoid to these sums)
sum_synapse1 = np.dot(hidden_layer, weights1)
sum_synapse1

array([[-0.381     ],
       [-0.27419072],
       [-0.25421887],
       [-0.16834784]])

In [89]:
output_layer  = sigmoid(sum_synapse1)
output_layer

array([[0.40588573],
       [0.43187857],
       [0.43678536],
       [0.45801216]])

# Error Functions (loss function, cost function)

We calculate the error by comparing the predictions with the outputs of the dataset. The error function is often referred to as the loss function in ML terminology. The simplest formula is `error = correct - prediction` (in the table below, the predictions are truncated to three decimal places):

|x1|x2|Class|Prediction|Error|
|--|--|-----|----------|-----|
|0|0|0|0.405|-0.405|
|0|1|1|0.431|0.569|
|1|0|1|0.436|0.564|
|1|1|0|0.458|-0.458|

In [102]:
# calculate the average error from the absolute value of each instance's error
res = (0.405 + 0.569 + 0.564 + 0.458) / 4
round(res, 3)

0.499

In [103]:
# grab the outputs defined earlier (outputs are the correct results)
outputs

array([[0],
       [1],
       [1],
       [0]])

In [104]:
# get our output layer (predictions)
output_layer

array([[0.40588573],
       [0.43187857],
       [0.43678536],
       [0.45801216]])

In [105]:
# create the errors (error = correct - predictions)
error_output_layer = outputs - output_layer
error_output_layer

array([[-0.40588573],
       [ 0.56812143],
       [ 0.56321464],
       [-0.45801216]])

In [109]:
# get the average error. As a reminder we need to use 
# the absolute values of the errors: otherwise positive and
# negative errors cancel each other out and skew the result.
error_avg = np.mean(np.abs(error_output_layer))
print(f"Error average: {error_avg}")
print(f"Error average (rounded) : {round(error_avg, 3)}")

Error average: 0.49880848923713045
Error average (rounded) : 0.499
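The reason for taking absolute values shows up when comparing the plain (signed) mean of the errors with the mean absolute error. This sketch reuses the predictions printed above, copied in rounded to eight decimal places; `targets` and `predictions` are new names chosen so the notebook's own variables are not overwritten.

```python
import numpy as np

# expected outputs and the predictions printed above (rounded to 8 decimals)
targets = np.array([[0], [1], [1], [0]])
predictions = np.array([[0.40588573], [0.43187857], [0.43678536], [0.45801216]])

errors = targets - predictions
signed_mean = np.mean(errors)               # positive and negative errors partly cancel
mean_abs_error = np.mean(np.abs(errors))    # the average used above

print(f"signed mean:         {signed_mean:.3f}")
print(f"mean absolute error: {mean_abs_error:.3f}")
```

The signed mean is much smaller than the mean absolute error even though every prediction is far from its target.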


# Multilayered Perceptron Basic Algorithm

![](https://drive.google.com/uc?export=view&id=1JqnDqu0T9k9-g87C_6dEHLoWItFK8D7J)

1. Cost function (loss function)
2. Gradient descent
3. Derivative 
4. Delta
5. Backpropagation

# Gradient Descent


![](https://cdn-images-1.medium.com/max/600/1*iNPHcCxIvcm7RwkRaMTx1g.jpeg)


The idea of gradient descent is to minimise our cost function (loss function, error function), i.e. to reach the smallest possible error by adjusting the weights. The direction in which each weight should be adjusted is found by calculating the partial derivative of the error with respect to that weight, which gives the direction of the gradient (slope); the weight is then moved in the opposite direction, downhill. 

Imagine an x,y graph with a curve, where the x-axis is a weight and the y-axis is the error value. We are trying to reach the lowest point of the curve, which may never be an error of zero. A curve with several dips can have local minima as well as a global minimum, and training over a number of epochs may settle in a local minimum rather than the global one. So the purpose is to calculate the slope of the curve from the partial derivatives and use it to move towards a minimum.

![](https://www.researchgate.net/profile/Yong_Ma15/publication/267820876/figure/fig1/AS:669428953923612@1536615708709/Schematic-of-the-local-minima-problem-in-FWI-The-data-misfit-has-spurious-local-minima.png)
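To make the slope idea concrete, here is a sketch of gradient descent on a single weight. The error curve `(w - 3) ** 2`, the starting weight and the learning rate of 0.1 are all hypothetical choices; this curve has only one minimum (at w = 3), so it shows the downhill steps but not the local-minimum problem.

```python
def toy_error(w):
    # hypothetical error curve with its lowest point at w = 3
    return (w - 3) ** 2

def toy_error_slope(w):
    # derivative of (w - 3)^2 with respect to w
    return 2 * (w - 3)

w = -2.0              # arbitrary starting weight
learning_rate = 0.1   # assumed step size

for _ in range(50):
    # step downhill: move against the sign of the slope
    w = w - learning_rate * toy_error_slope(w)

print(f"weight after 50 steps: {w:.4f}, error: {toy_error(w):.6f}")
```

Where the slope is negative the weight increases, and where it is positive the weight decreases, so each step moves towards the bottom of the curve.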

- reminder of the sigmoid function: $y = \frac{1}{1 + e^{-x}}$

- the derivative of the sigmoid with respect to its input `x`, written in terms of its output `y`: $d = \frac{dy}{dx} = y \cdot (1 - y)$

#### hypothetical example 

Assuming that `y` = 0.1 

- calculating the derivative: $d = 0.1 \cdot (1 - 0.1) = 0.1 \cdot 0.9 = 0.09$


In [111]:
def sigmoid_derivative(sigmoid_output):
    # takes the sigmoid OUTPUT y (not the input x) and returns y * (1 - y)
    return sigmoid_output * (1 - sigmoid_output)

In [113]:
s = sigmoid(0.5)
s

0.6224593312018546

In [114]:
d = sigmoid_derivative(s)
d

0.2350037122015945
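A numerical check of the derivative formula: the slope of the sigmoid curve measured with a small central difference should match `y * (1 - y)` computed from the sigmoid output. The step size `h = 1e-6` is an arbitrary small value.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is the sigmoid OUTPUT, so the derivative is s * (1 - s)
    return s * (1 - s)

x = 0.5
h = 1e-6
# central difference: slope of the sigmoid curve measured numerically at x
numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytical = sigmoid_derivative(sigmoid(x))

print(f"numerical:  {numerical:.8f}")
print(f"analytical: {analytical:.8f}")
```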

# Output layer Delta

The sequence of calculations is: 

- activation function (sigmoid) $y = \frac{1}{1 + e^{-x}}$

- Derivative $d = y \cdot (1 -y)$

- Delta $\delta_{output} = error \cdot d$, i.e. the error multiplied by the sigmoid derivative
- Gradient
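The sequence above can be applied to all four XOR samples at once with array operations. This sketch reuses the workbook's data and weights; the names `derivative_output` and `delta_output` are illustrative. Note that the derivative is computed from each sample's own prediction.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is the sigmoid output
    return s * (1 - s)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = np.array([[0], [1], [1], [0]])
weights0 = np.array([[-0.424, -0.740, -0.961],
                     [ 0.358, -0.577, -0.469]])
weights1 = np.array([[-0.017], [-0.893], [0.148]])

# forward pass for all four samples at once
hidden_layer = sigmoid(np.dot(inputs, weights0))
output_layer = sigmoid(np.dot(hidden_layer, weights1))

# error, derivative and delta for every sample in one go
error_output_layer = outputs - output_layer
derivative_output = sigmoid_derivative(output_layer)
delta_output = error_output_layer * derivative_output

print(np.round(delta_output, 4))
```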

## Walkthrough for dataset example 0

In [144]:
# hidden layer results for dataset example 0: inputs (0,0), class 0
arr = [0.5, 0.5, 0.5]
print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence: multiply by the weights between the hidden layer 
# and the output layer, then sum the products
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (0,0) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[0] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[0]} - {sigmoid_sum:.8f} : ", error)

# get the derivative from THIS example's prediction (sigmoid_sum)
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : ", delta)


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.5, 0.5, 0.5]
[Step 02.01] arr[0] * weights1[0] : [-0.0085] 
[Step 02.02] arr[1] * weights1[1] : [-0.4465] 
[Step 02.03] arr[2] * weights1[2] : [0.074] 
[Step 02.04] Apply sum to calculated figures : -0.38100000
[Step 03.00] The prediction - sigmoid of sum -0.38100000 : 0.40588573
[Step 04.00] Error = [0] - 0.40588573 :  [-0.40588573]
[Step 05.00] Derivative - sigmoid_derivative(0.40588573) : 0.24114250
[Step 06.00] Delta : (error * derivative) : [-0.40588573] * 0.24114250 :  [-0.0978763]


## Walkthrough for dataset example 1

In [145]:
# hidden layer results for dataset example 1: inputs (0,1), class 1
arr = np.array([sigmoid(0.358), sigmoid(-0.577), sigmoid(-0.469)])

print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence: multiply by the weights between the hidden layer 
# and the output layer, then sum the products
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (0,1) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[1] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[1]} - {sigmoid_sum:.8f} : ", error)

# get the derivative from THIS example's prediction (sigmoid_sum)
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : ", delta)


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.5885562  0.35962319 0.38485296]
[Step 02.01] arr[0] * weights1[0] : [-0.01000546] 
[Step 02.02] arr[1] * weights1[1] : [-0.3211435] 
[Step 02.03] arr[2] * weights1[2] : [0.05695824] 
[Step 02.04] Apply sum to calculated figures : -0.27419072
[Step 03.00] The prediction - sigmoid of sum -0.27419072 : 0.43187857
[Step 04.00] Error = [1] - 0.43187857 :  [0.56812143]
[Step 05.00] Derivative - sigmoid_derivative(0.43187857) : 0.24535947
[Step 06.00] Delta : (error * derivative) : [0.56812143] * 0.24535947 :  [0.13939397]


## Walkthrough for dataset example 2

In [146]:
# hidden layer results for dataset example 2: inputs (1,0), class 1
arr = np.array([sigmoid(-0.424), sigmoid(-0.740), sigmoid(-0.961)])

print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence: multiply by the weights between the hidden layer 
# and the output layer, then sum the products
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (1,0) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[2] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[2]} - {sigmoid_sum:.8f} : ", error)

# get the derivative from THIS example's prediction (sigmoid_sum)
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : ", delta)


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.39555998 0.32300414 0.27667802]
[Step 02.01] arr[0] * weights1[0] : [-0.00672452] 
[Step 02.02] arr[1] * weights1[1] : [-0.2884427] 
[Step 02.03] arr[2] * weights1[2] : [0.04094835] 
[Step 02.04] Apply sum to calculated figures : -0.25421887
[Step 03.00] The prediction - sigmoid of sum -0.25421887 : 0.43678536
[Step 04.00] Error = [1] - 0.43678536 :  [0.56321464]
[Step 05.00] Derivative - sigmoid_derivative(0.43678536) : 0.24600391
[Step 06.00] Delta : (error * derivative) : [0.56321464] * 0.24600391 :  [0.138553]


In [147]:
sigmoid_derivative(0.436)

0.245904

## Walkthrough for dataset example 3

In [148]:
# hidden layer results for dataset example 3: inputs (1,1), class 0
arr = np.array([sigmoid(-0.066), sigmoid(-1.317), sigmoid(-1.430)])

print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence: multiply by the weights between the hidden layer 
# and the output layer, then sum the products
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (1,1) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[3] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[3]} - {sigmoid_sum:.8f} : ", error)

# get the derivative from THIS example's prediction (sigmoid_sum)
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : ", delta)


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.48350599 0.21131785 0.19309868]
[Step 02.01] arr[0] * weights1[0] : [-0.0082196] 
[Step 02.02] arr[1] * weights1[1] : [-0.18870684] 
[Step 02.03] arr[2] * weights1[2] : [0.02857861] 
[Step 02.04] Apply sum to calculated figures : -0.16834784
[Step 03.00] The prediction - sigmoid of sum -0.16834784 : 0.45801216
[Step 04.00] Error = [0] - 0.45801216 :  [-0.45801216]
[Step 05.00] Derivative - sigmoid_derivative(0.45801216) : 0.24823702
[Step 06.00] Delta : (error * derivative) : [-0.45801216] * 0.24823702 :  [-0.11369557]


# Backpropagation

# Implementation of multi-layer Perceptron with Python & numpy