![Multi layered perceptron](https://upload.wikimedia.org/wikipedia/commons/c/c2/MultiLayerNeuralNetworkBigger_english.png)

# The Multi-Layered Perceptron

- 3.01 - Introduction to Multi-layer networks
- 3.02 - Hidden layer activation
- 3.03 - Multilayer Perceptron Implementation steps
- 3.04 - Error Functions
- 3.05 - Multilayered Perceptron basic Algorithm
- 3.06 - Gradient Descent
- 3.07 - Output Layer Delta
- 3.08 - Delta implementation in Python
- 3.09 - Backpropagation
- 3.10 - Implementation of multi-layer Perceptron with Python & numpy

# 3.01 - Introduction to Multi-layer networks

The depiction at the top of the workbook shows the main concepts of a multilayer perceptron. The basic principle is that there is at least one hidden layer, and each neuron in a hidden layer has its own `sum function` & `activation function`. The layers are fully connected: each neuron in the first hidden layer is connected to every input, and each neuron in a subsequent hidden layer is connected to every neuron of the preceding layer, again with its own `sum` and `activation` functions.

The structure concludes with the output layer. It is fed the results of each neuron in the final hidden layer, and its result is `the prediction` of our neural network.

**key point$^1$**: a `sum function` multiplies each input value by its associated weight and adds the products together. In a single layer perceptron that means a single sum and activation function for the inputs. In a multilayer perceptron, where we can have _n_ layers, each neuron in the first hidden layer sums the inputs multiplied by their weights, and each neuron in a subsequent layer sums the outputs of the preceding layer's neurons multiplied by the weights between those layers.
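As a minimal sketch of the sum function for one neuron, the snippet below multiplies each input by its weight and adds the results. The helper name `weighted_sum` is hypothetical, and the weights are simply the demo weights used for the first hidden neuron later in this workbook.

```python
import numpy as np

# Hypothetical helper: multiply each input by its weight and add the results.
def weighted_sum(inputs, weights):
    return float(np.dot(inputs, weights))

# Demo weights for one neuron, one weight per input.
neuron_weights = [-0.424, 0.358]

print(weighted_sum([0, 1], neuron_weights))  # only the second input contributes
print(weighted_sum([1, 1], neuron_weights))  # both inputs contribute
```

The same dot product works for any number of inputs, which is why it reappears when whole layers are computed at once.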

**key point$^2$**: an `activation function` evaluates a sum and decides whether, or how strongly, that neuron fires. In the single layer perceptron we saw a `step function` as the activation function; a step can only return `0` or `1`. Another type of activation function is the `sigmoid function`. The difference is that its result can be any value between `0` and `1` rather than a step, so we need to work out exactly where on that curve the value lies.
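To make the contrast concrete, here is a small sketch that runs the same sums through both activation functions. The step threshold of 0 is an assumption made for illustration; the sigmoid is the standard $\frac{1}{1 + e^{-x}}$.

```python
import numpy as np

def step(total):
    # Assumed threshold of 0: the neuron fires (1) at or above it, else 0.
    return 1 if total >= 0 else 0

def sigmoid(total):
    # Smooth alternative: returns a graded value between 0 and 1.
    return 1 / (1 + np.exp(-total))

for total in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"sum={total:+.1f}  step={step(total)}  sigmoid={sigmoid(total):.4f}")
```

The step output jumps from 0 to 1, while the sigmoid output rises gradually, which is what later allows small weight adjustments to produce small changes in the prediction.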
  
**key point$^3$**: If we need to return negative values we can use the `hyperbolic tangent function`:
- $y = \frac{e^{x} - e^{-x} }{e^{x} + e^{-x}}$
- to evaluate the equation, replace `x` with the value under evaluation; the result lies between `-1` & `1`.
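The formula can be checked against numpy's built-in `np.tanh`. This is only a sketch; the function name `tanh_from_formula` is made up for the comparison.

```python
import numpy as np

def tanh_from_formula(x):
    # Direct translation of y = (e^x - e^-x) / (e^x + e^-x).
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

for val in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(f"x={val:+.1f}  formula={tanh_from_formula(val):+.6f}  np.tanh={np.tanh(val):+.6f}")
```

Negative inputs give negative results and positive inputs give positive results, always inside the range `-1` to `1`.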

**Theory to remember**: The irrational number `e` is also known as `Euler’s number`. It is approximately 2.718281 and is the base of the natural logarithm, ln (this means that if $x = \ln y = \log_e y$, then $e^x = y$). In a `sigmoid function` we apply the equation $y = \frac{1}{1 + e^{-x}}$, so that:
- if `x` is high, the value approaches 1.
- if `x` is low, the value approaches 0.

**let's see what this looks like in python.**

In [254]:
import numpy as np

# definition of a sigmoid function.
# the parameter is called `total` so that it does not shadow the built-in sum()
def sigmoid(total):
    return 1 / (1 + np.exp(-total))


# sample testing the sigmoid function with a range of values. 
values = [-1, 0, 1, 3, 5, 30.5, -25.5]

for val in values:
    print(f"sigmoid({val})\t is:  {sigmoid(val)}")

sigmoid(-1)	 is:  0.2689414213699951
sigmoid(0)	 is:  0.5
sigmoid(1)	 is:  0.7310585786300049
sigmoid(3)	 is:  0.9525741268224334
sigmoid(5)	 is:  0.9933071490757153
sigmoid(30.5)	 is:  0.9999999999999432
sigmoid(-25.5)	 is:  8.423463754397692e-12


# 3.02 - Hidden Layer activation (using binary xor operator)

![](https://static.javatpoint.com/tutorial/coa/images/logic-gates5.png)

We will use the 'XOR' operator as our case for the multi-layer study, with the XOR truth table as reference. We will focus on the `feed-forward` process from the input layer to the hidden layer. We can declare the following upfront:
- We have 2 inputs (x1, x2), as we have had all along.
- We have 3 neurons in our hidden layer, each with its own sum and activation function.


**Let's see that in simple python code**

- task 1 - define the inputs of the xor truth table 
- task 2 - define some weights (demonstration purposes here)
- task 3 - perform the calculation loop to find:
    - product of _(input * weight)_
    - sigmoid of product

In [273]:
# To show a table of inputs, weights and products
def table_border():
    print("-" * 70)

# declare the inputs for a xor truth table     
inputs = [(0,0), (0,1), (1,0), (1,1)]

# define some weights. These are random numbers for demo purposes. 
weights0 = [(-0.424, 0.358), (-0.740, -0.577), (-0.961, -0.469)]

# create a table of outcomes 
for idx, i in enumerate(inputs):
    x1,x2 = i
    table_border()
    print(f"x1\tx2\tw1\tw2\t\tProduct\tSigmoid ")
    table_border()
    for w1,w2 in weights0:
        product = (x1 * w1) + (x2 * w2)
        sig = sigmoid(product)
        print(f"{x1}\t{x2}\t{w1}\t{w2}\t\t{product}\t{sig} ")
    print("")

----------------------------------------------------------------------
x1	x2	w1	w2		Product	Sigmoid 
----------------------------------------------------------------------
0	0	-0.424	0.358		0.0	0.5 
0	0	-0.74	-0.577		-0.0	0.5 
0	0	-0.961	-0.469		-0.0	0.5 

----------------------------------------------------------------------
x1	x2	w1	w2		Product	Sigmoid 
----------------------------------------------------------------------
0	1	-0.424	0.358		0.358	0.5885562043858291 
0	1	-0.74	-0.577		-0.577	0.3596231853677901 
0	1	-0.961	-0.469		-0.469	0.38485295749078957 

----------------------------------------------------------------------
x1	x2	w1	w2		Product	Sigmoid 
----------------------------------------------------------------------
1	0	-0.424	0.358		-0.424	0.39555998258063735 
1	0	-0.74	-0.577		-0.74	0.323004143761477 
1	0	-0.961	-0.469		-0.961	0.2766780228949468 

----------------------------------------------------------------------
x1	x2	w1	w2		Product	Sigmoid 
-------------------------

For each of the results, or `products`, of the `inputs` * `weights` above, we call the sigmoid function to get the activation. The cell below summarises how the sigmoid values are generated.

In [279]:
# Each example holds the three hidden-neuron activations for one XOR input.
# The arguments are the sums of input * weight for each hidden neuron.

# example 0, inputs (0,0)
example0 = sigmoid(0), sigmoid(0), sigmoid(0)

# example 1, inputs (0,1)
example1 = sigmoid(0.358), sigmoid(-0.577), sigmoid(-0.469)

# example 2, inputs (1,0)
example2 = sigmoid(-0.424), sigmoid(-0.740), sigmoid(-0.961)

# example 3, inputs (1,1)
example3 = sigmoid(-0.066), sigmoid(-1.317), sigmoid(-1.430)

print("Sigmoid values of inputs * weights or 'products'")
for name, example in zip(("example0", "example1", "example2", "example3"),
                         (example0, example1, example2, example3)):
    # round to 8 decimals, the precision numpy uses when printing arrays
    print(f"{name} : {tuple(round(float(v), 8) for v in example)}")

Sigmoid values of inputs * weights or 'products'
example0 : (0.5, 0.5, 0.5)
example1 : (0.5885562, 0.35962319, 0.38485296)
example2 : (0.39555998, 0.32300414, 0.27667802)
example3 : (0.48350599, 0.21131785, 0.19309868)


# 3.03 - Multilayer Perceptron Implementation steps 

We have covered the theory of passing from the inputs to the hidden layer: the inputs and weights generate a sum, and the sigmoid turns that sum into an activation value. We can now present that in slightly more robust Python code using the `numpy` library, because it is much more performant, an industry standard and, well, pretty awesome too. Here are the tasks.

1. create the inputs 
2. create the outputs 
3. create the np.array(weights_for_each_input)
4. create the np.array(weights_for_each_hidden_layer_to_output)
5. create the epochs threshold. 
6. create the sum_synapse0 as np.dot(inputs, weights_for_each_input)
7. create the hidden_layer results as sigmoid(sum_synapse0) _see Euler's number_ 
8. apply the sum function to the hidden layer results as sum_synapse1 = np.dot(hidden_layer, weights1), then apply the activation function as sigmoid(sum_synapse1)
9. define the error_function
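The steps above can be condensed into one function, sketched below; the cells that follow build the same pipeline one step at a time. The function name `forward` is hypothetical, and the weights are the demo values used throughout this workbook.

```python
import numpy as np

def sigmoid(total):
    return 1 / (1 + np.exp(-total))

def forward(inputs, weights0, weights1):
    # Hypothetical helper covering steps 6 to 8 of the list above.
    sum_synapse0 = np.dot(inputs, weights0)         # input -> hidden sums
    hidden_layer = sigmoid(sum_synapse0)            # hidden activations
    sum_synapse1 = np.dot(hidden_layer, weights1)   # hidden -> output sums
    output_layer = sigmoid(sum_synapse1)            # predictions
    return hidden_layer, output_layer

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = np.array([[0], [1], [1], [0]])
weights0 = np.array([[-0.424, -0.740, -0.961],
                     [0.358, -0.577, -0.469]])
weights1 = np.array([[-0.017], [-0.893], [0.148]])

hidden_layer, output_layer = forward(inputs, weights0, weights1)

# Step 9: the error is the mean absolute difference between target and prediction.
error_avg = np.mean(np.abs(outputs - output_layer))
print(hidden_layer.shape, output_layer.shape)
print(round(float(error_avg), 3))
```

Keeping the forward pass in one function makes it easy to call again after the weights change, which is what a training loop over the epochs needs.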

In [280]:
# create the inputs data 
inputs = np.array([[0,0], [0,1], [1,0], [1,1]])
inputs.shape

(4, 2)

In [281]:
outputs = np.array([[0], [1], [1], [0]])
outputs.shape

(4, 1)

In [282]:
# weights0 are the inputX weights and inputY weights respectively (one row each).
# these are the weights that are used in the sum function to 
# generate a product value. 
weights0 = np.array([[-0.424, -0.740, -0.961], 
                     [0.358, -0.577, -0.469]])


# weights1 are hardcoded in the class lecture of this example. 
# These are the weights we see used between the hidden layer
# and the output layer at the end of our operation. The figures
# are for demonstration purposes so don't get hung up on what 
# these particular numbers mean. They're just demo weights. 
weights1 = np.array([[-0.017], 
                     [-0.893], 
                     [0.148]])

We set the epochs limit to control the number of times we allow the algorithm to run. An epoch threshold prevents an infinite loop in search of a perfect prediction.

In [283]:
# set the epochs limit.
epochs = 100

We can start the sum of the communication between the input layer and the hidden layer. This is basically a matrix multiplication exercise. 

In the proof of concept in section 3.02 we did this with `for loops`, gathering the values of `x1`, `x2`, `w1` & `w2` to produce a result _for each input_layer * each weights0_. Here we use the numpy function `np.dot()`, as it is far more highly optimised than a for loop.
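A quick way to convince yourself that the loop and `np.dot()` agree is to compute both and compare. The sketch below reuses the demo inputs and `weights0`; the variable names `looped` and `vectorised` are only for this comparison.

```python
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights0 = np.array([[-0.424, -0.740, -0.961],
                     [0.358, -0.577, -0.469]])

# Explicit loops: one weighted sum per (example, hidden neuron) pair.
looped = np.zeros((inputs.shape[0], weights0.shape[1]))
for i, (x1, x2) in enumerate(inputs):
    for j in range(weights0.shape[1]):
        looped[i, j] = x1 * weights0[0, j] + x2 * weights0[1, j]

# Matrix multiplication: (4, 2) dot (2, 3) gives the same (4, 3) result.
vectorised = np.dot(inputs, weights0)

print(np.allclose(looped, vectorised))
```

Each row of the result belongs to one example in the dataset and each column to one hidden neuron.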

In [284]:
input_layer = inputs

# starts the sum of the communication between the input
# layer and the hidden layer. This is basically a matrix
# multiplication exercise. Creating the results of: 
#     "for each input_layer * each weights0"  
# Important note: using the np.dot is more highly 
# optimised than using a for loop.

# sum_synapse0 is what we called product above
# the name sum_synapse0 is more descriptive 
# because it is the sum of the synapse at level 0
# which is the first layer, or the input to hidden
# layer. 
sum_synapse0 = np.dot(input_layer, weights0)
sum_synapse0  

array([[ 0.   ,  0.   ,  0.   ],
       [ 0.358, -0.577, -0.469],
       [-0.424, -0.74 , -0.961],
       [-0.066, -1.317, -1.43 ]])

In [285]:
# calculates the hidden layer values, these are the values
# returned from the sigmoid of the sum_synapse0

hidden_layer = sigmoid(sum_synapse0)
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

Now that we have the sigmoid results from applying the inputs and weights, we apply the sum function and activation function again to obtain the final outputs. Let's work through a manual calculation to increase our comprehension before adding more complexity to our Python implementation.

array([[-0.424, -0.74 , -0.961],
       [ 0.358, -0.577, -0.469]])

#### Manual calculation of output results

In [292]:
# for example 0 in our dataset 
# inputs are (0,0) 
# sum_synapse0 = inputs * weights0 = (0, 0, 0)
# hidden_layer = sigmoid(sum_synapse0) = (0.5, 0.5, 0.5)

arr = np.array([0.5, 0.5, 0.5])
# .item() extracts the single value of the 1-element result as a Python float
neuron_calculations = arr.dot(weights1).item()

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (0,0) from the dataset
prediction = sigmoid(neuron_calculations)

neuron_calculations, prediction

# in our case it's 0.40588573188433286

(-0.381, 0.40588573188433286)

In [293]:
# for example 1 in our dataset 
# inputs are (0,1) 
# sum_synapse0 = inputs * weights0 = (0.358, -0.577, -0.469)
# hidden_layer = sigmoid(sum_synapse0) = (0.5885562 , 0.35962319, 0.38485296)


arr = np.array([0.5885562 , 0.35962319, 0.38485296])
# .item() extracts the single value of the 1-element result as a Python float
neuron_calculations = arr.dot(weights1).item()

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (0,1) from the dataset
prediction = sigmoid(neuron_calculations)

neuron_calculations, prediction

(-0.27419072598999994, 0.43187856860760854)

In [294]:
# for example 2 in our dataset 
# inputs are (1,0) 
# sum_synapse0 = inputs * weights0 = (-0.424, -0.74 , -0.961)
# hidden_layer = sigmoid(sum_synapse0) = (0.39555998, 0.32300414, 0.27667802)

arr = np.array([0.39555998, 0.32300414, 0.27667802])
# .item() extracts the single value of the 1-element result as a Python float
neuron_calculations = arr.dot(weights1).item()

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (1,0) from the dataset
prediction = sigmoid(neuron_calculations)

neuron_calculations, prediction

(-0.25421886972, 0.43678536534288)

In [295]:
# for example 3 in our dataset 
# inputs are (1,1) 
# sum_synapse0 = inputs * weights0 = (-0.066, -1.317, -1.43)
# hidden_layer = sigmoid(sum_synapse0) = (0.48350599, 0.21131785, 0.19309868)

arr = np.array([0.48350599, 0.21131785, 0.19309868])
# .item() extracts the single value of the 1-element result as a Python float
neuron_calculations = arr.dot(weights1).item()

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (1,1) from the dataset
prediction = sigmoid(neuron_calculations)

neuron_calculations, prediction

(-0.16834783724, 0.4580121586455045)

#### Create the sum_synapse1

In [76]:
# create the sum_synapse1 values. 
# These are the values that are generated
# from the hidden layer to the output
# layer and are considered as the final 
# results of the neural network for 
# each of the items in our dataset. 
sum_synapse1 = np.dot(hidden_layer, weights1)
sum_synapse1

array([[-0.381     ],
       [-0.27419072],
       [-0.25421887],
       [-0.16834784]])

In [89]:
# We create the output layer or as it's 
# often called the prediction of the 
# neural network.
output_layer  = sigmoid(sum_synapse1)
output_layer

array([[0.40588573],
       [0.43187857],
       [0.43678536],
       [0.45801216]])

# 3.04 - Error Functions (_loss function, cost function_)

The error is calculated by comparing the predictions with the expected outputs of the dataset. The error function is often referred to as the loss function in ML terminology. The simplest formula is `error = correct - prediction`.

|x1|x2|Class|Prediction|Error|
|--|--|-----|----------|-----|
|0|0|0|0.405|-0.405|
|0|1|1|0.431|0.569|
|1|0|1|0.436|0.564|
|1|1|0|0.458|-0.458|

In [102]:
# calculate the average error from the absolute error of each instance
res = (0.405 + 0.569 + 0.564 + 0.458) / 4
round(res, 3)

0.499

In [103]:
# grab the outputs defined earlier (outputs are the correct results)
outputs

array([[0],
       [1],
       [1],
       [0]])

In [104]:
# get our output layer (predictions)
output_layer

array([[0.40588573],
       [0.43187857],
       [0.43678536],
       [0.45801216]])

In [105]:
# create the errors (error = correct - predictions)
error_output_layer = outputs - output_layer
error_output_layer

array([[-0.40588573],
       [ 0.56812143],
       [ 0.56321464],
       [-0.45801216]])

In [109]:
# get the average error. As a reminder we need to use 
# the absolute values of the errors. If this is overlooked
# the positive and negative errors cancel out and skew the result. 
error_avg = np.mean(abs(error_output_layer))
print(f"Error average: {error_avg}")
print(f"Error average (rounded) : {round(error_avg, 3)}")

Error average: 0.49880848923713045
Error average (rounded) : 0.499
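The workbook averages the absolute errors. A common alternative cost function is the mean squared error, which is not used in the cells above and is shown here only as a hedged comparison. The sketch reuses the rounded predictions printed earlier.

```python
import numpy as np

outputs = np.array([[0], [1], [1], [0]])
# Predictions copied from the output_layer shown above (rounded to 8 decimals).
output_layer = np.array([[0.40588573], [0.43187857], [0.43678536], [0.45801216]])

error_output_layer = outputs - output_layer

# Mean absolute error: the measure used in this workbook.
mean_absolute_error = np.mean(np.abs(error_output_layer))

# Mean squared error: an alternative that penalises large errors more heavily.
mean_squared_error = np.mean(error_output_layer ** 2)

print(f"mean absolute error: {mean_absolute_error:.3f}")
print(f"mean squared error : {mean_squared_error:.3f}")
```

Both measures remove the sign of each error before averaging, so positive and negative errors cannot cancel out.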


# 3.05 - Multilayered Perceptron Basic Algorithm

![](https://drive.google.com/uc?export=view&id=1JqnDqu0T9k9-g87C_6dEHLoWItFK8D7J)

1. Cost function (loss function)
2. Gradient descent
3. Derivative 
4. Delta
5. Backpropagation

# 3.06 - Gradient Descent


![](https://cdn-images-1.medium.com/max/600/1*iNPHcCxIvcm7RwkRaMTx1g.jpeg)


The idea of gradient descent is to manage our cost function (loss function, error function) so that adjusting the weights leads to the smallest possible error. The direction in which a set of weights should be adjusted is found by calculating the partial derivative, which gives the direction of the gradient.

Imagine an x,y graph with a curve, where the x-axis is the weight and the y-axis is the error value. We are trying to reach the lowest point of the curve, which may never be zero. A curve with several dips has local minima as well as a global minimum, and over the course of training (the epochs) the search may settle in a local one. The purpose is therefore to calculate the slope of the curve from the partial derivatives and follow it downhill.
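The idea can be sketched on a made-up one-weight error curve. Everything below is hypothetical: the curve, the starting weight, the learning rate and the number of epochs are chosen only to show the weight sliding downhill to a minimum whose error is not zero.

```python
# Hypothetical error curve with its lowest point at weight = 3, error = 1.
def error(weight):
    return (weight - 3.0) ** 2 + 1.0

# Slope (derivative) of that curve with respect to the weight.
def slope(weight):
    return 2.0 * (weight - 3.0)

weight = 0.0          # arbitrary starting weight
learning_rate = 0.1   # assumed step size
epochs = 50

for epoch in range(epochs):
    # Move against the slope: downhill on the error curve.
    weight -= learning_rate * slope(weight)

print(round(weight, 3), round(error(weight), 3))
```

A negative slope pushes the weight up and a positive slope pushes it down, so the sign of the derivative gives the direction of each adjustment.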

![](https://www.researchgate.net/profile/Yong_Ma15/publication/267820876/figure/fig1/AS:669428953923612@1536615708709/Schematic-of-the-local-minima-problem-in-FWI-The-data-misfit-has-spurious-local-minima.png)

- reminder of the sigmoid function: $y = \frac{1}{1 + e^{-x}}$

- calculating the derivative of the sigmoid, expressed in terms of its output $y$: $d = y \cdot (1 - y)$

#### hypothetical example 

Assuming that `y` = 0.1 

- calculating the derivative: $d = 0.1 \cdot (1 - 0.1) = 0.09$
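The hypothetical example can be reproduced in a few lines, and the formula itself can be checked against a numerical slope. This is a sketch; the step size `h` and the test point `x` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(y):
    # y must already be a sigmoid output, not the raw sum.
    return y * (1 - y)

# The hypothetical example: y = 0.1
print(f"{sigmoid_derivative(0.1):.2f}")

# Compare with a central-difference estimate of the slope at x = 0.5.
x, h = 0.5, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_derivative(sigmoid(x))
print(abs(numeric - analytic) < 1e-8)
```

Note that the formula takes the sigmoid output as its argument, which is convenient because the forward pass has already computed it.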


In [111]:
def sigmoid_derivative(sigmoid_value):
    # expects a value that has already been passed through sigmoid()
    return sigmoid_value * (1 - sigmoid_value)

In [113]:
s = sigmoid(0.5)
s

0.6224593312018546

In [114]:
d = sigmoid_derivative(s)
d

0.2350037122015945

# 3.07 - Output layer Delta

The sequence of calculations is:

- activation function (sigmoid) $y = \frac{1}{1 + e^{-x}}$

- Derivative $d = y \cdot (1 -y)$

- Delta $delta _{output} = error \cdot sigmoid _{derivative}$
- Gradient
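The first three items of the sequence can be chained in one small function, sketched below. The name `output_delta` is hypothetical, and the sample call uses the sum and target of dataset example 0.

```python
import numpy as np

def sigmoid(total):
    return 1 / (1 + np.exp(-total))

def sigmoid_derivative(y):
    return y * (1 - y)

def output_delta(sum_synapse1, target):
    # Hypothetical helper chaining activation, error, derivative and delta.
    prediction = sigmoid(sum_synapse1)            # activation function
    error = target - prediction                   # error = correct - prediction
    derivative = sigmoid_derivative(prediction)   # derivative of the activation
    return error * derivative                     # delta of the output layer

# Dataset example 0: the sum arriving at the output neuron is -0.381, target 0.
print(round(float(output_delta(-0.381, 0)), 7))
```

The sign of the delta follows the sign of the error, so it indicates whether the prediction should move up or down.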

## 3.07.01 - Walkthrough for dataset example 0

In [144]:
# walkthrough for dataset example 0: inputs (0,0), expected output 0.
# the hidden layer activations are sigmoid(0) for all three neurons
arr = [0.5, 0.5, 0.5]
print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence, multiply by the weights between hidden layer 
# and the output layer, then sum the results
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (0,0) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[0] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[0]} - {sigmoid_sum:.8f} : {error}")

# get the derivative of this example's own prediction
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : {delta}")


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.5, 0.5, 0.5]
[Step 02.01] arr[0] * weights1[0] : [-0.0085] 
[Step 02.02] arr[1] * weights1[1] : [-0.4465] 
[Step 02.03] arr[2] * weights1[2] : [0.074] 
[Step 02.04] Apply sum to calculated figures : -0.38100000
[Step 03.00] The prediction - sigmoid of sum -0.38100000 : 0.40588573
[Step 04.00] Error = [0] - 0.40588573 : [-0.40588573]
[Step 05.00] Derivative - sigmoid_derivative(0.40588573) : 0.24114250
[Step 06.00] Delta : (error * derivative) : [-0.40588573] * 0.24114250 : [-0.0978763]


## 3.07.02 - Walkthrough for dataset example 1

In [145]:
# walkthrough for dataset example 1: inputs (0,1), expected output 1.
# the hidden layer activations are the sigmoids of (0.358, -0.577, -0.469)
arr = np.array([sigmoid(0.358), sigmoid(-0.577), sigmoid(-0.469)])

print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence, multiply by the weights between hidden layer 
# and the output layer, then sum the results
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (0,1) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[1] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[1]} - {sigmoid_sum:.8f} : {error}")

# get the derivative of this example's own prediction
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : {delta}")


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.5885562  0.35962319 0.38485296]
[Step 02.01] arr[0] * weights1[0] : [-0.01000546] 
[Step 02.02] arr[1] * weights1[1] : [-0.3211435] 
[Step 02.03] arr[2] * weights1[2] : [0.05695824] 
[Step 02.04] Apply sum to calculated figures : -0.27419072
[Step 03.00] The prediction - sigmoid of sum -0.27419072 : 0.43187857
[Step 04.00] Error = [1] - 0.43187857 : [0.56812143]
[Step 05.00] Derivative - sigmoid_derivative(0.43187857) : 0.24535947
[Step 06.00] Delta : (error * derivative) : [0.56812143] * 0.24535947 : [0.13939397]


## 3.07.03 - Walkthrough for dataset example 2

In [146]:
# walkthrough for dataset example 2: inputs (1,0), expected output 1.
# the hidden layer activations are the sigmoids of (-0.424, -0.740, -0.961)
arr = np.array([sigmoid(-0.424), sigmoid(-0.740), sigmoid(-0.961)])

print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence, multiply by the weights between hidden layer 
# and the output layer, then sum the results
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (1,0) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[2] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[2]} - {sigmoid_sum:.8f} : {error}")

# get the derivative of this example's own prediction
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : {delta}")


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.39555998 0.32300414 0.27667802]
[Step 02.01] arr[0] * weights1[0] : [-0.00672452] 
[Step 02.02] arr[1] * weights1[1] : [-0.2884427] 
[Step 02.03] arr[2] * weights1[2] : [0.04094835] 
[Step 02.04] Apply sum to calculated figures : -0.25421887
[Step 03.00] The prediction - sigmoid of sum -0.25421887 : 0.43678536
[Step 04.00] Error = [1] - 0.43678536 : [0.56321464]
[Step 05.00] Derivative - sigmoid_derivative(0.43678536) : 0.24600391
[Step 06.00] Delta : (error * derivative) : [0.56321464] * 0.24600391 : [0.138553]


In [147]:
sigmoid_derivative(0.436)

0.245904

## 3.07.04 - Walkthrough for dataset example 3

In [148]:
# walkthrough for dataset example 3: inputs (1,1), expected output 0.
# the hidden layer activations are the sigmoids of (-0.066, -1.317, -1.430)
arr = np.array([sigmoid(-0.066), sigmoid(-1.317), sigmoid(-1.430)])

print("[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer :", arr)

# start output layer sequence, multiply by the weights between hidden layer 
# and the output layer, then sum the results
calc_result = arr[0] * (-0.017) + arr[1] * (-0.893) + arr[2] * (0.148)
print(f"[Step 02.01] arr[0] * weights1[0] : {arr[0] * weights1[0]} ")
print(f"[Step 02.02] arr[1] * weights1[1] : {arr[1] * weights1[1]} ")
print(f"[Step 02.03] arr[2] * weights1[2] : {arr[2] * weights1[2]} ")
print(f"[Step 02.04] Apply sum to calculated figures : {calc_result:.8f}")

# get the sigmoid. This produces the final result of 
# the neural network, or the prediction for example (1,1) from the dataset
sigmoid_sum = sigmoid(calc_result)
print(f"[Step 03.00] The prediction - sigmoid of sum {calc_result:.8f} : {sigmoid_sum:.8f}")

error = outputs[3] - sigmoid_sum
print(f"[Step 04.00] Error = {outputs[3]} - {sigmoid_sum:.8f} : {error}")

# get the derivative of this example's own prediction
derivative = sigmoid_derivative(sigmoid_sum)
print(f"[Step 05.00] Derivative - sigmoid_derivative({sigmoid_sum:.8f}) : {derivative:.8f}")

# get the delta 
delta = error * derivative
print(f"[Step 06.00] Delta : (error * derivative) : {error} * {derivative:.8f} : {delta}")


[Step 01.00] sigmoid(np.dot(input_layer, weights0)) = hidden layer : [0.48350599 0.21131785 0.19309868]
[Step 02.01] arr[0] * weights1[0] : [-0.0082196] 
[Step 02.02] arr[1] * weights1[1] : [-0.18870684] 
[Step 02.03] arr[2] * weights1[2] : [0.02857861] 
[Step 02.04] Apply sum to calculated figures : -0.16834784
[Step 03.00] The prediction - sigmoid of sum -0.16834784 : 0.45801216
[Step 04.00] Error = [0] - 0.45801216 : [-0.45801216]
[Step 05.00] Derivative - sigmoid_derivative(0.45801216) : 0.24823702
[Step 06.00] Delta : (error * derivative) : [-0.45801216] * 0.24823702 : [-0.11369557]


# 3.08 - Delta implementation in Python

As a reminder, the delta is used to determine the direction of the gradient in order to update the weights. The process takes each instance in the dataset and returns a delta value:

- to calculate the `derivative_output` we take `sigmoid_derivative(output_layer)`
- to calculate the `delta_output` we multiply the `error_output_layer` by the `derivative_output`

In [163]:
output_layer

array([[0.40588573],
       [0.43187857],
       [0.43678536],
       [0.45801216]])

In [165]:
derivative_output = sigmoid_derivative(output_layer)
derivative_output

array([[0.2411425 ],
       [0.24535947],
       [0.24600391],
       [0.24823702]])

In [166]:
error_output_layer

array([[-0.40588573],
       [ 0.56812143],
       [ 0.56321464],
       [-0.45801216]])

In [167]:
delta_output = error_output_layer * derivative_output
delta_output

array([[-0.0978763 ],
       [ 0.13939397],
       [ 0.138553  ],
       [-0.11369557]])

$delta _{hidden} = sigmoid _{derivative} \cdot weight \cdot delta _{output}$

In [206]:
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

In [242]:
for i in range(len(hidden_layer)):
    arr = np.array(hidden_layer[i])
    print("")
    for j in range(len(arr)):  
        synapse = round(hidden_layer[i][j], 3)
        sd_val = round(sigmoid_derivative(hidden_layer[i][j]),3)
        # .item() extracts the single value of a 1-element array as a Python float
        weight = weights1[j].item()
        do = round(delta_output[i].item(), 3)
        print(f"[{i}][{j}] : activation: {synapse} : derivative: {sd_val} * lb: {weight} * delta : {do} = {round(float(sd_val * weight * do),5)}")
        


[0][0] : activation: 0.5 : derivative: 0.25 * lb: -0.017 * delta : -0.098 = 0.00042
[0][1] : activation: 0.5 : derivative: 0.25 * lb: -0.893 * delta : -0.098 = 0.02188
[0][2] : activation: 0.5 : derivative: 0.25 * lb: 0.148 * delta : -0.098 = -0.00363

[1][0] : activation: 0.589 : derivative: 0.242 * lb: -0.017 * delta : 0.139 = -0.00057
[1][1] : activation: 0.36 : derivative: 0.23 * lb: -0.893 * delta : 0.139 = -0.02855
[1][2] : activation: 0.385 : derivative: 0.237 * lb: 0.148 * delta : 0.139 = 0.00488

[2][0] : activation: 0.396 : derivative: 0.239 * lb: -0.017 * delta : 0.139 = -0.00056
[2][1] : activation: 0.323 : derivative: 0.219 * lb: -0.893 * delta : 0.139 = -0.02718
[2][2] : activation: 0.277 : derivative: 0.2 * lb: 0.148 * delta : 0.139 = 0.00411

[3][0] : activation: 0.484 : derivative: 0.25 * lb: -0.017 * delta : -0.114 = 0.00048
[3][1] : activation: 0.211 : derivative: 0.167 * lb: -0.893 * delta : -0.114 = 0.017
[3][2] : activation: 0.193 : derivative: 0.156 * lb: 0.148 

#### Calculate the delta_output by weights1 

In [249]:
# delta_output_weight_multiplier = delta_output.dot(weights1)
weights1

array([[-0.017],
       [-0.893],
       [ 0.148]])

In [247]:
weights1T = weights1.T
weights1T

array([[-0.017, -0.893,  0.148]])

In [253]:
# check shapes 
weights1.shape, weights1T.shape, delta_output.shape

((3, 1), (1, 3), (4, 1))

In [252]:
delta_output_weight_multiplier = delta_output.dot(weights1T)
delta_output_weight_multiplier

array([[ 0.0016639 ,  0.08740354, -0.01448569],
       [-0.0023697 , -0.12447882,  0.02063031],
       [-0.0023554 , -0.12372783,  0.02050584],
       [ 0.00193282,  0.10153015, -0.01682694]])
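With `delta_output_weight_multiplier` in hand, the hidden-layer delta formula above can be applied to every example and neuron at once. The sketch below is one possible vectorised form; it re-creates the arrays from the rounded values printed in this workbook, so its results can differ in the last decimals from the loop above, which rounds its intermediate values to three places.

```python
import numpy as np

def sigmoid_derivative(y):
    return y * (1 - y)

# Values copied from the outputs shown earlier (rounded to 8 decimals).
hidden_layer = np.array([[0.5, 0.5, 0.5],
                         [0.5885562, 0.35962319, 0.38485296],
                         [0.39555998, 0.32300414, 0.27667802],
                         [0.48350599, 0.21131785, 0.19309868]])
weights1 = np.array([[-0.017], [-0.893], [0.148]])
delta_output = np.array([[-0.0978763], [0.13939397], [0.138553], [-0.11369557]])

# (4, 1) dot (1, 3): one column per hidden neuron, one row per example.
delta_output_weight_multiplier = delta_output.dot(weights1.T)

# delta_hidden = sigmoid_derivative * weight * delta_output, element by element.
delta_hidden = delta_output_weight_multiplier * sigmoid_derivative(hidden_layer)

print(delta_hidden.shape)
print(np.round(delta_hidden, 5))
```

The result has the same shape as `hidden_layer`, giving one delta per hidden neuron for each example in the dataset.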

# 3.09 - Backpropagation

# 3.10 - Implementation of multi-layer Perceptron with Python & numpy