![Multi layered perceptron](https://upload.wikimedia.org/wikipedia/commons/c/c2/MultiLayerNeuralNetworkBigger_english.png)

# The Multi-Layered Perceptron

#### Part 1
- 3.01 - Introduction to Multi-layer networks
- 3.02 - Hidden layer activation
- 3.03 - Multilayer Perceptron Implementation steps
- 3.04 - Manual calculation and process checkpoint summary

#### Part 2
- 3.05 - Multilayered Perceptron basic Algorithm
    - 3.05.01 - Error Functions (cost function, loss function)
    - 3.05.02 - Gradient Descent
    - 3.05.03 - Output Layer Delta
    - 3.05.04 - Delta implementation in Python
    - 3.05.05 - Backpropagation
- 3.06 - Implementation of multi-layer Perceptron with Python & numpy

# Part 1

# 3.01 - Introduction to Multi-layer networks

In the depiction at the top of the workbook we can see the main concepts of a multilayer perceptron. The most basic principles are: 

- **There is at least 1 hidden layer**. 
- Each neuron in the hidden layer should have its own `sum function` & `activation function`. 
- Every layer is fully connected. Each neuron in the first hidden layer is connected to every input, and each neuron in a subsequent hidden layer is connected to every neuron of the preceding layer, again with its own `sum` and `activation` functions.  
- Our structure will conclude with an output layer. This is fed the results of each of the neurons in the final hidden layer and **this result is the prediction** of our neural network.

#### Q: What is a `sum function`?
**A:** The `sum function` multiplies each input value by its associated weight and adds the products together. In a single layer perceptron there is one sum function and one activation function for all the inputs. In a multilayer perceptron, which can have _n_ layers, every neuron has its own sum: in the first hidden layer it is the sum of the inputs multiplied by their weights, and in subsequent layers it is the sum of the preceding layer's neuron outputs multiplied by the weights between those layers.   
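
As a small illustration of the sum function for one neuron, here is a minimal sketch. The names `example_inputs` and `example_weights` are made up for this example, and the numbers are demo values only.

```python
import numpy as np

# Hypothetical inputs and weights for a single neuron (demo values only).
example_inputs = np.array([0, 1])
example_weights = np.array([-0.424, 0.358])

# Sum function: multiply each input by its weight, then add the products.
weighted_sum = np.sum(example_inputs * example_weights)

# np.dot performs the same multiply-and-add in a single call.
weighted_sum_dot = np.dot(example_inputs, example_weights)

print(weighted_sum, weighted_sum_dot)
```

Both lines compute the same value; the `np.dot` form is the one that scales to whole layers.
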
#### Q: What is an `activation function`?
**A:** The `activation function` evaluates a sum and decides whether, and how strongly, that neuron fires. In the single layer perceptron we saw a `step function` as the activation function. A step function can only return `0` or `1`. Another type of activation function is the `sigmoid function`. The difference is that its result can be any value between `0` and `1` rather than a hard step, so we need to work out exactly where between `0` and `1` the value belongs. 
  
#### Q: What if I need to return negative values?  
**A:** If we need to return negative values we can use the `hyperbolic tangent function`, which looks like this: $y = \frac{e^{x} - e^{-x} }{e^{x} + e^{-x}}$. To evaluate it, replace `x` with the value under evaluation; the result lies between `-1` and `1`. 
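
To see the hyperbolic tangent in action, here is a minimal sketch that writes the formula out by hand and compares it with numpy's built-in `np.tanh`. The helper name `tanh_manual` and the sample values are assumptions made for this illustration.

```python
import numpy as np

def tanh_manual(x):
    # Direct translation of y = (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# A few sample values either side of zero (demo values only).
sample_values = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

manual = tanh_manual(sample_values)
builtin = np.tanh(sample_values)

# Negative inputs give negative outputs, positive inputs give positive outputs.
for x, m in zip(sample_values, manual):
    print(f"tanh({x})\t{round(float(m), 5)}")
```

Unlike the sigmoid, the outputs are centred on zero, which is why this function is handy when negative activations are needed.
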

#### Q: Anything else worth remembering? 
**A:** The irrational number `e` is also known as `Euler’s number`. It is approximately 2.718281 and is the base of the natural logarithm, ln (this means that if $x = \ln y = \log_e y$, then $e^x = y$). In a `sigmoid function` we apply the following equation: $y = \frac{1}{1 + e^{-x}}$ so that:
- if `x` is high, the value lies closer to 1 (and displays as `1.0` once rounded). 
- if `x` is low, the value lies closer to 0 (and displays as `0.0` once rounded).  

**Let's see what this looks like in python.**

In [1]:
# handle the imports 
import numpy as np

In [2]:
# definition of a sigmoid function. 
def sigmoid(x):
    # x is the result of the sum function (named x to avoid shadowing the built-in sum)
    return 1 / (1 + np.exp(-x))

In [3]:
# sample testing the sigmoid function with a range
# of values for demonstration purposes. 
values = [-1, 0, 1, 3, 5, 30.5, -25.5]

In [4]:
# Show a table of test values and the 
# generated output, also demonstrate the 
# rounding to display how a step function
# would be interpreted. 
print("Test\t\tSigmoid\t Rounded_sigmoid")
print("-" * 40)

# loop over the vals 
for val in values:
    print(f"sigmoid({val})\t{round(sigmoid(val),5)}\t {round(sigmoid(val))}")

Test		Sigmoid	 Rounded_sigmoid
----------------------------------------
sigmoid(-1)	0.26894	 0
sigmoid(0)	0.5	 0
sigmoid(1)	0.73106	 1
sigmoid(3)	0.95257	 1
sigmoid(5)	0.99331	 1
sigmoid(30.5)	1.0	 1
sigmoid(-25.5)	0.0	 0


# 3.02 - Hidden Layer activation (using binary `xor` operator)

![](https://static.javatpoint.com/tutorial/coa/images/logic-gates5.png)

We will use the `XOR` operator as our case for the multi-layer study, with the XOR truth table as our reference. We will focus on the `feed-forward` process from the input layer to the hidden layer. We can declare the following upfront: 
- We have 2 inputs (x,y) as we have had all along.
- We have 3 neurons in our hidden layer, with individual sum and activation functions, of course. 


**Let's see that in simple python code**

- task 1 - define the inputs of the `xor` truth table 
- task 2 - define some weights (demonstration purposes here)
- task 3 - perform the calculation loop to find:
    - product of _(input * weight)_
    - sigmoid of product

In [5]:
# To show a table of inputs, weights and products
def table_border():
    print("-" * 100)

# declare the inputs for a xor truth table     
inputs = [(0,0), (0,1), (1,0), (1,1)]

# define some weights. These are random numbers for demo purposes. 
weights0 = [(-0.424, 0.358), (-0.740, -0.577), (-0.961, -0.469)]

# create a table of outcomes 
for idx, i in enumerate(inputs):
    x1,x2 = i
    table_border()
    print(f"Processing Instance: {idx} - Inputs are: ({x1},{x2})")
    table_border()
    print(f"x1\tw1\tx2\tw2\tmultiplier\t\t\tProduct\tSigmoid of Product ")
    table_border()
    for w1,w2 in weights0:
        product = (x1 * w1) + (x2 * w2)
        sig = sigmoid(product)
        print(f"{x1}\t{w1}\t{x2}\t{w2}\t({x1} * {w1}) + ({x2} * {w2})\t{product}\t{sig} ")
    print("")

----------------------------------------------------------------------------------------------------
Processing Instance: 0 - Inputs are: (0,0)
----------------------------------------------------------------------------------------------------
x1	w1	x2	w2	multiplier			Product	Sigmoid of Product 
----------------------------------------------------------------------------------------------------
0	-0.424	0	0.358	(0 * -0.424) + (0 * 0.358)	0.0	0.5 
0	-0.74	0	-0.577	(0 * -0.74) + (0 * -0.577)	-0.0	0.5 
0	-0.961	0	-0.469	(0 * -0.961) + (0 * -0.469)	-0.0	0.5 

----------------------------------------------------------------------------------------------------
Processing Instance: 1 - Inputs are: (0,1)
----------------------------------------------------------------------------------------------------
x1	w1	x2	w2	multiplier			Product	Sigmoid of Product 
----------------------------------------------------------------------------------------------------
0	-0.424	1	0.358	(0 * -0.424) + (1 * 0

For each of the results, or `products` of the `inputs` * `weights` above, we call the sigmoid function to get the result. 

# 3.03 - Multilayer Perceptron Implementation steps 

We have covered the theory of passing from the inputs to the hidden layer: combining the inputs and weights to generate a sum, then applying the sigmoid to get an activation value. We can now present that in slightly more robust Python code using the `numpy` library, because it's much more performant, an industry standard and, well, pretty awesome too. Here are the tasks.

1. create the inputs 
2. create the outputs 
3. create the np.array(weights_for_each_input)
4. create the np.array(weights_for_each_hidden_layer_to_output)
5. create the epochs threshold. 
6. create the sum_synapse0 as np.dot(inputs, weights_for_each_input)
7. create the hidden_layer results as sigmoid(sum_synapse0) _see Euler's number_ 
8. apply the sum function to each of the hidden layer results (sigmoid) and activation application as sum_synapse1 = np.dot(hidden_layer, weights1)
9. define the error_function
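
The steps above can be sketched as one small function. This is only a minimal sketch: the function name `feed_forward` and the variable names are assumptions made for illustration, and the weights are the demo values used in this notebook rather than trained ones.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feed_forward(inputs, weights0, weights1):
    # steps 6-7: sum function and activation for the hidden layer
    sum_synapse0 = np.dot(inputs, weights0)
    hidden_layer = sigmoid(sum_synapse0)
    # step 8: sum function and activation for the output layer
    sum_synapse1 = np.dot(hidden_layer, weights1)
    output_layer = sigmoid(sum_synapse1)
    return hidden_layer, output_layer

# steps 1-4: XOR inputs, expected outputs and demo weights
xor_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_outputs = np.array([[0], [1], [1], [0]])
demo_weights0 = np.array([[-0.424, -0.740, -0.961],
                          [0.358, -0.577, -0.469]])
demo_weights1 = np.array([[-0.017], [-0.893], [0.148]])

hidden, prediction = feed_forward(xor_inputs, demo_weights0, demo_weights1)

# step 9: simplest error, correct - prediction
error = xor_outputs - prediction
print(prediction.shape, error.shape)
```

Each step is built up cell by cell in the rest of this section; the function just shows how the pieces fit together.
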

In [6]:
# create the inputs data 
inputs = np.array([[0,0], [0,1], [1,0], [1,1]])
inputs.shape

(4, 2)

In [7]:
outputs = np.array([[0], [1], [1], [0]])
outputs.shape

(4, 1)

In [8]:
# weights0 
# These are the input-X weights & input-Y weights respectively 
# they are the weights used in the sum function to generate a 
# product value. 
weights0 = np.array([[-0.424, -0.740, -0.961], 
                     [0.358, -0.577, -0.469]])


# weights1 
# these are hardcoded in the class lecture of this example. 
# They are the weights we see used between the hidden layer
# and the output layer at the end of our operation. The 
# figures used are for demonstration purposes so don't get 
# hung up on what these particular numbers mean. They are 
# just demo weights. 
weights1 = np.array([[-0.017], 
                     [-0.893], 
                     [0.148]])

### Q: Why do we set an epochs limit?
We set the epochs limit to control the number of times we'll allow the algorithm to run. Epoch thresholds prevent infinite loops in search of a perfect prediction, because in a lot of machine learning cases we will not achieve a 100% perfect algorithm. I guess we can take a clue from the fact that we use ML to make `predictions` of outcomes rather than to determine factual outcomes.

In [9]:
# set the epochs limit.
epochs = 100
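
To show how an epochs limit guards against an endless loop, here is a minimal sketch. Everything in it is hypothetical: `run_with_epoch_limit`, `make_placeholder_step` and the `error_threshold` value are made up for this illustration, and the placeholder step only pretends to train by shrinking a number.

```python
def run_with_epoch_limit(train_one_epoch, epochs=100, error_threshold=0.01):
    """Repeat train_one_epoch until the error is small enough or epochs run out."""
    epochs_run = 0
    error = float("inf")
    for epoch in range(epochs):
        error = train_one_epoch()
        epochs_run = epoch + 1
        if error <= error_threshold:
            break  # good enough, stop early
    return epochs_run, error


def make_placeholder_step(start_error=0.5, shrink=0.9):
    """Placeholder for a real training step: the error just shrinks by 10% per call."""
    state = {"error": start_error}

    def step():
        state["error"] *= shrink
        return state["error"]

    return step


# An error that keeps improving stops before the limit is reached.
improving_epochs, improving_error = run_with_epoch_limit(make_placeholder_step())

# An error that never improves is cut off by the limit instead of looping forever.
stuck_epochs, stuck_error = run_with_epoch_limit(lambda: 0.5)

print(improving_epochs, round(improving_error, 4))
print(stuck_epochs, stuck_error)
```

The second call is the case the limit exists for: without it, a model that never reaches the threshold would run forever.
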

We can start the sum of the communication between the input layer and the hidden layer. This is basically a matrix multiplication exercise. 

In the proof of concept in section 3.02 we did this with `for loops`, gathering the values of `x1`, `x2`, `w1` & `w2` to produce a result _for each input_layer * each weights0 value_. Here we use the numpy function `np.dot()`, as it is far more optimised than a Python for loop.

In [10]:
input_layer = inputs

# starts the sum of the communication between the input layer and the 
# hidden layer. This is basically a matrix multiplication exercise. 
# Creating the results of: 
#     "for each input_layer * each weights0"  
# Note: using the np.dot is more optimised than using a for loop.

# sum_synapse0 is what we called product above. The name sum_synapse0 
# is more descriptive because it is the sum of the synapse at level 0,
# which is the first layer, or the input to hidden layer. 
sum_synapse0 = np.dot(input_layer, weights0)
sum_synapse0  

array([[ 0.   ,  0.   ,  0.   ],
       [ 0.358, -0.577, -0.469],
       [-0.424, -0.74 , -0.961],
       [-0.066, -1.317, -1.43 ]])
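
To convince ourselves that `np.dot` really does the same work as the nested loops, here is a minimal sketch that computes the sums both ways and compares them. The names `demo_inputs`, `demo_weights0`, `loop_result` and `dot_result` are assumptions made for this check.

```python
import numpy as np

demo_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
demo_weights0 = np.array([[-0.424, -0.740, -0.961],
                          [0.358, -0.577, -0.469]])

# Loop version: one sum per (instance, hidden neuron) pair.
loop_result = np.zeros((demo_inputs.shape[0], demo_weights0.shape[1]))
for i, (x1, x2) in enumerate(demo_inputs):
    for n in range(demo_weights0.shape[1]):
        w1, w2 = demo_weights0[:, n]
        loop_result[i, n] = (x1 * w1) + (x2 * w2)

# Matrix version: (4, 2) dot (2, 3) gives a (4, 3) array of sums.
dot_result = np.dot(demo_inputs, demo_weights0)

print(np.allclose(loop_result, dot_result))
```

The shapes are the thing to remember: four instances with two inputs each, multiplied by two rows of three weights, give three sums per instance.
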

In [11]:
# calculates the hidden layer values, these are the values
# returned from the sigmoid of the sum_synapse0

hidden_layer = sigmoid(sum_synapse0)
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

In [12]:
# create the sum_synapse1 values. These are the weighted sums
# passed from the hidden layer to the output layer. Applying
# the sigmoid to them (next cell) gives the final result of the
# neural network for each of the items in our dataset. 
sum_synapse1 = np.dot(hidden_layer, weights1)
sum_synapse1

array([[-0.381     ],
       [-0.27419072],
       [-0.25421887],
       [-0.16834784]])

In [13]:
# We create the output layer or as it's often called the prediction of the
# neural network by applying the sigmoid to the sum_synapse1 value. 
output_layer  = sigmoid(sum_synapse1)

print("Neural net Prediction\tRounded")
print("-" * 40)
for i in range(len(output_layer)):
    print(f"{float(output_layer[i])}\t{round(float(output_layer[i]))}")


Neural net Prediction	Rounded
----------------------------------------
0.40588573188433286	0
0.43187856951314224	0
0.43678536461116163	0
0.4580121591884929	0


![](https://drive.google.com/uc?export=view&id=1wfrCqtzef-qBymeaeIApeI_9VlxlbV0E)

We now have the results of the `sigmoid` applied to the sums of the inputs and weights. Applying the `sum function` and `activation function` a second time gave us `sum_synapse1`, and `sigmoid(sum_synapse1)` gave us the neural network's prediction. Let's work through a manual calculation process to increase our comprehension before adding more complexity to our python implementation.



<h1><center><i>We made a prediction!</i></center></h1>

#### Manual calculation of output results

In [62]:
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

In [81]:
round(float(weights1[0]), 8)

-0.00711903

In [96]:
# functionised the descriptor of the process up until this point.
# func accepts the index of the instance
def processing_description(i_idx):

    print(f"[Step Definition] Inputs Passing")
    input1 = inputs[i_idx][0]
    input2 = inputs[i_idx][1]
    print(f"    [001.001] Input1: {input1}")
    print(f"    [001.002] Input2: {input2}")

    print(f"\n[Step Definition] weights0 application")
    for n_idx in range(3):
        print(f"    [002.00{n_idx}] sum function for neuron{n_idx}: ({input1} * {weights0[:,n_idx][0]}) + ({input2} * {weights0[:,n_idx][1]}) = {sum_synapse0[i_idx][n_idx]}")

    print(f"\n[Step Definition] Apply sigmoid to get activation function values")
    for n_idx in range(3):
        print(f"    [003.00{n_idx}] Activation value neuron{n_idx}: sigmoid({sum_synapse0[i_idx][n_idx]}) = {hidden_layer[i_idx][n_idx]}")

    print(f"\n[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1")
    for n_idx in range(3):
        print(f"    [004.001] sum_synpase1 for neuron{n_idx}: {hidden_layer[i_idx][n_idx]} * {float(weights1[n_idx])} = {hidden_layer[i_idx][n_idx] * float(weights1[n_idx])}")

    print(f"\n[Step Definition] Get output as sum of sum_synpase1")
    desc = f"({hidden_layer[i_idx][0]} * {float(weights1[0])}) + ({hidden_layer[i_idx][1]} * {float(weights1[1])}) + ({hidden_layer[i_idx][2]} * {float(weights1[2])})"
    ss1 = float((hidden_layer[i_idx][0] * weights1[0]) + (hidden_layer[i_idx][1] * weights1[1]) + (hidden_layer[i_idx][2] * weights1[2]))
    print(f"    [005.001] describe ouput calculation: {desc}")
    print(f"    [005.002] Sum the sum_synpase1 as Output: {ss1}")

    print(f"\n[Step Definition] Apply sigmoid to output totals to create prediction")
    print(f"    [006.001] sigmoid(output): {ss1}")
    print(f"    [006.002] Neural network prediction: {float(sigmoid(ss1))}")
    
    

In [97]:
processing_description(0)

[Step Definition] Inputs Passing
    [001.001] Input1: 0
    [001.002] Input2: 0

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (0 * -0.424) + (0 * 0.358) = 0.0
    [002.001] sum function for neuron1: (0 * -0.74) + (0 * -0.577) = 0.0
    [002.002] sum function for neuron2: (0 * -0.961) + (0 * -0.469) = 0.0

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(0.0) = 0.5
    [003.001] Activation value neuron1: sigmoid(0.0) = 0.5
    [003.002] Activation value neuron2: sigmoid(0.0) = 0.5

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.5 * -0.007119029167563903 = -0.0035595145837819513
    [004.001] sum_synpase1 for neuron1: 0.5 * -0.8864244669130955 = -0.44321223345654776
    [004.001] sum_synpase1 for neuron2: 0.5 * 0.1543264410978047 = 0.07716322054890234

[Step Definition] Get output as sum of sum_synpase

In [59]:
processing_description(1)

[Step Definition] Inputs Passing
    [001.001] Input1: 0
    [001.002] Input2: 1

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (0 * -0.424) + (1 * 0.358) = 0.358
    [002.001] sum function for neuron1: (0 * -0.74) + (1 * -0.577) = -0.577
    [002.002] sum function for neuron2: (0 * -0.961) + (1 * -0.469) = -0.469

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(0.358) = 0.5885562043858291
    [003.001] Activation value neuron1: sigmoid(-0.577) = 0.3596231853677901
    [003.002] Activation value neuron2: sigmoid(-0.469) = 0.38485295749078957

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.5885562043858291 * -0.007119029167563903 = -0.004189948785773419
    [004.001] sum_synpase1 for neuron1: 0.3596231853677901 * -0.8864244669130955 = -0.3187787903792327
    [004.001] sum_synpase1 for neuron2: 0.384852

In [60]:
processing_description(2)

[Step Definition] Inputs Passing
    [001.001] Input1: 1
    [001.002] Input2: 0

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (1 * -0.424) + (0 * 0.358) = -0.424
    [002.001] sum function for neuron1: (1 * -0.74) + (0 * -0.577) = -0.74
    [002.002] sum function for neuron2: (1 * -0.961) + (0 * -0.469) = -0.961

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(-0.424) = 0.39555998258063735
    [003.001] Activation value neuron1: sigmoid(-0.74) = 0.323004143761477
    [003.002] Activation value neuron2: sigmoid(-0.961) = 0.2766780228949468

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.39555998258063735 * -0.007119029167563903 = -0.0028160030535126267
    [004.001] sum_synpase1 for neuron1: 0.323004143761477 * -0.8864244669130955 = -0.2863187759444881
    [004.001] sum_synpase1 for neuron2: 0.276678

In [61]:
processing_description(3)

[Step Definition] Inputs Passing
    [001.001] Input1: 1
    [001.002] Input2: 1

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (1 * -0.424) + (1 * 0.358) = -0.066
    [002.001] sum function for neuron1: (1 * -0.74) + (1 * -0.577) = -1.317
    [002.002] sum function for neuron2: (1 * -0.961) + (1 * -0.469) = -1.43

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(-0.066) = 0.4835059868921233
    [003.001] Activation value neuron1: sigmoid(-1.317) = 0.21131784831127748
    [003.002] Activation value neuron2: sigmoid(-1.43) = 0.19309868423321644

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.4835059868921233 * -0.007119029167563903 = -0.003442093223376796
    [004.001] sum_synpase1 for neuron1: 0.21131784831127748 * -0.8864244669130955 = -0.1873173110385465
    [004.001] sum_synpase1 for neuron2: 0.1930

In [98]:
# for all the arrays of hidden_layer values show a summary of above 
# processes to have the neuron calculations (or outputs) and the 
# predictions handy for reference. 
arrs = np.array([
        [0.5       , 0.5       , 0.5       ],
        [0.5885562 , 0.35962319, 0.38485296],
        [0.39555998, 0.32300414, 0.27667802],
        [0.48350599, 0.21131785, 0.19309868]])

print("Neuron Index\tNeuron Calculation\tPrediction")
print("-" * 50)
for i in range(len(arrs)):
    arr = arrs[i]
    neuron_calculations = float(arr.dot(weights1))
    prediction = sigmoid(neuron_calculations)

    print(f"{i}\t\t{round(neuron_calculations, 8)}\t\t\t{round(prediction, 8)}")

Neuron Index	Neuron Calculation	Prediction
--------------------------------------------------
0		-0.36960853			0.40863562
1		-0.26357576			0.43448491
2		-0.24643604			0.4387009
3		-0.16095917			0.45984686


# Part 2

# 3.05 - Multi-layered Perceptron Basic Algorithm

- 3.05.01. Error function (`Cost function` or `Loss function`)
- 3.05.02. Gradient descent
- 3.05.03. Output layer delta
- 3.05.04. Delta implementation in Python
- 3.05.05. Backpropagation

#### 3.05.01 - Error Functions (`Cost function` or `Loss function`)

The next step is to calculate the error (_cost or loss_) by comparing the results of the predictions with the outputs of the dataset. The simplest formula is `error = correct - prediction`

![](https://drive.google.com/uc?export=view&id=19KkCREOLUUGuFcm5SsP074mEtDjE3H3-)



In [20]:
# create a list to store the calculated error of our predictions 
# whilst remembering that the formula: error = (correct - prediction)
# is applied to get the error result. 
errs = []

# create a table that shows the inputs, the expected output, 
# the neural network's prediction and how far away (error)
# our model is from the truth. 
print(f"Input1\tInput2\tExpected\tPrediction\tError")
print("-" * 60)
for i in range(len(inputs)):
    x, y = inputs[i]
    exp = outputs[i]
    out = output_layer[i]
    err = exp - out
    errs.append(err)
    print(f"{x}\t{y}\t{exp}\t\t{out}\t{err}")



Input1	Input2	Expected	Prediction	Error
------------------------------------------------------------
0	0	[0]		[0.40588573]	[-0.40588573]
0	1	[1]		[0.43187857]	[0.56812143]
1	0	[1]		[0.43678536]	[0.56321464]
1	1	[0]		[0.45801216]	[-0.45801216]


Now we can take the step of calculating the average error.

In [21]:
# grab the outputs defined earlier (outputs are the correct results)
outputs

array([[0],
       [1],
       [1],
       [0]])

In [22]:
# get our output layer (predictions)
output_layer

array([[0.40588573],
       [0.43187857],
       [0.43678536],
       [0.45801216]])

In [23]:
# create the errors (error = correct - predictions)
error_output_layer = outputs - output_layer
error_output_layer

array([[-0.40588573],
       [ 0.56812143],
       [ 0.56321464],
       [-0.45801216]])

In [24]:
# In order to get the average error we need to heed a reminder: 
#    ** we need to use the absolute values **
# of the errors. If this is overlooked we will skew the results. 

error_avg = np.mean(abs(error_output_layer))
print(f"Error average: {error_avg}")
print(f"Error average (rounded) : {round(error_avg, 3)}")

Error average: 0.49880848923713045
Error average (rounded) : 0.499
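
The reminder about absolute values can be checked directly. Here is a minimal sketch comparing the plain mean of the errors with the mean of their absolute values; the names `expected`, `predicted`, `raw_mean` and `mean_absolute_error` are assumptions for this illustration, and the numbers are copied from the outputs shown above.

```python
import numpy as np

expected = np.array([[0], [1], [1], [0]])
predicted = np.array([[0.40588573],
                      [0.43187857],
                      [0.43678536],
                      [0.45801216]])

errors = expected - predicted

# Without abs(): positive and negative errors cancel each other out,
# which makes the model look better than it is.
raw_mean = np.mean(errors)

# With abs(): every miss counts, whatever its direction.
mean_absolute_error = np.mean(np.abs(errors))

print(f"Mean of raw errors:      {raw_mean}")
print(f"Mean of absolute errors: {mean_absolute_error}")
```

The raw mean comes out much smaller only because the over- and under-predictions offset one another.
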


#### 3.05.02 - Gradient Descent

![](https://cdn-images-1.medium.com/max/600/1*iNPHcCxIvcm7RwkRaMTx1g.jpeg)


The idea of gradient descent is to manage our cost function (loss function, error function) so that adjusting the weights brings us to the **smallest possible error**. How a weight should be adjusted is decided by calculating the partial derivative of the error with respect to that weight, which gives the direction of the gradient. 

Imagine an x,y graph with a curve where the **x-axis is the weight** and the **y-axis is the error value**. We are trying to reach the lowest point of the curve, which may never be zero, by the way. A curve with multiple dips has local minima as well as a global minimum, so over the epochs the search can settle in a dip that is not the lowest one. The purpose of the partial derivative is to give us the slope of the curve at the current weight, which tells us which way to move.

![](https://www.researchgate.net/profile/Yong_Ma15/publication/267820876/figure/fig1/AS:669428953923612@1536615708709/Schematic-of-the-local-minima-problem-in-FWI-The-data-misfit-has-spurious-local-minima.png)
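
To make the picture concrete, here is a minimal sketch of gradient descent on a made-up one-weight error curve. The curve `error_curve`, its slope `error_slope`, the starting weight and the `learning_rate` are all assumptions chosen for illustration; they are not part of our XOR network.

```python
def error_curve(weight):
    # Hypothetical error curve with a single dip: lowest point at weight = 2,
    # where the error is 1 (so the minimum is not zero).
    return (weight - 2.0) ** 2 + 1.0

def error_slope(weight):
    # Derivative of the curve above: the slope at the current weight.
    return 2.0 * (weight - 2.0)

weight = -1.0          # arbitrary starting point
learning_rate = 0.1    # assumed step size

for step in range(50):
    # Move against the slope: downhill on the error curve.
    weight -= learning_rate * error_slope(weight)

print(f"weight: {weight}")
print(f"error:  {error_curve(weight)}")
```

With only one dip the descent always finds the bottom; on a curve with several dips the same loop would stop in whichever dip is downhill from the starting weight.
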

- reminder of the sigmoid function: $y = \frac{1}{1 + e^{-x}}$

- calculating the partial derivative: $d = y \cdot (1 -y)$

#### Hypothetical example 

Assuming that `y` = 0.1 

- calculating the partial derivative: $d = 0.1 \cdot (1 - 0.1) = 0.09$


In [25]:
# already declared above but I have brought the definition
# down in order to prevent jumping about the notebook in 
# order to find earlier definitions. 
def sigmoid(x):
    # x is the result of the sum function (named x to avoid shadowing the built-in sum)
    return 1 / (1 + np.exp(-x))

In [26]:
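# derivative of the sigmoid, written in terms of the sigmoid's
# output y: d = y * (1 - y). Pass in sigmoid(x), not x itself.
def sigmoid_derivative(sigmoid_value):
    return sigmoid_value * (1 - sigmoid_value)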

In [27]:
s = sigmoid(0.5)
s

0.6224593312018546

In [28]:
d = sigmoid_derivative(s)
d

0.2350037122015945
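
As a sanity check on the formula $d = y \cdot (1 - y)$, here is a minimal sketch comparing it with a numerical slope estimate (a central finite difference). The step size `h` and the names `analytic` and `numeric` are assumptions made for this check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sigmoid_value):
    return sigmoid_value * (1 - sigmoid_value)

x = 0.5
h = 1e-6  # small step used to estimate the slope numerically

# Slope from the formula, using the sigmoid's output.
analytic = sigmoid_derivative(sigmoid(x))

# Slope estimated from two nearby points on the sigmoid curve.
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(analytic, numeric)
```

The two numbers agree to many decimal places, which is a quick way to confirm a derivative has been coded correctly.
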

#### 3.05.03 - Output layer Delta

The order of sequence is: 

- activation function (sigmoid) $y = \frac{1}{1 + e^{-x}}$

- Derivative $d = y \cdot (1 -y)$

- Delta $delta _{output} = error \cdot sigmoid _{derivative}$
- Gradient

Earlier in the notebook we walked through the neuron calculations; we can now expand on that to walk through the error, the derivative and the delta.

#### Update the process walkthrough for `error`, `partial derivative` & `delta`

In [29]:
# functionised the descriptor of the process up until this point.
# func accepts the index of the instance
def processing_description_level2(i_idx):

    print(f"[Step Definition] Inputs Passing")
    input1 = inputs[i_idx][0]
    input2 = inputs[i_idx][1]
    print(f"    [001.001] Input1: {input1}")
    print(f"    [001.002] Input2: {input2}")

    print(f"\n[Step Definition] weights0 application")
    for n_idx in range(3):
        weights0_0 = weights0[:,n_idx][0]
        weights0_1 = weights0[:,n_idx][1]
        print(f"    [002.00{n_idx}] sum function for neuron{n_idx}: ({input1} * {weights0_0}) + ({input2} * {weights0_1}) = {sum_synapse0[i_idx][n_idx]}")

    print(f"\n[Step Definition] Apply sigmoid to get activation function values")
    for n_idx in range(3):
        print(f"    [003.00{n_idx}] Activation value neuron{n_idx}: sigmoid({sum_synapse0[i_idx][n_idx]}) = {hidden_layer[i_idx][n_idx]}")

    print(f"\n[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1")
    for n_idx in range(3):
        print(f"    [004.001] sum_synpase1 for neuron{n_idx}: {hidden_layer[i_idx][n_idx]} * {float(weights1[n_idx])} = {hidden_layer[i_idx][n_idx] * float(weights1[n_idx])}")

    print(f"\n[Step Definition] Get output as sum of sum_synpase1")
    output_sum_synapse1 = float((hidden_layer[i_idx][0] * weights1[0]) + (hidden_layer[i_idx][1] * weights1[1]) + (hidden_layer[i_idx][2] * weights1[2]))
    print(f"    [005.001] describe ouput calculation: ({hidden_layer[i_idx][0]} * {float(weights1[0])}) + ({hidden_layer[i_idx][1]} * {float(weights1[1])}) + ({hidden_layer[i_idx][2]} * {float(weights1[2])})")
    print(f"    [005.002] Sum the sum_synpase1 as Output: {output_sum_synapse1}")

    print(f"\n[Step Definition] Apply sigmoid to output total to create the prediction")
    nn_prediction = sigmoid(output_sum_synapse1)
    print(f"    [006.001] sigmoid({output_sum_synapse1}): {nn_prediction}")
    print(f"    [006.002] Neural network prediction: {nn_prediction}")
    
    print(f"\n[Step Definition] Calculate the error")
    err_val = outputs[i_idx] - nn_prediction
    print(f"    [007.001] Get error value: output[{i_idx}] - {nn_prediction} = {err_val}")
    
    print(f"\n[Step Definition] Get the partial derivative")
    derivative = sigmoid_derivative(nn_prediction)
    print(f"    [008.001] Get the partial derivative: sigmoid_derivative({nn_prediction}) = {derivative}")
    print(f"    [008.002] The partial derivative is: {derivative}")
    
    print(f"\n[Step Definition] Get the delta")
    delta = err_val * derivative
    print(f"    [009.001] Get the delta: (err * derivative) : {err_val} * {derivative} = {delta}")
    print(f"    [009.002] The delta is: {delta}")
    
    

In [30]:
processing_description_level2(0)

[Step Definition] Inputs Passing
    [001.001] Input1: 0
    [001.002] Input2: 0

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (0 * -0.424) + (0 * 0.358) = 0.0
    [002.001] sum function for neuron1: (0 * -0.74) + (0 * -0.577) = 0.0
    [002.002] sum function for neuron2: (0 * -0.961) + (0 * -0.469) = 0.0

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(0.0) = 0.5
    [003.001] Activation value neuron1: sigmoid(0.0) = 0.5
    [003.002] Activation value neuron2: sigmoid(0.0) = 0.5

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.5 * -0.017 = -0.0085
    [004.001] sum_synpase1 for neuron1: 0.5 * -0.893 = -0.4465
    [004.001] sum_synpase1 for neuron2: 0.5 * 0.148 = 0.074

[Step Definition] Get output as sum of sum_synpase1
    [005.001] describe ouput calculation: (0.5 * -0.017) + (0.5 * -0.893) + (0.5 

In [31]:
processing_description_level2(1)

[Step Definition] Inputs Passing
    [001.001] Input1: 0
    [001.002] Input2: 1

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (0 * -0.424) + (1 * 0.358) = 0.358
    [002.001] sum function for neuron1: (0 * -0.74) + (1 * -0.577) = -0.577
    [002.002] sum function for neuron2: (0 * -0.961) + (1 * -0.469) = -0.469

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(0.358) = 0.5885562043858291
    [003.001] Activation value neuron1: sigmoid(-0.577) = 0.3596231853677901
    [003.002] Activation value neuron2: sigmoid(-0.469) = 0.38485295749078957

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.5885562043858291 * -0.017 = -0.010005455474559095
    [004.001] sum_synpase1 for neuron1: 0.3596231853677901 * -0.893 = -0.32114350453343654
    [004.001] sum_synpase1 for neuron2: 0.38485295749078957 * 0.148 = 0.056

In [32]:
processing_description_level2(2)

[Step Definition] Inputs Passing
    [001.001] Input1: 1
    [001.002] Input2: 0

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (1 * -0.424) + (0 * 0.358) = -0.424
    [002.001] sum function for neuron1: (1 * -0.74) + (0 * -0.577) = -0.74
    [002.002] sum function for neuron2: (1 * -0.961) + (0 * -0.469) = -0.961

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(-0.424) = 0.39555998258063735
    [003.001] Activation value neuron1: sigmoid(-0.74) = 0.323004143761477
    [003.002] Activation value neuron2: sigmoid(-0.961) = 0.2766780228949468

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.39555998258063735 * -0.017 = -0.006724519703870836
    [004.001] sum_synpase1 for neuron1: 0.323004143761477 * -0.893 = -0.288442700378999
    [004.001] sum_synpase1 for neuron2: 0.2766780228949468 * 0.148 = 0.0409483

In [33]:
processing_description_level2(3)

[Step Definition] Inputs Passing
    [001.001] Input1: 1
    [001.002] Input2: 1

[Step Definition] weights0 application
    [002.000] sum function for neuron0: (1 * -0.424) + (1 * 0.358) = -0.066
    [002.001] sum function for neuron1: (1 * -0.74) + (1 * -0.577) = -1.317
    [002.002] sum function for neuron2: (1 * -0.961) + (1 * -0.469) = -1.43

[Step Definition] Apply sigmoid to get activation function values
    [003.000] Activation value neuron0: sigmoid(-0.066) = 0.4835059868921233
    [003.001] Activation value neuron1: sigmoid(-1.317) = 0.21131784831127748
    [003.002] Activation value neuron2: sigmoid(-1.43) = 0.19309868423321644

[Step Definition] Perform neuron calculations (activation * weight1) to generate sum_synapse1
    [004.001] sum_synpase1 for neuron0: 0.4835059868921233 * -0.017 = -0.008219601777166097
    [004.001] sum_synpase1 for neuron1: 0.21131784831127748 * -0.893 = -0.1887068385419708
    [004.001] sum_synpase1 for neuron2: 0.19309868423321644 * 0.148 = 0.02

#### 3.05.04 - Delta implementation in Python

As a reminder, the delta is **used to determine the direction of the gradient in order to update weights**. The process takes each instance in the dataset and returns a delta value:

- To calculate the `derivative_output` we take the result of `sigmoid_derivative(output_layer)`.
- To calculate the `delta_output` we multiply the `error_output_layer` by the `derivative_output`.

In [34]:
output_layer

array([[0.40588573],
       [0.43187857],
       [0.43678536],
       [0.45801216]])

In [35]:
derivative_output = sigmoid_derivative(output_layer)
derivative_output

array([[0.2411425 ],
       [0.24535947],
       [0.24600391],
       [0.24823702]])

In [36]:
error_output_layer

array([[-0.40588573],
       [ 0.56812143],
       [ 0.56321464],
       [-0.45801216]])

In [37]:
delta_output = error_output_layer * derivative_output
delta_output

array([[-0.0978763 ],
       [ 0.13939397],
       [ 0.138553  ],
       [-0.11369557]])

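The delta for the hidden layer follows the same pattern, with the weight between the hidden neuron and the output layer as an extra multiplier:

$delta_{hidden} = sigmoid_{derivative} \cdot weight \cdot delta_{output}$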
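As a worked check of this formula, take the instance with inputs (0, 0) and hidden neuron 1, using the values shown in this notebook (activation 0.5, weight -0.893, output delta -0.0978763):

$sigmoid_{derivative} = 0.5 \cdot (1 - 0.5) = 0.25$

$delta_{hidden} = 0.25 \cdot (-0.893) \cdot (-0.0978763) \approx 0.02185$

The step-by-step table that follows rounds the output delta to three decimals before multiplying, so its figure for the same neuron differs slightly in the later decimal places.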

In [38]:
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

In [39]:
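# Walk through the hidden-layer delta one instance and one neuron at a time.
# Note: the final column (headed "Prediction") holds the hidden-layer delta.
# It is computed from rounded values, so it differs slightly from the
# vectorised result further down.
print("inputs\tIter\tActivation\tDerivative\tWeight\tDelta\tPrediction")
print("-" * 75)
for i in range(len(hidden_layer)):
    x, y = inputs[i]
    print("")
    for j in range(len(hidden_layer[i])):
        synapse = round(hidden_layer[i][j], 5)
        sd_val = round(sigmoid_derivative(hidden_layer[i][j]), 5)
        # index the single element explicitly: float() on a 1-element
        # array is deprecated in recent NumPy versions
        weight = float(weights1[j, 0])
        do = round(float(delta_output[i, 0]), 3)
        print(f"[{x,y}] [{j}]\t{synapse}\t\t{sd_val}\t\t{weight}\t{do}\t{round(float(sd_val * weight * do),8)}")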
        

inputs	Iter	Activation	Derivative	Weight	Delta	Prediction
---------------------------------------------------------------------------

[(0, 0)] [0]	0.5		0.25		-0.017	-0.098	0.0004165
[(0, 0)] [1]	0.5		0.25		-0.893	-0.098	0.0218785
[(0, 0)] [2]	0.5		0.25		0.148	-0.098	-0.003626

[(0, 1)] [0]	0.58856		0.24216		-0.017	0.139	-0.00057222
[(0, 1)] [1]	0.35962		0.23029		-0.893	0.139	-0.02858521
[(0, 1)] [2]	0.38485		0.23674		0.148	0.139	0.00487022

[(1, 0)] [0]	0.39556		0.23909		-0.017	0.139	-0.00056497
[(1, 0)] [1]	0.323		0.21867		-0.893	0.139	-0.02714285
[(1, 0)] [2]	0.27668		0.20013		0.148	0.139	0.00411707

[(1, 1)] [0]	0.48351		0.24973		-0.017	-0.114	0.00048398
[(1, 1)] [1]	0.21132		0.16666		-0.893	-0.114	0.01696632
[(1, 1)] [2]	0.1931		0.15581		0.148	-0.114	-0.00262883


#### Rationalise the code.

In [40]:
# delta_output_weight_multiplier = delta_output.dot(weights1)
weights1

array([[-0.017],
       [-0.893],
       [ 0.148]])

In [41]:
# .T transposes the array: columns become rows and rows become columns.
# delta_output has shape (4, 1) and weights1 has shape (3, 1), so
# delta_output.dot(weights1) fails with a shape mismatch. Transposing
# weights1 to shape (1, 3) makes the dot product valid.
weights1T = weights1.T
weights1T

array([[-0.017, -0.893,  0.148]])

In [42]:
# check shapes 
print(f"Shape of weights1     : {weights1.shape}")
print(f"Shape of weights1T    : {weights1T.shape}")
print(f"Shape of delta_output : {delta_output.shape}")


Shape of weights1     : (3, 1)
Shape of weights1T    : (1, 3)
Shape of delta_output : (4, 1)


In [43]:
delta_output_weight_multiplier = delta_output.dot(weights1T)
delta_output_weight_multiplier

array([[ 0.0016639 ,  0.08740354, -0.01448569],
       [-0.0023697 , -0.12447882,  0.02063031],
       [-0.0023554 , -0.12372783,  0.02050584],
       [ 0.00193282,  0.10153015, -0.01682694]])
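The shape handling above can be checked in isolation. The following is a minimal sketch, with the values copied from the cells above and the names `via_dot`, `via_broadcast` and `via_outer` introduced only for illustration. It suggests that multiplying a `(4, 1)` array by a `(1, 3)` array is an outer product, so broadcasting gives the same result as `.dot`.

```python
import numpy as np

# Values copied from the cells above.
delta_output = np.array([[-0.0978763], [0.13939397], [0.138553], [-0.11369557]])
weights1 = np.array([[-0.017], [-0.893], [0.148]])

# (4, 1) dot (1, 3) -> (4, 3): one row per instance, one column per hidden neuron.
via_dot = delta_output.dot(weights1.T)

# The same result via broadcasting and via an explicit outer product.
via_broadcast = delta_output * weights1.T
via_outer = np.outer(delta_output.ravel(), weights1.ravel())

print(via_dot.shape)
print(np.allclose(via_dot, via_broadcast), np.allclose(via_dot, via_outer))
```

Either form works here; `.dot` with the transposed weights generalises more directly to an output layer with more than one neuron.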

As a reminder, here are our functions again, so we don't have to scroll back up the notebook.

In [44]:
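# definition of a sigmoid function: squashes any value into the range (0, 1).
# the parameter is named sum_value to avoid shadowing Python's built-in sum().
def sigmoid(sum_value):
    return 1 / (1 + np.exp(-sum_value))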

In [45]:
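# derivative of the sigmoid, expressed in terms of the sigmoid's output:
# pass in an already-activated value, not the raw sum.
def sigmoid_derivative(sigmoid_value):
    return sigmoid_value * (1 - sigmoid_value)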

In [46]:
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

In [47]:
delta_hidden_layer = delta_output_weight_multiplier * sigmoid_derivative(hidden_layer)
delta_hidden_layer

array([[ 0.00041597,  0.02185088, -0.00362142],
       [-0.00057384, -0.02866677,  0.00488404],
       [-0.00056316, -0.02705587,  0.00410378],
       [ 0.00048268,  0.01692128, -0.00262183]])
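The steps so far can be gathered into a single function. The sketch below is self-contained and makes two assumptions: the helper name `compute_deltas` is hypothetical, and `outputs` is taken to be the XOR targets `[0, 1, 1, 0]`, which is what the `error_output_layer` values above imply if the error is `outputs - output_layer`.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # expects a value that has already been through sigmoid()
    return s * (1 - s)

def compute_deltas(inputs, outputs, weights0, weights1):
    """Hypothetical helper: one forward pass followed by both delta calculations."""
    hidden_layer = sigmoid(inputs.dot(weights0))
    output_layer = sigmoid(hidden_layer.dot(weights1))
    error_output_layer = outputs - output_layer
    delta_output = error_output_layer * sigmoid_derivative(output_layer)
    # delta_hidden = sigmoid_derivative * weight * delta_output, for every instance and neuron
    delta_hidden_layer = delta_output.dot(weights1.T) * sigmoid_derivative(hidden_layer)
    return hidden_layer, output_layer, delta_output, delta_hidden_layer

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = np.array([[0], [1], [1], [0]])  # assumed XOR targets
weights0 = np.array([[-0.424, -0.74, -0.961], [0.358, -0.577, -0.469]])
weights1 = np.array([[-0.017], [-0.893], [0.148]])

hidden_layer, output_layer, delta_output, delta_hidden_layer = compute_deltas(
    inputs, outputs, weights0, weights1
)
print(delta_output)
print(delta_hidden_layer)
```

Run with the weights shown in this notebook, the printed arrays should agree with the `delta_output` and `delta_hidden_layer` cells above.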

#### 3.05.05 - Backpropagation (adjusting the weights)

Up until now we have worked on the `feed forward` principle: weights are applied to the input layer to produce the hidden layer calculations, and those results are fed forward until we have an output layer total and a `prediction`. Overall the process works from `left to right`. Backpropagation, on the other hand, reverses this flow: we calculate the weight updates from the results, the weights and the activation function values, then apply them from `right to left`. We will use the formula: $weight_{n + 1} = weight_{n} + (input \cdot delta \cdot learning\_rate)$

When creating a neural network the user, or programmer, chooses the value of the learning rate. This rate sets the size of each weight update and therefore how fast the algorithm learns. Where the learning rate is: 
- High : convergence is fast, but the updates may overshoot and miss the global minimum. 
- Low : convergence is slow, but the risk of missing the global minimum is greatly reduced. 

Note: convergence means the neural network has settled on its best result, ideally the global minimum. To balance speed against the risk of overshooting the global minimum, many libraries implement a dynamic learning rate that decreases as the number of epochs increases: the rate starts off higher and reduces as training nears the minimum, which is deemed the best of both worlds.
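To make the idea of a dynamic learning rate concrete, here is a minimal sketch of one common scheme, inverse-time decay. The function name `decayed_learning_rate` and the `decay` value are assumptions chosen for illustration; this is not how any particular library implements its schedule.

```python
def decayed_learning_rate(initial_rate, decay, epoch):
    """Hypothetical inverse-time decay: the rate shrinks as the epoch count grows."""
    return initial_rate / (1 + decay * epoch)

initial_rate = 0.3  # the starting rate used in this notebook
decay = 0.01        # assumed decay factor, for illustration only

for epoch in (0, 10, 100, 1000):
    print(epoch, round(decayed_learning_rate(initial_rate, decay, epoch), 4))
```

The rate starts at its full value on the first epoch and falls steadily afterwards, so early updates are large and later ones are small.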

In [48]:
# define the learning rate as 0.3
learning_rate = 0.3

# CODE ALONG SAMPLE CODE

In [49]:
# let's print a summary of our structures and values at this stage
print("Summary")
print("-" * 60)
print(f"inputs:\n {inputs}\n")
print(f"weights0:\n {weights0}\n")
print(f"sum_synapse0:\n {sum_synapse0}\n")
print(f"hidden_layer:\n {hidden_layer}\n")
print(f"weights1:\n {weights1}\n")
print(f"sum_synapse1:\n {sum_synapse1}\n")
print(f"output_layer:\n {output_layer}\n")
print(f"derivative_outut:\n {derivative_output}\n")
print(f"error_output_layer:\n {error_output_layer}\n")
print(f"delta_output:\n {delta_output}\n")

Summary
------------------------------------------------------------
inputs:
 [[0 0]
 [0 1]
 [1 0]
 [1 1]]

weights0:
 [[-0.424 -0.74  -0.961]
 [ 0.358 -0.577 -0.469]]

sum_synapse0:
 [[ 0.     0.     0.   ]
 [ 0.358 -0.577 -0.469]
 [-0.424 -0.74  -0.961]
 [-0.066 -1.317 -1.43 ]]

hidden_layer:
 [[0.5        0.5        0.5       ]
 [0.5885562  0.35962319 0.38485296]
 [0.39555998 0.32300414 0.27667802]
 [0.48350599 0.21131785 0.19309868]]

weights1:
 [[-0.017]
 [-0.893]
 [ 0.148]]

sum_synapse1:
 [[-0.381     ]
 [-0.27419072]
 [-0.25421887]
 [-0.16834784]]

output_layer:
 [[0.40588573]
 [0.43187857]
 [0.43678536]
 [0.45801216]]

derivative_outut:
 [[0.2411425 ]
 [0.24535947]
 [0.24600391]
 [0.24823702]]

error_output_layer:
 [[-0.40588573]
 [ 0.56812143]
 [ 0.56321464]
 [-0.45801216]]

delta_output:
 [[-0.0978763 ]
 [ 0.13939397]
 [ 0.138553  ]
 [-0.11369557]]



In [50]:
hidden_layer

array([[0.5       , 0.5       , 0.5       ],
       [0.5885562 , 0.35962319, 0.38485296],
       [0.39555998, 0.32300414, 0.27667802],
       [0.48350599, 0.21131785, 0.19309868]])

In [51]:
delta_output

array([[-0.0978763 ],
       [ 0.13939397],
       [ 0.138553  ],
       [-0.11369557]])

In [52]:
# transpose the hidden layer so we can 
# work by neuron index across all instances
# with the multiplier. 
hidden_layerT = hidden_layer.T
hidden_layerT

array([[0.5       , 0.5885562 , 0.39555998, 0.48350599],
       [0.5       , 0.35962319, 0.32300414, 0.21131785],
       [0.5       , 0.38485296, 0.27667802, 0.19309868]])

In [53]:
# multiply each hidden neuron's activations by the output deltas and
# sum across all instances: the (input * delta) term of the update formula.
input_delta1_multiplier = hidden_layerT.dot(delta_output)
input_delta1_multiplier

array([[0.03293657],
       [0.02191844],
       [0.02108814]])
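To see what the dot product is doing, the first entry can be expanded by hand. It pairs the activation of hidden neuron 0 for each of the four instances with the output delta of the same instance and sums the products (values taken from the cells above):

$0.5 \cdot (-0.0978763) + 0.5885562 \cdot 0.13939397 + 0.39555998 \cdot 0.138553 + 0.48350599 \cdot (-0.11369557) \approx 0.0329$

This corresponds to the $input \cdot delta$ term of the update formula, accumulated over the whole dataset.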

In [54]:
# update the weights from the 
# hidden layer to the output layer 
weights1 = weights1 + (input_delta1_multiplier * learning_rate) 
weights1

array([[-0.00711903],
       [-0.88642447],
       [ 0.15432644]])

In [55]:
delta_hidden_layer

array([[ 0.00041597,  0.02185088, -0.00362142],
       [-0.00057384, -0.02866677,  0.00488404],
       [-0.00056316, -0.02705587,  0.00410378],
       [ 0.00048268,  0.01692128, -0.00262183]])

In [56]:
np.round(delta_hidden_layer, 4)
#delta_hidden_layer

array([[ 0.0004,  0.0219, -0.0036],
       [-0.0006, -0.0287,  0.0049],
       [-0.0006, -0.0271,  0.0041],
       [ 0.0005,  0.0169, -0.0026]])
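By analogy with the `weights1` update, the same formula can be applied to `weights0`, with the network `inputs` taking the role of the input term and `delta_hidden_layer` as the delta. The following is a sketch under that assumption, with values copied from the cells above; the name `input_delta0_multiplier` is hypothetical and simply mirrors `input_delta1_multiplier`.

```python
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights0 = np.array([[-0.424, -0.74, -0.961], [0.358, -0.577, -0.469]])
delta_hidden_layer = np.array([
    [0.00041597, 0.02185088, -0.00362142],
    [-0.00057384, -0.02866677, 0.00488404],
    [-0.00056316, -0.02705587, 0.00410378],
    [0.00048268, 0.01692128, -0.00262183],
])
learning_rate = 0.3

# (2, 4) dot (4, 3) -> (2, 3), the same shape as weights0
input_delta0_multiplier = inputs.T.dot(delta_hidden_layer)

# weight_{n+1} = weight_n + (input * delta * learning_rate)
weights0 = weights0 + (input_delta0_multiplier * learning_rate)
print(weights0)
```

Note that `delta_hidden_layer` was calculated with the original `weights1`, before the update to those weights was applied.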

# 3.06 - Implementation of multi-layer Perceptron with Python & numpy