# Creating a Backpropagation Pipeline

Now we will program the step that resembles someone reviewing their own work. A forward pass on its own is like someone who never learns from their answers. Introducing backpropagation gives our model the chance to look back at its error and adjust, so that it can eventually generalize properly.

We will start by importing our essential packages:

In [1]:
import numpy as np

Moving on with our pipeline, we define the input vector and its target value:

In [2]:
x = np.array([1,0,1])
y = 1.0

In [3]:
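# Load the parameters from the previous activity
w1 = np.array([[0.2, -0.3],
               [0.4, 0.1],
               [-0.5, 0.2]])        # input-to-hidden weights, shape (3, 2)
bias = np.array([-0.4, 0.2, 0.1])   # two hidden biases, then the output bias
w2 = np.array([-0.3, 0.2])          # hidden-to-output weights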

In [4]:
def relu(z):
    return np.maximum(0, z)

In [5]:
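z_hidden = x.dot(w1) + bias[:2]      # hidden pre-activations
a_hidden = relu(z_hidden)            # hidden activations after ReLU
z_out = a_hidden.dot(w2) + bias[2]   # output pre-activation
y_hat = z_out                        # linear output, no activation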

In [6]:
sse = (y - y_hat) ** 2
half_mse = 0.5 * sse

In [7]:
print("z_hidden:", z_hidden)
print("a_hidden:", a_hidden)
print("y_hat:", y_hat)
print("SSE:", sse)
print("0.5 * SSE:", half_mse)

z_hidden: [-0.7  0.1]
a_hidden: [0.  0.1]
y_hat: 0.12000000000000001
SSE: 0.7744
0.5 * SSE: 0.3872


This matches the result of our manual forward-pass calculation from earlier, this time computed programmatically.
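
Because the same forward pass is repeated later with the updated parameters, it can be convenient to wrap it in a function. The following is a minimal sketch under that assumption; `forward` is a hypothetical helper name, not something defined in the earlier activity.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, w1, w2, bias):
    """Hypothetical helper: one forward pass through the 3-2-1 network."""
    z_hidden = x.dot(w1) + bias[:2]      # hidden pre-activations
    a_hidden = relu(z_hidden)            # hidden activations
    y_hat = a_hidden.dot(w2) + bias[2]   # linear output
    return z_hidden, a_hidden, y_hat

# Same input, target, and parameters as in the cells above
x = np.array([1, 0, 1])
y = 1.0
w1 = np.array([[0.2, -0.3],
               [0.4, 0.1],
               [-0.5, 0.2]])
bias = np.array([-0.4, 0.2, 0.1])
w2 = np.array([-0.3, 0.2])

z_hidden, a_hidden, y_hat = forward(x, w1, w2, bias)
loss = 0.5 * (y - y_hat) ** 2            # the "0.5 * SSE" value
print("y_hat:", y_hat)
print("0.5 * SSE:", loss)
```

Calling `forward` again after an update would reproduce the later cells without repeating the four lines by hand.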

Next, we will perform the backpropagation step.

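In this first part, we subtract the predicted value from the real observed value. This difference is our error, and every gradient that follows is built from it. Note that for the loss `0.5 * (y - y_hat) ** 2`, the derivative with respect to the prediction is `y_hat - y`, so the value stored in `dL_dyhat` below is the negative of that derivative.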

In [8]:
dL_dyhat = -(y_hat-y)
dL_dyhat

np.float64(0.88)

In [9]:
dL_dW2  = a_hidden * dL_dyhat
dL_dW2

array([0.   , 0.088])

In [10]:
dL_dtheta3 = dL_dyhat

Now we move on to calculating the gradients for the hidden layer.

In [11]:
dL_da_hidden = w2 * dL_dyhat

Here we multiplied the gradient coming from the output layer by the weights `w2`, which passes it back to the hidden activations.

In [12]:
# Multiply by the ReLU derivative: 1 where z_hidden > 0, otherwise 0
dL_dz_hidden = dL_da_hidden * (z_hidden > 0).astype(float)

In [13]:
dL_dW1 = np.outer(x, dL_dz_hidden)   # one row per input, one column per hidden unit
dL_dtheta12 = dL_dz_hidden           # hidden biases theta1 and theta2, i.e. bias[:2]

In [14]:
print("\nBackward pass (gradients):")
print("dL/dW2:", dL_dW2)
print("dL/dtheta3:", dL_dtheta3)
print("dL/dW1:\n", dL_dW1)
print("dL/dtheta1,2:", dL_dtheta12)


Backward pass (gradients):
dL/dW2: [0.    0.088]
dL/dtheta3: 0.88
dL/dW1:
 [[-0.     0.176]
 [-0.     0.   ]
 [-0.     0.176]]
dL/dtheta1,2: [-0.     0.176]
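
One way to sanity-check the values printed above is a finite-difference gradient check. The sketch below is not part of the original activity. It assumes the loss is `0.5 * (y - y_hat) ** 2` (the `0.5 * SSE` value from the forward pass), and `loss_fn`, `numeric_grad`, and `eps` are hypothetical names introduced here for illustration.

```python
import numpy as np

def loss_fn(x, y, w1, w2, bias):
    """Half squared error of the 3-2-1 ReLU network for one sample."""
    a_hidden = np.maximum(0, x.dot(w1) + bias[:2])
    y_hat = a_hidden.dot(w2) + bias[2]
    return 0.5 * (y - y_hat) ** 2

# Same input, target, and starting parameters as in the cells above
x = np.array([1.0, 0.0, 1.0])
y = 1.0
w1 = np.array([[0.2, -0.3],
               [0.4, 0.1],
               [-0.5, 0.2]])
bias = np.array([-0.4, 0.2, 0.1])
w2 = np.array([-0.3, 0.2])

eps = 1e-6  # assumed step size for the central difference

def numeric_grad(param, index):
    """Central-difference estimate of dL/d(param[index]); restores param afterwards."""
    original = param[index]
    param[index] = original + eps
    loss_plus = loss_fn(x, y, w1, w2, bias)
    param[index] = original - eps
    loss_minus = loss_fn(x, y, w1, w2, bias)
    param[index] = original
    return (loss_plus - loss_minus) / (2 * eps)

num_dW2_1 = numeric_grad(w2, 1)          # compare with dL/dW2[1]
num_dtheta3 = numeric_grad(bias, 2)      # compare with dL/dtheta3
num_dW1_01 = numeric_grad(w1, (0, 1))    # compare with dL/dW1[0, 1]

print("numeric dL/dW2[1]:  ", num_dW2_1)
print("numeric dL/dtheta3: ", num_dtheta3)
print("numeric dL/dW1[0,1]:", num_dW1_01)
```

For `w2[1]`, `theta3`, and `w1[0, 1]` the numerical derivatives come out near -0.088, -0.88, and -0.176. These match the magnitudes printed above with the opposite sign, which is consistent with `dL_dyhat` having been computed as `y - y_hat` rather than `y_hat - y`.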


Let's introduce the learning rate, which scales how far each parameter moves in one update:

In [15]:
lr = 0.001

In [16]:
w2 -= lr * dL_dW2
bias[2] -= lr * dL_dtheta3
w1 -= lr * dL_dW1
bias[:2] -= lr * dL_dtheta12

In [17]:
print("\nUpdated parameters:")
print("W1:\n", w1)
print("theta[:2]:", bias[:2])
print("W2:", w2)
print("theta3:", bias[2])


Updated parameters:
W1:
 [[ 0.2      -0.300176]
 [ 0.4       0.1     ]
 [-0.5       0.199824]]
theta[:2]: [-0.4       0.199824]
W2: [-0.3       0.199912]
theta3: 0.09912


Let's run the forward pass again with the updated parameters:

In [18]:
z_hidden = x.dot(w1) + bias[:2]
a_hidden = relu(z_hidden)
z_out = a_hidden.dot(w2) + bias[2]
y_hat = z_out

In [19]:
sse = (y - y_hat) ** 2
half_mse = 0.5 * sse

In [20]:
print("z_hidden:", z_hidden)
print("a_hidden:", a_hidden)
print("y_hat:", y_hat)
print("SSE:", sse)
print("0.5 * SSE:", half_mse)

z_hidden: [-0.7       0.099472]
a_hidden: [0.       0.099472]
y_hat: 0.119005646464
SSE: 0.7761510509623145
0.5 * SSE: 0.38807552548115726


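Comparing the two runs, the SSE moved from 0.7744 to about 0.7762, so it went up slightly instead of down. The reason is the sign: `dL_dyhat` was stored as `y - y_hat`, and subtracting `lr` times that quantity pushes the prediction away from the target. With the sign corrected (using `y_hat - y`, or adding instead of subtracting in the update), repeating this step many times can give our very bare-bones model the ability to move its prediction toward the best value.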
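
To see what taking multiple steps looks like, here is a minimal sketch of the same forward, backward, and update cells placed in a loop. It is an illustration rather than part of the original activity: it assumes the output gradient is written as `y_hat - y`, so that subtracting `lr * gradient` lowers the loss, and `n_steps` and `losses` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Same input, target, and starting parameters as at the top of the notebook
x = np.array([1, 0, 1])
y = 1.0
w1 = np.array([[0.2, -0.3],
               [0.4, 0.1],
               [-0.5, 0.2]])
bias = np.array([-0.4, 0.2, 0.1])
w2 = np.array([-0.3, 0.2])

lr = 0.001
n_steps = 5   # assumed number of updates, chosen only for illustration
losses = []

for step in range(n_steps):
    # Forward pass
    z_hidden = x.dot(w1) + bias[:2]
    a_hidden = relu(z_hidden)
    y_hat = a_hidden.dot(w2) + bias[2]
    losses.append(0.5 * (y - y_hat) ** 2)

    # Backward pass for L = 0.5 * (y - y_hat) ** 2
    dL_dyhat = y_hat - y
    dL_dW2 = a_hidden * dL_dyhat
    dL_dtheta3 = dL_dyhat
    dL_dz_hidden = (w2 * dL_dyhat) * (z_hidden > 0).astype(float)
    dL_dW1 = np.outer(x, dL_dz_hidden)
    dL_dtheta12 = dL_dz_hidden

    # Gradient descent update
    w2 -= lr * dL_dW2
    bias[2] -= lr * dL_dtheta3
    w1 -= lr * dL_dW1
    bias[:2] -= lr * dL_dtheta12

for step, loss in enumerate(losses):
    print(f"step {step}: 0.5 * SSE = {loss:.6f}")
```

With these starting parameters, the recorded `0.5 * SSE` starts at 0.3872 and gets smaller on every step, dropping to roughly 0.3863 after the first update.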