# Chapter 9 - Backpropagation

The motivation behind backpropagation is to find how each parameter (weights and biases) affects the loss function, so that the parameters can be adjusted to minimize it.

Effectively, a model can be expressed as a composition of functions, where $y$ is the model output:

$$
y = f_3(f_2(f_1(x)))
$$

By the chain rule, the partial derivative of the loss function ($L$) with respect to the input $x$ can be expressed as:

$$
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial f_3}.\frac{\partial f_3}{\partial f_2}.\frac{\partial f_2}{\partial f_1}.\frac{\partial f_1}{\partial x}
$$
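The chain rule above can be checked numerically. The sketch below uses three made-up stages (`f1`, `f2`, `f3`) and a made-up loss. None of them come from a real model; they were chosen only because their derivatives are easy to write by hand. It compares the product of local derivatives against a finite-difference estimate.

```python
# Hypothetical stages standing in for f_1, f_2, f_3 (illustration only).
def f1(x):
    return 2.0 * x        # df1/dx = 2

def f2(u):
    return u + 3.0        # df2/du = 1

def f3(v):
    return v * v          # df3/dv = 2v

def loss(y):
    return 0.5 * y * y    # dL/dy = y (assumed loss, for illustration)

x = 1.5

# Forward pass, keeping the intermediate values needed by the derivatives.
u = f1(x)
v = f2(u)
y = f3(v)

# Chain rule: dL/dx = dL/df3 * df3/df2 * df2/df1 * df1/dx
analytic = y * (2.0 * v) * 1.0 * 2.0

# Central finite difference as an independent check.
h = 1e-6
numeric = (loss(f3(f2(f1(x + h)))) - loss(f3(f2(f1(x - h))))) / (2 * h)

print("Analytic:", analytic)
print("Numeric: ", numeric)
```

The two values should agree to several decimal places. The small remaining difference comes from the finite-difference approximation, not from the chain rule.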

## Single neuron

### Forward propagation

In this example, we have a single neuron with two inputs (two weights and one bias), followed by a ReLU activation function:

$$
z = w_1.x_1 + w_2.x_2 + b
$$

$$
y = ReLU(z) = max(z, 0)
$$

* $x_1$, $x_2$: input values
* $w_1$, $w_2$: neuron weights
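* $b$: neuron bias

The loss function ($L$) is the squared error for a single sample (the one-sample case of the mean squared error, aka MSE), halved so that its derivative simplifies. Here $t$ is the target value: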

$$
L = \frac{1}{2}(y-t)^2
$$
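As a minimal sketch, the forward pass and loss above can be written as one function. The function name `forward` and the input values are assumptions chosen for illustration; they are not the values used in the code example later in this chapter.

```python
def forward(x1, x2, w1, w2, b, t):
    """Forward pass of the two-input neuron, followed by the halved squared error."""
    z = w1 * x1 + w2 * x2 + b   # pre-activation
    y = max(z, 0.0)             # ReLU
    loss = 0.5 * (y - t) ** 2   # L = 1/2 (y - t)^2
    return z, y, loss

# Hypothetical inputs, parameters, and target.
z, y, loss = forward(x1=1.0, x2=2.0, w1=0.5, w2=-0.25, b=1.0, t=0.5)

print("z:", z)        # z: 1.0
print("y:", y)        # y: 1.0
print("L:", loss)     # L: 0.125
```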

### Back propagation

Step 1: Compute loss function partial derivative

$$
\frac{\partial L}{\partial y} = y - t
$$

Step 2: Compute activation function partial derivative

$$
\frac{\partial y}{\partial z}= \begin{cases}
   1 &\text{if } z > 0 \\
   0 &\text{otherwise }
\end{cases}
$$

Step 3: Compute neuron partial derivatives

$$
\frac{\partial z}{\partial w_1} = x_1, \quad \frac{\partial z}{\partial w_2} = x_2, \quad \frac{\partial z}{\partial b} = 1
$$

Step 4: Apply the chain rule to compute the partial derivatives of the loss (assuming the ReLU is active, i.e. $z > 0$, so $\frac{\partial y}{\partial z} = 1$)

$$
\frac{\partial L}{\partial w_1} = (y - t) . 1 . x_1
$$

$$
\frac{\partial L}{\partial w_2} = (y - t) . 1 . x_2
$$

$$
\frac{\partial L}{\partial b} = (y - t) . 1 . 1
$$
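Steps 1 to 4 can be sketched as a single function. The name `neuron_gradients` and the sample values are hypothetical. Unlike the formulas above, the sketch keeps the ReLU derivative as a variable, so it also covers the case where the ReLU is inactive and every gradient is zero.

```python
def neuron_gradients(x1, x2, w1, w2, b, t):
    """Return (dL/dw1, dL/dw2, dL/db) for the two-input ReLU neuron."""
    # Forward pass
    z = w1 * x1 + w2 * x2 + b
    y = max(z, 0.0)

    # Step 1: loss derivative
    dL_dy = y - t
    # Step 2: ReLU derivative
    dy_dz = 1.0 if z > 0 else 0.0
    # Steps 3 and 4: chain rule, using dz/dw1 = x1, dz/dw2 = x2, dz/db = 1
    dL_dz = dL_dy * dy_dz
    return dL_dz * x1, dL_dz * x2, dL_dz * 1.0

# ReLU active (z = 1.0): gradients follow (y - t) * x_i
active = neuron_gradients(1.0, 2.0, 0.5, -0.25, 1.0, 0.5)
print(active)  # (0.5, 1.0, 0.5)

# ReLU inactive (z = -2.0): no gradient flows back to the parameters
inactive = neuron_gradients(1.0, 2.0, 0.5, -0.25, -2.0, 0.5)
```

In the inactive case all three gradients are zero, so a gradient-based update would leave the parameters unchanged for that sample.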

## Code example

Set up the initial inputs and neuron parameters:

In [14]:
x = [1.0, -2.0, 3.0] # Inputs
w = [-3.0, -1.0, 2.0] # Weights
b = 1.0 # Bias

Apply forward propagation. In this example we minimize the neuron output directly, pushing it toward 0, which isn't realistic. In practice the goal is to minimize the loss function, not the network output.

In [15]:
# Calculate neuron output pre-activation
xw0 = w[0] * x[0]
xw1 = w[1] * x[1]
xw2 = w[2] * x[2]
z = xw0 + xw1 + xw2 + b

# Calculate neuron output by applying ReLU
y = max(0, z)

print("Output:", y)

Output: 6.0


Compute the derivatives, starting from the activation function at the end. The `dx_dy` variable naming convention denotes the partial derivative of `x` with respect to `y`: $\frac{\partial x}{\partial y}$

In [16]:
# Activation function derivative.
dy_dz = (1 if z > 0 else 0)

# Neuron derivative pre-activation.
dz_dw0 = x[0]
dz_dw1 = x[1]
dz_dw2 = x[2]
dz_db = 1.0 # z = ... + b, so dz/db is 1 regardless of the value of b

In [20]:
# Apply the chain rule to calculate the derivative of the output y with
# respect to the weights and bias.
dy_dw0 = dy_dz * dz_dw0
dy_dw1 = dy_dz * dz_dw1
dy_dw2 = dy_dz * dz_dw2
dy_db = dy_dz * dz_db

dw = [dy_dw0, dy_dw1, dy_dw2] # Weights gradient
db = dy_db # Bias gradient

print("Weight gradient:", dw)
print("Bias gradient:", db)

Weight gradient: [1.0, -2.0, 3.0]
Bias gradient: 1.0


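Use the gradient to make a small adjustment to the model parameters. The negative `epsilon` moves each parameter a small step against its gradient, which decreases the output.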

In [21]:
epsilon = -0.01

w[0] += dw[0] * epsilon
w[1] += dw[1] * epsilon
w[2] += dw[2] * epsilon
b += db * epsilon

Re-run the same forward pass with the adjusted weights and bias.

In [22]:
# Calculate neuron output pre-activation
xw0 = w[0] * x[0]
xw1 = w[1] * x[1]
xw2 = w[2] * x[2]
z = xw0 + xw1 + xw2 + b

# Calculate neuron output by applying ReLU
y = max(0, z)

print("Output:", y)

Output: 5.8500000000000005


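We can see that after adjusting the parameters, the output went from `6.00` to `5.85`. Yeah! The drop of `0.15` matches what the gradient predicts: `epsilon` times the sum of the squared gradients, $-0.01 \times (1^2 + (-2)^2 + 3^2 + 1^2) = -0.15$.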
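Repeating the same update several times keeps lowering the output. The loop below is a sketch under a few assumptions: it starts from the same inputs and parameters as the cells above, uses a hypothetical `forward` helper, a positive `step` that is subtracted (equivalent to adding the negative `epsilon`), and an arbitrary count of 10 iterations.

```python
x = [1.0, -2.0, 3.0]   # Inputs
w = [-3.0, -1.0, 2.0]  # Weights
b = 1.0                # Bias
step = 0.01            # Step size (the magnitude of epsilon)

def forward(x, w, b):
    """Return the pre-activation z and the ReLU output y."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, max(0.0, z)

outputs = []
for _ in range(10):
    z, y = forward(x, w, b)
    outputs.append(y)

    # Gradient of the output with respect to each parameter.
    dy_dz = 1.0 if z > 0 else 0.0
    dw = [dy_dz * xi for xi in x]
    db = dy_dz * 1.0

    # Move each parameter a small step against its gradient.
    w = [wi - step * dwi for wi, dwi in zip(w, dw)]
    b -= step * db

for i, y in enumerate(outputs):
    print(f"Iteration {i}: output = {y:.2f}")
```

Each iteration lowers the output by about `0.15` while the ReLU stays active. Once `z` reaches 0 the ReLU derivative becomes 0 and the updates stop changing the parameters.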