# Simple Steps of Backpropagation

This section walks through the steps of backpropagation for a simple neural network with a single scalar input, one hidden layer of three ReLU nodes, and one linear output node. Let's denote:

- $x$ as the input,
- $h_1, h_2, h_3$ as the hidden layer nodes with ReLU activation,
- $y$ as the output,
- $w^{(l)}$ as the weights of layer $l$ (layer 1 is the hidden layer, layer 2 is the output layer), and
- $b^{(l)}$ as the biases of layer $l$.

Assuming a Mean Squared Error (MSE) loss function, the steps for backpropagation are as follows:

1. **Forward Pass:**
   - Compute the weighted sum and apply the ReLU activation for the hidden layer nodes:
     $$
     \begin{align*}
     a_1 &= w_{11}^{(1)}x + b_{1}^{(1)} \\
     h_1 &= \max(0, a_1) \\
     a_2 &= w_{21}^{(1)}x + b_{2}^{(1)} \\
     h_2 &= \max(0, a_2) \\
     a_3 &= w_{31}^{(1)}x + b_{3}^{(1)} \\
     h_3 &= \max(0, a_3) \\
     \end{align*}
     $$
   - Compute the weighted sum for the output node:
     $$
     \begin{align*}
     a &= w_{1}^{(2)}h_1 + w_{2}^{(2)}h_2 + w_{3}^{(2)}h_3 + b^{(2)} \\
     y &= a
     \end{align*}
     $$

2. **Compute the Loss:**
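   - Compute the loss for a single training example using the MSE formula, where $y_{\text{predicted}} = y$ from the forward pass and the factor $\frac{1}{2}$ cancels the 2 produced by differentiation: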
     $$
     L = \frac{1}{2}(y_{\text{true}} - y_{\text{predicted}})^2
     $$

3. **Backward Pass:**
   - Compute the gradient of the loss with respect to the output node:
     $$
     \frac{\partial L}{\partial a} = y_{\text{predicted}} - y_{\text{true}}
     $$
   - Compute the gradients for the output-layer weights and bias, where $i \in \{1,2,3\}$:
     $$
     \begin{align*}
     \frac{\partial L}{\partial w_{i}^{(2)}} &= \frac{\partial L}{\partial a} \cdot h_i \\
     \frac{\partial L}{\partial b^{(2)}} &= \frac{\partial L}{\partial a}
     \end{align*}
     $$
   - Propagate the gradient back through the output weights and the ReLU activation to the hidden-layer pre-activations:
     $$
     \begin{align*}
     \frac{\partial L}{\partial a_i} &= 
     \begin{cases}
     \frac{\partial L}{\partial a} \cdot w_{i}^{(2)}, & \text{if } a_i > 0 \\
     0, & \text{otherwise}
     \end{cases}
     \end{align*}
     $$
   - Compute the gradients for the hidden-layer weights and biases:
     $$
     \begin{align*}
     \frac{\partial L}{\partial w_{j1}^{(1)}} &= \frac{\partial L}{\partial a_j} \cdot x, \quad \text{where } j \in \{1,2,3\} \\
     \frac{\partial L}{\partial b^{(1)}_j} &= \frac{\partial L}{\partial a_j}
     \end{align*}
     $$
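
The three steps above can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the function names (`forward`, `loss`, `backward`, `numerical_grad_W1`) and all numeric values are made up for illustration, and the hidden-layer weights are stored as a flat list because the input is a single scalar. The finite-difference comparison at the end is a common way to sanity-check hand-derived gradients.

```python
def forward(x, W1, b1, W2, b2):
    """Forward pass: returns hidden pre-activations a, hidden outputs h, and output y."""
    a = [W1[j] * x + b1[j] for j in range(3)]        # a_j = w_j1^(1) * x + b_j^(1)
    h = [max(0.0, a_j) for a_j in a]                 # ReLU
    y = sum(W2[j] * h[j] for j in range(3)) + b2     # linear output node
    return a, h, y


def loss(y_pred, y_true):
    """Squared error with the 1/2 factor used in the text."""
    return 0.5 * (y_true - y_pred) ** 2


def backward(x, y_true, W1, b1, W2, b2):
    """Backward pass: returns gradients (dW1, db1, dW2, db2)."""
    a, h, y = forward(x, W1, b1, W2, b2)
    d_out = y - y_true                               # dL/da at the output node
    dW2 = [d_out * h[j] for j in range(3)]           # dL/dw_j^(2)
    db2 = d_out                                      # dL/db^(2)
    # Chain rule through the output weight, then gate by the ReLU derivative.
    d_hidden = [d_out * W2[j] if a[j] > 0 else 0.0 for j in range(3)]
    dW1 = [d_hidden[j] * x for j in range(3)]        # dL/dw_j1^(1)
    db1 = list(d_hidden)                             # dL/db_j^(1)
    return dW1, db1, dW2, db2


def numerical_grad_W1(j, x, y_true, W1, b1, W2, b2, eps=1e-6):
    """Central finite-difference estimate of dL/dW1[j], used only as a check."""
    w_plus, w_minus = list(W1), list(W1)
    w_plus[j] += eps
    w_minus[j] -= eps
    l_plus = loss(forward(x, w_plus, b1, W2, b2)[2], y_true)
    l_minus = loss(forward(x, w_minus, b1, W2, b2)[2], y_true)
    return (l_plus - l_minus) / (2 * eps)


# Illustrative values only (not taken from any dataset).
x, y_true = 2.0, 1.0
W1, b1 = [0.5, -0.3, 0.8], [0.1, 0.2, -0.5]
W2, b2 = [1.0, -2.0, 0.5], 0.3

a, h, y = forward(x, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(x, y_true, W1, b1, W2, b2)
check = numerical_grad_W1(0, x, y_true, W1, b1, W2, b2)

print("output y:", y)
print("analytic dL/dW1[0]:", dW1[0], "numerical:", check)
print("hidden-layer gradients:", dW1, db1)
```

With these values the second hidden node has a negative pre-activation ($a_2 = -0.4$), so the ReLU gate blocks the gradient and that node's first-layer weight and bias gradients are exactly zero.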

These equations give you the gradients needed to perform gradient descent: each weight or bias $\theta$ is updated as $\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}$, where $\eta$ is the learning rate. The learning rate and the number of iterations are hyperparameters you choose when implementing this in practice.
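
A gradient descent loop built from these gradients might look like the following sketch. It assumes a single training example, a fixed learning rate, and made-up starting weights. The `train_step` function and the `params` dictionary are hypothetical names chosen for illustration, not part of any library.

```python
def train_step(x, y_true, params, lr):
    """One gradient-descent step on one (x, y_true) pair.

    Updates params in place and returns the loss measured before the update.
    """
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]

    # Forward pass.
    a = [W1[j] * x + b1[j] for j in range(3)]
    h = [max(0.0, a_j) for a_j in a]
    y = sum(W2[j] * h[j] for j in range(3)) + b2
    loss = 0.5 * (y_true - y) ** 2

    # Backward pass (gradients are computed before any weight changes).
    d_out = y - y_true
    d_hidden = [d_out * W2[j] if a[j] > 0 else 0.0 for j in range(3)]

    # Gradient-descent update: theta <- theta - lr * dL/dtheta.
    for j in range(3):
        W2[j] -= lr * d_out * h[j]
        W1[j] -= lr * d_hidden[j] * x
        b1[j] -= lr * d_hidden[j]
    params["b2"] = b2 - lr * d_out
    return loss


# Illustrative starting point and hyperparameters (assumptions, not tuned values).
params = {
    "W1": [0.5, -0.3, 0.8],
    "b1": [0.1, 0.2, -0.5],
    "W2": [1.0, -2.0, 0.5],
    "b2": 0.3,
}
losses = [train_step(2.0, 1.0, params, lr=0.01) for _ in range(100)]
print(f"first loss: {losses[0]:.5f}, last loss: {losses[-1]:.2e}")
```

In this run the second hidden node starts with a negative pre-activation, so it receives zero gradient and its weights never change. That is the "dead ReLU" behavior implied by the zero branch of the ReLU derivative.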