```{contents}
```

## Workflow

FNNs follow a **sequential pipeline**:
$$
\text{Input Data} \;\rightarrow\; \text{Forward Pass} \;\rightarrow\; \text{Loss Computation} \;\rightarrow\; \text{Backward Pass} \;\rightarrow\; \text{Parameter Update}
$$

---

### Data Preparation

#### Input Features and Labels

You start with a dataset of $m$ labeled examples:
$$
\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}
$$

* $x^{(i)}$: input vector (e.g., pixel values, numeric features)
* $y^{(i)}$: target output (e.g., class label or numeric value)

#### Preprocessing

* Normalize or standardize input features
* One-hot encode categorical labels (for classification)
* Split into **train**, **validation**, and **test** sets
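
The three preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `standardize`, `one_hot`, and `split`, the split ratios, and the toy data are all assumptions made for the example.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - mu) / s for v, mu, s in zip(row, mus, sigmas)] for row in rows]


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot vectors."""
    return [[1.0 if k == y else 0.0 for k in range(num_classes)] for y in labels]


def split(samples, train=0.7, val=0.15, seed=0):
    """Shuffle, then split into train / validation / test lists."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_train = int(len(samples) * train)
    n_val = int(len(samples) * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


# Toy data (made up for illustration): 4 samples, 2 features, 2 classes.
X = [[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]]
y = [0, 1, 1, 0]

X_std = standardize(X)
Y = one_hot(y, num_classes=2)
train_set, val_set, test_set = split(list(zip(X_std, Y)), train=0.5, val=0.25)
print(len(train_set), len(val_set), len(test_set))
```

For brevity the sketch standardizes the whole dataset at once. In practice the mean and standard deviation are usually computed on the training split only and then reused for the validation and test splits, so that no information leaks from held-out data.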

---

### Initialize the Model Parameters

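For each layer $l$:

* Randomly initialize the **weights** $W^{(l)}$ (random values break the symmetry between units)
* Initialize the **biases** $b^{(l)}$ to zeros or small constants

With $n_l$ units in layer $l$ (and $n_0$ input features), the shapes are: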
$$
W^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}, \quad b^{(l)} \in \mathbb{R}^{n_l}
$$
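
As a rough sketch of these shapes, the snippet below builds one `(W, b)` pair per layer using nested lists. The function name `init_params`, the Gaussian scale of `0.1`, and the layer sizes are assumptions chosen for illustration; schemes such as Xavier or He initialization pick the scale from the layer widths instead.

```python
import random


def init_params(layer_sizes, scale=0.1, seed=0):
    """Return a list of (W, b) pairs, one per layer.

    W is a list of n_l rows with n_{l-1} entries each; b has n_l entries.
    """
    rng = random.Random(seed)
    params = []
    for n_prev, n_curr in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Small random weights (assumed fixed scale), zero biases.
        W = [[rng.gauss(0.0, scale) for _ in range(n_prev)] for _ in range(n_curr)]
        b = [0.0] * n_curr
        params.append((W, b))
    return params


params = init_params([3, 4, 2])  # 3 inputs, 4 hidden units, 2 outputs
for l, (W, b) in enumerate(params, start=1):
    print(f"layer {l}: W is {len(W)}x{len(W[0])}, b has {len(b)} entries")
```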

---

### Forward Propagation

Goal: Compute the predicted output $\hat{y}$ for input $x$.

For each layer $l = 1, 2, \ldots, L$:

1. **Linear transformation:**
   $$
   z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}
   $$
   (for the first layer, $a^{(0)} = x$)

2. **Activation function:**
   $$
   a^{(l)} = f(z^{(l)})
   $$

After the final layer, $a^{(L)} = \hat{y}$ is the model's prediction.
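
The two steps per layer translate almost directly into code. The following is a minimal sketch assuming a sigmoid activation in every layer and parameters stored as a list of `(W, b)` pairs; the names `forward` and `sigmoid` and the hand-picked weights are illustrative assumptions, not part of any particular library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params, activation=sigmoid):
    """Apply z = W a + b, then a = f(z), for every layer in turn."""
    a = x  # a^(0) = x
    for W, b in params:
        z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
             for row, b_i in zip(W, b)]
        a = [activation(z_i) for z_i in z]
    return a  # a^(L) = y_hat


# Hand-picked parameters for a 2-2-1 network (illustrative values only).
params = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),  # layer 1: W is 2x2
    ([[1.0, -1.0]], [0.0]),                   # layer 2: W is 1x2
]
y_hat = forward([1.0, 1.0], params)
print(y_hat)
```

Real implementations usually process a whole batch of inputs at once with matrix operations; the per-sample loops here are only meant to mirror the equations.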

---

### Compute the Loss

Compare the prediction $\hat{y}$ with the true output $y$ using a **loss function**:
$$
L = \text{Loss}(y, \hat{y})
$$

Examples:

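* **Mean Squared Error (MSE), for regression:**
  $$
  L = \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - \hat{y}^{(i)}\right)^2
  $$
* **Cross Entropy (for classification):**
  $$
  L = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k} y_k^{(i)} \log \hat{y}_k^{(i)}
  $$
  where $k$ indexes the classes and $y^{(i)}$ is one-hot encoded.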

Loss gives a **numerical measure of error**: how far the predictions are from the ground truth.
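
Both losses are short enough to write out by hand. The sketch below assumes scalar targets for MSE and one-hot targets with predicted class probabilities for cross entropy; the small `eps` clamp is an added safeguard against `log(0)` and is not part of the formula itself.

```python
import math


def mse(Y, Y_hat):
    """Mean squared error averaged over m samples (scalar targets)."""
    m = len(Y)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(Y, Y_hat)) / m


def cross_entropy(Y, Y_hat, eps=1e-12):
    """Cross entropy for one-hot targets and predicted class probabilities."""
    m = len(Y)
    total = 0.0
    for y, y_hat in zip(Y, Y_hat):
        # Only the true class contributes, because y_k is 0 elsewhere.
        total += sum(y_k * math.log(max(p_k, eps)) for y_k, p_k in zip(y, y_hat))
    return -total / m


print(mse([1.0, 2.0], [1.5, 2.0]))  # 0.125
print(cross_entropy([[1, 0], [0, 1]], [[0.9, 0.1], [0.2, 0.8]]))
```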

---

### Backpropagation (Gradient Computation)

Goal: Compute how much each parameter contributed to the loss.

#### Using the Chain Rule

For each layer $l$:
$$
\frac{\partial L}{\partial W^{(l)}} = \frac{\partial L}{\partial a^{(l)}} \cdot \frac{\partial a^{(l)}}{\partial z^{(l)}} \cdot \frac{\partial z^{(l)}}{\partial W^{(l)}}
$$

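For hidden layers, $\frac{\partial L}{\partial a^{(l)}}$ is itself obtained from layer $l+1$, which is why the gradients are computed from the output layer backward.

Each layer computes:

* Gradient of the loss with respect to the weights: $\frac{\partial L}{\partial W^{(l)}}$
* Gradient of the loss with respect to the biases: $\frac{\partial L}{\partial b^{(l)}}$

These gradients indicate **how to adjust the parameters**: moving each parameter against its gradient reduces the loss.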
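
To make the three chain-rule factors concrete, here is a small sketch for a single sigmoid output layer with a squared-error loss on one sample, checked against a finite-difference estimate. The loss $\frac{1}{2}\sum_i (a_i - y_i)^2$, the function names, and the numbers are assumptions for illustration; a full network would also pass the gradient back to earlier layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_loss(W, b, x, y):
    """Squared-error loss of one sigmoid layer on one sample; returns (loss, a)."""
    a = [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
         for row, bi in zip(W, b)]
    return 0.5 * sum((ai - yi) ** 2 for ai, yi in zip(a, y)), a


def layer_grads(W, b, x, y):
    """Chain rule: dL/dW = dL/da * da/dz * dz/dW."""
    _, a = layer_loss(W, b, x, y)
    # delta_i = dL/da_i * da_i/dz_i = (a_i - y_i) * a_i * (1 - a_i)
    delta = [(ai - yi) * ai * (1.0 - ai) for ai, yi in zip(a, y)]
    dW = [[d * xj for xj in x] for d in delta]  # dz_i/dW_ij = x_j
    db = delta[:]                               # dz_i/db_i = 1
    return dW, db


W, b = [[0.2, -0.4], [0.7, 0.1]], [0.0, 0.1]
x, y = [1.0, 2.0], [1.0, 0.0]
dW, db = layer_grads(W, b, x, y)

# Finite-difference check on a single weight, W[0][1].
h = 1e-6
W_plus = [row[:] for row in W]
W_plus[0][1] += h
numeric = (layer_loss(W_plus, b, x, y)[0] - layer_loss(W, b, x, y)[0]) / h
print(dW[0][1], numeric)  # the two values should agree closely
```

Comparing an analytic gradient with a numerical one like this is a common way to catch mistakes in a hand-written backward pass.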

---

### Weight Update (Optimization)

Once gradients are computed, update weights and biases:

$$
W^{(l)} := W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}
$$
$$
b^{(l)} := b^{(l)} - \eta \frac{\partial L}{\partial b^{(l)}}
$$

where $\eta$ is the learning rate (step size).
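
The update rule is a one-liner per parameter. The sketch below applies it to parameters and gradients stored as lists of `(W, b)` and `(dW, db)` pairs; that storage layout and the name `sgd_step` are assumptions for the example, and the gradient values are made up.

```python
def sgd_step(params, grads, lr=0.1):
    """Return new (W, b) pairs after W := W - lr * dW and b := b - lr * db."""
    new_params = []
    for (W, b), (dW, db) in zip(params, grads):
        W_new = [[w - lr * g for w, g in zip(w_row, g_row)]
                 for w_row, g_row in zip(W, dW)]
        b_new = [bi - lr * gi for bi, gi in zip(b, db)]
        new_params.append((W_new, b_new))
    return new_params


params = [([[1.0, 2.0]], [0.5])]   # one layer with a 1x2 weight matrix
grads = [([[0.5, -1.0]], [1.0])]   # made-up gradients for illustration
updated = sgd_step(params, grads, lr=0.1)
print(updated)
```

A positive gradient lowers the parameter and a negative gradient raises it, so each step moves against the gradient.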

**Optimizers**:

* **SGD** (Stochastic Gradient Descent)
* **Adam** (adaptive moment estimation)
* **RMSProp**, **Adagrad**, etc.

---

### Repeat for Multiple Epochs

An **epoch** is one full pass through all training samples.

Repeat:

1. Forward pass
2. Loss computation
3. Backpropagation
4. Parameter update

until convergence (loss stops decreasing significantly).
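
The four repeated steps can be seen end to end on a deliberately tiny case: a single sigmoid unit (a network with no hidden layer) trained on the logical OR function with full-batch gradient descent. The dataset, learning rate, tolerance, and epoch limit are all assumptions picked so the example runs quickly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy dataset: logical OR (illustrative only).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0.0, 1.0, 1.0, 1.0]

w, b = [0.0, 0.0], 0.0
lr, max_epochs, tol = 0.5, 5000, 1e-6
prev_loss = float("inf")
history = []

for epoch in range(max_epochs):
    # 1. Forward pass
    preds = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x in X]
    # 2. Loss computation (binary cross entropy)
    loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(Y, preds)) / len(X)
    history.append(loss)
    # 3. Backpropagation: for sigmoid + cross entropy, dL/dz = p - y
    dz = [p - y for p, y in zip(preds, Y)]
    dw = [sum(d * x[j] for d, x in zip(dz, X)) / len(X) for j in range(2)]
    db = sum(dz) / len(X)
    # 4. Parameter update
    w = [wj - lr * gj for wj, gj in zip(w, dw)]
    b -= lr * db
    # Stop once the loss no longer decreases significantly.
    if prev_loss - loss < tol:
        break
    prev_loss = loss

print(f"stopped after {epoch + 1} epochs, loss = {loss:.4f}")
```

Here every epoch uses the whole dataset in one batch. With mini-batches, one epoch would consist of several such update steps.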

---

### Model Evaluation

After training:

* Test the model on unseen data
* Use metrics like **accuracy**, **precision**, **recall**, **F1**, or **RMSE**

Example for classification:

$$
\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
$$
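
For binary classification, the metrics listed above can be computed from counts of true positives, false positives, and false negatives. The function name `classification_metrics` and the label vectors below are made up for the illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up test labels and predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example 6 of 8 predictions are correct, and there are 3 true positives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75.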

---

### Model Tuning

You can refine performance by:

* Adjusting number of layers or neurons
* Changing activation functions
* Optimizing learning rate
* Adding **regularization** (L2, dropout)
* Early stopping to avoid overfitting
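
Of these techniques, early stopping is the simplest to sketch: track the validation loss after each epoch and stop once it has failed to improve for a set number of epochs. The `patience` and `min_delta` parameters and the loss values below are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improving at first, then rising (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)  # 5 3
```

In a real training loop the parameters from `best_epoch` would typically be kept and restored, rather than those from the epoch at which training stopped.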

---

### Deployment

Once the model performs well:

* Save the parameters ($W, b$)
* Deploy as API or embedded model
* Monitor predictions and retrain periodically with new data
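
As a minimal sketch of the first step, parameters stored as nested lists can be written to and read back from a JSON file. JSON, the helper names, and the file name `fnn_params.json` are assumptions chosen for the example; deep learning frameworks provide their own serialization formats.

```python
import json
import os
import tempfile


def save_params(params, path):
    """Write a list of (W, b) pairs to a JSON file."""
    payload = [{"W": W, "b": b} for W, b in params]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_params(path):
    """Read the (W, b) pairs back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    return [(layer["W"], layer["b"]) for layer in payload]


# Illustrative parameters for a 2-2-1 network.
params = [([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.1]), ([[1.0, -1.0]], [0.2])]

path = os.path.join(tempfile.mkdtemp(), "fnn_params.json")  # hypothetical file name
save_params(params, path)
restored = load_params(path)
print(restored == params)
```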

---

### Full Workflow Summary Table

| Step | Process                  | Input            | Output                           |
| ---- | ------------------------ | ---------------- | -------------------------------- |
| 1    | Data preparation         | Raw data         | Normalized features, labels      |
| 2    | Parameter initialization | Layer dimensions | Initial $W, b$                   |
| 3    | Forward propagation      | $x$              | Predicted $\hat{y}$              |
| 4    | Compute loss             | $y, \hat{y}$     | Error value                      |
| 5    | Backpropagation          | Loss             | Gradients with respect to $W, b$ |
| 6    | Weight update            | Gradients        | New $W, b$                       |
| 7    | Training loop            | Dataset          | Trained model                    |
| 8    | Evaluation               | Test set         | Metrics                          |
| 9    | Tuning                   | Hyperparameters  | Improved performance             |
| 10   | Deployment               | Final model      | Production-ready system          |

---

### Visual Summary

```
         ┌────────────────────────────────────────────────────┐
         │              FEEDFORWARD WORKFLOW                  │
         ├────────────────────────────────────────────────────┤
         │ 1. Data Preparation → Normalize, Split             │
         │ 2. Initialize Weights and Biases                   │
         │ 3. Forward Propagation (Input → Output)            │
         │ 4. Compute Loss (Compare y, ŷ)                     │
         │ 5. Backpropagation (Compute Gradients)             │
         │ 6. Update Weights (Gradient Descent / Adam)        │
         │ 7. Repeat for Multiple Epochs                      │
         │ 8. Evaluate on Test Data                           │
         │ 9. Tune Hyperparameters                            │
         │10. Deploy and Monitor                              │
         └────────────────────────────────────────────────────┘
```

---

### Key Intuition Recap

| Concept             | Explanation                                         |
| ------------------- | --------------------------------------------------- |
| **Feedforward**     | Data flows one way (no loops)                       |
| **Learning**        | Minimize error through gradient descent             |
| **Backpropagation** | Computes gradients efficiently via the chain rule   |
| **Activation**      | Adds non-linearity to learn complex patterns        |
| **Iteration**       | Gradually improves mapping between input and output |

---


> The **Feedforward Neural Network workflow** is an iterative cycle of forward computation, loss evaluation, backward gradient propagation, and parameter optimization, repeated over the data until the network learns an accurate input-output mapping.
