## Cost Function vs Loss Function 

Both measure **error**, but they work at **different scales**.

---

### Loss Function

**Error for a single training example**

When one input passes through the network and produces a prediction ŷ, the loss compares it with the true target y:

$$
\text{Loss} = L(y, \hat{y})
$$

Example (squared error for one sample, the per-sample term of Mean Squared Error):

$$
L = (y - \hat{y})^2
$$
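As a minimal sketch, the per-sample squared error above can be written as a small Python function. The name `squared_error` and the sample values are illustrative assumptions, not part of any library.

```python
def squared_error(y: float, y_hat: float) -> float:
    """Loss for a single training example: (y - y_hat)^2."""
    return (y - y_hat) ** 2


# Hypothetical example: true target 3.0, prediction 2.5
loss = squared_error(3.0, 2.5)
print(loss)  # 0.25
```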

Used during:

* Forward pass: computed from the prediction for each sample
* Backpropagation: each sample's loss contributes its gradient to the gradient of the mini-batch

---

### Cost Function

**Average (or total) loss over the entire dataset or batch**

If there are $N$ samples:

$$
\text{Cost} = J = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{y}_i)
$$

Example (MSE over many samples):

$$
J = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$
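The relationship between the two can be sketched in Python: the cost is the mean of the per-sample losses. The function names `squared_error` and `mse_cost` and the sample data are assumptions made for illustration.

```python
def squared_error(y: float, y_hat: float) -> float:
    """Loss for a single training example."""
    return (y - y_hat) ** 2


def mse_cost(ys: list[float], y_hats: list[float]) -> float:
    """Cost J: the average of the per-sample losses over N samples."""
    if not ys or len(ys) != len(y_hats):
        raise ValueError("ys and y_hats must be non-empty and the same length")
    losses = [squared_error(y, y_hat) for y, y_hat in zip(ys, y_hats)]
    return sum(losses) / len(losses)


# Hypothetical targets and predictions for N = 3 samples
targets = [1.0, 2.0, 3.0]
predictions = [1.5, 2.0, 2.0]
print(mse_cost(targets, predictions))  # (0.25 + 0.0 + 1.0) / 3
```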

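Used by:

* Optimizers such as Gradient Descent, Adam, and RMSProp, which minimize it
* Backpropagation, which computes the gradient of the cost with respect to every weight; the optimizer then uses that gradient to update the weights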
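Because the cost is an average of per-sample losses, its gradient is the average of the per-sample loss gradients. Writing the model parameters as $\theta$ (a symbol introduced here for illustration):

$$
\nabla_\theta J = \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta L(y_i, \hat{y}_i)
$$

As a worked case, assume a one-weight linear model $\hat{y}_i = w x_i$ with the squared-error loss. Then:

$$
\frac{\partial J}{\partial w} = -\frac{2}{N} \sum_{i=1}^{N} x_i (y_i - \hat{y}_i)
$$

Plain gradient descent would then update the weight as $w \leftarrow w - \eta \, \frac{\partial J}{\partial w}$, where $\eta$ is an assumed learning rate.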

---

### Key Difference

| Feature | Loss Function                            | Cost Function                                            |
| ------- | ---------------------------------------- | -------------------------------------------------------- |
| Scope   | One sample                               | Many samples (batch or full set)                         |
| Purpose | Measures individual error                | Measures overall error on the batch or dataset           |
| Used in | End of the forward pass, once per sample | Backpropagation and the optimization step                |
| Example | $(y - \hat{y})^2$                        | $\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$          |

---

### Why They Matter

* **Loss** tells how wrong the prediction is for one example.
* **Cost** aggregates the losses over a batch or dataset, and the optimizer minimizes it by adjusting the weights.

Training loop:

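1. Forward pass → compute ŷ for each sample in the batch
2. Compute the loss for each sample
3. Average (or sum) the losses to form the cost
4. Backpropagate the cost to get its gradient with respect to every weight
5. Update the weights with the optimizer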
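The five steps can be sketched end to end for a deliberately tiny case. This sketch assumes a one-weight linear model `y_hat = w * x`, squared-error loss, and plain gradient descent with a hand-derived gradient in place of full backpropagation. The function name `train`, the learning rate, and the data are all illustrative assumptions.

```python
def train(xs, ys, lr=0.05, epochs=200):
    """Fit y_hat = w * x by gradient descent on the MSE cost."""
    w = 0.0
    n = len(xs)
    cost = float("inf")
    for _ in range(epochs):
        # 1. Forward pass: one prediction per sample
        y_hats = [w * x for x in xs]
        # 2. Loss: squared error for each sample
        losses = [(y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)]
        # 3. Cost: average of the per-sample losses
        cost = sum(losses) / n
        # 4. Gradient of the cost: dJ/dw = -(2/N) * sum(x * (y - y_hat))
        grad = sum(-2.0 * x * (y - y_hat)
                   for x, y, y_hat in zip(xs, ys, y_hats)) / n
        # 5. Update the weight (plain gradient descent)
        w -= lr * grad
    # Note: cost is the value computed just before the final update
    return w, cost


# Hypothetical data generated by y = 2x, so w should approach 2.0
w, cost = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w, cost)
```

In a real network, step 4 is carried out by backpropagation through every layer, and step 5 may use Adam or RMSProp instead of this fixed-step update.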