```{contents}
```

## Loss Functions for Regression in Neural Networks

Regression models predict a **continuous output**, so these loss functions measure how far the predictions $\hat{y}$ are from the actual values $y$.

---

### 1. Mean Squared Error (MSE)

**Loss (single sample):**
$$
(y - \hat{y})^2
$$

**Cost (all samples):**
$$
\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$

### ✔ Advantages

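* Differentiable everywhere → works smoothly with gradient descent
* Convex in the prediction $\hat{y}$ → a single global minimum for linear models (the loss surface over a neural network's weights is generally still non-convex)
* Gradient is proportional to the error → steps shrink near the minimum, so convergence is typically fast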
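To make the "smooth optimization" claim concrete, it may help to write out the gradient of the cost with respect to a single prediction. This is a short derivation using the same symbols as the cost above:

$$
\frac{\partial}{\partial \hat{y}_i} \left[ \frac{1}{N} \sum_{j=1}^{N} (y_j - \hat{y}_j)^2 \right] = -\frac{2}{N} (y_i - \hat{y}_i)
$$

For example, a sample with an error of 10 contributes a gradient of magnitude $20/N$, while a sample with an error of 0.1 contributes only $0.2/N$.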

### ✘ Disadvantage

* **Not robust to outliers**
  Squaring errors makes large deviations influence the model heavily.
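The effect of a single outlier can be checked numerically. The sketch below is plain Python with made-up values; the function name `mse` and the sample data are illustrative assumptions, not part of any library.

```python
def mse(y_true, y_pred):
    """Mean squared error over two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions and targets (illustrative values only).
y_pred = [2.5, 5.5, 7.0, 8.0]
y_clean = [3.0, 5.0, 7.0, 9.0]
y_outlier = [3.0, 5.0, 7.0, 29.0]  # last target replaced by an outlier

mse_clean = mse(y_clean, y_pred)
mse_outlier = mse(y_outlier, y_pred)

print(mse_clean)    # 0.375
print(mse_outlier)  # 110.375
```

One bad sample out of four raises the cost from 0.375 to 110.375, because its error of 21 contributes 441 to the sum on its own.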

---

### 2. Mean Absolute Error (MAE)

**Loss:**
$$
|y - \hat{y}|
$$

**Cost:**
$$
\frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|
$$

### ✔ Advantages

* Robust to outliers → no squaring, so extreme values don't dominate
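A small numeric check of the robustness claim, written as plain Python. The function name `mae` and the sample values are hypothetical and chosen only for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error over two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions and targets (illustrative values only).
y_pred = [2.5, 5.5, 7.0, 8.0]
y_clean = [3.0, 5.0, 7.0, 9.0]
y_outlier = [3.0, 5.0, 7.0, 29.0]  # last target replaced by an outlier

mae_clean = mae(y_clean, y_pred)
mae_outlier = mae(y_outlier, y_pred)

print(mae_clean)    # 0.5
print(mae_outlier)  # 5.5
```

The outlier's error of 21 enters the sum as 21 rather than 441, so the cost grows only linearly with the size of the deviation.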

### ✘ Disadvantages

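* No smooth parabola → the gradient has constant magnitude regardless of error size, so steps do not shrink near the minimum
* Typically slower or less stable convergence with a fixed learning rate
* Not differentiable at zero error → optimization formally relies on a **sub-gradient** at that point (deep learning frameworks typically handle this automatically)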
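The constant-gradient behaviour can be sketched in a few lines of plain Python. The helper names `sign` and `mae_subgradient` are hypothetical, and picking 0 at the kink is one valid sub-gradient choice among many (any value in $[-1, 1]$ would do).

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def mae_subgradient(y_true, y_pred):
    """Sub-gradient of the MAE cost with respect to each prediction.

    At zero error the absolute value has a kink; 0 is chosen there.
    """
    n = len(y_true)
    return [sign(p - y) / n for y, p in zip(y_true, y_pred)]


# Hypothetical values: errors of -0.5, 0.5, 0 and 21.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 30.0]

grads = mae_subgradient(y_true, y_pred)
print(grads)  # [-0.25, 0.25, 0.0, 0.25]
```

The prediction that is off by 21 receives the same gradient magnitude as the one that is off by 0.5, which is why MAE steps do not shrink as the error shrinks.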

---

### 3. Huber Loss

Hybrid of MSE and MAE. Uses a threshold $\delta$ (hyperparameter).

$$
L_\delta(y, \hat{y}) =
\begin{cases}
\frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| < \delta \\
\delta |y - \hat{y}| - \frac{1}{2} \delta^2 & \text{otherwise}
\end{cases}
$$

### ✔ Why use?

* Behaves like **MSE** for small errors → smooth optimization
* Behaves like **MAE** for large errors → limits the influence of outliers
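The piecewise definition translates directly into code. The sketch below is plain Python; the function name `huber`, the default `delta=1.0`, the averaging over samples, and the sample values are all assumptions made for illustration.

```python
def huber(y_true, y_pred, delta=1.0):
    """Huber loss averaged over two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = abs(y - p)
        if r < delta:
            total += 0.5 * r ** 2                  # quadratic (MSE-like) region
        else:
            total += delta * r - 0.5 * delta ** 2  # linear (MAE-like) region
    return total / len(y_true)


# Hypothetical predictions and targets (illustrative values only).
y_pred = [2.5, 5.5, 7.0, 8.0]
y_clean = [3.0, 5.0, 7.0, 9.0]
y_outlier = [3.0, 5.0, 7.0, 29.0]  # last target replaced by an outlier

huber_clean = huber(y_clean, y_pred)
huber_outlier = huber(y_outlier, y_pred)

print(huber_clean)    # 0.1875
print(huber_outlier)  # 5.1875
```

With `delta=1.0`, the outlier's error of 21 contributes 20.5 to the sum, which is linear in the error, while the small errors of 0.5 are still penalized quadratically.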

---

### 4. Root Mean Squared Error (RMSE)

$$
\sqrt{
\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
}
$$

Just the **square root of MSE**.

### Notes:

* Same sensitivity to outliers as MSE
* Output is in the **same unit as target variable**

**Pros:**

* Interpretable (same units as data)

**Cons:**

* Still penalizes outliers heavily
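The interpretability point is easiest to see with units attached. The sketch below is plain Python; the function name `rmse` and the house-price figures (in thousands of dollars) are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_sq = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)


# Hypothetical house prices in thousands of dollars.
y_true = [200.0, 250.0, 300.0, 350.0]
y_pred = [199.0, 251.0, 293.0, 357.0]

error = rmse(y_true, y_pred)
print(error)  # 5.0
```

Here the MSE is 25, measured in (thousand dollars) squared, while the RMSE of 5 reads directly as a typical error of about 5 thousand dollars. The two large errors of 7 still dominate the result, as they would under MSE.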

---

**Quick Use-Case Guide**

| Loss Function | Robust to Outliers? | Convergence | Smooth Gradient | Best For                      |
| ------------- | ------------------- | ----------- | --------------- | ----------------------------- |
| MSE           | No                  | Fast        | Yes             | Clean data                    |
| MAE           | Yes                 | Slower      | No              | Data with outliers            |
| Huber         | Partially           | Moderate    | Yes             | Mixed data                    |
| RMSE          | No                  | Fast        | Yes             | When interpretability matters |
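One way to see the robustness column in action is to ask which single constant prediction minimizes each cost. The sketch below uses plain Python with a brute-force search over integer candidates; the helper names and the data are hypothetical. It relies on the known result that the mean minimizes squared error and the median minimizes absolute error.

```python
from statistics import mean, median


def mse_const(data, c):
    """MSE when every sample is predicted as the constant c."""
    return sum((x - c) ** 2 for x in data) / len(data)


def mae_const(data, c):
    """MAE when every sample is predicted as the constant c."""
    return sum(abs(x - c) for x in data) / len(data)


# Hypothetical targets with one outlier (100).
data = [1, 2, 3, 4, 100]
candidates = range(0, 101)  # integer constants to try

best_mse = min(candidates, key=lambda c: mse_const(data, c))
best_mae = min(candidates, key=lambda c: mae_const(data, c))

print(best_mse, mean(data))    # both are 22
print(best_mae, median(data))  # both are 3
```

The MSE-optimal constant is pulled to 22 by the single outlier, while the MAE-optimal constant stays at 3, close to the bulk of the data.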

