# Loss Functions

In this notebook we describe commonly used **machine learning loss functions**, grouped by task. For each loss we give its formula or a short intuition, along with notes on typical use cases.

## 📂 Table of Contents

1. [Regression Losses](#-regression-losses)
2. [Classification Losses](#-classification-losses)
3. [Probabilistic / Likelihood-based Losses](#-probabilistic--likelihood-based-losses)
4. [References & Further Reading](#-references--further-reading)


## 📐 Regression Losses

Used when the output is continuous.

| Loss Name         | Formula / Intuition  | Notes                                     |
|------------------|----------------------|-------------------------------------------|
| Mean Squared Error (MSE) | \( \frac{1}{n} \sum_i (y_i - \hat{y}_i)^2 \) | Sensitive to outliers                     |
| Mean Absolute Error (MAE) | \( \frac{1}{n} \sum_i \lvert y_i - \hat{y}_i \rvert \) | Robust to outliers                        |
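| Huber Loss        | \( \tfrac{1}{2} r^2 \) if \( \lvert r \rvert \le \delta \), else \( \delta (\lvert r \rvert - \tfrac{1}{2}\delta) \), with \( r = y - \hat{y} \) | Quadratic near zero, linear in the tails; combines MSE and MAE benefits |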
| Log-Cosh Loss     | \( \sum_i \log(\cosh(y_i - \hat{y}_i)) \) | Smooth and less sensitive to outliers     |
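| Quantile Loss     | \( \max(\tau r, (\tau - 1) r) \), with \( r = y - \hat{y} \) and quantile level \( \tau \in (0, 1) \) | Asymmetric penalties; used in quantile regression |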
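
The regression losses above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it uses only the standard library, averages every loss over the samples, and assumes default values `delta=1.0` and `tau=0.5` that are not prescribed by the table. The function names are illustrative.

```python
import math

def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def huber(y, y_hat, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it (delta is an assumed default)."""
    total = 0.0
    for a, b in zip(y, y_hat):
        r = abs(a - b)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y)

def log_cosh(y, y_hat):
    """Mean of log(cosh(r)), computed with the overflow-safe identity
    log(cosh(r)) = |r| + log(1 + exp(-2|r|)) - log(2)."""
    total = 0.0
    for a, b in zip(y, y_hat):
        r = abs(a - b)
        total += r + math.log1p(math.exp(-2.0 * r)) - math.log(2.0)
    return total / len(y)

def quantile_loss(y, y_hat, tau=0.5):
    """Pinball loss: under-prediction costs tau per unit, over-prediction 1 - tau."""
    return sum(max(tau * (a - b), (tau - 1.0) * (a - b))
               for a, b in zip(y, y_hat)) / len(y)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 14.0]  # three exact predictions, one off by 10

print(mse(y_true, y_pred))    # 25.0
print(mae(y_true, y_pred))    # 2.5
print(huber(y_true, y_pred))  # 2.375
```

With a single large error, MSE grows with the square of the residual while MAE and Huber grow only linearly, which is the outlier sensitivity the table refers to.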




## 🧾 Classification Losses

Used when the output is a discrete label or probability distribution over classes.

| Loss Name               | Formula / Intuition                       | Notes                                     |
|------------------------|--------------------------------------------|-------------------------------------------|
| Binary Cross Entropy    | \( -\sum_i \left[ y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i) \right] \) | For binary classification                 |
| Categorical Cross Entropy | \( -\sum y_i \log(\hat{y}_i) \)         | For multiclass classification             |
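| Hinge Loss              | \( \max(0, 1 - y \cdot \hat{y}) \), with labels \( y = \pm 1 \) and raw score \( \hat{y} \) | Used in SVMs                              |
| Focal Loss              | \( -(1 - p_t)^{\gamma} \log(p_t) \): cross-entropy with a modulating factor that down-weights easy examples | Useful for class imbalance                |
| Label Smoothing        | Cross-entropy against smoothed targets \( (1 - \epsilon)\, y + \epsilon / K \) for \( K \) classes | Can improve generalization                |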
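
As a rough sketch of the classification losses above, the following plain-Python functions operate on a single sample or a small list of samples. Several details are assumptions made for illustration: probabilities are clamped with a hypothetical `EPS` constant to avoid `log(0)`, the focal loss omits the optional class-weighting factor, and the defaults `gamma=2.0` and `eps=0.1` are common choices rather than values taken from the table.

```python
import math

EPS = 1e-12  # assumed clamp so that log() never receives exactly 0

def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)

def binary_cross_entropy(y, p):
    """Mean BCE; y holds 0/1 labels, p holds predicted P(y = 1)."""
    return -sum(t * math.log(_clip(q)) + (1 - t) * math.log(1.0 - _clip(q))
                for t, q in zip(y, p)) / len(y)

def categorical_cross_entropy(target, probs):
    """Cross-entropy for one sample; both arguments are distributions over classes."""
    return -sum(t * math.log(_clip(q)) for t, q in zip(target, probs))

def hinge(y, score):
    """Hinge loss for one sample; y is -1 or +1, score is the raw model output."""
    return max(0.0, 1.0 - y * score)

def focal(y, p, gamma=2.0):
    """Binary focal loss for one sample, without the optional class weight."""
    p_t = _clip(p if y == 1 else 1.0 - p)  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def smooth_labels(one_hot, eps=0.1):
    """Mix a one-hot target with the uniform distribution over K classes."""
    k = len(one_hot)
    return [(1.0 - eps) * t + eps / k for t in one_hot]

print(hinge(+1, 2.0))   # 0.0 (correct side of the margin)
print(hinge(-1, 0.5))   # 1.5 (wrong side of the margin)

# Label smoothing is applied to the target, then ordinary cross-entropy is used.
target = smooth_labels([0.0, 0.0, 1.0], eps=0.1)
print(categorical_cross_entropy(target, [0.25, 0.25, 0.5]))
```

Setting `gamma=0` in `focal` recovers ordinary binary cross-entropy for that sample, which is one way to check the modulating factor.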





## 🎲 Probabilistic / Likelihood-based Losses

Used in generative models, variational inference, and Bayesian methods.

| Loss Name               | Description                              | Example Use Cases                         |
|------------------------|------------------------------------------|-------------------------------------------|
| Negative Log-Likelihood (NLL) | \( -\log p(y \mid \hat{y}) \)          | General-purpose likelihood loss           |
| KL Divergence           | \( \sum p(x) \log \frac{p(x)}{q(x)} \)  | Used in VAEs, distribution matching       |
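| Jensen-Shannon Divergence | \( \tfrac{1}{2} \mathrm{KL}(p \parallel m) + \tfrac{1}{2} \mathrm{KL}(q \parallel m) \), with \( m = \tfrac{1}{2}(p + q) \) | Symmetric, bounded variant of KL; used in GANs |
| Evidence Lower Bound (ELBO) | Expected reconstruction log-likelihood minus a KL term; maximized, so the training loss is the negative ELBO | Key in variational inference              |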
| Wasserstein Distance     | Measures distribution difference        | GAN training, optimal transport           |
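
A few of the quantities above have simple closed forms in special cases, sketched here in plain Python. The assumptions are deliberate simplifications: distributions are discrete lists of probabilities, the KL term is the one that appears in the ELBO of a Gaussian VAE with a standard normal prior, and the Wasserstein-1 distance is computed only for one-dimensional samples of equal size. The function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions.
    Terms with p_i == 0 contribute 0; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def gaussian_kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a single dimension."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))

def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples:
    match sorted values pairwise and average the absolute gaps."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))   # KL is not symmetric
print(js_divergence(p, q), js_divergence(q, p))   # JS is symmetric

# Disjoint supports: KL would be infinite, JS stays finite at log(2).
print(js_divergence([1.0, 0.0], [0.0, 1.0]))

print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

The disjoint-support case illustrates why JS is described as the more stable variant, and why the Wasserstein distance, which still reflects how far apart the samples are, is attractive for GAN training.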

---