# Bias-Variance tradeoff

In statistics and machine learning, the **bias–variance tradeoff** is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. The **bias–variance dilemma** or **bias–variance problem** is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:

1. The *bias error* is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
2. The *variance* is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
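
The two error sources can be made concrete with a small simulation. The sketch below is illustrative only: the target function `true_f`, the two toy predictors, and every parameter value are assumptions chosen for this example, not part of the text above. It repeatedly draws noisy training sets, then estimates the squared bias and the variance of each predictor at a single query point.

```python
import math
import random
import statistics


def true_f(x):
    """Assumed ground-truth function for this illustration."""
    return math.sin(math.pi * x)


def predict_constant(xs, ys, x0):
    """High-bias model: ignores x and predicts the mean training target."""
    return statistics.fmean(ys)


def predict_1nn(xs, ys, x0):
    """High-variance model: copies the target of the nearest training point."""
    nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[nearest]


def bias_sq_and_variance(predict, x0=0.5, n_trials=2000, n_train=20,
                         noise_sd=0.3, seed=0):
    """Monte Carlo estimate of squared bias and variance of `predict` at x0."""
    rng = random.Random(seed)
    # Fixed, evenly spaced training inputs on [-1, 1]; only the noise is redrawn.
    xs = [-1.0 + 2.0 * i / (n_train - 1) for i in range(n_train)]
    preds = []
    for _ in range(n_trials):
        ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
        preds.append(predict(xs, ys, x0))
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


if __name__ == "__main__":
    for name, model in [("constant", predict_constant), ("1-NN", predict_1nn)]:
        b, v = bias_sq_and_variance(model)
        print(f"{name:>8}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Under these assumptions the constant predictor should show a large squared bias and a small variance (underfitting), while the 1-nearest-neighbour predictor should show the reverse (overfitting), since it reproduces the noise of a single training point.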

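The mean squared error of a prediction can be decomposed under the following setup, where $\epsilon$ denotes random noise in the observed target:

$y = f(x) + \epsilon$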

$f = f(x) \text{ is deterministic, thus } \mathbb{E}[f] = f$

$\mathbb{E}[\epsilon] = 0 \implies \text{Var}[\epsilon]=\mathbb{E}[\epsilon^2] \text{ because } \text{Var}[\epsilon]=\mathbb{E}[\epsilon^2]-(\mathbb{E}[\epsilon])^2$

$\hat{f} = \hat{f}(x) \text{ is the estimator of } f \text{, fit on a random training sample}$

$\text{MSE} = \mathbb{E}[(y - \hat{f})^2]$

$\text{Var}[Y] = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2$

$
\begin{aligned}
 \text{MSE} = \mathbb{E}[(y - \hat{f})^2]  &= \mathbb{E}[y^2 + \hat{f}^2 - 2y\hat{f}] \\
                               &= \mathbb{E}[(f + \epsilon)^2 + \hat{f}^2 - 2(f + \epsilon)\hat{f}] \\
                               &= \mathbb{E}[f^2 + \epsilon^2 + 2f\epsilon + \hat{f}^2 - 2f\hat{f} - 2\epsilon\hat{f}] \\
                               &= \mathbb{E}[f^2] + \mathbb{E}[\epsilon^2] + \mathbb{E}[2f\epsilon] + \mathbb{E}[\hat{f}^2] - \mathbb{E}[2f\hat{f}] - \mathbb{E}[2\epsilon\hat{f}] \\
                               &= \mathbb{E}[f^2] + \text{Var}[\epsilon] + \mathbb{E}[\hat{f}^2] - \mathbb{E}[2f\hat{f}] \qquad \text{(since } \mathbb{E}[\epsilon] = 0 \text{ and } \epsilon \text{ is independent of } \hat{f} \text{)} \\
                               &= \mathbb{E}[f^2] + \text{Var}[\epsilon] + \text{Var}[\hat{f}] + (\mathbb{E}[\hat{f}])^2 - \mathbb{E}[2f\hat{f}] \\
                               &= \mathbb{E}[f^2] - \mathbb{E}[2f\hat{f}] + (\mathbb{E}[\hat{f}])^2 + \text{Var}[\hat{f}] + \text{Var}[\epsilon]\\
                               &= f^2 - 2f\,\mathbb{E}[\hat{f}] + (\mathbb{E}[\hat{f}])^2 + \text{Var}[\hat{f}] + \text{Var}[\epsilon] \qquad \text{(since } f \text{ is deterministic)} \\
                               &= (f - \mathbb{E}[\hat{f}])^2 + \text{Var}[\hat{f}] + \text{Var}[\epsilon]\\
                               &= (\text{Bias}[\hat{f}])^2 + \text{Var}[\hat{f}] + \text{Var}[\epsilon] \\
                               &= \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}
\end{aligned}
$
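
The decomposition above can be checked numerically. The following is a minimal sketch under assumed settings that do not come from the text: a constant true value `f = 2.0`, Gaussian noise with standard deviation 1, and a hypothetical shrinkage estimator $\hat{f} = 0.8 \cdot \bar{y}$ computed from 10 training observations. For this estimator the terms are known in closed form: $\text{Bias}^2 = ((0.8 - 1) \cdot 2)^2 = 0.16$, $\text{Var}[\hat{f}] = 0.8^2 \cdot 1 / 10 = 0.064$, and $\text{Var}[\epsilon] = 1$, so the expected MSE is $1.224$.

```python
import random
import statistics


def simulate_decomposition(f=2.0, noise_sd=1.0, n_train=10, shrink=0.8,
                           n_trials=50_000, seed=0):
    """Compare an empirical MSE with bias^2 + variance + noise.

    All defaults are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(n_trials):
        # Fresh training set: n_train noisy observations of the constant f.
        train = [f + rng.gauss(0.0, noise_sd) for _ in range(n_train)]
        # Shrinkage estimator: deliberately biased toward zero.
        f_hat = shrink * statistics.fmean(train)
        # Independent test observation y = f + eps.
        y = f + rng.gauss(0.0, noise_sd)
        preds.append(f_hat)
        sq_errors.append((y - f_hat) ** 2)

    mse = statistics.fmean(sq_errors)
    bias_sq = (statistics.fmean(preds) - f) ** 2
    variance = statistics.pvariance(preds)
    noise = noise_sd ** 2  # Var[eps], known by construction
    return mse, bias_sq, variance, noise


if __name__ == "__main__":
    mse, bias_sq, variance, noise = simulate_decomposition()
    print(f"empirical MSE            = {mse:.4f}")
    print(f"bias^2 + variance + noise = {bias_sq + variance + noise:.4f}")
```

The two printed numbers should agree up to Monte Carlo error. The test observation is drawn independently of the training set, which mirrors the assumption that $\epsilon$ is independent of $\hat{f}$ in the derivation.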