# The Bias-Variance Tradeoff in Estimating Risk

Disclaimer: I'm going to be a bit hand-wavy and maybe even wrong with notation and calculations in this one, because it's a work in progress.

Suppose you have a series of observations $(r_1,\, r_2,\, \dots,\, r_{T})$ that are i.i.d. Normal with ground-truth mean $\mu$ and ground-truth variance $\sigma^2$, and you want to estimate $\sigma^2$.


## Estimators

Consider three estimators for $\sigma^2$. We'll evaluate them using Mean Squared Error (MSE) as the loss function, but MSE isn't necessarily the best choice: which loss function is most appropriate depends on the setting and application. For example, in your particular use case, underestimating $\sigma^2$ might be more dangerous than overestimating it.
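To make "evaluate by MSE" concrete, here is a minimal Monte Carlo sketch in Python using only the standard library. The helpers `mc_mse` and `bessel_variance` are hypothetical names for illustration, not part of any library. It simulates many samples of $T$ i.i.d. Normal observations, applies an estimator to each, and averages the squared error against the true $\sigma^2$. It also returns the squared bias and the variance of the estimates, which add up to the MSE exactly.

```python
import random
import statistics


def mc_mse(estimator, mu, sigma, T, trials=20_000, seed=0):
    """Monte Carlo estimate of an estimator's MSE for sigma^2 (hypothetical helper).

    Simulates `trials` independent samples of T i.i.d. Normal(mu, sigma^2)
    observations and returns (mse, squared_bias, variance) of the estimates.
    """
    rng = random.Random(seed)
    target = sigma ** 2
    estimates = [
        estimator([rng.gauss(mu, sigma) for _ in range(T)])
        for _ in range(trials)
    ]
    mse = statistics.fmean((e - target) ** 2 for e in estimates)
    squared_bias = (statistics.fmean(estimates) - target) ** 2
    # Population variance (divisor `trials`) makes mse == squared_bias + variance,
    # up to floating-point rounding.
    variance = statistics.pvariance(estimates)
    return mse, squared_bias, variance


def bessel_variance(r):
    """s^2_A: sum of squared deviations from the sample mean, divided by T - 1."""
    r_bar = sum(r) / len(r)
    return sum((x - r_bar) ** 2 for x in r) / (len(r) - 1)


mse, squared_bias, variance = mc_mse(bessel_variance, mu=0.0, sigma=1.0, T=25)
print(mse, squared_bias + variance)
```

The two printed numbers agree up to rounding, which is just the identity $\text{MSE} = \text{bias}^2 + \text{variance}$ applied to the simulated estimates.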


### Standard Bessel-Corrected Sample Variance Estimator

Define
$$s^2_A := \frac{1}{T-1} \sum(r_t - \bar{r})^2,$$
where $\bar{r} := \frac{1}{T}\sum r_t$ is the sample mean.

This will be distributed as
$$\frac{1}{T-1}\sigma^2\chi^2_{T-1}.$$

Its bias is $0$, so its squared bias is also $0$.

Since a $\chi^2_k$ random variable has variance $2k$, its squared standard error is $\frac{1}{(T-1)^2}\sigma^4 \cdot 2(T-1) = 2\frac{1}{T-1}\sigma^4$.

The sum of its squared bias plus squared standard error is
$$2\frac{1}{T-1}\sigma^4.$$
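These distributional claims are easy to sanity-check by simulation. The sketch below (Python standard library; `bessel_variance` and `check_bessel` are hypothetical helper names) compares the simulated mean and variance of $s^2_A$ with the predictions $\sigma^2$ and $2\frac{1}{T-1}\sigma^4$. The true mean is deliberately set to a nonzero value, since $s^2_A$ subtracts $\bar{r}$ and should not depend on $\mu$.

```python
import random
import statistics


def bessel_variance(r):
    """s^2_A: sum of squared deviations from the sample mean, divided by T - 1."""
    T = len(r)
    r_bar = sum(r) / T
    return sum((x - r_bar) ** 2 for x in r) / (T - 1)


def check_bessel(mu, sigma, T, trials=20_000, seed=1):
    """Simulated vs. predicted mean and variance of s^2_A (hypothetical helper)."""
    rng = random.Random(seed)
    draws = [
        bessel_variance([rng.gauss(mu, sigma) for _ in range(T)])
        for _ in range(trials)
    ]
    return {
        "mean": statistics.fmean(draws),
        "predicted_mean": sigma ** 2,  # s^2_A is unbiased
        "variance": statistics.pvariance(draws),
        # Var of (sigma^2 / (T-1)) * chi^2_{T-1} is sigma^4 / (T-1)^2 * 2(T-1)
        "predicted_variance": 2 * sigma ** 4 / (T - 1),
    }


# mu = 3 on purpose: s^2_A subtracts the sample mean, so mu should not matter.
print(check_bessel(mu=3.0, sigma=2.0, T=25))
```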

This estimator has one undesirable property in the case mentioned earlier, where underestimating $\sigma^2$ is more dangerous than overestimating it: it can grossly underestimate. For instance, if all the $r$'s just randomly happen to come out to (nearly) the same number, $s^2_A$ will be (close to) $0$ no matter how large $\sigma^2$ actually is.
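To put a rough number on how often this bites, here is a sketch (Python standard library; `bessel_variance` and `underestimate_rate` are hypothetical helper names) that simulates $s^2_A$ with $\sigma = 1$ and $T = 25$ and counts how often it lands below half the true variance. Theory says this happens with probability $P(\chi^2_{24} < 12)$, which is about $2\%$. It also shows the degenerate case of identical observations.

```python
import random


def bessel_variance(r):
    """s^2_A: sum of squared deviations from the sample mean, divided by T - 1."""
    T = len(r)
    r_bar = sum(r) / T
    return sum((x - r_bar) ** 2 for x in r) / (T - 1)


def underestimate_rate(sigma, T, fraction=0.5, trials=20_000, seed=3):
    """Share of simulated samples where s^2_A falls below fraction * sigma^2."""
    rng = random.Random(seed)
    cutoff = fraction * sigma ** 2
    hits = sum(
        bessel_variance([rng.gauss(0.0, sigma) for _ in range(T)]) < cutoff
        for _ in range(trials)
    )
    return hits / trials


# Identical observations give an estimate of exactly 0, whatever sigma^2 is.
print(bessel_variance([2.0] * 25))  # prints 0.0
print(underestimate_rate(sigma=1.0, T=25))
```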


### Overridden Zero-Mean Sample Variance Estimator

Define
$$s^2_B := \frac{1}{T} \sum r_t^2.$$

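Each $r_t/\sigma$ is Normal with mean $\mu/\sigma$ and unit variance, so $\sum r_t^2/\sigma^2$ follows a noncentral chi-squared distribution, written $\chi^2_{T}(\lambda)$, with $T$ degrees of freedom and noncentrality parameter $\lambda = T\mu^2/\sigma^2$. So $s^2_B$ is distributed as
$$\frac{1}{T}\sigma^2\chi^2_{T}(\lambda).$$

Its expected value is $\mu^2 + \sigma^2$, so its bias is $\mu^2$ and its squared bias is $\mu^4$.

For Normal $r_t$, $\operatorname{Var}(r_t^2) = 2\sigma^4 + 4\mu^2\sigma^2$, so its squared standard error is $\frac{1}{T}\left(2\sigma^4 + 4\mu^2\sigma^2\right)$. This reduces to $2\frac{1}{T}\sigma^4$ only when $\mu = 0$.

The sum of its squared bias plus squared standard error is
$$\mu^4 + \frac{1}{T}\left(2\sigma^4 + 4\mu^2\sigma^2\right).$$


### Minimum-MSE Sample Variance Estimator

Define
$$s^2_C := \frac{1}{T+1} \sum(r_t - \bar{r})^2.$$

Among all estimators of the form $c\sum(r_t - \bar{r})^2$, this one has the lowest MSE [[cf](https://web.archive.org/web/20210522072302/https://en.wikipedia.org/wiki/Mean_squared_error#Variance)]. It rescales the same sum of squares as $s^2_A$, so it is distributed as
$$\frac{1}{T+1}\sigma^2\chi^2_{T-1}.$$

Its bias is $\frac{T-1}{T+1}\sigma^2 - \sigma^2 = -\frac{2}{T+1}\sigma^2$, so its squared bias is $\frac{4}{(T+1)^2}\sigma^4$.

Its squared standard error is $\frac{1}{(T+1)^2}\sigma^4 \cdot 2(T-1) = \frac{2(T-1)}{(T+1)^2}\sigma^4$.

The sum of its squared bias plus squared standard error is
$$\frac{4 + 2(T-1)}{(T+1)^2}\sigma^4 = 2\frac{1}{T+1}\sigma^4.$$


## Upshot

Let's compare $s^2_B$ vs $s^2_A$. When will the overridden estimator's sum of squared bias plus squared standard error be better (i.e. smaller) than the standard one's?

Well, when
$$\mu^4 + \frac{1}{T}\left(2\sigma^4 + 4\mu^2\sigma^2\right) < 2\frac{1}{T-1}\sigma^4$$
$$\mu^4 + \frac{4}{T}\mu^2\sigma^2 - 2\left(\frac{1}{T-1} - \frac{1}{T}\right)\sigma^4 < 0$$
$$\left(\frac{\mu}{\sigma}\right)^4 + \frac{4}{T}\left(\frac{\mu}{\sigma}\right)^2 - 2\frac{1}{(T-1)T} < 0.$$

The left-hand side is a quadratic in $(\mu/\sigma)^2$ that is negative at $0$, so the inequality holds exactly when $(\mu/\sigma)^2$ is below its positive root:
$$\left(\frac{\mu}{\sigma}\right)^2 < \sqrt{\frac{4}{T^2} + 2\frac{1}{(T-1)T}} - \frac{2}{T}$$
$$\frac{|\mu|}{\sigma} < \sqrt{\sqrt{\frac{4}{T^2} + 2\frac{1}{(T-1)T}} - \frac{2}{T}}.$$

So for example if $T = 25$, then $s^2_B$ will have a lower sum of squared bias plus squared standard error than $s^2_A$ as long as the ratio of $|\mu|$ to $\sigma$ is less than $\approx 0.137$. That sum is exactly the MSE, since MSE decomposes as squared bias plus variance. Keep in mind that both $\mu$ and $\sigma$ here are for a single observation.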
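As a sanity check on this comparison, here is a sketch (Python standard library; `mse_closed_form`, `threshold`, and `mse_simulated` are hypothetical helper names) that computes the closed-form MSEs of all three estimators, the $|\mu|/\sigma$ cutoff below which $s^2_B$ beats $s^2_A$, and Monte Carlo estimates of the same MSEs. The closed form for $s^2_B$ uses the exact variance of $r_t^2$ for Normal data, $2\sigma^4 + 4\mu^2\sigma^2$, so its MSE is taken to be $\mu^4 + \frac{1}{T}(2\sigma^4 + 4\mu^2\sigma^2)$.

```python
import math
import random


def mse_closed_form(mu, sigma, T):
    """Closed-form MSEs of s^2_A, s^2_B, s^2_C for i.i.d. Normal(mu, sigma^2) data."""
    s2, s4 = sigma ** 2, sigma ** 4
    return {
        "A": 2 * s4 / (T - 1),
        # squared bias mu^4 plus Var(r_t^2) / T, with Var(r_t^2) = 2 sigma^4 + 4 mu^2 sigma^2
        "B": mu ** 4 + (2 * s4 + 4 * mu ** 2 * s2) / T,
        "C": 2 * s4 / (T + 1),
    }


def threshold(T):
    """Largest |mu| / sigma at which s^2_B still has lower MSE than s^2_A."""
    # Positive root of x^2 + (4/T) x - 2/((T-1)T) = 0, where x = (mu/sigma)^2.
    x = math.sqrt(4 / T ** 2 + 2 / ((T - 1) * T)) - 2 / T
    return math.sqrt(x)


def mse_simulated(mu, sigma, T, trials=10_000, seed=2):
    """Monte Carlo MSEs of the three estimators, all computed on the same samples."""
    rng = random.Random(seed)
    target = sigma ** 2
    totals = {"A": 0.0, "B": 0.0, "C": 0.0}
    for _ in range(trials):
        r = [rng.gauss(mu, sigma) for _ in range(T)]
        r_bar = sum(r) / T
        ss = sum((x - r_bar) ** 2 for x in r)  # sum of squared deviations
        estimates = {
            "A": ss / (T - 1),
            "B": sum(x * x for x in r) / T,  # mean overridden to zero
            "C": ss / (T + 1),
        }
        for name, estimate in estimates.items():
            totals[name] += (estimate - target) ** 2
    return {name: total / trials for name, total in totals.items()}


T = 25
print(f"threshold |mu|/sigma for T = {T}: {threshold(T):.4f}")
for ratio in (0.0, 0.1, 0.3):
    exact = mse_closed_form(ratio, 1.0, T)
    simulated = mse_simulated(ratio, 1.0, T)
    for name in ("A", "B", "C"):
        print(f"|mu|/sigma = {ratio:.1f}, s^2_{name}: "
              f"exact {exact[name]:.4f}, simulated {simulated[name]:.4f}")
```

Reusing the same simulated samples for all three estimators (common random numbers) makes the differences between them less noisy than independent simulations would. Near the cutoff at $T = 25$ the MSE gaps are small relative to Monte Carlo noise at this number of trials, so the simulated columns are a rough check of the closed forms rather than a precise one.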