THE BIAS-VARIANCE TRADEOFF IN RISK ESTIMATION

author: [@sparshsah](https://github.com/sparshsah)

# Setting

Suppose you have a series of observations $(r_1,\, r_2,\, \dots,\, r_{T})$ where each observation is i.i.d. Normal with ground-truth mean $\mu$ and ground-truth variance $\sigma^2$.

# Estimators

Consider three estimators for $\sigma^2$. We're implicitly going to consider Mean Squared Error (MSE) as our loss function when evaluating them, but MSE isn't necessarily the best one. Which loss function is most appropriate can depend on the setting and application. For example, maybe in your particular use case, underestimating $\sigma^2$ is more dangerous than overestimating.


## Standard Bessel-Corrected Demeaned Sample Variance Estimator

Define
$$s^2_A := \frac{1}{T-1} \sum_{t=1}^{T}(r_t - \bar{r})^2,$$
where $\bar{r} := \frac{1}{T}\sum_{t=1}^{T} r_t$ is the sample mean.

This will be distributed as
$$\frac{1}{T-1}\sigma^2\chi^2_{T-1}.$$

Its bias is $0$, so its squared bias is also $0$.

Its squared standard error is $\frac{1}{(T-1)^2}\sigma^4 2(T-1) = 2\frac{1}{T-1}\sigma^4$.

The sum of its squared bias plus squared standard error is
$$2\frac{1}{T-1}\sigma^4.$$
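
As a sanity check on the two moments above, here is a minimal Monte Carlo sketch. The parameter values, the seed, and the variable names are assumptions chosen for illustration, not part of the original analysis.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
mu, sigma, T, n_trials = 0.05, 1.0, 65, 100_000

rng = np.random.default_rng(0)
# Each row is one simulated sample of T i.i.d. Normal observations.
r = rng.normal(loc=mu, scale=sigma, size=(n_trials, T))

# ddof=1 gives the Bessel-corrected (divide by T-1) demeaned estimator.
s2_A = r.var(axis=1, ddof=1)

empirical_bias_A = s2_A.mean() - sigma**2
empirical_var_A = s2_A.var()
theoretical_var_A = 2 * sigma**4 / (T - 1)

print(f"bias: {empirical_bias_A:+.5f} (theory: 0)")
print(f"variance: {empirical_var_A:.5f} (theory: {theoretical_var_A:.5f})")
```

With enough trials, the empirical bias should sit near zero and the empirical variance near $2\frac{1}{T-1}\sigma^4$.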

This has one undesirable property in the case I mentioned before, where underestimating $\sigma^2$ is more dangerous than overestimating: This estimator will grossly underestimate if, for instance, all the $r$'s just randomly happen to come out to (nearly) the same number.


## Overridden Zero-Meaned Sample Variance Estimator

Define
$$s^2_B := \frac{1}{T} \sum r_t^2.$$

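Writing $r_t = \mu + \sigma z_t$ with $z_t$ standard Normal, we get $s^2_B = \mu^2 + 2\mu\sigma\bar{z} + \frac{1}{T}\sigma^2\sum z_t^2$. Dropping the cross term $2\mu\sigma\bar{z}$, this will be approximately distributed as
$$\mu^2 + \frac{1}{T}\sigma^2\chi^2_{T}.$$

Its bias is $\mu^2$, so its squared bias is $\mu^4$.

Its squared standard error is approximately $\frac{1}{T^2}\sigma^4 2T = 2\frac{1}{T}\sigma^4$. Keeping the cross term, the exact value is $2\frac{1}{T}\sigma^4 + 4\frac{1}{T}\mu^2\sigma^2$. The comparison below uses the approximation. Near the crossover point, the extra term is of the same order as the gap between the two estimators' variances, so including it lowers the crossover thresholds derived below.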

The sum of its squared bias plus squared standard error is
$$\mu^4 + 2\frac{1}{T}\sigma^4.$$
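
Here is a minimal Monte Carlo sketch for checking the moments of $s^2_B$. For Normal data, the variance of a single $r_t^2$ is exactly $2\sigma^4 + 4\mu^2\sigma^2$, so the simulation is compared against both the $\mu = 0$ figure $2\frac{1}{T}\sigma^4$ and the exact figure $\frac{1}{T}(2\sigma^4 + 4\mu^2\sigma^2)$. The parameter values are assumptions for illustration; $\mu$ is deliberately exaggerated so that the difference between the two figures is visible.

```python
import numpy as np

# Hypothetical parameters; mu is exaggerated on purpose for illustration.
mu, sigma, T, n_trials = 0.5, 1.0, 65, 100_000

rng = np.random.default_rng(0)
# Each row is one simulated sample of T i.i.d. Normal observations.
r = rng.normal(loc=mu, scale=sigma, size=(n_trials, T))

# Zero-meaned estimator: average of squared observations, no demeaning.
s2_B = (r**2).mean(axis=1)

empirical_bias_B = s2_B.mean() - sigma**2
empirical_var_B = s2_B.var()
var_if_mu_zero = 2 * sigma**4 / T
var_exact = (2 * sigma**4 + 4 * mu**2 * sigma**2) / T

print(f"bias: {empirical_bias_B:.5f} (theory: {mu**2:.5f})")
print(f"variance: {empirical_var_B:.5f}")
print(f"  mu = 0 figure: {var_if_mu_zero:.5f}")
print(f"  exact figure:  {var_exact:.5f}")
```

The two variance figures coincide when $\mu = 0$ and drift apart as $\mu/\sigma$ grows.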


## Minimum-MSE Sample Variance Estimator

Define
$$s^2_C := \frac{1}{T+1} \sum(r_t - \bar{r})^2.$$

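Among estimators of the form $c\sum(r_t - \bar{r})^2$, this is the best you can do in terms of MSE [[cf](https://web.archive.org/web/20210522072302/https://en.wikipedia.org/wiki/Mean_squared_error#Variance)]. By the same scaling argument as for $s^2_A$, it will be distributed as
$$\frac{1}{T+1}\sigma^2\chi^2_{T-1},$$
so it is biased downward, with bias $-\frac{2}{T+1}\sigma^2$.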
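
To see where the $T+1$ divisor comes from, here is a small numerical sketch. It relies on the same fact used for $s^2_A$, namely that $\sum(r_t - \bar{r})^2$ is distributed as $\sigma^2\chi^2_{T-1}$, so the estimator $c\sum(r_t - \bar{r})^2$ has bias $(c(T-1) - 1)\sigma^2$ and variance $2c^2(T-1)\sigma^4$. The helper name `scaled_estimator_mse` and the grid of divisors are hypothetical, introduced only for this illustration.

```python
import numpy as np


def scaled_estimator_mse(c, T, sigma2=1.0):
    """MSE of c * sum((r_t - rbar)^2) for i.i.d. Normal data with variance sigma2."""
    bias = (c * (T - 1) - 1.0) * sigma2
    variance = 2.0 * c**2 * (T - 1) * sigma2**2
    return bias**2 + variance


T = 65
# Candidate integer divisors d (so c = 1/d) around T; an illustrative grid.
divisors = np.arange(T - 5, T + 6)
mses = scaled_estimator_mse(1.0 / divisors, T)
best_divisor = int(divisors[np.argmin(mses)])

mse_A = scaled_estimator_mse(1.0 / (T - 1), T)  # Bessel-corrected
mse_C = scaled_estimator_mse(1.0 / (T + 1), T)  # minimum-MSE scaling

print(f"best divisor on the grid: {best_divisor} (T + 1 = {T + 1})")
print(f"MSE with divisor T-1: {mse_A:.6f}")
print(f"MSE with divisor T+1: {mse_C:.6f}")
```

Under these formulas the MSE of $s^2_C$ works out to $2\frac{1}{T+1}\sigma^4$, slightly below the $2\frac{1}{T-1}\sigma^4$ of $s^2_A$.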

# MSE comparison

Let's compare $s^2_B$ vs $s^2_A$. When will the overridden estimator's sum of squared bias plus squared standard error be better (i.e. smaller) than the standard's?

Well, when
$$\mu^4 + 2\frac{1}{T}\sigma^4 < 2\frac{1}{T-1}\sigma^4$$
$$\mu^4 < 2\left(\frac{1}{T-1} - \frac{1}{T}\right)\sigma^4$$
$$\mu^4 < 2\frac{1}{(T-1)T}\sigma^4$$
$$|\mu| < \sqrt[4]{2\frac{1}{(T-1)T}}\sigma$$
$$\frac{|\mu|}{\sigma} < \sqrt[4]{2\frac{1}{(T-1)T}}.$$
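
The final inequality is easy to evaluate numerically. Below is a minimal sketch that implements it exactly as written above. The function names `crossover_ratio` and `annualized_crossover` are hypothetical, and the default of 261 observations per year is an assumption matching a business-day calendar.

```python
import math


def crossover_ratio(T):
    """Per-observation mu/sigma below which mu^4 + 2*sigma^4/T < 2*sigma^4/(T-1)."""
    if T < 2:
        raise ValueError("need at least two observations")
    return (2.0 / ((T - 1) * T)) ** 0.25


def annualized_crossover(T, periods_per_year=261):
    """Scale the per-observation threshold by the square root of periods per year."""
    return math.sqrt(periods_per_year) * crossover_ratio(T)


for T in (65, 26_100):
    print(f"T = {T:>6}: per-observation {crossover_ratio(T):.4f}, "
          f"annualized {annualized_crossover(T):.2f}")
```

The threshold shrinks roughly like $T^{-1/2}$ as the sample grows, which is why a longer sample of daily data tightens the daily threshold.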


## Upshot

For example, if $T = 65$ days, then $s^2_B$ will have a lower sum-of-squared-bias-plus-squared-standard-error (i.e. lower MSE) than $s^2_A$ as long as the ratio of $\mu$ to $\sigma$ (each for a single observation, i.e. a single day) is less than $\approx 0.148$ (i.e. a daily Sharpe less than $\approx 0.148$). In other words: With a single business quarter of daily-returns data, the zero-meaned estimator would be better (from an MSE perspective) than the demeaned estimator as long as the asset's ground-truth business-annualized Sharpe was less than $\approx 261^{0.5} \cdot 0.148 = 2.39$. A business quarter is a reasonable and popular estimation horizon (to deal with the fact that market data-generating processes are highly non-stationary), and most assets' ground-truth annualized Sharpes are much less than $2.39$, so this is a pretty common scenario.

_On the other hand_: Even if you had a full business century ($T = 100 \cdot 261 = 26{,}100$ days) of daily-returns data, the zero-meaned estimator would _still_ be better as long as the asset's ground-truth daily Sharpe was less than $\approx 0.0074$, which annualizes to $\approx 261^{0.5} \cdot 0.0074 \approx 0.12$, a surprisingly high figure in my eyes. Put another way: There are commodities out there whose ground-truth annualized Sharpes are widely assumed to be around $0.10$. This result says that even if you had high-quality daily-returns data going back to 1922, you should _still_ use the zero-meaned variance estimator for them if you want lower expected squared estimation error.


## An interesting observation: Asymptotic frequency-independence

I'm going to do a blind-plug-and-chug exercise here.

Take $T = 26,100$. We get crossover at a $\mu$-to-$\sigma$ ratio of $0.0074$. Now take $T = 100$. We get crossover at a $\mu$-to-$\sigma$ ratio of $0.12$.

Ok. Let's interpret each observation in the first case as being a single day's return. That means our daily-mean-to-daily-volatility ratio, i.e. our daily Sharpe ratio, must be less than $0.0074$. Annualizing, we get an annualized Sharpe ratio of $261^{0.5} \cdot 0.0074 \approx 0.12$.

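Now let's interpret each observation in the second case as being a single year's return, so that $T = 100$ again spans a century. The crossover ratio of $0.12$ is then already an annualized Sharpe ratio, and it matches the annualized figure from the daily case. For large $T$ the crossover behaves like $\sqrt[4]{2}/\sqrt{T}$, so multiplying the number of observations by $261$ divides the per-observation threshold by $261^{0.5}$, which annualization then undoes.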

# Simulations

In [1]:
import sys
# https://github.com/sparshsah/foggy-lib/tree/main/util
sys.path.append("../../../foggy-lib/util/")
del sys

import pandas as pd
import numpy as np
# https://github.com/sparshsah/foggy-lib/blob/main/util/foggy_pylib/core.py
import foggy_pylib.core as fc
# https://github.com/sparshsah/foggy-lib/blob/main/util/foggy_pylib/fin.py
import foggy_pylib.fin as ff

## How does the crossover point decay in sample size?