# Excess Risk Decomposition

## Notes to self: Excess risk theory reminder

Ideally we'd like to find the Bayes decision function, which minimises the risk $R(f) = \mathbb E\left[l(f(x), y)\right]$ over all functions
$$
f^* = \arg \min_f \mathbb E\left[l(f(x), y)\right]
$$

However, to keep optimisation feasible (and add some regularisation), we only look for functions within a hypothesis space $\mathbb F$.
$$
f^\mathbb F = \arg \min_{f \in \mathbb F} \mathbb E\left[l(f(x), y)\right]
$$

The difference between the risks of these two functions is the **approximation error**
$$
R(f^\mathbb F) - R(f^*)
$$

However, in general we can't compute $f^\mathbb F$: the expectation is taken over the true data distribution, which is unknown, and we only have a finite sample of $n$ data points. As such we can only compute the **empirical risk minimiser**
$$
\hat{f_n} = \arg \min_{f \in \mathbb F} \frac{1}{n} \sum_{i=1}^n l\left(f(x_i), y_i\right)
$$

This introduces the **estimation error**.
$$
R(\hat{f_n}) - R(f^\mathbb F)
$$
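The empirical risk minimiser can be sketched in a few lines of Python. This is only an illustration under assumed choices that are not part of these notes: a made-up five-point dataset, the 0-1 loss, and a finite hypothesis space of threshold classifiers. The names `empirical_risk` and `make_threshold` are hypothetical helpers.

```python
def zero_one_loss(prediction, label):
    """0-1 loss: 0 if the prediction matches the label, 1 otherwise."""
    return 0.0 if prediction == label else 1.0


def empirical_risk(f, xs, ys, loss):
    """Average loss of f over the sample: (1/n) * sum_i loss(f(x_i), y_i)."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)


def make_threshold(t):
    """Threshold classifier f_t(x) = 1 if x >= t, else 0."""
    return lambda x: 1 if x >= t else 0


# Toy sample (assumed for illustration only).
xs = [0.1, 0.4, 0.5, 0.7, 0.9]
ys = [0, 0, 1, 1, 1]

# A small, finite hypothesis space: one classifier per candidate threshold.
thresholds = [0.0, 0.25, 0.45, 0.6, 0.8]
risks = {t: empirical_risk(make_threshold(t), xs, ys, zero_one_loss)
         for t in thresholds}

# The empirical risk minimiser is the hypothesis with the lowest average loss.
best_threshold = min(risks, key=risks.get)
print(best_threshold, risks[best_threshold])
```

With a finite hypothesis space the minimisation is an exhaustive search, so there is no optimisation error here; that only appears once $\mathbb F$ is too large to search exactly.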

However, the optimisation algorithm used to find $\hat{f_n}$ generally doesn't return the exact minimiser, but only an approximate result $\tilde{f_n}$. Thus we also get an **optimisation error**
$$
R(\tilde{f_n}) - R(\hat{f_n})
$$

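The **excess risk** is the difference between the risk of the function we actually end up with and the risk of the Bayes decision function. Adding and subtracting $R(\hat{f_n})$ and $R(f^\mathbb F)$ splits it into the three errors above
$$
\begin{aligned}
R(\tilde{f_n}) - R(f^*)
  &= \underbrace{R(f^\mathbb F) - R(f^*)}_{\text{Approximation Error}} \\
  &\quad + \underbrace{R(\hat{f_n}) - R(f^\mathbb F)}_{\text{Estimation Error}} \\
  &\quad + \underbrace{R(\tilde{f_n}) - R(\hat{f_n})}_{\text{Optimisation Error}}
\end{aligned}
$$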
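To see the decomposition on actual numbers, here is a minimal sketch under assumptions that are not part of these notes: $x \sim U(-1, 1)$, $y = x^2 + \varepsilon$ with Gaussian noise of variance `SIGMA2`, squared loss, and $\mathbb F$ the set of lines $f(x) = ax + b$. Under those assumptions the true risk has a closed form, so every term can be evaluated exactly. The names `true_risk`, `erm` and `gradient_descent` are illustrative, and the few gradient steps stand in for an imperfect optimiser.

```python
import random

SIGMA2 = 0.01  # assumed noise variance


def true_risk(a, b):
    """Exact risk of f(x) = a*x + b for x ~ U(-1, 1), y = x^2 + noise, squared loss."""
    # E[(x^2 - a*x - b)^2] = 1/5 + a^2/3 + b^2 - 2b/3, plus the irreducible noise.
    return SIGMA2 + 1 / 5 + a * a / 3 + b * b - 2 * b / 3


def sample(n, rng):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [x * x + rng.gauss(0, SIGMA2 ** 0.5) for x in xs]
    return xs, ys


def erm(xs, ys):
    """Exact empirical risk minimiser over lines (closed-form least squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def gradient_descent(xs, ys, steps, lr):
    """A few gradient steps on the empirical risk, starting from a = b = 0."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b


rng = random.Random(0)
xs, ys = sample(20, rng)

bayes_risk = SIGMA2                       # f*(x) = x^2 leaves only the noise
risk_best_in_class = true_risk(0.0, 1 / 3)  # best line under these assumptions
risk_erm = true_risk(*erm(xs, ys))
risk_final = true_risk(*gradient_descent(xs, ys, steps=5, lr=0.1))

approximation_error = risk_best_in_class - bayes_risk
estimation_error = risk_erm - risk_best_in_class
optimisation_error = risk_final - risk_erm
excess_risk = risk_final - bayes_risk

print(approximation_error, estimation_error, optimisation_error, excess_risk)
```

Whatever the sample, the three printed errors add up to the printed excess risk, since the intermediate risks cancel.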

### Some notes

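* The approximation error and the estimation error are always non-negative: $f^*$ minimises the risk over all functions, and $f^\mathbb F$ minimises it over $\mathbb F$, which contains $\hat{f_n}$.
* The optimisation error can be negative: $\hat{f_n}$ minimises the empirical risk, not the true risk, so $\tilde{f_n}$ may happen to have a lower true risk.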
* If the approximation error and estimation error dominate the optimisation error (which seems to be the case quite often),
  there is little point in using advanced optimisation methods to reduce the optimisation error,
  as doing so will have a negligible effect on the total excess risk.
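
The last point can be illustrated with a rough sketch, again under assumptions chosen here for convenience rather than taken from these notes: $x \sim U(-1, 1)$, $y = x^2 + \varepsilon$ with Gaussian noise of standard deviation `SIGMA`, squared loss, and $\mathbb F$ the constant functions $f(x) = c$. The helper names are hypothetical. More gradient steps shrink the optimisation error, while the other two terms stay fixed.

```python
import random

SIGMA = 0.1  # assumed noise standard deviation


def true_risk(c):
    """Exact risk of the constant predictor f(x) = c under the assumed distribution."""
    # E[(x^2 + noise - c)^2] = SIGMA^2 + 1/5 - 2c/3 + c^2 for x ~ U(-1, 1).
    return SIGMA ** 2 + 1 / 5 - 2 * c / 3 + c * c


def run_gradient_descent(mean_y, steps, lr=0.1):
    """Gradient descent on the empirical risk (1/n) * sum((c - y_i)^2), from c = 0."""
    c = 0.0
    for _ in range(steps):
        c -= lr * 2 * (c - mean_y)  # gradient of the empirical risk w.r.t. c
    return c


rng = random.Random(0)
n = 30
ys = [rng.uniform(-1, 1) ** 2 + rng.gauss(0, SIGMA) for _ in range(n)]
mean_y = sum(ys) / n  # the ERM over constants is the sample mean

bayes_risk = SIGMA ** 2
approximation_error = true_risk(1 / 3) - bayes_risk   # best constant is c = 1/3
estimation_error = true_risk(mean_y) - true_risk(1 / 3)

results = []
for steps in [1, 5, 25, 100]:
    c = run_gradient_descent(mean_y, steps)
    optimisation_error = true_risk(c) - true_risk(mean_y)
    excess_risk = true_risk(c) - bayes_risk
    results.append((steps, optimisation_error, excess_risk))
    print(f"steps={steps:4d}  optimisation error={optimisation_error:+.2e}  "
          f"excess risk={excess_risk:.4f}")
```

In this toy setup the excess risk can never drop below the approximation error of $4/45$, whatever the optimiser does, so extra gradient steps stop paying off once the optimisation error is small compared with the other two terms.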