2022-03-24 Ludovico Massaccesi

# Robust estimators
In practice we have to evaluate the variance of estimators on data whose distribution we do not know completely, so we cannot derive it from theory.
We might not even know what the space of possibilities is!

_How can we evaluate the uncertainty of a location estimator, if we do not know the real distribution of the data?_

## Bootstrap method
_Take some data, and pretend the real distribution is exactly like that._

This seems self-referential, but it can actually be proven to work (with theorems, not just with practical results).
It is a very safe technique, in particular for estimating the variance of estimators (even complicated ones, not just location estimators).

 - We take some samples $x_i,i=1,\dots,N$ from real data.
 - We take the discrete distribution $p^*(x)=\frac{1}{N}\sum_i\delta(x-x_i)$.
 - Whenever we need the real distribution $p$, we use $p^*$ instead as an approximation;
    - e.g. when we run our Monte Carlo, we draw data from $p^*$;
    - this can be done using the _resampling with repetition_ method (also known as resampling with replacement), i.e. we draw a uniformly distributed integer $j$ in $[1,N]$ and take the sample $x_j$, which is very fast and easy to implement;
    - allowing repetition is extremely important: drawing $N$ samples without repetition would always give back the original sample (merely reordered), which would make our simulation pointless;
    - the different number of copies of each event in each repetition is what generates the variability we need to evaluate our estimator's variance;
    - alternatively, we can draw the number of copies of each $x_i$ from a Poisson distribution with mean $S/N$, where $S$ is the number of samples we want to draw; in this case we get $S$ samples only _on average_, and the actual number fluctuates, which we may want in some cases but not in others.
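
The procedure above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function names (`bootstrap_replicas`, `bootstrap_variance`, `poisson_resample`), the number of resamples and the Gaussian toy data are all assumptions made for the example.

```python
import numpy as np

def bootstrap_replicas(x, estimator, n_boot=1000, rng=None):
    """Evaluate `estimator` on `n_boot` resamples of `x` drawn with repetition."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    n = len(x)
    # Each row holds n indices drawn uniformly in [0, n), the 0-based
    # equivalent of extracting j in [1, N]: this is sampling from p*.
    idx = rng.integers(0, n, size=(n_boot, n))
    return np.array([estimator(x[row]) for row in idx])

def bootstrap_variance(x, estimator, n_boot=1000, rng=None):
    """Bootstrap estimate of the variance of `estimator` on the sample `x`."""
    return bootstrap_replicas(x, estimator, n_boot, rng).var(ddof=1)

def poisson_resample(x, n_draw=None, rng=None):
    """Poisson variant: the number of copies of each x_i has mean S/N."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    s = len(x) if n_draw is None else n_draw
    copies = rng.poisson(s / len(x), size=len(x))
    # The length of the result fluctuates around S from call to call.
    return np.repeat(x, copies)

# Toy data (assumed for illustration only): 500 Gaussian samples.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)
var_mean = bootstrap_variance(x, np.mean, n_boot=2000, rng=rng)
```

For the sample mean the result can be cross-checked against the usual estimate `x.var(ddof=1) / len(x)`; for estimators without a simple formula (median, trimmed mean, ...) the same call works unchanged.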
    
We can, of course, also bootstrap from a histogram.
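
One possible way to do this, sketched below under stated assumptions, is to treat the normalized bin contents as $p^*$ and redraw the same total number of entries, which amounts to a multinomial draw over the bins. The function name `resample_histogram` and the bin contents are made up for the example.

```python
import numpy as np

def resample_histogram(counts, rng=None):
    """Draw one bootstrap replica of a histogram with the same total entries."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts)
    n = int(counts.sum())
    # Drawing n entries with repetition from the binned data is a
    # multinomial draw with probabilities counts / n.
    return rng.multinomial(n, counts / n)

# Arbitrary bin contents, for illustration only.
counts = np.array([3, 10, 25, 40, 18, 4])
replica = resample_histogram(counts, np.random.default_rng(1))
```

Each replica has the same total number of entries as the original histogram, while the individual bin contents fluctuate.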

## Bootstrap uncertainty
_What is the error that I make by using the bootstrap approximation $p^*$ instead of $p$?_

This would be a systematic uncertainty on my estimate of the variance of my estimator.

The solution, surprisingly, is the _double bootstrap_, i.e. bootstrapping from bootstrapped samples.
This might seem like a recursive procedure that may diverge, but it can actually be proven that the double bootstrap improves our estimate!

 - We take a data sample $x_i$.
 - We build $M$ sets $y_{i,j}$, $j=1,\dots,M$, by bootstrapping $x_i$.
 - We estimate the variance of our estimator from every set $y_j$, by bootstrapping from it in turn.
 - We compute the variance of the variance from the set of $M$ variances we obtained.
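
The steps above could look roughly like the following sketch. It assumes that the variance on each set $y_j$ is itself obtained with an inner bootstrap; the function names, the values of `n_outer` (playing the role of $M$) and `n_inner`, and the Gaussian toy data are illustrative choices, not prescriptions.

```python
import numpy as np

def bootstrap_variance(x, estimator, n_boot, rng):
    """Bootstrap estimate of the variance of `estimator` on the sample `x`."""
    n = len(x)
    idx = rng.integers(0, n, size=(n_boot, n))  # resampling with repetition
    return np.var([estimator(x[row]) for row in idx], ddof=1)

def double_bootstrap(x, estimator, n_outer=100, n_inner=200, rng=None):
    """Return the n_outer variance estimates, one per bootstrapped set y_j."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    n = len(x)
    variances = np.empty(n_outer)
    for j in range(n_outer):
        # Outer level: build the set y_j by bootstrapping x.
        y_j = x[rng.integers(0, n, size=n)]
        # Inner level: bootstrap y_j to estimate the estimator's variance.
        variances[j] = bootstrap_variance(y_j, estimator, n_inner, rng)
    return variances

# Toy data (assumed for illustration only): 200 Gaussian samples.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
variances = double_bootstrap(x, np.median, n_outer=50, n_inner=100, rng=rng)
var_of_var = variances.var(ddof=1)  # variance of the variance
```

The cost grows as `n_outer * n_inner` evaluations of the estimator, so both numbers are kept small here.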

# Exercise
 - Generate a sample from one of the distributions from last time.
 - Choose a location estimator and calculate its distribution.
 - Estimate its variance with the bootstrap method.
 - Compare the previous result with the one obtained with the double-bootstrap method.

Repeat for a couple of distributions/estimators at most.