#### LICENSE
These notes are released under the 
"Creative Commons Attribution-ShareAlike 4.0 International" license. 
See the **human-readable version** [here](https://creativecommons.org/licenses/by-sa/4.0/)
and the **real thing** [here](https://creativecommons.org/licenses/by-sa/4.0/legalcode). 

#### INSTALLATION instructions

To use this notebook you may need to install a few packages in `R`:
```
install.packages(c('rmutil', 'robustbase', 'RobStatTM'))
```

# M-estimators are generally very close to being optimal

Although M-estimators are generally not optimal, they are often "quite good", 
and rarely perform poorly. Here we illustrate this with simple numerical experiments.

To fix ideas, we will focus here on the simplest location/scale model, where
our observations $X_i$ are assumed to satisfy
$$
X_i = \mu_0 + \sigma_0 \, \varepsilon_i \, , \quad 1 \le i \le n \, ,
$$
where $\varepsilon_i \sim F_0$, a fixed distribution that is symmetric
around zero. We are interested
in estimating the "true" center parameter $\mu_0$. 

As long as the expected value of $F_0$ exists, the sample
mean $\bar{X}_n$ is an unbiased estimator for $\mu_0$, and 
it is often the default estimator of choice. However, 
the good properties of the sample mean as
an estimator for the population mean only hold under
strict distributional assumptions. Even for symmetric
errors, the sample mean may be highly inefficient 
(high variance). 
For example, if the errors have heavier tails than 
Gaussian (double exponential, say), then the sample
mean can perform significantly worse than the MLE
(which, for the Laplace / double exponential case,
is the sample median). 
Conversely, if the data **are** Gaussian, then the sample
median is notably inefficient (highly variable), leading
to much less informative inference results. 
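
To make these two statements concrete, recall the standard large-sample approximations. This is only a sketch, assuming that $F_0$ has finite variance and a density $f_0$ that is positive and continuous at zero:
$$
\mathrm{Var}(\bar{X}_n) = \frac{\sigma_0^2 \, \mathrm{Var}(\varepsilon_i)}{n} \, , \qquad
\mathrm{Var}(\mathrm{median}_n) \approx \frac{\sigma_0^2}{4 \, n \, f_0(0)^2} \, .
$$
For standard Gaussian errors, $f_0(0) = 1/\sqrt{2\pi}$ and $\mathrm{Var}(\varepsilon_i) = 1$, so the median has efficiency $2/\pi \approx 0.64$ relative to the mean. For standard Laplace errors, $f_0(0) = 1/2$ and $\mathrm{Var}(\varepsilon_i) = 2$, so the mean has efficiency $1/2$ relative to the median.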

In practice, however, one rarely knows the actual
distribution $F_0$, or how heavy its tails may be. So
the choice of which estimator to use is a bit of a
gamble. 

Robust estimators offer an alternative: they aim to
perform well in a variety of situations. They will typically
not be optimal, but will generally be 
good enough, in the sense of consistently being a "very close second best". 

The code below runs simple Monte Carlo experiments
comparing the efficiency (in terms of mean squared 
error (MSE)) of natural estimators in each setting:
the sample mean, the sample median, the MLE under each 
model, plus two M-estimators: 
the estimator labelled `monotoneM` is the M-estimator with
Huber's non-decreasing `psi` function `pmax(pmin(t, k), -k)`, while
the `redescM` one uses a `psi` function in the Tukey bisquare
family, which is zero for large 
residuals. 
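
The two `psi` functions can be sketched in base `R` as follows. The function names `psi_huber` and `psi_bisquare` are hypothetical helpers introduced here for illustration. The constant `cc = 4.685` is the usual textbook choice for 95% efficiency at the Gaussian model; it is an assumption, and may not match the parametrization used internally by `RobStatTM::locScaleM`.

```r
# Huber's monotone psi: linear on [-k, k], constant outside that interval
psi_huber <- function(t, k = 1.345) pmax(pmin(t, k), -k)

# Tukey's bisquare psi: redescends to exactly zero when |t| > cc
# (cc = 4.685 is an assumed, textbook tuning constant)
psi_bisquare <- function(t, cc = 4.685) {
  ifelse(abs(t) <= cc, t * (1 - (t / cc)^2)^2, 0)
}

t <- c(-10, -1, 0, 1, 10)
psi_huber(t)     # large residuals are capped at -k and k
psi_bisquare(t)  # large residuals contribute nothing
```

With a monotone `psi`, a gross outlier still has a bounded but non-zero influence, whereas a redescending `psi` discards it altogether.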

In all the experiments below we use 10,000 samples of size $n = 50$, 
and use the resulting 10,000 estimates to approximate the MSE of 
each estimator. 

#### Laplace errors

Generate 10,000 samples, and put each of them in a row in the
matrix `x`:

In [None]:
suppressPackageStartupMessages(library(rmutil))
n <- 50
M <- 10000
set.seed(123)
x <- matrix(rlaplace(n*M), M, n)

Now compute the 10,000 sample means, sample medians, and the two M-estimators:

In [None]:
mus <- rowMeans(x)
meds <- apply(x, 1, median)
monm <- apply(x, 1, function(a) robustbase::huberM(a, k = 1.345)$mu )
redm <- apply(x, 1, function(a) RobStatTM::locScaleM(a, psi='bisquare')$mu )

Now compute the MSE for each estimator. Note that in these experiments $\mu_0 = 0$. 
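
In symbols, writing $\hat{\mu}^{(j)}$ for the estimate computed on the $j$-th simulated sample (notation introduced here for illustration), the Monte Carlo approximation to the MSE is
$$
\widehat{\mathrm{MSE}} = \frac{1}{M} \sum_{j=1}^{M} \left( \hat{\mu}^{(j)} - \mu_0 \right)^2 
= \frac{1}{M} \sum_{j=1}^{M} \left( \hat{\mu}^{(j)} \right)^2 \, ,
$$
where the last equality uses $\mu_0 = 0$. This is why it suffices to average the squared estimates.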

In [None]:
MSE.means <- mean( mus^2 )
MSE.medians <- mean( meds^2 )
MSE.monotoneM <- mean( monm^2 )
MSE.redescM <- mean( redm^2 )

Show the MSEs in an organized way:

In [None]:
round(rbind(MSE.means = MSE.means, MSE.medians= MSE.medians,
  MSE.monotoneM = MSE.monotoneM, MSE.redescM = MSE.redescM ), 4)

Show the efficiency of each estimator relative to the optimal estimator for this model (the sample median), computed as the ratio of their MSEs. Values below 1 indicate a loss:

In [None]:
round(rbind(eff.means = MSE.medians / MSE.means, 
            eff.medians = MSE.medians / MSE.medians, 
            eff.monotoneM = MSE.medians / MSE.monotoneM, 
            eff.redescM = MSE.medians / MSE.redescM), 4)

Note that the M-estimators do much better than the sample mean, 
and perform fairly close to the optimal MLE. The tuning constants of
these M-estimators have been chosen to give a high efficiency at the 
Gaussian distribution, so as to be "close" to the sample mean when that 
is appropriate. The efficiencies above for Laplace distributions could be
improved at the price of a slight loss in efficiency at the Gaussian 
model, by using a smaller tuning constant (which moves the estimator 
toward the sample median). You are encouraged to
repeat these experiments with other values of `k` in `robustbase::huberM`: 
values below 1.345 (e.g. $k = 1$), as well as $k = 2$ or $k = 2.5$, 
which move the estimator toward the sample mean. 
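
A minimal sketch of such a sensitivity check is given below. It makes a few assumptions to stay quick and self-contained: it uses fewer replications (`M = 2000`) than the experiments in the text, it generates standard Laplace errors as the difference of two independent exponentials rather than with `rmutil::rlaplace`, and the grid `ks` is an arbitrary illustrative choice.

```r
library(robustbase)

set.seed(123)
n <- 50
M <- 2000  # fewer replications than in the text, to keep this quick
# Standard Laplace errors: difference of two independent Exp(1) variables
x <- matrix(rexp(n * M) - rexp(n * M), M, n)

ks <- c(0.7, 1, 1.345, 2, 2.5)  # illustrative grid of tuning constants
# MSE of the Huber M-estimator for each k (the true location is 0)
mse_k <- sapply(ks, function(k)
  mean(apply(x, 1, function(a) huberM(a, k = k)$mu)^2))
mse_med <- mean(apply(x, 1, median)^2)

# Efficiency of each Huber M-estimator relative to the sample median
round(rbind(k = ks, MSE = mse_k, eff = mse_med / mse_k), 4)
```

Running the same loop on a matrix of `rnorm` draws, with `rowMeans` as the reference, shows the other side of the trade-off.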

#### Gaussian errors (M- vs MLE(mean))

It is easy to see that when the sample mean is the optimal
estimator (for example, when the errors are Gaussian), the
M-estimators again behave very similarly to the optimal one.

In [None]:
set.seed(123)
x <- matrix(rnorm(n*M), M, n)
mus <- rowMeans(x)
meds <- apply(x, 1, median)
# M-estimators
monm <- apply(x, 1, function(a) robustbase::huberM(a, k = 1.345)$mu )
redm <- apply(x, 1, function(a) RobStatTM::locScaleM(a, psi='bisquare')$mu )
MSE.means <- mean( mus^2 )
MSE.medians <- mean( meds^2 )
MSE.monotoneM <- mean( monm^2 )
MSE.redescM <- mean( redm^2 )

In [None]:
round(rbind(MSE.means = MSE.means, MSE.medians= MSE.medians,
  MSE.monotoneM = MSE.monotoneM, MSE.redescM = MSE.redescM ), 4)

In [None]:
round(rbind(eff.means = MSE.means / MSE.means, 
            eff.medians = MSE.means / MSE.medians, 
            eff.monotoneM = MSE.means / MSE.monotoneM, 
            eff.redescM = MSE.means / MSE.redescM), 4)

#### T4 errors (M- vs MLE)

We repeat the experiment with Student's t errors (df = 4),
and include the MLE. The conclusion is the same as
above.

In [None]:
options(warn = -1) # remove all warning messages
set.seed(123)
x <- matrix(rt(n*M, df=4), M, n)
mus <- rowMeans(x)
meds <- apply(x, 1, median)
mles <- apply(x, 1, function(a) MASS::fitdistr(a, 't', df=4)$estimate[1] )
# M-estimators
monm <- apply(x, 1, function(a) robustbase::huberM(a, k = 1.345)$mu )
redm <- apply(x, 1, function(a) RobStatTM::locScaleM(a, psi='bisquare')$mu )
MSE.means <- mean( mus^2 )
MSE.medians <- mean( meds^2 )
MSE.monotoneM <- mean( monm^2 )
MSE.redescM <- mean( redm^2 )
MSE.mles <- mean(mles^2)

In [None]:
round(rbind(MSE.means = MSE.means, MSE.medians= MSE.medians,
            MSE.mles = MSE.mles, MSE.monotoneM = MSE.monotoneM, 
            MSE.redescM = MSE.redescM ), 4)

In [None]:
round(rbind(eff.means = MSE.mles / MSE.means, 
            eff.medians = MSE.mles / MSE.medians, 
            eff.mles = MSE.mles / MSE.mles,
            eff.monotoneM = MSE.mles / MSE.monotoneM, 
            eff.redescM = MSE.mles / MSE.redescM), 4)

#### Gross error outliers ("point contamination") (M- vs all)

Finally, under a "gross error"-type departure from a $t_4$ model, 
the median and the redescending M-estimator are better than the 
MLE (the redescending M-estimator wins by a considerable margin).
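
The contaminated samples can be described as draws from a mixture distribution. The display below is a sketch of the model implied by the `generate` function, where $\epsilon$ is the contamination proportion and $x_0$ the location of the outliers (here $\epsilon = 0.1$ and $x_0 = 8$):
$$
F = (1 - \epsilon) \, t_4 + \epsilon \, N(x_0, 1) \, ,
$$
so each observation is, independently, an outlier from $N(x_0, 1)$ with probability $\epsilon$. Because the contamination sits on one side only, the MSE around $\mu_0 = 0$ now includes a squared-bias term in addition to the variance.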

In [None]:
options(warn = -1) # remove all warning messages
generate <- function(n, epsilon, x0) {
  tmp <- rbinom(n, size=1, prob=epsilon)
  x <- rt(n, df=4)
  x[ tmp == 1 ] <- rnorm(sum(tmp), mean=x0, sd=1)
  return(x)
}
x <- matrix(NA, M, n)
set.seed(123)
for(i in 1:M) x[i,] <- generate(n=n, x0=8, epsilon=.1) 
mus <- rowMeans(x)
meds <- apply(x, 1, median)
mles <- apply(x, 1, function(a) MASS::fitdistr(a, 't', df=4)$estimate[1] )
# M-estimators
monm <- apply(x, 1, function(a) robustbase::huberM(a, k = 1.345)$mu )
redm <- apply(x, 1, function(a) RobStatTM::locScaleM(a, psi='bisquare')$mu )
MSE.means <- mean( mus^2 )
MSE.medians <- mean( meds^2 )
MSE.monotoneM <- mean( monm^2 )
MSE.redescM <- mean( redm^2 ) 
MSE.mles <- mean(mles^2)

In [None]:
round(rbind(MSE.means = MSE.means, MSE.medians= MSE.medians,
            MSE.mles = MSE.mles, MSE.monotoneM = MSE.monotoneM, 
            MSE.redescM = MSE.redescM ), 4)

In [None]:
round(rbind(acc.means = MSE.mles / MSE.means, 
            acc.medians = MSE.mles / MSE.medians, 
            acc.mles = MSE.mles / MSE.mles,
            acc.monotoneM = MSE.mles / MSE.monotoneM, 
            acc.redescM = MSE.mles / MSE.redescM), 4)