# Intro to M-estimators

#### LICENSE
These notes are released under the 
"Creative Commons Attribution-ShareAlike 4.0 International" license. 
See the **human-readable version** [here](https://creativecommons.org/licenses/by-sa/4.0/)
and the **real thing** [here](https://creativecommons.org/licenses/by-sa/4.0/legalcode). 

#### INSTALLATION instructions

To use this notebook you may need to install a few `R` packages:
```
install.packages(c('rmutil', 'robustbase', 'RobStatTM'))
```

## Intro

In this notebook we will review simple location M-estimators, some of their 
robustness properties, and algorithms to compute them. 

We start by loading a small data set, `robustbase::cushny`. See 
`help(cushny, package='robustbase')` for information on the data. 

```r
x <- robustbase::cushny
```

It is always a good idea to look at the data first:

```r
boxplot(x, col='tomato3', cex=1.5, pch=19)
```

```r
rbind(mean = mean(x), median = median(x))
```

We now compute an M-estimator using Huber's loss, without standardizing the residuals. 
We write our own code, starting with Huber's psi function (the derivative of the loss):

```r
# Huber's psi function: the identity on [-cc, cc], clipped at -cc and +cc outside
huberPsi <- function(r, cc)
    return( pmin(pmax(-cc, r), cc) )
```
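For reference, here are the standard definitions behind `huberPsi`: Huber's loss $\rho_c$ and its derivative $\psi_c$, where $c$ plays the role of the argument `cc`. The location M-estimator minimizes $\sum_i \rho_c(x_i-\mu)$ or, equivalently, solves $\sum_i \psi_c(x_i-\mu)=0$.

$$
\rho_c(r)=\begin{cases} \tfrac12 r^2, & |r|\le c,\\[2pt] c\,|r|-\tfrac12 c^2, & |r|>c,\end{cases}
\qquad
\psi_c(r)=\rho_c'(r)=\min\{c,\ \max\{-c,\ r\}\}.
$$

The default `cc = 1.345` is the usual choice that gives 95% asymptotic efficiency at the normal model, provided the residuals are measured in units of the error scale.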

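```r
# Location M-estimator with Huber's psi, computed as an iteratively
# reweighted mean. Residuals are NOT standardized.
mest0 <- function(x, cc=1.345, init=median(x), max.it = 100, eps=1e-8) {
    m1 <- init
    m0 <- m1 + 10*eps                     # forces at least one iteration
    it <- 0
    while( ((it <- it+1) < max.it ) && (abs(m1-m0) > eps ) ) {
        re <- (x - m1)                    # raw residuals
        w <- huberPsi(re, cc=cc)/re       # Huber weights psi(r)/r
        w[ is.na(w) ] <- 1                # r = 0 gives 0/0; its weight is 1
        m0 <- m1
        m1 <- sum( x*w ) / sum(w)         # weighted mean update
    }
    return(m1)
}
```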
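The loop in `mest0` can be read as an iteratively reweighted mean. A short derivation, writing $\psi_c(r)=w_c(r)\,r$ with Huber weights $w_c(r)=\min\{1,\ c/|r|\}$ and $w_c(0)=1$ (which is what `w[ is.na(w) ] <- 1` enforces):

$$
\sum_{i=1}^n \psi_c(x_i-\mu)=0
\;\Longleftrightarrow\;
\sum_{i=1}^n w_c(x_i-\mu)\,(x_i-\mu)=0
\;\Longleftrightarrow\;
\mu=\frac{\sum_{i=1}^n w_c(x_i-\mu)\,x_i}{\sum_{i=1}^n w_c(x_i-\mu)} .
$$

Freezing the weights at the current iterate $m_k$ gives the update

$$
m_{k+1}=\frac{\sum_{i=1}^n w_c(x_i-m_k)\,x_i}{\sum_{i=1}^n w_c(x_i-m_k)},
$$

which is the line `m1 <- sum( x*w ) / sum(w)`. The loop stops when $|m_{k+1}-m_k|\le$ `eps` or when the iteration cap `max.it` is reached.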

We compute the M-estimator:

```r
(mu0 <- mest0(x))
```

and note that it lies between the mean and the median. As a sanity check, we also verify that it satisfies the first-order condition, i.e. that the average of the psi values of the residuals is (numerically) zero:

```r
mean( huberPsi(x-mu0, cc=1.345)) # this should be essentially zero
```
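A second, independent check is to solve the estimating equation directly with a root finder instead of reweighting. The sketch below uses `stats::uniroot()`; the helper name `mest_uniroot` is hypothetical (not from any package), and `huberPsi` is repeated so the block runs on its own. Since the sum of psi values is non-increasing in mu, nonnegative at `min(x)` and nonpositive at `max(x)`, a root lies in `range(x)`.

```r
# Huber's psi, repeated here so that this block runs on its own
huberPsi <- function(r, cc) pmin(pmax(-cc, r), cc)

# Hypothetical helper: solve sum(psi(x - mu)) = 0 for mu with stats::uniroot().
# The left-hand side is non-increasing in mu, >= 0 at min(x) and <= 0 at max(x),
# so a root lies in range(x).
mest_uniroot <- function(x, cc = 1.345, tol = 1e-10) {
    if (diff(range(x)) == 0) return(x[1])   # constant sample: nothing to solve
    est_eq <- function(mu) sum(huberPsi(x - mu, cc = cc))
    uniroot(est_eq, interval = range(x), tol = tol)$root
}

z <- c(1, 2, 3, 4, 100)   # toy vector with one gross outlier
mest_uniroot(z)
```

For the toy vector `z` the root is 3: the residuals are -2, -1, 0, 1, 97, and their clipped values -1.345, -1, 0, 1, 1.345 cancel. With the notebook data, `mest_uniroot(x)` should agree with `mu0` up to the tolerances of the two algorithms.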

## Lack of scale invariance, robustness

As we discussed in class, this estimator is not scale equivariant. For example, if we divide all the data by 100, compute the estimator, and multiply the result by 100, we do not recover the estimate computed on the original data. In fact, something much more "surprising" happens:

```r
rbind(mean=c(mean(x), mean(x/100)*100),
      median=c(median(x), median(x/100)*100),
      Mest=c(mest0(x), mest0(x/100)*100))
```

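The supposedly robust M-estimator computed on the rescaled data is identical to the sample mean! This is a serious problem: the estimator is no longer robust. As discussed in class, the issue is that a sensible choice of the tuning constant `cc` (and hence of the loss function rho) depends on the "size" of the data / residuals. After dividing by 100, every residual is much smaller than `cc = 1.345`, so Huber's loss is quadratic at every observation and the M-estimator reduces to the sample mean.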
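We can see this directly by looking at the weights `psi(r)/r` that `mest0` assigns at its starting value, the median. A weight of 1 means an observation is treated exactly as in the sample mean. The sketch below uses a hypothetical helper `huber_weights` and a toy vector `z` (both assumptions, not part of the original notebook), and repeats `huberPsi` so it runs on its own.

```r
# Huber's psi, repeated here so that this block runs on its own
huberPsi <- function(r, cc) pmin(pmax(-cc, r), cc)

# Hypothetical helper: Huber weights psi(r)/r for raw residuals around mu
# (by default the median, which is mest0's starting value)
huber_weights <- function(x, cc = 1.345, mu = median(x)) {
    re <- x - mu
    w <- huberPsi(re, cc = cc) / re
    w[is.na(w)] <- 1          # residual exactly 0: weight 1
    w
}

z <- c(1, 2, 3, 4, 100)       # toy vector with one outlier
huber_weights(z)              # the outlier is downweighted
huber_weights(z / 100)        # after rescaling, every weight is 1
```

When all weights equal 1, the first update of `mest0` is the sample mean, and for data this small the residuals stay inside `[-cc, cc]`, so the iteration never moves away from it. Running `huber_weights(x / 100)` on the `cushny` data should likewise return only ones.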

We now add two outliers to illustrate that the robustness of this non-scale-equivariant M-estimator depends on the units of the data.

```r
set.seed(123)  # make the simulated outliers reproducible
xc <- c(x, rnorm(2, mean=5.5, sd=.5))
```

We now compute the estimators again. Note that the performance of the M-estimator deteriorates (it appears to be affected by the outliers), but not as much as the sample mean.  

```r
rbind(mean=c(mean(x), mean(xc)),
      median=c(median(x), median(xc)),
      Mest=c(mest0(x), mest0(xc)))
```

To illustrate once more how the result depends on the relative magnitudes of the data and the tuning constant of the (hopefully) robust loss, we compute the estimators on "proportionally smaller" data and then rescale the results back to the original units:

```r
rbind(mean=c(mean(x), mean(xc), mean(xc/100)*100),
      median=c(median(x), median(xc), median(xc/100)*100),
      Mest=c(mest0(x), mest0(xc), mest0(xc/100)*100))
```

Now the deterioration of the M-estimator is clear: on the rescaled contaminated data it coincides with the sample mean, so it offers no protection against the outliers. 
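The property that fails here can be checked mechanically: an estimator $T$ is scale equivariant if $T(s\,x)=s\,T(x)$ for every $s>0$. Below is a minimal sketch of such a check; the helper `is_scale_equivariant`, its default scale factors, and the toy vector `z` are illustrative assumptions.

```r
# Hypothetical helper: TRUE if est(s * x) matches s * est(x), up to the
# relative tolerance `tol`, for every scale factor s in `scales`
is_scale_equivariant <- function(est, x, scales = c(1/100, 10, 100),
                                 tol = 1e-5) {
    ok <- vapply(scales,
                 function(s) isTRUE(all.equal(est(s * x), s * est(x),
                                              tolerance = tol)),
                 logical(1))
    all(ok)
}

z <- c(1, 2, 3, 4, 100)          # toy vector
is_scale_equivariant(mean, z)    # TRUE
is_scale_equivariant(median, z)  # TRUE
```

With the notebook's objects, `is_scale_equivariant(mest0, x)` should return `FALSE`. The loose default tolerance is deliberate: iterative estimators stop at an absolute tolerance `eps`, so even an equivariant one only matches to that accuracy.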

## Using scaled residuals helps in choosing the robust loss

The solution, as we discussed in more detail in class, is to use standardized residuals. The only difference between the "good" M-estimator computed with `mest` below and the previous one (`mest0`) is the inclusion of the robust scale estimator (`si <- mad(x)`), and its use in the computation of residuals (`re <- (x - m1) / si`):

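```r
# Same algorithm as mest0, but residuals are standardized by a robust
# scale estimate `si` (the MAD by default), kept fixed during the iterations
mest <- function(x, cc=1.345, init=median(x), si = mad(x), max.it = 100, eps=1e-8) {
    m1 <- init
    m0 <- m1 + 10*eps                     # forces at least one iteration
    it <- 0
    while( ((it <- it+1) < max.it ) && (abs(m1-m0) > eps ) ) {
        re <- (x - m1) / si               # standardized residuals
        w <- huberPsi(re, cc=cc)/re       # Huber weights psi(r)/r
        w[ is.na(w) ] <- 1                # r = 0 gives 0/0; its weight is 1
        m0 <- m1
        m1 <- sum( x*w ) / sum(w)         # weighted mean update
    }
    return(m1)
}
```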

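Now the M-estimator behaves as it should: it resists the outliers, and computing it on the rescaled data gives the same answer as on the original data.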

```r
rbind(mean=c(mean(x), mean(xc), mean(xc/100)*100),
      median=c(median(x), median(xc), median(xc/100)*100),
      Mest=c(mest(x), mest(xc), mest(xc/100)*100))
```
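Why does standardizing fix the problem? A short derivation: `mest` solves the estimating equation below, with $\hat\sigma=\operatorname{MAD}(x_1,\dots,x_n)$ held fixed.

$$
\sum_{i=1}^n \psi_c\!\left(\frac{x_i-\hat\mu}{\hat\sigma}\right)=0 .
$$

Let $y_i=a\,x_i+b$ with $a>0$. The MAD satisfies $\operatorname{MAD}(y_1,\dots,y_n)=a\,\hat\sigma$, so for the candidate value $a\hat\mu+b$

$$
\frac{y_i-(a\hat\mu+b)}{a\,\hat\sigma}
=\frac{a\,x_i+b-a\hat\mu-b}{a\,\hat\sigma}
=\frac{x_i-\hat\mu}{\hat\sigma},
$$

and hence $a\hat\mu+b$ solves the equation for the $y_i$. The estimator is therefore scale (and location) equivariant, and the tuning constant $c$ always refers to standardized residuals, whatever the units of the data. The starting value `median(x)` is equivariant as well, so every iterate is simply rescaled.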

Another sanity check: the first-order condition, now with standardized residuals:

```r
si <- mad(xc)
mu1 <- mest(xc)
mean( huberPsi((xc-mu1)/si, cc=1.345)) # this should be essentially zero
```

## M-estimators are robust, not immutable

Note, however, that the M-estimator is still affected by the outliers. Fortunately, this effect is bounded: once the outliers are far enough from the bulk of the data, moving them even further away does not change the estimate. For example, suppose the outliers were placed around `+20` (instead of `5.5`)

```r
set.seed(456)  # make the simulated outliers reproducible
xc2 <- c(x, rnorm(2, mean=20, sd=.5))
```

... then the M-estimator does not shift any further to the right, as opposed to what happens with the sample mean: 

```r
rbind(mean=c(mean(x), mean(xc), mean(xc2)),
      median=c(median(x), median(xc), median(xc2)),
      Mest=c(mest(x), mest(xc), mest(xc2)))
```
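One way to see the bounded effect more systematically is an empirical sensitivity curve: add a single point at position `pos` to a fixed sample and record how far the estimate moves, scaled by the new sample size. The sketch below assumes a toy sample `z`; `sens_curve` is a hypothetical helper name, and `huberPsi` and `mest` are repeated from above so the block runs on its own.

```r
# Huber's psi and the standardized M-estimator, repeated from above
huberPsi <- function(r, cc) pmin(pmax(-cc, r), cc)
mest <- function(x, cc = 1.345, init = median(x), si = mad(x),
                 max.it = 100, eps = 1e-8) {
    m1 <- init
    m0 <- m1 + 10 * eps
    it <- 0
    while (((it <- it + 1) < max.it) && (abs(m1 - m0) > eps)) {
        re <- (x - m1) / si
        w <- huberPsi(re, cc = cc) / re
        w[is.na(w)] <- 1
        m0 <- m1
        m1 <- sum(x * w) / sum(w)
    }
    m1
}

# Hypothetical helper: (n + 1) * (est(x with one extra point at p) - est(x))
sens_curve <- function(est, x, pos) {
    n <- length(x)
    vapply(pos, function(p) (n + 1) * (est(c(x, p)) - est(x)), numeric(1))
}

z <- c(1, 2, 3, 4, 5)               # toy sample
pos <- c(5, 10, 50, 100, 1000)      # positions of the added point
rbind(mean = sens_curve(mean, z, pos),
      Mest = sens_curve(mest, z, pos))
```

The `mean` row grows linearly with the position of the added point (here it equals `pos - 3`), while the `Mest` row stays constant for the larger positions: once the extra point lies more than `cc` standardized units from the estimate, its psi value is clipped and its exact location no longer matters.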