#### LICENSE
These notes are released under the 
"Creative Commons Attribution-ShareAlike 4.0 International" license. 
See the **human-readable version** [here](https://creativecommons.org/licenses/by-sa/4.0/)
and the **real thing** [here](https://creativecommons.org/licenses/by-sa/4.0/legalcode). 

# Linear Regression 

In this section we briefly introduce robust M-estimators 
for linear regression models. We will start with 
so-called "monotone" M-estimators, which generalize
what we have done in the location/scale model. This class
of estimators includes the L1 (quantile regression) 
estimators. Unfortunately, if outliers are present among
the explanatory variables, these M-estimators may not
be robust. Furthermore, determining whether outliers are 
present in the data is often very difficult to do,
without using a robust estimator in the first place. 
We will illustrate these issues with a simple example
below. 

In general, robust M-estimators will need to be 
based on minimizing a bounded
(and thus non-convex) loss function, which 
creates obvious computational challenges, but also 
conceptual ones (computing a robust residual scale
estimator is not easy). 

## M-estimators

M-estimators for linear regression are a natural extension 
of the M-estimators used in the location/scale models. 
They can be motivated intuitively in a similar manner 
to that used for the location / scale model: start 
with the Gaussian MLE
and truncate the loss / score function. Such a monotone score function
(corresponding to a convex loss function that grows 
at a slower rate than the squared loss) was first proposed by 
Huber (in the location model: [1964](https://doi.org/10.1214/aoms/1177703732), 
in a more general univariate setting [1967](https://projecteuclid.org/euclid.bsmsp/1200512988), 
and for linear regression [1973](https://doi.org/10.1214/aos/1176342503)). 
Note that the family of M-estimators based on 
monotone (non-decreasing) score functions includes the L1 (quantile
regression) estimators. 
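
To make the "truncation" idea concrete, here is a minimal sketch in base `R` of Huber's loss and score functions. The names `huber.rho` and `huber.psi`, and the tuning constant `k = 1.345`, are choices made for this illustration only; they do not come from the packages used in these notes.

```r
# Huber's loss: quadratic near zero, linear in the tails (convex but unbounded)
huber.rho <- function(t, k = 1.345) {
  ifelse(abs(t) <= k, t^2 / 2, k * abs(t) - k^2 / 2)
}
# Huber's score (derivative of the loss): the identity, truncated at -k and k
huber.psi <- function(t, k = 1.345) pmax(-k, pmin(k, t))

# Compare with the squared loss at a few residual values
tt <- c(-10, -2, -1, 0, 1, 2, 10)
cbind(t = tt, squared.loss = tt^2 / 2,
      huber.loss = huber.rho(tt), huber.score = huber.psi(tt))
```

For small residuals the Huber loss coincides with the squared loss, while for large ones it grows only linearly, and the score never exceeds `k` in absolute value.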

### Simple linear regression
We will use the `phosphor` data in 
package `robustbase` (for `R`). 
Details can be found using `help(phosphor, package='robustbase')`. 
The response variable is `plant` and, 
to simplify the example, we will use only one explanatory variable,
`organic`. Furthermore, in order to 
highlight the potential impact of outliers, we will 
change the position of the single outlier in these data (from the 
right end of the plot to the left):

In [None]:
library(robustbase)
data(phosphor)
phosphor[17, 'organic'] <- 15
plot(plant ~ organic, data=phosphor, pch=19, col='gray50')

We now fit the usual least squares estimator 
plus a robust one 
(an MM-estimator, which is a type of M-estimator that 
will be discussed below) and overlay them on the scatter plot:

In [None]:
MMfit <- lmrob(plant ~ organic, data=phosphor)
LSfit <- lm(plant ~ organic, data=phosphor)
plot(plant ~ organic, data=phosphor, pch=19, col='gray50')
abline(MMfit, lwd=4, col='hotpink')
abline(LSfit, lwd=4, col='steelblue3', lty=1)
legend('topright', lwd=3, lty=1, col=c('hotpink', 'steelblue3'), 
       legend=c('MM', 'LS'))

<!-- We look at the estimated regression coefficients: -->
<!-- ```{r coef} -->
<!-- cbind(MM=coef(MMfit), LS=coef(LSfit)) -->
<!-- ``` -->
We can easily check that, if the outlier were not present in the data, the 
robust and the least squares estimators would essentially coincide. 
The green line in the next plot corresponds to the LS estimator
computed without the outlier. Note that the robust fit on the full data
is indistinguishable from the LS fit on the clean data:

In [None]:
plot(plant ~ organic, data=phosphor, pch=19, col='gray50')
abline(MMfit, lwd=3, col='hotpink')
abline(LSfit, lwd=3, col='steelblue3')
LSclean <- lm(plant ~ organic, data=phosphor, subset=-17)
abline(LSclean, lwd=3, col='green3')
legend('topright', lwd=3, lty=1, 
       col=c('hotpink', 'steelblue3', 'green3'), 
       legend=c('MM', 'LS', 'LS(clean)'))

Just for completeness, we can also look at the estimated 
regression coefficients:

In [None]:
cbind(MM=coef(MMfit), LS=coef(LSfit), LSclean=coef(LSclean))

### Fixed designs - Monotone M-estimators

When the explanatory variables are "fixed" (in the sense 
of being "controlled", as in a designed experiment, or 
because they are intrinsically or naturally bounded, 
for example) then 
M-estimators with a monotone and bounded score function
(derivative of the loss function)
have good robustness properties. 
For example, in this case, 
quantile regression
(L1) estimators are robust (but not efficient). 
M-estimators computed with a Huber score function
will also have a high breakdown point, and 
the loss function can be tuned to result in
estimators that are also highly efficient.

As we discussed in class, to compute these more efficient M-estimators
we need a robust 
residual scale estimator with which to standardize 
the residuals in the estimating equations.  
Similarly to what we did in the location model 
(where we used a robust scale estimator of 
the residuals computed from the median), 
here we can use a robust scale estimator of
the residuals with respect to the L1 regression
estimator, which does not require standardized
residuals to be computed. 
Hence, an effective strategy to obtain high-breakdown point and
high-efficiency estimators in this case is as follows:

1. Compute the L1 regression estimator;
2. Compute `s_n`, a robust scale estimator of the corresponding residuals;
3. Use `s_n` to compute an M-estimator of regression (e.g. using Huber's loss function).

Note that since monotone score functions correspond to 
convex loss functions, the third step in the algorithm above
is computationally relatively simple. 
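
The three steps above can be sketched in base `R` as follows. This is only an illustration under several assumptions: the data are simulated, the helper `irls` is hypothetical, the L1 fit is approximated by iteratively re-weighted least squares (rather than computed exactly, as `quantreg::rq` does), and the Huber constant `k = 1.345` is a conventional choice.

```r
set.seed(1)
n <- 100
x <- runif(n, 0, 10)               # bounded ("fixed") design: no leverage points
y <- 1 + 2 * x + rnorm(n)
y[1:10] <- y[1:10] + 20            # outliers in the response only
X <- cbind(Intercept = 1, x = x)

# Iteratively re-weighted least squares for a given residual weight function
irls <- function(X, y, wfun, beta, maxit = 500, tol = 1e-8) {
  for (it in seq_len(maxit)) {
    w <- wfun(drop(y - X %*% beta))
    beta.new <- drop(solve(crossprod(X, w * X), crossprod(X, w * y)))
    converged <- max(abs(beta.new - beta)) < tol
    beta <- beta.new
    if (converged) break
  }
  beta
}

beta.ls <- drop(solve(crossprod(X), crossprod(X, y)))

# Step 1: approximate L1 fit (weights 1 / |r|, guarded away from zero)
beta.l1 <- irls(X, y, function(r) 1 / pmax(abs(r), 1e-6), beta.ls)

# Step 2: robust residual scale (normalized MAD of the L1 residuals)
s.n <- mad(drop(y - X %*% beta.l1))

# Step 3: Huber M-estimator computed with standardized residuals r / s.n
k <- 1.345
beta.M <- irls(X, y, function(r) pmin(1, k / abs(r / s.n)), beta.l1)

rbind(LS = beta.ls, L1 = beta.l1, HuberM = beta.M)
```

Because the Huber loss is convex, the iterations in step 3 do not depend critically on the starting point; the L1 fit is needed mainly to obtain `s.n`.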

We will later see that better robustness properties 
are obtained if the loss function is bounded. In this case
the score function (the derivative of the loss) is zero
for large residuals, and thus score
functions of this type are called "re-descending". 

So, we will modify the strategy above to compute an M-estimator
with a bounded loss in step (3). Since this now implies 
optimizing a non-convex function, the computational cost of finding
the global minimum can be prohibitive. However, as we will discuss in class, 
computing a "local minimum" (starting from a "good"
initial point) yields an estimator with very good 
robustness and efficiency properties,
and one that is very simple to compute. The 
corresponding algorithm is:

1. Compute the L1 regression estimator;
2. Compute `s_n`, a robust scale estimator of the corresponding residuals;
3. Use `s_n` and the L1 regression estimator to start the minimizing iterations 
of an M-estimator with a bounded loss function. 
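
A minimal base `R` sketch of this modified strategy is given below. It is not the implementation used by `lmrobM`: the data are simulated, the helper names (`w.l1`, `w.bisq`, `irls`) are hypothetical, the L1 fit is again approximated by re-weighted least squares, and the bisquare constant `4.685` is the conventional choice for 95% efficiency with Gaussian errors.

```r
set.seed(2)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
y[1:10] <- y[1:10] + 20            # outliers in the response only
X <- cbind(Intercept = 1, x = x)

# Weight functions w(r) = psi(r) / r
w.l1   <- function(r) 1 / pmax(abs(r), 1e-6)
w.bisq <- function(r, cc = 4.685) ifelse(abs(r) <= cc, (1 - (r / cc)^2)^2, 0)

# Re-weighted least squares iterations, started at 'beta', with fixed scale 's'
irls <- function(wfun, beta, s = 1, maxit = 500, tol = 1e-8) {
  for (it in seq_len(maxit)) {
    r <- drop(y - X %*% beta)
    beta.new <- lm.wfit(X, y, w = wfun(r / s))$coefficients
    converged <- max(abs(beta.new - beta)) < tol
    beta <- beta.new
    if (converged) break
  }
  beta
}

beta.l1 <- irls(w.l1, lm.fit(X, y)$coefficients)   # step 1 (approximate L1)
s.n <- mad(drop(y - X %*% beta.l1))                # step 2
beta.bisq <- irls(w.bisq, beta.l1, s = s.n)        # step 3: local minimum near L1

# Final weights of the 10 outlying observations
w.final <- w.bisq(drop(y - X %*% beta.bisq) / s.n)
round(w.final[1:10], 3)
```

Unlike the Huber case, observations with large standardized residuals receive weight exactly zero, so the result depends on where the iterations start.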

This estimator is implemented in the function `lmrobM` of package `RobStatTM`.
As an example, consider the `phosphor` data in package `robustbase`.

In [None]:
library(RobStatTM)
data(phosphor, package='robustbase') # reload the original (unmodified) data
myc <- lmrobM.control(family='bisquare', efficiency=.95)
ph.M <- lmrobM(plant ~ inorg, data=phosphor, control=myc)
plot(plant ~ inorg, data=phosphor, pch=19, cex=1.2, col='gray50')
abline(ph.M, lwd=4, col='tomato3')
ph.rq <- quantreg::rq(plant ~ inorg, data=phosphor)
abline(ph.rq, lwd=4, col='steelblue')
legend('topleft', legend=c('lmrobM fit', 'L1'), 
       lwd=4, col=c('tomato3', 'steelblue'))

### Outliers in the explanatory variables

If outliers may be present in the explanatory variables 
(which precludes "designed" or "controlled experiments" 
situations), then monotone-M estimators (i.e. M-estimators with 
a monotone (non-decreasing) $\psi$ function, such as Huber's 
M-estimators, or L1 (quantile) regression estimators) may 
not be robust. 

As an example, consider the `alcohol` data set (available 
from package `RobStatTM`). Although several explanatory variables
are available, to fix ideas here we focus on a simpler linear
regression model with a single covariate `SAG`. 

We first compare the L1 (quantile regression) estimator and
the M-estimator (initialized with the L1 estimator, as described
above).

In [None]:
data(alcohol)
a <- quantreg::rq(logSolubility ~ SAG, data=alcohol)
b <- lmrobM(logSolubility ~ SAG, data=alcohol, control=myc)
plot(logSolubility ~ SAG, data=alcohol, pch=19, cex=1.2, col='gray50')
abline(a, lwd=4, col='steelblue3')
abline(b, col='tomato3', lwd=4)
legend('topright', legend=c('lmrobM fit', 'L1'), lwd=4, col=c('tomato3', 'steelblue3'))

Note that both estimators are very close to each other, and capture
the linear relationship between the variables. To illustrate the point
we are trying to make here, we will move the 3 right-most observations 
in the data further to the right, and also shift them up:

In [None]:
alcohol$logSolubility[c(38, 43:44)] <- 
  alcohol$logSolubility[c(38, 43:44)] + 15 
alcohol$SAG[c(38, 43:44)] <- 
  alcohol$SAG[c(38, 43:44)] + 300

We now recompute the L1 estimator and the L1-initialized M-estimator:

In [None]:
a <- quantreg::rq(logSolubility ~ SAG, data=alcohol)
b <- lmrobM(logSolubility ~ SAG, data=alcohol, control=myc)
plot(logSolubility ~ SAG, data=alcohol, pch=19, cex=1.2, col='gray50')
abline(a, col='steelblue3', lwd=4)
abline(b, col='tomato3', lwd=4)
legend('topleft', legend=c('lmrobM fit', 'L1'), lwd=4, col=c('tomato3', 'steelblue3'))

Note that neither of these estimators is now able to identify the 
(linear) relationship that holds for the majority of the data: 
41/44 = 93.18%. 
The proportion of "bad" data is only 3/44, or 
6.82%.

However, a properly initialized redescending M-estimator has no 
problem at all:

In [None]:
d <- lmrob(logSolubility ~ SAG, data=alcohol)
plot(logSolubility ~ SAG, data=alcohol, pch=19, cex=1.2, col='gray50')
abline(a, col='steelblue3', lwd=4)
abline(b, col='tomato3', lwd=4)
abline(d, col='hotpink', lwd=4)
legend('topleft', legend=c('lmrobM fit', 'L1', 'Redesc'), lwd=4, lty=1, 
       col=c('tomato3', 'steelblue3', 'hotpink' ))

Note that this last estimator also works very well when there 
are no atypical observations in the data. 


### A synthetic toy example (diagnostics and estimation)

This example will illustrate that:

- outliers can be severely damaging without being "obviously" apparent;
- quantile regression estimators (L1) offer limited protection against atypical observations; and
- classical diagnostic tools may not work as advertised.

Our example contains $n = 200$ observations with $p = 6$
explanatory variables. The regression model is $Y = 
V_1 + 2 V_2 + V_3 + V_4 + V_5 + \varepsilon$, where 
$\varepsilon$ follows a $N(0, 1.7^2)$ distribution (standard deviation 1.7). 
Hence, the true vector of regression 
coefficients is `(1, 2, 1, 1, 1, 0)` and the true intercept is 
zero. The explanatory variables are all independent standard
normal random variables. I used the following code 
to generate the data:

In [None]:
n <- 200
p <- 6
set.seed(123)
x0 <- as.data.frame(matrix(rnorm(n*p), n, p))
x0$y <- with(x0, V1 + 2*V2 + V3 + V4 + V5 + rnorm(n, sd=1.7))

We now replace the last 21 observations (rows 180 to 200)
with outliers (for a total proportion of atypical observations of 21/200 = 10.5%).

In [None]:
eps <- .1
n1 <- ceiling(n*(1-eps))
x0[n1:n, 1:p] <- matrix(rnorm((n-n1+1)*p, mean=+1.85, sd=.8))
x0$y[n1:n] <- rnorm(n-n1+1, mean=-7, sd=1.7)

These atypical observations cannot be seen easily in 
a pairwise plot, especially if one does not know 
in advance that they are present:

In [None]:
pairs(x0)

Standard diagnostic plots do not flag anything of 
importance either:

In [None]:
m0 <- lm(y~., data=x0)
par(mfrow=c(2,2))
plot(m0, which=c(1, 2, 5))
par(mfrow=c(1,1))

Note that all the Cook's distances are below 0.15, 
for example. 
However, the estimated regression coefficients are
very different from the true ones
(1, 2, 1, 1, 1, 0):

In [None]:
cbind(Truth=c(0,1, 2, 1, 1, 1, 0), LS=coef(m0))

We now compare the LS regression 
estimator and the L1-estimator (which is a quantile
regression estimator).

In [None]:
m3 <- quantreg::rq(y~., data=x0)
cbind(Truth=c(0, 1, 2, 1, 1, 1, 0),
      LS=coef(m0), L1=coef(m3)) #, MM=coef(m1))

Note that the L1 estimator is similarly affected by these
outliers. 

Not surprisingly, the approach described above (using 
the L1 estimator to compute residuals, and then using their estimated
scale to compute an M-estimator) does not work well either
(the result is in fact an estimator very close to LS!):

In [None]:
m2 <- lmrobM(y ~ ., data=x0, control=myc)
cbind(Truth=c(0, 1, 2, 1, 1, 1, 0),
      LS=coef(m0), L1=coef(m3), L1M=coef(m2)) #, MM=coef(m1))

So, in this case, none of the strategies above 
provide reliable regression estimators. 
**And** we cannot even detect that outliers may
be present in the data. 

We will later discuss in class a strategy to compute
robust regression estimators that can deal with situations
like this. They are called MM estimators, and are
implemented in the function `lmrob` of the 
`robustbase` package. 
We compute it now and show that it indeed provides
a much better fit.

In [None]:
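# lmrob.control() uses 'psi' and 'tuning.psi'; for the bisquare family,
# tuning.psi = 4.685 corresponds to 95% efficiency under Gaussian errors
m1 <- lmrob(y~., data=x0, 
            control=lmrob.control(psi='bisquare', tuning.psi=4.685))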
cbind(Truth=c(0, 1, 2, 1, 1, 1, 0),
      LS=coef(m0), L1=coef(m3), L1M=coef(m2), 
      MM=coef(m1))

Moreover, the estimator is
very close (again) to the LS estimator we would have
obtained had we known which points were atypical. 
In this sense, the robust estimator behaves like an
**"oracle estimator"**.

In [None]:
m0.cl <- lm(y~., data=x0, subset = -(n1:n) )
cbind(Truth=c(0, 1, 2, 1, 1, 1, 0),
      LS=coef(m0), L1=coef(m3), L1M=coef(m2), 
      MM=coef(m1), LSclean = coef(m0.cl))

Finally, we can also look at the diagnostic plots obtained with
the robust estimator, where the outliers are now
clearly visible.

In [None]:
par(mfrow=c(2,2))
plot(m1, which=c(1, 2, 4))
par(mfrow=c(1,1))

### Random features (explanatory variables)

When the explanatory variables are observed (that is, they are part of the
random phenomenon being measured), outliers and other
atypical data points can be present among them. Observations that 
are outlying in the space of features are 
usually called high-leverage points. When such points are 
present, M-estimators for linear regression computed with a monotone 
score function (e.g. Huber's, or L1 [quantile regression]) 
may have a breakdown point as low as $1/p$, where $p$
is the number of features (for an illustration, see the first 
example in the [Lecture 1](Lecture1.md) notes). 
Some references include
[Maronna et al (1979)](https://doi.org/10.1007/BFb0098492) and 
[Maronna & Yohai (1991)](https://doi.org/10.2307/2290400).
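
The effect of a single high-leverage point on a monotone M-estimator can be seen in a small simulation. The sketch below uses base `R` only and makes simplifying assumptions: the data are simulated, the residual scale is taken as known (equal to 1, the true error standard deviation), and the Huber fit is computed with hand-written re-weighted least squares iterations rather than with any of the packages used in these notes.

```r
set.seed(3)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
x[1] <- 1e4; y[1] <- 0             # a single, very high-leverage outlier
X <- cbind(Intercept = 1, x = x)

# Huber weights w(r) = psi(r) / r, with the scale assumed known (s = 1)
huber.w <- function(r, k = 1.345) pmin(1, k / abs(r))

beta <- lm.fit(X, y)$coefficients  # start from the LS fit
for (it in 1:500) {
  r <- drop(y - X %*% beta)
  beta.new <- lm.wfit(X, y, w = huber.w(r))$coefficients
  converged <- max(abs(beta.new - beta)) < 1e-8
  beta <- beta.new
  if (converged) break
}

# LS fit without the outlier, for reference
beta.clean <- lm.fit(X[-1, ], y[-1])$coefficients
rbind(Huber = beta, LS.clean = beta.clean)
```

Although the score function bounds the contribution of the outlier's residual, its contribution to the estimating equations is multiplied by its (unbounded) covariate value, which is enough to drag the fitted line towards it.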

A solution to this problem is to use a 
re-descending score function: a 
score function $\psi(t)$ that is zero for 
$|t| > c$ for some $c > 0$. This corresponds to 
a bounded loss function
$\rho$. Since bounded loss functions are necessarily non-convex,
the optimization problem that defines these estimators 
is computationally challenging. In particular, there may be 
several critical points (first-order conditions equal to zero) 
that do not correspond to the 
global minimum. However, 
[Yohai (1987)](https://doi.org/10.1214/aos/1176350366) showed 
that it is enough to find a local minimum 
starting from a consistent estimator. This is discussed below
in Sections "S-estimators" and "M-estimators with a preliminary scale".  

<!-- The corresponding regression  -->
<!-- estimators may have  -->
<!-- a very low breakdown point (as low as $$1/p$$, where $$p$$ -->
<!-- is the number of features) if high-leverage outliers  -->
<!-- (outliers among the explanatory variables) can be  -->
<!-- present (see, e.g. [Maronna et al, 1979](https://doi.org/10.1007/BFb0098492)).  -->


## The issue of scale

An often overlooked problem is that in order to use these estimators
in practice we need to estimate the scale (standard deviation, if
second moments exist) of the residuals (standardized residuals 
have to be used in the estimating equations). Naturally, this issue also 
affects M-estimators for location / scale, but for them it can 
be solved relatively easily by using the MAD of the observations, 
for example. Note that this robust scale estimator 
can be computed independently from the M-estimator. In regression models, 
however, where outliers may be present in the explanatory variables, 
there is no simple robust regression estimator that 
can be used to obtain reliable residuals with which to compute
a preliminary residual scale estimator. In other words, to
compute a robust regression estimator we need a robust residual
scale estimator, but to compute 
a robust residual
scale estimator we need a robust regression estimator (in order
to obtain reliable residuals). S-estimators can 
break this impasse. 

## S-estimators

S-estimators are defined as the regression coefficients 
whose residuals minimize a (robust) 
estimator
of scale. In particular, we use M-estimators of scale, because
they are easier to minimize in practice than
other scale estimators, such as the MAD. 
These regression estimators can be tuned to have a high breakdown
point, but their efficiency is typically low. This is not
a concern, as the resulting residual scale estimator is used
to compute an M-estimator of regression that can be tuned to have
high efficiency. 
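
To fix ideas, the following base `R` sketch computes an M-estimator of scale and evaluates the objective function that an S-estimator minimizes. The function names (`rho.bisq`, `mscale`, `s.objective`) are hypothetical, the data are simulated, and the constants `cc = 1.54764` and `b = 0.5` are the usual choices for a bisquare M-scale with 50% breakdown point that is consistent for Gaussian errors. No attempt is made here to minimize the objective.

```r
# Bisquare loss, normalized so that its maximum value is 1
rho.bisq <- function(t, cc = 1.54764) {
  u <- pmin(abs(t) / cc, 1)
  1 - (1 - u^2)^3
}

# M-scale: the value s that solves mean(rho(r / s)) = b,
# computed with a simple fixed-point iteration
mscale <- function(r, b = 0.5, cc = 1.54764, maxit = 200, tol = 1e-8) {
  s <- median(abs(r)) / 0.6745     # initial value
  if (s == 0) return(0)
  for (it in seq_len(maxit)) {
    s.new <- s * sqrt(mean(rho.bisq(r / s, cc)) / b)
    converged <- abs(s.new / s - 1) < tol
    s <- s.new
    if (converged) break
  }
  s
}

# The S-estimator is the vector beta that minimizes this function
s.objective <- function(beta, X, y) mscale(drop(y - X %*% beta))

set.seed(4)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
x[1:10] <- 30; y[1:10] <- rnorm(10)   # high-leverage outliers
X <- cbind(Intercept = 1, x = x)
beta.ls <- lm.fit(X, y)$coefficients
c(at.LS = s.objective(beta.ls, X, y), at.truth = s.objective(c(1, 2), X, y))
```

The residual scale is much smaller at the coefficients that fit the majority of the data than at the least squares fit, which is why minimizing it leads to a robust regression estimator.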

### Computational challenges

S-estimators can be difficult to compute. They are defined as
the point at which a (typically) non-convex function attains its
minimum. The loss function that needs to be minimized is only
defined implicitly (as the solution to a non-linear equation). 
However, its gradient can be computed explicitly, and iterative
algorithms that decrease the objective function at each step
exist ([SB and Yohai (2006)](http://dx.doi.org/10.1198/106186006X113629)).
The main computational bottleneck is the need for a "good" 
starting point. Data-dependent random starts have been used 
for a long time. This approach is implemented in the function 
`lmrob` of the package `robustbase`. 

<!-- Here is a simple example, using the well-known  -->
<!-- stack loss data (see `help(stackloss)` for more information -->
<!-- on these data).  -->
<!-- Note that the main objective of `lmrob()` is to compute the -->
<!-- subsequent M-estimator,  -->
<!-- the S-estimator is included in one entry (`$init.S`) of  -->
<!-- the list returned by `lmrob()`.  -->
<!-- ```{r stackloss} -->
<!-- data(stackloss) -->
<!-- set.seed(123) -->
<!-- a <- lmrob(stack.loss ~ ., data=stackloss) -->
<!-- Sest <- a$init.S -->
<!-- coef(Sest) -->
<!-- ``` -->
<!-- We can look at the fitted vs. residuals plot, and easily -->
<!-- identify 4 potential outliers.  -->
<!-- ```{r stackloss2}  -->
<!-- plot(fitted(Sest), resid(Sest), pch=19, cex=1.1,  -->
<!--      xlab='Fitted values', ylab='Residuals') -->
<!-- abline(h=Sest$scale*2.5*c(-1, 0, 1), lty=2) -->
<!-- n <- length(resid(Sest)) -->
<!-- labels.id <- paste(1L:n) -->
<!-- iid <- 1:4 -->
<!-- show.r <- sort.list(abs(resid(Sest)), decreasing = TRUE)[iid] -->
<!-- text(fitted(Sest)[show.r]-1.5, resid(Sest)[show.r],  -->
<!--      show.r, cex = 1.1, xpd = TRUE, offset = 0.25) -->
<!-- ``` -->

## M-estimators with a preliminary scale

The function `lmrob` in package `robustbase` implements
M-estimators with a re-descending score (bounded loss) function,
computed using a preliminary residual scale estimator 
(an S-estimator as above). This implementation uses data-dependent
random starts for the S-estimator.

In [None]:
set.seed(123)
a <- lmrob(stack.loss ~ ., data=stackloss)
par(mfrow=c(2,2))
plot(a, which=c(1, 2, 4))
par(mfrow=c(1,1))

<!-- Note that the M-estimator identifies fewer outliers than -->
<!-- the S-estimator. This is because, by default, the  -->
<!-- M-estimator is tuned to have high-efficiency (95% if the -->
<!-- errors have a Gaussian distribution), and this induces -->
<!-- a relatively high asymptotic bias. If we reduce the -->
<!-- efficiency to 85%, then the M-estimator resembles -->
<!-- the S- one. We use the function `RobStatTM::bisquare()` -->
<!-- to compute the tuning constant the corresponds to  -->
<!-- a desired efficiency, for regression estimators -->
<!-- computed using Tukey's bisquare loss function.  -->
<!-- ```{r stackloss4} -->
<!-- library(robustbase) -->
<!-- set.seed(123) -->
<!-- myc <- lmrob.control(tuning.psi=RobStatTM::bisquare(.85)) -->
<!-- a <- lmrob(stack.loss ~ ., data=stackloss, control=myc) -->
<!-- par(mfrow=c(2,2)) -->
<!-- plot(a, which=c(1, 2, 4), id.n=4) -->
<!-- par(mfrow=c(1,1)) -->
<!-- ``` -->

<!-- The function `lmrobdetMM` in package `RobStatTM` implements -->
<!-- a different starting point for the iterative  -->
<!-- algorithm that computes the S-estimator. Instead of using -->
<!-- data-dependent random starts, a few deterministic starting -->
<!-- points are considered.  -->
<!-- The code below compares the resulting fit on the `stackloss` data: -->
<!-- ```{r stackloss5} -->
<!-- library(RobStatTM) -->
<!-- set.seed(123) -->
<!-- myc <- lmrobdet.control(family='bisquare', efficiency=.95) -->
<!-- a.det <- lmrobdetMM(stack.loss ~ ., data=stackloss, control=myc) -->
<!-- par(mfrow=c(2,2)) -->
<!-- plot(a.det, which=c(1, 2, 4), id.n=4) -->
<!-- par(mfrow=c(1,1)) -->
<!-- ``` -->

<!-- We see that in this case, both estimators yield essentially -->
<!-- the same fit -->
<!-- ```{r stackcomp} -->
<!-- cbind(lmrob=coef(a), lmrobdetMM=coef(a.det)) -->
<!-- ``` -->

The least squares fit only identifies a single
potential mild outlier (observation 21), and the
regression coefficients are somewhat different from
the robust ones (especially for `Water.Temp` and `Acid.Conc.`):

In [None]:
a.ls <- lm(stack.loss ~ ., data=stackloss)
par(mfrow=c(2,2))
plot(a.ls, which=c(1, 2, 5))
par(mfrow=c(1,1))
cbind(ls=coef(a.ls), lmrob=coef(a)) #, lmrobdetMM=coef(a.det))

### Choosing the score / loss function

For this class of M-estimators we can choose the 
family of loss/score functions, and the corresponding tuning
constant. For example, Tukey's bisquare loss is
$\rho(t) = \min(k^2/6, k^2/6*(1-(1-(t/k)^2)^3))$. The next 
figures illustrate $\rho$ and its derivative $\psi$ (the corresponding
score function):

In [None]:
tt <- seq(-6, 6, length=200)
tun.cnst <- bisquare(0.95)
par(mfrow=c(2,1))
plot(tt, rho(tt, family='bisquare', cc=tun.cnst), type='l', 
     lwd=4, col='red', xlab='t', ylab=expression(rho(t)))
abline(v=0, lty=2)
plot(tt, rhoprime(tt, family='bisquare', cc=tun.cnst), type='l', 
     lwd=4, col='red', xlab='t', ylab=expression(psi(t)))
abline(v=0, lty=2); abline(h=0, lty=2); par(mfrow=c(1,1))

The tuning constant is typically chosen to obtain an estimator
with a desired efficiency when the errors follow a specific distribution.
For example, the function `bisquare()` used above returns the 
value of the tuning parameter that should be used with Tukey's 
family of loss functions to obtain
a desired efficiency when errors are Gaussian. 
Although the breakdown point of these estimators is high (as 
high as that of the auxiliary S-estimator for the residual scale, 
which can be chosen to be 50%), and their efficiency can 
then subsequently be set by selecting an appropriate tuning parameter,
there is a bias / variance trade-off (the higher the efficiency (the
lower the variance), the higher the asymptotic bias). 
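
The relationship between the tuning constant and the efficiency can be checked numerically. The sketch below (base `R`; the function names are hypothetical) uses the standard formula for the asymptotic efficiency of an M-estimator with Gaussian errors and known scale, $(E[\psi'(Z)])^2 / E[\psi(Z)^2]$ with $Z \sim N(0, 1)$, applied to Tukey's bisquare score $\psi(t) = t (1 - (t/k)^2)^2$ for $|t| \le k$.

```r
# Bisquare score function and its derivative
psi.bisq  <- function(t, cc) ifelse(abs(t) <= cc, t * (1 - (t / cc)^2)^2, 0)
dpsi.bisq <- function(t, cc) {
  ifelse(abs(t) <= cc, (1 - (t / cc)^2) * (1 - 5 * (t / cc)^2), 0)
}

# Asymptotic efficiency (relative to LS) under standard Gaussian errors
eff.bisq <- function(cc) {
  A <- integrate(function(t) dpsi.bisq(t, cc) * dnorm(t), -cc, cc)$value
  B <- integrate(function(t) psi.bisq(t, cc)^2 * dnorm(t), -cc, cc)$value
  A^2 / B
}

eff.bisq(4.685)   # should be close to 0.95

# Tuning constant that attains a target efficiency of 95%
c95 <- uniroot(function(cc) eff.bisq(cc) - 0.95, c(2, 10))$root
c95
```

Larger tuning constants bring the estimator closer to least squares (higher efficiency), at the price of giving positive weight to larger residuals.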

There is, however, another "parameter" that can be chosen
to reduce the bias of the estimator
(for a given breakdown point and efficiency): the **family 
of loss functions** itself. The package `RobStatTM` implements the optimal
loss function (`opt`), which can be set using the control 
argument in `lmrobdetMM` (see
Section 5.8.1 in [Maronna et al (2019)](https://doi.org/10.1002/9781119214656)).
Below we revisit the stack loss example, using a 95% efficient
estimator computed with the bias-optimal loss, and
compare it with the 95% efficient one based on the bisquare
loss function.

In [None]:
library(RobStatTM)
set.seed(123)
myc <- lmrobdet.control(family='opt', efficiency=.95)
a.opt <- lmrobdetMM(stack.loss ~ ., data=stackloss, control=myc)
par(mfrow=c(2,2))
plot(a.opt, which=c(1, 2, 4), id.n=4)
par(mfrow=c(1,1))

<!-- Note that by using a loss function with better asymptotic bias -->
<!-- properties we are able to detect all four outliers detected by -->
<!-- the S-estimator but using a highly efficient and robust  -->
<!-- regression estimator, which results in better  -->
<!-- (e.g. more powerful) inference for the regression parameters.  -->
<!-- In other words, we obtain a more efficient regression estimator -->
<!-- incurring in a much smaller increase in asymptotic bias, which -->
<!-- results in better outlier-detection capabilities. -->

The estimated regression parameters are:

In [None]:
cbind(ls=coef(a.ls), Tukey=coef(a), Opt=coef(a.opt))