#### LICENSE
These notes are released under the 
"Creative Commons Attribution-ShareAlike 4.0 International" license. 
See the **human-readable version** [here](https://creativecommons.org/licenses/by-sa/4.0/)
and the **real thing** [here](https://creativecommons.org/licenses/by-sa/4.0/legalcode). 

## Ridge regression 

### Standardizing responses and explanatory variables

As we discussed in class (see also the course notes),
penalty terms in regularized linear regression estimators do not typically include the intercept
(if one is present). Furthermore, unless the covariates are
standardized (e.g. to all have the same norm),
the magnitude of the regularization (penalization)
may depend on the scale of the covariates. For example,
changing the scale of any one feature (which is
arbitrary and can be done without changing the model)
may radically change the final level of "shrinkage"
or regularization.
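
To make this dependence explicit, here is a short sketch in notation of our own (it is an assumption, not taken from the course notes) that ignores the normalizing constants used by `glmnet`. The Ridge Regression estimator minimizes

$$
\sum_{i=1}^n \Big( y_i - \beta_0 - \sum_{k=1}^p x_{ik} \beta_k \Big)^2 + \lambda \sum_{k=1}^p \beta_k^2 .
$$

If the $j$-th covariate is replaced by $\tilde{x}_{ij} = x_{ij} / c$ for some $c > 0$, the same fitted values are obtained with $\tilde{\beta}_j = c \, \beta_j$. Written in terms of the original coefficients, the objective becomes

$$
\sum_{i=1}^n \Big( y_i - \beta_0 - \sum_{k=1}^p x_{ik} \beta_k \Big)^2 + \lambda \Big( \sum_{k \ne j} \beta_k^2 + c^2 \, \beta_j^2 \Big) .
$$

The $j$-th coefficient is therefore penalized $c^2$ times as heavily relative to the others, and in general the minimizer changes in every coordinate, not only the $j$-th.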

Here we illustrate this issue with a simple example. 

First, load the `alcohol` data, and set up the "design matrix" `x`
and the response variable `y` for use with `glmnet`. We will use
`glmnet` with `alpha = 0` to compute a Ridge Regression estimator,
and choose the level of penalization via cross-validation
(`cv.glmnet()` uses 10 folds by default). 

In [None]:
library(glmnet)
data(alcohol, package='robustbase')

x <- model.matrix(logSolubility ~ ., data=alcohol)
x <- x[, -1]
y <- alcohol$logSolubility

We now set up a (rather wide) grid of possible values for the penalization
constant, and use `cv.glmnet()` to explore the prediction properties of
the corresponding fits. We will select the value of lambda (the penalty
parameter) with the smallest CV criterion. 

We will force `glmnet` to not standardize our data (use the
argument `standardize = FALSE`), and report the vector of estimated
regression coefficients for that optimal value of lambda:

In [None]:
lambdas <- exp(seq(-20, 10, length.out = 200))
set.seed(123)
a <- cv.glmnet(x=x, y=y, family='gaussian', alpha=0, lambda=lambdas, standardize = FALSE)
a$lambda.min
round(coef(a, s='lambda.min'), 4)

To illustrate our point, we now change the scale of one of the covariates
(`RM`, the fifth column of `x`), repeat our Ridge Regression fit and compare the results. 

Ideally, the amount of regularization would change (proportionally) only
for the modified covariate, while the rest of the 
fitted model remained the same. Alas, this is not to be:

In [None]:
x[, 5] <- x[, 5] / 100000
set.seed(123)
b <- cv.glmnet(x=x, y=y, family='gaussian', alpha=0, lambda=lambdas, standardize = FALSE)
b$lambda.min
round(cbind(coef(a, s='lambda.min'), coef(b, s='lambda.min')), 4)

Note that the CV curves can change quite dramatically as well:

In [None]:
plot(a)

In [None]:
plot(b)

The default behaviour of `glmnet` is to fit the model on standardized covariates,
and then re-express the estimated coefficients in the original scale. Thus, the results are appropriately 
equivariant: 

In [None]:
### w/standardization
x <- model.matrix(logSolubility ~ ., data=alcohol)
x <- x[, -1]
y <- alcohol$logSolubility
set.seed(123)
a2 <- cv.glmnet(x=x, y=y, family='gaussian', alpha=0, lambda=lambdas, standardize = TRUE)
x[, 5] <- x[, 5] / 100000
set.seed(123)
b2 <- cv.glmnet(x=x, y=y, family='gaussian', alpha=0, lambda=lambdas, standardize = TRUE)
round(cbind(coef(a2, s='lambda.min'), coef(b2, s='lambda.min')), 4)
c(a2$lambda.min, b2$lambda.min)

And now the CV curves are also identical:

In [None]:
plot(a2)

In [None]:
plot(b2)
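
The same equivariance can be checked on simulated data without `glmnet`. The following is a minimal sketch, not the `glmnet` implementation: `ridge_fitted()` is a hypothetical helper that computes the closed-form ridge fit with an unpenalized intercept, the data are simulated, and scaling each column to unit standard deviation is assumed as the standardization (the details of `glmnet`'s internal standardization may differ).

```r
# Hypothetical helper (not part of glmnet): closed-form ridge fit with an
# unpenalized intercept; optionally scales each column to unit standard deviation.
ridge_fitted <- function(x, y, lambda, standardize = FALSE) {
  xc <- scale(x, center = TRUE, scale = standardize)  # centre (and maybe scale) columns
  yc <- y - mean(y)                                   # centring y handles the intercept
  beta <- solve(crossprod(xc) + lambda * diag(ncol(xc)), crossprod(xc, yc))
  drop(mean(y) + xc %*% beta)                         # fitted values
}

# Simulated data (an assumption made only for this illustration)
set.seed(1)
n <- 50
x_sim <- matrix(rnorm(n * 3), n, 3)
y_sim <- drop(1 + x_sim %*% c(2, -1, 0.5) + rnorm(n))

# Change the scale of the second covariate only
x_rescaled <- x_sim
x_rescaled[, 2] <- x_rescaled[, 2] / 1000

# Largest change in the fitted values caused by the rescaling
diff_raw <- max(abs(ridge_fitted(x_sim, y_sim, 5) -
                    ridge_fitted(x_rescaled, y_sim, 5)))
diff_std <- max(abs(ridge_fitted(x_sim, y_sim, 5, standardize = TRUE) -
                    ridge_fitted(x_rescaled, y_sim, 5, standardize = TRUE)))
c(raw = diff_raw, standardized = diff_std)
```

Without standardization the fitted values change noticeably when one column is rescaled, whereas with standardization the two fits agree up to floating-point error.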