## L7.2 High-dimensional data: Ridge and Lasso

In [None]:
# plot settings
options(repr.plot.width=8, repr.plot.height=6)

library("glmnet")

In [None]:
load("QuickStartExample.RData") # This data comes from the glmnet authors

# The data has 20 predictor variables.
# Keep only the first 19 observations to construct an artificial example
# that is underdetermined (fewer observations than parameters).
y <- head(y,19)
x <- head(x,19)

# do a linear fit with lm
lm(y ~ x)
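
To see why `lm` struggles here, the following is a minimal sketch on simulated data. The names `xSim` and `ySim` are hypothetical stand-ins with the same shape as the notebook's data (19 observations, 20 predictors), not the contents of `QuickStartExample.RData`. With an intercept there are 21 parameters but only 19 observations, so `lm` cannot estimate every coefficient and reports the ones it drops as `NA`.

```r
# Simulated stand-in data (an assumption): 19 observations, 20 predictors
set.seed(1)
nObs <- 19
nPred <- 20
xSim <- matrix(rnorm(nObs * nPred), nrow = nObs, ncol = nPred)
ySim <- rnorm(nObs)

lmFit <- lm(ySim ~ xSim)

# intercept + 20 slopes are requested, but at most 19 can be estimated;
# the coefficients lm could not estimate are returned as NA
nMissing <- sum(is.na(coef(lmFit)))
print(nMissing)

# residual degrees of freedom left after fitting
print(df.residual(lmFit))
```

Any coefficients that are reported fit the 19 points without leaving residual degrees of freedom, which is the motivation for adding a penalty.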

## Linear regression with "Ridge"

The function glmnet with alpha=0 will perform linear regression that maximizes the penalized log-likelihood (up to the scaling constants glmnet uses internally)

$$\log\left(L(\beta)\right) - \lambda \sum_i  \beta_i ^2$$

where $L$ is the likelihood function. Conveniently, it performs this regression for many values of $\lambda$ at once.
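
For a Gaussian likelihood, the ridge problem has a closed-form solution, which may help make the effect of the penalty concrete. The sketch below is an illustration in base R, not glmnet's algorithm or its exact scaling of $\lambda$: it minimizes $\|y - X\beta\|^2 + \lambda \sum_i \beta_i^2$ without an intercept, on simulated data. The names `ridgeSolve`, `xSim` and `ySim` are hypothetical.

```r
# Simulated stand-in data (an assumption): 19 observations, 20 predictors
set.seed(1)
nObs <- 19
nPred <- 20
xSim <- scale(matrix(rnorm(nObs * nPred), nObs, nPred))  # centered, unit-variance columns
ySim <- rnorm(nObs)
ySim <- ySim - mean(ySim)                                # centered response, so no intercept

# closed-form ridge estimate: solve (X'X + lambda * I) beta = X'y
ridgeSolve <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# Euclidean norm of the coefficient vector for increasing penalties
lambdaGrid <- c(0.01, 1, 100)
coefNorms <- sapply(lambdaGrid, function(l) sqrt(sum(ridgeSolve(xSim, ySim, l)^2)))
print(coefNorms)
```

Although $X^TX$ is singular when there are more predictors than observations, adding $\lambda I$ makes the system solvable, and the coefficient norm shrinks as $\lambda$ grows.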


In [None]:
# fit a ridge regression (alpha=0) with glmnet over a grid of lambda values

lambdaArray <- exp(seq(-8, 5, length.out=200))

ridgeFit <- glmnet(x, y, alpha=0, lambda=lambdaArray)

# coefficients versus lambda
plot(ridgeFit, xvar = "lambda", label = TRUE)

## Linear regression with "Lasso"

The function glmnet with alpha=1 will perform linear regression that maximizes the penalized log-likelihood (up to the scaling constants glmnet uses internally)

$$\log\left(L(\beta)\right) - \lambda \sum_i | \beta_i |$$

where $L$ is the likelihood function.
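
The absolute-value penalty is what lets the Lasso set coefficients exactly to zero. A minimal sketch of this, assuming the textbook special case of a single coefficient with unpenalized estimate $z$: minimizing $\frac{1}{2}(z-\beta)^2 + \gamma|\beta|$ gives the soft-thresholding rule, while minimizing $\frac{1}{2}(z-\beta)^2 + \frac{\gamma}{2}\beta^2$ only rescales $z$. The names `softThreshold` and `olsCoef` and the numbers are hypothetical; this is not glmnet's implementation.

```r
# soft-thresholding: shrink toward zero by gamma, and set to zero if |z| <= gamma
softThreshold <- function(z, gamma) {
  sign(z) * pmax(abs(z) - gamma, 0)
}

# made-up unpenalized estimates, for illustration only
olsCoef <- c(-3, -0.5, 0.2, 1, 4)
gamma <- 1

lassoCoef <- softThreshold(olsCoef, gamma)  # absolute-value penalty
ridgeCoef <- olsCoef / (1 + gamma)          # squared penalty: proportional shrinkage

print(lassoCoef)
print(ridgeCoef)
```

In this sketch the three small estimates become exactly zero under the absolute-value penalty, while the squared penalty leaves all five nonzero.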


In [None]:
# fit a lasso regression (alpha=1) with glmnet, using its default lambda sequence
lassoFit <- glmnet(x, y, alpha=1)

plot(lassoFit, xvar = "lambda", label = TRUE)

We can choose the value of lambda using cross-validation.

Note that high values of lambda correspond to simpler models (fewer nonzero coefficients), while lower values of lambda correspond to more complex models (more nonzero coefficients).

In [None]:
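# with only 19 observations, the default 10 folds hold about 2 observations each,
# so grouped=FALSE summarizes the error per observation rather than per fold
crossValidationOutput <- cv.glmnet(x, y, alpha=1, grouped=FALSE)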

plot(crossValidationOutput)
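
Once cross-validation has run, the selected penalty can be read off the fitted object. The sketch below is self-contained and uses simulated data, so `xSim`, `ySim` and the true coefficients are assumptions rather than the notebook's data. It uses the `lambda.min` and `lambda.1se` fields of a `cv.glmnet` fit and extracts the coefficients at the chosen value with `coef`.

```r
library(glmnet)

# Simulated stand-in data (an assumption): only the first two predictors matter
set.seed(1)
nObs <- 19
nPred <- 20
xSim <- matrix(rnorm(nObs * nPred), nObs, nPred)
ySim <- 2 * xSim[, 1] - xSim[, 2] + rnorm(nObs, sd = 0.5)

cvFit <- cv.glmnet(xSim, ySim, alpha = 1, grouped = FALSE)

# lambda with the smallest cross-validated error
lambdaMin <- cvFit$lambda.min
# largest lambda whose error is within one standard error of that minimum
lambda1se <- cvFit$lambda.1se
print(c(lambdaMin, lambda1se))

# coefficients at lambda.min; drop the intercept row and count the nonzero slopes
coefAtMin <- coef(cvFit, s = "lambda.min")
nNonzero <- sum(coefAtMin[-1, 1] != 0)
print(nNonzero)
```

Choosing `lambda.1se` instead of `lambda.min` trades a little cross-validated error for a simpler model. The folds are assigned at random, so the selected values vary with the seed.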