# 📈 Lecture 15 Lab: Regularization

<img src="img/ted-lasso.jpg" alt="ted-lasso" width="300" />

## ✅ Setup and data import
In this lab, we will learn the basics of regularization methods.

In [0]:
# Load the packages used in this lab.
library(tidyverse)
library(lubridate)

if (! require(ROCR)) {
  install.packages('ROCR')
}
library(ROCR)

# Takes a couple minutes to install glmnet in Google Colab
if (! require(glmnet)) {
  install.packages('glmnet')
}
library(glmnet)

# Load in helper functions for fitting lasso and ridge, and computing AUC.
source('https://jdgrossman.com/assets/hw6-helpers.R')

# Use three digits past the decimal point,
# and don't use scientific notation.
options(digits = 3, scipen = 999)

# Format plots with a white background and dark features.
theme_set(theme_bw())

# Increase the default text size of plots.
# If you are *not* working in Google Colab, we recommend commenting
# out this line of code.
theme_update(text = element_text(size = 20))

# Increase the default plot width and height.
# If you are *not* working in Google Colab, we recommend commenting
# out this line of code.
options(repr.plot.width=12, repr.plot.height=8)

# Read in the data
cars = read_csv('https://jdgrossman.com/assets/used-cars-regularization.csv')

# Peek at 10 random rows.
sample_n(cars, 10)

## 🚀 Exercise 1

Fit a linear regression model to the `cars` data. Your model should predict price as a function of mileage, year, and make.

Use the `coef` function to print the model coefficients.

In [0]:
# Your code here!



## 🚀 Exercise 2

The `ridge_lm` function can be used to fit a ridge regression model with similar syntax to `lm`.

`ridge_lm` has three inputs:
1. `formula`: The model formula
2. `data`: A dataframe
3. `lambda`: The desired value of lambda in the ridge objective function.
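
For intuition, a wrapper with these three inputs could be built on `glmnet` roughly as follows. This is only a sketch: `ridge_lm_sketch` is a hypothetical name, the actual `ridge_lm` helper loaded above may be implemented differently (for example, in how it scales `lambda`), and the built-in `mtcars` data stands in for a real dataset.

```r
library(glmnet)

# Hypothetical stand-in for a ridge wrapper; not the course's `ridge_lm`.
ridge_lm_sketch = function(formula, data, lambda) {
  # Build the design matrix from the formula and drop the intercept column,
  # since glmnet adds its own intercept.
  x = model.matrix(formula, data)[, -1, drop = FALSE]
  # Extract the response named on the left-hand side of the formula.
  y = model.response(model.frame(formula, data))
  # alpha = 0 selects the ridge (squared) penalty in glmnet.
  glmnet(x, y, alpha = 0, lambda = lambda)
}

fit = ridge_lm_sketch(mpg ~ wt + hp, data = mtcars, lambda = 1)
coef(fit)
```

As with `lm`, calling `coef` on the fitted object returns one coefficient per term in the formula, plus the intercept.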

Fit a ridge regression model with the same specification as the model in Exercise 1. For lambda, use `lambda=0`. 

How do the coefficients compare between the two models? Make sure to use `coef`, and not `summary`.

> The `ridge_lm`, `ridge_glm`, `lasso_lm`, and `lasso_glm` functions are wrappers for the `glmnet` function from the `glmnet` package for fitting regularized models. You are welcome to learn how to use `glmnet` directly, but it is not required for MS&E 125.
>
> By default, `glmnet` standardizes all covariates before fitting the model, and then reports the fitted coefficients on the covariates' original (raw) scale.
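
The effect of this default standardization can be checked directly with `glmnet`. The sketch below is illustrative only: it uses the built-in `mtcars` data rather than `cars`, and the rescaled column `wt_lb` is made up for the example. Because covariates are standardized internally, changing a covariate's units should leave the predictions essentially unchanged and simply rescale that covariate's coefficient.

```r
library(glmnet)

# Two versions of the same covariates: car weight in 1000s of pounds (wt)
# versus weight in pounds (wt_lb, a column made up for this example).
x_raw = as.matrix(mtcars[, c("wt", "hp")])
x_lb = cbind(wt_lb = mtcars$wt * 1000, hp = mtcars$hp)
y = mtcars$mpg

# alpha = 0 gives ridge; standardize = TRUE is the glmnet default.
fit_raw = glmnet(x_raw, y, alpha = 0, lambda = 1, standardize = TRUE)
fit_lb = glmnet(x_lb, y, alpha = 0, lambda = 1, standardize = TRUE)

# Coefficients are reported on each covariate's own scale.
coef(fit_raw)
coef(fit_lb)

# The predictions should agree up to numerical tolerance.
max_pred_diff = max(abs(
  predict(fit_raw, newx = x_raw) - predict(fit_lb, newx = x_lb)
))
max_pred_diff
```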

In [0]:
# Your code here!



## 🚀 Exercise 3

Repeat Exercise 2 using `lambda=10000`.

How do the coefficients change?

In [0]:
# Your code here!



## 🚀 Exercise 4

Fit a ridge regression model for each of the following values of `lambda`:

```
lambdas = 10^seq(-1, 8, by=1)
```

Use a `for` loop to iterate over `lambdas` and fit a model for each value of `lambda`. Store the coefficients of each model in a list named `coef_list`.

> You can use `c(my_list, new_element)` to append `new_element` to the end of `my_list`.
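
As a small illustration of this pattern, unrelated to the `cars` data, the sketch below grows a list inside a `for` loop. One caveat worth knowing: `c()` concatenates, so a new element that is itself a vector or list of several values is spliced in item by item unless it is wrapped in `list()`.

```r
# Start with an empty list and append one value per iteration.
squares = list()
for (x in 1:3) {
  squares = c(squares, x^2)
}
length(squares)  # 3

# Appending a length-2 vector directly adds two separate elements...
spliced = c(list(), c(1, 2))
length(spliced)  # 2

# ...while wrapping it in list() keeps it together as a single element.
wrapped = c(list(), list(c(1, 2)))
length(wrapped)  # 1
```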

In [0]:
# Your code here!



## 🚀 Exercise 5

Use the following code to make a dataframe out of your list of coefficients.

```
coef_plot_df = map2_dfr(
  coef_list, 
  lambdas, 
  function(coefs, lambda) {
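    # Convert the coefficients to a regular matrix so the row names are available.
    coef_matrix = as.matrix(coefs)
    coef_df = tibble(
      coef_name = row.names(coef_matrix),
      # Store the values as a plain numeric vector rather than a one-column matrix.
      coef_value = as.numeric(coef_matrix),
      log10_lambda = log10(lambda)
    )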
    coef_df
  }
)
```

Using `coef_plot_df`, plot your coefficients as a function of `log10_lambda`. What happens to the coefficients as `lambda` goes up?

> You may want to remove the `(Intercept)` coefficient from your plot.

In [0]:
# Your code here!



## 🚀 Exercise 6

Repeat the previous model fitting and plotting exercises using lasso instead of ridge. How do the lasso and ridge plots differ?

> The `lasso_lm` function will come in handy!

In [0]:
# Your code here!

