<div >
<img src = "../../banner.jpg" />
</div>

<a target="_blank" href="https://colab.research.google.com/github/ignaciomsarmiento/BDML_SS/blob/main/Lecture06/Notebook_SS06_Ridge.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


# Regularization: Ridge

## Predicting Wages

Our objective today is to construct a model of individual wages

$$
w = f(X) + u 
$$

where $w$ is the wage and $X$ is a matrix of potential explanatory variables (predictors). In this problem set, we will focus on a linear model of the form

\begin{align}
 \ln(w) & = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + u
\end{align}

where $\ln(w)$ is the logarithm of the wage.

To illustrate, I'm going to use a sample of the NLSY97. The NLSY97 is a nationally representative sample of 8,984 men and women born during the years 1980 through 1984 and living in the United States at the time of the initial survey in 1997. Participants were ages 12 to 16 as of December 31, 1996. Interviews were conducted annually from 1997 to 2011 and biennially since then.

Let's load the packages and the data set:

In [None]:
# install.packages("pacman") #run this line if you use Google Colab

In [None]:
#packages
require("pacman")
p_load("tidyverse","stargazer")

nlsy <- read_csv('https://raw.githubusercontent.com/ignaciomsarmiento/datasets/main/nlsy97.csv')

nlsy <- nlsy %>% drop_na(educ) # drop rows with missing values (NA) in educ

We want to construct a model that predicts well out of sample, and we have potentially 994 regressors. We are going to regularize this regression using Ridge.

## Ridge

We first illustrate ridge regression, which can be fit using `glmnet()` with `alpha = 0` and seeks to minimize

$$
\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}    \right) ^ 2 + \lambda \sum_{j=1}^{p} \beta_j^2 .
$$

Notice that the intercept is not penalized. 
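
To see what leaving the intercept unpenalized implies, it may help to write the solution out. The following is a sketch that assumes the columns of $X$ have been centered; the matrix notation $\tilde{X}$ is introduced here for illustration and is not part of the original notebook:

$$
\hat{\beta}_0 = \bar{y}, \qquad \hat{\beta}^{ridge} = \left( \tilde{X}'\tilde{X} + \lambda I_p \right)^{-1} \tilde{X}' y
$$

where $\tilde{X}$ is the $n \times p$ matrix of centered predictors and $\bar{y}$ is the sample mean of the outcome. Setting $\lambda = 0$ returns the OLS slopes whenever $\tilde{X}'\tilde{X}$ is invertible, while any $\lambda > 0$ makes the matrix in parentheses invertible even when $p > n$.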


Ridge penalizes the squared coefficients. As a result, ridge shrinks the coefficients toward zero, but generally not exactly to zero.
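
The shrinkage can be checked numerically. The sketch below uses simulated data (not the NLSY97 sample) and a hand-written closed-form solver, `ridge_closed_form`, which is a hypothetical helper and not part of `glmnet`. It uses the $\lambda$ from the objective above, which is not necessarily on the same scale as the `lambda` argument of `glmnet()`.

```r
# Simulated data (hypothetical; not the NLSY97 sample)
set.seed(1)
n <- 200
p <- 5
X_sim <- matrix(rnorm(n * p), n, p)
beta_true <- c(1, -0.5, 0.25, 0, 0)
y_sim <- drop(X_sim %*% beta_true + rnorm(n))

# Closed-form ridge slopes on centered data, so the intercept is left unpenalized
ridge_closed_form <- function(X, y, lambda) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  drop(solve(crossprod(Xc) + lambda * diag(ncol(Xc)), crossprod(Xc, yc)))
}

# One column of slopes per penalty value (lambda = 0 is OLS)
lambdas <- c(0, 10, 100, 1000)
coefs <- sapply(lambdas, function(l) ridge_closed_form(X_sim, y_sim, l))
colnames(coefs) <- paste0("lambda=", lambdas)
round(coefs, 3)

# Euclidean norm of the slope vector for each lambda
coef_norms <- sqrt(colSums(coefs^2))
coef_norms
```

The norm of the slope vector falls as the penalty grows, yet the individual slopes stay different from zero even at the largest penalty.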

We are going to use `glmnet`, a package that fits generalized linear and similar models via penalized maximum likelihood. The regularization path is computed for the lasso or elastic net penalty at a grid of values (on the log scale) for the regularization parameter lambda. The algorithm is extremely fast!
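
As a rough illustration of what a grid "on the log scale" looks like, the sketch below builds one by hand. The value of `lambda_max` is a made-up assumption, since `glmnet` derives its largest penalty from the data; the number of values and the ratio between the smallest and largest penalty follow the documented defaults of `nlambda` and `lambda.min.ratio`.

```r
# Hypothetical largest penalty; glmnet computes its own value from the data
lambda_max <- 50
lambda_min_ratio <- 1e-4  # glmnet default when there are more observations than predictors
n_lambda <- 100           # glmnet default for nlambda

# Equally spaced on the log scale, ordered from largest to smallest
lambda_grid <- exp(seq(log(lambda_max),
                       log(lambda_max * lambda_min_ratio),
                       length.out = n_lambda))

head(lambda_grid, 3)
range(diff(log(lambda_grid)))  # the step is constant on the log scale
```

In the notebook itself, the grid actually used by the fitted model is stored in `ridge0$lambda`.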

In [None]:
p_load("glmnet")

In [None]:
# Matrix of predictors (all but lnw_2016)
X0 <- as.matrix(nlsy %>% select(-lnw_2016))

# Outcome vector to predict
y <- nlsy$lnw_2016


ridge0 <- glmnet(
  x = X0,
  y = y,
  alpha = 0 #ridge
)


plot(ridge0, xvar = "lambda")

In [None]:
ridge0$lambda

## Scale Equivariance

We are going to illustrate the scale problem using just education (`educ`) and `afqt` scores.
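
A one-predictor derivation may clarify why scale matters. This is a simplified sketch, assuming a single centered predictor $x$ and a centered outcome $y$ (the notebook uses two predictors, so the algebra there is less tidy). The ridge slope is

$$
\hat{\beta}(\lambda) = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2 + \lambda} .
$$

If the predictor is rescaled to $\tilde{x}_i = c\, x_i$, the slope becomes

$$
\tilde{\beta}(\lambda) = \frac{c \sum_{i=1}^{n} x_i y_i}{c^2 \sum_{i=1}^{n} x_i^2 + \lambda}
\quad \Longrightarrow \quad
c\, \tilde{\beta}(\lambda) = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2 + \lambda / c^2} .
$$

With $\lambda = 0$ (OLS), $c\, \tilde{\beta} = \hat{\beta}$, so the fit does not depend on the units. With $\lambda > 0$, rescaling by $c$ acts like replacing the penalty $\lambda$ with $\lambda / c^2$, so the amount of shrinkage depends on the units in which the predictor is measured.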

In [None]:


# Outcome vector to predict
y <- nlsy$lnw_2016

# Matrix of predictors (only educ and afqt)
X <- as.matrix(nlsy %>% select(educ, afqt))



In [None]:
stargazer(data.frame(X),type="text")

Let's run the ridge regression (we need to set the parameter `alpha` to zero)

In [None]:
ridge <- glmnet(
  x = X,
  y = y,
  alpha = 0,            # ridge
  lambda = 1,
  standardize = FALSE   # keep the predictors on their original scale
)

Let's see the coefficients we obtained

In [None]:
coef(ridge)

Compare to OLS

In [None]:
ols<-lm(y~X)
summary(ols)

### What happens if we change the scale for education?
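
Before rerunning the models on the NLSY97 data, the effect can be previewed on simulated data. The sketch below is an illustration under stated assumptions: `x1` and `x2` are made-up stand-ins for the two predictors, `ridge_slopes` is a hypothetical closed-form helper (not a `glmnet` function), and the penalty value is arbitrary and not on the same scale as the `lambda` argument of `glmnet()`.

```r
set.seed(2)
n <- 500
# Hypothetical stand-ins for the two predictors (not the NLSY97 variables)
x1 <- rnorm(n, mean = 14, sd = 3)
x2 <- rnorm(n, mean = 50, sd = 25)
y_sim <- 1 + 0.08 * x1 + 0.005 * x2 + rnorm(n, sd = 0.5)

# Closed-form ridge slopes on centered data (intercept unpenalized, no standardization)
ridge_slopes <- function(X, y, lambda) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  drop(solve(crossprod(Xc) + lambda * diag(ncol(Xc)), crossprod(Xc, y - mean(y))))
}

X_orig <- cbind(x1, x2)
X_resc <- cbind(x1 * 1000, x2)  # first predictor measured in different units

lambda <- 5000  # hypothetical penalty, large enough to make the shrinkage visible

# OLS is the lambda = 0 case: multiplying the rescaled slope by 1000 recovers the original
ols_orig <- ridge_slopes(X_orig, y_sim, 0)
ols_resc <- ridge_slopes(X_resc, y_sim, 0)
c(original = ols_orig[1], rescaled_times_1000 = ols_resc[1] * 1000)

# Ridge: the same back-transformation does not recover the original slope
ridge_orig <- ridge_slopes(X_orig, y_sim, lambda)
ridge_resc <- ridge_slopes(X_resc, y_sim, lambda)
c(original = ridge_orig[1], rescaled_times_1000 = ridge_resc[1] * 1000)
```

The two OLS numbers agree up to rounding, while the two ridge numbers differ because the rescaled predictor is penalized far less.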

In [None]:
X[,1]<-X[,1]*1000

In [None]:
ridge_1000 <- glmnet(
  x = X,
  y = y,
  alpha = 0,            # ridge
  lambda = 1,
  standardize = FALSE   # keep the predictors on their (rescaled) original scale
)

In [None]:
coef(ridge_1000)

In [None]:
ols_1000<-lm(y~X)
summary(ols_1000)

In [None]:
ols_1000$coefficients[2]*1000

## Selecting the Penalty
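
The cells in this section use `caret` to choose lambda by cross-validation. To make the procedure concrete, here is a minimal hand-rolled sketch of K-fold cross-validation on simulated data. Everything in it is an assumption made for illustration: the data are simulated, `fit_ridge` is a hypothetical closed-form helper, and the grid of penalties is arbitrary. It is not how `caret` or `glmnet` are implemented internally.

```r
set.seed(3)
n <- 300
p <- 10
X_sim <- matrix(rnorm(n * p), n, p)  # simulated predictors (hypothetical)
y_sim <- drop(X_sim %*% c(1, 0.5, rep(0, p - 2)) + rnorm(n))

# Closed-form ridge fit; centering keeps the intercept unpenalized
fit_ridge <- function(X, y, lambda) {
  x_bar <- colMeans(X)
  Xc <- sweep(X, 2, x_bar)
  beta <- drop(solve(crossprod(Xc) + lambda * diag(ncol(X)),
                     crossprod(Xc, y - mean(y))))
  list(intercept = mean(y) - sum(x_bar * beta), beta = beta)
}

k <- 5
fold_id <- sample(rep(1:k, length.out = n))  # random fold assignment
lambda_grid <- exp(seq(log(1000), log(0.01), length.out = 30))  # hypothetical grid

# For each lambda: fit on k - 1 folds, predict the held-out fold, average the RMSEs
cv_rmse <- sapply(lambda_grid, function(l) {
  fold_rmse <- sapply(1:k, function(f) {
    train <- fold_id != f
    fit <- fit_ridge(X_sim[train, , drop = FALSE], y_sim[train], l)
    pred <- fit$intercept + drop(X_sim[!train, , drop = FALSE] %*% fit$beta)
    sqrt(mean((y_sim[!train] - pred)^2))
  })
  mean(fold_rmse)
})

# Penalty with the lowest cross-validated RMSE
best_lambda <- lambda_grid[which.min(cv_rmse)]
best_lambda
```

The selected penalty is the one whose held-out prediction error is smallest on average, which is the same criterion reported in `ridge$bestTune` below.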

In [None]:
p_load("caret")

In [None]:
set.seed(123)
fitControl <- trainControl(# 5-fold CV; 10 folds would be better
                           method = "cv",
                           number = 5)

In [None]:
ridge<-train(lnw_2016~.,
             data=nlsy,
             method = 'glmnet', 
             trControl = fitControl,
             tuneGrid = expand.grid(alpha = 0, #Ridge
                                    lambda = ridge0$lambda)
              ) 


In [None]:
plot(ridge$results$lambda,
     ridge$results$RMSE,
     xlab="lambda",
     ylab="Root Mean-Squared Error (RMSE)"
     )

In [None]:
ridge$bestTune

In [None]:
coef_ridge<-coef(ridge$finalModel, ridge$bestTune$lambda)
coef_ridge

### Compare to OLS fit

In [None]:
min(ridge$results$RMSE) # cross-validated RMSE at the selected lambda

In [None]:
linear_reg<-train(lnw_2016~.,
                 data=nlsy,
                  method = 'lm', 
                  trControl = fitControl
) 


linear_reg

### Compare to Lasso?
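
For reference, lasso corresponds to `alpha = 1` in `glmnet` and replaces the squared penalty with absolute values. Written in the same notation as the ridge objective above (a sketch in the notebook's notation; `glmnet` rescales the loss internally, so its `lambda` is not numerically identical to the $\lambda$ here), it seeks to minimize

$$
\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| .
$$

As with ridge, the intercept is not penalized. Unlike ridge, this penalty can set some coefficients exactly to zero.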

In [None]:
lasso<-train(lnw_2016~.,
             data=nlsy,
             method = 'glmnet', 
             trControl = fitControl,
             tuneGrid = expand.grid(alpha = 1, 
                                    lambda = ridge0$lambda)
              ) 


In [None]:
# Cross-validated RMSE of each model; ridge and lasso are evaluated at their best lambda
RMSE_df <- cbind(linear_reg$results$RMSE,
                 min(ridge$results$RMSE),
                 min(lasso$results$RMSE))
colnames(RMSE_df) <- c("OLS", "RIDGE", "LASSO")
RMSE_df