<div >
    <img src = "../banner/banner_ML_UNLP_1900_200.png" />
</div>

<a target="_blank" href="https://colab.research.google.com/github/ignaciomsarmiento/ML_UNLP_Lectures/blob/main/Week03/Notebook_Lasso.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>




# Regularization: Lasso

## Predicting Wages

Our objective today is to construct a model of individual wages

$$
w = f(X) + u 
$$

where $w$ is the wage, and $X$ is a matrix that includes potential explanatory variables/predictors. In this problem set, we will focus on a linear model of the form

\begin{align}
 \ln(w) & = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p  + u 
\end{align}

where $\ln(w)$ is the logarithm of the wage.


Let's load the modules:

In [None]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

and the data set, which is a sample of the NLSY97. The NLSY97 is a nationally representative sample of 8,984 men and women born during the years 1980 through 1984 and living in the United States at the time of the initial survey in 1997. Participants were ages 12 to 16 as of December 31, 1996. Interviews were conducted annually from 1997 to 2011 and biennially since then.

In [None]:
nlsy = pd.read_csv('https://raw.githubusercontent.com/ignaciomsarmiento/datasets/main/nlsy97.csv')

In [None]:
X1 = nlsy[[ "educ", "exp", "afqt", "mom_educ", "dad_educ"]]
X2=nlsy.drop(columns=['lnw_2016'])
y=nlsy['lnw_2016']

We want to construct a model that predicts well out of sample, and we have potentially 994 regressors. We are going to regularize this regression using Lasso.

## Lasso

We first illustrate lasso regression, which can be fit using `sklearn` and seeks to minimize

$$
\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}    \right) ^ 2 + \lambda \sum_{j=1}^{p} |\beta_j| .
$$

Notice that the intercept is not penalized. 


Lasso penalizes the absolute value of the coefficients. As a result, and unlike ridge, lasso can shrink coefficients all the way to exactly zero, so it also performs variable selection.
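
To see where the exact zeros come from, here is a minimal sketch of cyclic coordinate descent for the objective above, written with `numpy` only. It is an illustration under stated assumptions, not the solver that `sklearn` uses internally: the data are simulated, and the names `soft_threshold`, `lasso_coordinate_descent`, `X_demo` and `y_demo` are hypothetical. Each coordinate update applies a soft-thresholding step, which returns exactly zero whenever the correlation between a predictor and the partial residual is smaller than $\lambda/2$.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly zero when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize sum((y - b0 - X @ b)**2) + lam * sum(|b|) by cyclic coordinate descent."""
    # Centering X and y handles the intercept, which is not penalized.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    beta = np.zeros(p)
    col_ss = (Xc ** 2).sum(axis=0)  # sum of squares of each centered column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave predictor j out of the current fit.
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ r_j
            # The first-order condition gives a soft-thresholded update.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta

# Simulated data (assumption): only the first two predictors matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

b0_small, beta_small = lasso_coordinate_descent(X_demo, y_demo, lam=1.0)
b0_big, beta_big = lasso_coordinate_descent(X_demo, y_demo, lam=1e6)
print(beta_small)  # mild penalty: close to the least squares fit
print(beta_big)    # huge penalty: every slope is exactly zero
```

With a very large penalty only the intercept survives, which is the null model.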



In [None]:
from sklearn.linear_model import  Lasso, LassoCV

In [None]:
?Lasso

The Lasso() function has an alpha argument (λ, but with a different name!) that is used to tune the model.

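Let's first fit the lasso with the parameter `alpha` set to zero. Without a penalty the lasso reduces to ordinary least squares, so the coefficients should match those from `LinearRegression` in the next cells. Note that `sklearn` warns that `alpha = 0` does not converge well with this solver.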

In [None]:
lasso = Lasso(alpha = 0)

lasso.fit(X1, y)

In [None]:
lasso.coef_

In [None]:
from sklearn import linear_model
lm=linear_model.LinearRegression().fit(X1,y)
lm.coef_
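
The same comparison can be sketched on simulated data, which avoids the convergence warning by using a very small positive `alpha` instead of zero. The data and the names `X_demo`, `y_demo`, `ols`, `lasso_tiny` and `max_gap` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Simulated data (assumption): three predictors with known slopes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 1.0 + X_demo @ np.array([0.5, -0.25, 0.1]) + rng.normal(scale=0.3, size=500)

ols = LinearRegression().fit(X_demo, y_demo)
lasso_tiny = Lasso(alpha=1e-6).fit(X_demo, y_demo)  # almost no penalty

# Largest absolute difference between the two coefficient vectors.
max_gap = np.max(np.abs(ols.coef_ - lasso_tiny.coef_))
print(ols.coef_)
print(lasso_tiny.coef_)
print(max_gap)
```

As the penalty vanishes, the two sets of coefficients should agree up to the solver's tolerance.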

In [None]:
lasso = Lasso(alpha = .11)

lasso.fit(X1, y)

lasso.coef_

We'll generate an array of alpha values ranging from very large to very small, essentially covering the full range of scenarios from the null model containing only the intercept, to the least squares fit:

In [None]:
alphas = 10**np.linspace(10,-2,100)*0.5
alphas
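
The upper end of such a grid does not have to be guessed. The `sklearn` documentation states the `Lasso` objective as $\frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$, so `alpha` corresponds to $\lambda/(2n)$ in the formula above. Under that scaling, the smallest `alpha` that sets every slope to zero is $\max_j |x_j^\top y| / n$, computed on centered data. The sketch below builds a grid from that value on simulated data. The names `X_demo`, `y_demo`, `alpha_max` and `alpha_grid`, and the choice to stop at `alpha_max / 1000`, are assumptions for illustration.

```python
import numpy as np

# Simulated data (assumption): two relevant and two irrelevant predictors.
rng = np.random.default_rng(1)
n, p = 300, 4
X_demo = rng.normal(size=(n, p))
y_demo = 0.5 + X_demo @ np.array([1.5, 0.0, -0.7, 0.0]) + rng.normal(size=n)

# Center both sides because the intercept is not penalized.
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()

# Any alpha at or above this value yields the intercept-only (null) model.
alpha_max = np.max(np.abs(Xc.T @ yc)) / n

# Log-spaced grid from alpha_max down to alpha_max / 1000 (arbitrary lower end).
alpha_grid = np.logspace(np.log10(alpha_max), np.log10(alpha_max * 1e-3), num=100)
print(alpha_max, alpha_grid[-1])
```

A grid built this way spends no points in the region where all coefficients are already zero.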

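Associated with each alpha value is a vector of lasso coefficients, which we'll store in the list `coefs`. With the five predictors in `X1` and 100 values of alpha, `np.shape(coefs)` is (100, 5): one row for each value of alpha and one column for each predictor. Because the penalty depends on the scale of each predictor, we would normally standardize the variables first so that they are on the same scale. Older versions of `sklearn` offered a `normalize = True` parameter for a similar purpose; it has since been removed, and `StandardScaler` is the usual replacement. The loop in the next cell fits on the raw variables.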
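
As a hedged illustration of why scale matters, the sketch below fits the same lasso with and without standardization on simulated data in which two predictors contribute equally to the outcome but are measured on very different scales. The data and the names `X_demo`, `y_demo`, `raw_coef` and `scaled_coef` are assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data (assumption): same contribution to y, very different units.
rng = np.random.default_rng(2)
x_small = rng.normal(scale=1.0, size=500)
x_large = rng.normal(scale=1000.0, size=500)
X_demo = np.column_stack([x_small, x_large])
y_demo = 1.0 * x_small + 0.001 * x_large + rng.normal(scale=0.5, size=500)

# Raw fit: the penalty bites hard on x_small and barely touches x_large.
raw = Lasso(alpha=0.5).fit(X_demo, y_demo)
raw_coef = raw.coef_

# Standardized fit: both predictors face the same effective penalty.
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.5)).fit(X_demo, y_demo)
scaled_coef = scaled.named_steps["lasso"].coef_

print(raw_coef)
print(scaled_coef)
```

After standardization the two coefficients are shrunk by a similar amount, which is the behavior we want when the units of measurement are arbitrary.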

In [None]:
ridge = Lasso()
coefs = []

for a in alphas:
    ridge.set_params(alpha = a)
    ridge.fit(X1, y)
    coefs.append(ridge.coef_)
    
np.shape(coefs)


Let's see how much the coefficients are penalized for different values of $\lambda$. Notice that, unlike ridge, lasso forces coefficients to be exactly zero once the penalty is large enough.

In [None]:
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
plt.axis('tight')
plt.xlabel('alpha/lambda')
plt.ylabel('Coefs')
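
A complementary view of the path is to count how many coefficients are non-zero at each penalty level. The sketch below uses `lasso_path` from `sklearn` on simulated data. Since `lasso_path` does not fit an intercept, the data are centered first. The data and the names `X_demo`, `y_demo`, `alpha_grid`, `coefs_path` and `n_active` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Simulated data (assumption): three relevant predictors out of six.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(200, 6))
y_demo = (2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 0.5 * X_demo[:, 2]
          + rng.normal(size=200))

# lasso_path has no intercept, so center X and y beforehand.
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()

alpha_grid = np.logspace(1, -3, 30)  # from strong to weak penalty
alphas_out, coefs_path, _ = lasso_path(Xc, yc, alphas=alpha_grid)

# coefs_path has one row per predictor and one column per alpha.
n_active = (coefs_path != 0).sum(axis=0)
for a, k in zip(alphas_out[::6], n_active[::6]):
    print(f"alpha = {a:10.4f}   non-zero coefficients = {k}")
```

Predictors enter the model one by one as the penalty is relaxed.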

#### All the predictors

In [None]:
lasso = Lasso()
coefs = []

for a in alphas:
    lasso.set_params(alpha = a)
    lasso.fit(X2, y)
    coefs.append(lasso.coef_)
    
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
plt.axis('tight')
plt.xlabel('alpha/lambda')
plt.ylabel('Coefs')

## Penalty selection

Instead of arbitrarily choosing alpha, it would be better to use cross-validation to choose the tuning parameter. We can do this using the cross-validated lasso regression function, `LassoCV()`.

In [None]:
# LassoCV has no `scoring` argument; it picks alpha by cross-validated mean squared error
lassocv = LassoCV(alphas = alphas)
lassocv.fit(X2, y)
lassocv.alpha_
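
If you do want to state the scoring rule explicitly, one option is `GridSearchCV`, which accepts a `scoring` argument and can wrap a plain `Lasso`. The sketch below shows this on simulated data. The data, the grid and the names `X_demo`, `y_demo`, `alpha_grid`, `search`, `best_alpha` and `cv_mse` are assumptions for illustration, and this route is slower than `LassoCV` because it does not reuse the solution path across alphas.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Simulated data (assumption): two relevant predictors out of ten.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(200, 10))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(size=200)

alpha_grid = np.logspace(1, -3, 20)

# 5-fold cross-validation, scored by (negative) mean squared error.
search = GridSearchCV(
    Lasso(max_iter=10_000),
    param_grid={"alpha": alpha_grid},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["alpha"]
cv_mse = -search.best_score_  # flip the sign to get the MSE back
print(best_alpha, cv_mse)
```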

In [None]:
lasso_cv_star = Lasso(alpha = lassocv.alpha_)
lasso_cv_star.fit(X2, y)

In [None]:
lasso_cv_star.coef_
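
With many predictors, the raw coefficient vector is hard to read, so it helps to list only the variables that the lasso kept. The sketch below does this for a small made-up coefficient vector. `coef_demo` and `names_demo` are hypothetical stand-ins; in this notebook the analogous objects would be `lasso_cv_star.coef_` and `X2.columns`.

```python
import numpy as np

# Hypothetical fitted coefficients and predictor names (assumptions).
coef_demo = np.array([0.8, 0.0, -0.3, 0.0, 0.0, 0.05])
names_demo = ["x1", "x2", "x3", "x4", "x5", "x6"]

# Positions of the coefficients that were not set to zero.
selected_idx = np.flatnonzero(coef_demo)

# Pair each selected name with its coefficient, largest magnitude first.
selected = [(names_demo[i], coef_demo[i]) for i in selected_idx]
selected.sort(key=lambda pair: abs(pair[1]), reverse=True)

print(f"{len(selected)} of {len(coef_demo)} predictors selected")
for name, value in selected:
    print(name, value)
```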