# Model Selection via Regularization
In this exercise, we will introduce regularization terms into our regression models to prevent overfitting. We will compare the effect of L2 (Ridge) regularization on the coefficient values with that of L1 (LASSO) regularization. We will use the cars dataset, which is provided as <b>cars.csv</b> in the same folder as this notebook.

In [None]:
#Import the necessary libraries
%matplotlib inline

from matplotlib import pyplot as plt
import utils
import numpy as np
import statsmodels.api as sm
from scipy import stats

Read the data as a pandas dataframe. For reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

In [2]:
# Read the data
filename = 'cars.csv'
df = utils.read_dataset_from_csv(filename)

(0a). Print the number of observations in the dataset. 

(0b). Print all the columns in the dataset.
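As a minimal sketch of (0a) and (0b), assuming the loaded object is an ordinary pandas DataFrame: the row count is available from `shape` and the column labels from `columns`. The small `demo_df` below is made-up data standing in for the real dataset.

```python
import pandas as pd

# Made-up stand-in for the DataFrame loaded from cars.csv (not the real data)
demo_df = pd.DataFrame({
    "mpg": [30.0, 25.0, 20.0, 15.0],
    "wt": [2.0, 2.5, 3.1, 3.4],
    "hp": [65, 95, 110, 180],
})

n_obs = demo_df.shape[0]            # number of rows = number of observations
col_names = list(demo_df.columns)   # column labels, in file order

print("observations:", n_obs)
print("columns:", col_names)
```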

(0c). Produce a scatter plot of all variables against each other. <br>
Feel free to use the <b>scatter_plot_dataframe()</b> function in utils.py. <br>
Note: this function call may take a while.

(0d). Produce a plot of correlations between all variables. <br>
Feel free to use the <b>correlation_plot()</b> function in utils.py
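Independently of the helper in utils.py (whose implementation is not shown here), the numbers behind such a plot can be inspected with pandas' `DataFrame.corr()`. The sketch below uses made-up values in `demo_df` and lists each variable's Pearson correlation with 'mpg'.

```python
import pandas as pd

# Made-up values for illustration only (not taken from cars.csv)
demo_df = pd.DataFrame({
    "mpg":  [30.0, 25.0, 20.0, 15.0, 10.0],
    "wt":   [2.0, 2.5, 3.1, 3.4, 4.2],
    "qsec": [17.0, 19.0, 16.0, 18.0, 17.5],
})

corr = demo_df.corr()   # pairwise Pearson correlations (the pandas default)

# Correlation of every other variable with mpg, most negative first
mpg_corr = corr["mpg"].drop("mpg").sort_values()
print(mpg_corr)
```

Values close to +1 or -1 point to a roughly linear relationship; the scatter plots remain the better check for curvature.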

(0e). Using the plots above, which variables have a (roughly) linear relationship with 'mpg'? 

(Type your answer here)

<b>(1). Ridge Regression.</b> <br>
We will run Ridge regression by introducing an L2 penalty on the regression coefficients.
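For orientation, the statsmodels documentation for `fit_regularized()` states the elastic net objective that is minimized. Written with $n$ for the number of observations, $\beta$ for the coefficient vector and $w$ for the `L1_wt` argument (these symbols are notation chosen here, not the notebook's), it reads:

$$
\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\left(\frac{1-w}{2}\lVert\beta\rVert_2^2 + w\lVert\beta\rVert_1\right)
$$

Setting $w = 0$ leaves only the L2 term, which is the Ridge case:

$$
\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \frac{\alpha}{2}\lVert\beta\rVert_2^2
$$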

(1a). Build a simple OLS model using <b>statsmodels.OLS</b> <br>
Your dependent variable is 'mpg'. <br>
The independent variables are 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb'. <br>
Include the intercept using the add_constant() function in statsmodels. <br>
Hint: http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html <br>
Store your multivariate model in a variable called <b>sv_model</b>.

(1b). Use statsmodels' <b>fit_regularized()</b> function to write a function <b>get_sv_ridge()</b> which fits an OLS model with an L2 penalty of weight $\alpha$. Your function should take as arguments the statsmodels OLS model and $\alpha$, and return the fit.

(1c). Use <b>get_sv_ridge()</b> to plot the Ridge regression coefficients for each independent variable, once vs. $\alpha$ and once vs. $log(\alpha)$ (use log base 10), for $\alpha$ in the range [0.1,100.1] in small increments.<br>
Ensure that your legend clearly labels all lines (one per variable) and that the axes are appropriately labeled (for the log plot, the x-axis should have $log(\alpha)$ and the y-axis should have the parameter values).

(1d). What do you notice about the coefficients as the regularization penalty is increased?

(Write your answer here)

<b>(2). LASSO (Least Absolute Shrinkage and Selection Operator).</b> <br>
We will now run LASSO by introducing an L1 penalty on the regression coefficients.
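In the objective documented for statsmodels' `fit_regularized()`, the pure L1 case (`L1_wt=1`) reduces to the form below, where $n$ is the number of observations and $\beta$ the coefficient vector (notation chosen here for illustration):

$$
\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1,
\qquad \lVert\beta\rVert_1 = \sum_j \lvert\beta_j\rvert
$$

Unlike a squared penalty, $\lvert\beta_j\rvert$ is not differentiable at $\beta_j = 0$, which is what makes exact zeros possible at the minimum.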

(2a). Use statsmodels' <b>fit_regularized()</b> function to write a function <b>get_sv_lasso()</b> which fits an OLS model with an L1 penalty of weight $\alpha$. Your function should take as arguments the statsmodels OLS model and $\alpha$, and return the fit.

(2b). Use <b>get_sv_lasso()</b> to plot the LASSO coefficients vs. $log(\alpha)$ (use log base 10) for each independent variable, for $\alpha$ in the range [0.1,100.1] in small increments. <br>
Ensure that your legend clearly labels all lines (one per variable) and that the axes are appropriately labeled (the x-axis should have $log(\alpha)$ and the y-axis should have the parameter values).

(2c). What do you notice about the coefficients as the regularization penalty is increased? Contrast this behavior with what you observed for Ridge regression. Which would you prefer for model selection, i.e. for choosing a sparse subset of features (columns) in a regression model?

(Write your answer here)