# Regularization and the Bias-Variance Tradeoff

In this lecture, we discuss ridge and lasso regression, two regularization techniques that penalize large coefficients, and see how they can improve the cross-validated error of a linear regression model.

<b>Functions and attributes in this lecture:</b>
- `sklearn.linear_model` - Submodule for linear models
  - `Ridge` - Implements Ridge Regression
  - `Lasso` - Implements Lasso Regression

In [2]:
# Non-sklearn packages
import numpy as np
import pandas as pd

# Sklearn packages
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

## Importing the dataset and creating a base linear model

In [3]:
# Importing the cleaned tips dataset
cleaned_tips = pd.read_csv("cleaned_tips.csv")

In [4]:
# Checking out the dataset
cleaned_tips.head()

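| Unnamed: 0 | total_bill | tip | size | sex_Female | smoker_No | day_Fri | day_Sat | day_Sun | day_Thur | time_Dinner |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | 2 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 1 | 10.34 | 1.66 | 3 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 2 | 21.01 | 3.5 | 3 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 3 | 23.68 | 3.31 | 2 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 4 | 24.59 | 3.61 | 4 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |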


In [5]:
# Splitting into features and targets
X = cleaned_tips.drop(columns=["tip"])
y = cleaned_tips["tip"]

In [19]:
# A baseline linear model
linear_reg = LinearRegression()
linear_result = cross_validate(linear_reg, X, y, cv=5, scoring="neg_mean_squared_error")
print("Result: ", -np.mean(linear_result["test_score"]))

Result:  1.125279779569277


## Ridge Regression and Lasso Regression
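
For reference, the objectives that scikit-learn documents for these two estimators can be written as below. The notation is chosen here for illustration and is not taken from the lecture: $X$ is the feature matrix, $y$ the target vector, $w$ the coefficient vector, $n$ the number of samples, and $\alpha$ the regularization strength passed to `Ridge` and `Lasso`. The intercept is omitted for brevity.

$$
\begin{aligned}
\hat{w}_{\text{ridge}} &= \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\hat{w}_{\text{lasso}} &= \arg\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
$$

Because the two squared-error terms are scaled differently, a given $\alpha$ does not mean the same amount of regularization for `Ridge` as for `Lasso`, so the values used in the two cells that follow are not directly comparable.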

In [25]:
# Ridge Regression
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=30)
ridge_result = cross_validate(ridge_reg, X, y, cv=5, scoring="neg_mean_squared_error")
print("Result: ", -np.mean(ridge_result["test_score"]))

Result:  1.085993827471052


In [28]:
# Lasso Regression
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=1)
lasso_result = cross_validate(lasso_reg, X, y, cv=5, scoring="neg_mean_squared_error")
print("Result: ", -np.mean(lasso_result["test_score"]))

Result:  1.0691001605259944
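
The cross-validated errors above do not show how the two penalties act on the coefficients themselves. The following is a minimal sketch on synthetic data, not the tips dataset: the names `X_demo`, `y_demo`, `ridge_demo`, and `lasso_demo`, and the chosen `alpha` values, are assumptions made for illustration. Only the first two of six features influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption for illustration): six features,
# of which only the first two actually drive the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

ridge_demo = Ridge(alpha=30).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Count coefficients that are exactly zero under each penalty.
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))

print("Ridge coefficients:", np.round(ridge_demo.coef_, 3))
print("Lasso coefficients:", np.round(lasso_demo.coef_, 3))
print("Exact zeros (ridge, lasso):", n_zero_ridge, n_zero_lasso)
```

In a setup like this, ridge typically shrinks every coefficient without eliminating any, while lasso tends to set the coefficients of uninformative features to exactly zero, which is why lasso is often described as performing feature selection.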


## Finding the best $\alpha$ for the Lasso Regression

In [57]:
# Searching for a good alpha value
alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50]
scores = []

for alpha in alphas:
    lasso_model = Lasso(alpha=alpha)
    lasso_result = cross_validate(lasso_model, X, y, cv=5, scoring="neg_mean_squared_error")
    score = -np.mean(lasso_result["test_score"])
    print(f"Result for {alpha}: {score}")
    scores.append(score)


Result for 0.001: 1.1211365981262227
Result for 0.005: 1.1062899113874016
Result for 0.01: 1.0917016243989326
Result for 0.05: 1.052339323765285
Result for 0.1: 1.0605979804997434
Result for 0.5: 1.0620375541693847
Result for 1: 1.0691001605259944
Result for 5: 1.367233407573625
Result for 10: 1.9239219035570045
Result for 50: 1.9239219035570045
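
The identical scores at `alpha` values of 10 and 50 are what one would expect if the penalty is strong enough to zero out every coefficient, leaving a model that predicts a constant. That reading is an inference from the printed numbers, not something the notebook checks. A minimal sketch on synthetic data (the names `X_demo`, `y_demo`, and `fits` are hypothetical) shows the effect:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption for illustration, not the tips dataset).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=100)

# Fit lasso at two very large penalties.
fits = {alpha: Lasso(alpha=alpha).fit(X_demo, y_demo) for alpha in (10, 50)}

for alpha, model in fits.items():
    preds = model.predict(X_demo)
    # With all coefficients at zero, only the intercept remains,
    # so every prediction equals the mean of the training target.
    print(f"alpha={alpha}: coef={model.coef_}, "
          f"constant prediction={np.allclose(preds, y_demo.mean())}")
```

Once every coefficient is zero, increasing `alpha` further cannot change the fitted model, so the error stays flat.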


In [62]:
# Finding the best of the alpha values
best_alpha = alphas[np.argmin(scores)]
best_alpha

0.05

In [65]:
# Instantiating a Lasso model with the best alpha (not yet fitted)
best_model = Lasso(alpha=best_alpha)
best_model
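
The manual loop and the unfitted `best_model` above can also be expressed with scikit-learn's `GridSearchCV`, which runs the same kind of cross-validated search and then refits the best estimator on all the data. This is a sketch of one alternative, not the lecture's method, and it uses synthetic data: `X_demo`, `y_demo`, `search`, `best_alpha_demo`, and `best_model_demo` are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumption for illustration, not the tips dataset).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(150, 5))
y_demo = 1.5 * X_demo[:, 0] - 1.0 * X_demo[:, 2] + rng.normal(scale=0.5, size=150)

# Same candidate grid as the manual search.
alphas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50]

search = GridSearchCV(
    Lasso(),
    param_grid={"alpha": alphas},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)

best_alpha_demo = search.best_params_["alpha"]
# By default (refit=True) this estimator is already fitted on all of the data.
best_model_demo = search.best_estimator_

print("Best alpha:", best_alpha_demo)
print("Cross-validated MSE:", -search.best_score_)
print("Coefficients:", best_model_demo.coef_)
```

With the notebook's own variables, the equivalent final step after the manual search would be calling `best_model.fit(X, y)` before using the model for prediction.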