# Ridge and Lasso Regression - Lab

## Introduction

In this lab, you'll practice Ridge and Lasso regression!

## Objectives

You will be able to:

- Use Lasso and Ridge regression in Python
- Compare Lasso and Ridge with standard regression

## Housing Prices Data

Let's look at yet another house pricing data set.

In [1]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('Housing_Prices/train.csv')

Look at `df.info()` to see the column types and the number of non-null values per column.

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id               1460 non-null int64
MSSubClass       1460 non-null int64
MSZoning         1460 non-null object
LotFrontage      1201 non-null float64
LotArea          1460 non-null int64
Street           1460 non-null object
Alley            91 non-null object
LotShape         1460 non-null object
LandContour      1460 non-null object
Utilities        1460 non-null object
LotConfig        1460 non-null object
LandSlope        1460 non-null object
Neighborhood     1460 non-null object
Condition1       1460 non-null object
Condition2       1460 non-null object
BldgType         1460 non-null object
HouseStyle       1460 non-null object
OverallQual      1460 non-null int64
OverallCond      1460 non-null int64
YearBuilt        1460 non-null int64
YearRemodAdd     1460 non-null int64
RoofStyle        1460 non-null object
RoofMatl         1460 non-null object
Exterior1st      1460 non-n

We'll make a first selection of the data by removing the features with `dtype = object`, so that our first model only contains **numeric features**.

Make sure to remove the `SalePrice` column from the predictors (which you store in `X`), then replace missing values with the median of each feature.

Store the target in `y`.
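Imputation fills each missing entry with a summary statistic of its own column. Here is a minimal sketch on a toy DataFrame (the column names and values are made up for illustration) showing how the mean and the median can give different fill values when a column contains an outlier:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical column names; NaN marks a missing value.
toy = pd.DataFrame({
    "lot_frontage": [60.0, np.nan, 80.0, 100.0],
    "garage_area": [200.0, 400.0, np.nan, 1500.0],
})

# fillna accepts a Series of per-column fill values, aligned on column names.
mean_filled = toy.fillna(toy.mean())
median_filled = toy.fillna(toy.median())

print(mean_filled)
print(median_filled)
```

In `garage_area`, the large value 1500 pulls the mean well above the median, which is why the median is often the more robust choice for skewed features.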

In [3]:
# Load necessary packages
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split

# Remove "object"-type features, the Id column, and the SalePrice target from `X`
X = df.select_dtypes(exclude=['object'])
X = X.drop(['Id', 'SalePrice'], axis=1)

# Impute null values (this solution fills with the column mean) & look at information of X again
for column in X:
    X[column] = X[column].fillna(X[column].mean())
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 36 columns):
MSSubClass       1460 non-null int64
LotFrontage      1460 non-null float64
LotArea          1460 non-null int64
OverallQual      1460 non-null int64
OverallCond      1460 non-null int64
YearBuilt        1460 non-null int64
YearRemodAdd     1460 non-null int64
MasVnrArea       1460 non-null float64
BsmtFinSF1       1460 non-null int64
BsmtFinSF2       1460 non-null int64
BsmtUnfSF        1460 non-null int64
TotalBsmtSF      1460 non-null int64
1stFlrSF         1460 non-null int64
2ndFlrSF         1460 non-null int64
LowQualFinSF     1460 non-null int64
GrLivArea        1460 non-null int64
BsmtFullBath     1460 non-null int64
BsmtHalfBath     1460 non-null int64
FullBath         1460 non-null int64
HalfBath         1460 non-null int64
BedroomAbvGr     1460 non-null int64
KitchenAbvGr     1460 non-null int64
TotRmsAbvGrd     1460 non-null int64
Fireplaces       1460 non-null int64
G

In [4]:
# Create y
y = df['SalePrice']

## Let's use this data to perform a first naive linear regression model

Compute the R-squared and the MSE for both the train and test sets.

In [5]:
from sklearn.metrics import mean_squared_error
r_square, mse = [], []

# Split in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit the model and print R2 and MSE for train and test
linreg = LinearRegression()
linreg.fit(X_train, y_train)

r_square.append(linreg.score(X_test, y_test))
mse.append(mean_squared_error(y_test, linreg.predict(X_test)))

print(f'Train R2: {linreg.score(X_train, y_train)}')
print(f'Train MSE: {mean_squared_error(y_train, linreg.predict(X_train))}\n')
print(f'Test R2: {linreg.score(X_test, y_test)}')
print(f'Test MSE: {mean_squared_error(y_test, linreg.predict(X_test))}')

Train R2: 0.8014454598333407
Train MSE: 1211823228.0893695

Test R2: 0.8309418776148063
Test MSE: 1169153880.84934
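Because `train_test_split` shuffles the rows randomly, each call in this lab produces a different split, so metrics from different cells are not measured on the same test rows. A minimal sketch, assuming you want repeatable splits, is to pass a fixed `random_state` (the value 42 and the toy arrays below are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the predictors and the target.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# random_state=42 is an arbitrary choice; any fixed integer works.
split_a = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

# With the same seed, both calls put the same rows in the test set.
same_test_rows = np.array_equal(split_a[3], split_b[3])
print(same_test_rows)
```

The outputs shown in this lab were produced without a fixed seed, so rerunning the cells will give somewhat different numbers.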


## Normalize your data

We haven't normalized our data yet. Let's create a new model that uses `preprocessing.scale` to scale our predictors!

In [6]:
from sklearn import preprocessing

# Scale the data and perform train test split
X_scaled = preprocessing.scale(X)
X_scaled = pd.DataFrame(X_scaled, columns = X.columns)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)
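To see what `preprocessing.scale` does, here is a small sketch on a toy array (the values are made up): each column is shifted to mean 0 and rescaled to unit standard deviation, so features measured in very different units end up on a comparable scale.

```python
import numpy as np
from sklearn import preprocessing

# Two toy features on very different scales.
raw = np.array([[1.0, 1000.0],
                [2.0, 2000.0],
                [3.0, 3000.0]])

scaled = preprocessing.scale(raw)

# Each column now has mean 0 and (population) standard deviation 1.
print(scaled.mean(axis=0))
print(scaled.std(axis=0))
```

Scaling matters for Ridge and Lasso in particular, because their penalties act on the size of the coefficients, and coefficient size depends on the units of each feature.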

Perform the same linear regression on this data and print out R-squared and MSE.

In [7]:
# Your code here
linreg = LinearRegression()
linreg.fit(X_train, y_train)

r_square.append(linreg.score(X_test, y_test))
mse.append(mean_squared_error(y_test, linreg.predict(X_test)))

print(f'Train R2: {linreg.score(X_train, y_train)}')
print(f'Train MSE: {mean_squared_error(y_train, linreg.predict(X_train))}\n')
print(f'Test R2: {linreg.score(X_test, y_test)}')
print(f'Test MSE: {mean_squared_error(y_test, linreg.predict(X_test))}')

Train R2: 0.7896144454222709
Train MSE: 1321906021.06095

Test R2: 0.8739507118092888
Test MSE: 803688886.9981474


## Include dummy variables

We haven't included dummy variables so far. Let's use our "object" variables again and create dummies.

In [8]:
# Create X_cat which contains only the categorical variables
X_cat = df.select_dtypes(['object'])
X_cat.shape

(1460, 43)

In [9]:
# Make dummies
X_cat = pd.get_dummies(X_cat)
X_cat.head()

Unnamed: 0,MSZoning_C (all),MSZoning_FV,MSZoning_RH,MSZoning_RL,MSZoning_RM,Street_Grvl,Street_Pave,Alley_Grvl,Alley_Pave,LotShape_IR1,...,SaleType_ConLw,SaleType_New,SaleType_Oth,SaleType_WD,SaleCondition_Abnorml,SaleCondition_AdjLand,SaleCondition_Alloca,SaleCondition_Family,SaleCondition_Normal,SaleCondition_Partial
0,0,0,0,1,0,0,1,0,0,0,...,0,0,0,1,0,0,0,0,1,0
1,0,0,0,1,0,0,1,0,0,0,...,0,0,0,1,0,0,0,0,1,0
2,0,0,0,1,0,0,1,0,0,1,...,0,0,0,1,0,0,0,0,1,0
3,0,0,0,1,0,0,1,0,0,1,...,0,0,0,1,1,0,0,0,0,0
4,0,0,0,1,0,0,1,0,0,1,...,0,0,0,1,0,0,0,0,1,0
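As a small illustration of what `pd.get_dummies` produces, here is a sketch on a one-column toy frame (the three rows are made up; only the column name `Street` and its two levels are borrowed from the data above). Each category level becomes its own indicator column:

```python
import pandas as pd

# One categorical column with two levels.
toy = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"]})

dummies = pd.get_dummies(toy)

# Cast to int so the indicators print as 0/1 regardless of pandas version.
dummies_int = dummies.astype(int)
print(dummies_int.columns.tolist())
print(dummies_int)
```

Note that the indicator columns for one variable sum to 1 in every row without missing values, so a full set of dummies is perfectly collinear with the intercept. Passing `drop_first=True` to `pd.get_dummies` is one way to avoid this.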


Merge `X_cat` with our scaled `X` (`X_scaled`) so you have one big predictor DataFrame.

In [10]:
# Your code here
X_all = X_scaled.join(X_cat, how='outer')
X_all.head()

Unnamed: 0,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,...,SaleType_ConLw,SaleType_New,SaleType_Oth,SaleType_WD,SaleCondition_Abnorml,SaleCondition_AdjLand,SaleCondition_Alloca,SaleCondition_Family,SaleCondition_Normal,SaleCondition_Partial
0,0.073375,-0.229372,-0.207142,0.651479,-0.5172,1.050994,0.878668,0.511418,0.575425,-0.288653,...,0,0,0,1,0,0,0,0,1,0
1,-0.872563,0.451936,-0.091886,-0.071836,2.179628,0.156734,-0.429577,-0.57441,1.171992,-0.288653,...,0,0,0,1,0,0,0,0,1,0
2,0.073375,-0.09311,0.07348,0.651479,-0.5172,0.984752,0.830215,0.32306,0.092907,-0.288653,...,0,0,0,1,0,0,0,0,1,0
3,0.309859,-0.456474,-0.096897,0.651479,-0.5172,-1.863632,-0.720298,-0.57441,-0.499274,-0.288653,...,0,0,0,1,1,0,0,0,0,0
4,0.073375,0.633618,0.375148,1.374795,-0.5172,0.951632,0.733308,1.36457,0.463568,-0.288653,...,0,0,0,1,0,0,0,0,1,0


Perform the same linear regression on this data and print out R-squared and MSE.

In [11]:
# Your code here
X_train, X_test, y_train, y_test = train_test_split(X_all, y)

linreg = LinearRegression()
linreg.fit(X_train, y_train)

r_square.append(linreg.score(X_test, y_test))
mse.append(mean_squared_error(y_test, linreg.predict(X_test)))

print(f'Train R2: {linreg.score(X_train, y_train)}')
print(f'Train MSE: {mean_squared_error(y_train, linreg.predict(X_train))}\n')
print(f'Test R2: {linreg.score(X_test, y_test)}')
print(f'Test MSE: {mean_squared_error(y_test, linreg.predict(X_test))}')

Train R2: 0.9412873432381577
Train MSE: 370141607.4729763

Test R2: -2584134392727683.5
Test MSE: 1.6316563220169412e+25


Notice the severe overfitting above: our training R-squared is quite high, but the testing R-squared is hugely negative! Our predictions on the test set are far off. Similarly, the testing MSE is many orders of magnitude higher than the training MSE.
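A negative R-squared may look surprising, but it simply means the model predicts worse than always guessing the mean of the target. A tiny worked example with made-up numbers, computed by hand and checked against `r2_score` from scikit-learn:

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]
y_pred = [3.0, 2.0, 1.0]  # deliberately bad predictions

# R^2 = 1 - SS_res / SS_tot
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # 4 + 0 + 4 = 8
mean_true = sum(y_true) / len(y_true)                       # 2.0
ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # 1 + 0 + 1 = 2
r2_manual = 1 - ss_res / ss_tot                             # 1 - 4 = -3

print(r2_manual, r2_score(y_true, y_pred))
```

Since the residual sum of squares has no upper bound, R-squared on unseen data can be arbitrarily negative, as in the test result above.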

## Perform Ridge and Lasso regression

Use all the data (normalized features and dummy categorical variables) and perform both Lasso and Ridge regression. Each time, look at R-squared and MSE.
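Both methods add a penalty on the coefficients to the least-squares loss, and `alpha` controls how strong that penalty is. In scikit-learn, `Lasso` minimizes `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`, while `Ridge` minimizes `||y - Xw||^2 + alpha * ||w||^2`. The sketch below computes just the two penalty terms for a hypothetical coefficient vector `w`, to make the difference concrete:

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0])  # hypothetical coefficient vector
alpha = 10.0

# Lasso (L1) penalty: alpha times the sum of absolute coefficients.
l1_penalty = alpha * np.sum(np.abs(w))  # 10 * (3 + 4 + 0) = 70

# Ridge (L2) penalty: alpha times the sum of squared coefficients.
l2_penalty = alpha * np.sum(w ** 2)     # 10 * (9 + 16 + 0) = 250

print(l1_penalty, l2_penalty)
```

The squared penalty punishes large coefficients much more heavily than small ones, whereas the absolute-value penalty keeps pushing small coefficients all the way to zero.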

## Lasso

With the default parameter (alpha = 1)

In [12]:
lasso = Lasso(alpha=1)
lasso.fit(X_train, y_train)

r_square.append(lasso.score(X_test, y_test))
mse.append(mean_squared_error(y_test, lasso.predict(X_test)))

print(f'Train R2: {lasso.score(X_train, y_train)}')
print(f'Train MSE: {mean_squared_error(y_train, lasso.predict(X_train))}\n')
print(f'Test R2: {lasso.score(X_test, y_test)}')
print(f'Test MSE: {mean_squared_error(y_test, lasso.predict(X_test))}')

Train R2: 0.9387376365214576
Train MSE: 386215697.7075937

Test R2: 0.8602850379785778
Test MSE: 882178580.5883722


With a higher regularization parameter (alpha = 10)

In [13]:
lasso = Lasso(alpha=10)
lasso.fit(X_train, y_train)

r_square.append(lasso.score(X_test, y_test))
mse.append(mean_squared_error(y_test, lasso.predict(X_test)))

print(f'Train R2: {lasso.score(X_train, y_train)}')
print(f'Train MSE: {mean_squared_error(y_train, lasso.predict(X_train))}\n')
print(f'Test R2: {lasso.score(X_test, y_test)}')
print(f'Test MSE: {mean_squared_error(y_test, lasso.predict(X_test))}')

Train R2: 0.9365992388090562
Train MSE: 399696776.7515116

Test R2: 0.880917719382856
Test MSE: 751901126.1796783


## Ridge

With the default parameter (alpha = 1)

In [14]:
ridge = Ridge(alpha=1)
ridge.fit(X_train, y_train)

r_square.append(ridge.score(X_test, y_test))
mse.append(mean_squared_error(y_test, ridge.predict(X_test)))

print(f'Train R2: {ridge.score(X_train, y_train)}')
print(f'Train MSE: {mean_squared_error(y_train, ridge.predict(X_train))}\n')
print(f'Test R2: {ridge.score(X_test, y_test)}')
print(f'Test MSE: {mean_squared_error(y_test, ridge.predict(X_test))}')

Train R2: 0.9254243430159194
Train MSE: 470146559.13817555

Test R2: 0.8700790627793349
Test MSE: 820337824.4376137


With a higher regularization parameter (alpha = 10)

In [15]:
ridge = Ridge(alpha=10)
ridge.fit(X_train, y_train)

r_square.append(ridge.score(X_test, y_test))
mse.append(mean_squared_error(y_test, ridge.predict(X_test)))

print(f'Train R2: {ridge.score(X_train, y_train)}')
print(f'Train MSE: {mean_squared_error(y_train, ridge.predict(X_train))}\n')
print(f'Test R2: {ridge.score(X_test, y_test)}')
print(f'Test MSE: {mean_squared_error(y_test, ridge.predict(X_test))}')

Train R2: 0.900746787031944
Train MSE: 625721025.431947

Test R2: 0.874471546830359
Test MSE: 792603104.4811692


## Look at the metrics, what are your main conclusions?

In [21]:
r_square, mse

([0.8309418776148063,
  0.8739507118092888,
  -2584134392727683.5,
  0.8602850379785778,
  0.880917719382856,
  0.8700790627793349,
  0.874471546830359],
 [1169153880.84934,
  803688886.9981474,
  1.6316563220169412e+25,
  882178580.5883722,
  751901126.1796783,
  820337824.4376137,
  792603104.4811692])

In [27]:
rs_imp, mse_imp = [], []
for i in range(len(mse)):
    rs_imp.append(r_square[i] / r_square[0])
    mse_imp.append(mse[i] / mse[0])
    
rs_imp, mse_imp

([1.0,
  1.051759136653382,
  -3109885856451673.5,
  1.0353131321868145,
  1.0601436070492727,
  1.0470997866624205,
  1.0523859374382518],
 [1.0,
  0.6874106994489912,
  1.3955873121095174e+16,
  0.7545444573536441,
  0.6431156227557099,
  0.7016508586890834,
  0.6779288145589332])

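Conclusions: Lasso with an alpha of 10 produced the best results, with the highest test R-squared (about 0.881) and the lowest test MSE. Ridge with an alpha of 10 came second, narrowly ahead of the linear regression on scaled features and clearly ahead of Lasso with an alpha of 1. The unregularized linear regression on all features, including the dummies, overfit badly. Keep in mind that the first two linear models were each evaluated on their own random train/test split, so comparisons with them are only approximate.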

## Compare number of parameter estimates that are (very close to) 0 for Ridge and Lasso

In [35]:
# Count the Ridge and Lasso coefficients (both models fitted with alpha=10) that round to zero at 10 decimals
ridge_zeros, lasso_zeros = 0, 0
for i in range(len(ridge.coef_)):
    ridge_zeros += 1 if round(ridge.coef_[i], 10) == 0 else 0
    lasso_zeros += 1 if round(lasso.coef_[i], 10) == 0 else 0
print(f'Ridge Params Near Zero: {ridge_zeros}\nLasso Params Near Zero: {lasso_zeros}')

Ridge Params Near Zero: 4
Lasso Params Near Zero: 72
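The gap between the two counts is typical: the L1 penalty of Lasso can set coefficients to exactly zero, while the L2 penalty of Ridge only shrinks them. A minimal sketch on synthetic data illustrates this (the dataset from `make_regression`, the `alpha` value of 5, and all `_demo` names are assumptions for illustration, not part of the lab data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 5 of which actually drive the target.
X_demo, y_demo = make_regression(n_samples=200, n_features=20,
                                 n_informative=5, noise=10.0, random_state=0)

lasso_demo = Lasso(alpha=5.0).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=5.0).fit(X_demo, y_demo)

# Count the coefficients that are exactly zero in each model.
lasso_zero_count = int(np.sum(lasso_demo.coef_ == 0))
ridge_zero_count = int(np.sum(ridge_demo.coef_ == 0))
print(lasso_zero_count, ridge_zero_count)
```

This is why Lasso is often used as a feature selection tool, while Ridge keeps every predictor in the model with a reduced weight.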



Compare with the total length of the parameter space and draw conclusions!

In [32]:
X_all.shape

(1460, 288)

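With 288 predictors, the Lasso method set 72 of the 288 coefficients (25%) to (very nearly) zero, effectively dropping those independent variables, while the Ridge method shrank only 4 coefficients to near zero.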

## Summary

Great! You now know how to perform Lasso and Ridge regression.