# Ridge and Lasso Regression 

## Objectives

The goal is to fit Ridge and Lasso regression models and compare them with standard (unregularized) linear regression.
The data source:
https://github.com/learn-co-students/dsc-ridge-and-lasso-regression-lab-seattle-ds-062419

## Housing Prices Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_squared_log_error
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

In [None]:
df = pd.read_csv('Housing_Prices/train.csv')
df.head()

In [None]:
# Impute missing numeric values with each column's median (object columns are handled later).
df.fillna(value=df.median(numeric_only=True), inplace=True)

In [None]:
# Remove the `SalePrice` column from the predictors, which are stored in `X`.
X = df.drop(['SalePrice'], axis=1)
#Store the target in `y`.
y = df['SalePrice'] 

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42)

Select a subset of the data by dropping the columns with `dtype = object`, so that the first model contains only **numeric features**.

In [None]:
X_train_nonobj = X_train[[col for col,dtype in list(zip(X_train.columns, X_train.dtypes)) 
                          if dtype != np.dtype('O')]]
X_test_nonobj = X_test[[col for col,dtype in list(zip(X_test.columns, X_test.dtypes)) 
                        if dtype != np.dtype('O')]]

## Let's use this data to fit a first naive linear regression model

Compute the R squared and the MSE for both train and test set.

In [None]:
# Fit the model and print R2 and MSE for train and test
lr = LinearRegression()
lr.fit(X_train_nonobj, y_train)
print(lr.score(X_train_nonobj, y_train))
print(lr.score(X_test_nonobj, y_test))
print(mean_squared_error(y_train, lr.predict(X_train_nonobj)))
print(mean_squared_error(y_test, lr.predict(X_test_nonobj)))
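
The two metrics printed above are closely related: R squared is one minus the ratio of the model's MSE to the variance of the target over the same samples. The following is a minimal sketch on made-up numbers (the arrays `y_true`, `y_pred` and `bad_pred` are illustrative and not taken from the housing data) showing that identity, and why R squared turns negative when a model does worse than predicting the mean:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative values only, not from the housing data
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

mse = mean_squared_error(y_true, y_pred)
# R^2 = 1 - MSE / Var(y), with the variance taken over the same samples
r2_from_mse = 1 - mse / np.var(y_true)
print(r2_from_mse, r2_score(y_true, y_pred))

# Predictions that are worse than the constant mean give a negative R^2
bad_pred = y_true[::-1]
print(r2_score(y_true, bad_pred))
```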

## Normalize your data

Use `StandardScaler` from `sklearn.preprocessing` to scale the predictors.

In [None]:
ss = StandardScaler()
# Fit the scaler on the training data only, after the train-test split
ss.fit(X_train_nonobj)
X_train_nonobj_scaled = ss.transform(X_train_nonobj)
# The test data is transformed with the training statistics; it is never used for fitting
X_test_nonobj_scaled = ss.transform(X_test_nonobj)
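
To make the fit-on-train, transform-both pattern concrete, here is a small sketch on toy arrays (`toy_train` and `toy_test` are hypothetical stand-ins for the predictors). It shows that the scaled training columns are centered, while the test rows are shifted by the training mean and are generally not centered:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the train and test predictors
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[4.0, 40.0]])

scaler = StandardScaler().fit(toy_train)        # statistics come from train only
toy_train_scaled = scaler.transform(toy_train)
toy_test_scaled = scaler.transform(toy_test)    # reuses the train mean and std

print(scaler.mean_)                     # per-column training means
print(toy_train_scaled.mean(axis=0))    # approximately 0 for each column
print(toy_test_scaled)                  # not centered: test never influenced the fit
```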

Perform the same linear regression on this data and print out R-squared and MSE.

In [None]:
lr = LinearRegression()
lr.fit(X_train_nonobj_scaled, y_train)
preds = lr.predict(X_test_nonobj_scaled)

print(lr.coef_)
print(lr.score(X_train_nonobj_scaled, y_train))
print(lr.score(X_test_nonobj_scaled, y_test))
print(mean_squared_error(y_train, lr.predict(X_train_nonobj_scaled)))
print(mean_squared_error(y_test, lr.predict(X_test_nonobj_scaled)))

## One hot encoding categorical variables

Let's take the "object" variables and one-hot encode them.

In [None]:
# Create train and test dataframes with only the categorical variables

X_train_obj = X_train[[col for col,dtype in list(zip(X_train.columns, X_train.dtypes)) 
                          if dtype == np.dtype('O')]]
X_test_obj = X_test[[col for col,dtype in list(zip(X_test.columns, X_test.dtypes)) 
                        if dtype == np.dtype('O')]]

In [None]:
X_train_obj = X_train_obj.fillna('NaN') 
X_test_obj = X_test_obj.fillna('NaN') 

In [None]:
# One-hot encode the categorical columns; categories unseen during fit are ignored at transform time
from sklearn.preprocessing import OneHotEncoder

onehotencoder = OneHotEncoder(handle_unknown='ignore')
X_train_obj_ohe = onehotencoder.fit_transform(X_train_obj)
X_test_obj_ohe = onehotencoder.transform(X_test_obj)
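
The `handle_unknown='ignore'` setting matters when the test set contains a category that never appears in the training set. A minimal sketch with a hypothetical single-column example (the category names are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical column
toy_train = np.array([["brick"], ["wood"], ["brick"]])
toy_test = np.array([["wood"], ["stone"]])  # "stone" never appears in train

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(toy_train)
encoded = enc.transform(toy_test).toarray()

print(enc.categories_)  # categories learned from the training column
print(encoded)          # the "stone" row is all zeros because it is unknown
```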

In [None]:
plt.spy(X_test_obj_ohe.toarray())

Merge the one-hot encoded categorical columns with the scaled non-categorical columns to get a single predictor dataframe.

In [None]:
# Concatenate the scaled numeric columns and the one-hot columns side by side
X_train_scaled_all = pd.concat([pd.DataFrame(X_train_nonobj_scaled), pd.DataFrame(X_train_obj_ohe.toarray())], axis=1)
X_test_scaled_all = pd.concat([pd.DataFrame(X_test_nonobj_scaled), pd.DataFrame(X_test_obj_ohe.toarray())], axis=1)
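
As an alternative to building dense dataframes, the two blocks can be stacked while staying sparse. This is only a sketch on toy inputs (`toy_scaled` and `toy_ohe` are hypothetical stand-ins for the scaled and one-hot blocks), and it assumes both blocks keep their rows in the same order:

```python
import numpy as np
from scipy import sparse

# Toy stand-ins: a dense scaled block and a sparse one-hot block
toy_scaled = np.array([[0.5, -1.0], [-0.5, 1.0]])
toy_ohe = sparse.csr_matrix(np.array([[1, 0, 0], [0, 0, 1]]))

# hstack keeps the result sparse; rows must line up between the two blocks
toy_all = sparse.hstack([sparse.csr_matrix(toy_scaled), toy_ohe]).tocsr()

print(toy_all.shape)
print(toy_all.toarray())
```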

Perform the same linear regression on this data and print out R-squared and MSE.

In [None]:
# Fit on the combined predictors and print R2 and MSE for train and test
lr = LinearRegression()
lr.fit(X_train_scaled_all, y_train)
preds = lr.predict(X_test_scaled_all)

# print(lr.coef_)
print(lr.score(X_train_scaled_all, y_train))
print(lr.score(X_test_scaled_all, y_test))
print(mean_squared_error(y_train, lr.predict(X_train_scaled_all)))
print(mean_squared_error(y_test, lr.predict(X_test_scaled_all)))

Notice the severe overfitting above. The training R squared is very high, but the testing R squared is negative. The test predictions are not accurate, and the test MSE is orders of magnitude higher than the training MSE.

## Perform Ridge and Lasso regression

Use all the data (the normalized features plus the one-hot encoded categorical variables) and fit both a Lasso and a Ridge regression. Each time, look at R-squared and MSE.
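
For reference, these are the objectives that scikit-learn documents for the two estimators. The symbols are introduced here for illustration: $X$ is the predictor matrix, $y$ the target, $w$ the coefficient vector, $n$ the number of training samples, and $\alpha$ the regularization strength. The intercept is not penalized in either case.

$$
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

Because the Lasso loss is averaged over the samples and the Ridge loss is not, the same numerical value of $\alpha$ does not mean the same amount of shrinkage in the two models.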

## Lasso

With default parameter (alpha = 1)

In [None]:
from sklearn.linear_model import Lasso, Ridge

lasso = Lasso(alpha = 1) 
lasso.fit(X_train_scaled_all, y_train)
print(lasso.score(X_train_scaled_all, y_train))
print(lasso.score(X_test_scaled_all, y_test))
print(mean_squared_error(y_train, lasso.predict(X_train_scaled_all)))
print(mean_squared_error(y_test, lasso.predict(X_test_scaled_all)))

In [None]:
ls_alpha = []
MSE_trn = []
MSE_tst = []
for alpha in np.linspace(0.5, 75.5, 75):
    ls_alpha.append(alpha)
    # Use a separate name so that `lasso` keeps the alpha = 1 fit used in later cells
    lasso_alpha = Lasso(alpha=alpha)
    lasso_alpha.fit(X_train_scaled_all, y_train)
    MSE_trn.append(mean_squared_error(y_train, lasso_alpha.predict(X_train_scaled_all)))
    MSE_tst.append(mean_squared_error(y_test, lasso_alpha.predict(X_test_scaled_all)))

In [None]:
plt.scatter(ls_alpha, MSE_trn, label='train MSE')
plt.scatter(ls_alpha, MSE_tst, label='test MSE')
plt.xlabel('alpha')
plt.legend();

The good news is that the test loss drops while the train loss changes comparatively little, so regularization is improving how well the model generalizes.

In [None]:
alpha_MSE_tst = pd.Series(MSE_tst, index=ls_alpha)
alpha_MSE_tst.idxmin()  # index label (the alpha value) with the lowest test MSE

The test-set loss is lowest at an alpha of about 33.945.
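
One caveat: choosing alpha from the test MSE means the test set is no longer an independent check of the final model. A common alternative is to pick alpha by cross-validation on the training rows only. The sketch below uses `LassoCV` on synthetic data from `make_regression` as a stand-in for the housing predictors; the names `X_demo`, `y_demo` and `lasso_cv` are hypothetical, and the alpha it selects says nothing about the housing data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the housing predictors (an assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=20, n_informative=5,
                                 noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          random_state=42)

# Same alpha grid as the loop above; 5-fold CV only ever sees the training rows
alphas = np.linspace(0.5, 75.5, 75)
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10000)
lasso_cv.fit(X_tr, y_tr)

print(lasso_cv.alpha_)             # alpha chosen by cross-validation
print(lasso_cv.score(X_te, y_te))  # test R^2, used once as a final check
```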

## Ridge

With default parameter (alpha = 1)

In [None]:
from sklearn.linear_model import Lasso, Ridge

ridge = Ridge()  # Ridge penalizes the squared L2 norm of the coefficients (Lasso uses the L1 norm)
ridge.fit(X_train_scaled_all, y_train)
print(ridge.score(X_train_scaled_all, y_train))
print(ridge.score(X_test_scaled_all, y_test))
print(mean_squared_error(y_train, ridge.predict(X_train_scaled_all)))
print(mean_squared_error(y_test, ridge.predict(X_test_scaled_all)))

## Compare Ridge and Lasso parameter estimates that are very close to 0

Compare these counts with the total number of parameters and draw conclusions!

In [None]:
# number of Lasso params almost zero
sum(abs(lasso.coef_) < 10**(-10))

In [None]:
# number of Ridge params almost zero

sum(abs(ridge.coef_) < 10**(-10))

In [None]:
sum(abs(lasso.coef_) < 10**(-10))/len(lasso.coef_)
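
The difference between the two counts follows from the penalties: the L1 penalty can set coefficients exactly to zero, while the L2 penalty only shrinks them toward zero. A minimal sketch on synthetic data illustrates this (the data, the `alpha=5.0` setting and the `*_demo` names are assumptions made for illustration, not values from this lab):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only 5 of the 30 features carry signal
X_demo, y_demo = make_regression(n_samples=200, n_features=30, n_informative=5,
                                 noise=10.0, random_state=0)

lasso_demo = Lasso(alpha=5.0).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=5.0).fit(X_demo, y_demo)

# Count coefficients that are numerically zero, with the same threshold as above
tol = 1e-10
n_zero_lasso = int(np.sum(np.abs(lasso_demo.coef_) < tol))
n_zero_ridge = int(np.sum(np.abs(ridge_demo.coef_) < tol))
print(n_zero_lasso, n_zero_ridge, len(lasso_demo.coef_))
```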

## Conclusions

The model was initially overfit to the training data. Regularization shrank the coefficients that contributed little, which reduced the overfitting and made my test predictions much better. Lasso effectively performed variable selection, removing about 12% of the variables from the model with the default alpha. However, to get better regularization, I need to tune alpha.