# Linear Regression vs Ridge vs Lasso

This notebook uses the Cereals.csv dataset, available in the datasets folder.

### Linear Regression

Import Libraries and dataset

In [1]:
from sklearn.linear_model import LinearRegression
import pandas as pd

In [2]:
df = pd.read_csv("cereals.CSV")

Selecting the target and the predictors

In [5]:
df.columns

Index(['Name', 'Manuf', 'Type', 'Calories', 'Protein', 'Fat', 'Sodium',
       'Fiber', 'Carbo', 'Sugars', 'Potass', 'Vitamins', 'Shelf', 'Weight',
       'Cups', 'Rating', 'Cold', 'Nabisco', 'Quaker', 'Kelloggs',
       'GeneralMills', 'Ralston', 'AHFP'],
      dtype='object')

In [3]:
X = df.iloc[:,3:8] #Predictors - Calories, Protein, Fat, Sodium, Fiber
y = df['Rating'] #Target

Standardize the predictors

In [7]:
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
scaled_X = scaler.fit_transform(X)
scaled_X = pd.DataFrame(scaled_X, columns=X.columns)
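One caveat on the cell above: the scaler is fit on every row, so the rows that later end up in the test set contribute to the means and variances used for standardization. A common alternative is to split first and let a pipeline fit the scaler on the training rows only. The sketch below illustrates that pattern on synthetic data; `X_demo`, `y_demo`, the sample size and the coefficients are assumptions made for the example, not the cereal data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for five predictors and a target (assumption, not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(80, 5))
y_demo = X_demo @ np.array([-0.6, 0.5, -0.4, -0.3, 0.3]) + rng.normal(scale=2.0, size=80)

# Split first, so the test rows never influence the scaler's mean and variance.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.33, random_state=5)

# The pipeline fits the scaler on X_tr only, then reuses those statistics on X_te.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

demo_mse = mean_squared_error(y_te, pipe.predict(X_te))
print('Test MSE (scaler fit on training rows only):', demo_mse)
```

For a small dataset the difference is usually minor, but the pipeline keeps the test set untouched until evaluation.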

Split the data

In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(scaled_X, y, test_size = 0.33, random_state = 5)

Linear regression

In [12]:
lm = LinearRegression()
model = lm.fit(X_train,y_train)
model.coef_

array([-5.87173411,  5.02729917, -4.07821314, -2.94667722,  3.37461453])

Prediction from the test set

In [13]:
y_test_pred = model.predict(X_test)

MSE

In [14]:
from sklearn.metrics import mean_squared_error
lm_mse = mean_squared_error(y_test, y_test_pred)

Prediction from the training set

In [15]:
y_train_pred = model.predict(X_train)

In [16]:
mean_squared_error(y_train, y_train_pred)

30.491113847957383

### Ridge Regression

In [17]:
from sklearn.linear_model import Ridge

Regression with no penalty

In [19]:
ridge1 = Ridge(alpha=0) # alpha=0 disables the penalty, so this is equivalent to ordinary least squares
model1 = ridge1.fit(X_train,y_train)

Prediction

In [21]:
y_test_pred1 = model1.predict(X_test) #test set
ridge_mse = mean_squared_error(y_test, y_test_pred1)
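With `alpha=0` the ridge objective reduces to ordinary least squares, so this model should reproduce the `LinearRegression` fit from the previous section. A minimal sketch of that check, assuming synthetic data (`X_demo`, `y_demo` and the coefficient values are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic, well-conditioned data (assumption, not the cereal data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 5))
y_demo = X_demo @ np.array([-6.0, 5.0, -4.0, -3.0, 3.0]) + rng.normal(scale=1.0, size=80)

# Fit the same data with plain least squares and with an unpenalized ridge.
ols = LinearRegression().fit(X_demo, y_demo)
ridge_zero = Ridge(alpha=0).fit(X_demo, y_demo)

# The two coefficient vectors should agree up to floating-point error.
max_coef_gap = np.max(np.abs(ols.coef_ - ridge_zero.coef_))
print('Largest coefficient difference:', max_coef_gap)
```

In practice `LinearRegression` is the more direct choice when no penalty is wanted; `Ridge(alpha=0)` is mainly useful here as a baseline for the penalized runs.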

Regression with penalty

In [22]:
ridge2 = Ridge(alpha=1)
model2 = ridge2.fit(X_train,y_train)

In [24]:
model2.coef_

array([-5.82339932,  4.91201241, -3.98539246, -2.89120078,  3.37984124])

Prediction

In [25]:
y_test_pred2 = model2.predict(X_test)

In [26]:
ridge_penalty_mse = mean_squared_error(y_test, y_test_pred2)

In [28]:
print('MSE Ridge with no penalty ', ridge_mse)
print('MSE Ridge with penalty ', ridge_penalty_mse)

MSE Ridge with no penalty  39.723787963685815
MSE Ridge with penalty  40.12287359221361


The test MSE is slightly lower with no penalty (39.72) than with alpha=1 (40.12). Let's check other penalty values:

In [29]:
for i in range(1, 10):
    ridge3 = Ridge(alpha=i)
    model3 = ridge3.fit(X_train, y_train)
    y_test_pred3 = model3.predict(X_test)
    print('Alpha = ', i, ' / MSE =', mean_squared_error(y_test, y_test_pred3))

Alpha =  1  / MSE = 40.12287359221361
Alpha =  2  / MSE = 40.56640387907688
Alpha =  3  / MSE = 41.048428768138834
Alpha =  4  / MSE = 41.563895409467236
Alpha =  5  / MSE = 42.10847913329299
Alpha =  6  / MSE = 42.67845279264998
Alpha =  7  / MSE = 43.27058435490195
Alpha =  8  / MSE = 43.882055642962186
Alpha =  9  / MSE = 44.51039714760829


The test MSE keeps increasing as the penalty grows, so on this split the ridge penalty does not improve on the unpenalized fit.
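Comparing alphas by their test-set MSE, as in the loop above, uses the test set for tuning. One alternative is to choose alpha by cross-validation on the training data and keep the test set for a final check. The sketch below uses scikit-learn's `RidgeCV` on synthetic data; the data, the alpha grid and the fold count are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic, noisy data (assumption, not the cereal data).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(80, 5))
y_demo = X_demo @ np.array([-6.0, 5.0, -4.0, -3.0, 3.0]) + rng.normal(scale=5.0, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.33, random_state=5)

# Hypothetical grid of penalties, including values well below 1.
candidate_alphas = np.logspace(-3, 2, 20)

# 5-fold cross-validation on the training rows only picks the penalty.
ridge_cv = RidgeCV(alphas=candidate_alphas, cv=5).fit(X_tr, y_tr)
print('Alpha selected by cross-validation:', ridge_cv.alpha_)
```

Because the loop above only tried integer alphas from 1 to 9, a grid like this also covers small penalties between 0 and 1, which may behave differently.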

### Lasso

We follow the same process as for ridge, starting with alpha=1.

In [30]:
from sklearn.linear_model import Lasso
lasso1 = Lasso(alpha=1)
model4 = lasso1.fit(X_train,y_train)

In [31]:
y_test_pred4 = model4.predict(X_test)

In [33]:
lasso_mse = mean_squared_error(y_test, y_test_pred4)
print('Lasso MSE ', lasso_mse)

Lasso MSE  53.230755966755275


In [34]:
model4.coef_

array([-6.21048538,  3.97280159, -2.36889073, -1.6173548 ,  2.84854426])

In [35]:
for i in range(1, 10):
    lasso3 = Lasso(alpha=i)
    model5 = lasso3.fit(X_train, y_train)
    y_test_pred5 = model5.predict(X_test)
    print('Alpha = ', i, ' / MSE =', mean_squared_error(y_test, y_test_pred5))

Alpha =  1  / MSE = 53.230755966755275
Alpha =  2  / MSE = 81.19659961380344
Alpha =  3  / MSE = 102.08519715487013
Alpha =  4  / MSE = 118.42300565357677
Alpha =  5  / MSE = 137.7503644675328
Alpha =  6  / MSE = 152.03337767219023
Alpha =  7  / MSE = 165.00463287678195
Alpha =  8  / MSE = 178.96647591930483
Alpha =  9  / MSE = 188.29382147859874
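As with ridge, the Lasso test MSE increases with the penalty, and it does so much faster: from 53.23 at alpha=1 to 188.29 at alpha=9. Part of the reason is that the L1 penalty can drive coefficients to exactly zero, which removes predictors from the model altogether. At alpha=1 all five coefficients above are still nonzero. The sketch below counts nonzero coefficients as alpha grows, using synthetic data; the data and the alpha values are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data with five informative predictors (assumption, not the cereal data).
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(80, 5))
y_demo = X_demo @ np.array([-6.0, 5.0, -4.0, -3.0, 3.0]) + rng.normal(scale=1.0, size=80)

# Track how many coefficients survive at each penalty level.
nonzero_counts = {}
for alpha in [0.1, 1, 3, 5, 1000]:
    lasso = Lasso(alpha=alpha).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.count_nonzero(lasso.coef_))
    print('Alpha =', alpha, '/ nonzero coefficients =', nonzero_counts[alpha])
```

Running the same count on `model5.coef_` inside the loop above would show whether predictors are being dropped on the cereal data as alpha increases.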
