# Ridge And Lasso Regression

Ridge and Lasso regression are two regularization techniques used in linear regression to prevent overfitting and improve model generalization.

## Ridge Regression

### Purpose:

Ridge regression, also known as Tikhonov regularization, adds a penalty on the size of the coefficients in the linear regression model to reduce model complexity and mitigate the effects of multicollinearity.

### Penalty Term:

It adds a penalty proportional to the sum of the squared coefficients (an L2 penalty).

### Effect:

This penalty discourages large coefficients, which can help in reducing overfitting. Ridge regression tends to shrink all coefficients towards zero but does not necessarily set any of them exactly to zero.
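For reference, the ridge penalty can be written out explicitly. The notation here is an assumption, since the text above introduces no symbols: $X$ is the feature matrix, $y$ the target, $w$ the coefficient vector, and $\alpha \ge 0$ the regularization strength. This is the form of the objective that scikit-learn's `Ridge` documents, with the intercept left unpenalized:

$$
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

Setting $\alpha = 0$ recovers ordinary least squares, and larger values of $\alpha$ shrink the coefficients more strongly.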

## Lasso Regression

### Purpose:

Lasso regression, which stands for Least Absolute Shrinkage and Selection Operator, also aims to prevent overfitting by adding a penalty to the size of the coefficients but in a different way.

### Penalty Term:

It adds a penalty proportional to the sum of the absolute values of the coefficients (an L1 penalty).

### Effect:

The Lasso penalty can shrink some coefficients to exactly zero, effectively performing feature selection. This means Lasso can produce simpler models by excluding some variables entirely.
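Written out, the lasso objective looks as follows. The symbols are assumptions introduced for illustration: $X$ is the feature matrix, $y$ the target, $w$ the coefficient vector, $n$ the number of samples, and $\alpha \ge 0$ the regularization strength. This is the form scikit-learn's `Lasso` documents:

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

Note that scikit-learn's `Ridge` objective does not include the $\frac{1}{2n}$ factor on the squared-error term, so the same numeric value of `alpha` does not represent the same penalty strength in the two models.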

## Key Differences

### Penalty Type:

Ridge uses the square of the coefficients, while Lasso uses the absolute value. This leads to different effects on the coefficients.

### Feature Selection:

Lasso can perform feature selection by setting some coefficients to zero, whereas Ridge tends to shrink coefficients but keeps all features in the model.
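The difference can be seen on a small synthetic problem. The following is a minimal sketch, not part of the original analysis: the data, the variable names (`X_demo`, `y_demo`, and so on), and the `alpha` values are all assumptions chosen for illustration. Only the first two of five features influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical synthetic data: 5 features, only the first two matter
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Fit both penalized models (alpha values are arbitrary illustrative choices)
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Count coefficients that are exactly zero in each model
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))

print('Ridge coefficients:', ridge_demo.coef_)
print('Lasso coefficients:', lasso_demo.coef_)
print('Exactly-zero coefficients (ridge, lasso):', n_zero_ridge, n_zero_lasso)
```

Ridge shrinks the irrelevant coefficients toward zero without eliminating them, while lasso is expected to set at least some of them to exactly zero.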

## Imports

In [115]:
import pandas as pd
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures, StandardScaler

## Data Loading 

In [116]:

df = pd.read_csv('data/auto-mpg.csv') 

df.head()

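|   | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|-----|-----------|--------------|------------|--------|--------------|------------|--------|----------|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |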


In [117]:


y = df[['mpg']]
X = df.drop(['mpg', 'car name', 'origin'], axis=1)

In [118]:

# Perform train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12)

In [119]:
scaler = MinMaxScaler()
X_train_transformed = scaler.fit_transform(X_train)
X_test_transformed = scaler.transform(X_test)

In [120]:
# Build a ridge, lasso and regular linear regression model  
# Note that in scikit-learn, the regularization parameter is denoted by alpha (and not lambda)
ridge = Ridge(alpha=0.5)
ridge.fit(X_train_transformed, y_train)

lasso = Lasso(alpha=0.5)
lasso.fit(X_train_transformed, y_train)

lin = LinearRegression()
lin.fit(X_train_transformed, y_train)

In [121]:
# Generate predictions for training and test sets
y_h_ridge_train = ridge.predict(X_train_transformed)
y_h_ridge_test = ridge.predict(X_test_transformed)

y_h_lasso_train = lasso.predict(X_train_transformed)
y_h_lasso_test = lasso.predict(X_test_transformed)

y_h_lin_train = lin.predict(X_train_transformed)
y_h_lin_test = lin.predict(X_test_transformed)

In [122]:
print('Train Error Ridge Model', mean_absolute_error(y_train, y_h_ridge_train))
print('Test Error Ridge Model', mean_absolute_error(y_test, y_h_ridge_test))
print('\n')

print('Train Error Lasso Model', mean_absolute_error(y_train, y_h_lasso_train))
print('Test Error Lasso Model', mean_absolute_error(y_test, y_h_lasso_test))
print('\n')

print('Train Error Unpenalized Linear Model', mean_absolute_error(y_train, y_h_lin_train))
print('Test Error Unpenalized Linear Model', mean_absolute_error(y_test, y_h_lin_test))

Train Error Ridge Model 2.4166962586744964
Test Error Ridge Model 3.0494966518822952


Train Error Lasso Model 3.098655456639357
Test Error Lasso Model 4.023848271759962


Train Error Unpenalized Linear Model 2.4199602792106476
Test Error Unpenalized Linear Model 2.9813518258159175


Ridge is clearly better than lasso here, but the unpenalized model has the lowest test error (about 2.98, versus 3.05 for ridge and 4.02 for lasso). This makes sense: a linear regression model with only these six features is probably not overfitting, so adding regularization mostly contributes to underfitting.
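One way to probe the underfitting explanation is to vary `alpha` and watch how the errors respond. The sketch below does this for lasso on synthetic data generated with `make_regression`, since it is meant to be self-contained; the dataset, the list of `alpha` values, and all variable names are assumptions for illustration, not results from the auto-mpg data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical synthetic data standing in for a small tabular dataset
X_syn, y_syn = make_regression(n_samples=300, n_features=6, n_informative=6,
                               noise=5.0, random_state=12)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=12)

# Fit the scaler on the training split only, then apply it to both splits
scaler_syn = MinMaxScaler()
X_tr_s = scaler_syn.fit_transform(X_tr)
X_te_s = scaler_syn.transform(X_te)

# Sweep the regularization strength and record mean absolute errors
alphas = [0.001, 0.01, 0.1, 0.5, 1.0]
train_errors, test_errors = {}, {}
for a in alphas:
    model = Lasso(alpha=a, max_iter=10000).fit(X_tr_s, y_tr)
    train_errors[a] = mean_absolute_error(y_tr, model.predict(X_tr_s))
    test_errors[a] = mean_absolute_error(y_te, model.predict(X_te_s))
    print(f'alpha={a}: train MAE={train_errors[a]:.3f}, test MAE={test_errors[a]:.3f}')
```

If both training and test error rise together as `alpha` grows, the penalty is causing underfitting rather than curing overfitting.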

In [123]:
print('Ridge parameter coefficients:', ridge.coef_)
print('Lasso parameter coefficients:', lasso.coef_)
print('Linear model parameter coefficients:', lin.coef_)

Ridge parameter coefficients: [[ -2.06904445  -2.88593443  -1.81801505 -15.23785349  -1.45594148
    8.1440177 ]]
Lasso parameter coefficients: [-9.09743525 -0.         -0.         -4.02703963  0.          3.92348219]
Linear model parameter coefficients: [[ -1.33790698  -1.05300843  -0.08661412 -19.26724989  -0.37043697
    8.56051229]]


## Regularized Polynomial Regression vs. Polynomial Regression
Now let's compare this to a model built using PolynomialFeatures, which has more complexity than an ordinary multiple regression.
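To get a sense of how much complexity this adds, the number of generated columns can be checked directly. This is a small sketch with hypothetical names (`X_dummy`, `n_terms`, `expected_terms`); it assumes six input features and `degree=6`, as used in this notebook, and compares the output width of `PolynomialFeatures` with the count of monomials of total degree at most `degree`, including the constant term.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features, degree = 6, 6

# A single dummy row is enough to inspect the output width
X_dummy = np.zeros((1, n_features))
n_terms = PolynomialFeatures(degree=degree).fit_transform(X_dummy).shape[1]

# Number of monomials of total degree <= degree in n_features variables
expected_terms = comb(n_features + degree, degree)

print(n_terms, expected_terms)
```

Both values equal 924, which matches the total number of coefficients reported for the polynomial models at the end of this section.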

In [124]:
# Prepare data
poly = PolynomialFeatures(degree=6)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

X_train_transformed = scaler.fit_transform(X_train_poly)
X_test_transformed = scaler.transform(X_test_poly)


In [125]:
# Fit models
ridge.fit(X_train_transformed, y_train)
lasso.fit(X_train_transformed, y_train)
lin.fit(X_train_transformed, y_train)


In [126]:
# Generate predictions
y_h_ridge_train = ridge.predict(X_train_transformed)
y_h_ridge_test = ridge.predict(X_test_transformed)
y_h_lasso_train = lasso.predict(X_train_transformed)
y_h_lasso_test = lasso.predict(X_test_transformed)
y_h_lin_train = lin.predict(X_train_transformed)
y_h_lin_test = lin.predict(X_test_transformed)

In [127]:
# Display results
print('Train Error Polynomial Ridge Model', mean_absolute_error(y_train, y_h_ridge_train))
print('Test Error Polynomial Ridge Model', mean_absolute_error(y_test, y_h_ridge_test))
print('\n')
print('Train Error Polynomial Lasso Model', mean_absolute_error(y_train, y_h_lasso_train))
print('Test Error Polynomial Lasso Model', mean_absolute_error(y_test, y_h_lasso_test))
print('\n')
print('Train Error Unpenalized Polynomial Model', mean_absolute_error(y_train, y_h_lin_train))
print('Test Error Unpenalized Polynomial Model', mean_absolute_error(y_test, y_h_lin_test))
print('\n')

Train Error Polynomial Ridge Model 1.7373569289336006
Test Error Polynomial Ridge Model 2.138334086871118


Train Error Polynomial Lasso Model 3.136873255231132
Test Error Polynomial Lasso Model 4.045443327394729


Train Error Unpenalized Polynomial Model 1.3446394622244945e-09
Test Error Unpenalized Polynomial Model 140.97752346886767




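In this case, the unpenalized model was severely overfitting: its training error is essentially zero (about 1.3e-09) while its test error is about 141. Applying ridge or lasso regression reduced this overfitting and brought the test error down sharply. Note that the best model we have seen so far is the polynomial + ridge model (test error about 2.14), which seems to have the best balance of bias and variance. The polynomial lasso model (test error about 4.05) does no better than the earlier non-polynomial lasso model, which suggests that `alpha=0.5` may be too strong a penalty for lasso on this data.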
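The same pattern can be reproduced in a compact form. The following is a minimal sketch on synthetic data, not the notebook's own workflow: it chains the expansion, scaling, and model steps with scikit-learn's `make_pipeline` as a convenience, and the data, the degree, and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Hypothetical synthetic data: 3 features, mildly nonlinear target
rng = np.random.default_rng(12)
X_syn = rng.uniform(-1, 1, size=(60, 3))
y_syn = X_syn[:, 0] ** 2 + X_syn[:, 1] - 0.5 * X_syn[:, 2] + rng.normal(scale=0.1, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=12)

# Degree-5 expansion of 3 features yields 56 columns, more than the 42 training rows
models = {
    'unpenalized': make_pipeline(PolynomialFeatures(degree=5), MinMaxScaler(), LinearRegression()),
    'ridge': make_pipeline(PolynomialFeatures(degree=5), MinMaxScaler(), Ridge(alpha=0.5)),
}

train_mae, test_mae = {}, {}
for name, model in models.items():
    model.fit(X_tr, y_tr)  # each step is fitted on the training split only
    train_mae[name] = mean_absolute_error(y_tr, model.predict(X_tr))
    test_mae[name] = mean_absolute_error(y_te, model.predict(X_te))
    print(f'{name}: train MAE={train_mae[name]:.4f}, test MAE={test_mae[name]:.4f}')
```

With more expanded columns than training rows, the unpenalized model can fit the training data almost exactly, so a near-zero training error paired with a much larger test error is the signature of overfitting to look for.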

In [128]:
print('Polynomial Ridge Parameter Coefficients:', len(ridge.coef_[ridge.coef_ != 0]), 
      'non-zero coefficient(s) and', len(ridge.coef_[ridge.coef_ == 0]), 'zeroed-out coefficient(s)')
print('Polynomial Lasso Parameter Coefficients:',  len(lasso.coef_[lasso.coef_ != 0]), 
      'non-zero coefficient(s) and', len(lasso.coef_[lasso.coef_ == 0]), 'zeroed-out coefficient(s)')
print('Polynomial Model Parameter Coefficients:',  len(lin.coef_[lin.coef_ != 0]), 
      'non-zero coefficient(s) and', len(lin.coef_[lin.coef_ == 0]), 'zeroed-out coefficient(s)')

Polynomial Ridge Parameter Coefficients: 923 non-zero coefficient(s) and 1 zeroed-out coefficient(s)
Polynomial Lasso Parameter Coefficients: 3 non-zero coefficient(s) and 921 zeroed-out coefficient(s)
Polynomial Model Parameter Coefficients: 924 non-zero coefficient(s) and 0 zeroed-out coefficient(s)
