# Ridge Regression
Ridge regression is linear regression with an added regularization term that helps prevent overfitting. The term is the L2 penalty: the sum of the squared coefficients multiplied by a regularization parameter $\lambda$ (lambda). The penalty shrinks the coefficients towards zero without setting them exactly to zero, which reduces model complexity and stabilizes the estimates when the predictors are multicollinear.

The objective function for Ridge Regression is:

$$
\text{Minimize} \left( \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right)
$$

where:
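- $ y_i $ is the actual value of the $i$-th observation
- $ \hat{y}_i $ is the predicted value for the $i$-th observation
- $ \beta_j $ are the coefficients of the $p$ predictors
- $ \lambda \geq 0 $ is the regularization parameter; $ \lambda = 0 $ recovers ordinary least squares
- $ n $ is the number of observations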
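
As a minimal sketch, the objective can be evaluated directly with NumPy. The function name `ridge_objective` and the toy data are hypothetical and chosen only for illustration; the sketch also assumes the intercept is left out of the penalty, which is the usual convention.

```python
import numpy as np

def ridge_objective(X, y, beta, intercept, lam):
    """Residual sum of squares plus the L2 penalty on the coefficients."""
    y_hat = X @ beta + intercept        # predicted values
    rss = np.sum((y - y_hat) ** 2)      # sum of squared residuals
    penalty = lam * np.sum(beta ** 2)   # the intercept is not penalized
    return rss + penalty

# Toy data (hypothetical)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 6.0])
beta = np.array([1.0, 1.0])

value = ridge_objective(X, y, beta, intercept=0.0, lam=0.5)
print(value)  # residuals are [2, -1], so RSS = 5; penalty = 0.5 * 2 = 1; total 6.0
```

Setting `lam=0` leaves only the residual sum of squares, which is the ordinary least squares objective.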

Ridge regression is particularly useful when there are many predictors and multicollinearity is present. It can improve the model's generalization by accepting a small increase in bias in exchange for a reduction in variance.
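
The shrinkage effect can be illustrated on two nearly identical predictors. This is a sketch on synthetic data: the random seed, the sample size, and the `alpha` values are assumptions made for the example, not recommendations.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with two almost perfectly correlated predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=50)

norms = []
for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))  # overall size of the coefficient vector
    print(f"alpha={alpha}: coef={coef}, norm={norms[-1]:.3f}")
```

The norm of the coefficient vector falls as `alpha` grows, which is the shrinkage that trades bias for variance.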

In [13]:
# import libraries 
from sklearn.linear_model import Ridge
import numpy as np

# Example data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# Target variable
y = np.dot(X, np.array([1, 2])) + 3

# Create a Ridge regression model
ridge_model = Ridge(alpha=1.0)  # alpha is the equivalent of lambda in the equation
# Fit the model to the data
ridge_model.fit(X, y)

# Coefficients
print("Coefficients:", ridge_model.coef_)
# Intercept
print("Intercept:", ridge_model.intercept_)

Coefficients: [0.8 1.4]
Intercept: 4.5
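
The printed values can be cross-checked against the closed-form ridge solution. The sketch below assumes the intercept is unpenalized, so the data are centered first and the intercept is recovered afterwards; the variable names are chosen for illustration.

```python
import numpy as np

# Same data as in the example above
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3
lam = 1.0

# Center the data so the intercept stays outside the penalty
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Closed form: beta = (Xc^T Xc + lam * I)^(-1) Xc^T yc
beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
intercept = y_mean - X_mean @ beta

print("Coefficients:", beta)    # approximately [0.8 1.4]
print("Intercept:", intercept)  # approximately 4.5
```

The data were generated with coefficients `[1, 2]` and intercept `3`, so the fitted coefficients `[0.8, 1.4]` show the shrinkage towards zero introduced by the penalty.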


# Comparing Simple Linear Regression and Ridge Regression

## Import Libraries & Load the dataset

In [14]:
# import libraries
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, mean_absolute_percentage_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Load the dataset
df = sns.load_dataset('titanic')

## Preprocessing of data


In [15]:
# Selecting a subset of columns for simplicity
columns_to_use = ['survived', 'pclass', 'sex', 'age', 'fare']
# .copy() avoids pandas' SettingWithCopyWarning when the columns are modified below
df = df[columns_to_use].copy()
# Handling missing values by filling them with the column median
df['age'] = df['age'].fillna(df['age'].median())
df['fare'] = df['fare'].fillna(df['fare'].median())

# Defining features and target variable
X = df.drop('survived', axis=1)
y = df['survived']
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Creating a pipeline

In [16]:
# Define a pipeline for preprocessing and modeling
categorical_features = ['sex']
numeric_features = ['pclass', 'age', 'fare']

# Preprocessing: pass numeric features through unchanged and one-hot encode 'sex'
preprocessor = ColumnTransformer(
    transformers=[
        ('num', 'passthrough', numeric_features),
        ('cat', OneHotEncoder(), categorical_features)
    ])
# Create a linear regression pipeline
lr_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', LinearRegression())
])
# Create a Ridge regression pipeline
ridge_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', Ridge(alpha=1.0))
])

## Train and evaluate the models

In [18]:
# Train the linear regression model
lr_pipeline.fit(X_train, y_train)
# Train the Ridge regression model
ridge_pipeline.fit(X_train, y_train)
# Make predictions
y_pred_lr = lr_pipeline.predict(X_test)
y_pred_ridge = ridge_pipeline.predict(X_test)
# Calculate mean squared error
mse_lr = mean_squared_error(y_test, y_pred_lr)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
# Calculate R-squared
r2_lr = r2_score(y_test, y_pred_lr)
r2_ridge = r2_score(y_test, y_pred_ridge)
# Calculate mean absolute error
mae_lr = mean_absolute_error(y_test, y_pred_lr)
mae_ridge = mean_absolute_error(y_test, y_pred_ridge)
# Calculate mean absolute percentage error
# (not meaningful here: 'survived' contains zeros, so the percentage errors blow up)
mape_lr = mean_absolute_percentage_error(y_test, y_pred_lr)
mape_ridge = mean_absolute_percentage_error(y_test, y_pred_ridge)
# Print the results
print("Linear Regression MSE:", mse_lr)
print("Ridge Regression MSE:", mse_ridge)
print("Linear Regression R-squared:", r2_lr)
print("Ridge Regression R-squared:", r2_ridge)
print("Linear Regression MAE:", mae_lr)
print("Ridge Regression MAE:", mae_ridge)
print("Linear Regression MAPE:", mape_lr)
print("Ridge Regression MAPE:", mape_ridge)


Linear Regression MSE: 0.1371682053082538
Ridge Regression MSE: 0.13718838549258475
Linear Regression R-squared: 0.4343621021516396
Ridge Regression R-squared: 0.4342788855124957
Linear Regression MAE: 0.287745694224429
Ridge Regression MAE: 0.28820775939135757
Linear Regression MAPE: 645238867583785.1
Ridge Regression MAPE: 645983981155847.0
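
The very large MAPE values come from the target itself: `survived` is 0 for many passengers, and a percentage error divides by the true value. The sketch below assumes scikit-learn's documented definition, in which the denominator is `max(|y_true|, eps)` with `eps` the float64 machine epsilon. The helper `mape` and the toy arrays are hypothetical.

```python
import numpy as np

def mape(y_true, y_pred):
    """MAPE with the denominator clipped at machine epsilon (assumed to mirror scikit-learn)."""
    eps = np.finfo(np.float64).eps
    return np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

# Toy binary target with zeros, similar in kind to 'survived'
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.9, 0.1])

print(mape(y_true, y_pred))            # huge: errors on zero targets are divided by eps
print(mape(y_true[1:3], y_pred[1:3]))  # about 0.2 once the zero targets are excluded
```

For a 0/1 target, MSE, MAE, and R-squared are therefore the more informative numbers in the comparison above, and on those the two models are nearly identical at `alpha=1.0`.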
