# **Ridge Regression - Machine Learning**

<h1 style="font-family: 'poppins'; font-weight: bold; color: Blue;">👨‍💻Author: Muhammad Hassaan</h1>

[![GitHub](https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github)](https://github.com/MHassaan2) 
[![Kaggle](https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle)](https://www.kaggle.com/mhassaan1122) 
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/iammuhammadhassaan7/)  
[![Email](https://img.shields.io/badge/Email-Contact%20Me-red?style=for-the-badge&logo=email)](mailto:muhammadhassaan7896@gmail.com)


* **Ridge Regression** is a regularized version of Linear Regression: a regularization term equal to $\lambda \sum_{j=1}^{p} \beta_j^2$ is added to the cost function (some texts write the strength as $\alpha$ and the weights as $\theta$). This forces the learning algorithm not only to fit the data but also to keep the model weights as small as possible. The regularization term is added to the cost function only during training; once the model is trained, evaluate its performance with the unregularized performance measure.

* **Ridge regression**, also known as Tikhonov regularization, is a type of linear regression that includes a regularization term. The key idea is to accept a slightly worse fit to the training data than ordinary least squares in exchange for better generalization to new data. This is particularly useful under multicollinearity (highly correlated independent variables) or when the number of predictors (features) exceeds the number of observations.
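
The multicollinearity point can be illustrated with a small sketch. The data below is synthetic and purely hypothetical: two almost identical predictors, of which only the first actually drives the response. Variable names and the seed are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Two nearly identical predictors (hypothetical synthetic data)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # x2 is almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)  # only x1 truly drives y

# Fit both models on the same data
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Compare the overall size of the coefficient vectors
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("Coefficient norm (OLS, ridge):", ols_norm, ridge_norm)
```

With strongly correlated predictors, the OLS coefficients tend to be large and unstable, while ridge tends to spread the weight more evenly across the two columns. The exact numbers depend on the seed.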

### Key Concept:
- **Regularization**: Ridge regression adds a penalty proportional to the sum of the squared coefficients. This penalty term (squared L2 norm) shrinks the coefficients towards zero, but it doesn't make them exactly zero.

### Mathematical Representation:
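Ridge regression modifies the least squares objective function by adding a penalty term:

$$ \text{Minimize } \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 $$

where:
- $n$ is the number of observations and $p$ is the number of predictors.
- $y_i$ is the response value for the $i$-th observation.
- $x_{ij}$ is the value of the $j$-th predictor for the $i$-th observation.
- $\beta_j$ is the regression coefficient for the $j$-th predictor.
- $\lambda$ is the tuning parameter that controls the strength of the penalty; $\lambda \geq 0$, and $\lambda = 0$ recovers ordinary least squares.

The intercept is omitted here, which amounts to assuming centered data; when an intercept is fitted, it is normally left out of the penalty.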
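
For reference, this objective has a closed-form minimizer. The short derivation below is a sketch that assumes centered predictors and response (so no intercept is needed) and introduces matrix notation not used above: $X$ is the $n \times p$ matrix of predictors, $y$ the response vector, and $I_p$ the $p \times p$ identity.

$$
\begin{aligned}
J(\beta) &= (y - X\beta)^\top (y - X\beta) + \lambda \beta^\top \beta \\
\nabla_\beta J(\beta) &= -2 X^\top (y - X\beta) + 2 \lambda \beta = 0 \\
\hat{\beta}^{\text{ridge}} &= (X^\top X + \lambda I_p)^{-1} X^\top y
\end{aligned}
$$

For $\lambda > 0$ the matrix $X^\top X + \lambda I_p$ is always invertible, which is why ridge still has a unique solution when predictors are collinear or when there are more predictors than observations.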


In scikit-learn, the `alpha` parameter of `Ridge` plays the role of the regularization strength $\lambda$. Adjusting `alpha` changes the strength of the regularization penalty. A larger `alpha` enforces stronger regularization (leading to smaller coefficients), while as `alpha` approaches 0 the model approaches ordinary linear regression.
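
The effect of `alpha` on the coefficients can be sketched with a small sweep. The grid of values is an illustrative assumption, not a recommendation, and the four data points are the same toy data used in the scikit-learn example in this notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data: y = 1*x1 + 2*x2 + 3 exactly
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = X @ np.array([1, 2]) + 3

alphas = [0.001, 0.1, 1.0, 10.0, 100.0]  # illustrative grid (assumption)
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # Track the size of the coefficient vector for each penalty strength
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>7}: coef={model.coef_}, intercept={model.intercept_:.3f}")
```

The coefficient norm should shrink as `alpha` grows, and for very small `alpha` the coefficients should stay close to the true values 1 and 2.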

### Key Points:
- **Choosing Alpha**: Selecting the right value of `alpha` is crucial. It is usually chosen by cross-validation, for example with scikit-learn's `RidgeCV`.
- **Standardization**: The penalty depends on the scale of each predictor, so it's often recommended to standardize the predictors before applying ridge regression.
- **Bias-Variance Tradeoff**: Increasing `alpha` adds bias but reduces variance; ridge regression accepts a little more bias in exchange for lower variance.
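
The first two points can be combined in a short sketch: standardize the predictors, then let `RidgeCV` pick `alpha` from a list of candidates. The synthetic dataset from `make_regression` and the candidate grid are assumptions made for illustration; they do not come from this notebook's data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (hypothetical; stands in for a real dataset)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

alphas = np.logspace(-3, 3, 13)  # candidate penalty strengths (illustrative grid)

# Standardize the features, then let RidgeCV choose alpha by cross-validation
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name
best_alpha = model.named_steps["ridgecv"].alpha_
print("Selected alpha:", best_alpha)
```

With its default settings, `RidgeCV` uses an efficient form of leave-one-out cross-validation; passing `cv=5`, for example, would switch to 5-fold cross-validation instead.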

In [4]:
# import libraries 
from sklearn.linear_model import Ridge
import numpy as np 

# example data 
X = np.array([[1, 1], [1, 2], [2,2], [2,3]])
# target values
y = np.dot(X, np.array([1, 2])) + 3

# ridge regression model 
ridge = Ridge(alpha=1.0) # alpha is the equivalent of lambda in the formula
ridge.fit(X, y)

# print coefficients
print('Coefficient:', ridge.coef_) # coefficients of the model

# print intercept
print('Intercept: ', ridge.intercept_) # intercept of the model

Coefficient: [0.8 1.4]
Intercept:  4.5
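
The printed coefficients can be cross-checked against the closed-form ridge solution using NumPy alone. This is a sketch that assumes the intercept is left unpenalized, which is handled here by centering the data first; the variable names are chosen for this illustration.

```python
import numpy as np

# Same toy data as the scikit-learn example above
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3.0
lam = 1.0  # same penalty strength as Ridge(alpha=1.0)

# Center the data so the intercept is left out of the penalty
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Solve (Xc^T Xc + lam * I) beta = Xc^T yc
beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

# Recover the intercept from the means
intercept = y_mean - X_mean @ beta

print("Coefficient:", beta)
print("Intercept: ", intercept)
```

Up to floating-point rounding, this reproduces the coefficients `[0.8 1.4]` and the intercept `4.5` printed by scikit-learn.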


## **Comparing Linear Regression vs. Ridge Regression**

In [5]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, mean_absolute_percentage_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# suppress warnings to keep the output clean
import warnings
warnings.filterwarnings("ignore")

# load data
df = sns.load_dataset('titanic')

## Preprocess the data

In [6]:
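# selecting a subset of columns for simplicity
columns = ['survived', 'pclass', 'sex', 'age', 'fare']
df = df[columns].copy()  # copy so later assignments don't act on a slice of the original frame

# handling missing values: fill missing ages with the median age
df['age'] = df['age'].fillna(df['age'].median())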

# define feature and target variable 
X = df.drop('survived', axis=1)
y = df['survived']

# split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Creating a Pipeline

In [7]:
# define pipelines that one-hot encode the categorical feature and then fit a model
categorical_features = ['sex']
numeric_features = ['pclass', 'age', 'fare']

# preprocessor
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(), categorical_features),
        ('num', 'passthrough', numeric_features)
    ])

# linear regression pipeline
lr_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('regressor', LinearRegression())])

# ridge regression pipeline
ridge_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('regressor', Ridge())])

## Train and Evaluate the Models

In [8]:
# train and evaluate linear regression
lr_pipeline.fit(X_train, y_train)
lr_pred = lr_pipeline.predict(X_test)
lr_mse = mean_squared_error(y_test, lr_pred)
lr_mae = mean_absolute_error(y_test, lr_pred)
# note: MAPE is not meaningful here, because y_test contains zeros (non-survivors)
lr_mape = mean_absolute_percentage_error(y_test, lr_pred)
lr_r2 = r2_score(y_test, lr_pred)
lr_rmse = np.sqrt(lr_mse)

# train and evaluate ridge regression
ridge_pipeline.fit(X_train, y_train)
ridge_pred = ridge_pipeline.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)
ridge_mae = mean_absolute_error(y_test, ridge_pred)
ridge_mape = mean_absolute_percentage_error(y_test, ridge_pred)
ridge_r2 = r2_score(y_test, ridge_pred)
ridge_rmse = np.sqrt(ridge_mse)

# print all the results
print('Linear Regression')
print('MSE:', lr_mse)
print('MAE:', lr_mae)
print('MAPE:', lr_mape)
print('R2:', lr_r2)
print('RMSE:', lr_rmse)

print('-' * 100)
print('\nRidge Regression')
print('MSE:', ridge_mse)
print('MAE:', ridge_mae)
print('MAPE:', ridge_mape)
print('R2:', ridge_r2)
print('RMSE:', ridge_rmse)



Linear Regression
MSE: 0.13716820530825372
MAE: 0.28774569422442936
MAPE: 645238867583785.2
R2: 0.43436210215163995
RMSE: 0.3703622622625768
----------------------------------------------------------------------------------------------------

Ridge Regression
MSE: 0.13718838549258477
MAE: 0.28820775939135773
MAPE: 645983981155847.5
R2: 0.4342788855124956
RMSE: 0.3703895051058882
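
The MAPE values above are on the order of $10^{14}$, which deserves a closer look. As far as I understand scikit-learn's implementation, each absolute error is divided by $|y_{\text{true}}|$ floored at a tiny epsilon, so any zero in the target produces an enormous term. The sketch below uses made-up values to show the effect; it is not the Titanic data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([0, 1, 1, 0])          # binary target with zeros (made-up values)
y_pred = np.array([0.2, 0.8, 0.6, 0.1])  # made-up continuous predictions

# Zeros in y_true make the percentage error explode
mape_all = mean_absolute_percentage_error(y_true, y_pred)

# Restricting to non-zero targets gives an ordinary-looking percentage error
nonzero = y_true != 0
mape_nonzero = mean_absolute_percentage_error(y_true[nonzero], y_pred[nonzero])

print("MAPE with zeros in y_true:", mape_all)
print("MAPE on non-zero targets: ", mape_nonzero)
```

For a 0/1 target such as `survived`, MSE, MAE, RMSE and R2 are therefore the more informative numbers in the comparison above.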


---