# **Ridge Regression**

Ridge Regression is a regularized version of Linear Regression: a regularization term equal to $\alpha \sum_{i=1}^n \theta_i^2$ is added to the cost function. This forces the learning algorithm to not only fit the data but also keep the model weights as small as possible. Note that the regularization term should only be added to the cost function during training. Once the model is trained, you want to evaluate the model's performance using the unregularized performance measure.

`Regularization` is a technique used in machine learning and statistics to prevent overfitting of models on training data. Overfitting occurs when a model learns the training data too well, including its noise and outliers, leading to poor generalization to new, unseen data. Regularization helps to solve this problem by adding a penalty to the model's complexity.

Ridge regression, also known as `Tikhonov regularization`, is a type of linear regression that includes a regularization term. The key idea behind ridge regression is to find a new line that doesn't fit the training data as well as ordinary least squares regression, in order to achieve better generalization to new data. 

This is particularly useful when dealing with multicollinearity (highly correlated independent variables) or when the number of predictors (features) exceeds the number of observations (rows).

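To make the multicollinearity point concrete, here is a minimal simulation sketch. The data-generating process (two nearly identical predictors, a true coefficient of 3 on the first one, Gaussian noise) and `alpha=1.0` are assumptions chosen for illustration. The sketch refits both models on many noisy samples and compares how much the estimated coefficients vary from sample to sample.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
ols_coefs, ridge_coefs = [], []

for _ in range(200):
    # Two almost perfectly correlated predictors (illustrative assumption)
    x1 = rng.normal(size=50)
    x2 = x1 + 0.01 * rng.normal(size=50)
    X_sim = np.column_stack([x1, x2])
    y_sim = 3 * x1 + rng.normal(size=50)

    ols_coefs.append(LinearRegression().fit(X_sim, y_sim).coef_)
    ridge_coefs.append(Ridge(alpha=1.0).fit(X_sim, y_sim).coef_)

# Spread of each coefficient across the 200 refits
ols_std = np.std(ols_coefs, axis=0)
ridge_std = np.std(ridge_coefs, axis=0)
print("OLS coefficient std:  ", ols_std)
print("Ridge coefficient std:", ridge_std)
```

In this setup the ordinary least squares coefficients should swing widely between samples, while the ridge coefficients stay comparatively stable.
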
### **Key Concept:**
- **Regularization**: Ridge regression adds a penalty equal to the square of the magnitude of coefficients. This penalty term (squared L2 norm) shrinks the coefficients towards zero, but it doesn't make them exactly zero.

### **Mathematical Representation**:
Ridge regression modifies the least squares objective function by adding a penalty term:

$$ \text{Minimize } \sum_{i=1}^{n} (y_i - \sum_{j=1}^{p} x_{ij} \beta_j)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 $$

where:
- $y_i$ is the response value for the $i$th observation.
- $x_{ij}$ is the value of the $j$th predictor for the $i$th observation.
- $\beta_j$ is the regression coefficient for the $j$th predictor.
- $\lambda$ is the tuning parameter that controls the strength of the penalty; $\lambda \geq 0$.
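
Setting the gradient of this objective to zero gives the closed-form solution $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$. The sketch below evaluates it with NumPy on a small toy dataset and cross-checks the result against scikit-learn. The names `X_demo`, `y_demo`, and `lam` are illustrative, and centering the data first is an assumption made here so that the intercept is not penalized.

```python
import numpy as np
from sklearn.linear_model import Ridge

X_demo = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y_demo = X_demo @ np.array([1.0, 2.0]) + 3.0
lam = 1.0  # penalty strength (lambda in the formula)

# Center the data so the intercept is left out of the penalty
X_mean, y_mean = X_demo.mean(axis=0), y_demo.mean()
Xc, yc = X_demo - X_mean, y_demo - y_mean

# Solve (Xc^T Xc + lambda * I) beta = Xc^T yc
beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ yc)
intercept = y_mean - X_mean @ beta

print("Closed-form coefficients:", beta)
print("Closed-form intercept:", intercept)

# Cross-check against scikit-learn, where `alpha` plays the role of lambda
model = Ridge(alpha=lam).fit(X_demo, y_demo)
print("sklearn coefficients:", model.coef_)
print("sklearn intercept:", model.intercept_)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the usual numerically safer choice for this kind of linear system.
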


In the scikit-learn code below, `alpha` is the regularization strength $\lambda$. A larger `alpha` enforces stronger regularization (leading to smaller coefficients), while a smaller `alpha` gives a model closer to ordinary linear regression.

### **Key Points**:
- **Choosing Alpha**: Selecting the right value of `alpha` is crucial. It can be tuned by cross-validation, for example with scikit-learn's `RidgeCV`.
- **Standardization**: Standardize the predictors (features) before applying ridge regression, because the penalty depends on the scale of each coefficient.
- **Bias-Variance Tradeoff**: Ridge regression accepts a small increase in bias in exchange for lower variance, which often improves predictions on unseen data.
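
The first two points can be combined in a few lines. The following is a minimal sketch, assuming synthetic data from `make_regression` and an arbitrary grid of candidate values; neither comes from a real problem. It standardizes the features and lets `RidgeCV` pick `alpha` by its built-in cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration
X_syn, y_syn = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Candidate penalty strengths on a log scale (an arbitrary illustrative grid)
candidate_alphas = np.logspace(-3, 3, 13)

# Scale the features, then let RidgeCV choose alpha from the grid
model_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=candidate_alphas))
model_cv.fit(X_syn, y_syn)

best_alpha = model_cv.named_steps["ridgecv"].alpha_
print("Selected alpha:", best_alpha)
```

A log-spaced grid is a common starting point because useful values of `alpha` can span several orders of magnitude.
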

In [1]:
from sklearn.linear_model import Ridge
import numpy as np

# Example data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# Target values
y = np.dot(X, np.array([1, 2])) + 3

# Ridge Regression Model
ridge_reg = Ridge(alpha=1.0)  # alpha is the equivalent of lambda in the formula
ridge_reg.fit(X, y)

# Coefficients
print("Coefficients:", ridge_reg.coef_)
# Intercept
print("Intercept:", ridge_reg.intercept_)

Coefficients: [0.8 1.4]
Intercept: 4.5


### **Comparison between Ridge and Linear Regression**:

In [3]:
# comparing the ridge regression with the normal linear regression:

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
print("Linear Regression Coefficients:", lin_reg.coef_)
print("Linear Regression Intercept:", lin_reg.intercept_)


Linear Regression Coefficients: [1. 2.]
Linear Regression Intercept: 3.0000000000000018


### **`Note`**:

1. Ridge regression is a technique used when the data suffers from multicollinearity (independent variables are highly correlated).

2. Under multicollinearity, the ordinary least squares (OLS) estimates remain unbiased, but their variances are large, so the estimated coefficients can fall far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces their standard errors.

3. The hyperparameter α controls the amount of regularization. If α = 0 then Ridge regression is just Linear regression. If α is very large, then all weights end up very close to zero and the result is a flat line going through the data’s mean.
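
The two limiting cases in the last point can be checked numerically. This is a small sketch on a four-row toy dataset; the specific `alpha` values are arbitrary, and a tiny positive value stands in for α = 0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

X_toy = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y_toy = X_toy @ np.array([1.0, 2.0]) + 3.0

ols = LinearRegression().fit(X_toy, y_toy)
print("OLS:", ols.coef_, ols.intercept_)

# Watch the weights shrink as the penalty grows
for alpha in [1e-8, 1.0, 100.0, 1e8]:
    ridge = Ridge(alpha=alpha).fit(X_toy, y_toy)
    print(f"alpha={alpha:g}: coef={ridge.coef_}, intercept={ridge.intercept_:.4f}")

tiny = Ridge(alpha=1e-8).fit(X_toy, y_toy)   # practically no penalty
huge = Ridge(alpha=1e8).fit(X_toy, y_toy)    # penalty dominates the fit
print("Mean of y:", y_toy.mean())
```

With a negligible penalty the ridge fit should match the ordinary least squares fit, and with a very large one the weights should approach zero while the intercept approaches the mean of `y_toy`.
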

### The Ridge regression cost function is:
$J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2$
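
As a rough illustration of this cost function, the sketch below evaluates it directly with NumPy. The helper `ridge_cost` is a hypothetical name, and it assumes `theta[0]` is the bias term, which is left out of the penalty because the sum starts at $i = 1$. Note that scikit-learn's `Ridge` penalizes the sum of squared errors rather than the mean, so its `alpha` is on a different scale from the α in this formula.

```python
import numpy as np

def ridge_cost(theta, X, y, alpha):
    """MSE plus alpha times the sum of squared weights (bias theta[0] excluded)."""
    predictions = theta[0] + X @ theta[1:]
    mse = np.mean((y - predictions) ** 2)
    penalty = alpha * np.sum(theta[1:] ** 2)
    return mse + penalty

X_toy = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y_toy = X_toy @ np.array([1.0, 2.0]) + 3.0

theta_true = np.array([3.0, 1.0, 2.0])    # reproduces y_toy exactly
theta_shrunk = np.array([4.5, 0.8, 1.4])  # smaller weights, imperfect fit

for name, theta in [("true", theta_true), ("shrunk", theta_shrunk)]:
    train_cost = ridge_cost(theta, X_toy, y_toy, alpha=1.0)  # penalized, used for training
    eval_mse = ridge_cost(theta, X_toy, y_toy, alpha=0.0)    # unpenalized, used for evaluation
    print(f"{name}: training cost={train_cost:.2f}, evaluation MSE={eval_mse:.2f}")
```

The weights with the perfect fit have the higher penalized cost here, which shows how the penalty can favor smaller weights even at the expense of some training error.
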

In [5]:
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=1.0)

# fit the model
ridge_reg.fit(X, y)
print("Ridge Regression Coefficients:", ridge_reg.coef_)
print("Ridge Regression Intercept:", ridge_reg.intercept_)


Ridge Regression Coefficients: [0.8 1.4]
Ridge Regression Intercept: 4.5


#### Now let's try it on the Titanic dataset:



In [6]:
# Importing Required Libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, mean_absolute_percentage_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


In [7]:
# import the dataset:

df = sns.load_dataset('titanic')
df.head()

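|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |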


### Pre-processing the data:

In [9]:
columns = ['sex', 'age', 'fare', 'pclass', 'survived']
df = df[columns].copy()  # copy so the assignment below does not trigger SettingWithCopyWarning

# handle the missing values:
df['age'] = df['age'].fillna(df['age'].median())

# Split the data into X and y:
X = df.drop('survived', axis=1)
y = df['survived']

# train test split:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Create a pipeline:

In [21]:
# Define a pipeline for OneHotEncoding and model
cat_features = ['sex']
num_features = ['pclass', 'age', 'fare']

# Preprocessor
preprocessor = ColumnTransformer(
    transformers=[
        ('num', 'passthrough', num_features),
        ('cat', OneHotEncoder(), cat_features)])

# Linear Regression Pipeline
lr_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('regressor', LinearRegression())])

# Ridge Regression Pipeline
ridge_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                                 ('regressor', Ridge(alpha=1.0))])
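
The pipelines above pass the numeric features through unscaled. Since ridge regression is sensitive to feature scale, one possible variant swaps `'passthrough'` for `StandardScaler`. The sketch below is only an illustration: the small `toy` frame is made-up data with the same column names, and `scaled_preprocessor` and `scaled_ridge_pipeline` are hypothetical names, not part of the notebook above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame with the same column names, for illustration only
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.93, 8.05, 51.86, 21.08],
    "pclass": [3, 1, 3, 3, 1, 3],
    "survived": [0, 1, 1, 0, 1, 0],
})
toy_X, toy_y = toy.drop("survived", axis=1), toy["survived"]

# Same layout as before, but numeric columns are standardized
scaled_preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["pclass", "age", "fare"]),
        ("cat", OneHotEncoder(), ["sex"]),
    ])

scaled_ridge_pipeline = Pipeline(steps=[
    ("preprocessor", scaled_preprocessor),
    ("regressor", Ridge(alpha=1.0)),
])

scaled_ridge_pipeline.fit(toy_X, toy_y)
toy_pred = scaled_ridge_pipeline.predict(toy_X)

# Three scaled numeric columns plus two one-hot columns for `sex`
transformed = scaled_preprocessor.fit_transform(toy_X)
print("Transformed shape:", transformed.shape)
```

Because the scaler lives inside the pipeline, it is fitted on the training data only whenever the pipeline is fitted, which avoids leaking test-set statistics.
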

### Train and evaluate the model:

In [26]:
# Train and evaluate Linear Regression
lr_pipeline.fit(X_train, y_train)

# predict on the test set:
lr_pred = lr_pipeline.predict(X_test)

# evaluate the metrics:
lr_mse = mean_squared_error(y_test, lr_pred)
lr_r2 = r2_score(y_test, lr_pred)
lr_mae = mean_absolute_error(y_test, lr_pred)
lr_mape = mean_absolute_percentage_error(y_test, lr_pred)
lr_rmse = np.sqrt(lr_mse)

# Train and evaluate Ridge Regression
ridge_pipeline.fit(X_train, y_train)

# predict on the test set:
ridge_pred = ridge_pipeline.predict(X_test)

# evaluate the model:
ridge_mse = mean_squared_error(y_test, ridge_pred)
ridge_r2 = r2_score(y_test, ridge_pred)
ridge_mae = mean_absolute_error(y_test, ridge_pred)
ridge_mape = mean_absolute_percentage_error(y_test, ridge_pred)
ridge_rmse = np.sqrt(ridge_mse)

# print the metrics:

print("Linear Regression MSE:", lr_mse)
print("Ridge Regression MSE:", ridge_mse)

print("Linear Regression R2:", lr_r2)
print("Ridge Regression R2:", ridge_r2)

print("Linear Regression MAE:", lr_mae)
print("Ridge Regression MAE:", ridge_mae)

print("Linear Regression MAPE:", lr_mape)
print("Ridge Regression MAPE:", ridge_mape)

print("Linear Regression RMSE:", lr_rmse)
print("Ridge Regression RMSE:", ridge_rmse)

Linear Regression MSE: 0.1371682053082538
Ridge Regression MSE: 0.13718838549258477
Linear Regression R2: 0.4343621021516396
Ridge Regression R2: 0.4342788855124956
Linear Regression MAE: 0.287745694224429
Ridge Regression MAE: 0.2882077593913576
Linear Regression MAPE: 645238867583785.4
Ridge Regression MAPE: 645983981155846.8
Linear Regression RMSE: 0.3703622622625769
Ridge Regression RMSE: 0.3703895051058882
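
The enormous MAPE values above are not a modelling failure. scikit-learn's `mean_absolute_percentage_error` divides each absolute error by the magnitude of the true value (floored at a tiny epsilon), and `survived` contains zeros. The following sketch reproduces the effect on made-up binary targets; the arrays are illustrative assumptions, not Titanic data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Made-up binary targets and predictions, for illustration only
y_true_demo = np.array([0, 1, 0, 1])
y_pred_demo = np.array([0.2, 0.8, 0.1, 0.9])

mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)
mape_demo = mean_absolute_percentage_error(y_true_demo, y_pred_demo)
print("MAE: ", mae_demo)
print("MAPE:", mape_demo)

# Restricting to the non-zero targets gives a finite, interpretable value
nonzero = y_true_demo != 0
mape_nonzero = mean_absolute_percentage_error(y_true_demo[nonzero], y_pred_demo[nonzero])
print("MAPE on non-zero targets:", mape_nonzero)
```

For a 0/1 target, MSE, RMSE, and MAE are therefore the more informative metrics in the comparison above.
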


In [27]:
# Ridge Regression with different alpha values:

alphas = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]

for alpha in alphas:
    ridge_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                                     ('regressor', Ridge(alpha=alpha))])
    ridge_pipeline.fit(X_train, y_train)
    ridge_pred = ridge_pipeline.predict(X_test)
    ridge_mse = mean_squared_error(y_test, ridge_pred)
    ridge_r2 = r2_score(y_test, ridge_pred)
    ridge_mae = mean_absolute_error(y_test, ridge_pred)
    ridge_mape = mean_absolute_percentage_error(y_test, ridge_pred)
    ridge_rmse = np.sqrt(ridge_mse)
    print(f"Ridge Regression with alpha={alpha} MSE:", ridge_mse)
    print(f"Ridge Regression with alpha={alpha} R2:", ridge_r2)
    print(f"Ridge Regression with alpha={alpha} MAE:", ridge_mae)
    print(f"Ridge Regression with alpha={alpha} MAPE:", ridge_mape)
    print(f"Ridge Regression with alpha={alpha} RMSE:", ridge_rmse)
    print("\n")

Ridge Regression with alpha=0.1 MSE: 0.13717016510037283
Ridge Regression with alpha=0.1 R2: 0.43435402059446004
Ridge Regression with alpha=0.1 MAE: 0.28779202724886926
Ridge Regression with alpha=0.1 MAPE: 645313585211062.1
Ridge Regression with alpha=0.1 RMSE: 0.37036490803040834


Ridge Regression with alpha=0.5 MSE: 0.13717813407700813
Ridge Regression with alpha=0.5 R2: 0.4343211590783246
Ridge Regression with alpha=0.5 MAE: 0.2879770777909805
Ridge Regression with alpha=0.5 MAPE: 645611996636465.5
Ridge Regression with alpha=0.5 RMSE: 0.37037566615128503


Ridge Regression with alpha=1.0 MSE: 0.13718838549258477
Ridge Regression with alpha=1.0 R2: 0.4342788855124956
Ridge Regression with alpha=1.0 MAE: 0.2882077593913576
Ridge Regression with alpha=1.0 MAPE: 645983981155846.8
Ridge Regression with alpha=1.0 RMSE: 0.3703895051058882


Ridge Regression with alpha=2.0 MSE: 0.13720984388428473
Ridge Regression with alpha=2.0 R2: 0.4341903979541355
Ridge Regression with alpha=2.0 MAE

## **About Me**

## Muhammad Faizan

🎓 **3rd Year BS Computer Science** student at the **University of Agriculture, Faisalabad**  
💻 Enthusiast in **Machine Learning, Data Engineering, and Data Analytics**


