### Ridge Regression

**Ridge Regression** is linear regression with an added **penalty** on the size of the coefficients. It is most useful when the data has **multicollinearity** (i.e., when independent variables are highly correlated) or many features, because shrinking the coefficients reduces overfitting.

### Key Concepts:

- **Regular Linear Regression**: It tries to find the line (or hyperplane in higher dimensions) that best fits the data. However, when features are highly correlated, the coefficient estimates become unstable: a small change in the training data can change them a lot, so the model may overfit and predict less reliably on new data.
  
- **Ridge Regression**: Adds a penalty term to the linear regression cost function. The penalty discourages large coefficients and shrinks them toward zero, which helps prevent overfitting.

### Ridge Regression Formula:
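The cost function for Ridge Regression is the simple linear regression cost plus a penalty term:
\[
\text{Cost function} = \text{Sum of squared errors} + \lambda \times (\text{sum of squared coefficients})
\]
In symbols, with \(n\) samples, targets \(y_i\), predictions \(\hat{y}_i\), and \(p\) coefficients \(\beta_j\) (the intercept is normally not penalized):
\[
J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
\]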
Where:
- \(\lambda\) is the regularization parameter (also called the **tuning parameter**; scikit-learn's `Ridge` calls it `alpha`). It controls the strength of the penalty.
  - When \(\lambda = 0\), it becomes standard linear regression.
  - When \(\lambda\) is very large, the coefficients shrink toward zero: the model becomes more biased but less sensitive to noise (i.e., it has lower variance).
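
The two limiting cases above can be checked numerically. The following is a minimal sketch, not part of the original notebook: it assumes a small synthetic dataset and a hypothetical helper `ridge_closed_form` that solves the penalized least-squares problem directly, with the intercept left unpenalized.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimize sum of squared errors + lam * sum of squared coefficients.

    Hypothetical helper for illustration; the intercept is not penalized.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # centering removes the intercept from the problem
    yc = y - y_mean
    n_features = X.shape[1]
    # Normal equations with lam added to the diagonal of X^T X.
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data (an assumption for this sketch, not the diabetes data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

coef_ols, intercept_ols = ridge_closed_form(X_demo, y_demo, lam=0.0)  # lambda = 0
coef_big, intercept_big = ridge_closed_form(X_demo, y_demo, lam=1e6)  # very large lambda

ols_norm = np.linalg.norm(coef_ols)
big_norm = np.linalg.norm(coef_big)
print("lambda = 0   coefficients:", coef_ols)
print("lambda = 1e6 coefficients:", coef_big)
```

With `lam=0.0` the helper solves the ordinary least-squares normal equations, and with a very large `lam` the coefficient vector is pulled close to zero, leaving a model that predicts roughly the mean of `y`.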

### Advantages of Ridge Regression:
- **Prevents Overfitting**: By shrinking the coefficients, Ridge Regression helps avoid overfitting, especially in cases where there are many features or multicollinearity.
- **Useful with Multicollinearity**: Ridge performs well when independent variables are highly correlated.
- **Stability**: Coefficient estimates change less from one training sample to another, so predictions are more stable.

### Disadvantages:
- **Bias**: As it shrinks the coefficients, Ridge introduces some bias into the model, so predictions are typically slightly less accurate on training data but often generalize better to new data.
- **Interpretability**: The shrunk coefficients depend on the chosen \(\lambda\) as well as on the data, so they may not be as interpretable as in regular linear regression.
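
The bias point can be seen on the training set itself: ordinary least squares minimizes the training error by definition, so any positive penalty can only keep it equal or raise it. The sketch below assumes synthetic data and a hypothetical helper `train_mse`; it only illustrates the trade-off and says nothing about test error.

```python
import numpy as np

# Synthetic data (an assumption for this sketch).
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(40, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.7, 3.0])
y_toy = X_toy @ true_coef + rng.normal(scale=1.0, size=40)

# Center the data so the intercept drops out of the problem.
Xc = X_toy - X_toy.mean(axis=0)
yc = y_toy - y_toy.mean()

def train_mse(lam):
    """Training mean squared error of the ridge fit for a given penalty."""
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ yc)
    residuals = yc - Xc @ coef
    return float(np.mean(residuals ** 2))

lambdas = (0.0, 1.0, 10.0, 100.0)
mse_by_lambda = {lam: train_mse(lam) for lam in lambdas}
for lam, mse in mse_by_lambda.items():
    print(f"lambda={lam:>6}: training MSE = {mse:.4f}")
```

The training error never decreases as the penalty grows. Whether the extra bias pays off has to be judged on held-out data.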

### Simple Example:
- **Scenario**: You want to predict house prices based on multiple factors like size, location, and age.
- If some features (like number of rooms and house size) are highly correlated, Ridge Regression helps prevent overfitting by shrinking their coefficients, which gives more stable predictions.
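
The house-price scenario can be imitated with made-up numbers. This is a rough sketch under stated assumptions: `size`, `rooms`, and `price` are synthetic, `rooms` is built to be almost a copy of `size`, and `alpha=1.0` is an arbitrary choice rather than a tuned value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-ins for the house features (assumptions, not real data).
rng = np.random.default_rng(0)
n = 200
size = rng.normal(size=n)                        # standardized house size
rooms = size + rng.normal(scale=0.05, size=n)    # nearly identical to size
price = 3.0 * size + rng.normal(scale=0.5, size=n)

X_house = np.column_stack([size, rooms])
print("correlation(size, rooms):", np.corrcoef(size, rooms)[0, 1])

ols = LinearRegression().fit(X_house, price)
ridge = Ridge(alpha=1.0).fit(X_house, price)

# With two nearly identical columns, OLS can split the effect between them
# arbitrarily; ridge prefers the smaller, more balanced coefficient vector.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

Rerunning with a different seed is a quick way to see how much the ordinary least-squares coefficients move compared with the ridge ones.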

### Why Use Ridge Regression:
- To improve model generalization when you have many features or when independent variables are correlated.
  


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
from sklearn.datasets import load_diabetes
data = load_diabetes()

In [3]:
X = data.data    # feature matrix: 442 samples, 10 features
y = data.target  # disease progression one year after baseline

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=45)

In [5]:
from sklearn.linear_model import LinearRegression
L=LinearRegression()

In [6]:
L.fit(X_train,y_train)

In [7]:
print(L.coef_)
print(L.intercept_)

[  23.45465406 -247.42747406  492.1087518   329.35876431 -970.79723039
  573.54295519  182.42162368  255.92168168  794.21609282   89.32249214]
152.13623331746496


In [8]:
y_pred=L.predict(X_test)

In [9]:
from sklearn.metrics import r2_score,mean_squared_error

print("R2 score",r2_score(y_test,y_pred))
print("RMSE",np.sqrt(mean_squared_error(y_test,y_pred)))

R2 score 0.5188113124539249
RMSE 48.72713760953253


In [41]:
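from sklearn.linear_model import Ridge
# alpha is scikit-learn's name for the regularization parameter λ.
# With alpha this small the penalty is almost zero, so the fit stays very close to plain linear regression.
R = Ridge(alpha=0.00001)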

In [42]:
R.fit(X_train, y_train)

In [43]:
y_pred1 = R.predict(X_test)

In [44]:
print('R2 score: ', r2_score(y_test, y_pred1))
print('RMSE: ', np.sqrt(mean_squared_error(y_test, y_pred1)))

R2 score:  0.5188278514667455
RMSE:  48.72630019824301
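
With `alpha=0.00001` the Ridge scores above are almost identical to the linear regression scores (R2 of about 0.5188 in both cases), because the penalty is close to zero. To see the penalty take effect, a possible next step is to refit over several `alpha` values on the same split. The grid below is an illustrative assumption, not a tuned choice, and no particular scores are claimed.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same data and split as in the cells above.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=45
)

alphas = [1e-5, 1e-3, 1e-1, 1.0, 10.0]  # illustrative grid (assumption)
results = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    pred = model.predict(X_test)
    results.append({
        "alpha": alpha,
        "r2": r2_score(y_test, pred),
        "rmse": np.sqrt(mean_squared_error(y_test, pred)),
        "coef_norm": np.linalg.norm(model.coef_),  # shrinks as alpha grows
    })

for row in results:
    print(row)
```

Picking `alpha` from test-set scores leaks information from the test set. In practice the choice is usually made by cross-validation on the training data, for example with scikit-learn's `RidgeCV`.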
