# Ridge Regression


**Ridge regression** is a type of linear regression that addresses the issue of multicollinearity (when independent variables are highly correlated) and overfitting by adding a penalty term to the cost function. This penalty term discourages large coefficients, leading to more stable and generalizable models.

In standard linear regression, the goal is to minimize the sum of squared errors (residuals) between the predicted values and actual values:

![alt text](images/std_regression.png "Standard Regression")

In ridge regression, we add a penalty term that is proportional to the sum of the squares of the coefficients:

![alt text](images/ridge_regression.png "Ridge Regression")

Here:

* λ is the regularization parameter that controls the strength of the penalty. A higher λ results in more regularization, leading to smaller coefficients.
* 𝛽𝑗 represents the coefficients of the model.
* The first term is the usual least-squares error.
* The second term is the ridge penalty that discourages large coefficients.
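To make the two terms concrete, here is a minimal NumPy sketch. It assumes the cost takes the common form Σ(𝑦𝑖 − ŷ𝑖)² + λ Σ 𝛽𝑗², with no intercept and no 1/n scaling; the formula in the image may use a different normalization. The helper names `ridge_cost` and `ridge_closed_form` and the synthetic data are hypothetical, chosen only for illustration. The sketch relies on the standard closed-form solution β = (XᵀX + λI)⁻¹Xᵀy to show the coefficients shrinking as λ grows.

```python
import numpy as np


def ridge_cost(X, y, beta, lam):
    """Least-squares error plus the L2 penalty (assumed form, no intercept)."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(beta ** 2)


def ridge_closed_form(X, y, lam):
    """Minimizer of ridge_cost: solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic data, made up for this illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

for lam in [0.0, 1.0, 100.0]:
    beta = ridge_closed_form(X_demo, y_demo, lam)
    print(f"lambda={lam:>6}: coefficients={np.round(beta, 3)}, "
          f"L2 norm={np.linalg.norm(beta):.3f}")
```

With λ = 0 this reduces to ordinary least squares; as λ increases, the L2 norm of the coefficient vector decreases.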

### Key Points:
* **Regularization:** Ridge regression applies L2 regularization, which penalizes the sum of the squared coefficients.
* **Control Overfitting:** By shrinking the coefficients, ridge regression helps prevent overfitting, especially when there are many features or multicollinearity in the data.
* **Stability:** Standard linear regression can produce coefficient estimates that vary widely when features are highly correlated. Ridge regression tends to produce more stable estimates in that setting.
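The stability point can be illustrated with a small sketch on synthetic data, where the second feature is almost a copy of the first. The data, the variable names, and the choice `alpha=1.0` are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two nearly identical features (strong multicollinearity); data is synthetic
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # almost a copy of x1
X_corr = np.column_stack([x1, x2])
y_corr = 3.0 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=100)

ols_corr = LinearRegression().fit(X_corr, y_corr)
ridge_corr = Ridge(alpha=1.0).fit(X_corr, y_corr)

print("OLS coefficients:  ", ols_corr.coef_)
print("Ridge coefficients:", ridge_corr.coef_)
```

Because the two features carry almost the same information, ordinary least squares can only pin down their combined effect (about 5 here) and may split it between them in an unstable way. Ridge tends to divide the effect roughly evenly between the correlated features.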

In [None]:
import pandas as pd

In [None]:
# Example for L2 (Ridge) Regression

# Importing necessary libraries
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Example dataset: predicting house prices based on features
data = {
    'bedrooms': [3, 4, 2, 5, 3, 4, 3, 2, 4, 3],
    'square_feet': [1500, 2000, 1200, 2500, 1800, 2200, 1600, 1300, 1900, 1700],
    'age': [10, 15, 7, 20, 12, 18, 9, 5, 16, 11],
    'price': [400000, 500000, 350000, 600000, 450000, 480000, 420000, 380000, 490000, 460000]
}

# Converting to a pandas DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['bedrooms', 'square_feet', 'age']]  # Features
y = df['price']  # Target

# Split data into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (important for ridge regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train the ridge regression model
ridge_model = Ridge(alpha=1.0)  # alpha is the regularization strength (λ)
ridge_model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = ridge_model.predict(X_test_scaled)

# Evaluate the model using mean squared error (MSE)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # Root Mean Squared Error

print(f"Predicted Prices: {y_pred}")
print(f"Actual Prices: {y_test.values}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")


## Lasso Regression

**Lasso regression** (Least Absolute Shrinkage and Selection Operator) is another type of linear regression that, like ridge regression, aims to address overfitting and multicollinearity. However, where ridge regression uses L2 regularization (penalizing the squares of the coefficients), lasso regression uses L1 regularization (penalizing the absolute values of the coefficients). This difference has significant consequences for how the model behaves.

![alt_text](images/lasso_regression.png "Lasso Regression")

Where:

* 𝑦𝑖 is the true value.
* ŷ𝑖 is the predicted value.
* 𝛽𝑗 are the coefficients.
* λ is the regularization parameter, controlling the strength of the penalty.
* The first term is the usual least-squares error.
* The second term is the L1 penalty (the sum of the absolute values of the coefficients).

### Key Points:
* **L1 Regularization:** Lasso applies L1 regularization, which encourages sparsity (i.e., it forces some of the coefficients to become exactly zero).
* **Feature Selection:** Since some coefficients are driven to zero, lasso regression can be useful for feature selection by eliminating irrelevant or redundant features.
* **Overfitting Control:** Like ridge regression, lasso helps prevent overfitting, but it can also result in simpler, more interpretable models since irrelevant features are removed.
* **Choosing λ:** λ is typically selected through cross-validation.
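As a sketch of how cross-validation can select λ, the example below uses scikit-learn's `LassoCV` on synthetic data. The data, the candidate grid `alphas`, and the choice of 5 folds are assumptions for illustration, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first two of ten features affect the target
rng = np.random.default_rng(0)
X_cv = rng.normal(size=(200, 10))
y_cv = 3.0 * X_cv[:, 0] - 2.0 * X_cv[:, 1] + rng.normal(scale=0.5, size=200)

# Candidate regularization strengths to try (an arbitrary illustrative grid)
alphas = np.logspace(-3, 1, 30)

# Scale the features, then pick alpha by 5-fold cross-validation
cv_model = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))
cv_model.fit(X_cv, y_cv)

best_alpha = cv_model.named_steps["lassocv"].alpha_
print(f"Selected alpha: {best_alpha:.4f}")
print("Coefficients:", np.round(cv_model.named_steps["lassocv"].coef_, 3))
```

Putting the scaler inside the pipeline keeps it from being fitted on the training and test data together. `RidgeCV` offers a similar interface for ridge regression.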

#### Differences Between Lasso and Ridge Regression:
* **Ridge:** Uses L2 regularization (penalizing the sum of squared coefficients). It usually keeps all features but shrinks their values.
* **Lasso:** Uses L1 regularization (penalizing the sum of absolute values of coefficients). It tends to set some coefficients exactly to zero, effectively selecting a subset of features.
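The difference in sparsity can be seen in a short sketch on synthetic data where only two of ten features matter. The data and the `alpha` values are assumptions picked for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: features 0 and 1 are relevant, the other eight are noise
rng = np.random.default_rng(0)
X_sparse = rng.normal(size=(200, 10))
y_sparse = (3.0 * X_sparse[:, 0] - 2.0 * X_sparse[:, 1]
            + rng.normal(scale=0.1, size=200))

lasso_sparse = Lasso(alpha=0.5).fit(X_sparse, y_sparse)
ridge_sparse = Ridge(alpha=0.5).fit(X_sparse, y_sparse)

print("Lasso coefficients:", np.round(lasso_sparse.coef_, 3))
print("Ridge coefficients:", np.round(ridge_sparse.coef_, 3))
print("Exact zeros (lasso):", int(np.sum(lasso_sparse.coef_ == 0)))
print("Exact zeros (ridge):", int(np.sum(ridge_sparse.coef_ == 0)))
```

Lasso sets the coefficients of the irrelevant features to exactly zero, while ridge leaves them small but nonzero.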

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Example dataset: predicting house prices based on features
data = {
    'bedrooms': [3, 4, 2, 5, 3, 4, 3, 2, 4, 3],
    'square_feet': [1500, 2000, 1200, 2500, 1800, 2200, 1600, 1300, 1900, 1700],
    'age': [10, 15, 7, 20, 12, 18, 9, 5, 16, 11],
    'price': [400000, 500000, 350000, 600000, 450000, 480000, 420000, 380000, 490000, 460000]
}

# Converting to a pandas DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['bedrooms', 'square_feet', 'age']]  # Features
y = df['price']  # Target

# Split data into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (important for lasso regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train the lasso regression model
lasso_model = Lasso(alpha=0.1)  # alpha is the regularization strength (λ); 0.1 is very weak relative to prices in the hundreds of thousands
lasso_model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = lasso_model.predict(X_test_scaled)

# Evaluate the model using mean squared error (MSE)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # Root Mean Squared Error

print(f"Predicted Prices: {y_pred}")
print(f"Actual Prices: {y_test.values}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")


#### Explanation

**Regularization Parameter λ (`alpha` in scikit-learn):**

* **Low λ:** If λ is very small (`alpha` close to 0), the penalty has little effect and the model behaves much like ordinary linear regression.
* **High λ:** If λ is large, the coefficients are heavily penalized. This can simplify the model and reduce overfitting, but it can also cause underfitting if too many features are eliminated.
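The effect of the regularization strength can be sketched by fitting lasso at several `alpha` values on synthetic data. The data and the particular values tried are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: features 0 and 1 are relevant, the other eight are noise
rng = np.random.default_rng(0)
X_sweep = rng.normal(size=(200, 10))
y_sweep = (3.0 * X_sweep[:, 0] - 2.0 * X_sweep[:, 1]
           + rng.normal(scale=0.1, size=200))

# Unregularized baseline for comparison
ols_coef = LinearRegression().fit(X_sweep, y_sweep).coef_

sweep_coefs = {}
for alpha in [1e-4, 0.1, 1.0, 100.0]:
    sweep_coefs[alpha] = Lasso(alpha=alpha).fit(X_sweep, y_sweep).coef_
    n_nonzero = int(np.count_nonzero(sweep_coefs[alpha]))
    print(f"alpha={alpha:>8}: nonzero coefficients={n_nonzero}, "
          f"largest |coef|={np.max(np.abs(sweep_coefs[alpha])):.3f}")
```

At the smallest `alpha` the lasso coefficients are close to the ordinary least-squares ones. At the largest, every coefficient is zero and the model predicts only the mean of the target, which is an example of underfitting.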

**Key Differences from Ridge:**
* Lasso can zero out coefficients, making it better for feature selection.
* Ridge shrinks coefficients toward zero but generally does not set them exactly to zero.