# Understanding Lasso and Ridge Regression

This document explains Lasso and Ridge Regression, including their theory, usage, and practical applications, along with answers to common questions.

## What is Ridge Regression (L2 Regularization)?

Ridge Regression is a regularized variant of Linear Regression that reduces overfitting by adding a penalty term to the loss. The penalty is proportional to the sum of the squared coefficients (the squared L2 norm).

- **Purpose**: To reduce the impact of unimportant features by shrinking the coefficients.
- **Key Idea**: It minimizes the sum of the squared residuals (errors) and adds a penalty proportional to the square of the coefficients.
- **Formula**:
  $$
  \text{Loss} = \text{Residual Sum of Squares (RSS)} + \alpha \sum_{j=1}^n \beta_j^2
  $$
  where $\alpha \ge 0$ is the regularization parameter, $\beta_j$ are the feature coefficients, and $n$ is the number of features.
- **Effect**: All features are included, but their coefficients are shrunk towards zero. No coefficient becomes exactly zero.
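
As a minimal sketch of the formula above (the synthetic data and variable names are illustrative assumptions), the Ridge loss has a closed-form minimizer, $(X^\top X + \alpha I)^{-1} X^\top y$, when no intercept is fitted. The snippet compares that expression with scikit-learn's `Ridge` and then shows the coefficients shrinking as `alpha` grows, without any of them reaching exactly zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

# Closed-form Ridge solution without an intercept: (X^T X + alpha*I)^-1 X^T y
alpha = 1.0
beta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
beta_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
print("closed form :", np.round(beta_closed, 4))
print("scikit-learn:", np.round(beta_sklearn, 4))

# Larger alpha shrinks the coefficient vector, but no entry hits exactly zero
coef_norms = []
for a in [0.01, 1.0, 100.0, 10000.0]:
    coef = Ridge(alpha=a, fit_intercept=False).fit(X, y).coef_
    coef_norms.append(float(np.linalg.norm(coef)))
    print(f"alpha={a:>8}: coef={np.round(coef, 4)}")
```

The intercept is switched off here only so that the closed-form expression matches exactly; with an intercept, the same formula applies to centered data.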

### Use Cases of Ridge Regression:
1. When all features are important, but their impact needs to be controlled.
2. When there is multicollinearity (high correlation between features).
3. When overfitting is a concern in linear regression models.
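
The multicollinearity case can be illustrated with a small experiment. This is a sketch on synthetic data (an assumption, not taken from the text above): two features are nearly identical, so ordinary least squares has little information for deciding how to divide the effect between them, while Ridge yields smaller, more stable coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumed for illustration): x2 is almost a copy of x1
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS can split the effect between x1 and x2 almost arbitrarily;
# Ridge spreads it more evenly and keeps the coefficients small.
print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
```

In both fits the two coefficients should sum to roughly 3, the true combined effect; the difference lies in how that total is divided between the correlated features.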

## What is Lasso Regression (L1 Regularization)?

Lasso Regression is another regularized variant of Linear Regression. It adds a penalty proportional to the sum of the absolute values of the coefficients (the L1 norm) and is widely used for feature selection.

- **Purpose**: To reduce the number of irrelevant features by shrinking some coefficients to exactly zero.
- **Key Idea**: It minimizes the sum of the squared residuals (errors) and adds a penalty proportional to the sum of the absolute values of the coefficients.
- **Formula**:
  $$
  \text{Loss} = \text{Residual Sum of Squares (RSS)} + \alpha \sum_{j=1}^n |\beta_j|
  $$
- **Effect**: Some feature coefficients become exactly zero, effectively excluding them from the model.
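
The sparsity effect can be seen in a short sketch on synthetic data (the data, the seed, and the `alpha` values are assumptions chosen for illustration). Note that scikit-learn's `Lasso` scales the RSS term by `1 / (2 * n_samples)`, so its `alpha` is not numerically the same as the $\alpha$ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumed for illustration): only the first 3 of 10 features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([5.0, -4.0, 3.0] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Lasso sets the coefficients of irrelevant features to exactly 0.0;
# Ridge only makes them small.
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Features kept by Lasso:", np.flatnonzero(lasso.coef_))
```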

### Use Cases of Lasso Regression:
1. When irrelevant features need to be removed automatically.
2. When feature selection is required in high-dimensional datasets.
3. To prevent overfitting while also simplifying the model.

## Common Questions and Answers

1. **Why do we use Ridge and Lasso Regression?**
   - To improve the performance of Linear Regression by preventing overfitting.
   - Ridge controls the magnitude of coefficients, while Lasso can eliminate irrelevant features entirely.

2. **Where can we use Ridge Regression?**
   - Ridge Regression is useful in scenarios where all features are considered important, but their influence needs to be balanced. For example, predicting house prices using multiple correlated features like area, bedrooms, and location.

3. **Where can we use Lasso Regression?**
   - Lasso Regression is ideal when feature selection is required. For instance, in a medical dataset with hundreds of variables, Lasso can identify the most critical variables affecting patient health.

4. **How do we choose between Ridge and Lasso?**
   - Use Ridge when all features are important, and you want to control their impact.
   - Use Lasso when you suspect some features are irrelevant and want the model to exclude them automatically.

5. **What is the role of the $\alpha$ parameter in both methods?**
   - The $\alpha$ parameter controls the strength of regularization. A higher $\alpha$ value increases the penalty, shrinking the coefficients more aggressively.

6. **Can we use Ridge and Lasso together?**
   - Yes, Elastic Net combines Ridge and Lasso penalties, allowing both feature selection and coefficient shrinkage.
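
As a hedged sketch of the last answer, scikit-learn exposes this combination as `ElasticNet`, where `l1_ratio` sets the mix between the L1 and L2 penalties. The synthetic data and the parameter values below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data (assumed for illustration): 3 informative features out of 10
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ np.array([5.0, -4.0, 3.0] + [0.0] * 7) + rng.normal(scale=0.5, size=200)

# l1_ratio mixes the penalties: 1.0 is pure L1 (Lasso), values near 0 approach pure L2
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("Elastic Net coefficients:", np.round(enet.coef_, 3))
print("Non-zero coefficients:", np.count_nonzero(enet.coef_))

# Sanity check: with l1_ratio=1.0, Elastic Net gives the same fit as Lasso
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Matches Lasso:", np.allclose(enet_l1.coef_, lasso.coef_))
```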

## Steps to Use Ridge or Lasso Regression in Python

1. **Prepare the Dataset:**
   - Collect data and clean it (handle missing values and standardize the features, because the penalty depends on the scale of each coefficient).
2. **Split the Dataset:**
   - Divide the data into training and testing sets (e.g., 80% training, 20% testing).
3. **Fit the Model:**
   - Use `Ridge` or `Lasso` from scikit-learn (`sklearn.linear_model`) to train the model on the training dataset.
4. **Evaluate the Model:**
   - Test the model on unseen data (testing set) and check its performance using metrics like Mean Squared Error (MSE).
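
The four steps above can be sketched end to end as follows. This is one possible arrangement, not the only one: the synthetic dataset from `make_regression` stands in for real, cleaned data, and `LassoCV` is used so that cross-validation picks `alpha` instead of a hand-chosen value. `RidgeCV` could be swapped in the same way.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Prepare: synthetic data stands in for a cleaned dataset (an assumption)
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# 2. Split: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3. Fit: scale the features, then let 5-fold cross-validation pick alpha
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X_train, y_train)

# 4. Evaluate on the held-out test set
mse = mean_squared_error(y_test, model.predict(X_test))
r2 = model.score(X_test, y_test)
print("Chosen alpha:", model.named_steps["lassocv"].alpha_)
print("Test MSE:", mse)
print("Test R^2:", r2)
```

Putting the scaler inside the pipeline means it is fitted on the training data only, so no information from the test set leaks into the model.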

## Conclusion

- **Ridge Regression** is best for handling multicollinearity and controlling coefficient sizes when all features are relevant.
- **Lasso Regression** is best for feature selection when irrelevant features are present.
- Both methods trade a small increase in bias for lower variance, which often helps Linear Regression generalize better to unseen, real-world data.


Example: fitting Ridge and Lasso on a small housing dataset.

```python
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd

# Toy housing dataset (5 rows), for demonstration only
data = {
    "area": [1200, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 4, 5],
    "bathrooms": [1, 2, 2, 3, 3],
    "age": [10, 5, 20, 15, 8],
    "price": [200000, 250000, 300000, 350000, 400000]
}
df = pd.DataFrame(data)

# Features (X) and Target (y)
X = df[["area", "bedrooms", "bathrooms", "age"]]
y = df["price"]

# Split data into training and testing sets.
# With only 5 rows, test_size=0.2 leaves a single test sample,
# so the MSE values below are illustrative rather than reliable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge Regression
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
ridge_predictions = ridge.predict(X_test)

# Lasso Regression
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
lasso_predictions = lasso.predict(X_test)

# Evaluate Models
print("Ridge MSE:", mean_squared_error(y_test, ridge_predictions))
print("Lasso MSE:", mean_squared_error(y_test, lasso_predictions))

# Check coefficients
print("Ridge Coefficients:", ridge.coef_)
print("Lasso Coefficients:", lasso.coef_)
```

Output:

```
Ridge MSE: 459802525.60451496
Lasso MSE: 534638903.95028365
Ridge Coefficients: [ 112.41263586  -46.80416493 -257.59942843  988.15505279]
Lasso Coefficients: [  115.62352921  -669.34528569 -2036.99351894  1020.6422698 ]
```
