# Notebook 1: Regularization Techniques (Lasso, Ridge, Elastic Net)

Welcome to the first notebook in our advanced machine learning series under **Part_3_Advanced_Topics**. In this notebook, we will explore **Regularization Techniques**, which help prevent overfitting in regression models by adding a penalty on model complexity.

We'll cover the following topics:
- What is Regularization?
- Key concepts: Overfitting, L1 (Lasso), L2 (Ridge), and Elastic Net
- How Regularization works
- Implementation using scikit-learn
- Advantages and limitations

## What is Regularization?

Regularization is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function of a model. Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern, leading to poor generalization on unseen data.

Regularization methods like Lasso, Ridge, and Elastic Net constrain the model's complexity by penalizing large coefficients in regression models. This trades a small increase in bias for a reduction in variance, which often improves performance on test data.

## Key Concepts

- **Overfitting:** When a model is too complex and fits the training data too closely, including noise, resulting in poor performance on new data.
- **L1 Regularization (Lasso):** Adds the sum of the absolute values of the coefficients as a penalty term to the loss function. It can drive some coefficients to exactly zero, effectively performing feature selection.
- **L2 Regularization (Ridge):** Adds the sum of the squared coefficients as a penalty term. It shrinks coefficients towards zero but rarely to exactly zero, helping to reduce model sensitivity to individual features.
- **Elastic Net:** Combines L1 and L2 regularization, balancing feature selection (from Lasso) and coefficient shrinkage (from Ridge).
- **Hyperparameter (Alpha/Lambda):** Controls the strength of the regularization penalty. A higher value means more regularization, and a value of zero recovers ordinary least squares. scikit-learn calls this parameter `alpha`.
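
The difference between the L1 and L2 penalties described above can be seen by counting exactly-zero coefficients. The following is a minimal sketch, not a benchmark: the dataset settings and `alpha=5.0` are illustrative assumptions chosen only to make the effect visible.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 5 of which actually influence the target
# (all settings here are illustrative assumptions)
X, y = make_regression(n_samples=100, n_features=10, n_informative=5,
                       noise=15, random_state=0)

# Fit both models with the same (arbitrary) penalty strength
lasso = Lasso(alpha=5.0).fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)

# Count coefficients that are exactly zero
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print(f"Lasso: {n_zero_lasso} of {lasso.coef_.size} coefficients are exactly zero")
print(f"Ridge: {n_zero_ridge} of {ridge.coef_.size} coefficients are exactly zero")
```

Lasso typically removes several of the uninformative features here, while Ridge keeps every feature with a small, nonzero weight.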

## How Regularization Works

Regularization modifies the standard loss function (e.g., Mean Squared Error for regression) by adding a penalty term:

- **Standard Loss Function:** Minimize the error between predicted and actual values.
- **Regularized Loss Function:** Minimize the error + penalty term, where α ≥ 0 sets the penalty strength.
  - Lasso: Error + α * Σ|coefficients|
  - Ridge: Error + α * Σ(coefficients²)
  - Elastic Net: Error + α * [l1_ratio * Σ|coefficients| + 0.5 * (1 - l1_ratio) * Σ(coefficients²)]

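The penalty term discourages large coefficients, which often arise when a model fits noise, thus simplifying the model and improving generalization. Note that scikit-learn defines the error term differently across estimators: `Ridge` uses the sum of squared residuals, while `Lasso` and `ElasticNet` divide it by 2 * n_samples. The same `alpha` value therefore does not imply the same penalty strength for all three models.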
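
For Ridge, the penalized objective has a closed-form minimizer, which makes the effect of the penalty easy to check by hand. The sketch below assumes the error is the sum of squared residuals and omits the intercept. The data, `true_w`, and `alpha = 10.0` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Small synthetic problem (values are illustrative assumptions)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.5, size=50)

alpha = 10.0

def ridge_objective(w):
    """Sum of squared residuals plus alpha times the sum of squared coefficients."""
    return np.sum((y - X @ w) ** 2) + alpha * np.sum(w ** 2)

# Closed-form minimizer of the objective above: (X^T X + alpha * I)^-1 X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# The same problem solved by scikit-learn (no intercept, to match the formula)
w_sklearn = Ridge(alpha=alpha, fit_intercept=False, solver="cholesky").fit(X, y).coef_

# Unpenalized least squares for comparison
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("Closed form :", w_closed)
print("scikit-learn:", w_sklearn)
print("OLS         :", w_ols)
print("Coefficient norm, OLS vs Ridge:",
      np.linalg.norm(w_ols), np.linalg.norm(w_closed))
```

The two Ridge solutions agree up to numerical precision, and the Ridge coefficient vector has a smaller norm than the unpenalized one, which is the shrinkage the penalty term is meant to produce.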

## Implementation Using scikit-learn

Let's implement Linear Regression with and without regularization using scikit-learn. We'll compare the performance and observe the effect on model coefficients using a synthetic dataset.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate a synthetic dataset for regression with noise
X, y = make_regression(n_samples=100, n_features=10, n_informative=5, noise=15, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 1. Standard Linear Regression (No Regularization)
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)
r2_lr = r2_score(y_test, y_pred_lr)
print(f'Standard Linear Regression - MSE: {mse_lr:.2f}, R2 Score: {r2_lr:.2f}')
print(f'Coefficients (Linear Regression): {lr_model.coef_}')

# 2. Lasso Regression (L1 Regularization)
lasso_model = Lasso(alpha=1.0, random_state=42)
lasso_model.fit(X_train, y_train)
y_pred_lasso = lasso_model.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
r2_lasso = r2_score(y_test, y_pred_lasso)
print(f'Lasso Regression - MSE: {mse_lasso:.2f}, R2 Score: {r2_lasso:.2f}')
print(f'Coefficients (Lasso): {lasso_model.coef_}')

# 3. Ridge Regression (L2 Regularization)
ridge_model = Ridge(alpha=1.0, random_state=42)
ridge_model.fit(X_train, y_train)
y_pred_ridge = ridge_model.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
r2_ridge = r2_score(y_test, y_pred_ridge)
print(f'Ridge Regression - MSE: {mse_ridge:.2f}, R2 Score: {r2_ridge:.2f}')
print(f'Coefficients (Ridge): {ridge_model.coef_}')

# 4. Elastic Net (Combination of L1 and L2)
elastic_model = ElasticNet(alpha=1.0, l1_ratio=0.5, random_state=42)
elastic_model.fit(X_train, y_train)
y_pred_elastic = elastic_model.predict(X_test)
mse_elastic = mean_squared_error(y_test, y_pred_elastic)
r2_elastic = r2_score(y_test, y_pred_elastic)
print(f'Elastic Net Regression - MSE: {mse_elastic:.2f}, R2 Score: {r2_elastic:.2f}')
print(f'Coefficients (Elastic Net): {elastic_model.coef_}')

# Visualize the coefficients for comparison
plt.figure(figsize=(10, 6))
x = np.arange(len(lr_model.coef_))
plt.plot(x, lr_model.coef_, marker='o', label='Linear Regression')
plt.plot(x, lasso_model.coef_, marker='s', label='Lasso')
plt.plot(x, ridge_model.coef_, marker='^', label='Ridge')
plt.plot(x, elastic_model.coef_, marker='d', label='Elastic Net')
plt.xlabel('Feature Index')
plt.ylabel('Coefficient Value')
plt.title('Comparison of Coefficients with Different Regularization Techniques')
plt.legend()
plt.grid(True)
plt.show()
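
The cell above fixes `alpha=1.0` for every model. To see how the penalty strength changes the fit, one option is to sweep `alpha` for a single model and record how many coefficients survive. This is a rough sketch that regenerates the same synthetic data so it can run on its own. The grid of `alphas` and `max_iter=10_000` are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the main example
X, y = make_regression(n_samples=100, n_features=10, n_informative=5,
                       noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Illustrative grid of penalty strengths (an assumption, not a recommendation)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

results = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_train, y_train)
    n_nonzero = int(np.count_nonzero(model.coef_))   # features kept by Lasso
    l1_norm = float(np.sum(np.abs(model.coef_)))     # size of the L1 penalty term
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results.append((alpha, n_nonzero, l1_norm, test_mse))
    print(f"alpha={alpha:>6}: nonzero={n_nonzero:2d}, "
          f"sum|coef|={l1_norm:8.2f}, test MSE={test_mse:10.2f}")
```

Very small values of `alpha` behave almost like ordinary least squares, while very large values shrink most or all coefficients to zero and underfit.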

## Advantages and Limitations

**Advantages:**
- Reduces overfitting by penalizing large coefficients, which usually leads to better generalization on unseen data.
- Lasso can perform feature selection by setting some coefficients to zero, simplifying the model.
- Ridge handles multicollinearity well: it shrinks the coefficients of correlated features together, which stabilizes estimates that ordinary least squares would leave highly variable.
- Elastic Net combines the benefits of both Lasso and Ridge, offering a balance between feature selection and coefficient shrinkage.
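
The multicollinearity point can be illustrated by fitting the same model on many freshly drawn datasets and measuring how much the coefficients vary. This is a minimal sketch under made-up assumptions: two nearly identical features, true coefficients of 3 each, and a hypothetical helper `make_collinear_data`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

def make_collinear_data(n_samples=200):
    """Hypothetical helper: two almost identical features, true coefficients 3 and 3."""
    x1 = rng.normal(size=n_samples)
    x2 = x1 + rng.normal(scale=0.01, size=n_samples)  # near-duplicate of x1
    X = np.column_stack([x1, x2])
    y = 3 * x1 + 3 * x2 + rng.normal(scale=1.0, size=n_samples)
    return X, y

# Refit both models on 50 independent datasets and collect the coefficients
ols_coefs, ridge_coefs = [], []
for _ in range(50):
    X, y = make_collinear_data()
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=1.0).fit(X, y).coef_)

# Spread of each coefficient across the repeated fits
ols_std = np.std(ols_coefs, axis=0)
ridge_std = np.std(ridge_coefs, axis=0)

print("Std of OLS coefficients  :", ols_std)
print("Std of Ridge coefficients:", ridge_std)
print("Mean Ridge coefficients  :", np.mean(ridge_coefs, axis=0))
```

The unpenalized coefficients swing widely from one dataset to the next because the data cannot distinguish the two features, whereas the Ridge estimates stay close to each other and to the true values.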

**Limitations:**
- Requires tuning of hyperparameters (alpha for penalty strength, l1_ratio for Elastic Net) to achieve optimal performance.
- Does not help, and can hurt, when the model is already underfitting or when overfitting is not the primary issue, because the penalty adds bias.
- Lasso may arbitrarily select one feature among highly correlated ones, which can be undesirable for interpretability.
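
The tuning burden mentioned above is usually handled with cross-validation. scikit-learn provides `LassoCV`, `RidgeCV`, and `ElasticNetCV` for this. The sketch below is one possible setup: the `alphas` grid, the `l1_ratio` candidates, and the 5-fold setting are assumptions to adapt to your data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the main example
X, y = make_regression(n_samples=100, n_features=10, n_informative=5,
                       noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Candidate penalty strengths on a log scale (illustrative grid)
alphas = np.logspace(-3, 2, 30)
l1_ratios = [0.1, 0.5, 0.9]

# Each estimator picks its hyperparameters by 5-fold cross-validation
# on the training data only
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000).fit(X_train, y_train)
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)
enet_cv = ElasticNetCV(alphas=alphas, l1_ratio=l1_ratios, cv=5,
                       max_iter=10_000).fit(X_train, y_train)

print(f"Lasso:       alpha={lasso_cv.alpha_:.4f}, "
      f"test R2={lasso_cv.score(X_test, y_test):.3f}")
print(f"Ridge:       alpha={ridge_cv.alpha_:.4f}, "
      f"test R2={ridge_cv.score(X_test, y_test):.3f}")
print(f"Elastic Net: alpha={enet_cv.alpha_:.4f}, l1_ratio={enet_cv.l1_ratio_}, "
      f"test R2={enet_cv.score(X_test, y_test):.3f}")
```

The test set is used only for the final score, so the selected `alpha` values are not tuned on the data used for evaluation.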

## Conclusion

Regularization techniques like Lasso, Ridge, and Elastic Net are crucial tools in machine learning for managing model complexity and reducing overfitting. By understanding the differences between L1 and L2 penalties, you can choose the appropriate method for your dataset and problem, whether you need feature selection, robustness to multicollinearity, or a balance of both.

In the next notebook, we will explore another advanced topic to further enhance our machine learning toolkit.