# Regularization (Ridge and Lasso)

## Problem Type
**Regularization** is primarily used for:
- **Regression** and **Classification** problems
- **Supervised** learning

### How Regularization Works
- **Introduces a penalty term** to the loss function to discourage overly complex models, which helps prevent overfitting.
- **Ridge Regularization (L2):**
  - Adds the sum of squared coefficients as a penalty term.
  - Shrinks coefficients toward zero but generally does not set any exactly to zero, so all features are retained.
- **Lasso Regularization (L1):**
  - Adds the sum of the absolute values of the coefficients as a penalty term.
  - Can shrink some coefficients exactly to zero, effectively performing feature selection.
- **Elastic Net:**
  - Combines both L1 and L2 penalties, balancing the strengths of Ridge and Lasso.
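
The difference between the L2 and L1 penalties described above can be seen by counting exact zeros among the fitted coefficients. The following is a minimal sketch on synthetic data; the dataset from `make_regression` and its sizes are assumptions chosen for illustration, not part of the housing example used later.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (illustrative assumption): 20 features, only 5 carry signal.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Count coefficients that are exactly zero under each penalty.
ridge_zeros = int(np.sum(ridge.coef_ == 0))
lasso_zeros = int(np.sum(lasso.coef_ == 0))

print(f"Ridge coefficients set to zero: {ridge_zeros} of {ridge.coef_.size}")
print(f"Lasso coefficients set to zero: {lasso_zeros} of {lasso.coef_.size}")
```

On data like this, Lasso would be expected to zero out many of the uninformative features while Ridge keeps all of them small but non-zero. The exact counts depend on `alpha` and the noise level.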

### Key Tuning Parameters
- **`alpha`:**
  - **Description:** Controls the regularization strength for Ridge, Lasso, and Elastic Net (scikit-learn's `alpha` plays the role of λ).
  - **Impact:** Larger values of `alpha` increase the regularization, which reduces overfitting but can lead to underfitting if set too high.
  - **Default:** `alpha = 1.0`.
- **`l1_ratio`:**
  - **Description:** Used in Elastic Net to mix L1 and L2 penalties. `l1_ratio = 0` gives a pure L2 (Ridge-style) penalty, while `l1_ratio = 1` gives a pure L1 (Lasso) penalty.
  - **Impact:** Determines the balance between L1 and L2 regularization in Elastic Net.
  - **Default:** `0.5`.
- **`fit_intercept`:**
  - **Description:** Whether to calculate the intercept for this model.
  - **Impact:** Setting to `False` forces the fitted model to pass through the origin, which is appropriate only when the data are already centered.
  - **Default:** `True`, calculating the intercept.
- **`max_iter`:**
  - **Description:** Maximum number of iterations the solver may run before stopping.
  - **Impact:** Increase it if the solver reports that it did not converge.
  - **Default:** `1000` for `Lasso` and `ElasticNet`; `None` for `Ridge`, where the limit depends on the chosen solver.
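
Rather than fixing `alpha` and `l1_ratio` by hand, they can be chosen by cross-validation. The sketch below uses scikit-learn's `RidgeCV`, `LassoCV`, and `ElasticNetCV` on synthetic data; the candidate grid `alphas`, the `l1_ratio` candidates, and the dataset are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV

# Synthetic stand-in data (assumption), so the sketch runs without a download.
X, y = make_regression(
    n_samples=300, n_features=15, n_informative=5, noise=10.0, random_state=0
)

# Candidate regularization strengths on a log scale (hypothetical grid).
alphas = np.logspace(-3, 2, 20)
l1_ratios = [0.2, 0.5, 0.8]

# RidgeCV uses an efficient leave-one-out scheme when cv is not given.
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000).fit(X, y)
enet_cv = ElasticNetCV(
    alphas=alphas, l1_ratio=l1_ratios, cv=5, max_iter=10_000
).fit(X, y)

print(f"Ridge: best alpha = {ridge_cv.alpha_:.4g}")
print(f"Lasso: best alpha = {lasso_cv.alpha_:.4g}")
print(
    f"ElasticNet: best alpha = {enet_cv.alpha_:.4g}, "
    f"best l1_ratio = {enet_cv.l1_ratio_}"
)
```

The selected values are only as good as the candidate grid, so a best `alpha` that lands on the edge of the grid suggests widening it.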

### Pros vs Cons

| Pros | Cons |
|------|------|
| Helps prevent overfitting by penalizing large coefficients | Can lead to underfitting if regularization is too strong |
| Lasso can perform automatic feature selection | Ridge does not reduce coefficients to zero, so it cannot drop features |
| Elastic Net balances the benefits of Lasso and Ridge | Lasso tends to keep one feature from a correlated group and drop the rest |
| Improves generalization of the model | Adds hyperparameters (`alpha`, `l1_ratio`) that must be tuned |
| Useful when dealing with multicollinearity | Requires careful selection of regularization strength, typically by cross-validation |

### Evaluation Metrics
- **Mean Squared Error (MSE):**
  - **Description:** Average of squared errors between predicted and actual values.
  - **Good Value:** Lower values indicate smaller errors on held-out data, suggesting the model balances bias and variance well. MSE is expressed in squared units of the target.
  - **Bad Value:** Higher values suggest poor model performance, which may come from too much regularization (underfitting) or too little (overfitting).
- **R-squared (R²):**
  - **Description:** Proportion of variance in the dependent variable that is predictable from the independent variables.
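  - **Good Value:** Closer to 1 indicates a good fit. In regularized models, judge it on held-out data rather than on the training set.
  - **Bad Value:** Close to 0 suggests the model explains little of the variance; on test data R² can also be negative, meaning the model does worse than predicting the mean.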
- **Cross-Validation Score:**
  - **Description:** Average performance across multiple subsets of the dataset, useful for assessing model generalization.
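  - **Good Value:** Higher scores indicate better generalization. scikit-learn scorers follow a "higher is better" convention, so error metrics are reported negated (for example `neg_mean_squared_error`).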
  - **Bad Value:** Low or highly variable scores across folds suggest poor generalization or model instability.
- **Coefficient Magnitudes:**
  - **Description:** Size of the model coefficients; regularization should ideally reduce these without sacrificing accuracy. Magnitudes are comparable across features only when the features are on similar scales.
  - **Good Value:** Smaller coefficients, with held-out accuracy preserved, indicate effective regularization.
  - **Bad Value:** Very large or unstable coefficients may suggest insufficient regularization.
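
The four metrics above can be computed together for a single model. This is a minimal sketch on synthetic data (the dataset, the 5-fold setting, and the variable names are assumptions for illustration); the same calls apply to any fitted regressor.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption), split the same way as in the notebook.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = Ridge(alpha=1.0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Held-out MSE and R2.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Cross-validated MSE: the scorer returns negated MSE, so flip the sign.
cv_scores = cross_val_score(
    Ridge(alpha=1.0), X_train, y_train, cv=5, scoring="neg_mean_squared_error"
)
cv_mse = -cv_scores

# Coefficient magnitudes, summarized by the largest absolute value.
max_abs_coef = float(np.max(np.abs(model.coef_)))

print(f"Test MSE: {mse:.2f}")
print(f"Test R2: {r2:.3f}")
print(f"CV MSE: mean {cv_mse.mean():.2f}, std {cv_mse.std():.2f}")
print(f"Largest |coefficient|: {max_abs_coef:.2f}")
```

A large standard deviation across folds relative to the mean is the "highly variable scores" signal described above.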



In [7]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
housing = fetch_california_housing()
print(housing.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per ce

In [3]:
dir(housing)

['DESCR', 'data', 'feature_names', 'frame', 'target', 'target_names']

In [4]:
X, y = housing.data, housing.target

In [5]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

### Ridge

In [8]:
# Initialize Ridge regression model with L2 regularization
# You can adjust the alpha (lambda) value
ridge_model = Ridge(
    alpha=1.0, fit_intercept=True, max_iter=1000, solver="auto", random_state=42
)

# Fit the model to the training data
ridge_model.fit(X_train, y_train)

# Predict on the test set
y_pred = ridge_model.predict(X_test)

# Calculate Mean Squared Error (MSE) and R2 score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (L2 Regularization): {mse:.2f}")
print(f'R2 score (L2 Regularization): {r2}')

Mean Squared Error (L2 Regularization): 0.56
R2 score (L2 Regularization): 0.5758549611440139


### Lasso

In [9]:
# Initialize Lasso regression model with L1 regularization
# You can adjust the alpha (lambda) value
lasso_model = Lasso(
    alpha=1.0, fit_intercept=True, max_iter=1000, random_state=42
)

# Fit the model to the training data
lasso_model.fit(X_train, y_train)

# Predict on the test set
y_pred = lasso_model.predict(X_test)

# Calculate Mean Squared Error (MSE) and R2 score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (L1 Regularization): {mse:.2f}")
print(f'R2 score (L1 Regularization): {r2}')

Mean Squared Error (L1 Regularization): 0.94
R2 score (L1 Regularization): 0.2841671821008396


### ElasticNet

In [10]:
# Initialize ElasticNet regression model with both L1 and L2 regularization
# You can adjust the alpha (lambda) value and the l1_ratio
elasticnet_model = ElasticNet(
    alpha=1.0, l1_ratio=0.5, fit_intercept=True, max_iter=1000, random_state=42
)

# Fit the model to the training data
elasticnet_model.fit(X_train, y_train)

# Predict on the test set
y_pred = elasticnet_model.predict(X_test)

# Calculate Mean Squared Error (MSE) and R2 score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (ElasticNet): {mse:.2f}")
print(f'R2 score (ElasticNet): {r2}')

Mean Squared Error (ElasticNet): 0.76
R2 score (ElasticNet): 0.41655189098028234
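
L1 and L2 penalties act on raw coefficient sizes, so they depend on the scale of each feature. The models above were fitted on unscaled features with a fixed `alpha=1.0`; whether that explains the lower Lasso and ElasticNet scores was not checked here. The sketch below only illustrates the scale effect on synthetic data, where the feature scales, true coefficients, and noise level are all hypothetical. It compares a plain `Lasso` with one wrapped in a `StandardScaler` pipeline.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 500

# Hypothetical features whose standard deviations span several orders of magnitude.
scales = np.array([100.0, 10.0, 1.0, 0.1, 0.01])
X = rng.normal(size=(n_samples, scales.size)) * scales

# Every feature contributes equally to y once its scale is accounted for.
true_coef = 5.0 / scales
y = X @ true_coef + rng.normal(scale=1.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Lasso on raw features: small-scale features need large coefficients,
# which the L1 penalty discourages most strongly.
unscaled = Lasso(alpha=1.0).fit(X_train, y_train)
unscaled_r2 = r2_score(y_test, unscaled.predict(X_test))

# Same model, with features standardized inside a pipeline.
scaled = make_pipeline(StandardScaler(), Lasso(alpha=1.0)).fit(X_train, y_train)
scaled_r2 = r2_score(y_test, scaled.predict(X_test))

print(f"Unscaled Lasso R2: {unscaled_r2:.3f}")
print(f"Scaled Lasso R2: {scaled_r2:.3f}")
print(f"Unscaled coefficients set to zero: {int(np.sum(unscaled.coef_ == 0))}")
```

To try the same idea on the housing data, the pipeline could be fitted on the notebook's `X_train` and `y_train` in place of the synthetic arrays, ideally with `alpha` chosen by cross-validation rather than left at `1.0`.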
