# Module 1: Introduction to Scikit-Learn

## Part 4: ElasticNet Regression

In this part, we will explore ElasticNet regression, a linear regression technique that combines the strengths of both Lasso and Ridge regression.

### 4.1 Understanding ElasticNet regression

ElasticNet regression is a linear regression technique that extends ordinary least squares regression by adding both L1 and L2 penalties to the objective function. The L1 penalty promotes sparsity by driving some coefficients to exactly zero (similar to Lasso regression), while the L2 penalty shrinks the remaining coefficients towards zero (similar to Ridge regression).

The key idea behind ElasticNet regression is to find a balance between fitting the training data well, selecting relevant features, and keeping the model coefficients small. By adding both L1 and L2 penalties, ElasticNet regression provides a flexible approach for variable selection and regularization.

In Elastic Net regression, the objective is to minimize the following cost function:

$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2+\alpha_1\sum_{j=1}^{n}|\theta_j|+\alpha_2\sum_{j=1}^{n}\theta_j^2$

Where:

- $m$ is the number of training examples.
- $h_\theta(x^{(i)})$ is the predicted output for the $i$-th training example using the model parameters $\theta$.
- $y^{(i)}$ is the actual output for the $i$-th training example.
- $n$ is the number of features (excluding the bias term).
- $\theta_j$ represents the value of the coefficient of the $j$-th feature.
- $\alpha_1$ and $\alpha_2$ are the regularization parameters for the L1 and L2 penalties, respectively.
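
To make the terms of the cost function concrete, here is a minimal NumPy sketch that evaluates it for a linear model $h_\theta(x) = x \cdot \theta + b$, reading the L1 term as $\alpha_1\sum_j|\theta_j|$ and the L2 term as $\alpha_2\sum_j\theta_j^2$. The function name `elastic_net_cost`, the separate `bias` argument, and the toy numbers are assumptions made for illustration only.

```python
import numpy as np


def elastic_net_cost(theta, bias, X, y, alpha_1, alpha_2):
    """Evaluate J(theta) for the linear model h(x) = x @ theta + bias."""
    m = X.shape[0]                                # number of training examples
    residuals = X @ theta + bias - y              # h_theta(x_i) - y_i
    data_term = np.sum(residuals ** 2) / (2 * m)  # squared-error term
    l1_term = alpha_1 * np.sum(np.abs(theta))     # L1 penalty (bias is not penalized)
    l2_term = alpha_2 * np.sum(theta ** 2)        # L2 penalty (bias is not penalized)
    return data_term + l1_term + l2_term


# Toy data chosen only to illustrate the call (hypothetical values)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.5, -0.5])

cost = elastic_net_cost(theta, 0.0, X, y, alpha_1=0.1, alpha_2=0.2)
print("J(theta) =", cost)
```

Setting `alpha_2=0` leaves a Lasso-style cost and setting `alpha_1=0` leaves a Ridge-style cost, which is one way to see that ElasticNet contains both as special cases.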

ElasticNet regression is especially good at dealing with correlated features. Lasso regression tends to pick just one of the correlated features and eliminate the others, whereas Ridge regression tends to shrink the coefficients of all the correlated features together.
By combining the Lasso and Ridge penalties, ElasticNet regression groups the coefficients of correlated features and shrinks them together, either keeping the whole group in the equation or removing it all at once.
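
The grouping behaviour can be explored with a small experiment. The sketch below uses synthetic data in which two columns are exact copies of each other (the extreme case of correlation); the data, the alpha value, and the tight tolerance are all assumptions chosen for illustration. With identical columns, the L2 part of the ElasticNet penalty favours splitting the weight evenly between the copies, whereas Lasso has no such preference and typically concentrates the weight on one of them.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Columns 0 and 1 are exact copies of each other; column 2 is unrelated noise.
X = np.column_stack([x, x, rng.normal(size=200)])
y = 3.0 * x + 0.1 * rng.normal(size=200)

# A tight tolerance so both solvers get close to their optimum
lasso = Lasso(alpha=0.1, tol=1e-8, max_iter=100000).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, tol=1e-8, max_iter=100000).fit(X, y)

print("Lasso coefficients:     ", lasso.coef_)
print("ElasticNet coefficients:", enet.coef_)
```

Comparing the first two entries of each coefficient vector shows how each model distributes weight across the duplicated feature.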

Advantages of Elastic Net Regression
1. Combines L1 and L2 Regularization: Elastic Net regression combines the strengths of L1 and L2 regularization, which makes it more flexible and better suited for handling high-dimensional datasets.
2. Feature Selection and Coefficient Shrinkage: Elastic Net regression can perform both feature selection and coefficient shrinkage. It sets some coefficients exactly to zero (feature selection) and shrinks others towards zero (coefficient shrinkage).
3. Robust to Multicollinearity: Elastic Net regression is more robust to multicollinearity compared to Lasso regression, making it suitable for datasets with highly correlated features.
4. Stability: Elastic Net regression provides a stable solution, even when the number of features is much larger than the number of samples.
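
The feature selection and shrinkage behaviour listed above can be checked on a small synthetic problem. This is a sketch under stated assumptions: the data are randomly generated, only the first two of ten features influence the target, and the alpha and l1_ratio values are arbitrary choices rather than tuned ones.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))

true_coef = np.zeros(10)
true_coef[:2] = [3.0, -2.0]  # only the first two features matter
y = X @ true_coef + 0.1 * rng.normal(size=200)

model = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

# Coefficients that the L1 penalty has driven exactly to zero
n_zero = int(np.sum(model.coef_ == 0))

print("True coefficients:  ", true_coef)
print("Fitted coefficients:", np.round(model.coef_, 3))
print("Coefficients set exactly to zero:", n_zero)
```

The fitted values for the two informative features are smaller in magnitude than the true values (shrinkage), while irrelevant features tend to be removed entirely (selection).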

Disadvantages of Elastic Net Regression
1. Hyperparameter Tuning: Elastic Net regression requires tuning two hyperparameters ($\alpha_1$ and $\alpha_2$), which can be challenging and time-consuming.
2. Computationally Intensive: Training an Elastic Net regression model can be computationally intensive, especially for large datasets and high-dimensional feature spaces.

### 4.2 Training and Evaluation

In Scikit-Learn, Elastic Net regression can be implemented using the ElasticNet class from the linear_model module. The regularization parameters $\alpha_1$ and $\alpha_2$ are not set directly. Instead, the $\alpha$ hyperparameter sets the overall regularization strength and the $l1\_ratio$ hyperparameter determines the balance between L1 and L2 regularization, so that $\alpha_1 = \alpha \cdot l1\_ratio$ and $\alpha_2 = \frac{1}{2}\alpha(1 - l1\_ratio)$. An $l1\_ratio$ of 1 corresponds to pure Lasso regression, and a value of 0 corresponds to a pure L2 (Ridge) penalty. Cross-validation techniques, such as grid search or randomized search, can be used to find the optimal values of alpha and l1_ratio.
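
The relationship between the two parameterizations can be written as a pair of small helpers. Scikit-Learn's documented ElasticNet objective uses an L1 weight of alpha * l1_ratio and an L2 weight of 0.5 * alpha * (1 - l1_ratio); the helper names `to_sklearn_params` and `from_sklearn_params` below are hypothetical and are not part of Scikit-Learn.

```python
from sklearn.linear_model import ElasticNet


def to_sklearn_params(alpha_1, alpha_2):
    """Convert separate L1/L2 strengths into (alpha, l1_ratio)."""
    alpha = alpha_1 + 2.0 * alpha_2
    if alpha <= 0:
        raise ValueError("At least one of alpha_1, alpha_2 must be positive.")
    return alpha, alpha_1 / alpha


def from_sklearn_params(alpha, l1_ratio):
    """Recover the separate L1/L2 strengths from (alpha, l1_ratio)."""
    alpha_1 = alpha * l1_ratio
    alpha_2 = 0.5 * alpha * (1.0 - l1_ratio)
    return alpha_1, alpha_2


alpha, l1_ratio = to_sklearn_params(alpha_1=0.3, alpha_2=0.1)
print("alpha:", alpha, "l1_ratio:", l1_ratio)

# The converted values can be passed straight to the estimator
model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
```

Working in terms of alpha and l1_ratio is often more convenient for tuning, because one parameter controls how much regularization is applied and the other controls what kind.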

To train an ElasticNet regression model, we need a labeled dataset with the target variable and the corresponding feature values. The model learns by minimizing the regularized objective function, which includes the sum of squared residuals from the ordinary least squares regression, the L1 penalty term, and the L2 penalty term.

Once trained, we can use the ElasticNet regression model to make predictions for new, unseen data points. The model predicts the target values based on the learned coefficients and the feature values.
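
As a quick check of how predictions are formed, the sketch below fits a model on synthetic data (an assumption made for illustration) and reproduces the output of `predict` by hand from the fitted `coef_` and `intercept_` attributes.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.5, 0.0, -2.0]) + 0.5
X_new = rng.normal(size=(5, 3))  # unseen data points

model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_train, y_train)

# A prediction is the learned intercept plus the weighted sum of the features
manual_predictions = X_new @ model.coef_ + model.intercept_
library_predictions = model.predict(X_new)

print("Manual and library predictions match:",
      np.allclose(manual_predictions, library_predictions))
```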

Scikit-Learn provides the ElasticNet class for performing ElasticNet regression. Here's an example of how to use it:

In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

np.random.seed(42)  # Set a fixed seed for reproducibility
# Load the Breast Cancer dataset (its target is a binary 0/1 label, treated here as a numeric value)
data = load_breast_cancer()
X, y = data.data, data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit and evaluate linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
linear_predictions = linear_model.predict(X_test)
linear_mae = mean_absolute_error(y_test, linear_predictions)
linear_r2 = r2_score(y_test, linear_predictions)
print("Linear Model Mean Absolute Error (MAE):", linear_mae)
print("Linear Model R-squared (R2) Score:", linear_r2)

# Fit and evaluate Lasso regression model
lasso_model = Lasso(max_iter=100000, alpha=1.0)  # Setting alpha to 1.0 to add regularization (L1 penalty)
lasso_model.fit(X_train, y_train)
lasso_predictions = lasso_model.predict(X_test)
lasso_mae = mean_absolute_error(y_test, lasso_predictions)
lasso_r2 = r2_score(y_test, lasso_predictions)
print("\nLasso Model Mean Absolute Error (MAE):", lasso_mae)
print("Lasso Model R-squared (R2) Score:", lasso_r2)

# Fit and evaluate Ridge regression model
ridge_model = Ridge(max_iter=100000, alpha=1.0)  # Setting alpha to 1.0 to add regularization (L2 penalty)
ridge_model.fit(X_train, y_train)
ridge_predictions = ridge_model.predict(X_test)
ridge_mae = mean_absolute_error(y_test, ridge_predictions)
ridge_r2 = r2_score(y_test, ridge_predictions)
print("\nRidge Model Mean Absolute Error (MAE):", ridge_mae)
print("Ridge Model R-squared (R2) Score:", ridge_r2)

# Fit and evaluate ElasticNet regression model
elasticnet_model = ElasticNet(max_iter=100000, alpha=1.0)  # Setting alpha to 1.0 to add regularization (L1 and L2 penalties; l1_ratio defaults to 0.5)
elasticnet_model.fit(X_train, y_train)
elasticnet_predictions = elasticnet_model.predict(X_test)
elasticnet_mae = mean_absolute_error(y_test, elasticnet_predictions)
elasticnet_r2 = r2_score(y_test, elasticnet_predictions)
print("\nElasticnet Model Mean Absolute Error (MAE):", elasticnet_mae)
print("Elasticnet Model R-squared (R2) Score:", elasticnet_r2)

In this example, we compared the performance of four different regression models on the Breast Cancer dataset from Scikit-Learn. The metrics used for evaluation were Mean Absolute Error (MAE) and R-squared (R2) score.

Results showed that the Ridge Regression model had the lowest MAE (0.1889) and the highest R2 score (0.7472), indicating the best overall performance among the tested models. The Linear Regression model performed well with a slightly higher MAE (0.1969) and a reasonably high R2 score (0.7271).

The Lasso Regression model had a higher MAE (0.2394) and a lower R2 score (0.6233) compared to the other models. ElasticNet Regression achieved intermediate results with an MAE of 0.2255 and an R2 score of 0.6705.

Overall, the Ridge Regression model demonstrated the best performance on this dataset, but additional hyperparameter tuning, such as adjusting the alpha parameter, could potentially improve the performance.

### 4.3 Hyperparameter tuning

ElasticNet regression has a hyperparameter called alpha that controls the strength of the regularization. Higher values of alpha result in stronger regularization and smaller coefficients, so the choice of alpha reflects a trade-off between bias and variance. The other hyperparameter is l1_ratio, which determines the balance between L1 and L2 regularization: a value of 1 corresponds to pure Lasso regression, and a value of 0 corresponds to pure Ridge regression.
Cross-validation techniques, such as grid search or randomized search, can be used to find the optimal values of alpha and l1_ratio.

In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_absolute_error, r2_score

np.random.seed(42)  # Set a fixed seed for reproducibility
# Load the Breast Cancer dataset (its target is a binary 0/1 label, treated here as a numeric value)
data = load_breast_cancer()
X, y = data.data, data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit and evaluate linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
linear_predictions = linear_model.predict(X_test)
linear_mae = mean_absolute_error(y_test, linear_predictions)
linear_r2 = r2_score(y_test, linear_predictions)
print("Linear Model Mean Absolute Error (MAE):", linear_mae)
print("Linear Model R-squared (R2) Score:", linear_r2)

# Perform cross-validated hyperparameter tuning for Lasso regression
lasso_params = {'alpha': np.logspace(-3, 5, 100)}  # Define a range of alpha values
lasso_model = Lasso(max_iter=100000)
lasso_cv = GridSearchCV(lasso_model, lasso_params, cv=5)
lasso_cv.fit(X_train, y_train)
best_lasso_alpha = lasso_cv.best_params_['alpha']
print("\nBest Lasso alpha found:", best_lasso_alpha)
# Fit and evaluate Lasso regression model
lasso_model = Lasso(max_iter=100000, alpha=best_lasso_alpha)  # Setting alpha to best alpha found to add regularization (L1 penalty)
lasso_model.fit(X_train, y_train)
lasso_predictions = lasso_model.predict(X_test)
lasso_mae = mean_absolute_error(y_test, lasso_predictions)
lasso_r2 = r2_score(y_test, lasso_predictions)
print("Lasso Model Mean Absolute Error (MAE):", lasso_mae)
print("Lasso Model R-squared (R2) Score:", lasso_r2)

# Perform cross-validated hyperparameter tuning for Ridge regression
ridge_params = {'alpha': np.logspace(-3, 5, 100)}  # Define a range of alpha values
ridge_model = Ridge(max_iter=100000)
ridge_cv = GridSearchCV(ridge_model, ridge_params, cv=5)
ridge_cv.fit(X_train, y_train)
best_ridge_alpha = ridge_cv.best_params_['alpha']
print("\nBest Ridge alpha found:", best_ridge_alpha)
# Fit and evaluate Ridge regression model
ridge_model = Ridge(max_iter=100000, alpha=best_ridge_alpha)  # Setting alpha to best alpha found to add regularization (L2 penalty)
ridge_model.fit(X_train, y_train)
ridge_predictions = ridge_model.predict(X_test)
ridge_mae = mean_absolute_error(y_test, ridge_predictions)
ridge_r2 = r2_score(y_test, ridge_predictions)
print("Ridge Model Mean Absolute Error (MAE):", ridge_mae)
print("Ridge Model R-squared (R2) Score:", ridge_r2)

# Perform cross-validated hyperparameter tuning for ElasticNet regression
elasticnet_params = {'alpha': np.logspace(-3, 5, 100), 'l1_ratio': [0.1, 0.5, 0.7, 0.9]}  # Define a range of alpha and l1_ratio values
elasticnet_model = ElasticNet(max_iter=100000)
elasticnet_cv = GridSearchCV(elasticnet_model, elasticnet_params, cv=5)
elasticnet_cv.fit(X_train, y_train)
best_elasticnet_alpha = elasticnet_cv.best_params_['alpha']
best_elasticnet_l1_ratio = elasticnet_cv.best_params_['l1_ratio']
print("\nBest ElasticNet alpha found:", best_elasticnet_alpha)
print("Best ElasticNet l1_ratio found:", best_elasticnet_l1_ratio)
# Fit and evaluate ElasticNet regression model
elasticnet_model = ElasticNet(max_iter=100000, alpha=best_elasticnet_alpha, l1_ratio=best_elasticnet_l1_ratio)
elasticnet_model.fit(X_train, y_train)
elasticnet_predictions = elasticnet_model.predict(X_test)
elasticnet_mae = mean_absolute_error(y_test, elasticnet_predictions)
elasticnet_r2 = r2_score(y_test, elasticnet_predictions)
print("ElasticNet Model Mean Absolute Error (MAE):", elasticnet_mae)
print("ElasticNet Model R-squared (R2) Score:", elasticnet_r2)

In this example, we compared the performance of Linear Regression, Lasso Regression, Ridge Regression, and ElasticNet Regression on the breast cancer dataset using Mean Absolute Error (MAE) and R-squared (R2) as evaluation metrics.

The Linear Regression model achieved an MAE of 0.1969 and an R2 score of 0.7271 on the test set. After hyperparameter tuning, the best Lasso model with an alpha of 0.001 had an MAE of 0.1907 and an R2 score of 0.7443. The best Ridge model with an alpha of 0.0236 achieved an MAE of 0.1907 and an R2 score of 0.7570. The best ElasticNet model with an alpha of 0.001 and l1_ratio of 0.1 had an MAE of 0.1888 and an R2 score of 0.7509.

In summary, all three regularized models improved on plain Linear Regression after tuning. ElasticNet achieved the lowest MAE (0.1888), Ridge achieved the highest R2 score (0.7570), and Lasso matched Ridge on MAE (0.1907) with a slightly lower R2 score (0.7443). The differences between the tuned models are small, so the choice of the regression model should be based on the specific dataset characteristics and the desired level of regularization and feature selection.
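
Randomized search, mentioned earlier as an alternative to grid search, samples a fixed number of hyperparameter combinations instead of trying every point on a grid. The following is a minimal sketch using `RandomizedSearchCV`; the synthetic dataset from `make_regression`, the sampling ranges, and `n_iter=20` are assumptions chosen to keep the example fast, not values taken from the experiments above.

```python
from scipy.stats import loguniform, uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption): 20 features, 5 of them informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)

param_distributions = {
    "alpha": loguniform(1e-3, 1e1),  # sample alpha on a log scale
    "l1_ratio": uniform(0.1, 0.8),   # uniform(loc, scale) covers [0.1, 0.9]
}

search = RandomizedSearchCV(
    ElasticNet(max_iter=100000),
    param_distributions,
    n_iter=20,        # number of sampled combinations
    cv=5,
    random_state=42,
)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
best_l1_ratio = search.best_params_["l1_ratio"]
print("Best alpha found:", best_alpha)
print("Best l1_ratio found:", best_l1_ratio)
print("Best cross-validated R2 score:", search.best_score_)
```

With 20 sampled combinations and 5 folds this fits 100 models, compared with the 2,000 fits needed for the 100 x 4 ElasticNet grid used above, at the cost of exploring the space less exhaustively.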

### 4.4 Summary

ElasticNet Regression is a versatile linear regression technique that combines the benefits of both Lasso and Ridge regression. By mixing L1 and L2 regularization, it balances feature selection against coefficient shrinkage. It is useful for high-dimensional datasets and for datasets with correlated features, but it remains a linear model and captures only linear relationships between the features and the target. Its performance depends on well-chosen values of alpha and l1_ratio.