# Module 1: Introduction to Scikit-Learn

## Section 2: Supervised Learning Algorithms

### Part 2: Lasso Regression

In this part, we will explore Lasso regression, a linear regression technique that performs both variable selection and regularization.

### 2.1 Understanding Lasso regression

Lasso regression, also known as L1 regularization, is a linear regression technique that extends ordinary least squares regression by adding an L1 penalty term to the objective function. The L1 penalty promotes sparsity in the model by driving some coefficients to exactly zero. As a result, Lasso regression performs variable selection, allowing us to identify the most important features.

The key idea behind Lasso regression is to balance fitting the training data well against keeping the model coefficients small. The L1 penalty term encourages sparse models: it effectively excludes irrelevant or redundant features, which helps the model generalize better and avoid overfitting.

In Lasso regression, the objective is to minimize the following cost function:

$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\alpha\sum_{j=1}^{n}|\theta_j|$

Where:

- $m$ is the number of training examples.
- $h_\theta(x^{(i)})$ is the predicted output for the $i$-th training example using the model parameters $\theta$.
- $y^{(i)}$ is the actual output for the $i$-th training example.
- $n$ is the number of features (excluding the bias term).
- $|\theta_j|$ represents the absolute value of the coefficient of the $j$-th feature.
- $\alpha$ is the regularization parameter or the strength of the penalty term.
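The cost function above can be evaluated by hand on a tiny dataset. The following is a minimal sketch, assuming a linear hypothesis $h_\theta(x) = \theta \cdot x + b$ with an unpenalized intercept $b$; the helper name `lasso_cost` and the toy numbers are made up for illustration.

```python
import numpy as np

def lasso_cost(X, y, theta, intercept, alpha):
    """Return (1 / 2m) * sum of squared residuals + alpha * sum(|theta_j|)."""
    m = X.shape[0]                              # number of training examples
    residuals = X @ theta + intercept - y       # h_theta(x_i) - y_i for every example
    squared_error = (residuals ** 2).sum() / (2 * m)
    l1_penalty = alpha * np.abs(theta).sum()    # the intercept is not penalized
    return squared_error + l1_penalty

# Toy data: 3 examples, 2 features (illustrative values only)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.5, -0.25])

cost = lasso_cost(X, y, theta, intercept=0.0, alpha=0.1)
print(round(cost, 4))  # 7.25 / 6 + 0.1 * 0.75 = 1.2833
```

Scikit-Learn's `Lasso` minimizes an objective of this same form, with the squared error scaled by `1 / (2 * n_samples)`.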

The regularization term is the sum of the absolute values of the coefficients multiplied by the regularization parameter $\alpha$. The larger the value of $\alpha$, the greater the penalty, and the more the coefficients are pushed towards zero. As the coefficients shrink, the fitted line becomes flatter (closer to horizontal), because the predictions become less sensitive to changes in the independent variables.

The goal of Lasso regression is to find the best-fit line that minimizes the difference between the predicted values and the actual values, while reducing overfitting by adding a penalty term.

$\text{Lasso Regression} = \min\left(\text{sum of squared residuals} + \alpha \cdot |\text{slope}|\right)$

Lasso is useful when several of the independent variables contribute little or nothing to the prediction. It helps reduce overfitting and performs variable selection, because it can shrink a slope to exactly zero.
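The effect of $\alpha$ on sparsity can be checked empirically. The sketch below is an illustration on an assumed synthetic dataset (20 features, 5 of them informative); the alpha values are arbitrary choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Assumed toy dataset: only 5 of the 20 features influence the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

nonzero_counts = {}
for alpha in [0.01, 1.0, 10.0, 10000.0]:
    model = Lasso(alpha=alpha, max_iter=100000).fit(X, y)
    # Count the coefficients that Lasso did NOT set to exactly zero
    nonzero_counts[alpha] = int(np.sum(model.coef_ != 0))
    print(f"alpha={alpha}: {nonzero_counts[alpha]} non-zero coefficients")
```

With a very large penalty every coefficient is driven to zero and the model predicts a constant (the intercept); with a very small penalty the fit approaches ordinary least squares.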

Advantages of Lasso Regression
1. Feature Selection: Lasso regression automatically performs feature selection by setting the coefficients of less important features to zero. This makes the model more interpretable and efficient when dealing with high-dimensional datasets.
2. Regularization: Lasso regression introduces regularization, which helps prevent overfitting by controlling the complexity of the model.
3. Sparsity: The feature selection property of Lasso regression leads to sparse models, meaning that only a subset of features is used in the final model, reducing memory and computational requirements.
4. Robust to Multicollinearity: Lasso regression can handle multicollinearity, a situation where two or more features are highly correlated, by selecting one of them and setting the others' coefficients to zero.
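The feature selection property can be used as an explicit preprocessing step. The sketch below assumes Scikit-Learn's `SelectFromModel` wrapper around a `Lasso` estimator; the dataset and the `alpha=1.0` value are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Assumed toy dataset: 30 features, only 5 of them informative
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# Fit a Lasso model and keep only the features it gives a non-negligible weight
selector = SelectFromModel(Lasso(alpha=1.0, max_iter=100000))
selector.fit(X, y)

selected_mask = selector.get_support()   # boolean mask over the original columns
X_selected = selector.transform(X)       # dataset restricted to the kept columns
print("Features kept:", np.flatnonzero(selected_mask))
print("Reduced shape:", X_selected.shape)
```

The reduced matrix `X_selected` can then be passed to any other estimator, which is one way the sparsity of Lasso lowers memory and computation downstream.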

Disadvantages of Lasso Regression
1. Feature Dependency Handling: Lasso regression tends to arbitrarily select one feature among highly correlated features and set the others to zero. This can lead to information loss when dealing with feature dependencies.
2. Hyperparameter Tuning: The choice of the regularization parameter $\alpha$ can significantly affect the model's performance. Selecting the right value for $\alpha$ often requires cross-validation.
3. Not Suitable for All Problems: Lasso regression may not perform well on datasets where all features are equally important, as it may eliminate useful predictors.
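The behaviour with highly correlated features can be observed on a small example. This is a sketch under assumed conditions: two nearly identical features `x1` and `x2`, a target that depends on their shared signal, and an arbitrary `alpha=0.1`. How the weight is divided between the two columns depends on the data, so the code prints the coefficients rather than assuming a particular split.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.01, size=500)      # near-duplicate of x1
y = 3.0 * x1 + rng.normal(scale=0.1, size=500)  # true slope on the shared signal is 3
X = np.column_stack([x1, x2])

model = Lasso(alpha=0.1, max_iter=100000).fit(X, y)
coef_sum = model.coef_.sum()
print("Coefficients:", model.coef_)
print("Sum of coefficients:", coef_sum)
```

The combined weight stays close to the true slope (slightly shrunk by the penalty), but the individual coefficients should not be read as the separate importance of each correlated feature.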

### 2.2 Training and Evaluation

In Scikit-Learn, Lasso regression can be easily implemented using the Lasso class from the linear_model module. The regularization parameter $\alpha$ can be set using the alpha hyperparameter. By varying the value of $\alpha$, you can control the amount of regularization and the sparsity of the resulting model.

To train a Lasso regression model, we need a labeled dataset with the target variable and the corresponding feature values. The model learns by minimizing the regularized objective function, which includes the sum of squared residuals from the ordinary least squares regression and the L1 penalty term.

Once trained, we can use the Lasso regression model to make predictions for new, unseen data points. The model predicts the target values based on the learned coefficients and the feature values.
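As a minimal sketch of the fit-then-predict workflow described above, the following uses a small made-up dataset in which only the first feature matters. It also checks that a prediction is the dot product of the feature values with the learned coefficients (`coef_`) plus the intercept (`intercept_`).

```python
import numpy as np
from sklearn.linear_model import Lasso

# Made-up training data: the target depends only on the first of three features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = 4.0 * X_train[:, 0] + rng.normal(scale=0.1, size=50)

model = Lasso(alpha=0.1)   # alpha sets the strength of the L1 penalty
model.fit(X_train, y_train)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Predict for new, unseen points
X_new = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
predictions = model.predict(X_new)

# The same predictions computed by hand from the learned parameters
manual = X_new @ model.coef_ + model.intercept_
print(predictions, manual)
```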

Scikit-Learn provides the Lasso class for performing Lasso regression. Here's an example of how to use it:



In [None]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

np.random.seed(42)  # Set a fixed seed for reproducibility
# Create a synthetic dataset with 100 features
X, y = make_regression(n_samples=100, n_features=100, noise=10)
# Introduce some multicollinearity by creating linear combinations of features
X[:, 10] = X[:, 0] + X[:, 1] + np.random.normal(0, 1, 100)
X[:, 20] = 2 * X[:, 5] + np.random.normal(0, 1, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and fit a Linear Regression model
linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)
# Make predictions using the linear regression model
y_pred_linear = linear_regressor.predict(X_test)
linear_mae = mean_absolute_error(y_test, y_pred_linear)
linear_r2 = r2_score(y_test, y_pred_linear)
print("Linear Model Mean Absolute Error (MAE):", linear_mae)
print("Linear Model R-squared (R2) Score:", linear_r2)

# Create and fit a Lasso Regression model
lasso_regressor = Lasso(max_iter=100000, alpha=0.001)  # alpha is the regularization strength, higher values increase the penalty
lasso_regressor.fit(X_train, y_train)
# Make predictions using the lasso regression model
y_pred_lasso = lasso_regressor.predict(X_test)
lasso_mae = mean_absolute_error(y_test, y_pred_lasso)
lasso_r2 = r2_score(y_test, y_pred_lasso)
print("\nLasso Model Mean Absolute Error (MAE):", lasso_mae)
print("Lasso Model R-squared (R2) Score:", lasso_r2)

In this example, we introduced multicollinearity by replacing some features with linear combinations of other features. Linear regression tends to struggle with such datasets, because multicollinearity can lead to unstable coefficient estimates, whereas Lasso regression can identify the relevant features and set the coefficients of irrelevant ones to zero. In this run, however, the linear model has a lower MAE (62.13) than the Lasso model (104.93), which means its predictions are closer to the true values on average. The linear model also has a higher R-squared value (0.776) than the Lasso model (0.354), so it explains a larger portion of the variance in the target variable.

If Lasso is supposed to perform better on datasets with multicollinearity, why is our linear model doing better here? The alpha value of 0.001 was chosen arbitrarily, and a single train/test split of only 100 samples gives a noisy estimate of performance. It's essential to use proper cross-validation to assess the models' generalization performance. Additionally, hyperparameter tuning, such as fine-tuning the alpha parameter, could improve the performance of the Lasso model.
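One way to get a less noisy comparison than a single split is k-fold cross-validation. The sketch below uses `cross_val_score` with 5 folds; the dataset is freshly generated with an assumed `random_state=42` and without the extra collinear columns, and `alpha=1.0` is an arbitrary choice, so the numbers will not match the ones quoted above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_score

# Assumed dataset with the same shape as in the example above
X, y = make_regression(n_samples=100, n_features=100, noise=10, random_state=42)

models = {
    "linear": LinearRegression(),
    "lasso (alpha=1.0)": Lasso(alpha=1.0, max_iter=100000),
}

cv_mae = {}
for name, model in models.items():
    # Scikit-Learn scorers follow "higher is better", so MAE is returned negated
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()
    print(f"{name}: mean MAE over 5 folds = {cv_mae[name]:.2f}")
```

Averaging over several folds makes the comparison depend less on which rows happen to land in the test set.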

### 2.3 Hyperparameter Tuning

Lasso regression has a hyperparameter called alpha that controls the strength of the regularization. Higher values of alpha result in stronger regularization and more coefficients being shrunk to zero. The choice of the alpha value depends on the trade-off between bias and variance. Search strategies such as grid search or randomized search, combined with cross-validation, can be used to find the optimal value of alpha.

In [None]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_absolute_error, r2_score

np.random.seed(42)  # Set a fixed seed for reproducibility
# Create a synthetic dataset with 100 features
X, y = make_regression(n_samples=100, n_features=100, noise=10)
# Introduce some multicollinearity by creating linear combinations of features
X[:, 10] = X[:, 0] + X[:, 1] + np.random.normal(0, 1, 100)
X[:, 20] = 2 * X[:, 5] + np.random.normal(0, 1, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and fit a Linear Regression model
linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)
# Make predictions using the linear regression model
y_pred_linear = linear_regressor.predict(X_test)
linear_mae = mean_absolute_error(y_test, y_pred_linear)
linear_r2 = r2_score(y_test, y_pred_linear)
print("Linear Model Mean Absolute Error (MAE):", linear_mae)
print("Linear Model R-squared (R2) Score:", linear_r2)

# Create a Lasso Regression model
lasso_regressor = Lasso(max_iter=100000)
# Set up a range of alpha values to try
param_grid = {'alpha': [0.000001, 0.00001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}
# Perform cross-validation to find the best alpha
grid_search = GridSearchCV(lasso_regressor, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Get the best alpha value from the cross-validation
best_alpha = grid_search.best_params_['alpha']
print("\nBest Lasso alpha value found:", best_alpha)
# Create and fit a Lasso Regression model with the best alpha
# (use the same max_iter as in the grid search so the fits are comparable)
lasso_regressor_best = Lasso(alpha=best_alpha, max_iter=100000)
lasso_regressor_best.fit(X_train, y_train)

# Make predictions using the lasso regression model with the best alpha
y_pred_lasso_best = lasso_regressor_best.predict(X_test)
lasso_mae_best = mean_absolute_error(y_test, y_pred_lasso_best)
lasso_r2_best = r2_score(y_test, y_pred_lasso_best)
print("Lasso Model with Best Alpha Mean Absolute Error (MAE):", lasso_mae_best)
print("Lasso Model with Best Alpha R-squared (R2) Score:", lasso_r2_best)

In this example, we use the same data as before, with multicollinearity introduced by replacing some features with linear combinations of others. We found the best alpha using cross-validation with GridSearchCV. The model with the selected alpha (1.0 in this case) achieved significantly better performance than the initial Lasso model. Its MAE (10.50) is lower than that of both the linear model and the initial Lasso model, so its predictions are, on average, closer to the true target values. Its R-squared score (0.994) is very close to 1.0, indicating that the model explains almost all of the variance in the target variable and fits the test data very well.

Overall, the updated results demonstrate the importance of hyperparameter tuning, specifically selecting the right alpha value in Lasso regression. By choosing the best alpha through cross-validation, the Lasso model can outperform both the linear model and the initial Lasso model, providing more accurate predictions and better explaining the variance in the target variable.
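For Lasso specifically, the alpha search can also be sketched with `LassoCV`, which cross-validates over a list of alphas in a single estimator. This is offered as an alternative to the grid search above, not as a replacement for it; the dataset (generated with an assumed `random_state=42`) and the candidate alphas are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Assumed dataset with the same shape as in the example above
X, y = make_regression(n_samples=100, n_features=100, noise=10, random_state=42)

# Candidate regularization strengths (illustrative values)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# LassoCV runs 5-fold cross-validation for each alpha and refits with the best one
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=100000)
lasso_cv.fit(X, y)

best_alpha = lasso_cv.alpha_
n_selected = int(np.sum(lasso_cv.coef_ != 0))
print("Best alpha:", best_alpha)
print("Non-zero coefficients:", n_selected)
```

After fitting, `lasso_cv` behaves like a regular fitted `Lasso` model, so `predict` can be called on it directly.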

### 2.4 Summary

Lasso regression is a powerful technique for linear regression tasks, offering both variable selection and regularization. It adds an L1 penalty term to the objective function, promoting sparse models by shrinking some coefficients to zero. It can be particularly useful in situations with high-dimensional data, multicollinearity, or when interpretability is essential. However, it's essential to carefully tune the regularization parameter and consider the potential impact of feature dependencies on the model's performance.