# Lasso Regression
Lasso regression (Least Absolute Shrinkage and Selection Operator) is a type of linear regression that uses shrinkage. Shrinkage here means that the estimated coefficients are pulled towards a central value, which for the lasso is zero. The penalty is applied to the absolute size of the regression coefficients, and it can set some of them exactly to zero. As a result, the lasso procedure encourages simple, sparse models (i.e., models with fewer nonzero parameters).

The objective function for Lasso Regression is:

$$
\min_{\beta} \left( \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right)
$$

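Where:
- $ n $ is the number of observations and $ p $ is the number of features.
- $ y_i $ is the value of the dependent variable for observation $ i $.
- $ x_{ij} $ is the value of the $ j $-th independent variable for observation $ i $.
- $ \beta_j $ is the coefficient of the $ j $-th independent variable.
- $ \lambda \ge 0 $ is the penalty parameter; larger values shrink the coefficients more strongly.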
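
To make the objective concrete, here is a minimal sketch that evaluates it for a fixed coefficient vector. The helper name `lasso_objective` and the toy numbers are made up for illustration. Note also that scikit-learn's `Lasso` minimizes a rescaled version, $ \frac{1}{2n} \sum_i (y_i - \sum_j x_{ij} \beta_j)^2 + \alpha \sum_j |\beta_j| $, so its `alpha` plays the role of $ \lambda / (2n) $ in the formula above.

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """Residual sum of squares plus lam times the L1 norm of beta.

    Hypothetical helper: mirrors the formula above, with no intercept term.
    """
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(np.abs(beta))

# Toy data chosen so the result can be checked by hand
X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
y_toy = np.array([1.0, 2.0, 3.0])
beta_toy = np.array([1.0, 1.0])

# Residuals are [0, 1, 1], so RSS = 2; penalty = 0.5 * (|1| + |1|) = 1
toy_value = lasso_objective(X_toy, y_toy, beta_toy, lam=0.5)
print(toy_value)  # 3.0
```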

Lasso regression is particularly useful when you have a large number of features and want to select a subset of them for your model: features whose coefficients are shrunk exactly to zero drop out of the model.
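
The selection effect can be sketched on synthetic data where only a few features carry signal. The dataset settings below (`n_informative=5`, `noise=1.0`, `alpha=1.0`) and the `_demo` variable names are assumptions chosen for illustration; they are not the settings used in the cells that follow.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 5 of which actually influence y
X_demo, y_demo, true_coef_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5,
    noise=1.0, coef=True, random_state=0,
)

lasso_demo = Lasso(alpha=1.0).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)

# Count the coefficients each model keeps away from exactly zero
n_nonzero_lasso = int(np.sum(lasso_demo.coef_ != 0))
n_nonzero_ridge = int(np.sum(ridge_demo.coef_ != 0))

print("Truly informative features:", np.flatnonzero(true_coef_demo))
print("Features kept by Lasso:    ", np.flatnonzero(lasso_demo.coef_))
print("Nonzero coefficients (Lasso):", n_nonzero_lasso)
print("Nonzero coefficients (Ridge):", n_nonzero_ridge)
```

Ridge shrinks coefficients but does not set them to zero, so it keeps all 20 features, while the lasso keeps fewer.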

In [69]:
# Import libraries
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Generate synthetic regression data
X, y = make_regression(n_samples=10000, n_features=20, noise=0.3, random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a Lasso and a Ridge regression model
lasso = Lasso(alpha=0.1)  # alpha is the regularization strength
ridge = Ridge(alpha=0.1)  # lowercase name, so the Ridge class is not overwritten
# Fit both models to the training data
lasso.fit(X_train, y_train)
ridge.fit(X_train, y_train)
# Make predictions on the test data
y_pred_lasso = lasso.predict(X_test)
y_pred_ridge = ridge.predict(X_test)
# Calculate the mean squared error of the predictions
mse = mean_squared_error(y_test, y_pred_lasso)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
# Print the mean squared errors
print(f'Mean Squared Error: {mse}')
print(f'Mean Squared Error Ridge: {mse_ridge}')

Mean Squared Error: 0.18317905556291122
Mean Squared Error Ridge: 0.09287671961133424
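
One plausible reading of the gap between the two errors is that, at `alpha=0.1`, the lasso shrinks the coefficients of the informative features noticeably, which biases its predictions. The sketch below checks this by regenerating the data with `coef=True` to expose the true coefficients. It assumes that this flag leaves the generated `X` and `y` unchanged for the same `random_state`; the `_chk` variable names are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Same generator settings as above, plus coef=True to return the true coefficients
X_chk, y_chk, true_coef_chk = make_regression(
    n_samples=10000, n_features=20, noise=0.3, coef=True, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_chk, y_chk, test_size=0.2, random_state=42
)

lasso_chk = Lasso(alpha=0.1).fit(X_tr, y_tr)
ridge_chk = Ridge(alpha=0.1).fit(X_tr, y_tr)

# Mean absolute distance between fitted and true coefficients
lasso_coef_err = float(np.mean(np.abs(lasso_chk.coef_ - true_coef_chk)))
ridge_coef_err = float(np.mean(np.abs(ridge_chk.coef_ - true_coef_chk)))

print(f"Mean |coef error| Lasso: {lasso_coef_err:.4f}")
print(f"Mean |coef error| Ridge: {ridge_coef_err:.4f}")
```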


In [72]:
# Tune alpha with a cross-validated grid search
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Lasso, Ridge
import numpy as np

# Create a Lasso regression object
lasso = Lasso()

# Candidate alpha values: 0.1, 0.2, ..., 9.9
param_grid = {'alpha': np.arange(0.1, 10, 0.1)}

# Use grid search to find the best value for alpha
lasso_cv = GridSearchCV(lasso, param_grid, cv=10)

# Fit the model
lasso_cv.fit(X, y)

# Print the tuned parameters and score
# (best_score_ is the mean cross-validated R^2, the default scorer for regressors)
print("Tuned Lasso Regression Parameters: {}".format(lasso_cv.best_params_))
print("Best score is {}".format(lasso_cv.best_score_))

# Create a Ridge regression object
ridge = Ridge()

# Search the same alpha grid for Ridge
param_grid = {'alpha': np.arange(0.1, 10, 0.1)}

# Use grid search to find the best value for alpha
ridge_cv = GridSearchCV(ridge, param_grid, cv=10)

# Fit the model
ridge_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Ridge Regression Parameters: {}".format(ridge_cv.best_params_))
print("Best score is {}".format(ridge_cv.best_score_))

Tuned Lasso Regression Parameters: {'alpha': 0.1}
Best score is 0.9999952605177066
Tuned Ridge Regression Parameters: {'alpha': 0.1}
Best score is 0.9999977194624126
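
Both searches pick `alpha = 0.1`, which is the smallest value in the grid, so the best value may lie below the range that was searched. A minimal sketch of one way to probe smaller values is a log-spaced grid with scikit-learn's `LassoCV`, which is swapped in here in place of `GridSearchCV`. The grid bounds, `cv=5`, and the `_log` variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Regenerate the same synthetic data so the block runs on its own
X_log, y_log = make_regression(
    n_samples=10000, n_features=20, noise=0.3, random_state=42
)

# Log-spaced candidates from 0.001 to 10, covering values below 0.1
alphas_log = np.logspace(-3, 1, 30)

# LassoCV fits the whole regularization path and selects alpha by cross-validation
lasso_log = LassoCV(alphas=alphas_log, cv=5).fit(X_log, y_log)

best_alpha_log = float(lasso_log.alpha_)
r2_log = lasso_log.score(X_log, y_log)  # R^2 on the data used for fitting

print(f"Selected alpha: {best_alpha_log}")
print(f"R^2 on the fitting data: {r2_log}")
```

If the selected alpha again sits at the edge of the grid, the grid can be extended further in that direction.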
