# Ridge and Lasso

Hello, let's continue talking about regression.

scikit-learn offers two other types of regression that are closely related to linear regression: Lasso and Ridge. Both are characterized by penalizing the magnitude of the regression coefficients.

## Ridge

The first one I want to talk about is known as Ridge.

This regression, unlike traditional linear regression, penalizes the magnitude of the learned coefficients. This in turn reduces the variance of the estimated coefficients and improves the stability of the model when collinearity exists, that is, when our predictor variables are correlated with each other.

The penalty used by Ridge is known as L2; I've included more details about this penalty in the book's resources for you to learn more.
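
As a quick reference, this is the objective that scikit-learn's `Ridge` minimizes. The symbols are my own notation for this sketch: $w$ is the vector of coefficients, $X$ and $y$ are the training data, and $\alpha$ is the regularization strength.

```latex
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \, \lVert w \rVert_2^2
```

The first term is the usual least-squares error; the second is the L2 penalty, which grows with the squared size of the coefficients.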

Before we begin, let's generate some data:

In [None]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=10, bias=2.0, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)  # fixed seed for a reproducible split


To use Ridge, import it from the `linear_model` module and call its constructor:

In [None]:
from sklearn.linear_model import Ridge

ridge = Ridge()


And like every other estimator in scikit-learn, it has the `fit` and `predict` methods to interact with it:

In [None]:
ridge.fit(X_train, y_train)
y_pred = ridge.predict(X_test)
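
To make the earlier point about collinearity more concrete, here is a minimal sketch that uses a small synthetic dataset of my own (it is not the chapter's data; names such as `X_col` and `ridge_demo` are only illustrative). Two predictors are almost copies of each other, and we compare the coefficients learned by plain linear regression and by Ridge.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Two predictors that are almost copies of each other (strong collinearity)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X_col = np.column_stack([x1, x2])

# The target depends on x1 only, plus some noise
y_col = 3 * x1 + rng.normal(scale=0.5, size=200)

ols_demo = LinearRegression().fit(X_col, y_col)
ridge_demo = Ridge(alpha=1.0).fit(X_col, y_col)

print("Linear regression:", ols_demo.coef_)
print("Ridge:            ", ridge_demo.coef_)
print("Coefficient norms:", np.linalg.norm(ols_demo.coef_), np.linalg.norm(ridge_demo.coef_))
```

With predictors this similar, linear regression has very little information to decide how to split the effect between them, so its two coefficients can come out large and with opposite signs. Ridge tends to spread the effect across both, and the norm of its coefficient vector is never larger than that of the unpenalized fit.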


### Arguments

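The class shares the `fit_intercept` argument with `LinearRegression`. Older versions of scikit-learn also accepted `normalize`, but it was deprecated in version 1.0 and removed in 1.2, so on recent versions you scale the features yourself beforehand (for example with `StandardScaler`). Ridge also includes some specific arguments: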

 - `alpha`: This is a float parameter that specifies the level of regularization in the model. A higher alpha value results in smaller coefficients and, therefore, a simpler model. The default value is 1.0, and it is a hyperparameter worth tuning.
 - `solver`: This is a string that indicates the solver used in the underlying optimization problem. Possible values include "auto", "svd", "cholesky", "lsqr", "sparse_cg", "sag", "saga", and "lbfgs". The default value is "auto", which picks a solver based on the type of data, and it generally works well.
 - `max_iter`: This is an integer that specifies the maximum number of iterations allowed in the solver; it only matters for the solvers that work iteratively. The default value is None, in which case the limit depends on the chosen solver.
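
To see the effect of `alpha` described above, here is a small sketch that fits Ridge with a few values and prints the overall size (Euclidean norm) of the coefficient vector. The specific alpha values and the names `X_demo`, `y_demo` and `norms` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X_demo, y_demo = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=42)

alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # Size of the whole coefficient vector for this level of regularization
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: coefficient norm = {norms[-1]:.2f}")
```

The norm shrinks as `alpha` grows, but the individual coefficients do not become exactly zero.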

## Lasso

This regression, unlike traditional linear regression, penalizes the magnitude of learned coefficients – similar to Ridge regression.

The penalty used by Lasso is known as L1; I've included more details about this penalty in the book's resources for you to learn more. Something to note is that Lasso can force some coefficients to become exactly zero, which excludes the corresponding input variables from the model's calculations and thereby reduces its complexity.
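
For reference, the objective that scikit-learn's `Lasso` minimizes can be written as follows. Again the symbols are my own notation: $w$ is the coefficient vector, $n$ is the number of training samples, and $\alpha$ is the regularization strength.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \, \lVert w \rVert_1
```

The L1 penalty adds up the absolute values of the coefficients instead of their squares, and that is what allows some of them to land exactly on zero.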

The "punished" variables, those whose coefficients are driven to zero, tend to be the ones that contribute little to the prediction or that are highly collinear with other predictors.

Lasso is fit iteratively (scikit-learn uses coordinate descent) because, in general, it has no closed-form solution. This is another difference from traditional linear regression, which does have a closed analytical solution.

Lasso is also available in the `linear_model` module of `sklearn`:

In [None]:
from sklearn.linear_model import Lasso

lasso = Lasso()


And of course, it shares the interface of other estimators in scikit-learn:

In [None]:
lasso.fit(X_train, y_train)
y_pred = lasso.predict(X_test)
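
To see Lasso setting coefficients to zero, here is a minimal sketch on a synthetic dataset where only 5 of the 10 features actually influence the target. The dataset, the value `alpha=5.0` and the names `X_sp`, `y_sp` and `lasso_sp` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 5 of the 10 features actually influence y
X_sp, y_sp = make_regression(
    n_samples=200, n_features=10, n_informative=5, noise=10.0, random_state=0
)

lasso_sp = Lasso(alpha=5.0).fit(X_sp, y_sp)

# Count the coefficients that Lasso set exactly to zero
n_zero = int(np.sum(lasso_sp.coef_ == 0))
print("Coefficients:", np.round(lasso_sp.coef_, 2))
print("Coefficients set to zero:", n_zero)
print("Iterations used by the solver:", lasso_sp.n_iter_)
```

The `n_iter_` attribute reports how many iterations the solver needed, which is a reminder that Lasso is fit iteratively.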


### Arguments

Like Ridge regression, it also has the `alpha` argument to control the strength with which it applies the penalty, as well as the `max_iter` parameter which is more important here because this is a fully iterative algorithm.

It also has the following arguments that can help you in training:

 - `tol`: This is the tolerance for the convergence of the optimization algorithm. When the coefficient updates in an iteration are smaller than `tol`, scikit-learn checks an optimality measure (the dual gap) and keeps iterating until that is also smaller than `tol`. The default value is 1e-4.
 - `warm_start`: This parameter is boolean and specifies whether to reuse the solution of the previous call to `fit` as the starting point for the next one. If `True`, the previous coefficients are used to initialize the optimization, which can speed up the fitting process. The default value is `False`.
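
A typical use of `warm_start` is fitting the same model for a sequence of `alpha` values. The following is a sketch under my own assumptions (synthetic data, an arbitrary list of alphas, illustrative names such as `lasso_ws`); it also checks that the warm-started fit ends up at practically the same solution as a fresh fit.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X_ws, y_ws = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Reuse one estimator across a sequence of decreasing alphas
lasso_ws = Lasso(warm_start=True, tol=1e-4, max_iter=10_000)
for alpha in [10.0, 5.0, 1.0]:
    lasso_ws.set_params(alpha=alpha)
    # After the first call, each fit starts from the previous coef_
    lasso_ws.fit(X_ws, y_ws)
    print(f"alpha={alpha}: iterations = {lasso_ws.n_iter_}")

# A fresh estimator for the last alpha, for comparison
lasso_cold = Lasso(alpha=1.0, tol=1e-4, max_iter=10_000).fit(X_ws, y_ws)
max_diff = np.max(np.abs(lasso_ws.coef_ - lasso_cold.coef_))
print("Largest coefficient difference vs. a fresh fit:", max_diff)
```

Whether the warm start actually saves iterations depends on the data and on how close consecutive alphas are, so treat the iteration counts as something to inspect rather than a guarantee.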

## Attributes

Both classes offer the linear regression attributes that help us understand a bit more about our input values. If you remember, in the previous chapter on linear regression we saw how the `coef_` and `intercept_` attributes can be used to interpret the results. Lasso and Ridge have them too.
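
As a quick reminder of how to read these attributes, here is a short sketch that fits both models on a synthetic dataset and prints the intercept and one coefficient per feature. The dataset and the names `X_at`, `y_at` and `fitted` are only illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X_at, y_at = make_regression(
    n_samples=100, n_features=10, bias=2.0, noise=5.0, random_state=42
)

fitted = {}
for model in (Ridge(), Lasso()):
    model.fit(X_at, y_at)
    name = type(model).__name__
    fitted[name] = model
    # intercept_ is a single number; coef_ has one entry per feature
    print(f"{name}: intercept_ = {model.intercept_:.3f}")
    for i, coef in enumerate(model.coef_):
        print(f"  feature {i}: {coef:.3f}")
```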

### Comparison

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge

# Generate a random regression dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, noise=10, random_state=42)

# Instantiate the three models with their default settings
lr = LinearRegression()
ridge = Ridge()
lasso = Lasso()

# Fit regressions
lr.fit(X, y)
ridge.fit(X, y)
lasso.fit(X, y)

# Plot the coefficients
fig, ax = plt.subplots(figsize=(10, 6))
models = ['Linear Regression', 'Ridge Regression', 'Lasso Regression']
coefficients = [lr.coef_, ridge.coef_, lasso.coef_]
colors = ['blue', 'green', 'red']

for i, (model, coef) in enumerate(zip(models, coefficients)):
    ax.bar(np.arange(len(coef)) + i*0.25, coef, color=colors[i], width=0.25, label=model)

# Center each tick under its group of three bars
ax.set_xticks(np.arange(len(coef)) + 0.25)
ax.set_xticklabels(['Feature '+str(i) for i in range(len(coef))])
ax.set_ylabel("Coefficient")
ax.legend()


## When to use each one?

 - Linear regression is a good option when the relationship between the independent variables and the dependent variable is approximately linear. It's always a good method to consider, even if just to establish a baseline.
 - Ridge regression is a good choice when there are many features and some of them are expected to have small or moderate effects on the dependent variable. Ridge regularizes the model by shrinking the feature coefficients towards zero, but it does not set them exactly to zero as Lasso can.
 - Lasso regression is a good option when there are many features and some of them are expected to be irrelevant or redundant. Lasso helps with feature selection and regularization by turning some feature coefficients to zero, effectively removing them from the model.
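
In practice, a simple way to choose among the three is to compare them with cross-validation on your own data. The following is only a sketch: the synthetic dataset, the `alpha` values and the names `candidates` and `cv_scores` are assumptions for illustration, and the ranking it prints will not necessarily carry over to other datasets.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Many features, few of them informative
X_cv, y_cv = make_regression(
    n_samples=200, n_features=50, n_informative=5, noise=10.0, random_state=0
)

candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
}

cv_scores = {}
for name, model in candidates.items():
    # For regressors, the default score is R^2
    scores = cross_val_score(model, X_cv, y_cv, cv=5)
    cv_scores[name] = scores.mean()
    print(f"{name}: mean R^2 = {cv_scores[name]:.3f}")
```
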

And there you have it: two other types of regression, very similar to linear regression, that add a penalty on the coefficients to help us reduce overfitting and the overall complexity of the model.

The next chapter will continue exploring supervised learning.