[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/samdavanloo/ISE-ML/blob/main/2_regularization.ipynb)

# Regularization in model fitting
Regularization is one approach to tackling overfitting: a penalty on model complexity is added to the cost function, which shrinks the parameter values of the model. The most popular approaches to regularized linear regression are "Ridge Regression", the "Least Absolute Shrinkage and Selection Operator (LASSO)", and "Elastic Net".

## Ridge regression
Ridge regression is an L2-norm penalized model where we simply add the sum of the squared weights, scaled by $\lambda$, to our least-squares cost function:
$$
l_{\text{Ridge}}(\mathbf{w}) = \sum_{i=1}^n(y_i-\hat{f}_i(\mathbf{w},w_0))^2 + \lambda||\mathbf{w}||_2^2,
$$
where
$$
||\mathbf{w}||_2^2 \triangleq \sum_{j=1}^p w_j^2.
$$
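Here $\hat{f}_i(\mathbf{w},w_0)$ denotes the model's prediction for the $i$-th observation. By increasing the value of the hyperparameter $\lambda \ge 0$, we increase the regularization strength and shrink the weights of our model toward zero. Note that we don't regularize the intercept term $w_0$.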

In [None]:
from sklearn.linear_model import Ridge
# alpha plays the role of lambda in the Ridge cost function above
ridge = Ridge(alpha=1.0)
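
To make the shrinkage effect concrete, here is a minimal sketch that fits `Ridge` for several values of `alpha` and reports the L2 norm of the fitted weights. The synthetic data, `true_w`, and the chosen `alphas` are assumptions made purely for illustration; they are not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data (illustrative assumption, not from the notebook)
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ true_w + 1.0 + rng.normal(scale=0.5, size=n)

# Fit Ridge with increasing regularization strength
alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # coef_ holds the penalized weights w; intercept_ is the unpenalized w_0
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: ||w||_2 = {norms[-1]:.4f}, intercept = {model.intercept_:.4f}")
```

The printed norms should decrease as `alpha` grows, while the intercept is left unpenalized.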

## Least Absolute Shrinkage and Selection Operator (LASSO)
An alternative approach that can lead to sparse models is LASSO. Depending on the regularization strength, certain weights can become exactly zero, which also makes LASSO useful as a supervised feature selection technique. LASSO is fit by minimizing
$$
l_{\text{LASSO}}(\mathbf{w}) = \sum_{i=1}^n(y_i-\hat{f}_i(\mathbf{w},w_0))^2 + \lambda||\mathbf{w}||_1,
$$
where
$$
||\mathbf{w}||_1 \triangleq \sum_{j=1}^p |w_j|.
$$

In [None]:
from sklearn.linear_model import Lasso
# alpha sets the strength of the L1 penalty (the role of lambda above)
lasso = Lasso(alpha=1.0)
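
The sparsity described above can be sketched on synthetic data in which only the first three of ten features carry signal. The data, `true_w`, and the two `alpha` values are illustrative assumptions. Note also that scikit-learn's `Lasso` scales the squared-error term by $1/(2n)$, so its `alpha` is not numerically identical to $\lambda$ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first 3 of 10 features are informative (assumption)
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + rng.normal(scale=0.5, size=n)

# A moderate penalty versus a very strong one
moderate = Lasso(alpha=0.5).fit(X, y)
strong = Lasso(alpha=10.0).fit(X, y)

print("alpha=0.5  non-zero weights:", np.count_nonzero(moderate.coef_))
print("alpha=10.0 non-zero weights:", np.count_nonzero(strong.coef_))

# Indices of the features LASSO kept at the moderate penalty
selected = np.flatnonzero(moderate.coef_)
print("selected features:", selected)
```

The indices in `selected` are the features that survive the penalty, which is how LASSO doubles as a feature selection step.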

## Elastic Net
A limitation of LASSO is that it selects at most $n$ variables if $p>n$. A compromise between Ridge regression and LASSO is Elastic Net, which has an L1 penalty to generate sparsity and an L2 penalty to overcome some of the limitations of LASSO, such as the cap on the number of selected variables:
$$
l_{\text{ElasticNet}}(\mathbf{w}) = \sum_{i=1}^n(y_i-\hat{f}_i(\mathbf{w},w_0))^2 + \lambda_1||\mathbf{w}||_2^2+ \lambda_2||\mathbf{w}||_1.
$$

In [68]:
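from sklearn.linear_model import ElasticNet
# alpha is the overall penalty strength; l1_ratio is the share given to the L1 term
# (l1_ratio=1.0 is pure L1, l1_ratio=0.0 is pure L2)
elanet = ElasticNet(alpha=1.0, l1_ratio=0.5)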
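
scikit-learn parameterizes the penalty with `alpha` and `l1_ratio` instead of $\lambda_1$ and $\lambda_2$, and it scales the squared-error term by $1/(2n)$. Under that documented objective, the correspondence works out to $\lambda_1 = n\,\alpha\,(1-\text{l1\_ratio})$ and $\lambda_2 = 2n\,\alpha\,\text{l1\_ratio}$. The sketch below inverts this mapping; `to_sklearn_params` is a hypothetical helper written for illustration, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

def to_sklearn_params(lam1, lam2, n_samples):
    """Hypothetical helper: map (lambda_1, lambda_2) from the formula above
    to scikit-learn's (alpha, l1_ratio). Assumes lam1 + lam2 > 0."""
    l2_part = lam1 / n_samples          # equals alpha * (1 - l1_ratio)
    l1_part = lam2 / (2 * n_samples)    # equals alpha * l1_ratio
    alpha = l1_part + l2_part
    return alpha, l1_part / alpha

# Synthetic data (illustrative assumption)
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=n)

alpha, l1_ratio = to_sklearn_params(lam1=10.0, lam2=20.0, n_samples=n)
print(f"alpha={alpha:.3f}, l1_ratio={l1_ratio:.3f}")
enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X, y)

# With l1_ratio=1.0 the L2 term vanishes and Elastic Net reduces to LASSO
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso_ref = Lasso(alpha=0.1).fit(X, y)
print("matches Lasso:", np.allclose(enet_l1.coef_, lasso_ref.coef_))
```

In practice `alpha` and `l1_ratio` are usually tuned directly by cross-validation; the mapping is mainly useful for relating fitted models back to the formula.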