## Regularization

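Regularization is a technique used in machine learning to reduce overfitting by adding a penalty on the size of the model's coefficients to the loss function. Overfitting occurs when a model fits the training data so closely, including its noise, that it fails to generalize to new data.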

Three common types of regularization for linear models are `Lasso`, `Ridge` and `Elastic Net`.

### Lasso Regression
Lasso regression, short for Least Absolute Shrinkage and Selection Operator, is a type of regularization that penalizes the L1 norm of the coefficients. The L1 norm is the sum of the absolute values of the coefficients. This penalty shrinks the coefficients towards zero and can set some of them to exactly zero. Because a coefficient of zero removes its feature from the model, Lasso regression is also a powerful tool for variable selection.


The Lasso objective function is:

$$
\text{Lasso}(B) = \sum_{i} \left(v_i - x_i^\top B\right)^2 + \lambda \sum_{j} |B_j|
$$

where:

1. $B$ is the vector of coefficients.
2. $v_i$ is the target variable for the i-th observation.
3. $x_i^\top B$ is the predicted target variable for the i-th observation, given the coefficient vector $B$ and that observation's feature vector $x_i$.
4. $\lambda \ge 0$ is the regularization parameter; larger values penalize the coefficients more strongly.
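
As a rough illustration of how the L1 penalty produces exact zeros, the following is a minimal sketch that minimizes the objective above by coordinate descent, using only the Python standard library. The function names (`soft_threshold`, `lasso_coordinate_descent`), the toy data and the choice `lam=2.0` are assumptions made for this example and do not come from any library. Coordinate descent is one common way to solve the Lasso problem, not the only one.

```python
def soft_threshold(rho, threshold):
    """Shrink rho towards zero by threshold; values inside the band map to 0.0."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0


def lasso_coordinate_descent(X, v, lam, n_sweeps=100):
    """Minimize sum((v_i - x_i.B)^2) + lam * sum(|B_j|) one coefficient at a time.

    Assumes every feature column has at least one non-zero entry.
    """
    n_features = len(X[0])
    B = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for row, target in zip(X, v):
                # Residual with feature j's own contribution left out
                partial = target - sum(
                    row[k] * B[k] for k in range(n_features) if k != j
                )
                rho += row[j] * partial
                z += row[j] ** 2
            # Exact minimizer for B[j] with the other coefficients held fixed
            B[j] = soft_threshold(rho, lam / 2.0) / z
    return B


# Hypothetical toy data: v = 3*x1 + 0.1*x2 + 0.2*x3 (one strong, two weak features)
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
]
v = [3.3, 3.1, 2.9, 2.7]

lasso_coefs = lasso_coordinate_descent(X, v, lam=2.0)
print(lasso_coefs)
```

On this toy data the two weak features fall inside the threshold `lam / 2`, so their coefficients come out as exactly `0.0`, while the strong feature's coefficient is shrunk from 3 to roughly 2.75.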

### Ridge Regression
Ridge regression penalizes the squared L2 norm of the coefficients, that is, the sum of the squared coefficients (the L2 norm itself is the square root of that sum). This penalty shrinks the coefficients towards zero, but it does not set any coefficients to zero exactly. As a result, Ridge regression tends to give more stable estimates than Lasso regression when predictors are correlated, but it does not perform variable selection.

The Ridge objective function is:

$$
\text{Ridge}(B) = \sum_{i} \left(v_i - x_i^\top B\right)^2 + \lambda \sum_{j} B_j^2
$$

where:

1. $B$ is the vector of coefficients.
2. $v_i$ is the target variable for the i-th observation.
3. $x_i^\top B$ is the predicted target variable for the i-th observation, given the coefficient vector $B$ and that observation's feature vector $x_i$.
4. $\lambda \ge 0$ is the regularization parameter; larger values penalize the coefficients more strongly.

The Ridge formula differs from the Lasso formula only in the penalty term: the sum of squared coefficients (the squared L2 norm) replaces the sum of absolute values (the L1 norm). This means that the Ridge model will shrink the coefficients towards zero, but it will not set any coefficients to zero exactly.
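
To make the shrink-but-not-zero behaviour concrete, here is a minimal sketch that minimizes the Ridge objective by coordinate descent, using only the Python standard library. The function name `ridge_coordinate_descent`, the toy data and the choice `lam=2.0` are assumptions made for this example. Ridge also has a closed-form solution; coordinate descent is used here only because its per-coefficient update is easy to read.

```python
def ridge_coordinate_descent(X, v, lam, n_sweeps=100):
    """Minimize sum((v_i - x_i.B)^2) + lam * sum(B_j^2) one coefficient at a time."""
    n_features = len(X[0])
    B = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for row, target in zip(X, v):
                # Residual with feature j's own contribution left out
                partial = target - sum(
                    row[k] * B[k] for k in range(n_features) if k != j
                )
                rho += row[j] * partial
                z += row[j] ** 2
            # The penalty enlarges the denominator, so B[j] is scaled down
            # smoothly and is zero only if rho itself is zero
            B[j] = rho / (z + lam)
    return B


# Hypothetical toy data: v = 3*x1 + 0.1*x2 + 0.2*x3 (one strong, two weak features)
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
]
v = [3.3, 3.1, 2.9, 2.7]

ridge_coefs = ridge_coordinate_descent(X, v, lam=2.0)
print(ridge_coefs)
```

On this toy data all three coefficients are scaled down by the same factor (to roughly 2.0, 0.067 and 0.133), and none of them is exactly zero.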