# Regularization
A technique used to reduce overfitting in a machine learning model.

If a model learns too much, so that it also fits the "background noise" (such as outliers) in the training data, then it overfits to data points that do not reflect the underlying pattern, which hurts generalization.

We can reduce this type of problem by adding a regularization parameter to the model.

# Ridge Regression
A simple linear regression model uses the Ordinary Least Squares method to determine the line of best fit.

However, what if the data contains some background noise, or there are not enough data points, so that the model fits the training data too closely?

### Simple Linear vs Ridge Regression
The Simple Linear Regression's formula is: ```y = b0 + b1*x```.

Simple Linear Regression chooses the b0 and b1 that give the smallest value of ```sum of squared residuals```.

However, Ridge Regression chooses the b0 and b1 that give the smallest value of:  
```sum of squared residuals + (lambda * b1^2)```
- lambda * b1^2 adds a penalty to the traditional Least Squares method
- lambda determines how severe the b1^2 penalty is
    - The higher the lambda, the more a steep slope is penalized, so the fitted b1 shrinks toward 0
    - With lambda = 0 there is no penalty, and the result is the same as Least Squares
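
To make the penalty concrete, here is a minimal sketch that evaluates the quantity Ridge Regression minimizes for one candidate line. The helper name `ridge_cost` is made up for this illustration, and the two data points are toy values.

```python
import numpy as np

def ridge_cost(x, y, b0, b1, lam):
    """Sum of squared residuals plus the ridge penalty lam * b1^2."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2) + lam * b1 ** 2

# two toy data points (illustrative values)
x = np.array([0.0, 5.0])
y = np.array([1.0, 2.0])

# the line y = 1 + 0.2*x passes through both points, so its residuals are zero
ols_cost = ridge_cost(x, y, b0=1.0, b1=0.2, lam=0.0)

# with lambda = 1 the same line also pays a penalty of 1 * 0.2^2
penalized_cost = ridge_cost(x, y, b0=1.0, b1=0.2, lam=1.0)

print(ols_cost)
print(penalized_cost)
```

Because the penalty grows with b1^2, a slightly flatter line can have a lower total cost than the perfect-fit line, even though its residuals are no longer zero.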

#### Ridge Regression for Multi-Variate Linear Regression
If we were using a multi-variate linear regression model, then each b-coefficient (except the intercept b0) would be used in the penalty term of the Ridge Regression formula:   
```sum of squared residuals + [lambda * (b1^2 + b2^2 + b3^2 + ... + bn^2)]```
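
The multi-variate formula can be minimized in closed form. The sketch below is one way to do that with NumPy, assuming the intercept b0 is left unpenalized (handled here by centering the data). The function name `fit_ridge` and the randomly generated data are made up for this illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit; the intercept b0 is left unpenalized."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # center the features
    yc = y - y_mean          # center the target
    n_features = X.shape[1]
    # solve (Xc^T Xc + lam * I) b = Xc^T yc for the coefficients b1..bn
    coefs = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coefs
    return intercept, coefs

# synthetic data with three features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=20)

_, ols_coefs = fit_ridge(X, y, lam=0.0)     # lam = 0 is plain least squares
_, ridge_coefs = fit_ridge(X, y, lam=10.0)  # lam = 10 shrinks the coefficients

print(ols_coefs)
print(ridge_coefs)
# the sum of squared coefficients is smaller under the ridge penalty
print(np.sum(ridge_coefs ** 2) < np.sum(ols_coefs ** 2))
```

Setting `lam=0.0` removes the penalty, so the first fit is the ordinary least squares solution and serves as the baseline for comparison.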

### Example of Overfit Simple Linear Regression
<img src="images/rr/overfit_regression_example.png" height="35%" width="35%"></img>

The red line (regression line) is overfit to the red training data set. Therefore, the predictions made on the green testing data set would be inaccurate.

We can reduce this problem by adding a "ridge" regularization penalty, which introduces a small amount of bias.
<img src="images/rr/ridge_regression_example.png" height="35%" width="35%"></img>

The blue line (ridge regression line) fits the training data slightly worse because of the added bias, but its predicted values have less variance.

```python
# import libraries
import numpy as np
import pandas as pd
```

```python
# import the regression models
from sklearn.linear_model import LinearRegression, Ridge

# create training data with two points
x = np.array([[0], [5]])
y = np.array([1, 2])
```

```python
# create a simple linear regressor, then fit it to the training data
simple_regressor = LinearRegression()
simple_regressor.fit(x, y)
```

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
```

```python
# create a ridge regressor with lambda = 1, then fit it to the training data
ridge_regressor = Ridge(alpha=1)
ridge_regressor.fit(x, y)
```

```
Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)
```

```python
# simple regressor prediction for x = 10
print(simple_regressor.predict([[10]]))

# ridge regressor prediction for x = 10
print(ridge_regressor.predict([[10]]))
```

```
[3.]
[2.88888889]
```
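
The ridge prediction is lower than the simple one because the penalty shrinks the slope. As a rough check, the sketch below refits the same two points with a few values of `alpha` (scikit-learn's name for lambda) and compares each fitted slope with a by-hand formula for the one-feature case. The formula assumes the intercept is not penalized, and the list of alpha values is chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# same two training points as in the cells above
x = np.array([[0], [5]])
y = np.array([1, 2])

# by-hand slope for one feature with an unpenalized intercept:
#   b1 = sum((x - x_mean) * (y - y_mean)) / (sum((x - x_mean)^2) + alpha)
x_centered = x[:, 0] - x[:, 0].mean()
y_centered = y - y.mean()
numerator = np.sum(x_centered * y_centered)
denominator = np.sum(x_centered ** 2)

results = []
for alpha in [0.1, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(x, y)
    hand_slope = numerator / (denominator + alpha)
    prediction = model.predict([[10]])[0]
    results.append((alpha, model.coef_[0], hand_slope, prediction))
    print(f"alpha={alpha:>5}: slope={model.coef_[0]:.4f} "
          f"(by hand {hand_slope:.4f}), prediction at x=10: {prediction:.4f}")
```

As alpha grows, the slope moves toward 0 and the prediction at x = 10 moves toward the mean of y, which is the trade of extra bias for lower variance described earlier.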
