# Regularization

Regularization is a technique that helps a model generalize by penalizing complexity, which makes it less likely to overfit.

Suppose we have two models that classify the same data:

- a line
- a higher-degree polynomial curve

Which one is better?

- The line makes a couple of mistakes.
- The curve makes zero mistakes, but it is more complicated.

### Example
Assume the equation of the line is

$3x_1 + 4x_2 + 5 = 0$

and the equation of the polynomial is

$2x_1^3 - 2x_1^2x_2 - 4x_2^3 + 3x_1^2 + 6x_1x_2 + 4x_2^2 + 5 = 0$

The equation of the line is much simpler than the equation of the polynomial.

If we add the magnitudes of the line's coefficients to its error, the error grows only slightly.

If we do the same with all of the polynomial's coefficients, the error grows a lot.

### L1 regularization

L1 regularization takes the absolute values of the model's coefficients and adds them to the error.

For the polynomial, we add $|2| + |-2| + |-4| + |3| + |6| + |4| = 21$ to the error (the constant term is not included).

For the line, we add $|3| + |4| = 7$ to the error.

The complicated model gives us a much higher error.
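The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `l1_penalty` and the coefficient lists are illustrative, with the coefficients copied from the two example equations and the constant term left out.

```python
# Coefficients of the two example models, excluding the constant term.
line_coefs = [3, 4]
poly_coefs = [2, -2, -4, 3, 6, 4]


def l1_penalty(coefs):
    """Return the sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in coefs)


print(l1_penalty(line_coefs))  # 7
print(l1_penalty(poly_coefs))  # 21
```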

### L2 regularization

L2 regularization is similar to L1, but instead of adding the absolute values of the coefficients, we add their squares.

For the polynomial, we get $2^2 + (-2)^2 + (-4)^2 + 3^2 + 6^2 + 4^2 = 85$.

For the line, we get $3^2 + 4^2 = 25$.

Again, the complex model is punished a lot more than the simple one.
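The same check works for the L2 penalty. As before, this is a small sketch with an illustrative helper name (`l2_penalty`), using the coefficients from the example equations without the constant term.

```python
# Coefficients of the two example models, excluding the constant term.
line_coefs = [3, 4]
poly_coefs = [2, -2, -4, 3, 6, 4]


def l2_penalty(coefs):
    """Return the sum of the squared coefficients."""
    return sum(c ** 2 for c in coefs)


print(l2_penalty(line_coefs))  # 25
print(l2_penalty(poly_coefs))  # 85
```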

What if this punishes the complicated model too little, or too much?

We can tune how strongly complex models are punished by using a parameter called lambda.

## The lambda ($\lambda$) Parameter


We multiply the part of the error that comes from the complexity of the model by $\lambda$ before adding it to the overall error.

- With a small $\lambda$, the complexity penalty is not large enough to outweigh the errors the simple model makes by misclassifying points, so we choose the complex model.

- With a large $\lambda$, the complexity part of the error is multiplied by a lot. This punishes the complex model more, so the simple model wins.
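To make the trade-off concrete, here is a minimal sketch that scores each model as classification error plus $\lambda$ times its L1 penalty. The classification errors are assumptions chosen for illustration (2 mistakes for the line, 0 for the polynomial, loosely matching "a couple of mistakes" and "zero mistakes"), and the names `total_error` and `best_model` are hypothetical. The penalties 7 and 21 are the L1 values computed earlier.

```python
def total_error(classification_error, penalty, lam):
    """Combine the classification error with the scaled complexity penalty."""
    return classification_error + lam * penalty


# (classification error, L1 penalty) for each model.
# The classification errors are assumed values for illustration only.
models = {
    "line": (2, 7),
    "polynomial": (0, 21),
}


def best_model(lam):
    """Return the name of the model with the lowest total error."""
    return min(models, key=lambda name: total_error(*models[name], lam))


for lam in (0.1, 1.0):
    print(lam, best_model(lam))
# 0.1 polynomial
# 1.0 line
```

With these assumed numbers, the two models tie at $\lambda = 1/7$. Below that value the polynomial wins, and above it the line wins.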

## Exercise

Perhaps it's not too surprising at this point, but there are classes in sklearn that will help you perform regularization with your linear regression. 

You'll get practice with implementing that in this exercise.

In this assignment's data.csv (loaded below as `09_data.csv`), you'll find data for a set of points with six predictor variables and one outcome variable.

Use sklearn's Lasso class to fit a linear regression model to the data, while also using L1 regularization to control for model complexity.

```python
import pandas as pd

# The file has no header row, so pandas numbers the columns 0..6
spreadsheet = pd.read_csv('./09_data.csv', delimiter=',', header=None)
spreadsheet.head()
```

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 1.25664 | 2.04978 | -6.2364 | 4.71926 | -4.26931 | 0.2059 | 12.31798 |
| 1 | -3.89012 | -0.37511 | 6.14979 | 4.94585 | -3.57844 | 0.0064 | 23.67628 |
| 2 | 5.09784 | 0.9812 | -0.29939 | 5.85805 | 0.28297 | -0.20626 | -1.53459 |
| 3 | 0.39034 | -3.06861 | -5.63488 | 6.43941 | 0.39256 | -0.07084 | -24.6867 |
| 4 | 5.84727 | -0.15922 | 11.41246 | 7.52165 | 1.69886 | 0.29022 | 17.54122 |


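```python
import pandas as pd
from sklearn.linear_model import Lasso

# Load the data; the file has no header row
train_data = pd.read_csv('./09_data.csv', header=None)

# The first six columns are the predictors, the last column is the outcome
X = train_data.iloc[:, :-1]
y = train_data.iloc[:, -1]

# Create the model with L1 regularization (default strength, alpha=1.0)
lasso_reg = Lasso()

# Fit the model to the data
lasso_reg.fit(X, y)

# Retrieve and print the fitted coefficients
reg_coef = lasso_reg.coef_
print(reg_coef)
```

```
[ 0.          2.35793224  2.00441646 -0.05511954 -3.92808318  0.        ]
```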
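Two of the six coefficients in this output are exactly zero, which is typical of L1 regularization: it can remove predictors from the model entirely. In sklearn, the role of $\lambda$ is played by the `alpha` argument of `Lasso` (default `1.0`). The sketch below shows the effect of changing `alpha`. It does not use the exercise's data file: it generates its own synthetic data, where by construction only the first two of six predictors matter, and the names `X_demo`, `y_demo`, and `nonzero_counts` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration, not the exercise data):
# six predictors, but only the first two influence the outcome.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Fit a Lasso model for several regularization strengths and
# count how many coefficients remain non-zero.
nonzero_counts = {}
for alpha in (0.01, 1.0, 10.0):
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(alpha, np.round(model.coef_, 2))
```

A larger `alpha` shrinks the coefficients more strongly, and a large enough value drives all of them to zero.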
