## Regularization

### L2 and L1 regularization

Regularization is a technique used to reduce overfitting by adding a penalty on model complexity (typically the size of the weights) to the training objective.

In other words, instead of simply aiming to minimize loss (empirical risk minimization):
```
minimize(loss_function)
```
we'll now minimize loss+complexity, which is called structural risk minimization:

```
minimize(loss_function + complexity(model))
```

The training objective now has two terms: the loss term, which measures how well the model fits the data, and the regularization term, which measures model complexity.
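
As a rough illustration of the two-term objective, the sketch below adds a complexity penalty to a mean squared error loss. The function name `regularized_loss`, the weighting factor `lam`, and the toy data are assumptions made for this example; they are not part of the notebook or of scikit-learn.

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.0, penalty="l2"):
    """Mean squared error plus a weighted complexity penalty (illustrative only)."""
    residuals = X @ w - y
    loss = np.mean(residuals ** 2)        # loss term: how well the model fits the data
    if penalty == "l2":
        complexity = np.sum(w ** 2)       # L2: sum of squared weights
    elif penalty == "l1":
        complexity = np.sum(np.abs(w))    # L1: sum of absolute weights
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return loss + lam * complexity        # lam (hypothetical name) sets the trade-off

# Toy data chosen so that w_demo fits it exactly (loss term is zero).
X_toy = np.array([[1.0, 2.0], [3.0, 4.0]])
y_toy = np.array([5.0, 11.0])
w_demo = np.array([1.0, 2.0])

unpenalized = regularized_loss(w_demo, X_toy, y_toy, lam=0.0)
with_l2 = regularized_loss(w_demo, X_toy, y_toy, lam=0.1, penalty="l2")
with_l1 = regularized_loss(w_demo, X_toy, y_toy, lam=0.1, penalty="l1")
print(unpenalized, with_l2, with_l1)
```

Even though the weights fit the toy data perfectly, the penalized objectives are greater than zero, so the optimizer is pushed toward smaller weights whenever that costs little in fit.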


In [1]:
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso,Ridge

from sklearn.metrics import mean_absolute_error,mean_squared_error

import warnings
warnings.filterwarnings("ignore")

In [90]:
sb.set_style("darkgrid")
plt.rcParams['font.size'] = 16
plt.rcParams['figure.figsize']=(14,7)
plt.rcParams['figure.facecolor'] = '#FFF'

In [4]:
data = pd.read_csv("../Day11: Linear Regression/car_prediction.csv")

In [2]:
numerical = list()
category = list()

minmax = MinMaxScaler()
lblenc = LabelEncoder()

### Feature Selection

In [5]:
features = ['CarName','horsepower','enginesize','peakrpm','highwaympg','doornumber','carlength']

In [6]:
X = data[features].copy()  # copy so the in-place scaling and encoding below do not act on a slice of data
Y = data.price

In [7]:
for col in X:
    if X[col].dtype == "O":
        category.append(col)
    else:
        numerical.append(col)

In [8]:
numerical

['horsepower', 'enginesize', 'peakrpm', 'highwaympg', 'carlength']

In [9]:
X[numerical] = minmax.fit_transform(X[numerical])

In [10]:
for cat_col in category:
    X[cat_col] = lblenc.fit_transform(X[cat_col])

In [11]:
X

| | CarName | horsepower | enginesize | peakrpm | highwaympg | doornumber | carlength |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 0.262500 | 0.260377 | 0.346939 | 0.289474 | 1 | 0.413433 |
| 1 | 3 | 0.262500 | 0.260377 | 0.346939 | 0.289474 | 1 | 0.413433 |
| 2 | 1 | 0.441667 | 0.343396 | 0.346939 | 0.263158 | 1 | 0.449254 |
| 3 | 4 | 0.225000 | 0.181132 | 0.551020 | 0.368421 | 0 | 0.529851 |
| 4 | 5 | 0.279167 | 0.283019 | 0.551020 | 0.157895 | 0 | 0.529851 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 200 | 139 | 0.275000 | 0.301887 | 0.510204 | 0.315789 | 0 | 0.711940 |
| 201 | 138 | 0.466667 | 0.301887 | 0.469388 | 0.236842 | 0 | 0.711940 |
| 202 | 140 | 0.358333 | 0.422642 | 0.551020 | 0.184211 | 0 | 0.711940 |
| 203 | 142 | 0.241667 | 0.316981 | 0.265306 | 0.289474 | 0 | 0.711940 |


In [12]:
Y

0      13495.0
1      16500.0
2      16500.0
3      13950.0
4      17450.0
        ...   
200    16845.0
201    19045.0
202    21485.0
203    22470.0
204    22625.0
Name: price, Length: 205, dtype: float64

In [13]:
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.23)

In [14]:
x_train.shape

(157, 7)

In [15]:
x_test.shape

(48, 7)

In [16]:
y_train.shape

(157,)

In [17]:
model = LinearRegression()
model.fit(x_train,y_train)

LinearRegression()

In [18]:
car_predict = model.predict(x_test)

In [19]:
model.score(x_train,y_train) * 100

82.96707034877515

In [20]:
model.score(x_test,y_test) *100

81.64961154863953

In [123]:
abs_error = mean_absolute_error(y_test,car_predict)

In [124]:
abs_error

2927.127837373467

In [21]:
# squared=False returns the root mean squared error (RMSE), not the MSE
rmse = mean_squared_error(y_test,car_predict,squared=False)

In [22]:
rmse

3518.0537865543874

In [39]:
x_test[:10]

| | CarName | horsepower | enginesize | peakrpm | highwaympg | doornumber | carlength |
|---|---|---|---|---|---|---|---|
| 51 | 51 | 0.083333 | 0.113208 | 0.346939 | 0.578947 | 1 | 0.268657 |
| 195 | 138 | 0.275 | 0.301887 | 0.510204 | 0.315789 | 0 | 0.71194 |
| 20 | 26 | 0.091667 | 0.109434 | 0.510204 | 0.710526 | 0 | 0.264179 |
| 52 | 61 | 0.083333 | 0.113208 | 0.346939 | 0.578947 | 1 | 0.268657 |
| 119 | 89 | 0.225 | 0.139623 | 0.55102 | 0.368421 | 1 | 0.241791 |
| 173 | 115 | 0.183333 | 0.230189 | 0.020408 | 0.473684 | 0 | 0.514925 |
| 13 | 12 | 0.304167 | 0.388679 | 0.040816 | 0.315789 | 0 | 0.532836 |
| 115 | 85 | 0.204167 | 0.222642 | 0.346939 | 0.210526 | 0 | 0.680597 |
| 102 | 72 | 0.433333 | 0.45283 | 0.428571 | 0.157895 | 0 | 0.649254 |
| 179 | 120 | 0.470833 | 0.415094 | 0.428571 | 0.210526 | 1 | 0.632836 |


### Ridge Regularization (L2)

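Ridge regression penalizes the size of the regression coefficients by adding the sum of their squared magnitudes to the loss function, which shrinks the coefficients toward zero.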

In [42]:
ridge_model = Ridge(alpha = 0.01)
ridge_model.fit(x_train, y_train)

Ridge(alpha=0.01)

In [43]:
ridge_model.score(x_train,y_train)

0.8296667110295941

In [50]:
ridge_model.score(x_test,y_test)

0.816393095256166
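
The Ridge scores above are almost identical to those of plain linear regression, which is what one would expect from a penalty as weak as `alpha = 0.01`. The sketch below shows how the coefficient norm shrinks as `alpha` grows. It uses synthetic data from `make_regression` rather than the car dataset, and the alpha values and the names `X_demo`, `y_demo`, and `norms` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in for the car data: 100 samples, 7 features.
X_demo, y_demo = make_regression(
    n_samples=100, n_features=7, noise=10.0, random_state=0
)

# Baseline: ordinary least squares without any penalty.
ols = LinearRegression().fit(X_demo, y_demo)
norms = {"ols": np.linalg.norm(ols.coef_)}

# Stronger L2 penalties should give a smaller coefficient norm.
for alpha in (0.01, 1.0, 100.0):
    ridge = Ridge(alpha=alpha).fit(X_demo, y_demo)
    norms[alpha] = np.linalg.norm(ridge.coef_)

for name, value in norms.items():
    print(f"{name}: coefficient norm = {value:.3f}")
```

In practice `alpha` is usually chosen by cross-validation, for example with `RidgeCV`, rather than fixed by hand.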

### Lasso Regularization (L1)

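Lasso regression (Least Absolute Shrinkage and Selection Operator) adds the sum of the absolute values of the coefficients as a penalty term to the loss function. Unlike the L2 penalty, it can drive some coefficients to exactly zero, so it also acts as a form of feature selection.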

In [52]:
lasso_model = Lasso(alpha = 0.02, tol = 0.01)

In [53]:
lasso_model.fit(x_train, y_train)

Lasso(alpha=0.02, tol=0.01)

In [54]:
lasso_model.score(x_train,y_train) 

0.8296706997984666

In [55]:
lasso_model.score(x_test,y_test)

0.8164959057397018
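
To make the selection behaviour of the L1 penalty more concrete, the sketch below counts how many Lasso coefficients are exactly zero at different penalty strengths. It uses synthetic data in which only 3 of 10 features are informative. The data, the alpha values, and the name `zero_counts` are assumptions made for this example, not results from the car dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually influence the target.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

zero_counts = {}
for alpha in (0.01, 1.0, 10.0):
    lasso = Lasso(alpha=alpha).fit(X_demo, y_demo)
    # Count the coefficients that the L1 penalty set exactly to zero.
    zero_counts[alpha] = int(np.sum(lasso.coef_ == 0.0))
    print(f"alpha={alpha}: {zero_counts[alpha]} of {lasso.coef_.size} coefficients are zero")
```

With a very small `alpha`, as in the cells above, Lasso behaves much like ordinary linear regression, which is consistent with the near-identical scores.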

### References

[Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/regularization-for-simplicity/l2-regularization)

[Towards Data Science](https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b)