#### Regularization is a technique that adds a penalty to the loss function as model complexity increases.


* As model complexity increases, the risk of overfitting grows.


* Overfitting happens when the model learns the noise in the training data as well as the signal.


* An overfit model typically performs well on the training data but poorly on test data and real-world data.


* To build a less complex (parsimonious) model, we can use regularization techniques:
    1. L1-Regularization or Lasso
    2. L2-Regularization or Ridge
    3. Elastic Net Regularization - a combination of Lasso and Ridge
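
The overfitting pattern described above can be illustrated with a small sketch. The data here is synthetic (a linear signal plus noise, an assumption made purely for illustration), and the polynomial degrees 1 and 9 are arbitrary choices: the more flexible model fits the training rows at least as well, while its test error is free to get worse.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption for illustration): linear signal plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(40, 1))
y = 2.0 * x.ravel() + rng.normal(scale=0.5, size=40)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

errors = {}
for degree in (1, 9):
    # Higher degree = more complex model fitted to the same training rows.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    errors[degree] = (
        mean_squared_error(y_train, model.predict(x_train)),
        mean_squared_error(y_test, model.predict(x_test)),
    )
    print(f"degree={degree}: train MSE={errors[degree][0]:.3f}, "
          f"test MSE={errors[degree][1]:.3f}")
```

Comparing the two printed rows shows how a lower training error does not by itself imply a lower test error.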

#### Building a model to predict unemployment level in an economy

#### Import Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [2]:
df = pd.read_csv('dataset/economics.csv')

In [3]:
df.head()

| | date | pce | pop | psavert | uempmed | unemploy |
|---|---|---|---|---|---|---|
| 0 | 1967-07-01 | 507.4 | 198712 | 12.5 | 4.5 | 2944 |
| 1 | 1967-08-01 | 510.5 | 198911 | 12.5 | 4.7 | 2945 |
| 2 | 1967-09-01 | 516.3 | 199113 | 11.7 | 4.6 | 2958 |
| 3 | 1967-10-01 | 512.9 | 199311 | 12.5 | 4.9 | 3143 |
| 4 | 1967-11-01 | 518.1 | 199498 | 12.5 | 4.7 | 3066 |


#### Data dictionary

* psavert - personal savings rate


* pce - personal consumption expenditure, USD billions


* uempmed - median duration of unemployment, weeks


* unemploy - number of unemployed (thousands)


* pop - population in thousands

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 574 entries, 0 to 573
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   date      574 non-null    object 
 1   pce       574 non-null    float64
 2   pop       574 non-null    int64  
 3   psavert   574 non-null    float64
 4   uempmed   574 non-null    float64
 5   unemploy  574 non-null    int64  
dtypes: float64(3), int64(2), object(1)
memory usage: 27.0+ KB


In [5]:
df.describe()

| | pce | pop | psavert | uempmed | unemploy |
|---|---|---|---|---|---|
| count | 574.0 | 574.0 | 574.0 | 574.0 | 574.0 |
| mean | 4843.510453 | 257189.381533 | 7.936585 | 8.610105 | 7771.557491 |
| std | 3579.287206 | 36730.801593 | 3.124394 | 4.108112 | 2641.960571 |
| min | 507.4 | 198712.0 | 1.9 | 4.0 | 2685.0 |
| 25% | 1582.225 | 224896.0 | 5.5 | 6.0 | 6284.0 |
| 50% | 3953.55 | 253060.0 | 7.7 | 7.5 | 7494.0 |
| 75% | 7667.325 | 290290.75 | 10.5 | 9.1 | 8691.0 |
| max | 12161.5 | 320887.0 | 17.0 | 25.2 | 15352.0 |


In [6]:
target = df['unemploy']

features = ['pce', 'pop', 'psavert', 'uempmed']

In [9]:
df[features] = df[features] / df[features].max()  # scale each feature by its column maximum (max scaling, not min-max scaling)
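
Note that the scaling above uses the maximum of the full dataset, so rows that later end up in the test set influence how the training features are scaled. A minimal sketch of one alternative, assuming scikit-learn's `MaxAbsScaler` (which divides each column by its largest absolute value) is an acceptable substitute: split first, learn the scaling from the training rows only, then apply it to both splits. The `demo` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical stand-in for the feature table; the values are made up.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "pce": rng.uniform(500, 12000, size=100),
    "pop": rng.uniform(198000, 321000, size=100),
})

train, test = train_test_split(demo, test_size=0.2, random_state=42)

# Learn the per-column maximum from the training rows only...
scaler = MaxAbsScaler().fit(train)
# ...then apply that same scaling to both splits.
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

print(train_scaled.max(axis=0))  # each training column peaks at 1.0
print(test_scaled.max(axis=0))   # test values may land slightly above or below 1.0
```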

In [10]:
df.describe()

| | pce | pop | psavert | uempmed | unemploy |
|---|---|---|---|---|---|
| count | 574.0 | 574.0 | 574.0 | 574.0 | 574.0 |
| mean | 0.398266 | 0.801495 | 0.466858 | 0.341671 | 7771.557491 |
| std | 0.294313 | 0.114466 | 0.183788 | 0.16302 | 2641.960571 |
| min | 0.041722 | 0.619258 | 0.111765 | 0.15873 | 2685.0 |
| 25% | 0.130101 | 0.700857 | 0.323529 | 0.238095 | 6284.0 |
| 50% | 0.325087 | 0.788627 | 0.452941 | 0.297619 | 7494.0 |
| 75% | 0.630459 | 0.904651 | 0.617647 | 0.361111 | 8691.0 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 15352.0 |


In [13]:
X = df[features]

y = target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f'X Shape : {X.shape}\n\nX_train Shape : {X_train.shape}\n\nX_test Shape : {X_test.shape}')

X Shape : (574, 4)

X_train Shape : (459, 4)

X_test Shape : (115, 4)


#### Build a linear regression model

In [14]:
lm = LinearRegression()

In [15]:
lm.fit(X_train,y_train)

LinearRegression()

##### Training and Testing Accuracy

In [18]:
y_pred_train = lm.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train, y_pred_train))}')
print(f'R-Squared : {r2_score(y_train, y_pred_train)}')

print('\n\nTesting Accuracy')

y_pred_test = lm.predict(X_test)

print(f'RMSE : {np.sqrt(mean_squared_error(y_test, y_pred_test))}')
print(f'R-Squared : {r2_score(y_test, y_pred_test)}')


print(f'\n\nCoefficients: {lm.coef_}')

Training Accuracy
RMSE : 985.0285748757371
R-Squared : 0.8510879906820179


Testing Accuracy
RMSE : 1001.9163075939397
R-Squared : 0.8854186379903031


Coefficients: [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


### Regularization

#### Ridge Regression or L2-Norm

* Ridge is an extension of linear regression.


* In all regularization methods, we add a penalty term to the loss function.


* The modified loss function penalizes model complexity as well as prediction error.


* The strength of the penalty is controlled by a tuning parameter, often written λ; scikit-learn calls it `alpha` (α).


* In ridge, the penalty is the sum of the squared coefficients (the squared L2 norm).


**Loss Function = OLS loss (residual sum of squares) + α * (sum of squared coefficient values)**


* α - penalty strength
    * If α = 0, the loss reduces to plain OLS.
    * A very low α leaves the model close to OLS, so it can still overfit.
    * A very high α shrinks the coefficients toward zero and leads to underfitting.
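
The effect of α on the coefficients can be seen in a short sketch. The data is synthetic (two strongly correlated features, an assumption chosen because ridge is often used in that situation), and the α values are arbitrary: as α grows, the overall size of the ridge coefficient vector shrinks relative to OLS.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumption for illustration): two highly correlated features.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
X_demo = np.column_stack([x1, x2])
y_demo = 3.0 * x1 + rng.normal(scale=1.0, size=200)

# Baseline: size of the unpenalized OLS coefficient vector.
ols_norm = np.linalg.norm(LinearRegression().fit(X_demo, y_demo).coef_)
print(f"OLS: coefficient norm = {ols_norm:.3f}")

# Increasing alpha strengthens the penalty and shrinks the coefficients.
ridge_norms = []
for alpha in (0.01, 1, 100, 10000):
    coef = Ridge(alpha=alpha).fit(X_demo, y_demo).coef_
    ridge_norms.append(np.linalg.norm(coef))
    print(f"Ridge alpha={alpha}: coefficient norm = {ridge_norms[-1]:.3f}")
```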

In [20]:
rr = Ridge(alpha=0.01)

rr.fit(X_train, y_train)

y_pred_train = rr.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train, y_pred_train))}')
print(f'R-Squared : {r2_score(y_train, y_pred_train)}')

print('\n\nTesting Accuracy')

y_pred_test = rr.predict(X_test)

print(f'RMSE : {np.sqrt(mean_squared_error(y_test, y_pred_test))}')
print(f'R-Squared : {r2_score(y_test, y_pred_test)}')

print(f'\n\nOLS Coefficients: {lm.coef_}')
print(f'\n\nRidge Coefficients: {rr.coef_}')

Training Accuracy
RMSE : 988.4714111995353
R-Squared : 0.8500452277864494


Testing Accuracy
RMSE : 999.568234828448
R-Squared : 0.8859550702424106


OLS Coefficients: [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


Ridge Coefficients: [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


#### Lasso Regression or L1-Norm

Lasso - Least Absolute Shrinkage and Selection Operator

**Loss Function = OLS loss + α * (sum of absolute coefficient values)**
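
A distinguishing property of the L1 penalty is that it can set coefficients exactly to zero. The sketch below uses synthetic data in which only the first two of six features carry signal (an assumption for illustration) and two arbitrary α values. One detail worth knowing: scikit-learn's `Lasso` divides the squared-error term by `2 * n_samples`, so its `alpha` is not on the same scale as the `alpha` of `Ridge`.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption for illustration): only 2 of 6 features matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = 4.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

zero_counts = {}
coefs = {}
for alpha in (0.001, 0.5):
    coefs[alpha] = Lasso(alpha=alpha).fit(X_demo, y_demo).coef_
    # Count coefficients the L1 penalty has driven exactly to zero.
    zero_counts[alpha] = int(np.sum(coefs[alpha] == 0))
    print(f"alpha={alpha}: coefficients={np.round(coefs[alpha], 3)}, "
          f"zeros={zero_counts[alpha]}")
```

With the larger α, the noise features are expected to drop out while the two informative features keep non-zero (but shrunken) coefficients.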


In [21]:
lasso = Lasso(alpha=0.01)

lasso.fit(X_train,y_train)

Lasso(alpha=0.01)

In [23]:
y_pred_train = lasso.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train, y_pred_train))}')
print(f'R-Squared : {r2_score(y_train, y_pred_train)}')

print('\n\nTesting Accuracy')

y_pred_test = lasso.predict(X_test)

print(f'RMSE : {np.sqrt(mean_squared_error(y_test, y_pred_test))}')
print(f'R-Squared : {r2_score(y_test, y_pred_test)}')

print(f'\n\nOLS Coefficients: {lm.coef_}')
print(f'\n\nRidge Coefficients: {rr.coef_}')
print(f'\n\nLasso Coefficients: {lasso.coef_}')

Training Accuracy
RMSE : 985.0290263432382
R-Squared : 0.8510878541804994


Testing Accuracy
RMSE : 1001.864344384054
R-Squared : 0.8854305229370176


OLS Coefficients: [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


Ridge Coefficients: [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


Lasso Coefficients: [-19126.88512097  56908.65399987   5101.01103684  13551.65114574]


#### Elasticnet Regression

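Elastic Net combines the L1 and L2 penalties. In scikit-learn the mix is controlled by `l1_ratio` (default 0.5). As with `Lasso`, the squared-error term is divided by `2 * n_samples`, so the same `alpha=0.01` is a much stronger L2 penalty here than in the `Ridge` model, which is why the coefficients shrink far more.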
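
As a sanity check on how the two penalties are mixed, the sketch below (synthetic data and an arbitrary `alpha=0.1`, both assumptions for illustration) shows that `ElasticNet` with `l1_ratio=1.0` reproduces `Lasso`, while `l1_ratio=0.5` adds an L2 component and yields different coefficients.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data (assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
true_coef = np.array([5.0, -2.0, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

lasso_coef = Lasso(alpha=0.1).fit(X_demo, y_demo).coef_
# l1_ratio=1.0 means a pure L1 penalty, i.e. the Lasso objective.
en_l1_coef = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X_demo, y_demo).coef_
# l1_ratio=0.5 (the default) splits the penalty between L1 and L2.
en_mix_coef = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_demo, y_demo).coef_

print("Lasso:                  ", np.round(lasso_coef, 3))
print("ElasticNet l1_ratio=1.0:", np.round(en_l1_coef, 3))
print("ElasticNet l1_ratio=0.5:", np.round(en_mix_coef, 3))
```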

In [24]:
en = ElasticNet(alpha=0.01)
en.fit(X_train,y_train)

ElasticNet(alpha=0.01)

In [25]:
y_pred_train = en.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train, y_pred_train))}')
print(f'R-Squared : {r2_score(y_train, y_pred_train)}')

print('\n\nTesting Accuracy')

y_pred_test = en.predict(X_test)

print(f'RMSE : {np.sqrt(mean_squared_error(y_test, y_pred_test))}')
print(f'R-Squared : {r2_score(y_test, y_pred_test)}')

print(f'\n\nOLS Coefficients: {lm.coef_}')
print(f'\n\nRidge Coefficients: {rr.coef_}')
print(f'\n\nLasso Coefficients: {lasso.coef_}')
print(f'\n\nElasticnet Coefficients: {en.coef_}')

Training Accuracy
RMSE : 1367.644177780906
R-Squared : 0.7129362807991237


Testing Accuracy
RMSE : 1396.7830672257037
R-Squared : 0.7773057453745247


OLS Coefficients: [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


Ridge Coefficients: [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


Lasso Coefficients: [-19126.88512097  56908.65399987   5101.01103684  13551.65114574]


Elasticnet Coefficients: [ 989.44297232 2928.64351627 1271.86008179 9798.97587465]


* Ridge tends to work well when many predictors each contribute, with coefficients of similar size.


* Lasso tends to work well when only a small number of predictors matter and the remaining coefficients are zero or close to zero.

### How to select the value of α

In [30]:
from sklearn.linear_model import RidgeCV # Ridge with Cross Validation


rrcv = RidgeCV(alphas=[0.001,0.01,0.1,1,10])

rrcv.fit(X_train,y_train)

y_pred_train = rrcv.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train, y_pred_train))}')
print(f'R-Squared : {r2_score(y_train, y_pred_train)}')

print('\n\nTesting Accuracy')

y_pred_test = rrcv.predict(X_test)

print(f'RMSE : {np.sqrt(mean_squared_error(y_test, y_pred_test))}')
print(f'R-Squared : {r2_score(y_test, y_pred_test)}')

print(f'\n\nOLS Coefficients: {lm.coef_}')
print(f'\n\nRidge Coefficients: {rr.coef_}')
print(f'\n\nLasso Coefficients: {lasso.coef_}')
print(f'\n\nElasticnet Coefficients: {en.coef_}')
print(f'\n\nRidge CV Coefficients: {rrcv.coef_}')

print(f'\n\nBest Value of Alpha = {rrcv.alpha_}')

Training Accuracy
RMSE : 985.0698654882659
R-Squared : 0.851075506177294


Testing Accuracy
RMSE : 1001.3938353306316
R-Squared : 0.8855381089959089


OLS Coefficients: [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


Ridge Coefficients: [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


Lasso Coefficients: [-19126.88512097  56908.65399987   5101.01103684  13551.65114574]


Elasticnet Coefficients: [ 989.44297232 2928.64351627 1271.86008179 9798.97587465]


Ridge CV Coefficients: [-18956.71195945  56375.07461522   5042.08225953  13562.90637798]


Best Value of Alpha = 0.001
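
The selected value, 0.001, is the smallest candidate in the grid, which suggests the search range may be too narrow at the low end. A minimal sketch of a wider, log-spaced search follows, using synthetic data and an arbitrary grid (both assumptions for illustration). It also shows `LassoCV`, the Lasso counterpart of `RidgeCV`, and flags when the chosen α sits on the edge of the grid.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic data (assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=1.0, size=200)

# Log-spaced candidates cover several orders of magnitude.
alphas = np.logspace(-4, 3, 30)

# RidgeCV uses an efficient leave-one-out scheme when cv is not given.
ridge_cv = RidgeCV(alphas=alphas).fit(X_demo, y_demo)
# LassoCV uses k-fold cross-validation; 5 folds here.
lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=0).fit(X_demo, y_demo)

for name, model in (("RidgeCV", ridge_cv), ("LassoCV", lasso_cv)):
    # A best alpha on the grid boundary is a hint to widen the grid.
    at_edge = bool(np.isclose(model.alpha_, [alphas.min(), alphas.max()]).any())
    print(f"{name}: best alpha = {model.alpha_:.4g}, on grid edge = {at_edge}")
```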

