## Regularization in Regression

* **Regularization is a technique that adds a penalty to the loss function as model complexity increases.**


* As model complexity increases, so does the risk of overfitting.


* Overfitting happens when the model learns the noise in the data along with the signal.


* An overfit model typically performs very well on the training data and underperforms on testing / unseen data.


* To obtain a parsimonious (less complex) model, we employ regularization techniques:


    1. L1-Regularization or Lasso


    2. L2-Regularization or Ridge
    
    
    3. Elastic Net Regularization
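
To make the overfitting idea above concrete, here is a minimal sketch on synthetic data (not the economics dataset used below). It fits polynomials of increasing degree to a few noisy points and reports the error on the fitting data and on fresh data. The curve, noise level, sample sizes and degrees are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a smooth curve on [0, 1] (synthetic, for illustration only)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

def root_mse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

x_tr, y_tr = make_data(20)    # small sample used for fitting
x_te, y_te = make_data(200)   # fresh sample used only for evaluation

poly_results = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit; a higher degree means a more complex model.
    poly = Polynomial.fit(x_tr, y_tr, deg=degree)
    poly_results[degree] = (root_mse(y_tr, poly(x_tr)), root_mse(y_te, poly(x_te)))
    print(f"degree={degree:2d}  fit RMSE={poly_results[degree][0]:.3f}  "
          f"fresh-data RMSE={poly_results[degree][1]:.3f}")
```

The error on the fitting data can only go down as the degree grows, because each model contains the simpler ones as special cases. Whether the error on fresh data goes up depends on the sample, which is why a held-out test set is needed to detect overfitting.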

### Build a regression model to predict unemployment within an economy

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score



In [2]:
ls dataset/

Ecommerce Customers*    gdp.csv*                shiller.csv*
USA_Housing.csv*        ginzberg.csv*           unemployment-macro.csv*
economics.csv*          monthly-hpi.csv*
fed_funds.csv*          seasons.csv*


In [3]:
df = pd.read_csv('dataset/economics.csv')

df.head()

Unnamed: 0,date,pce,pop,psavert,uempmed,unemploy
0,1967-07-01,507.4,198712,12.5,4.5,2944
1,1967-08-01,510.5,198911,12.5,4.7,2945
2,1967-09-01,516.3,199113,11.7,4.6,2958
3,1967-10-01,512.9,199311,12.5,4.9,3143
4,1967-11-01,518.1,199498,12.5,4.7,3066


In [4]:
df.shape

(574, 6)

### Data Dictionary

* psavert - personal saving rate


* pce - personal consumption expenditure, USD Billions


* uempmed - median duration of unemployment, weeks


* unemploy - number of unemployed (thousands)


* pop - Population in thousands

In [5]:
df.columns

Index(['date', 'pce', 'pop', 'psavert', 'uempmed', 'unemploy'], dtype='object')

In [9]:
df.describe()

Unnamed: 0,pce,pop,psavert,uempmed,unemploy
count,574.0,574.0,574.0,574.0,574.0
mean,4843.510453,257189.381533,7.936585,8.610105,7771.557491
std,3579.287206,36730.801593,3.124394,4.108112,2641.960571
min,507.4,198712.0,1.9,4.0,2685.0
25%,1582.225,224896.0,5.5,6.0,6284.0
50%,3953.55,253060.0,7.7,7.5,7494.0
75%,7667.325,290290.75,10.5,9.1,8691.0
max,12161.5,320887.0,17.0,25.2,15352.0


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 574 entries, 0 to 573
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   date      574 non-null    object 
 1   pce       574 non-null    float64
 2   pop       574 non-null    int64  
 3   psavert   574 non-null    float64
 4   uempmed   574 non-null    float64
 5   unemploy  574 non-null    int64  
dtypes: float64(3), int64(2), object(1)
memory usage: 27.0+ KB


In [8]:
target = df['unemploy']

feature_cols = ['pce', 'pop', 'psavert', 'uempmed']

In [10]:
# Scale each feature by its column maximum so that all features lie in (0, 1]
df[feature_cols] = df[feature_cols]/df[feature_cols].max()

In [11]:
df.describe()

Unnamed: 0,pce,pop,psavert,uempmed,unemploy
count,574.0,574.0,574.0,574.0,574.0
mean,0.398266,0.801495,0.466858,0.341671,7771.557491
std,0.294313,0.114466,0.183788,0.16302,2641.960571
min,0.041722,0.619258,0.111765,0.15873,2685.0
25%,0.130101,0.700857,0.323529,0.238095,6284.0
50%,0.325087,0.788627,0.452941,0.297619,7494.0
75%,0.630459,0.904651,0.617647,0.361111,8691.0
max,1.0,1.0,1.0,1.0,15352.0


In [14]:
X = df[feature_cols]

y = target

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)

print(f'X_train Shape : {X_train.shape}\n\nX_test Shape : {X_test.shape}')

X_train Shape : (459, 4)

X_test Shape : (115, 4)


### Build a Linear Regression Model

In [15]:
lm = LinearRegression()

lm.fit(X_train, y_train)

LinearRegression()

In [23]:
# Training Accuracy - Accuracy wrt the training data

y_pred_train = lm.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train,y_pred_train))}')
print(f'R-Squared : {(r2_score(y_train,y_pred_train))}')


# Testing Accuracy - Accuracy wrt the testing data

y_pred_test = lm.predict(X_test)

print('\n\nTesting Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_test,y_pred_test))}')
print(f'R-Squared : {(r2_score(y_test,y_pred_test))}')

print(f'\n\nCoefficients : {lm.coef_}')

Training Accuracy
RMSE : 985.0285748757371
R-Squared : 0.8510879906820179


Testing Accuracy
RMSE : 1001.9163075939397
R-Squared : 0.8854186379903031


Coefficients : [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]
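
The cell above labels its output "accuracy", but the two numbers are regression metrics: RMSE is the typical size of a prediction error in the units of the target (thousands of unemployed here), and R-Squared is the share of the target's variance explained by the model. As a sketch of what `mean_squared_error` and `r2_score` compute in this setting, the helpers below (hypothetical names, toy numbers) reproduce the two formulas with NumPy.

```python
import numpy as np

def rmse_by_hand(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Toy values, not taken from the dataset.
y_toy = [3.0, 5.0, 7.0, 9.0]
y_toy_pred = [2.0, 5.0, 8.0, 9.0]

print(f"RMSE      : {rmse_by_hand(y_toy, y_toy_pred):.4f}")
print(f"R-Squared : {r_squared_by_hand(y_toy, y_toy_pred):.4f}")
```

A model that always predicts the mean of the target scores an R-Squared of 0, and a perfect model scores 1.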


In [20]:
df['unemploy'].min()

2685

## Regularization

### Ridge Regression or L2-Norm


* Ridge is an extension of linear regression


* In every regularization method, a penalty term is added to the loss function


* The loss function is modified to limit the complexity of the model


* The strength of the modification is controlled by a penalty (smoothing) parameter, often written lambda and called alpha in scikit-learn


* In Ridge, the penalty term is the sum of the squared magnitudes of the coefficients, multiplied by that parameter


**LOSS FUNCTION = OLS + α (sum of squared coefficient values)**


* alpha = smoothing param
   * if α = 0, the model reduces to plain OLS
   * Too low an α leaves the model prone to overfitting
   * Too high an α leads to underfitting
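
The effect of α on the coefficients can be sketched with the closed-form ridge solution, w = (XᵀX + αI)⁻¹Xᵀy, computed on centred data so that the intercept is not penalised. This is a minimal NumPy illustration on synthetic data. The function name, the data and the alpha grid are assumptions, and it is not a description of how `Ridge` is implemented internally.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimise ||y - Xw - b||^2 + alpha * ||w||^2, leaving the intercept b unpenalised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean   # centring removes the intercept from the problem
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

ridge_rng = np.random.default_rng(0)
X_ridge_demo = ridge_rng.normal(size=(100, 3))
y_ridge_demo = X_ridge_demo @ np.array([4.0, -2.0, 1.0]) + ridge_rng.normal(scale=0.5, size=100)

for alpha in (0.0, 1.0, 100.0, 10000.0):
    w, _ = ridge_closed_form(X_ridge_demo, y_ridge_demo, alpha)
    print(f"alpha={alpha:>8}  coefficients={np.round(w, 3)}  L2 norm={np.linalg.norm(w):.3f}")
```

With α = 0 the result coincides with ordinary least squares. As α grows, the coefficient vector shrinks toward zero, which is the move from overfitting risk toward underfitting described in the bullets above.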

In [26]:
rr = Ridge(alpha=0.01)

rr.fit(X_train,y_train)

# Training Accuracy - Accuracy wrt the training data

y_pred_train = rr.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train,y_pred_train))}')
print(f'R-Squared : {(r2_score(y_train,y_pred_train))}')


# Testing Accuracy - Accuracy wrt the testing data

y_pred_test = rr.predict(X_test)

print('\n\nTesting Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_test,y_pred_test))}')
print(f'R-Squared : {(r2_score(y_test,y_pred_test))}')

print(f'\n\nLM Coefficients : {lm.coef_}')
print(f'\n\nRR Coefficients : {rr.coef_}')

Training Accuracy
RMSE : 988.4714111995353
R-Squared : 0.8500452277864494


Testing Accuracy
RMSE : 999.568234828448
R-Squared : 0.8859550702424106


LM Coefficients : [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


RR Coefficients : [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


### Lasso Regression or L1-Norm

* Lasso - Least Absolute Shrinkage and Selection Operator


**LOSS FUNCTION = OLS + α (sum of absolute values of coefficients)**


In [29]:
lsm = Lasso(alpha=0.001)

lsm.fit(X_train,y_train)

Lasso(alpha=0.001)

In [30]:
# Training Accuracy - Accuracy wrt the training data

y_pred_train = lsm.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train,y_pred_train))}')
print(f'R-Squared : {(r2_score(y_train,y_pred_train))}')


# Testing Accuracy - Accuracy wrt the testing data

y_pred_test = lsm.predict(X_test)

print('\n\nTesting Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_test,y_pred_test))}')
print(f'R-Squared : {(r2_score(y_test,y_pred_test))}')

print(f'\n\nLM Coefficients : {lm.coef_}')
print(f'\n\nRR Coefficients : {rr.coef_}')
print(f'\n\nLSM Coefficients : {lsm.coef_}')


Training Accuracy
RMSE : 985.0285793909923
R-Squared : 0.8510879893168277


Testing Accuracy
RMSE : 1001.9110813423042
R-Squared : 0.8854198333585475


LM Coefficients : [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


RR Coefficients : [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


LSM Coefficients : [-19144.90151453  56964.19811383   5107.24474545  13551.21639753]


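**Lasso regression can be used for feature selection because, for a sufficiently large α, the coefficients of less important variables are shrunk to exactly zero.** With α = 0.001 as used above, the penalty is too weak to zero out any coefficient, so the Lasso fit is almost identical to plain linear regression.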
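
Why the L1 penalty can produce exact zeros is easier to see in code. The sketch below is a bare-bones coordinate descent for the objective (1/(2n))·‖y − Xw‖² + α·‖w‖₁ with no intercept, run on synthetic data in which only two of four features matter. The function names, the data and the alpha values are assumptions for illustration. It is not scikit-learn's implementation.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes exactly 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept, for brevity)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_scale = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            # One-dimensional lasso update for coordinate j.
            w[j] = soft_threshold(rho, alpha) / col_scale[j]
    return w

lasso_rng = np.random.default_rng(0)
X_lasso_demo = lasso_rng.normal(size=(200, 4))
# Only the first two features drive the (synthetic) target.
y_lasso_demo = (3.0 * X_lasso_demo[:, 0] - 2.0 * X_lasso_demo[:, 1]
                + lasso_rng.normal(scale=0.1, size=200))

for alpha in (0.001, 0.5):
    w_lasso = lasso_coordinate_descent(X_lasso_demo, y_lasso_demo, alpha)
    print(f"alpha={alpha}: {np.round(w_lasso, 3)}")
```

The soft-thresholding step is what sets a coefficient to exactly zero whenever its correlation with the residual falls below α. The ridge penalty, by contrast, only scales coefficients down without zeroing them.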

### Elastic Net regression

Combines both the L1 and L2 penalties.

In [32]:
en = ElasticNet()

en.fit(X_train,y_train)

ElasticNet()

In [33]:
# Training Accuracy - Accuracy wrt the training data

y_pred_train = en.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train,y_pred_train))}')
print(f'R-Squared : {(r2_score(y_train,y_pred_train))}')


# Testing Accuracy - Accuracy wrt the testing data

y_pred_test = en.predict(X_test)

print('\n\nTesting Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_test,y_pred_test))}')
print(f'R-Squared : {(r2_score(y_test,y_pred_test))}')

print(f'\n\nLM Coefficients : {lm.coef_}')
print(f'\n\nRR Coefficients : {rr.coef_}')
print(f'\n\nLSM Coefficients : {lsm.coef_}')
print(f'\n\nEN Coefficients : {en.coef_}')


Training Accuracy
RMSE : 2331.4927412156394
R-Squared : 0.16574234437779267


Testing Accuracy
RMSE : 2690.6795263008275
R-Squared : 0.17362905746232427


LM Coefficients : [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


RR Coefficients : [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


LSM Coefficients : [-19144.90151453  56964.19811383   5107.24474545  13551.21639753]


EN Coefficients : [ 697.72334777  284.66132963 -207.46899075  594.21973384]
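
The much weaker ElasticNet fit above is consistent with its default hyperparameters, `alpha=1.0` and `l1_ratio=0.5`, being far too strong for this data. scikit-learn documents the ElasticNet objective as (1/(2n))·‖y − Xw‖² + α·l1_ratio·‖w‖₁ + 0.5·α·(1 − l1_ratio)·‖w‖², whereas Ridge minimises ‖y − Xw‖² + α·‖w‖² without dividing by the number of samples n. Multiplying the ElasticNet objective by 2n puts both on the same scale. The helper below (a hypothetical name) does that arithmetic for the 459 training rows used here.

```python
def elastic_net_penalties_on_ridge_scale(alpha, l1_ratio, n_samples):
    """Rescale ElasticNet's penalty weights to the unnormalised squared-error scale used by Ridge."""
    l1_weight = 2 * n_samples * alpha * l1_ratio      # multiplies sum(|w|)
    l2_weight = n_samples * alpha * (1 - l1_ratio)    # multiplies sum(w ** 2)
    return l1_weight, l2_weight

# Defaults of ElasticNet() combined with the training-set size from the split above.
l1_w, l2_w = elastic_net_penalties_on_ridge_scale(alpha=1.0, l1_ratio=0.5, n_samples=459)
print(f"L1 weight on Ridge scale : {l1_w}")
print(f"L2 weight on Ridge scale : {l2_w}")
```

On this scale the L2 part alone has a weight of 229.5, compared with the `alpha=0.01` passed to `Ridge` earlier, so the coefficients are shrunk much more aggressively. A far smaller `alpha` would be the natural thing to try.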


* Ridge tends to work well when there are many parameters with large coefficients of similar magnitude


* Lasso tends to work well when only a small number of parameters are significant and the coefficients of the others are close to zero

#### How to select the value of alpha

In [36]:
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV

rrcv = RidgeCV(alphas=[0.001,0.01,0.1,1,10])

rrcv.fit(X_train,y_train)

# Training Accuracy - Accuracy wrt the training data

y_pred_train = rrcv.predict(X_train)

print('Training Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_train,y_pred_train))}')
print(f'R-Squared : {(r2_score(y_train,y_pred_train))}')


# Testing Accuracy - Accuracy wrt the testing data

y_pred_test = rrcv.predict(X_test)

print('\n\nTesting Accuracy')

print(f'RMSE : {np.sqrt(mean_squared_error(y_test,y_pred_test))}')
print(f'R-Squared : {(r2_score(y_test,y_pred_test))}')

print(f'\n\nLM Coefficients : {lm.coef_}')
print(f'\n\nRR Coefficients : {rr.coef_}')
print(f'\n\nLSM Coefficients : {lsm.coef_}')
print(f'\n\nEN Coefficients : {en.coef_}')
print(f'\n\nRRCV Coefficients : {rrcv.coef_}')
print(f'\n\nBest Value of Alpha : {rrcv.alpha_}')

Training Accuracy
RMSE : 985.0698654882659
R-Squared : 0.851075506177294


Testing Accuracy
RMSE : 1001.3938353306316
R-Squared : 0.8855381089959089


LM Coefficients : [-19146.90350229  56970.37011106   5107.93739571  13551.16810809]


RR Coefficients : [-17406.11927015  51531.47457744   4509.20141652  13654.41294602]


LSM Coefficients : [-19144.90151453  56964.19811383   5107.24474545  13551.21639753]


EN Coefficients : [ 697.72334777  284.66132963 -207.46899075  594.21973384]


RRCV Coefficients : [-18956.71195945  56375.07461522   5042.08225953  13562.90637798]


Best Value of Alpha : 0.001
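
To show the idea behind the selection above, here is a sketch of choosing alpha by plain K-fold cross-validation: fit on all folds but one, score on the held-out fold, and keep the alpha with the lowest average validation error. By default `RidgeCV` uses an efficient leave-one-out scheme instead, so this illustrates the principle rather than its algorithm. The function names, the fold count and the synthetic data are assumptions.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge fit on centred data; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def select_alpha_kfold(X, y, alphas, n_folds=5, seed=0):
    """Return the alpha with the lowest mean validation MSE, plus all mean scores."""
    indices = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(indices, n_folds)
    mean_mse = {}
    for alpha in alphas:
        fold_mse = []
        for k in range(n_folds):
            val_idx = folds[k]
            fit_idx = np.concatenate([folds[i] for i in range(n_folds) if i != k])
            w, b = fit_ridge(X[fit_idx], y[fit_idx], alpha)
            fold_mse.append(np.mean((y[val_idx] - (X[val_idx] @ w + b)) ** 2))
        mean_mse[alpha] = float(np.mean(fold_mse))
    return min(mean_mse, key=mean_mse.get), mean_mse

cv_rng = np.random.default_rng(1)
X_cv_demo = cv_rng.normal(size=(120, 4))
y_cv_demo = X_cv_demo @ np.array([2.0, 0.0, -1.0, 0.5]) + cv_rng.normal(scale=1.0, size=120)

best_alpha, cv_scores = select_alpha_kfold(X_cv_demo, y_cv_demo, alphas=[0.001, 0.01, 0.1, 1, 10])
print(f"Best alpha : {best_alpha}")
print(f"Mean validation MSE per alpha : {cv_scores}")
```

The sketch expects NumPy arrays, so the DataFrames from this notebook would need converting first, for example with `X_train.to_numpy()`. Only the training split should be passed in, so that the test set remains untouched for the final evaluation.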


# Great Job !