# Regularization 

* Ordinary least squares (OLS) regression can overfit (high variance) when we include too many variables or allow their coefficients to grow large.


* There are many ways to guard against overfitting, for example splitting the data into training and testing sets (which reveals overfitting), cross-validation, variable selection, and regularization.


* Regularization helps balance the bias-variance trade-off: instead of minimizing only the residual sum of squares (RSS), as OLS regression does, it minimizes the RSS plus a penalty on the size of the coefficients. Larger values of $\lambda$ shrink the coefficients more, accepting a little extra bias in exchange for lower variance. Below are two regularization methods with their respective penalties:
  - Ridge Regression (L2) Penalty =  $\lambda\sum_{j=1}^p \beta_j^2$
    
  - Lasso Regression (L1) Penalty = $\lambda\sum_{j=1}^p |\beta_j|$
  

* Using scikit-learn, the code below tries to lower the test RMSE (root mean squared error) of a plain linear regression model by exploring both Ridge and Lasso regression.
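As a minimal sketch of the two penalized objectives listed above, the function below adds each penalty to the RSS for a fixed coefficient vector. The toy data, the coefficients and $\lambda = 0.1$ are made-up values, and there is no intercept term, so every coefficient is penalized:

```python
import numpy as np

def penalized_rss(X, y, beta, lam, penalty):
    """Residual sum of squares plus a ridge (L2) or lasso (L1) penalty.

    Assumes X has no intercept column, so every coefficient in beta is penalized.
    """
    rss = np.sum((y - X @ beta) ** 2)
    if penalty == "ridge":
        return rss + lam * np.sum(beta ** 2)      # L2: sum of squared coefficients
    if penalty == "lasso":
        return rss + lam * np.sum(np.abs(beta))   # L1: sum of absolute coefficients
    raise ValueError("penalty must be 'ridge' or 'lasso'")

# Hypothetical toy data and coefficients, chosen only for illustration
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, -1.0])

rss_only = penalized_rss(X, y, beta, lam=0.0, penalty="ridge")   # lambda = 0 is plain OLS loss
ridge_obj = penalized_rss(X, y, beta, lam=0.1, penalty="ridge")
lasso_obj = penalized_rss(X, y, beta, lam=0.1, penalty="lasso")
print(rss_only, ridge_obj, lasso_obj)
```

Setting $\lambda = 0$ recovers the OLS objective, which is why both methods can be seen as OLS with an extra cost for large coefficients.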
    
      
#### _Importing Necessary Libraries_ 

```python
# Data handling
import pandas as pd
import random
import numpy as np

# Model building
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# Regularization
from sklearn.linear_model import Ridge, RidgeCV, Lasso, LassoCV
```


#### _Reading in Crime Data_ 

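```python
# Source: UCI Machine Learning Repository, Communities and Crime data set
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data'

# Seeds Python's built-in `random` module only; it does not affect NumPy or
# scikit-learn, whose reproducibility comes from the random_state arguments below
random.seed(24)

# The file has no header row and marks missing values with '?'
crime = pd.read_csv(url, header=None, na_values=['?'])

# Display the first 10 rows
crime.head(10)
```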

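| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | NaN | NaN | Lakewoodcity | 1 | 0.19 | 0.33 | 0.02 | 0.9 | 0.12 | ... | 0.12 | 0.26 | 0.2 | 0.06 | 0.04 | 0.9 | 0.5 | 0.32 | 0.14 | 0.2 |
| 1 | 53 | NaN | NaN | Tukwilacity | 1 | 0.0 | 0.16 | 0.12 | 0.74 | 0.45 | ... | 0.02 | 0.12 | 0.45 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.67 |
| 2 | 24 | NaN | NaN | Aberdeentown | 1 | 0.0 | 0.42 | 0.49 | 0.56 | 0.17 | ... | 0.01 | 0.21 | 0.02 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.43 |
| 3 | 34 | 5.0 | 81440.0 | Willingborotownship | 1 | 0.04 | 0.77 | 1.0 | 0.08 | 0.12 | ... | 0.02 | 0.39 | 0.28 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.12 |
| 4 | 42 | 95.0 | 6096.0 | Bethlehemtownship | 1 | 0.01 | 0.55 | 0.02 | 0.95 | 0.09 | ... | 0.04 | 0.09 | 0.02 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.03 |
| 5 | 6 | NaN | NaN | SouthPasadenacity | 1 | 0.02 | 0.28 | 0.06 | 0.54 | 1.0 | ... | 0.01 | 0.58 | 0.1 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.14 |
| 6 | 44 | 7.0 | 41500.0 | Lincolntown | 1 | 0.01 | 0.39 | 0.0 | 0.98 | 0.06 | ... | 0.05 | 0.08 | 0.06 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.03 |
| 7 | 6 | NaN | NaN | Selmacity | 1 | 0.01 | 0.74 | 0.03 | 0.46 | 0.2 | ... | 0.01 | 0.33 | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.55 |
| 8 | 21 | NaN | NaN | Hendersoncity | 1 | 0.03 | 0.34 | 0.2 | 0.84 | 0.02 | ... | 0.04 | 0.17 | 0.04 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.53 |
| 9 | 29 | NaN | NaN | Claytoncity | 1 | 0.01 | 0.4 | 0.06 | 0.87 | 0.3 | ... | 0.0 | 0.47 | 0.11 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.15 |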


#### _Data Cleaning_

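```python
# Remove the first five columns (state, county, community, community name, CV fold),
# which identify each community rather than describe it
crime.drop([0, 1, 2, 3, 4], axis=1, inplace=True)

# Remove every row that contains at least one missing value
crime.dropna(inplace=True)
```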
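`dropna()` with default arguments removes a row if any of its columns is missing, and the `head(10)` output above shows several columns (for example 121 to 124 and 126) empty in most of those rows, so this step can discard much of the data. A minimal sketch of the trade-off on a hypothetical toy frame; the 50% threshold is an arbitrary assumption, not part of the original analysis:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "c" is mostly missing, like the sparse columns in the crime data
df = pd.DataFrame({
    "a": [0.1, 0.2, 0.3, 0.4],
    "b": [0.5, np.nan, 0.7, 0.8],
    "c": [np.nan, np.nan, np.nan, 0.9],
})

# Default dropna(): drop every ROW that contains any NaN
rows_kept = df.dropna()

# Alternative: first drop columns with more than half their values missing,
# then drop the rows that are still incomplete
mostly_missing = df.columns[df.isna().mean() > 0.5]
cols_then_rows = df.drop(columns=mostly_missing).dropna()

print(rows_kept.shape, cols_then_rows.shape)
```

The second approach keeps more rows at the cost of losing the sparse variables; which is preferable depends on whether those variables carry signal worth the lost observations.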

## Model Building

```python
# Defining X (all remaining features) and y (column 127, the target)
X = crime.drop(127, axis=1)
y = crime[127]
```

```python
# Splitting into training and testing sets (default split: 75% / 25%)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
```

```python
# Building linear regression model
linreg = LinearRegression()
linreg.fit(X_train, y_train)
```

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
```

```python
# Making predictions on the test set
y_pred = linreg.predict(X_test)
```

```python
# Calculate the test RMSE of the unregularized model (the baseline)
original_RMSE = np.sqrt(metrics.mean_squared_error(y_test, y_pred))
print(original_RMSE)
```

```
0.23381367649487023
```
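RMSE is the square root of the average squared prediction error, $\sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$. A quick sanity check, using made-up values rather than the crime data, that computing it by hand matches the `np.sqrt(metrics.mean_squared_error(...))` pattern used above:

```python
import numpy as np
from sklearn import metrics

y_true = np.array([0.20, 0.67, 0.43, 0.12])  # hypothetical observed values
y_hat = np.array([0.25, 0.60, 0.40, 0.20])   # hypothetical predictions

# RMSE by hand: square the errors, average them, take the square root
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# Same quantity via scikit-learn's mean squared error
rmse_sklearn = np.sqrt(metrics.mean_squared_error(y_true, y_hat))
print(rmse_manual, rmse_sklearn)
```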


### _The regularization methods below try to lower this baseline test RMSE of 0.2338_
#### _Note: in scikit-learn, $\lambda$ is the `alpha` parameter_
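One detail worth knowing when mapping `alpha` onto the penalties listed at the top: according to the scikit-learn documentation, `Ridge` and `Lasso` scale the residual term differently. With $n$ training rows and an intercept $\beta_0$ that is fitted but not penalized, the objectives they minimize are:

$$\text{Ridge:}\quad \min_{\beta_0,\,\beta}\ \sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2 + \alpha\sum_{j=1}^p \beta_j^2$$

$$\text{Lasso:}\quad \min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2 + \alpha\sum_{j=1}^p |\beta_j|$$

So for ridge `alpha` is exactly $\lambda$, while multiplying the lasso objective by $2n$ shows it corresponds to $\lambda = 2n\,\alpha$. Because `normalize=True` rescales the features first, the alphas below are also tied to that particular scaling.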

## Ridge Regression

```python
# Lambda = 0.001
# Note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2;
# on newer versions, scale X with a StandardScaler in a Pipeline instead
ridgereg = Ridge(alpha=0.001, normalize=True)
ridgereg.fit(X_train, y_train)
y_pred = ridgereg.predict(X_test)
print('           Original RMSE :', original_RMSE)
print('L2 Regularization Lambda : 0.001')
print('                    RMSE :', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```

```
           Original RMSE : 0.23381367649487023
L2 Regularization Lambda : 0.001
                    RMSE : 0.20416245715571235
```


```python
# Lambda = 0.01
ridgereg = Ridge(alpha=0.01, normalize=True)
ridgereg.fit(X_train, y_train)
y_pred = ridgereg.predict(X_test)
print('           Original RMSE :', original_RMSE)
print('L2 Regularization Lambda : 0.01')
print('                    RMSE :', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```

```
           Original RMSE : 0.23381367649487023
L2 Regularization Lambda : 0.01
                    RMSE : 0.17358122138242232
```
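The two cells above show the test RMSE changing with $\lambda$. What $\lambda$ does to the model itself is shrink the coefficients: the larger it is, the smaller their overall size. A minimal sketch on synthetic data from `make_regression` (an assumption standing in for the crime data) tracks the L2 norm of the ridge coefficients as alpha grows:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic stand-in for the crime data (an assumption, not the notebook's data)
X_demo, y_demo = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

coef_norms = []
for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X_demo, y_demo).coef_
    coef_norms.append(np.linalg.norm(coef))  # L2 norm of the fitted coefficients
    print(f"alpha={alpha}: ||beta||_2 = {coef_norms[-1]:.2f}")
```

The norm falls as alpha rises, but ridge never sets a coefficient exactly to zero; that is the behaviour lasso changes.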


```python
# Candidate alphas: 0.01, 0.1, 1, 10, 100
alpha_range = 10.**np.arange(-2, 3)

# Select the best alpha from the range above with RidgeCV
# (with the default cv=None, RidgeCV uses efficient leave-one-out cross-validation)
ridgeregcv = RidgeCV(alphas=alpha_range, normalize=True, scoring='neg_mean_squared_error')
ridgeregcv.fit(X_train, y_train)

# Calculate the test RMSE; predict() uses the model refitted with the best alpha
y_pred = ridgeregcv.predict(X_test)
print('               Original RMSE :', original_RMSE)
print('L2 Regularization Best Lambda:', ridgeregcv.alpha_)
print('                        RMSE :', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```

```
               Original RMSE : 0.23381367649487023
L2 Regularization Best Lambda: 1.0
                        RMSE : 0.16312978234269376
```
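The `normalize=True` argument used in these cells was deprecated in scikit-learn 1.0 and removed in 1.2, so they fail on current releases. A common replacement is to standardize the features inside a `Pipeline`, so the scaler is fitted on the training data only. The sketch below uses synthetic data in place of the crime data (an assumption); `StandardScaler` scales columns to unit variance rather than to the unit L2 norm `normalize` used, so the best alpha and RMSE will not match the values printed above:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; with the notebook's data, use X_train/X_test/y_train/y_test instead
X_demo, y_demo = make_regression(n_samples=200, n_features=30, noise=15.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

alpha_range = 10.0 ** np.arange(-2, 3)

# Scaling happens inside the pipeline, so test data is transformed with training statistics
ridge_pipe = make_pipeline(StandardScaler(), RidgeCV(alphas=alpha_range))
ridge_pipe.fit(X_tr, y_tr)

best_alpha = ridge_pipe[-1].alpha_   # last pipeline step is the fitted RidgeCV
rmse = np.sqrt(mean_squared_error(y_te, ridge_pipe.predict(X_te)))
print("Best alpha:", best_alpha)
print("Test RMSE:", rmse)
```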


## LASSO Regression

```python
# Lambda = 0.001
lassoreg = Lasso(alpha=0.001, normalize=True)
lassoreg.fit(X_train, y_train)
y_pred = lassoreg.predict(X_test)
print('           Original RMSE :', original_RMSE)
print('L1 Regularization Lambda : 0.001')
print('                    RMSE :', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```

```
           Original RMSE : 0.23381367649487023
L1 Regularization Lambda : 0.001
                    RMSE : 0.16003902404387876
```


```python
# Lambda = 0.01
lassoreg = Lasso(alpha=0.01, normalize=True)
lassoreg.fit(X_train, y_train)
y_pred = lassoreg.predict(X_test)
print('           Original RMSE :', original_RMSE)
print('L1 Regularization Lambda : 0.01')
print('                    RMSE :', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```

```
           Original RMSE : 0.23381367649487023
L1 Regularization Lambda : 0.01
                    RMSE : 0.19816522542866322
```
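Unlike the L2 penalty, the L1 penalty can set coefficients exactly to zero, which is why lasso also acts as a form of variable selection. A minimal sketch on synthetic data where only 5 of 20 features matter (an illustrative assumption; alpha = 5.0 is likewise arbitrary) counts the exact zeros each method produces:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 5 of which actually influence y
X_demo, y_demo = make_regression(n_samples=100, n_features=20, n_informative=5,
                                 noise=10.0, random_state=0)

ridge_coef = Ridge(alpha=5.0).fit(X_demo, y_demo).coef_
lasso_coef = Lasso(alpha=5.0).fit(X_demo, y_demo).coef_

# Count coefficients that are exactly zero (not just small)
print("Ridge zero coefficients:", np.sum(ridge_coef == 0))
print("Lasso zero coefficients:", np.sum(lasso_coef == 0))
```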


```python
# Select the best alpha with LassoCV from an automatically generated grid of 10 alphas
# (random_state only has an effect when selection='random')
lassoregcv = LassoCV(n_alphas=10, normalize=True, random_state=1)
lassoregcv.fit(X_train, y_train)

# Calculate the test RMSE; predict() uses the model refitted with the best alpha
y_pred = lassoregcv.predict(X_test)
print('               Original RMSE :', original_RMSE)
print('L1 Regularization Best Lambda:', lassoregcv.alpha_)
print('                        RMSE :', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```

```
               Original RMSE : 0.23381367649487023
L1 Regularization Best Lambda: 0.0014139753866298723
                        RMSE : 0.16020869025931073
```
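In the output above, the CV-selected alpha (about 0.0014) gives a slightly higher test RMSE than the hand-picked 0.001 (0.1602 vs 0.1600). That is expected: `LassoCV` picks the alpha with the lowest average validation error across folds of the training set, not the lowest test error. A sketch of how to inspect that choice through the fitted `mse_path_` and `alphas_` attributes, again on synthetic data with a `StandardScaler` pipeline (both assumptions, since `normalize` is no longer available):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); 5-fold CV is also chosen here for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=30, n_informative=8,
                                 noise=15.0, random_state=1)

lasso_pipe = make_pipeline(StandardScaler(), LassoCV(cv=5))
lasso_pipe.fit(X_demo, y_demo)
lasso_cv = lasso_pipe[-1]   # the fitted LassoCV step

# mse_path_ has one row per candidate alpha and one column per CV fold
mean_cv_mse = lasso_cv.mse_path_.mean(axis=1)
best_idx = np.argmin(mean_cv_mse)

print("alpha_ chosen by LassoCV:", lasso_cv.alpha_)
print("alpha with lowest mean CV MSE:", lasso_cv.alphas_[best_idx])
print("non-zero coefficients:", np.count_nonzero(lasso_cv.coef_))
```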


