## Regularized Regression in scikit-learn

### Load the data into a DataFrame

In [1]:
import pandas as pd
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data'
crime = pd.read_csv(url, header=None, na_values=['?'])
crime.head()

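|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 |
|---|---|---|---|---|---|---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0 | 8 | NaN | NaN | Lakewoodcity | 1 | 0.19 | 0.33 | 0.02 | 0.9 | 0.12 | ... | 0.12 | 0.26 | 0.2 | 0.06 | 0.04 | 0.9 | 0.5 | 0.32 | 0.14 | 0.2 |
| 1 | 53 | NaN | NaN | Tukwilacity | 1 | 0.0 | 0.16 | 0.12 | 0.74 | 0.45 | ... | 0.02 | 0.12 | 0.45 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.67 |
| 2 | 24 | NaN | NaN | Aberdeentown | 1 | 0.0 | 0.42 | 0.49 | 0.56 | 0.17 | ... | 0.01 | 0.21 | 0.02 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.43 |
| 3 | 34 | 5.0 | 81440.0 | Willingborotownship | 1 | 0.04 | 0.77 | 1.0 | 0.08 | 0.12 | ... | 0.02 | 0.39 | 0.28 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.12 |
| 4 | 42 | 95.0 | 6096.0 | Bethlehemtownship | 1 | 0.01 | 0.55 | 0.02 | 0.95 | 0.09 | ... | 0.04 | 0.09 | 0.02 | NaN | NaN | NaN | NaN | 0.0 | NaN | 0.03 |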


In [2]:
# examine the response variable
crime[127].describe()

count    1994.000000
mean        0.237979
std         0.232985
min         0.000000
25%         0.070000
50%         0.150000
75%         0.330000
max         1.000000
Name: 127, dtype: float64

In [3]:
# remove the non-predictive columns (state, county, community, community name, fold number)
crime.drop([0, 1, 2, 3, 4], axis=1, inplace=True)

# remove rows with any missing values
crime.dropna(inplace=True)

# check the shape
crime.shape

(319, 123)

## Linear Regression (Baseline)

### 1. Define X and y. Split into training and testing sets

In [4]:
from sklearn.model_selection import train_test_split
X = crime.drop(127, axis=1)
y = crime[127]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)



### 2. Build a linear regression model

In [5]:
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
linreg.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [15]:
# examine the coefficients
print(linreg.coef_)

### 3. Make predictions and calculate RMSE

In [7]:
y_pred = linreg.predict(X_test)

In [8]:
from sklearn import metrics
import numpy as np
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

0.233813676495


#### Note: this unregularized RMSE (0.234) is the baseline for comparing the regularized models below.
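
RMSE is the square root of the mean squared difference between observed and predicted values, so it is in the same units as the response (column 127). A minimal NumPy sketch of the computation, using small made-up arrays rather than the notebook's data; the helper name `rmse` is illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)**2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# toy values, purely illustrative
y_true = [0.20, 0.67, 0.43, 0.12]
y_pred = [0.25, 0.60, 0.40, 0.20]
print(rmse(y_true, y_pred))
```

This gives the same value as `np.sqrt(metrics.mean_squared_error(y_true, y_pred))`, the expression used in the notebook.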

## Ridge

### 1. Fit ridge regression with alpha = 0 and alpha = 0.1 to compare the results

In [11]:
# alpha=0 applies no penalty, so it is equivalent to linear regression
# (note: `normalize` was removed in scikit-learn 1.2; newer versions need explicit scaling)
from sklearn.linear_model import Ridge
ridgereg = Ridge(alpha=0, normalize=True)
ridgereg.fit(X_train, y_train)
y_pred = ridgereg.predict(X_test)
print('RMSE when alpha=0: ')
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

# try alpha=0.1
ridgereg = Ridge(alpha=0.1, normalize=True)
ridgereg.fit(X_train, y_train)
y_pred = ridgereg.predict(X_test)
print('RMSE when alpha=0.1: ')
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

RMSE when alpha=0: 
0.233813676495
RMSE when alpha=0.1: 
0.164279068049
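
Why `alpha=0` reproduces linear regression, and what a positive alpha does, can be seen from the closed-form ridge solution. A minimal NumPy sketch on synthetic data (the helper `ridge_fit`, the data, and handling the intercept by centering are assumptions for illustration; scikit-learn's solvers differ in detail). It minimizes `||y - Xw||² + alpha * ||w||²`, the objective scikit-learn documents for `Ridge`:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge. Center X and y so the intercept is not penalized,
    then solve (Xc^T Xc + alpha*I) w = Xc^T yc."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ w
    return w, intercept

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = X @ true_w + 0.3 + rng.normal(scale=0.5, size=50)

# alpha=0: ordinary least squares (compare with lstsq on [1, X])
w0, b0 = ridge_fit(X, y, alpha=0.0)
ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(50), X]), y, rcond=None)
print(np.allclose(w0, ols[1:]), np.isclose(b0, ols[0]))

# a larger alpha shrinks the coefficient vector toward zero
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w, _ = ridge_fit(X, y, alpha)
    print(alpha, np.linalg.norm(w))
```

The shrinkage is what lowers the variance of the fitted model, which is consistent with the lower test RMSE at `alpha=0.1` in the output above.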


### 2. Create an array of alpha values

In [13]:
alpha_range = 10.**np.arange(-2, 3)
alpha_range

array([  1.00000000e-02,   1.00000000e-01,   1.00000000e+00,
         1.00000000e+01,   1.00000000e+02])
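
`10.**np.arange(-2, 3)` builds five log-spaced values from 0.01 to 100. `np.logspace` is an equivalent spelling that also makes a finer grid easy to request; the 17-point grid below (four values per decade) is only an illustration:

```python
import numpy as np

alpha_range = 10. ** np.arange(-2, 3)        # as in the notebook
alpha_logspace = np.logspace(-2, 2, num=5)   # the same five values
print(np.allclose(alpha_range, alpha_logspace))

# a finer grid over the same interval: 4 points per decade
finer = np.logspace(-2, 2, num=17)
print(finer)
```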

### 3. Select the best alpha with RidgeCV

In [14]:
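from sklearn.linear_model import RidgeCV
# with the default cv=None, RidgeCV scores each alpha by efficient leave-one-out cross-validation
ridgeregcv = RidgeCV(alphas=alpha_range, normalize=True, scoring='neg_mean_squared_error')
ridgeregcv.fit(X_train, y_train)
ridgeregcv.alpha_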


1.0
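
`RidgeCV` chooses `alpha_` by cross-validation; with the default `cv=None` it uses an efficient leave-one-out scheme. The underlying idea is easier to see with plain K-fold cross-validation, sketched below in NumPy on synthetic data. The helper names, the 5-fold split, and the data are assumptions, and this is not how `RidgeCV` computes its scores internally:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def cv_mse(X, y, alpha, n_folds=5, seed=0):
    """Average held-out MSE over K folds; the fixed seed gives every alpha the same folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w, b = ridge_fit(X[train], y[train], alpha)
        errors.append(np.mean((y[test] - (X[test] @ w + b)) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 40))        # many features relative to rows
w_true = np.zeros(40)
w_true[:5] = 1.0
y = X @ w_true + rng.normal(size=60)

alphas = 10. ** np.arange(-2, 3)
scores = [cv_mse(X, y, a) for a in alphas]
best_alpha = alphas[int(np.argmin(scores))]
print(best_alpha)
```

After selecting the alpha with the lowest held-out error, the CV estimators refit on the full training set with that alpha, which is why `predict` can be called directly on the fitted CV object.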

### 4. Predict with the best alpha (the fitted CV model already uses `alpha_`)

In [16]:
y_pred = ridgeregcv.predict(X_test)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

0.163129782343


#### Note: compare this RMSE (0.163) with the unregularized baseline (0.234) and with the lasso results below.
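
The `normalize=True` argument used in this notebook's Ridge, RidgeCV, Lasso and LassoCV cells was deprecated in scikit-learn 1.0 and removed in 1.2, so those cells raise an error on current releases. One possible replacement, sketched below on synthetic data and assuming scikit-learn 1.2 or later: scale the features with `StandardScaler(with_mean=False)` in a pipeline and rescale alpha. Dividing a centered column by its L2 norm (what `normalize=True` did) is the same as dividing it by its standard deviation and by √n_samples, so for `Ridge` the matching alpha is `alpha * n_samples` (for `Lasso`, whose loss is averaged over samples, it would be `alpha * np.sqrt(n_samples)`). The data and variable names are illustrative; the final check compares against scaling the columns by hand.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.5, scale=2.0, size=(80, 6))
y_train = X_train @ rng.normal(size=6) + rng.normal(size=80)
X_test = rng.normal(loc=0.5, scale=2.0, size=(20, 6))
alpha = 0.1                       # the value that was used with normalize=True
n_samples = X_train.shape[0]

# replacement: standardize, then multiply alpha by n_samples
model = make_pipeline(StandardScaler(with_mean=False), Ridge(alpha=alpha * n_samples))
model.fit(X_train, y_train)
pred_pipeline = model.predict(X_test)

# reference: what normalize=True did (center, then divide each column by its L2 norm)
mean = X_train.mean(axis=0)
norm = np.linalg.norm(X_train - mean, axis=0)
ref = Ridge(alpha=alpha).fit((X_train - mean) / norm, y_train)
pred_reference = ref.predict((X_test - mean) / norm)

print(np.allclose(pred_pipeline, pred_reference))
```

Fitting the scaler inside the pipeline keeps the test set scaled with training statistics only, as `normalize=True` did.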

## Lasso regression

### 1. Fit lasso regression with alpha = 0.001 and alpha = 0.01 and examine the coefficients

In [17]:
from sklearn.linear_model import Lasso
lassoreg = Lasso(alpha=0.001, normalize=True)
lassoreg.fit(X_train, y_train)
print(lassoreg.coef_)

[ 0.          0.          0.00891952 -0.27423369  0.          0.          0.
 -0.         -0.          0.          0.          0.         -0.         -0.
 -0.         -0.19414627  0.          0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.          0.          0.          0.
  0.04335664 -0.          0.         -0.          0.03491474 -0.
 -0.06685424  0.          0.         -0.          0.10575313  0.          0.
  0.00890807  0.         -0.1378172  -0.30954312 -0.         -0.         -0.
 -0.          0.          0.          0.          0.         -0.          0.
  0.          0.          0.          0.          0.         -0.          0.
  0.          0.         -0.          0.         -0.         -0.          0.
  0.05257892 -0.          0.         -0.         -0.          0.          0.
  0.          0.          0.         -0.         -0.         -0.         -0.
 -0.         -0.         -0.          0.         -0.         -0.          0.
  0.1386108

In [18]:
lassoreg = Lasso(alpha=0.01, normalize=True)
lassoreg.fit(X_train, y_train)
print(lassoreg.coef_)

[ 0.          0.          0.         -0.03974695  0.          0.          0.
  0.          0.         -0.          0.          0.         -0.         -0.
 -0.         -0.         -0.          0.         -0.         -0.         -0.
 -0.         -0.         -0.         -0.         -0.         -0.          0.
  0.          0.          0.         -0.          0.         -0.         -0.
  0.          0.         -0.          0.          0.          0.          0.
  0.         -0.         -0.27503063 -0.         -0.         -0.         -0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -0.          0.          0.
  0.          0.          0.          0.         -0.          0.          0.
 -0.          0.         -0.         -0.          0.          0.         -0.
  0.          0.         -0.         -0.         -0.         -0.         -0.
 -0.         -0.          0.          0.         -0.          0.          0.
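
Unlike ridge, lasso sets many coefficients exactly to zero, and more of them as alpha grows (compare the two coefficient outputs above). scikit-learn's `Lasso` is fit by coordinate descent, and the zeros come from its soft-thresholding step. A minimal NumPy sketch on synthetic, standardized data, minimizing `(1/(2n)) * ||y - Xw||² + alpha * ||w||₁`; the helper names, iteration count, and data are assumptions, not scikit-learn's implementation:

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - Xw||^2 + alpha*||w||_1.
    Assumes the columns of X and y are centered (no intercept is fitted)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * w[j]        # remove feature j's current contribution
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * w[j]        # add it back with the updated weight
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
w_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ w_true + rng.normal(size=100)
y = y - y.mean()

for alpha in [0.01, 0.1, 1.0, 10.0]:
    w = lasso_cd(X, y, alpha)
    print(alpha, int(np.sum(w != 0)), np.round(w, 2))
```

A coefficient stays at zero whenever its correlation with the current residual is no larger than alpha, which is why a larger alpha removes more features.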

### 2. Calculate RMSE when alpha = 0.01

In [19]:
y_pred = lassoreg.predict(X_test)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

0.198165225429


### 3. Select the best alpha with LassoCV

In [20]:
from sklearn.linear_model import LassoCV
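# n_alphas=100: LassoCV generates 100 candidate alphas along the regularization path
# random_state only has an effect when selection='random' (the default is 'cyclic')
lassoregcv = LassoCV(n_alphas=100, normalize=True, random_state=1)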
lassoregcv.fit(X_train, y_train)
lassoregcv.alpha_



0.0015161594598125873
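
`LassoCV` does not need an explicit list of alphas: it builds its own candidates, log-spaced from `alpha_max` (the smallest alpha at which every coefficient is zero) down to `eps * alpha_max`, where `eps` defaults to `1e-3`. A NumPy sketch of such a grid for centered data, assuming the lasso objective `(1/(2n)) * ||y - Xw||² + alpha * ||w||₁`; the helper name and synthetic data are illustrative:

```python
import numpy as np

def lasso_alpha_grid(X, y, n_alphas=100, eps=1e-3):
    """Log-spaced alphas from alpha_max (all coefficients zero) down to eps * alpha_max."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_samples = X.shape[0]
    # at w = 0, no coefficient can move while |x_j^T y| / n <= alpha for every j
    alpha_max = np.max(np.abs(Xc.T @ yc)) / n_samples
    return np.geomspace(alpha_max, alpha_max * eps, num=n_alphas)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

alphas = lasso_alpha_grid(X, y)
print(len(alphas), alphas[0], alphas[-1])
```

Cross-validation then scores each candidate, and `alpha_` is the one with the lowest average held-out error.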

### 4. Predict with the best alpha (the fitted CV model already uses `alpha_`)

In [21]:
y_pred = lassoregcv.predict(X_test)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

0.160209558014


## Basic Concept Quiz

### 1. What is the bias-variance trade-off?

<img src="image.png">

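**Answer:** Bias measures how far, on average, the model's predictions are from the actual values. A model with high bias underfits: it is too simple and keeps missing important trends in the data. Variance measures how much the predictions for the same observation change when the model is trained on different samples. A model with high variance overfits: it fits the training data closely but performs poorly on observations outside it. The trade-off is that making a model more flexible usually lowers bias but raises variance (and vice versa), so the goal is the level of complexity that minimizes the total expected error, which decomposes into bias², variance, and irreducible noise.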
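
The trade-off shows up in a small simulation: repeatedly draw noisy training sets from a known function, fit a simple and a flexible model to each, then measure how far the average prediction is from the truth (bias²) and how much individual predictions scatter around that average (variance). A NumPy sketch in which the true function, noise level, sample size, and polynomial degrees are all arbitrary choices for illustration:

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_f(x):
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
x_eval = np.linspace(0.05, 0.95, 50)      # points where bias and variance are measured
n_train, n_repeats, noise_sd = 30, 200, 0.3

results = {}
for degree in (1, 3, 9):
    preds = np.empty((n_repeats, x_eval.size))
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(scale=noise_sd, size=n_train)
        preds[r] = Polynomial.fit(x, y, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_f(x_eval)) ** 2)   # squared bias, averaged over x_eval
    variance = np.mean(preds.var(axis=0))                  # spread across training sets
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

The straight line (degree 1) has high bias and low variance; the degree-9 polynomial has low bias and higher variance.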

### 2. When does regularization become necessary in machine learning?

**Answer:** Regularization becomes necessary when the model begins to overfit, which is likely when there are many features relative to the number of observations (this dataset has 122 features for 319 rows). It adds a penalty term on the size of the coefficients to the objective function, so every large coefficient increases the cost. Minimizing this penalized objective shrinks the coefficients toward zero (and, with lasso's L1 penalty, sets some of them exactly to zero), which reduces model complexity and variance so that the model generalizes better to unseen data.
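
For reference, these are the penalized objectives scikit-learn documents for `Ridge` and `Lasso`, with $n$ training samples, coefficient vector $w$, and regularization strength $\alpha$ (the `alpha` argument in this notebook). The second term in each is the penalty (cost) term described above; the intercept is fitted separately and not penalized, and $\alpha = 0$ recovers ordinary least squares.

$$\text{Ridge:}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$$

$$\text{Lasso:}\quad \min_{w}\; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$$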

### 3. When is ridge regression preferable to lasso regression?

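**Answer:** Lasso regression (L1 penalty) performs both variable selection and coefficient shrinkage, whereas ridge regression (L2 penalty) only shrinks the coefficients and ends up keeping every feature in the model. Ridge may be the better choice when features are strongly correlated: lasso tends to keep one feature from a correlated group and drop the others somewhat arbitrarily, while ridge spreads the weight across the group. Ridge also works well when the least-squares estimates have high variance, for example because of multicollinearity. The choice therefore depends on the modeling objective: lasso when a sparse, interpretable model is wanted, ridge when most features are expected to contribute.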
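
The point about correlated variables and high-variance least-squares estimates can be illustrated with two nearly identical features: across repeated samples, the least-squares coefficients swing widely (a large weight on one feature offset by the opposite weight on the other), while ridge splits the weight between them stably. A NumPy sketch with synthetic data, where the correlation level, `alpha=10.0`, and sample sizes are arbitrary assumptions:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)

rng = np.random.default_rng(0)
ols_coefs, ridge_coefs = [], []
for _ in range(200):
    z = rng.normal(size=50)
    # two almost perfectly correlated features
    X = np.column_stack([z + 0.01 * rng.normal(size=50), z + 0.01 * rng.normal(size=50)])
    # y depends on both features with true weights (1, 1)
    y = X[:, 0] + X[:, 1] + rng.normal(size=50)
    ols_coefs.append(ridge_fit(X, y, alpha=0.0))     # alpha=0 is ordinary least squares
    ridge_coefs.append(ridge_fit(X, y, alpha=10.0))

ols_coefs, ridge_coefs = np.array(ols_coefs), np.array(ridge_coefs)
print("OLS   coef std: ", ols_coefs.std(axis=0))
print("ridge coef std: ", ridge_coefs.std(axis=0))
print("ridge coef mean:", ridge_coefs.mean(axis=0))
```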