<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Regularisation

# Constraining Coefficients

What does linear regression try to **minimise**?

Idea: why not also penalise the **size** of the coefficients?

Consequence: models cannot get too complex

This is the idea behind **regularisation**


Regularisation is an additive approach to protecting models against overfitting (fitting noise in the training data, so the model is overconfident and does not generalise well to new data).

Regularisation adds a penalty on the size of the coefficients to the quantity being minimised, shrinking them closer to zero. This gives us simpler models.
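
To make the opening question concrete: ordinary linear regression picks the coefficients that minimise the sum of squared errors (SSE). As a sketch of the notation, where the $n$ observations, the intercept $\beta_0$ and the feature values $x_{ij}$ are symbols assumed here rather than defined in the text above:

$$\text{SSE} = \sum_{i=1}^n \left(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\right)^2$$

Regularisation keeps this term and adds a penalty that grows with the size of $\beta_1, \dots, \beta_p$; the intercept $\beta_0$ is usually left unpenalised.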




**Lasso regression** (or "L1 regularisation") minimises: $$\text{SSE} + \alpha \sum_{j=1}^p |\beta_j|$$

- this can even shrink coefficients to 0 to limit complexity further

**Ridge regression** (or "L2 regularisation") minimises: $$\text{SSE} + \alpha \sum_{j=1}^p \beta_j^2$$

- can be used even if there is multicollinearity

As a rule of thumb, consider Lasso when we have more features than observations ($p > n$), since it can drop features entirely, and Ridge otherwise.
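
The two penalties above can be computed by hand for a small coefficient vector. The following is a minimal sketch with made-up toy numbers; the helper names `sse`, `l1_penalty` and `l2_penalty` are hypothetical and are not part of scikit-learn.

```python
import numpy as np

def sse(X, y, beta, intercept=0.0):
    """Sum of squared errors for a linear model with the given coefficients."""
    residuals = y - (X @ beta + intercept)
    return float(np.sum(residuals ** 2))

def l1_penalty(beta, alpha):
    """Lasso penalty: alpha times the sum of absolute coefficients."""
    return float(alpha * np.sum(np.abs(beta)))

def l2_penalty(beta, alpha):
    """Ridge penalty: alpha times the sum of squared coefficients."""
    return float(alpha * np.sum(np.square(beta)))

# Toy data (invented for illustration): beta_toy fits y_toy exactly.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_toy = np.array([3.0, -4.0, -1.0])
beta_toy = np.array([3.0, -4.0])

print(sse(X_toy, y_toy, beta_toy))         # 0.0 (perfect fit)
print(l1_penalty(beta_toy, alpha=1.0))     # 7.0  = |3| + |-4|
print(l2_penalty(beta_toy, alpha=1.0))     # 25.0 = 3^2 + (-4)^2
```

Note that scikit-learn's `Lasso` scales the SSE term by $1/(2n)$ in its objective while `Ridge` does not, so the same numeric value of `alpha` does not mean the same strength of penalty in the two models.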


#### Hyperparameters

- $\alpha$ controls the strength of the penalty (the bigger it is, the more the coefficients are shrunk; $\alpha = 0$ gives back ordinary linear regression)

- but what is a good value of $\alpha$?

- depends, but we can find out by trying different values
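
To see the effect of $\alpha$ before touching real data, here is a small sketch that fits `Ridge` at a few penalty strengths and prints the overall size of the coefficient vector. The data come from `make_regression` and the chosen `alpha` values are arbitrary assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data, used only to illustrate the shrinkage effect.
X_demo, y_demo = make_regression(n_samples=100, n_features=5,
                                 noise=5.0, random_state=0)

alphas_demo = [0.01, 1, 100, 10_000]
coef_norms = []
for alpha in alphas_demo:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # Euclidean length of the coefficient vector at this penalty strength
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:>8}: ||coef|| = {coef_norms[-1]:.2f}")
```

The norm should fall as `alpha` grows: a larger penalty pulls every coefficient further towards zero.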

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [2]:
bikes = pd.read_csv("assets/data/bikeshare.csv")
bikes = pd.get_dummies(bikes, columns=["holiday", "season", "workingday", "weather"], drop_first=True)

X = bikes[["temp", "humidity", "windspeed", "holiday_1", "season_2", "season_3", "season_4",
           "workingday_1", "weather_2", "weather_3", "weather_4"]]

y = bikes["count"]

X.head(2)

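| | temp | humidity | windspeed | holiday_1 | season_2 | season_3 | season_4 | workingday_1 | weather_2 | weather_3 | weather_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.84 | 81 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 9.02 | 80 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |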


In [3]:
from sklearn.linear_model import LinearRegression, Ridge

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)

ridge = Ridge(alpha=100)
ridge.fit(X_train, y_train)

print(lr.coef_, np.sqrt(mean_squared_error(y_test, lr.predict(X_test))), "\n")
print(ridge.coef_, np.sqrt(mean_squared_error(y_test, ridge.predict(X_test))))

[ 11.19896207  -2.81479401   0.41698205  -9.36978344  -4.36073238
 -43.42518093  69.11719289   0.47992617  14.96044248  -6.07414068
 186.8354393 ] 153.84885793631048 

[ 10.96274269  -2.8151251    0.41089773  -6.14390027  -3.07501539
 -38.96244713  65.33144486   0.86977511  14.20301373  -5.35355702
   1.82417812] 153.75473253544243


In [4]:
from sklearn.linear_model import Lasso

lasso = Lasso()

lasso.fit(X_train, y_train)

print(lr.coef_, np.sqrt(mean_squared_error(y_test, lr.predict(X_test))), "\n")
print(ridge.coef_, np.sqrt(mean_squared_error(y_test, ridge.predict(X_test))), "\n")
print(lasso.coef_, np.sqrt(mean_squared_error(y_test, lasso.predict(X_test))))

[ 11.19896207  -2.81479401   0.41698205  -9.36978344  -4.36073238
 -43.42518093  69.11719289   0.47992617  14.96044248  -6.07414068
 186.8354393 ] 153.84885793631048 

[ 10.96274269  -2.8151251    0.41089773  -6.14390027  -3.07501539
 -38.96244713  65.33144486   0.86977511  14.20301373  -5.35355702
   1.82417812] 153.75473253544243 

[ 10.67469621  -2.84195743   0.38941915  -0.           0.
 -31.73575768  66.82192433   0.          10.83697365  -0.
   0.        ] 153.75744463354064


In [5]:
for z in zip(X_train.columns, lasso.coef_):
    print(z)

('temp', 10.67469621435747)
('humidity', -2.8419574279004234)
('windspeed', 0.38941915206437666)
('holiday_1', -0.0)
('season_2', 0.0)
('season_3', -31.735757681951384)
('season_4', 66.82192433206002)
('workingday_1', 0.0)
('weather_2', 10.836973651343946)
('weather_3', -0.0)
('weather_4', 0.0)
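
Because Lasso sets some coefficients to exactly zero, the pairs printed above double as a feature selection. A minimal sketch of pulling out the surviving features follows; `kept_features` is a hypothetical helper, and the coefficient values are copied (rounded) from the output above.

```python
def kept_features(names, coefs, tol=1e-12):
    """Return the feature names whose coefficient is not (numerically) zero."""
    return [name for name, coef in zip(names, coefs) if abs(coef) > tol]

feature_names = ["temp", "humidity", "windspeed", "holiday_1", "season_2",
                 "season_3", "season_4", "workingday_1", "weather_2",
                 "weather_3", "weather_4"]
# Rounded copies of the lasso.coef_ values shown above
lasso_coefs = [10.67469621, -2.84195743, 0.38941915, -0.0, 0.0,
               -31.73575768, 66.82192433, 0.0, 10.83697365, -0.0, 0.0]

print(kept_features(feature_names, lasso_coefs))
# ['temp', 'humidity', 'windspeed', 'season_3', 'season_4', 'weather_2']
```

With the fitted model in scope, the same call would be `kept_features(X_train.columns, lasso.coef_)`.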


## How do I find a good $\alpha$?

- intuitively you could try lots of values and see which is best!

- this is called **grid search**
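
Before reaching for a ready-made tool, it may help to see the idea written as a plain loop. This is a sketch on synthetic data from `make_regression`; the candidate `alphas` and the 5-fold setting are arbitrary assumptions. Each candidate is scored with cross-validation on the training data, so the test set is never used to choose $\alpha$.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data, standing in for a training set.
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 noise=10.0, random_state=0)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_rmse = {}
for alpha in alphas:
    # scikit-learn maximises scores, so MSE is reported as a negative number
    scores = cross_val_score(Ridge(alpha=alpha), X_demo, y_demo,
                             scoring="neg_mean_squared_error", cv=5)
    cv_rmse[alpha] = float(np.sqrt(-scores.mean()))

best_alpha = min(cv_rmse, key=cv_rmse.get)
print(cv_rmse)
print("best alpha:", best_alpha)
```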

In [6]:
import warnings
warnings.filterwarnings("ignore")

In [7]:
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(estimator=Ridge(),
                    param_grid={'alpha': np.logspace(-10, 10, 21)},
                    scoring='neg_mean_squared_error',
                    return_train_score=True,
                    cv=10)

grid.fit(X_train,y_train)

GridSearchCV(cv=10, error_score='raise',
       estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'alpha': array([1.e-10, 1.e-09, 1.e-08, 1.e-07, 1.e-06, 1.e-05, 1.e-04, 1.e-03,
       1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05,
       1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='neg_mean_squared_error', verbose=0)

In [8]:
print(np.sqrt(-grid.best_score_), grid.best_params_)

best_model = grid.best_estimator_
np.sqrt(mean_squared_error(y_test, best_model.predict(X_test)))

154.5088062578521 {'alpha': 10.0}


153.83700486774714
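
The per-candidate scores behind `best_score_` are stored in the `cv_results_` dictionary, which is easier to read as a table. The sketch below uses synthetic data and a smaller grid than the one above (both are assumptions for illustration) and loads the dictionary into a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data, standing in for a training set.
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 noise=10.0, random_state=0)

grid_demo = GridSearchCV(estimator=Ridge(),
                         param_grid={"alpha": np.logspace(-2, 2, 5)},
                         scoring="neg_mean_squared_error",
                         return_train_score=True,
                         cv=5)
grid_demo.fit(X_demo, y_demo)

# One row per candidate alpha; scores are negative MSE, so flip the sign
results = pd.DataFrame(grid_demo.cv_results_)
results["cv_rmse"] = np.sqrt(-results["mean_test_score"])
results["train_rmse"] = np.sqrt(-results["mean_train_score"])

columns = ["param_alpha", "cv_rmse", "train_rmse", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))
```

Comparing `train_rmse` with `cv_rmse` across the rows gives a rough view of how the gap between training and validation error changes with `alpha`.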

In [9]:
grid.cv_results_

{'mean_fit_time': array([0.00680244, 0.00463779, 0.00457971, 0.00559659, 0.00452681,
        0.00465605, 0.00458229, 0.00464723, 0.0095948 , 0.01014771,
        0.00724909, 0.01003737, 0.00472565, 0.00474358, 0.00467589,
        0.00498726, 0.00529463, 0.00745306, 0.00589807, 0.0048497 ,
        0.00448587]),
 'std_fit_time': array([3.63900869e-03, 7.83878121e-05, 4.58088960e-05, 3.29776743e-03,
        2.79303677e-05, 1.83389440e-04, 6.29210300e-05, 1.77206137e-04,
        6.18796498e-03, 3.82905054e-03, 2.52001868e-03, 7.60858123e-03,
        3.90338135e-04, 2.55656877e-04, 2.11168224e-04, 8.03428901e-04,
        7.74584149e-04, 2.33200899e-03, 1.48897016e-03, 8.80440078e-04,
        3.48532492e-05]),
 'mean_score_time': array([0.00066605, 0.00062261, 0.00061157, 0.00061171, 0.00060542,
        0.00062947, 0.00061276, 0.00061934, 0.00099106, 0.0007194 ,
        0.00074379, 0.00075047, 0.00062628, 0.00063686, 0.0006218 ,
        0.00068192, 0.00094795, 0.00133045, 0.00073693, 0.000630