# Ridge / Lasso Regression

Ridge and lasso regression address some of the shortcomings of ordinary linear regression by adding a shrinkage penalty on the size of the coefficients, so that a predictor only earns a large coefficient if it significantly improves the model.
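
As a sketch of what the penalty looks like, the two objectives can be written as follows. The symbols are introduced here for illustration: $y$ is the response, $X$ the matrix of predictors, $\beta$ the coefficient vector, $n$ the number of samples, and $\alpha \ge 0$ the penalty strength. The scaling shown matches the objectives documented for scikit-learn's `Ridge` and `Lasso`; other texts may scale the loss differently.

$$
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \alpha \sum_{j} \beta_j^2
$$

$$
\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \alpha \sum_{j} \lvert \beta_j \rvert
$$

The only structural difference is the penalty: squared coefficients for ridge, absolute values for lasso.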

Lasso regression tends to set coefficients to exactly zero, effectively dropping the less significant predictors from the model, whereas ridge regression only shrinks their effect. This is often more desirable, as the resulting model contains fewer terms overall and is simpler.
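
The difference can be seen on a small synthetic problem. The following is a minimal sketch, not part of the concrete example: the data, the seed, and the alpha values are assumptions chosen for illustration, with only the first two of five predictors actually influencing the response.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumed for illustration): 5 predictors,
# only the first two affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Fit both models with fixed, arbitrarily chosen penalty strengths.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print('Ridge coefficients:', np.round(ridge.coef_, 3))
print('Lasso coefficients:', np.round(lasso.coef_, 3))

# Count coefficients that are exactly zero in each model.
ridge_zeros = int(np.sum(ridge.coef_ == 0))
lasso_zeros = int(np.sum(lasso.coef_ == 0))
print('Exact zeros: ridge={}, lasso={}'.format(ridge_zeros, lasso_zeros))
```

Ridge keeps small but non-zero coefficients on the three irrelevant predictors, while lasso sets them to exactly zero.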

The regularization strength, alpha, controls how heavily the additional terms (added complexity) are penalized. For example, an alpha of zero applies no penalty and thus yields the same results as ordinary linear regression. As alpha increases towards infinity, the coefficients are pushed towards zero.
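
The effect of alpha can be sketched by refitting a ridge model over a range of values. The data and the alpha grid below are assumptions for illustration. A very small alpha stands in for zero, since scikit-learn advises against using `Ridge` with `alpha=0` for numerical reasons.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Unpenalized baseline.
ols = LinearRegression().fit(X, y)

# Refit ridge for increasing penalty strengths.
alphas = [1e-8, 1.0, 100.0, 10000.0]
coefs = {alpha: Ridge(alpha=alpha).fit(X, y).coef_ for alpha in alphas}

for alpha in alphas:
    print('alpha={:>8}: coef={}'.format(alpha, np.round(coefs[alpha], 4)))

# Size of the coefficient vector at each alpha.
norms = [float(np.linalg.norm(coefs[alpha])) for alpha in alphas]
print('OLS coef:', np.round(ols.coef_, 4))
```

With a near-zero alpha the ridge coefficients match the least-squares ones, and the norm of the coefficient vector falls as alpha grows.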

![Ridge vs Lasso Regression](../img/ridge-lasso.png)

## Ridge Example

In [22]:
import pandas as pd
import numpy as np
from sklearn.linear_model import RidgeCV
from tabulate import tabulate

In [23]:
concrete = pd.read_csv('../data/concrete.csv')
concrete.Age = concrete.Age.astype(float)
concrete.head()

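|   | Cement | Slag | FlyAsh | Water | SPlast | CAgg | FAgg | Age | Strength |
|---|--------|------|--------|-------|--------|------|------|-----|----------|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28.0 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28.0 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270.0 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365.0 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360.0 | 44.30 |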


In [24]:
# Predictors: the first eight columns (Cement through Age)
x = concrete.iloc[:, 0:8]
# Response: the last column (Strength)
y = np.array(concrete.iloc[:, 8])

In [25]:
# Pick the best alpha from the candidate list using 5-fold cross-validation
reg = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 0.25, 0.5, 0.75, 1], cv=5).fit(x, y)

In [31]:
print('Best alpha: {}'.format(reg.alpha_))

Best alpha: 1.0


In [27]:
# R^2 of the fitted model on the same data it was trained on
reg.score(x, y)

0.6155198703953081

In [20]:
print(tabulate(list(zip(x.columns, reg.coef_))))

------  ----------
Cement   0.119804
Slag     0.103866
FlyAsh   0.0879348
Water   -0.149922
SPlast   0.292204
CAgg     0.0180856
FAgg     0.0201901
Age      0.114222
------  ----------


## Lasso Example

In [21]:
from sklearn.linear_model import LassoCV

In [43]:
reg = LassoCV(alphas=[1e-3, 1e-2, 1e-1, 0.25, 0.5, 0.75, 1], cv=5).fit(x,y)

In [44]:
print('Best alpha: {}'.format(reg.alpha_))

Best alpha: 100


In [45]:
reg.score(x, y)

0.46857169517697717

In [46]:
print(tabulate(list(zip(x.columns, reg.coef_))))

------  ----------
Cement   0.0879115
Slag     0.0529788
FlyAsh   0.035876
Water   -0.0168768
SPlast   0
CAgg    -0
FAgg     0
Age      0.0603667
------  ----------
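
To make the feature selection explicit, the predictors can be split into those lasso kept and those it dropped. The following is a small sketch that hard-codes the coefficients from the table above so that it runs without the dataset. With the fitted model at hand, `reg.coef_` and `x.columns` would take the place of the literal values.

```python
import numpy as np

# Lasso coefficients copied from the table above.
names = ['Cement', 'Slag', 'FlyAsh', 'Water', 'SPlast', 'CAgg', 'FAgg', 'Age']
coef = np.array([0.0879115, 0.0529788, 0.035876, -0.0168768,
                 0.0, -0.0, 0.0, 0.0603667])

# A coefficient of exactly zero (including -0) means the predictor was dropped.
kept = [name for name, c in zip(names, coef) if c != 0]
dropped = [name for name, c in zip(names, coef) if c == 0]

print('Kept:   ', kept)
print('Dropped:', dropped)
```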
