## Optimizing $\alpha$ value for Ridge

Ridge regression is a regularized version of least squares. We often don't know in advance which value of $\alpha$ will give the best results, so we can try several values and select the one with the lowest cross-validation error.
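For reference, the objective minimized by scikit-learn's `Ridge` can be written as below. The symbol $w$ for the coefficient vector is notation introduced here, not taken from the text above; $\alpha = 0$ recovers ordinary least squares, and larger $\alpha$ shrinks the coefficients more strongly.

$$\min_{w} \; \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2$$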

In [6]:
# Create made up data
import numpy as np

from sklearn.datasets import make_regression

X, y = make_regression(n_features=3, effective_rank=2, noise=10)
# effective_rank is the approximate number of singular vectors
# needed to explain most of the input data through linear
# combinations, so the columns of X are strongly correlated.

# noise is the standard deviation of the Gaussian noise applied
# to the output

In [7]:
alpha_grid = np.linspace(0.1, 1, 10)
alpha_grid

array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

In [15]:
from sklearn.linear_model import RidgeCV

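# Note: scikit-learn 1.5 renamed store_cv_values to store_cv_results
# (and cv_values_ to cv_results_); use the new names on recent versions.
clf = RidgeCV(alphas=alpha_grid, store_cv_values=True)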
clf.fit(X, y)
print("Best alpha: {}".format(clf.alpha_))
print("Costs: {}".format(clf.cv_values_[:2]))

Best alpha: 0.1
Costs: [[ 122.02900475  117.88653491  114.68338853  112.11919036  110.01323844
   108.24900681  106.74737894  105.45245426  104.32345958  103.32987127]
 [  40.10985911   29.2329253    21.84018491   16.62836218   12.84723406
    10.0401097     7.91642622    6.28452633    5.01412181    4.0143423 ]]


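By default, `RidgeCV` internally runs an efficient form of leave-one-out cross-validation and selects the alpha with the lowest average cost. Each row of `cv_values_` corresponds to one sample and each column to one alpha; only the first two rows are printed above.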
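To make the selection step concrete, here is a minimal sketch of the same idea done by hand. It is not what `RidgeCV` does internally: it uses ordinary 5-fold cross-validation through `cross_val_score`, and the `random_state`, the fold count, and the names `mean_mse` and `best_alpha` are assumptions added for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Same kind of synthetic data as above; random_state is added for repeatability
X, y = make_regression(n_features=3, effective_rank=2, noise=10, random_state=0)
alpha_grid = np.linspace(0.1, 1, 10)

# Average 5-fold mean squared error for every candidate alpha
mean_mse = []
for alpha in alpha_grid:
    scores = cross_val_score(
        Ridge(alpha=alpha), X, y, cv=5, scoring="neg_mean_squared_error"
    )
    mean_mse.append(-scores.mean())  # scores are negated errors, so flip the sign

# Pick the alpha with the lowest average cost
best_alpha = alpha_grid[int(np.argmin(mean_mse))]
print("Best alpha (manual 5-fold search): {}".format(best_alpha))
```

Because the folds differ from leave-one-out, the alpha chosen here may not match the one `RidgeCV` reports.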

## Scoring with Mean Absolute Error

In [16]:
from sklearn.metrics import mean_absolute_error, make_scorer
l1_error = make_scorer(mean_absolute_error, greater_is_better=False)

In [18]:
clf = RidgeCV(alphas=alpha_grid, store_cv_values=True, scoring=l1_error)
clf.fit(X, y)
print("Best alpha: {}".format(clf.alpha_))
print("Costs: {}".format(clf.cv_values_[:2]))

Best alpha: 0.1
Costs: [[ 1.78981083  1.60069349  1.45216994  1.33177187  1.23185649  1.1474152
   1.07499923  1.01214145  0.95702254  0.90826707]
 [ 8.01265854  7.08617217  6.35277241  5.75721293  5.26372789  4.84803726
   4.49303849  4.1863198   3.91864753  3.68300641]]


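The best alpha is the same, but the printed values are smaller. By default (as in the previous example) `RidgeCV` uses squared error, whereas the last example uses MAE, which is on the scale of the target rather than its square. Note, however, that according to the scikit-learn documentation, when `scoring` is set `cv_values_` stores per-sample predictions rather than errors, so the two printed arrays are not directly comparable.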
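The difference in scale between the two metrics can be checked directly on one set of predictions. The sketch below is an illustration under assumptions not in the original: a fixed `alpha=0.1`, 5-fold out-of-fold predictions from `cross_val_predict`, and a `random_state` for repeatability.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_val_predict

# Same kind of synthetic data as above; random_state is added for repeatability
X, y = make_regression(n_features=3, effective_rank=2, noise=10, random_state=0)

# Out-of-fold predictions, so both metrics are computed on the same residuals
y_pred = cross_val_predict(Ridge(alpha=0.1), X, y, cv=5)

mse = mean_squared_error(y, y_pred)   # squared units of the target
mae = mean_absolute_error(y, y_pred)  # same units as the target
rmse = np.sqrt(mse)                   # back on the target's scale

print("MSE:  {:.2f}".format(mse))
print("RMSE: {:.2f}".format(rmse))
print("MAE:  {:.2f}".format(mae))
```

MAE never exceeds RMSE on the same residuals, and MSE is the square of RMSE, which is why squared-error costs look much larger whenever typical residuals are greater than 1.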

The same technique also applies to __Lasso__, through __LassoCV__.
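A minimal sketch of the Lasso counterpart follows. The 5-fold setting and `random_state` are assumptions added here. Unlike `RidgeCV`, `LassoCV` does not take a `scoring` argument; it exposes the per-fold mean squared errors through `mse_path_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Same kind of synthetic data as above; random_state is added for repeatability
X, y = make_regression(n_features=3, effective_rank=2, noise=10, random_state=0)
alpha_grid = np.linspace(0.1, 1, 10)

# Search the same grid with 5-fold cross-validation
lasso = LassoCV(alphas=alpha_grid, cv=5)
lasso.fit(X, y)

print("Best alpha: {}".format(lasso.alpha_))
# mse_path_ has one row per alpha and one column per fold
print("MSE path shape: {}".format(lasso.mse_path_.shape))
```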