# Ridge Regression Lab

### Introduction

In this lesson, we'll explore ridge regression as applied to the diabetes dataset in sklearn.  Let's get started.

### Ridge Regression Application

Let's start by loading our diabetes data from the sklearn datasets module.

In [1]:
import pandas as pd
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X = pd.DataFrame(diabetes['data'], columns = diabetes['feature_names'])
y = pd.Series(diabetes['target'])

Let's begin by splitting our data into training and test sets, with a `test_size` of `.2` and a `random_state` of `2`.

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = .2,
                                                    random_state = 2)

Now when we use ridge regression, we'll need to ensure that all of our features are scaled similarly.  

> Remember that ridge regression works by penalizing coefficients that have a larger magnitude.  Because the size of a coefficient depends on the scale of its feature, we want to make sure that all features are on a similar scale.
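For reference, the penalty can be written out explicitly. The expression below is the objective that sklearn's `Ridge` minimizes, in our own notation (the symbols $X$, $y$ and $\beta$ are assumptions of this note, not names used elsewhere in the lesson): $\beta$ is the vector of coefficients, and $\alpha$ controls the strength of the penalty. The intercept is not penalized.

$$
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \alpha \, \lVert \beta \rVert_2^2
$$

The second term is the squared L2 norm of the coefficients. A feature measured in small units needs a large coefficient to have the same effect, so without scaling it would be penalized more heavily.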

So let's begin by scaling our training data.  Use the standard scaler to scale the data in `X_train`, and reassign the data to a pandas dataframe with the correct columns. 

In [10]:
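from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# fit the scaler on the training data only, then transform it
X_train_transformed = scaler.fit_transform(X_train)
X_train_transformed_df = pd.DataFrame(X_train_transformed, columns = X.columns)
X_train_transformed_df[:3]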

# 	age	sex	bmi	bp	s1	s2	s3	s4	s5	s6
# 0	0.800500	1.065488	1.297088	0.459840	-0.929746	-0.732065	-0.912451	-0.054499	0.418551	-0.370989
# 1	-0.039567	-0.938537	-1.082180	-0.553511	-0.177624	-0.402886	1.564414	-0.830301	-1.436551	-1.938479
# 2	1.793307	1.065488	0.934533	-0.119218	-0.958674	-0.718897	-0.680245	-0.054499	0.060207	-0.545154

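As a quick sanity check on the scaling step, the sketch below repeats the load, split, and scale steps from this lesson and confirms that each scaled training column has a mean close to 0 and a standard deviation close to 1. The names `X_train_scaled`, `col_means` and `col_stds` are ours, chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same load and split as earlier in the lesson
diabetes = load_diabetes()
X = pd.DataFrame(diabetes['data'], columns=diabetes['feature_names'])
y = pd.Series(diabetes['target'])
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=.2,
                                                    random_state=2)

scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns)

# StandardScaler divides by the population standard deviation, so use ddof=0
col_means = X_train_scaled.mean()
col_stds = X_train_scaled.std(ddof=0)

print(col_means.round(6))
print(col_stds.round(6))
```

Note that only the training columns are guaranteed to have these values. The test data is transformed with the training means and standard deviations, so its columns will be close to, but not exactly, mean 0 and standard deviation 1.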


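Let's then use the same fitted scaler to transform our data in `X_test` (calling `transform`, not `fit_transform`, so the test data is scaled with the training statistics), and store the result as a dataframe.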

In [14]:
X_test_transformed = pd.DataFrame(scaler.transform(X_test), columns = X.columns)

X_test_transformed[:2]


# age	sex	bmi	bp	s1	s2	s3	s4	s5	s6
# 0	1.394248	-0.910503	0.117078	0.755914	1.060471	0.703755	1.443097	-0.802763	-0.006593	0.438779
# 1	-1.618727	1.098294	1.688238	1.117483	1.669636	1.406504	0.511315	0.002235	0.856628	-0.179012



In [17]:
X_train.shape
# (353, 10)

(353, 10)

In [18]:
X_test.shape
# (89, 10)

(89, 10)

### Training a Baseline Model

Before moving on to ridge regression, let's train an ordinary linear regression model.  Assign the model to the variable `linear_model`, and fit the model on the training data.

In [15]:
from sklearn.linear_model import LinearRegression
linear_model = LinearRegression()
linear_model.fit(X_train_transformed, y_train)

LinearRegression()

Then let's score the model on both the training data, and then the test data.

In [17]:
train_score = linear_model.score(X_train_transformed, y_train)
train_score
# 0.5323676737904688

0.5323676737904688

In [18]:
test_score = linear_model.score(X_test_transformed, y_test)
test_score
# 0.4399387660024644

0.4399387660024644

Notice that the training score is higher than the test score.  This is a sign of our model overfitting to the randomness in the training data.  In other words, there is variance in our model: with different training data, our model's parameters would vary.

We know that one symptom of variance in the model is coefficients with a larger magnitude.  Let's measure the magnitude of the coefficients using the L2 norm. 

> First we'll assign the coefficients to a series, and sort the values.

In [22]:
import numpy as np
coef_series = pd.Series(index = X.columns, data = linear_model.coef_)
np.abs(coef_series).sort_values(ascending = True)[:3]

# age    0.441786
# s6     2.460215
# s4     5.809504
# dtype: float64

age    0.441786
s6     2.460215
s4     5.809504
dtype: float64

Now we know that with ridge regression, we measure the magnitude of the coefficients with the squared L2 norm, which is the sum of the squared coefficients.  Calculate the squared L2 norm.  

In [44]:
(coef_series**2).sum()

# 5263.051521586297

5263.051521586297

So we see the squared L2 norm of the coefficients is `5263.051521586297`, and that we have a model that has a lower score on the test set than on the training set.  We'll use this linear model as a baseline for our ridge regression.
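To make the distinction between the L2 norm and the squared L2 norm concrete, here is a small sketch on a toy coefficient vector (the values `3` and `-4` are made up for illustration). The L2 norm is the square root of the sum of squares, while the quantity that ridge regression penalizes is the sum of squares itself.

```python
import numpy as np

# Toy coefficients, chosen so the arithmetic is easy to follow
coefs = np.array([3.0, -4.0])

l2_norm = np.linalg.norm(coefs)        # sqrt(3**2 + (-4)**2) = 5.0
l2_norm_squared = (coefs ** 2).sum()   # 9 + 16 = 25.0

print(l2_norm, l2_norm_squared)
```

So `(coef_series**2).sum()` above computes the squared L2 norm, and `np.linalg.norm(coef_series)` would give its square root.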

### Ridge Regression

Now that we've trained our baseline linear regression model, let's see how our ridge regression model compares.  Let's import `Ridge` from sklearn's `linear_model` module, initialize our model with `alpha` set equal to 1, and fit it to the training data.

In [26]:
from sklearn.linear_model import Ridge
ridge = Ridge(alpha = 1)
ridge.fit(X_train_transformed, y_train)

# Ridge(alpha=1)

Ridge(alpha=1)

With the model fit to the training data, let's assess the score on both the training and test sets.

In [27]:
ridge.score(X_train_transformed, y_train)
# 0.5320698844221448

0.5320698844221448

In [28]:
ridge.score(X_test_transformed, y_test)
# 0.44168863198574837

0.44168863198574837

So we can see that our ridge regression performs slightly better on the test set than our previous model, which had a score of `.4399`.  

> One reason for the small size of the improvement in scores could be a lack of multicollinearity among the features.  

Next, calculate the squared L2 norm of the parameters of our ridge regression model.

In [29]:
(ridge.coef_**2).sum()
# 3789.6597179177006

3789.6597179177006

So we can see that our ridge regression model did succeed in reducing the squared L2 norm of our model's coefficients, as our initial regression model had a squared L2 norm of `5263.051521586297`.  We did this to reduce error from variance, and the slightly higher test score suggests that it helped a little.

### Finding our Hyperparameter

Now one question we may have is how to find the correct value for alpha.  To do so, we treat alpha as a hyperparameter, and we choose alpha by evaluating the performance of our model on the validation set.
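As a rough sketch of what choosing alpha on validation data involves, the loop below tries a handful of alphas by hand and keeps the one with the best average cross-validated score. The four candidate values, the 5 folds, and the names `candidate_alphas` and `mean_scores` are assumptions made for illustration. The scaler is placed in a pipeline so that it is refit on the training portion of each fold.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=.2,
                                                    random_state=2)

candidate_alphas = [.01, .1, 1, 10]   # illustrative grid, not a recommendation
mean_scores = {}
for alpha in candidate_alphas:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    # 5-fold cross validation on the training data only
    fold_scores = cross_val_score(model, X_train, y_train, cv=5)
    mean_scores[alpha] = fold_scores.mean()

# Keep the alpha with the highest average validation score
best_alpha = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print(best_alpha)
```

The test set plays no part in this choice. It is held back so that we can still get an honest estimate of performance once alpha has been selected.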

In ridge regression, we can use the `RidgeCV` class to perform cross validation for us. 

> Because we do not have a complicated validation scheme, we do not have to worry about using TimeSeriesSplit, or a custom validation scheme.  

Let's import `RidgeCV` from our `linear_model` module.

In [4]:
from sklearn.linear_model import RidgeCV

Now `RidgeCV` takes a parameter called `alphas` which is a list of all of the alphas it will try out.

Use linspace to initialize a numpy array of 200 alphas, ranging from `.01` to `10`.

In [6]:
import numpy as np
alphas = np.linspace(.01, 10, 200)

alphas.shape
# (200,)

(200,)

Next, initialize the RidgeCV model passing through the list of alphas.

In [33]:
ridge_cv = RidgeCV(alphas = alphas)

Finally, call `fit` on `ridge_cv`, and then call `score`, passing in the training data.

In [34]:
ridge_cv.fit(X_train_transformed, y_train)
ridge_cv.score(X_train_transformed, y_train)

# 0.5319458627105029

0.5319458627105029

This returns the training score of the model refit with the alpha that performed best according to cross validation.  Let's see how this model performs on the test data.

In [36]:
ridge_cv.score(X_test_transformed, y_test)

# 0.44198968565738184

0.44198968565738184

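> Note: Older versions of scikit-learn accepted `RidgeCV(normalize = True)`, but that parameter was deprecated in version 1.0 and removed in version 1.2.  On current versions we scale the data ourselves with `StandardScaler`, as we did above.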
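One way to bundle the scaling and the model together is a pipeline. The sketch below is a minimal example of that approach, not a required part of the lab; `make_pipeline` names each step after its lowercased class name, which is why the fitted model is looked up as `'ridgecv'`.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=.2,
                                                    random_state=2)

alphas = np.linspace(.01, 10, 200)

# The pipeline scales the data, then fits RidgeCV on the scaled result
pipeline = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
pipeline.fit(X_train, y_train)

test_score = pipeline.score(X_test, y_test)        # X_test is scaled automatically
best_alpha = pipeline.named_steps['ridgecv'].alpha_

print(test_score)
print(best_alpha)
```

With this setup we pass the unscaled `X_train` and `X_test` directly, and the pipeline applies the training statistics to the test data for us.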

Finally, we can view the hyperparameter for the model that performed the best through the `alpha_` attribute.  Access it on our `ridge_cv` model.

In [37]:
ridge_cv.alpha_

# 1.2650251256281408

1.2650251256281408

And we can see the coefficients through the `coef_` attribute.

In [38]:
ridge_cv.coef_
# array([ -0.39250354,  -9.61825496,  24.63103684,  16.13971359,
#        -30.60423022,  17.07655128,   2.13699152,   4.56567086,
#         36.55650019,   2.51599225])

array([ -0.39250354,  -9.61825496,  24.63103684,  16.13971359,
       -30.60423022,  17.07655128,   2.13699152,   4.56567086,
        36.55650019,   2.51599225])
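The raw array is easier to read once each coefficient is paired with its feature name. The sketch below copies the coefficient values from the output above and the column names from the diabetes dataframe, then ranks the features by absolute coefficient size. The variable names `ranked` and `l2_squared` are ours.

```python
import numpy as np
import pandas as pd

feature_names = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

# Coefficients copied from the ridge_cv.coef_ output above
coefs = np.array([-0.39250354, -9.61825496, 24.63103684, 16.13971359,
                  -30.60423022, 17.07655128, 2.13699152, 4.56567086,
                  36.55650019, 2.51599225])

coef_series = pd.Series(coefs, index=feature_names)

# Rank features by the absolute size of their coefficient
ranked = coef_series.abs().sort_values(ascending=False)
print(ranked.head(3))

# Squared L2 norm, for comparison with the earlier models
l2_squared = (coef_series ** 2).sum()
print(l2_squared)
```

The three largest coefficients by magnitude belong to `s5`, `s1` and `bmi`, and the squared L2 norm comes out below the `3789.6597179177006` we saw with `alpha = 1`, which is what we would expect from the slightly larger alpha that `RidgeCV` selected.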

### Summary

In this lesson, we saw how to train a ridge regression model, and how it compares to our linear regression model.  Because ridge regression penalizes coefficients with a larger magnitude, we first scaled our data using the `StandardScaler`.  Then, when applying ridge regression, we saw that it trained a model with a similar score but with smaller coefficients, as assessed by the squared L2 norm.  

Finally, we saw how we can find a good value of alpha to balance the bias-variance tradeoff.  We did so by initializing `RidgeCV` with a list of alphas, and then finding the value of alpha that scored best on the held-out data according to cross validation.