# Grid Search

_By Jeff Hale (mostly)_

---

## Learning Objectives
By the end of this lesson students will be able to:

- Understand what grid searching is
- Use the `GridSearchCV` class from sklearn to find optimal hyperparameters
- Differentiate `cross_val_score` from `GridSearchCV`

---

## GridSearchCV
GridSearchCV is a nifty sklearn class. 😀 

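It performs cross-validation and searches over every combination of the hyperparameter values you give it. By default it does not shuffle the data; to shuffle, pass a splitter such as `KFold(shuffle=True)` as the `cv` argument.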

It replaces the old way of using a `for` loop with `cross_val_score`. 

`GridSearchCV` is a convenient, systematic way to tune hyperparameters when the grid of candidate values is small enough to search exhaustively.
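A note on shuffling: when `cv` is left at its default, sklearn builds the folds for a regressor with `KFold`, which does not shuffle rows unless asked. If shuffled folds are wanted, one option is to pass a splitter explicitly. The sketch below assumes synthetic data from `make_regression`; the names `X_demo`, `y_demo`, and `gs_shuffled` are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

# synthetic regression data, used only for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

# shuffle=True randomizes which rows land in each fold;
# random_state makes the split reproducible
shuffled_folds = KFold(n_splits=5, shuffle=True, random_state=42)

# pass the splitter through the cv argument
gs_shuffled = GridSearchCV(Lasso(), param_grid={'alpha': [.1, .5, 1]}, cv=shuffled_folds)
gs_shuffled.fit(X_demo, y_demo)

print(gs_shuffled.best_params_)
```

Shuffling matters most when the rows are stored in a meaningful order (for example, sorted by the target).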

## Hyperparameters vs. parameters

Definition 1 of `parameters`: the names a function accepts in its definition. The values you pass when you call the function are called `arguments`. The two terms are often used interchangeably.

Definition 2 of `parameters`: the weights in a model. For example, the $ \beta $ values in a linear regression equation.

`hyperparameters` are the settings you choose for a model before fitting it. They can take different values, and you tune them to improve model performance. For example, the most important hyperparameter for a KNN model is `n_neighbors` (the number of nearest neighbors used to make each prediction).


### Just remember: YOU choose the hyperparameters.
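To make the two meanings concrete, here is a minimal sketch on synthetic data (the `make_regression` dataset and the `X_demo`, `y_demo`, and `ridge` names are assumptions for illustration, not part of this lesson's data). `alpha` is a hyperparameter we choose; `coef_` and `intercept_` are parameters the model learns during `fit`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# synthetic regression data, used only for illustration
X_demo, y_demo = make_regression(n_samples=100, n_features=3, noise=10, random_state=0)

# alpha is a hyperparameter: we pick it before fitting
ridge = Ridge(alpha=1.0)
ridge.fit(X_demo, y_demo)

# the hyperparameter we chose
print(ridge.get_params()['alpha'])

# coef_ and intercept_ are parameters: the model learned them from the data
print(ridge.coef_)
print(ridge.intercept_)
```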

In [1]:
# imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge

In [2]:
# read in the data
boston = pd.read_csv('../data/boston_data.csv')

In [3]:
# inspect 
boston.head()

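| Unnamed: 0 | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |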


In [4]:
# break into X and y
X = boston.drop('MEDV', axis=1)
y = boston['MEDV']

In [5]:
X.head()

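| Unnamed: 0 | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 |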


In [6]:
y.head()

0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
Name: MEDV, dtype: float64

In [7]:
# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

## Standardize and Scale


In [12]:
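# fit the scaler on the training data only, then apply it to both sets
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)  # the scaler ignores y, so it is not passed
X_test_sc = sc.transform(X_test)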

In [13]:
X_train_sc[:2]

array([[-0.39729487, -0.48704812, -0.1859885 , -0.26001083, -0.06705854,
        -0.51000005,  0.68974564, -0.62388631, -0.42954918,  0.11356132,
        -0.30420957,  0.41897607],
       [-0.40512635,  0.96592146, -0.75239087, -0.26001083, -1.0583109 ,
         0.91825466, -1.79963863,  0.78608936, -0.31594196, -0.49279803,
        -1.10552691, -1.05446122]])

In [9]:
X_test_sc[:2]

array([[-0.41106527,  1.75650785, -0.86364848, -0.26001083, -1.30177639,
         0.01441605, -0.81447805,  1.63595748, -0.42954918, -0.70472946,
        -0.86984534, -0.68407423],
       [-0.40698377, -0.48704812, -1.27689102, -0.26001083, -0.58007511,
        -0.94904609,  0.7462423 , -0.37956377, -0.77037082, -1.29342785,
        -0.30420957,  0.17835971]])

## cross_val_score with Lasso

Write a `for` loop that uses `cross_val_score` with a Lasso model to find a good value for the `alpha` hyperparameter. Try these values: `[.1, .5, 1, 1.5, 2]`. Each score below is the mean R² across the cross-validation folds.

In [16]:
alpha_list = [.1, .5, 1, 1.5, 2]

mean_score_list = []

for alpha_value in alpha_list:
    lasso = Lasso(alpha=alpha_value)
    mean_score = cross_val_score(lasso, X_train_sc, y_train).mean()
    mean_score_list.append(mean_score)
    
print(mean_score_list)

[0.6767623578816465, 0.639593169911995, 0.6248990204892075, 0.608345222603005, 0.5800226543267771]


In [17]:
list(zip(alpha_list, mean_score_list))

[(0.1, 0.6767623578816465),
 (0.5, 0.639593169911995),
 (1, 0.6248990204892075),
 (1.5, 0.608345222603005),
 (2, 0.5800226543267771)]
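Rather than reading the best value off the list by eye, the winning alpha can be picked programmatically. This is a small sketch on synthetic data; `demo_alphas`, `demo_scores`, and the `make_regression` dataset are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# synthetic regression data, used only for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

demo_alphas = [.1, .5, 1, 1.5, 2]

# mean cross-validated R^2 for each candidate alpha
demo_scores = [
    cross_val_score(Lasso(alpha=a), X_demo, y_demo).mean()
    for a in demo_alphas
]

# index of the highest mean score, then the alpha at that index
best_index = int(np.argmax(demo_scores))
best_alpha = demo_alphas[best_index]

print(best_alpha, demo_scores[best_index])
```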

### Now to actually score on the test set with the best hyperparameter.

In [35]:
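# note: this cell fits and scores on the unscaled X_train / X_test,
# whereas the cross-validation loop above used the scaled X_train_sc
lasso = Lasso(alpha=.1)
lasso.fit(X_train, y_train)
lasso.score(X_test, y_test)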

0.7171948200866496

## GridSearch Syntax

`GridSearchCV` accepts an sklearn `estimator` object and a parameter grid.

The param grid is a dictionary. The key is the name of the hyperparameter argument in sklearn.  

The value is an iterable to search over (generally a list or a range-style object).

What's an iterable? Something Python can iterate over. 😀
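As a quick illustration of how a grid expands, sklearn's `ParameterGrid` lists every combination that a search would try. The grid below is a made-up example (the `max_iter` values are arbitrary), showing a NumPy array and a `range` used as the values.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

grid_example = {
    'alpha': np.logspace(-2, 1, 4),        # array: 0.01, 0.1, 1, 10
    'max_iter': range(1000, 3001, 1000),   # range: 1000, 2000, 3000
}

# every combination of one alpha with one max_iter
combos = list(ParameterGrid(grid_example))

print(len(combos))  # 4 alphas x 3 max_iter values = 12
print(combos[0])
```

The number of fits grows multiplicatively with each hyperparameter added, and each combination is fit once per cross-validation fold.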

Let's use GridSearch with a Lasso model and different values for alpha.

In [None]:
# set up a param grid with the following:
#     alpha: [.1, .5, 1, 1.5, 2]

In [18]:
params = {'alpha': [.1, .5, 1, 1.5, 2] }

In [19]:
# instantiate our gridsearch object
gs = GridSearchCV(Lasso(), param_grid=params)

We use it like any other model, calling `fit` and `score` as usual. After the search, `score` uses the model that was refit with the hyperparameters that gave the best cross-validation results.

In [21]:
# fit on the training data
gs.fit(X_train_sc, y_train)

GridSearchCV(cv=None, error_score=nan,
             estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=1000, normalize=False, positive=False,
                             precompute=False, random_state=None,
                             selection='cyclic', tol=0.0001, warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid={'alpha': [0.1, 0.5, 1, 1.5, 2]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

In [27]:
# score the training data
gs.score(X_train_sc, y_train)

0.7247620248374471

In [26]:
# score the test data
gs.score(X_test_sc, y_test)

0.7445138376497066

So what are our best parameters?

In [28]:
# look at `.best_params_`
gs.best_params_

{'alpha': 0.1}

Note that we'll use the `best_estimator_` attribute to access the model that was fit with our `best_params_`.

Inspect `best_estimator_` to see all the values in the best-performing model.

In [29]:
gs.best_estimator_

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
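To see how `GridSearchCV` relates to the earlier `cross_val_score` loop, the per-alpha mean scores are stored in the `cv_results_` attribute. The sketch below uses synthetic data and illustrative names (`X_demo`, `demo_alphas`, `gs_demo`); with the default settings in recent sklearn versions, the two approaches should give matching mean scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score

# synthetic regression data, used only for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
demo_alphas = [.1, .5, 1, 1.5, 2]

# the manual way: one cross_val_score call per alpha
loop_means = [
    cross_val_score(Lasso(alpha=a), X_demo, y_demo).mean()
    for a in demo_alphas
]

# the grid search way: the same scores are stored in cv_results_
gs_demo = GridSearchCV(Lasso(), param_grid={'alpha': demo_alphas})
gs_demo.fit(X_demo, y_demo)
grid_means = gs_demo.cv_results_['mean_test_score']

print(np.allclose(loop_means, grid_means))
```

The difference is that `cross_val_score` only returns scores, while `GridSearchCV` also records the best setting and keeps a refit model ready to use.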

Use GridSearchCV with Ridge and different values of alpha. 

In [31]:
params = {'alpha': [.1, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5]}

gs_r = GridSearchCV(Ridge(), param_grid=params)
gs_r.fit(X_train_sc,y_train)
gs_r.score(X_test_sc,y_test)

0.7422629088696879

### GridSearchCV 

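When you call `fit`, `GridSearchCV` automatically refits the best-performing model on all of the data you passed in (here `X_train_sc` and `y_train`), because `refit=True` by default.

You can then call `score` on the grid search object, which scores with that refit model.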
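The refit behavior can be checked by hand. In this sketch (synthetic data; `X_tr`, `X_te`, `gs_demo`, and `manual` are illustrative names), a `Ridge` built with the winning hyperparameters and fit on the full training set should score the same as the grid search object.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# synthetic regression data, used only for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=7)

# refit=True is the default, so the best model is refit on all of X_tr
gs_demo = GridSearchCV(Ridge(), param_grid={'alpha': [.1, 1, 10]})
gs_demo.fit(X_tr, y_tr)

# the manual equivalent: a fresh model with the winning hyperparameters
manual = Ridge(**gs_demo.best_params_).fit(X_tr, y_tr)

print(gs_demo.score(X_te, y_te))
print(manual.score(X_te, y_te))
```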

In [32]:
gs_r.best_params_

{'alpha': 5.5}

In [33]:
gs_r.best_estimator_

Ridge(alpha=5.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
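One caveat about the approach above: the scaler was fit on all of the training data before cross-validation, so each validation fold had already influenced the scaling. A common alternative is to put the scaler and the model in a `Pipeline` and grid search over the pipeline, so scaling is refit inside every fold. This is a sketch on synthetic data, not the lesson's original method; the step names `'sc'` and `'ridge'` are arbitrary choices, and hyperparameters are addressed as `<step name>__<hyperparameter>`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# synthetic regression data, used only for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=7)

# scaling and the model travel together as one estimator
pipe = Pipeline([
    ('sc', StandardScaler()),
    ('ridge', Ridge()),
])

# the key is '<step name>__<hyperparameter name>'
pipe_params = {'ridge__alpha': [.1, 1, 10]}

gs_pipe = GridSearchCV(pipe, param_grid=pipe_params)
gs_pipe.fit(X_tr, y_tr)  # raw features go in; the pipeline scales them

print(gs_pipe.best_params_)
print(gs_pipe.score(X_te, y_te))
```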

### How was that?

# Summary

You've seen `GridSearchCV`. Isn't it cool? 👍

## Check for understanding

- Why would you want to use `GridSearchCV`?
- What do you pass `GridSearchCV`?
- How do you specify the parameter grid?
- Does `GridSearchCV` randomize the data for cross validation?

`GridSearchCV` is an extremely powerful tool for your toolkit! 🛠

