# Grid Search Solution

_By Jeff Hale_

---

## Learning Objectives
By the end of this lesson students will be able to:

- Understand what grid searching is
- Use the `GridSearchCV` class from sklearn to find optimal hyperparameters
- Differentiate `cross_val_score` from `GridSearchCV`

---

## GridSearchCV
GridSearchCV is a nifty sklearn class. 😀 

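It performs cross-validation and searches over every combination of the hyperparameter values you give it. By default it does not shuffle the data: it uses 5 unshuffled folds unless you pass a splitter with `shuffle=True` to `cv`.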

It replaces the old way of using a `for` loop with `cross_val_score`. 

Using GridSearchCV is a convenient, systematic way to tune hyperparameters when the grid is small enough to search exhaustively.

## Hyperparameters vs. parameters

Definition 1 of `parameters`: the names a function accepts in its definition. The values you pass when you call the function are called `arguments`. The two terms are often used interchangeably.

Definition 2 of `parameters`: the weights in a model. For example, the $ \beta $ values in a linear regression equation.

`hyperparameters` are the settings you choose for a model before fitting it; they are not learned from the data. You tune these to improve model performance. For example, the most important hyperparameter for a KNN model is `n_neighbors` (the number of nearest neighbors to include in the model).
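
To make the distinction concrete, here is a minimal sketch. It assumes a small synthetic data set (the names `X_demo` and `y_demo` are made up for illustration and are not the lesson's Boston data). `alpha` is set by you before fitting, while `coef_` and `intercept_` are learned during `fit`.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Tiny synthetic data set (an assumption for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=50)

# Hyperparameter: you choose it when you instantiate the model.
model = Lasso(alpha=0.1)
print(model.get_params()["alpha"])

# Parameters (definition 2): the model learns these from the data.
model.fit(X_demo, y_demo)
print(model.coef_, model.intercept_)
```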


### Just remember: YOU choose the hyperparameters.

In [1]:
# imports

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge


In [2]:
# read in the data
boston = pd.read_csv('../data/boston_data.csv')

In [3]:
# inspect 
boston.head()

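| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |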


In [4]:
# break into X and y
X = boston.drop('MEDV', axis=1)
y = boston['MEDV']

In [5]:
X.head()

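| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 |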


In [6]:
y.head()

0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
Name: MEDV, dtype: float64

In [7]:
# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

## Standardize and Scale


In [17]:
# fit the scaler on the training data only, then apply it to both sets
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)

In [18]:
X_train_sc[:2]

array([[-0.39711602, -0.48059791, -0.87056859, -0.25417856, -0.34414942,
        -0.34596744, -1.21038427,  1.04989059, -0.51953647, -1.08472137,
         0.81625343, -0.6636625 ],
       [-0.03245567, -0.48059791,  1.22519728,  3.9342421 ,  2.73729757,
        -1.23546049,  1.11533919, -1.16648972, -0.51953647, -0.01953236,
        -1.75706821,  1.95883254]])

In [20]:
X_test_sc[:2]

array([[ 5.28110885, -0.48059791,  1.00964944, -0.25417856,  0.36562208,
        -0.73880349,  1.11533919, -1.12305272,  1.69287165,  1.54552187,
         0.81625343, -0.37087996],
       [-0.3951446 , -0.48059791, -0.97251689, -0.25417856, -0.91542892,
         0.4860032 , -0.44342813,  0.314408  , -0.75242154, -0.94785351,
         0.0208631 , -0.87000448]])

## cross_val_score with Lasso

Make a `for` loop with `cross_val_score` and a Lasso model to find a good value for the `alpha` hyperparameter. Try these values for alpha: [.1, .5, 1, 1.5, 2]

In [22]:
# list of alpha values to try
a_list = [.1, .5, 1, 1.5, 2]
# collect the mean cross-validation score for each alpha
mean_score_list = []
# loop over the alphas, scoring a Lasso model with 5-fold CV (R^2 by default)
for a_value in a_list:
    lasso = Lasso(alpha=a_value)
    mean_score = cross_val_score(lasso, X_train_sc, y_train).mean()
    mean_score_list.append(mean_score)

print(mean_score_list)

[0.7214636179354497, 0.6973803188487484, 0.6808203328971718, 0.6573682671566615, 0.6235261752063069]


In [23]:
list(zip(a_list, mean_score_list))

[(0.1, 0.7214636179354497),
 (0.5, 0.6973803188487484),
 (1, 0.6808203328971718),
 (1.5, 0.6573682671566615),
 (2, 0.6235261752063069)]

## GridSearchCV Syntax

`GridSearchCV` accepts an sklearn `estimator` object and a parameter grid.

The param grid is a dictionary. The key is the name of the hyperparameter argument in sklearn.  

The value is an iterable to search over (generally a list or a range-style object).

What's an iterable? Something Python can iterate over. 😀
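
To see which combinations a grid expands to, sklearn's `ParameterGrid` helper can list them. This is a minimal sketch; the grid values and the name `demo_grid` are illustrative assumptions, not the grid used in this lesson.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Keys are estimator argument names; values are the candidates to try.
demo_grid = {"alpha": [0.1, 1.0], "fit_intercept": [True, False]}

# The grid is the Cartesian product of all the value lists.
combos = list(ParameterGrid(demo_grid))
for combo in combos:
    print(combo)
print(len(combos))

# A NumPy array works as the value too, e.g. log-spaced alphas.
log_grid = ParameterGrid({"alpha": np.logspace(-2, 1, 4)})
print(len(log_grid))
```

Each extra key multiplies the number of candidates, so the number of fits grows quickly as the grid gets larger.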

Let's use `GridSearchCV` with a Lasso model and different values for alpha.

In [32]:
# set up a param grid with the following:
#     alpha: [.1, .5, 1, 1.5, 2]

In [24]:
params = {'alpha': [.1, .5, 1, 1.5, 2]}

You can also specify the number of folds using `cv`. Default is 5.
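
As a rough sketch of the `cv` argument, the example below passes first an integer and then a `KFold` splitter, which lets you control shuffling explicitly. The data comes from `make_regression` and the names `X_demo`, `y_demo`, `gs_three`, and `gs_shuffled` are assumptions for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression data (an assumption, not the lesson's data set).
X_demo, y_demo = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# An integer sets the number of folds.
gs_three = GridSearchCV(Lasso(), param_grid={"alpha": [0.1, 1.0]}, cv=3)
gs_three.fit(X_demo, y_demo)
print(gs_three.n_splits_)

# A splitter object gives more control, such as shuffled folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=7)
gs_shuffled = GridSearchCV(Lasso(), param_grid={"alpha": [0.1, 1.0]}, cv=kfold)
gs_shuffled.fit(X_demo, y_demo)
print(gs_shuffled.n_splits_)
```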

In [25]:
# instantiate our gridsearch object
gs = GridSearchCV(Lasso(), param_grid=params, verbose=1)

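We use the search object like any other model: call `fit` and `score` as usual. Because `refit=True` by default, after fitting, `score` and `predict` use a model refit on all the training data with the hyperparameters that gave the best cross-validation results.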

In [28]:
# fit on the training data
gs.fit(X_train_sc, y_train)

Fitting 5 folds for each of 5 candidates, totalling 25 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  25 out of  25 | elapsed:    0.1s finished


GridSearchCV(cv=None, error_score=nan,
             estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=1000, normalize=False, positive=False,
                             precompute=False, random_state=None,
                             selection='cyclic', tol=0.0001, warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid={'alpha': [0.1, 0.5, 1, 1.5, 2]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=1)

In [29]:
# score the training data
gs.score(X_train_sc, y_train)

0.7437187443142204

In [32]:
# score the test data
gs.score(X_test_sc, y_test)

0.6553697776745504

So what are our best parameters?

In [33]:
# look at `.best_params_`
gs.best_params_

{'alpha': 0.1}
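
A grid search should pick the same alpha as a manual `cross_val_score` loop when both use the same folds. The sketch below checks this on synthetic data (`make_regression`; the names `X_demo`, `y_demo`, `alphas`, and `gs_demo` are assumptions for illustration).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic regression data (an assumption, not the lesson's data set).
X_demo, y_demo = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
alphas = [0.1, 0.5, 1, 1.5, 2]

# Manual approach: one cross_val_score call per alpha, then take the best mean.
loop_scores = [cross_val_score(Lasso(alpha=a), X_demo, y_demo, cv=5).mean() for a in alphas]
best_loop_alpha = alphas[int(np.argmax(loop_scores))]

# Grid search approach: the same folds and scoring, handled for you.
gs_demo = GridSearchCV(Lasso(), param_grid={"alpha": alphas}, cv=5)
gs_demo.fit(X_demo, y_demo)

print(best_loop_alpha, gs_demo.best_params_["alpha"])
print(max(loop_scores), gs_demo.best_score_)
```

The practical difference is that `cross_val_score` only returns scores, while `GridSearchCV` also keeps the results for every candidate and refits the best model for you.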

Note that `best_estimator_` holds the model that was refit on the training data using `best_params_`.

Inspect the `best_estimator_` attribute to see all the values in the best-performing model.

In [34]:
gs.best_estimator_

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
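
Beyond the single best model, the fitted search stores the score for every candidate in its `cv_results_` attribute. Here is a small sketch that loads it into a DataFrame; the synthetic data and the names `X_demo`, `y_demo`, `gs_demo`, and `results` are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (an assumption, not the lesson's data set).
X_demo, y_demo = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

gs_demo = GridSearchCV(Lasso(), param_grid={"alpha": [0.1, 0.5, 1, 1.5, 2]})
gs_demo.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays with one entry per candidate.
results = pd.DataFrame(gs_demo.cv_results_)
cols = ["param_alpha", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score"))
```

Looking at the whole table helps you judge whether the best alpha is clearly better or only marginally ahead of its neighbors.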

# Exercise

Use GridSearchCV with Ridge and different values of alpha. 

In [55]:
params2 = {'alpha': [.1, .25, .5, 1.5, 2, 2.5, 3, 4]}

In [56]:
gs = GridSearchCV(Ridge(), param_grid=params2, verbose=1)

In [57]:
gs.fit(X_train_sc, y_train)

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  40 out of  40 | elapsed:    0.1s finished


GridSearchCV(cv=None, error_score=nan,
             estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=None, normalize=False, random_state=None,
                             solver='auto', tol=0.001),
             iid='deprecated', n_jobs=None,
             param_grid={'alpha': [0.1, 0.25, 0.5, 1.5, 2, 2.5, 3, 4]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=1)

In [58]:
gs.score(X_train_sc, y_train)

0.7486865243269303

In [59]:
gs.score(X_test_sc, y_test)

0.670725824729486

In [60]:
gs.best_params_

{'alpha': 4}

In [61]:
gs.best_estimator_

Ridge(alpha=4, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)

How was that?

# Summary

You've seen how `GridSearchCV` combines cross-validation with a search over hyperparameter values.

## Check for understanding

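- Why would you want to use `GridSearchCV`?
    To optimize hyperparameters
- What do you pass `GridSearchCV`?
    An estimator and a parameter grid
- How do you specify the parameter grid?
    Use a dictionary: the key is the name of the hyperparameter, the value is a list of values to try
- Does `GridSearchCV` randomize the data for cross validation?
    Not by default. The default folds are not shuffled; pass a splitter such as `KFold(shuffle=True)` to `cv` if you want shuffling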

`GridSearchCV` is an extremely powerful tool for your toolkit! 🛠
