# Grid Search

_By Jeff Hale_

---

## Learning Objectives
By the end of this lesson, students will be able to:

- Understand what grid searching is
- Use the `GridSearchCV` class from sklearn to find the best hyperparameters among a set of candidates
- Differentiate `cross_val_score` from `GridSearchCV`

---

## GridSearchCV
GridSearchCV is a nifty sklearn class. 😀 

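It performs cross-validation and searches over every combination of the hyperparameter values you give it. Note that it does not shuffle the data by default: an integer `cv` becomes `KFold` (or `StratifiedKFold` for classifiers) with `shuffle=False`. To shuffle, pass a splitter such as `KFold(n_splits=5, shuffle=True, random_state=42)` as `cv`.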
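It is worth checking what happens to the row order. sklearn's `check_cv` helper converts an integer `cv` into the splitter a search would use, and for a regressor that is a plain `KFold` with `shuffle=False`, so folds follow the data's existing order unless you pass a shuffling splitter yourself. This sketch uses synthetic data from `make_regression` as a stand-in for the lesson's Boston data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, check_cv

# An integer cv for a regressor becomes a KFold splitter that does NOT shuffle
default_splitter = check_cv(5, classifier=False)
print(type(default_splitter).__name__, default_splitter.shuffle)  # KFold False

# To shuffle, pass an explicit splitter to cv instead of an integer
shuffled_cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Synthetic regression data (assumption: stands in for the Boston CSV)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
gs = GridSearchCV(Lasso(), {'alpha': [0.1, 1.0]}, cv=shuffled_cv)
gs.fit(X, y)
print(gs.best_params_)
```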

It replaces the manual approach of writing a `for` loop that calls `cross_val_score` once for each candidate hyperparameter value. 
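To make the difference from `cross_val_score` concrete, here is a minimal sketch of both approaches side by side. It uses synthetic data from `make_regression` in place of the lesson's Boston data. Both approaches use the same unshuffled 5-fold splits and the same default R² score, so they should agree on the best `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic regression data (assumption: stands in for the Boston CSV)
X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=0)
alphas = [0.1, 0.5, 1, 1.5, 2]

# The manual way: loop over candidates, cross-validate each, track the best
loop_scores = {}
for alpha in alphas:
    scores = cross_val_score(Lasso(alpha=alpha), X, y, cv=5)  # R^2 on each of 5 folds
    loop_scores[alpha] = np.mean(scores)
best_alpha_loop = max(loop_scores, key=loop_scores.get)

# The GridSearchCV way: one object handles the loop, the CV, and the bookkeeping
gs = GridSearchCV(Lasso(), {'alpha': alphas}, cv=5)
gs.fit(X, y)

print(best_alpha_loop, gs.best_params_['alpha'])
```

One thing the loop does not replicate: by default `GridSearchCV` also refits the winning model on all the data passed to `fit`, so `gs` is ready to `predict` right away.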

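GridSearchCV is a convenient and systematic way to tune hyperparameters. Keep in mind that it tries every combination in the grid, so the number of model fits grows multiplicatively as you add hyperparameters and values.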

## Hyperparameters vs parameters

Definition 1 of `parameters`: the names a function lists in its definition. The values you pass in when you call the function are called `arguments`. The two terms are often used interchangeably, even though they mean different things.
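A tiny illustration of the first definition. The function `describe_knn` is hypothetical, made up only to show the terminology.

```python
# `n_neighbors` and `weights` are the *parameters* in the function definition
def describe_knn(n_neighbors, weights='uniform'):
    return f'{n_neighbors} neighbors, {weights} weights'

# 7 and 'distance' are the *arguments* passed when the function is called
print(describe_knn(7, weights='distance'))  # 7 neighbors, distance weights
```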

Definition 2 of `parameters`: the values a model learns from the data during fitting. For example, the $ \beta $ values in a linear regression equation.

`hyperparameters` are the settings you choose for a model before fitting it; they are not learned from the data. You tune them to improve model performance. For example, the most important hyperparameter for a k-nearest neighbors (KNN) model is usually `n_neighbors` (the number of nearest neighbors used to make each prediction). 
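A short sketch of the distinction in sklearn terms, using synthetic `make_regression` data rather than the lesson's dataset: `n_neighbors` and `alpha` are hyperparameters we set, while a fitted Lasso's `coef_` and `intercept_` are parameters in the second sense, learned from the data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data (assumption: not the lesson's dataset)
X, y = make_regression(n_samples=100, n_features=3, noise=5, random_state=0)

# Hyperparameters: chosen by you before fitting
knn = KNeighborsRegressor(n_neighbors=5)
lasso = Lasso(alpha=1.0)

knn.fit(X, y)
lasso.fit(X, y)

# Parameters (definition 2): learned from the data during fit
print(lasso.coef_)       # one beta per feature
print(lasso.intercept_)

# Fitting does not change the hyperparameters you chose
print(knn.get_params()['n_neighbors'], lasso.get_params()['alpha'])  # 5 1.0
```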


### Just remember: YOU choose the hyperparameters.

In [None]:
# imports

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge

In [None]:
# read in the data
boston = pd.read_csv('../data/boston_data.csv')

In [None]:
# inspect 
boston.head()

In [None]:
# break into X and y
X = boston.drop('MEDV', axis=1)
y = boston['MEDV']

In [None]:
y.head()

In [None]:
# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)


## GridSearch Syntax

`GridSearchCV` accepts an sklearn estimator object and a parameter grid (`param_grid`).

The param grid is a dictionary. Each key is the name of a hyperparameter argument of the sklearn estimator, written as a string.  

The value is an iterable to search over (generally a list or a range-style object).

What's an iterable? Something Python can iterate over. 😀
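sklearn's `ParameterGrid` class expands a parameter grid into the individual combinations a grid search tries, which is a handy way to sanity-check a grid before fitting. The `max_iter` values here are an illustrative assumption: `max_iter` is a real `Lasso` argument, but the lesson itself only searches `alpha`.

```python
from sklearn.model_selection import ParameterGrid

# Keys are hyperparameter names (strings); values are iterables of candidates
param_grid = {
    'alpha': [0.1, 0.5, 1, 1.5, 2],        # a list
    'max_iter': range(1000, 3001, 1000),   # a range: 1000, 2000, 3000
}

# ParameterGrid enumerates every combination a grid search would try
combos = list(ParameterGrid(param_grid))
print(len(combos))  # 15 (5 alphas x 3 max_iter values)
print(combos[0])
```

Each of those combinations is then cross-validated on every fold, which is why grids get expensive quickly.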

Let's use `GridSearchCV` with a Lasso model and different values of `alpha`.

In [None]:
# set up a param grid with the following:
#     alpha: [.1, .5, 1, 1.5, 2]
params = {'alpha': [.1, .5, 1, 1.5, 2]}

You can also set the number of cross-validation folds with the `cv` argument. The default is 5.
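Here is a small sketch, again on synthetic `make_regression` data rather than the Boston CSV, showing how `cv` controls the folds. With 5 candidate alphas and `cv=3`, the search fits 5 x 3 = 15 models, plus one final refit of the best setting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (assumption: stands in for the Boston CSV)
X, y = make_regression(n_samples=150, n_features=4, noise=10, random_state=1)
param_grid = {'alpha': [0.1, 0.5, 1, 1.5, 2]}

# cv=3 requests 3-fold cross-validation instead of the default 5
gs = GridSearchCV(Lasso(), param_grid, cv=3)
gs.fit(X, y)

print(gs.n_splits_)  # 3
# One test-score entry per fold: split0_test_score, split1_test_score, ...
split_keys = [k for k in gs.cv_results_ if k.startswith('split') and k.endswith('test_score')]
print(split_keys)
```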

In [None]:
# instantiate our gridsearch object
gs = GridSearchCV(Lasso(), params, cv=5)


We use the grid search object like any other model, calling `fit` and `score` as usual. After fitting, `score` and `predict` use the model refit with the hyperparameters that gave the best cross-validated results.
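The claim that a fitted grid search behaves like its best model can be checked directly. In this sketch (synthetic `make_regression` data standing in for the Boston CSV), `gs.score` and `gs.predict` match the results from `gs.best_estimator_`, because `refit=True` is the default and no custom `scoring` is set.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data (assumption: stands in for the Boston CSV)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gs = GridSearchCV(Lasso(), {'alpha': [0.1, 0.5, 1, 1.5, 2]}, cv=5)
gs.fit(X_train, y_train)  # refit=True by default: best settings refit on all of X_train

# With no `scoring` argument, gs.score uses the best estimator's own score (R^2 for Lasso)
test_r2 = gs.score(X_test, y_test)
print(np.isclose(test_r2, gs.best_estimator_.score(X_test, y_test)))  # True

# predict also delegates to best_estimator_
preds = gs.predict(X_test)
```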

In [None]:
# fit on the training data
gs.fit(X_train, y_train)


In [None]:
# score the training data
gs.score(X_train, y_train)


In [None]:
# score the test data
gs.score(X_test, y_test)


So what are our best parameters?

In [None]:
# look at `.best_params_`
gs.best_params_


Note that `best_estimator_` holds the model that was refit on the full training set using our `best_params_`.

Inspect `gs.best_estimator_` (it is an attribute, not a method) to see all the hyperparameter values in the best-performing model.
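The fitted search exposes several related attributes that can be inspected together. This sketch uses synthetic `make_regression` data in place of the Boston data, and converts `cv_results_` to a pandas DataFrame to show how every candidate scored.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (assumption: stands in for the Boston CSV)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
gs = GridSearchCV(Lasso(), {'alpha': [0.1, 0.5, 1, 1.5, 2]}, cv=5)
gs.fit(X, y)

print(gs.best_params_)      # the winning hyperparameter combination
print(gs.best_score_)       # its mean cross-validated R^2, not a test-set score
print(gs.best_estimator_)   # the refit Lasso model with those settings

# cv_results_ is a dict of arrays; a DataFrame makes the full search easy to scan
results = pd.DataFrame(gs.cv_results_)
print(results[['param_alpha', 'mean_test_score', 'std_test_score', 'rank_test_score']])
```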

The scatter plot below compares the true test-set values with our predictions to visualize the errors. Points on the red diagonal line are perfect predictions.

In [None]:
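# predicted vs. true values on the test set
plt.scatter(y_test, gs.predict(X_test))
plt.ylabel('Predicted')
plt.xlabel('True')
# red diagonal: points on this line are perfect predictions
plt.plot([0, 50], [0, 50], color='r')
plt.show()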

## Summary

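You've seen how `GridSearchCV` combines cross-validation with an exhaustive search over a grid of hyperparameter values, and how to inspect the winning model with `best_params_` and `best_estimator_`.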

## Check for understanding

- Why would you want to use `GridSearchCV`?
- What do you pass `GridSearchCV`?
- How do you specify the parameter grid?
- Does `GridSearchCV` randomize the data for cross validation?

`GridSearchCV` is an extremely powerful tool for your toolkit! 🛠
