<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px"> 

# Grid Search

_By Jeff Hale, Kiefer Katovich, David Yerrington, Matt Brems, Noelle Brown_

---

## Learning Objectives
By the end of this lesson students will be able to:

- Understand what grid searching is
- Use the `GridSearchCV` class from sklearn to find good hyperparameters
- Differentiate `cross_val_score` from `GridSearchCV`

---

## GridSearchCV
GridSearchCV is a nifty sklearn class. 😀 

It performs cross validation while searching over a grid of hyperparameter values.

It replaces the slower, more verbose approach of looping over candidate values with a `for` loop and calling `cross_val_score` on each one.

Using `GridSearchCV` is a convenient default for tuning hyperparameters when the grid is small enough to search exhaustively.

## Hyperparameters vs. parameters

- __Definition 1 of `parameters`__: a function "defines a parameter, and the calling code passes an argument to that parameter. You can think of the parameter as a parking space and the argument as an automobile." - Quoted from MSDN in [this SO question](https://stackoverflow.com/q/1788923/4590385).

The values you pass to a function are called `arguments`. The terms _argument_ and _parameter_ are often used interchangeably.

- __Definition 2 of `parameters`__: the weights in a model. For example, the $ \beta $ values in a linear regression equation. These are the model's parameters, and the model learns them from the data when you fit it.

- `hyperparameters` are the arguments YOU CHOOSE to pass to a transformer or estimator. You tune these to improve model performance. For example, the most important hyperparameter for a scikit-learn Ridge regression model is `alpha`. 


### Just remember: YOU choose the hyperparameters.
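
To make the distinction concrete, here is a minimal sketch on made-up data (the `X_demo` and `y_demo` arrays are assumptions for illustration, not part of the lesson's dataset). You pick `alpha` before fitting; the model learns `coef_` and `intercept_` during fitting.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data (an assumption for illustration): y is roughly 3 * x + 2.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(100, 1))
y_demo = 3 * X_demo[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# alpha is a hyperparameter: YOU choose it before fitting.
ridge = Ridge(alpha=1.0)
ridge.fit(X_demo, y_demo)

# coef_ and intercept_ are parameters: the model learns them from the data.
print('chosen alpha:', ridge.alpha)
print('learned slope:', ridge.coef_[0])
print('learned intercept:', ridge.intercept_)
```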

---
## GridSearchCV

#### Imports

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# import sklearn classes and functions
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

#### Read in the data

We'll use the diamonds dataset. We want to predict `price`.

In [4]:
diamonds = pd.read_csv('./data/diamonds.csv')
diamonds.head()

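| Unnamed: 0 | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.2 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |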


#### Inspect 

In [5]:
diamonds.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53940 entries, 0 to 53939
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   carat    53940 non-null  float64
 1   cut      53940 non-null  object 
 2   color    53940 non-null  object 
 3   clarity  53940 non-null  object 
 4   depth    53940 non-null  float64
 5   table    53940 non-null  float64
 6   price    53940 non-null  int64  
 7   x        53940 non-null  float64
 8   y        53940 non-null  float64
 9   z        53940 non-null  float64
dtypes: float64(6), int64(1), object(3)
memory usage: 4.1+ MB


`price` is what we want to predict.

#### Break into X and y

Use `carat` to predict `price`

In [6]:
X = diamonds[['carat']]
y = diamonds['price']

### Create holdout/test set and training/validation set with `train_test_split`

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [9]:
knn = KNeighborsRegressor()

In [12]:
### The Old Slow Way
# for i in range(1,10):
#     knn = KNeighborsRegressor(n_neighbors=i).fit(X_train, y_train)
#     print(knn.score(X_test, y_test))

In [46]:
params = {'n_neighbors': range(1, 10),
          'p': [2, 3]}

In [47]:
import warnings

In [48]:
warnings.filterwarnings('ignore')

In [49]:
knn = KNeighborsRegressor()

In [50]:
grid = GridSearchCV(knn, param_grid=params, cv=5, scoring='neg_mean_squared_error')

In [51]:
grid.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=KNeighborsRegressor(),
             param_grid={'n_neighbors': range(1, 10), 'p': [2, 3]},
             scoring='neg_mean_squared_error')

In [52]:
grid.best_params_

{'n_neighbors': 9, 'p': 2}

In [53]:
grid.best_score_

-2233326.3023847556
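
The score is negative because scikit-learn scorers follow a "higher is better" convention, so `'neg_mean_squared_error'` reports the MSE with its sign flipped. A small sketch of turning that number back into an RMSE, which is in the same units as `price` (the value is copied from the output above):

```python
import math

# Value copied from the grid.best_score_ output above.
best_score = -2233326.3023847556

mse = -best_score        # undo the sign flip to recover the mean squared error
rmse = math.sqrt(mse)    # square root puts the error back in units of price
print(round(rmse, 2))
```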

Does the cross-validation step above shuffle the rows?

## Standardize and Scale 

- Instantiate
- Fit and transform X_train
- Transform X_test
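
Those three steps might look like the sketch below. It uses synthetic features as a stand-in (the `_demo` names are assumptions for illustration), but the same calls apply to the lesson's `X_train` and `X_test`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the lesson's features (an assumption for illustration).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50, scale=10, size=(200, 2))
y_demo = rng.normal(size=200)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=42)

ss = StandardScaler()                      # instantiate
Z_train = ss.fit_transform(X_train_demo)   # learn mean and std from the training rows, then transform them
Z_test = ss.transform(X_test_demo)         # reuse the training mean and std on the test rows

print(Z_train.mean(axis=0).round(3), Z_train.std(axis=0).round(3))
```

Fitting the scaler on the training rows only keeps information about the test set from leaking into the model.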

#### Say you wanted to make a Lasso model and you want to search for a good value for the hyperparameter `alpha`. How would you do that with `cross_val_score`?
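
One possible answer, sketched on synthetic data from `make_regression` (the data, the `alphas` list, and the variable names are assumptions for illustration): loop over candidate values, cross validate each one, and keep the best mean score yourself.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption for illustration).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

alphas = [0.01, 0.1, 1, 10, 100]
mean_scores = {}
for alpha in alphas:
    lasso = Lasso(alpha=alpha)
    # With no scoring argument, regressors are scored with R^2.
    scores = cross_val_score(lasso, X_demo, y_demo, cv=5)
    mean_scores[alpha] = scores.mean()

# Pick the alpha with the highest mean cross-validated score.
best_alpha = max(mean_scores, key=mean_scores.get)
print(best_alpha, mean_scores[best_alpha])
```

This works, but you have to track the scores and refit the winning model by hand, and every extra hyperparameter adds another nested loop.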

---
## GridSearchCV 🚀

- GridSearchCV performs cross validation for every combination of hyperparameters in the grid, using the data you fit it on.
- It then refits the best-performing combination on all the data you passed to `fit` and stores that model as `best_estimator_`.
- You treat it like an estimator.

`GridSearchCV` accepts a scikit-learn `estimator` object and a **parameter grid**.

- The param grid is a dictionary. 
- The key is the name of the hyperparameter argument in scikit-learn.  
- The value is an iterable to search over (generally a list or a range-style object).
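
To see what a parameter grid expands to, you can hand the dictionary to scikit-learn's `ParameterGrid` helper. This sketch reuses the `params` dictionary from the KNN search earlier; the `combos` name is just for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the KNN search earlier in the lesson.
params = {'n_neighbors': range(1, 10), 'p': [2, 3]}

# Every combination of values that a grid search would try.
combos = list(ParameterGrid(params))
print(len(combos))   # 9 values of n_neighbors times 2 values of p
print(combos[0])
```

With `cv=5`, each of those combinations is fit five times, which is why large grids get slow quickly.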

#### Q: What's an iterable?

#### Let's use `GridSearchCV` with a Lasso model and different values for alpha.

Note: You could get the same results with LassoCV, but GridSearchCV can be nicely combined with many algorithms and Pipelines, so I suggest sticking with GridSearchCV. 

#### Set up a parameter grid with several values for alpha

#### Instantiate a GridSearchCV object by passing it an estimator and a param_grid.

### We use this GridSearch object like it's an estimator: fitting, predicting, and scoring with it like normal. 🙂

#### Fit it on the training data
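
The parameter grid, instantiate, and fit steps might look like the sketch below. It runs on synthetic data from `make_regression` so it is self-contained; the `_demo` names, the `lasso_params` grid, and the `gs` variable are assumptions for illustration, not the lesson's own values.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption for illustration).
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=42)

# Set up a parameter grid with several values for alpha.
lasso_params = {'alpha': [0.01, 0.1, 1, 10, 100]}

# Instantiate a GridSearchCV object with an estimator and a param_grid.
gs = GridSearchCV(Lasso(), param_grid=lasso_params, cv=5)

# Fit it on the training data: 5 alphas x 5 folds, then one refit of the best model.
gs.fit(X_train_demo, y_train_demo)
print(gs.best_estimator_)
```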

#### Score on the training data

#### See all the results of training

#### What were the best params?
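
A sketch of the three inspection steps above (training score, full results, best params), again on synthetic stand-in data with hypothetical `_demo`, `lasso_params`, and `gs` names. Loading `cv_results_` into a DataFrame is one convenient way to read it.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data and a fitted search (assumptions for illustration).
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=42)
lasso_params = {'alpha': [0.01, 0.1, 1, 10, 100]}
gs = GridSearchCV(Lasso(), param_grid=lasso_params, cv=5)
gs.fit(X_train_demo, y_train_demo)

# Score on the training data (R^2 of the refit best model).
train_score = gs.score(X_train_demo, y_train_demo)
print(train_score)

# See all the results of training: one row per hyperparameter combination.
results = pd.DataFrame(gs.cv_results_)
print(results[['param_alpha', 'mean_test_score', 'rank_test_score']])

# What were the best params?
print(gs.best_params_)
print(gs.best_score_)   # mean cross-validated score of the best combination
```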

#### Make predictions for the test set

#### Score with the MAE, MSE, and RMSE on test set

#### Score the best model on the test data with the default scoring metric
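
The last three steps (predict, compute error metrics, score with the default metric) could be sketched like this. As before, the data and the `_demo`, `lasso_params`, and `gs` names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data and a fitted search (assumptions for illustration).
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=42)
lasso_params = {'alpha': [0.01, 0.1, 1, 10, 100]}
gs = GridSearchCV(Lasso(), param_grid=lasso_params, cv=5)
gs.fit(X_train_demo, y_train_demo)

# Make predictions for the test set (uses the refit best model).
preds = gs.predict(X_test_demo)

# Score with the MAE, MSE, and RMSE on the test set.
mae = mean_absolute_error(y_test_demo, preds)
mse = mean_squared_error(y_test_demo, preds)
rmse = np.sqrt(mse)
print(mae, mse, rmse)

# Default scoring metric: no scoring argument was passed, so this is the estimator's R^2.
r2 = gs.score(X_test_demo, y_test_demo)
print(r2)
```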

---
# Exercise

With the same X and y, use GridSearchCV with Ridge and several values of alpha. To try to speed things up, pass `n_jobs=-1` so the search uses all of your computer's processor cores.

---
# Summary

You've seen `GridSearchCV` in action. 🚀

It helps you find good hyperparameters for your models. 😎

#### When would you not use GridSearchCV? 

When the full grid takes too long to fit. `RandomizedSearchCV`, which samples a fixed number of combinations instead of trying every one, and other scikit-learn search classes can serve you better in those cases.
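
A minimal sketch of `RandomizedSearchCV`, assuming synthetic data and a log-uniform range for `alpha` chosen only for illustration. Instead of a list of values, you can pass a distribution to sample from, and `n_iter` caps how many combinations get tried.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (an assumption for illustration).
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)

# Sample alpha from a log-uniform distribution between 0.001 and 1000.
param_distributions = {'alpha': loguniform(1e-3, 1e3)}

# n_iter=10 means only 10 sampled values are cross validated.
rs = RandomizedSearchCV(Ridge(), param_distributions=param_distributions,
                        n_iter=10, cv=5, random_state=42)
rs.fit(X_demo, y_demo)
print(rs.best_params_)
```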

## Check for understanding

- Why would you want to use `GridSearchCV`?
- What do you pass `GridSearchCV`?
- How do you specify the parameter grid?
- How do you get the results of fitting the models?

## Challenge questions
- Does `GridSearchCV` randomize the data for cross validation? 
- How do you parallelize the grid search so that multiple models are fit simultaneously on your processor cores? 

`GridSearchCV` is an extremely powerful tool for your toolkit! 🛠
