## GridSearch Walkthrough

What is **Grid Search**?  

Grid search is a hyperparameter tuning technique. It trains and evaluates a model for every combination of candidate hyperparameter values and reports the combination that scores best.  
This matters because a model's performance depends heavily on the hyperparameter values it is given.

In [8]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn import datasets, svm
import matplotlib.pyplot as plt

Let's go through the walkthrough step by step.
  
**Create Two Toy Datasets**  
In the code below, we load the [digits dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_digits_last_image.html), which contains **64 feature variables**. Each feature denotes the darkness of one pixel in an 8 by 8 image of a handwritten digit. We can see these features for the first observation.

In [9]:
# Load the digit data
digits = datasets.load_digits()

In [10]:
# View the features of the first observation
digits.data[0:1]

array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.,  0.,  0., 13., 15., 10.,
        15.,  5.,  0.,  0.,  3., 15.,  2.,  0., 11.,  8.,  0.,  0.,  4.,
        12.,  0.,  0.,  8.,  8.,  0.,  0.,  5.,  8.,  0.,  0.,  9.,  8.,
         0.,  0.,  4., 11.,  0.,  1., 12.,  7.,  0.,  0.,  2., 14.,  5.,
        10., 12.,  0.,  0.,  0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])
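Because each row is a flattened 8 by 8 image, reshaping the 64 values recovers the pixel grid. The following is a minimal sketch of that idea; the variable name `first_image` is introduced here purely for illustration.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Reshape the 64 flat features of the first observation into an 8 x 8 pixel grid
first_image = digits.data[0].reshape(8, 8)

print(first_image.shape)
print(first_image[0])  # top row of pixels
```

With matplotlib imported as above, something like `plt.imshow(first_image, cmap='gray_r')` should draw the digit.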

The **target** for this dataset is a vector containing each image’s true digit. For example, *the first observation* is a handwritten ‘0’.

In [11]:
# View the target of the first observation
digits.target[0:1]

array([0])

To demonstrate how **Cross Validation** and **parameter tuning** work, we divide the digits data into two datasets called `data1` and `data2`. `data1` contains the first 1000 rows of the digits data, while `data2` contains the remaining 797 rows.

**Note that** this split is separate from the cross validation we will conduct and is done purely to demonstrate something at the end of the tutorial. In other words, don’t worry about `data2` for now. We will come back to it.

In [12]:
# Create dataset 1
data1_features = digits.data[:1000]
data1_target = digits.target[:1000]

# Create dataset 2
data2_features = digits.data[1000:]
data2_target = digits.target[1000:]
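As a quick check on the split, the sketch below repeats the slicing in a self-contained form and prints the shape of each part. It assumes nothing beyond the `load_digits` loader used above.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Same slicing as in the walkthrough: first 1000 rows, then the rest
data1_features, data1_target = digits.data[:1000], digits.target[:1000]
data2_features, data2_target = digits.data[1000:], digits.target[1000:]

# Each shape is (number of rows, number of features)
print(data1_features.shape, data2_features.shape)
```

The two row counts should add up to the size of the full dataset, so no observation is lost or shared between `data1` and `data2`.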

**Create Parameter Candidates**   

Before looking for the combination of parameter values that produces the **most accurate** model, we must specify the *different candidate values* we want to try.   


In the code below we have a number of candidate parameter values, including four different values for C (1, 10, 100, 1000), two values for gamma (0.001, 0.0001), and two kernels (linear, rbf). The candidates are given as two dictionaries because gamma only applies to the rbf kernel. The grid search tries every combination within each dictionary (4 for the linear kernel plus 4 × 2 = 8 for the rbf kernel, 12 in total) and selects the set of parameters that provides the most accurate model.

*This might take a while if you have a lot of candidates.*

In [13]:
parameter_candidates = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
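To make "all combinations" concrete, here is a minimal sketch that expands the grid by hand using only the standard library. The helper `expand_grid` is hypothetical and written for this illustration; it is not how `GridSearchCV` is implemented internally (scikit-learn provides `ParameterGrid` for this kind of expansion).

```python
from itertools import product

parameter_candidates = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]


def expand_grid(grids):
    """Hypothetical helper: list every parameter combination in a list of grids."""
    combinations = []
    for grid in grids:
        names = sorted(grid)
        # Cartesian product of the candidate values within this one dictionary
        for values in product(*(grid[name] for name in names)):
            combinations.append(dict(zip(names, values)))
    return combinations


all_combinations = expand_grid(parameter_candidates)
print(len(all_combinations))
for combination in all_combinations:
    print(combination)
```

Each dictionary is expanded separately, which is why no linear-kernel combination carries a gamma value.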

Now that you have your parameter grid, you can run a **Grid Search** to find the **parameters** that produce the highest score.  
  

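We can use **scikit-learn’s GridSearchCV** (grid search with cross validation). By default, GridSearchCV uses 5-fold cross validation (scikit-learn 0.22 and later), stratified by class when the estimator is a classifier.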
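To illustrate what cross validation does with `data1`, the sketch below splits it into folds and prints the size of each training and validation part. It assumes five stratified folds via `StratifiedKFold(n_splits=5)`, which is the default for classifiers in scikit-learn 0.22 and later; the name `fold_sizes` is introduced only for this example.

```python
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

digits = datasets.load_digits()
data1_features = digits.data[:1000]
data1_target = digits.target[:1000]

# Assumption: five stratified folds, so each fold keeps the class proportions similar
folds = StratifiedKFold(n_splits=5)

fold_sizes = []
for train_index, validation_index in folds.split(data1_features, data1_target):
    # Each row lands in the validation part of exactly one fold
    fold_sizes.append((len(train_index), len(validation_index)))
    print('train rows:', len(train_index), 'validation rows:', len(validation_index))
```

For every parameter combination, the grid search fits the model once per fold on the training part and scores it on the validation part, then averages those scores.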

In [14]:
# Create a grid search object from the classifier and the parameter candidates.
# n_jobs is the number of jobs to run in parallel:
# None means 1, and -1 means using all processors.
clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameter_candidates, n_jobs=-1)

# Train the classifier on data1's feature and target data
clf.fit(data1_features, data1_target)

GridSearchCV(estimator=SVC(), n_jobs=-1,
             param_grid=[{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
                         {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001],
                          'kernel': ['rbf']}])

Success! We have our results! First, let’s look at the best score. This is the mean cross-validated accuracy of the best parameter combination, averaged over the validation folds of `data1`.

In [15]:
# View the accuracy score
print('Best score for data1:', clf.best_score_) 

Best score for data1: 0.966
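The best score is only one entry in the full table of results. As a rough sketch, the fitted object's `cv_results_` dictionary can be used to list the mean cross-validated score of every combination. The example is self-contained, so it refits the search, and it leaves `n_jobs` at its default for simplicity.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
data1_features, data1_target = digits.data[:1000], digits.target[:1000]

parameter_candidates = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameter_candidates)
clf.fit(data1_features, data1_target)

# One entry per parameter combination, in the order the grid was expanded
results = clf.cv_results_
for params, mean_score in zip(results['params'], results['mean_test_score']):
    print(params, round(mean_score, 3))
```

The largest value in `mean_test_score` is the one reported as `clf.best_score_`.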


Which parameters are the best? We can tell scikit-learn to display them:



In [16]:
# View the best parameters for the model found using grid search
print('Best C:', clf.best_estimator_.C)
print('Best Kernel:', clf.best_estimator_.kernel)
print('Best Gamma:', clf.best_estimator_.gamma)

Best C: 10
Best Kernel: rbf
Best Gamma: 0.001


**Sanity Check Using Second Dataset**  

Remember the second dataset, `data2`, that we created?  

Now we will use it to confirm that the classifier returned by the grid search really uses those parameters. 

1. First, let's apply the classifier we just trained to the second dataset. 

2. Then we will train a new SVM classifier using the best parameters found in the grid search. 

By default (`refit=True`), GridSearchCV retrains the best parameter combination on all of `data1`, so we should get the same score for both models.

In [18]:
# Apply the classifier trained using data1 to data2, and view the accuracy score
clf.score(data2_features, data2_target)  

0.9698870765370138

In [19]:
# Train a new classifier using the best parameters found by the grid search
new_clf = svm.SVC(C=10, kernel='rbf', gamma=0.001)
new_clf.fit(data1_features, data1_target)
new_clf.score(data2_features, data2_target)

0.9698870765370138
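The same sanity check can be written without typing the parameter values by hand. The sketch below is self-contained and refits the search, then unpacks `clf.best_params_` into a new `SVC`. The names `grid_score` and `new_score` are introduced only for this illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
data1_features, data1_target = digits.data[:1000], digits.target[:1000]
data2_features, data2_target = digits.data[1000:], digits.target[1000:]

parameter_candidates = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameter_candidates)
clf.fit(data1_features, data1_target)

# best_params_ holds only the keys of the winning combination,
# so unpacking it works for both the linear and the rbf grid
new_clf = svm.SVC(**clf.best_params_)
new_clf.fit(data1_features, data1_target)

grid_score = clf.score(data2_features, data2_target)
new_score = new_clf.score(data2_features, data2_target)
print(clf.best_params_)
print(grid_score, new_score)
```

Reading the values from `best_params_` keeps the check valid if the candidate grid changes and a different combination wins.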