# Grid Search 

Grid search is the process of testing a predefined set of parameter values for a model and selecting the ones that produce the best results.

In this exercise, you will:
- Load data
- Make a parameter dictionary
- Initiate a GridSearch algorithm
- GridSearch your data
- Print the results

## Load Data 

Sklearn has a number of easy-to-use datasets [[doc]](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets).
- Load the "iris" dataset
- Extract the first two features of the dataset as "X"
- Extract the target of the dataset as "y"

In [1]:
from sklearn import datasets

iris = datasets.load_iris() # Load dataset as "iris"

X = iris.data[:, :2] # Keep only the first 2 features, Sepal Length and Sepal Width

y = iris.target # Load targets

## Parameter Dictionary

The parameter dictionary defines which values of the parameter will be tested during the Grid Search.

Remember parameter 'C', the regularization parameter of an SVM that sets the penalty for misclassified points? Make a dictionary to test for 'C': [0.1, 1, 10]

In [2]:
param_dic = [{'C': [0.1, 1, 10]}]
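To see how a parameter dictionary turns into a list of candidate settings, the sketch below uses sklearn's `ParameterGrid` helper. The second parameter, `kernel`, is an assumption added purely for illustration and is not part of this exercise; it shows that every combination of the listed values becomes one candidate.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for illustration only: 'kernel' is not tested in this exercise
example_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Expand the dictionary into every combination of the listed values
combinations = list(ParameterGrid(example_grid))

print(len(combinations))  # 3 values of C x 2 kernels = 6 candidates
for params in combinations:
    print(params)
```

With a single parameter, as in `param_dic`, the grid simply contains one candidate per listed value.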

## Initiate and Fit Grid Search

Sklearn's `GridSearchCV` [[doc]](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) trains and cross-validates a model for each parameter value, then stores the best parameters.

It takes as arguments the Machine Learning algorithm, the parameter dictionary, the number of cross-validation folds, and the scoring metric.

Initiate a gridsearch with the following arguments:
- A default classification SVM
- The above created parameter dictionary
- 10-Fold Cross Validation
- "accuracy" scoring metric

Then, fit to data.

In [3]:
from sklearn.model_selection import GridSearchCV
from sklearn import svm

gridsearch = GridSearchCV(svm.SVC(gamma='auto'), param_dic, cv=10, scoring='accuracy')

gridsearch.fit(X, y)

GridSearchCV(cv=10, error_score='raise-deprecating',
             estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='auto', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='warn', n_jobs=None, param_grid=[{'C': [0.1, 1, 10]}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=0)
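What the grid search does can be approximated by hand. The loop below is a minimal sketch, not the actual `GridSearchCV` implementation: it cross-validates one SVM per value of 'C' with `cross_val_score` and keeps the value with the highest mean accuracy. The names `manual_scores` and `best_C` are introduced here for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

# Same data as above: first two features of iris
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

manual_scores = {}
for C in [0.1, 1, 10]:
    model = svm.SVC(C=C, gamma='auto')
    # 10-fold cross-validation, one accuracy value per fold
    fold_scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')
    manual_scores[C] = fold_scores.mean()

# Keep the value of C with the highest mean accuracy
best_C = max(manual_scores, key=manual_scores.get)
print(best_C, manual_scores[best_C])
```

`GridSearchCV` adds conveniences on top of this loop, such as refitting the best model on the full dataset and storing per-candidate statistics.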

## Print Results 

The results of the gridsearch can be unpacked.

Unpack and print:
- The best parameter value
- The best classification accuracy

In [4]:
print(gridsearch.best_params_)

print(gridsearch.best_score_)

{'C': 0.1}
0.8133333333333334
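Because the grid search was run with `refit=True` (visible in the output above), the best model is retrained on the full dataset and exposed as `best_estimator_`. The sketch below shows one way it might be used for prediction. The measurements in `new_flower` are made up for illustration, and the search is rebuilt under the name `search` so that the block runs on its own.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

search = GridSearchCV(svm.SVC(gamma='auto'), [{'C': [0.1, 1, 10]}],
                      cv=10, scoring='accuracy')
search.fit(X, y)

# The model refit on all of X, y with the best value of C
best_model = search.best_estimator_

# Hypothetical flower: sepal length 5.0 cm, sepal width 3.5 cm
new_flower = [[5.0, 3.5]]
predicted_class = best_model.predict(new_flower)[0]
print(iris.target_names[predicted_class])
```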


Below, unpack the mean classification accuracy and its standard deviation across folds for each tested value of 'C'. They are stored in `cv_results_`. The printed +/- interval is two standard deviations.

In [5]:
means = gridsearch.cv_results_['mean_test_score']
stds = gridsearch.cv_results_['std_test_score']

# Print mean accuracy with a +/- 2 standard deviation interval for each 'C'
for mean, std, params in zip(means, stds, gridsearch.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))

0.813 (+/-0.196) for {'C': 0.1}
0.813 (+/-0.155) for {'C': 1}
0.800 (+/-0.158) for {'C': 10}
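`cv_results_` also holds a `rank_test_score` entry that orders the candidates by mean test score. The sketch below lists the candidates from best to worst; it rebuilds the search under the name `search` so that it runs on its own.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

search = GridSearchCV(svm.SVC(gamma='auto'), [{'C': [0.1, 1, 10]}],
                      cv=10, scoring='accuracy')
search.fit(X, y)

results = search.cv_results_

# Indices of the candidates sorted by rank (rank 1 is the best)
order = results['rank_test_score'].argsort()

for i in order:
    print("rank %d: %r, mean accuracy %0.3f"
          % (results['rank_test_score'][i],
             results['params'][i],
             results['mean_test_score'][i]))
```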


Parameter tuning is a key step of model building. Each Machine Learning algorithm has specific parameters that affect its performance. In the next section, you will explore the SVM parameters `kernel` and `gamma`.