# Model Evaluation & Performance Improvement_3

## 2. Grid Search

### 2.4 Asymmetric Grid Search

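If you don't want to evaluate every combination of parameters, you can use a conditional grid search. It is not very different from the previous approach, except that `param_grid` is a list of dictionaries, and each dictionary is expanded into its own independent grid. Below, `gamma` is searched only for the `rbf` kernel, while the `linear` kernel varies `C` alone.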

In [1]:
# Load the data
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)

In [2]:
param_grid = [{'kernel': ['rbf'], 
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['linear'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]
print("Grid list:\n{}".format(param_grid))

Grid list:
[{'kernel': ['rbf'], 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}, {'kernel': ['linear'], 'C': [0.001, 0.01, 0.1, 1, 10, 100]}]
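
To see how the list of dictionaries is expanded, the candidates can be enumerated before any model is fitted. The following is a minimal sketch using scikit-learn's `ParameterGrid`; the variable names `candidates`, `n_rbf`, and `n_linear` are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['linear'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]

# Each dictionary is expanded separately, then the results are concatenated.
candidates = list(ParameterGrid(param_grid))

n_rbf = sum(1 for p in candidates if p['kernel'] == 'rbf')
n_linear = sum(1 for p in candidates if p['kernel'] == 'linear')

print("Total candidates: {}".format(len(candidates)))  # 6 * 6 + 6 = 42
print("rbf candidates: {}".format(n_rbf))              # 36
print("linear candidates: {}".format(n_linear))        # 6
```

A full grid over both kernels with `gamma` included would fit 72 candidates, even though `gamma` has no effect on the linear kernel.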


In [3]:
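# return_train_score=True keeps mean_train_score in cv_results_
# (it is off by default in scikit-learn 0.21 and later)
grid_search = GridSearchCV(SVC(), param_grid, cv=5, return_train_score=True)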
grid_search.fit(X_train, y_train)
print("Optimized parameters: {}".format(grid_search.best_params_))
print("Best score: {:.2f}".format(grid_search.best_score_))

Optimized parameters: {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}
Best score: 0.97


In [4]:
results = pd.DataFrame(grid_search.cv_results_)
display(results.T)



,0,1,2,3,4,5,6,7,8,9,...,32,33,34,35,36,37,38,39,40,41
mean_fit_time,0.00238976,0.00159507,0.00159383,0.00119715,0.00139594,0.00179486,0.00139594,0.00139737,0.00119729,0.00139642,...,0.00099659,0.00099721,0.0011992,0.00139604,0.000797892,0.000200129,0,0.000598431,0.0004004,0.000598288
mean_score_time,0.000599766,0.000598335,0.000398588,0.000598335,0.000409126,0.00059824,0.000199556,0.000799799,0.000399017,0.000398922,...,0,0.000399065,0.000396442,0.000399017,0.000200558,0.000598717,0.000998163,0.000199556,0.000399113,0
mean_test_score,0.366071,0.366071,0.366071,0.366071,0.366071,0.366071,0.366071,0.366071,0.366071,0.366071,...,0.955357,0.946429,0.919643,0.5625,0.366071,0.848214,0.946429,0.973214,0.964286,0.964286
mean_train_score,0.366079,0.366079,0.366079,0.366079,0.366079,0.366079,0.366079,0.366079,0.366079,0.366079,...,0.988788,1,1,1,0.366079,0.855069,0.966538,0.984368,0.988813,0.993258
param_C,0.001,0.001,0.001,0.001,0.001,0.001,0.01,0.01,0.01,0.01,...,100,100,100,100,0.001,0.01,0.1,1,10,100
param_gamma,0.001,0.01,0.1,1,10,100,0.001,0.01,0.1,1,...,0.1,1,10,100,,,,,,
param_kernel,rbf,rbf,rbf,rbf,rbf,rbf,rbf,rbf,rbf,rbf,...,rbf,rbf,rbf,rbf,linear,linear,linear,linear,linear,linear
params,"{'C': 0.001, 'gamma': 0.001, 'kernel': 'rbf'}","{'C': 0.001, 'gamma': 0.01, 'kernel': 'rbf'}","{'C': 0.001, 'gamma': 0.1, 'kernel': 'rbf'}","{'C': 0.001, 'gamma': 1, 'kernel': 'rbf'}","{'C': 0.001, 'gamma': 10, 'kernel': 'rbf'}","{'C': 0.001, 'gamma': 100, 'kernel': 'rbf'}","{'C': 0.01, 'gamma': 0.001, 'kernel': 'rbf'}","{'C': 0.01, 'gamma': 0.01, 'kernel': 'rbf'}","{'C': 0.01, 'gamma': 0.1, 'kernel': 'rbf'}","{'C': 0.01, 'gamma': 1, 'kernel': 'rbf'}",...,"{'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}","{'C': 100, 'gamma': 1, 'kernel': 'rbf'}","{'C': 100, 'gamma': 10, 'kernel': 'rbf'}","{'C': 100, 'gamma': 100, 'kernel': 'rbf'}","{'C': 0.001, 'kernel': 'linear'}","{'C': 0.01, 'kernel': 'linear'}","{'C': 0.1, 'kernel': 'linear'}","{'C': 1, 'kernel': 'linear'}","{'C': 10, 'kernel': 'linear'}","{'C': 100, 'kernel': 'linear'}"
rank_test_score,27,27,27,27,27,27,27,27,27,27,...,9,11,17,24,27,21,11,1,3,3
split0_test_score,0.375,0.375,0.375,0.375,0.375,0.375,0.375,0.375,0.375,0.375,...,0.958333,0.916667,0.875,0.541667,0.375,0.916667,0.958333,1,0.958333,0.958333
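
The transposed table is wide, so it can help to keep only a few columns and look at one kernel at a time. The sketch below refits the same search so that it runs on its own; the names `summary` and `linear_rows` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['linear'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]

grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

results = pd.DataFrame(grid_search.cv_results_)

# Keep only the columns needed to compare candidates.
summary = results[['param_kernel', 'param_C', 'param_gamma',
                   'mean_test_score', 'rank_test_score']]

# gamma was never set for the linear kernel, so it is missing in these rows.
linear_rows = summary[summary['param_kernel'] == 'linear']
print(linear_rows)
```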


#### Cross-Validation Options

Just as `cross_val_score` uses the `cv` parameter to control how cross-validation is performed, `GridSearchCV` accepts a `cv` parameter as well.

If you want only a single train/validation split, pass `ShuffleSplit` or `StratifiedShuffleSplit` with `n_splits=1` as `cv`. This is useful for large datasets.
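
A single-split search might look like the sketch below. The smaller parameter grid and the `test_size=0.2` setting are assumptions chosen for illustration, not values from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (GridSearchCV, StratifiedShuffleSplit,
                                     train_test_split)
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# One stratified split: 80% of X_train for fitting, 20% for validation.
single_split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}

grid_search = GridSearchCV(SVC(), param_grid, cv=single_split)
grid_search.fit(X_train, y_train)

print("Number of splits: {}".format(grid_search.n_splits_))
print("Optimized parameters: {}".format(grid_search.best_params_))
```

With one split, each candidate is fitted once instead of five times, at the cost of a noisier score estimate.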

#### Cross-Validation and Parallelized Grid Search

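If there are many parameters and the dataset is large, grid search becomes computationally expensive. However, it is easy to parallelize, because each parameter combination on each cross-validation split is fitted independently of the others.

In `GridSearchCV` and `cross_val_score`, set the `n_jobs` parameter to the number of CPU cores to use. With `n_jobs=-1`, all available cores are used.

Be careful about nesting parallelism, though. For instance, if the estimator itself already uses its own `n_jobs` option, do not also set `n_jobs` in `GridSearchCV`; parallelize at one level only.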
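
As a minimal sketch, the search from section 2.4 can be run in parallel by adding `n_jobs=-1`. The names `parallel_search` and `sequential_search` are illustrative; the sequential run is included only to show that the selected parameters are the same.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['linear'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]

# SVC has no n_jobs option of its own, so parallelism is set on the search only.
parallel_search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
parallel_search.fit(X_train, y_train)

# Same search on a single core, for comparison.
sequential_search = GridSearchCV(SVC(), param_grid, cv=5)
sequential_search.fit(X_train, y_train)

print("Parallel:   {}".format(parallel_search.best_params_))
print("Sequential: {}".format(sequential_search.best_params_))
```

On a dataset as small as iris, the overhead of starting worker processes may outweigh the gain; the benefit shows up with larger grids and datasets.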


