# Hyperparameter Tuning

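We use **cross-validation** to tune the hyperparameters of a model: each candidate setting is scored on held-out folds of the training data, and the best-scoring setting is kept.
Some of the common building blocks are:
- Grid Search (a search strategy that tries every combination in a parameter grid)
- Shuffle Split (a splitter that draws repeated random train/test splits)
- Stratified Shuffle Split (Shuffle Split that preserves class proportions in each split)
- Group K Fold (K-fold that keeps all samples from one group in the same fold)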

## Grid Search

`GridSearchCV` performs an exhaustive search over the specified parameter values for an estimator. Every combination in the parameter grid is evaluated with cross-validation, and the combination with the best mean validation score is selected.
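To make "exhaustive" concrete, the sketch below expands a small grid with scikit-learn's `ParameterGrid` helper and counts the resulting fits. The two-parameter grid and the fold count are illustrative assumptions, not values used elsewhere in this notebook.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical two-parameter grid for a k-nearest-neighbors classifier
param_grid = {
    "n_neighbors": [1, 3],
    "weights": ["uniform", "distance"],
}

# ParameterGrid expands the dictionary into every combination of values
candidates = list(ParameterGrid(param_grid))
for params in candidates:
    print(params)

# Assumed number of cross-validation folds
n_folds = 5

# One fit per candidate per fold; a final refit on all training data adds one more
n_cv_fits = len(candidates) * n_folds
print(f"{len(candidates)} candidates x {n_folds} folds = {n_cv_fits} cross-validation fits")
```

The cost grows multiplicatively with every parameter added to the grid, which is why grids are usually kept small.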

In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Load the iris features and labels in a single call
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for a final check after tuning
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate values for k: 1, 3, 5, 7, 9
param_grid = {'n_neighbors': np.arange(1, 10, 2)}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, return_train_score=True)
grid.fit(X_train, y_train)

print(grid.best_params_)           # best parameter combination
print(grid.best_score_)            # mean cross-validated accuracy of that combination
print(grid.score(X_test, y_test))  # accuracy of the refit model on the held-out test set

{'n_neighbors': 3}
0.9583333333333334
1.0
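The printed values above can be traced back to attributes of the fitted search object. The following sketch rebuilds the same search and checks two relationships that hold with the default `refit=True`: `best_score_` is the mean test-fold score of the winning candidate, and `grid.score` delegates to `best_estimator_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": np.arange(1, 10, 2)}, cv=5)
grid.fit(X_train, y_train)

# best_score_ is the mean test-fold score of the candidate at best_index_
mean_scores = grid.cv_results_["mean_test_score"]
same_best = bool(np.isclose(grid.best_score_, mean_scores[grid.best_index_]))
print("best_score_ matches cv_results_:", same_best)

# With refit=True, best_estimator_ is retrained on all of X_train,
# and grid.score evaluates that refit model
same_test = bool(np.isclose(grid.score(X_test, y_test),
                            grid.best_estimator_.score(X_test, y_test)))
print("grid.score matches best_estimator_.score:", same_test)
```

Note that the test-set score is computed on only 30 samples, so it is a coarse estimate.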


In [5]:
grid

In [7]:
import pandas as pd
pd.DataFrame(grid.cv_results_)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_n_neighbors | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | ... | mean_test_score | std_test_score | rank_test_score | split0_train_score | split1_train_score | split2_train_score | split3_train_score | split4_train_score | mean_train_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00037 | 0.000321 | 0.00086 | 0.000655 | 1 | {'n_neighbors': 1} | 0.958333 | 0.958333 | 0.875 | 1.0 | ... | 0.95 | 0.040825 | 2 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 1 | 0.00025 | 0.000132 | 0.001353 | 0.001657 | 3 | {'n_neighbors': 3} | 0.958333 | 1.0 | 0.875 | 1.0 | ... | 0.958333 | 0.045644 | 1 | 0.958333 | 0.947917 | 0.989583 | 0.9375 | 0.958333 | 0.958333 | 0.01743 |
| 2 | 0.000205 | 5.5e-05 | 0.000534 | 6.8e-05 | 5 | {'n_neighbors': 5} | 0.958333 | 0.958333 | 0.833333 | 1.0 | ... | 0.941667 | 0.056519 | 3 | 0.958333 | 0.958333 | 0.989583 | 0.958333 | 0.96875 | 0.966667 | 0.012148 |
| 3 | 0.000177 | 7e-06 | 0.000476 | 1e-05 | 7 | {'n_neighbors': 7} | 0.958333 | 0.958333 | 0.833333 | 1.0 | ... | 0.941667 | 0.056519 | 3 | 0.958333 | 0.958333 | 0.989583 | 0.947917 | 0.958333 | 0.9625 | 0.01413 |
| 4 | 0.000151 | 2.7e-05 | 0.000433 | 3.6e-05 | 9 | {'n_neighbors': 9} | 0.958333 | 0.916667 | 0.833333 | 1.0 | ... | 0.933333 | 0.056519 | 5 | 0.947917 | 0.958333 | 0.989583 | 0.947917 | 0.96875 | 0.9625 | 0.01559 |
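The full `cv_results_` frame is wide, so it often helps to keep only the columns needed for comparing candidates. The sketch below is one way to do that. The column selection and the `train_test_gap` helper column are illustrative choices, not part of scikit-learn's output.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": np.arange(1, 10, 2)},
                    cv=5, return_train_score=True)
grid.fit(X_train, y_train)

# Keep the columns that matter for comparing candidates, best rank first
columns = ["param_n_neighbors", "mean_train_score", "mean_test_score",
           "std_test_score", "rank_test_score"]
summary = pd.DataFrame(grid.cv_results_)[columns].sort_values("rank_test_score")

# Hypothetical helper column: a large train/test gap hints at overfitting
summary["train_test_gap"] = summary["mean_train_score"] - summary["mean_test_score"]
print(summary.to_string(index=False))
```

In the table above, for example, `n_neighbors=1` reaches a mean train score of 1.0 but a mean test score of 0.95, the kind of gap this summary makes easy to spot.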


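## Cross-Validation Splitters

The `cv` argument also accepts a splitter object, which controls how the folds are formed. `load_iris` returns its samples ordered by class, so an unshuffled `KFold` produces folds whose class mix differs from the full dataset. The stratified splitters keep the class proportions in every fold, and `ShuffleSplit` draws its samples at random.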

In [9]:
from sklearn.model_selection import KFold, StratifiedKFold, ShuffleSplit, RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_score

# Each splitter defines how X and y are divided into train/test folds
kfold = KFold(n_splits=5)                                   # consecutive folds, no shuffling
skfold = StratifiedKFold(n_splits=5)                        # folds preserve class proportions
ss = ShuffleSplit(n_splits=5, test_size=0.2)                # 5 independent random 80/20 splits
rskfold = RepeatedStratifiedKFold(n_splits=5, n_repeats=2)  # stratified 5-fold, run twice

# ss and rskfold are not seeded, so their scores vary between runs
print("KFold:\n", cross_val_score(KNeighborsClassifier(), X, y, cv=kfold))
print("StratifiedKFold:\n", cross_val_score(KNeighborsClassifier(), X, y, cv=skfold))
print("ShuffleSplit:\n", cross_val_score(KNeighborsClassifier(), X, y, cv=ss))
print("RepeatedStratifiedKFold:\n", cross_val_score(KNeighborsClassifier(), X, y, cv=rskfold))


KFold:
 [1.         1.         0.83333333 0.93333333 0.8       ]
StratifiedKFold:
 [0.96666667 1.         0.93333333 0.96666667 1.        ]
ShuffleSplit:
 [0.96666667 1.         0.96666667 0.96666667 0.93333333]
RepeatedStratifiedKFold:
 [0.96666667 0.93333333 0.96666667 1.         0.96666667 1.
 0.96666667 0.96666667 0.93333333 0.93333333]
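The lower `KFold` scores above come from how the test folds are composed. The sketch below counts the classes in each test fold for several splitters, and also exercises `StratifiedShuffleSplit` and `GroupKFold`. Iris has no natural grouping, so the `groups` array is a made-up assumption for illustration, as are the helper name `class_counts_per_test_fold` and the seed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GroupKFold, KFold, StratifiedKFold, StratifiedShuffleSplit

X, y = load_iris(return_X_y=True)


def class_counts_per_test_fold(splitter, groups=None):
    """Return the per-class sample counts of each test fold."""
    return [np.bincount(y[test_idx], minlength=3).tolist()
            for _, test_idx in splitter.split(X, y, groups)]


# Unshuffled KFold on class-ordered data gives unbalanced test folds
print("KFold:", class_counts_per_test_fold(KFold(n_splits=5)))

# Stratified splitters keep the class proportions of the full dataset
print("StratifiedKFold:", class_counts_per_test_fold(StratifiedKFold(n_splits=5)))
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
print("StratifiedShuffleSplit:", class_counts_per_test_fold(sss))

# Hypothetical groups: pretend every 10 consecutive samples share one source
groups = np.arange(len(y)) // 10
gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, y, groups):
    # GroupKFold never places the same group on both sides of a split
    shared = np.intersect1d(groups[train_idx], groups[test_idx])
    print("test groups:", np.unique(groups[test_idx]).tolist(),
          "| groups shared with train:", shared.size)
```

With unshuffled `KFold`, the first test fold contains only class 0, whereas every stratified test fold holds 10 samples of each class. `GroupKFold` is the appropriate choice when samples from the same source (for example, the same patient or session) must not leak between train and test.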
