# Introduction
Hyperparameter tuning calls for nested cross validation. When hyperparameters are tuned with GridSearchCV, the data used to find the optimal hyperparameters cannot also be used to estimate their performance, because the resulting estimate would be optimistically biased. A separate hold-out set is needed to assess their performance in an unbiased manner.
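
To make the distinction concrete, here is a minimal sketch that tunes on one part of the Iris data and scores on a part the search never saw. The 25% hold-out size, the `random_state` values, and the variable names are assumptions chosen for illustration; the parameter grid matches the one used later in this notebook.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Hold out data that the grid search never sees (25% is an arbitrary choice).
X_tune, X_hold, y_tune, y_hold = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

search = GridSearchCV(
    estimator=SVC(),
    param_grid={'kernel': ['rbf', 'linear'], 'C': [1, 10]},
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_tune, y_tune)

# best_score_ was used to pick the winner, so it tends to be optimistic.
tuning_score = search.best_score_
# The hold-out score comes from data that played no part in the selection.
holdout_score = search.score(X_hold, y_hold)

print('Score on tuning data:', tuning_score)
print('Score on hold-out data:', holdout_score)
```

A single hold-out split gives only one estimate; nested cross validation repeats this idea over several outer folds and averages the results.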

[Scikit-Learn's nested cross validation](http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) example shows how to nest a GridSearchCV inside cross_val_score for an easy nested cross validation performance estimate. One drawback of this approach, however, is that you won't see the "winning" hyperparameters from each inner cross validation loop.

This notebook shows, side by side, the SKLearn method and a longer method that exposes the "winning" hyperparameters.

## Setting up the imports

In [2]:
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline

from sklearn.model_selection import GridSearchCV,KFold,cross_val_predict,cross_val_score,StratifiedKFold

from sklearn.svm import SVC
from collections import Counter
from sklearn.metrics import classification_report,accuracy_score

## Loading the data set (we will be using the Iris data set)

In [3]:
iris = datasets.load_iris()
features = iris.data
target = iris.target

## Setting up the inner and outer cross validation loops to be used in the SKLearn example and the "long" way example

In [4]:
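# outer_kf is the outer loop: each of its test folds is only used to score the tuned model
outer_kf = StratifiedKFold(n_splits=8,shuffle=True,random_state=1)
# inner_kf is the inner loop: GridSearchCV uses it to pick hyperparameters within each outer training set
inner_kf = StratifiedKFold(n_splits=3,shuffle=True,random_state=2)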
model = SVC()
params = {'kernel':['rbf','linear'],'C':[1,10]}
outer_loop_accuracy_scores = []
inner_loop_won_params = []
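
To see what the outer splitter actually produces, the sketch below walks through the eight outer folds and records the train size, test size, and per-class test counts of each. The `fold_summaries` list is a hypothetical helper introduced only for this illustration, and the exact fold sizes can differ between scikit-learn versions.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()
features, target = iris.data, iris.target

# Same outer splitter as in the setup cell.
outer_kf = StratifiedKFold(n_splits=8, shuffle=True, random_state=1)

fold_summaries = []
for fold, (train_index, test_index) in enumerate(outer_kf.split(features, target)):
    # Count how many samples of each of the three classes land in the test fold.
    class_counts = np.bincount(target[test_index], minlength=3)
    fold_summaries.append((len(train_index), len(test_index), class_counts))
    print('Fold', fold,
          'train size:', len(train_index),
          'test size:', len(test_index),
          'test class counts:', class_counts)
```

Because the splitter is stratified, every test fold contains a roughly equal share of each Iris class, and every sample appears in exactly one test fold.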

## Scikit-Learn example from the link in the Introduction. In this method, you won't be able to see the winning hyperparameters in each of the inner GridSearchCV loops

In [5]:
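# Non-nested: the same data is used to tune the hyperparameters and to score them
clf = GridSearchCV(estimator=model,param_grid=params,cv=inner_kf)
clf.fit(features,target)
print('Non nested best score:',clf.best_score_)

# Nested: cross_val_score clones clf and reruns the grid search on each outer training fold
nested_score = cross_val_score(clf,features,target,cv=outer_kf)
print('Nested scores:',nested_score)
print('Nested score mean:',nested_score.mean())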

Non nested best score: 0.98
Nested scores: [ 0.95238095  0.95238095  1.          0.94444444  0.94444444  1.          1.
  0.94444444]
Nested score mean: 0.967261904762


## Long way of doing nested loops, but you can see the winning hyperparameters for each inner GridSearchCV loop

In [6]:
# Looping through the outer loop, feeding each training set into a GSCV as the inner loop
for train_index,test_index in outer_kf.split(features,target):
    
    GSCV = GridSearchCV(estimator=model,param_grid=params,cv=inner_kf)
    
    # GSCV cross validates every hyperparameter combination on the outer training data. This is the inner loop
    GSCV.fit(features[train_index],target[train_index])
    
    # GSCV refits the best hyperparameters on the whole outer training set (refit=True by default), and that model is now tested on the unseen outer loop test data.
    pred = GSCV.predict(features[test_index])
    
    # Appending the "winning" hyper parameters and their associated accuracy score
    inner_loop_won_params.append(GSCV.best_params_)
    outer_loop_accuracy_scores.append(accuracy_score(target[test_index],pred))

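# Print each set of winning hyperparameters next to its outer loop accuracy score
for params_and_score in zip(inner_loop_won_params,outer_loop_accuracy_scores):
    print(params_and_score)

print('Mean of outer loop accuracy score:',np.mean(outer_loop_accuracy_scores))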

({'kernel': 'rbf', 'C': 1}, 0.95238095238095233)
({'kernel': 'rbf', 'C': 1}, 0.95238095238095233)
({'kernel': 'linear', 'C': 1}, 1.0)
({'kernel': 'linear', 'C': 1}, 0.94444444444444442)
({'kernel': 'linear', 'C': 1}, 0.94444444444444442)
({'kernel': 'linear', 'C': 1}, 1.0)
({'kernel': 'rbf', 'C': 1}, 1.0)
({'kernel': 'rbf', 'C': 1}, 0.94444444444444442)
Mean of outer loop accuracy score: 0.967261904762
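
Once the winning hyperparameters are visible, it is easy to tally how often each combination won. The sketch below copies the eight winners from the output above into a literal list named `won_params` (a hypothetical name; in the notebook the same values live in `inner_loop_won_params`) and counts them with `Counter`.

```python
from collections import Counter

# Winning hyperparameters copied from the output above, in outer-fold order.
won_params = [
    {'kernel': 'rbf', 'C': 1},
    {'kernel': 'rbf', 'C': 1},
    {'kernel': 'linear', 'C': 1},
    {'kernel': 'linear', 'C': 1},
    {'kernel': 'linear', 'C': 1},
    {'kernel': 'linear', 'C': 1},
    {'kernel': 'rbf', 'C': 1},
    {'kernel': 'rbf', 'C': 1},
]

# Dicts are not hashable, so convert each one to a sorted tuple of items first.
win_counts = Counter(tuple(sorted(p.items())) for p in won_params)

for combo, wins in win_counts.most_common():
    print(dict(combo), 'won', wins, 'of', len(won_params), 'outer folds')
```

In this run the two kernels split the outer folds evenly with `C` fixed at 1, which suggests the choice of kernel is not stable across folds for this data set.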


As you can see, the mean outer loop accuracy score from the longer method is identical to the nested score mean from the SKLearn method, and the per-fold nested scores match as well. This is expected, because both methods use the same outer_kf and inner_kf splitters with fixed random_state values. The longer method, however, also shows the winning hyperparameters for each outer fold.
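
As a possible middle ground, newer scikit-learn releases offer `cross_validate`, whose `return_estimator=True` option hands back the fitted estimator from each outer fold. The sketch below assumes scikit-learn 0.20 or later (where that option is available) and reuses the splitters and parameter grid from this notebook; it is an alternative to try, not part of the linked example.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.svm import SVC

features, target = datasets.load_iris(return_X_y=True)

# Same splitters and grid as in the rest of the notebook.
outer_kf = StratifiedKFold(n_splits=8, shuffle=True, random_state=1)
inner_kf = StratifiedKFold(n_splits=3, shuffle=True, random_state=2)
params = {'kernel': ['rbf', 'linear'], 'C': [1, 10]}

clf = GridSearchCV(estimator=SVC(), param_grid=params, cv=inner_kf)

# return_estimator=True keeps the GridSearchCV object fitted on each outer training fold.
results = cross_validate(clf, features, target, cv=outer_kf, return_estimator=True)

for fitted_search, score in zip(results['estimator'], results['test_score']):
    print(fitted_search.best_params_, score)

print('Nested score mean:', results['test_score'].mean())
```

Each entry in `results['estimator']` is a fitted GridSearchCV, so attributes such as `best_params_` and `cv_results_` are available per outer fold without writing the loop by hand.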