# Model Selection. Evaluating Performance After Model Selection

You want to evaluate the performance of a model found through model selection.

### Nested Cross Validation 

Use nested cross-validation to avoid a biased evaluation.

In k-fold cross-validation we train our model on k-1 folds of the data and compare this model's predictions with the true values on the remaining fold. We then repeat this process k times, so that each fold serves as the held-out fold exactly once.
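As a minimal sketch of the k-fold procedure just described, the loop below splits the iris data into five folds by hand. The variable names (`kfold`, `fold_scores`) and the choice of five shuffled folds with a fixed seed are illustrative assumptions, and the classifier uses scikit-learn's default solver rather than any particular setting from this notebook.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import KFold

# Load the iris features and target
features, target = datasets.load_iris(return_X_y=True)

# Assumption for illustration: 5 shuffled folds with a fixed seed
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kfold.split(features):
    # Train on k-1 folds
    model = linear_model.LogisticRegression(max_iter=1000)
    model.fit(features[train_idx], target[train_idx])
    # Score (accuracy) on the one remaining fold
    fold_scores.append(model.score(features[test_idx], target[test_idx]))

# One score per fold; their mean is the cross-validated estimate
mean_score = np.mean(fold_scores)
print(fold_scores, mean_score)
```

Each observation lands in the held-out fold exactly once, so every score is computed on data the corresponding model never saw during training.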

Problem == Previously we used cross-validation to evaluate which hyperparameter values produced the best models. However, a nuanced and generally underappreciated problem arises: since we used the data to select the best hyperparameter values, we cannot use that same data to evaluate the model's performance without getting an optimistically biased estimate.

Solution == Wrap the cross-validation used for model search in another cross-validation.

In nested cross-validation the inner cross-validation selects the best model, while the outer cross-validation provides us with an unbiased evaluation of the model's performance.

In our example the inner cross-validation is our GridSearchCV object, which we then wrap in an outer cross-validation using cross_val_score.
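To make the two loops visible, here is a rough hand-written equivalent of wrapping a grid search in an outer cross-validation. It is a sketch under stated assumptions, not the internals of `cross_val_score`: the outer splitter (`StratifiedKFold` with shuffling and a fixed seed), the smaller grid of five `C` values, and the default solver are all choices made here for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import GridSearchCV, StratifiedKFold

features, target = datasets.load_iris(return_X_y=True)

# Assumption: a smaller grid than the notebook's 20 candidates, to keep the sketch fast
param_grid = {"C": np.logspace(0, 4, 5)}

# Outer loop: estimates performance on data the search never saw
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer_cv.split(features, target):
    # Inner loop: the grid search sees only the outer training portion
    inner_search = GridSearchCV(
        linear_model.LogisticRegression(max_iter=1000), param_grid, cv=5
    )
    inner_search.fit(features[train_idx], target[train_idx])
    # Score the refit best model on the held-out outer fold
    outer_scores.append(inner_search.score(features[test_idx], target[test_idx]))

print(np.mean(outer_scores))
```

The key point is that hyperparameter selection is repeated from scratch inside every outer fold, so the held-out outer fold never influences which value of `C` is chosen.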

In [18]:
# Load libraries 

import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV, cross_val_score

In [19]:
# Load Data

iris = datasets.load_iris()

# Create Features matrix and Target vector

features = iris.data
target = iris.target

In [20]:
# Create Logistic Regression

logistic = linear_model.LogisticRegression(max_iter = 1000, solver = "liblinear")

In [21]:
# Create range of 20 candidate values for C

C = np.logspace(0,4,20)

In [22]:
# Create hyperparameter options

hyperparameters = dict(C=C)

In [23]:
# Create grid search

gridsearch = GridSearchCV(logistic, hyperparameters, cv = 5, n_jobs=-1, verbose = 1)

In [24]:
# Conduct nested cross validation and output the average score

cross_val_score(gridsearch, features, target).mean()

Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  68 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.9s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.2s finished


0.9733333333333334

In [27]:
best_model = gridsearch.fit(features,target)
best_model

Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.2s finished


GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=1000, multi_class='auto',
                                          n_jobs=None, penalty='l2',
                                          random_state=None, solver='liblinear',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=-1,
             param_grid={'C': array([1.00000000e+00,...5090e+00, 4.28133240e+00,
       6.95192796e+00, 1.12883789e+01, 1.83298071e+01, 2.97635144e+01,
       4.83293024e+01, 7.84759970e+01, 1.27427499e+02, 2.06913808e+02,
       3.35981829e+02, 5.45559478e+02, 8.85866790e+02, 1.43844989e+03,
       2.33572147e+03, 3.79269019e+03, 6.15848211e+03, 

In [28]:
scores = cross_val_score(gridsearch, features, target)
scores

Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.6s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    0.3s finished


array([1.        , 1.        , 0.93333333, 0.93333333, 1.        ])

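The inner cross-validation trained 20 candidate models 5 times each to find the best model, and this search was repeated inside an outer five-fold cross-validation (the default for cross_val_score in scikit-learn 0.22 and later, which is why five scores are returned above). That gives 20 * 5 * 5 = 500 models trained during the search, plus one refit of the best model in each outer fold.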
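The number of fits depends on how many outer folds are used, which in turn depends on the `cv` argument or the installed scikit-learn default. The small helper below tallies the count for any combination. `nested_cv_fits` is a hypothetical name introduced here, not a scikit-learn function, and the refit count assumes `GridSearchCV` is left at its default `refit=True`.

```python
def nested_cv_fits(n_candidates, inner_folds, outer_folds, refit=True):
    """Count model fits in a nested cross-validation around a grid search."""
    # Every candidate is trained once per inner fold, in every outer fold
    search_fits = n_candidates * inner_folds * outer_folds
    # With refit=True the best candidate is retrained once per outer fold
    refits = outer_folds if refit else 0
    return search_fits, refits


# 20 candidates, 5 inner folds, 3 outer folds
print(nested_cv_fits(20, 5, 3))  # (300, 3)

# 20 candidates, 5 inner folds, 5 outer folds
print(nested_cv_fits(20, 5, 5))  # (500, 5)
```

Because the cost grows multiplicatively, trimming the candidate grid or the number of inner folds is usually the cheapest way to keep nested cross-validation tractable.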