STAT 479: Machine Learning (Fall 2019)  
Instructor: Sebastian Raschka (sraschka@wisc.edu)  

Course website: http://pages.stat.wisc.edu/~sraschka/teaching/stat479-fs2019/

# L11: Model Evaluation 4 -- Algorithm Comparison (Nested Cross-Validation)


## verbose version 1 (using `StratifiedKFold` directly)

This notebook illustrates how to implement nested cross-validation in scikit-learn.

![./nested-cv-image.png](nested-cv-image.png)


In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -d -p sklearn,mlxtend -v

Sebastian Raschka 2019-11-06 

CPython 3.7.3
IPython 7.9.0

sklearn 0.21.2
mlxtend 0.18.0.dev0


- Setting up classifiers (or pipelines) and the parameter grids for model tuning
- Remember, the hyperparameter tuning takes place in the inner loop

In [2]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from mlxtend.data import mnist_data
from sklearn.metrics import accuracy_score

# Loading and splitting the dataset
# Note that this is a small (stratified) subset
# of MNIST; it consists of only 5000 samples, a
# fraction of the 70,000 images in the original MNIST dataset
# http://yann.lecun.com/exdb/mnist/
X, y = mnist_data()
X = X.astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=1,
                                                    stratify=y)

# Initializing Classifiers
clf1 = LogisticRegression(multi_class='multinomial',
                          solver='newton-cg',
                          random_state=1)
clf2 = KNeighborsClassifier(algorithm='ball_tree',
                            leaf_size=50)
clf3 = DecisionTreeClassifier(random_state=1)
clf4 = SVC(random_state=1)
clf5 = RandomForestClassifier(random_state=1)

# Building the pipelines
pipe1 = Pipeline([('std', StandardScaler()),
                  ('clf1', clf1)])

pipe2 = Pipeline([('std', StandardScaler()),
                  ('clf2', clf2)])

pipe4 = Pipeline([('std', StandardScaler()),
                  ('clf4', clf4)])


# Setting up the parameter grids
param_grid1 = [{'clf1__penalty': ['l2'],
                'clf1__C': np.power(10., np.arange(-4, 4))}]

param_grid2 = [{'clf2__n_neighbors': list(range(1, 10)),
                'clf2__p': [1, 2]}]

param_grid3 = [{'max_depth': list(range(1, 10)) + [None],
                'criterion': ['gini', 'entropy']}]

param_grid4 = [{'clf4__kernel': ['rbf'],
                'clf4__C': np.power(10., np.arange(-4, 4)),
                'clf4__gamma': np.power(10., np.arange(-5, 0))},
               {'clf4__kernel': ['linear'],
                'clf4__C': np.power(10., np.arange(-4, 4))}]

param_grid5 = [{'n_estimators': [10, 100, 500, 1000, 10000]}]

In [3]:
# Setting up multiple GridSearchCV objects, 1 for each algorithm
gridcvs = {}
inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)

for pgrid, est, name in zip((param_grid1, param_grid2,
                             param_grid3, param_grid4, param_grid5),
                            (pipe1, pipe2, clf3, pipe4, clf5),
                            ('Softmax', 'KNN', 'DTree', 'SVM', 'RForest')):
    gcv = GridSearchCV(estimator=est,
                       param_grid=pgrid,
                       scoring='accuracy',
                       n_jobs=-1,
                       cv=inner_cv,
                       verbose=0,
                       refit=True)
    gridcvs[name] = gcv

- Next, we define the outer loop
- The training folds from the outer loop will be used in the inner loop for model tuning
- The inner loop selects the best hyperparameter setting
- The best setting is then scored in two ways: as the average accuracy over the inner test folds, and on the single held-out test fold of the corresponding outer-loop iteration
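
To make the relationship between the two loops concrete, here is a minimal pure-Python sketch of the index bookkeeping only. It is a toy under stated assumptions, not scikit-learn's implementation: the helper `k_fold` and the dataset size `n_samples` are hypothetical, and the split is shuffled but not stratified. It checks that no sample from an outer test fold is ever used during inner-loop tuning.

```python
import random


def k_fold(indices, n_splits, seed):
    """Yield (train, test) index lists for a shuffled, non-stratified k-fold split."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_splits] for i in range(n_splits)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


n_samples = 20  # hypothetical toy dataset size
leaks = []      # one entry per (outer fold, inner fold) combination

for outer_train, outer_test in k_fold(range(n_samples), n_splits=5, seed=1):
    # The inner folds are carved out of the outer training indices only
    for inner_train, inner_valid in k_fold(outer_train, n_splits=2, seed=1):
        # Candidate settings would be fit on inner_train and compared on inner_valid
        used_for_tuning = set(inner_train) | set(inner_valid)
        # Count outer test samples that leaked into tuning (should always be 0)
        leaks.append(len(used_for_tuning & set(outer_test)))
    print('outer train: %2d  outer test: %2d  inner train/valid: %d/%d'
          % (len(outer_train), len(outer_test), len(inner_train), len(inner_valid)))

print('inner/outer combinations:', len(leaks), ' leaked samples:', sum(leaks))
```

With 5 outer and 2 inner folds, tuning happens on 10 inner splits in total, and the outer test fold stays untouched until the selected model is scored on it.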

In [4]:
for name, gs_est in sorted(gridcvs.items()):

    print(50 * '-', '\n')
    print('Algorithm:', name)
    print('    Inner loop:')

    outer_scores = []
    # Outer loop: the fixed random_state gives every algorithm the same 5 folds
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

    for train_idx, valid_idx in outer_cv.split(X_train, y_train):

        # Inner loop: grid search (2-fold CV) on the outer training folds only
        gs_est.fit(X_train[train_idx], y_train[train_idx])
        print('\n        Best ACC (avg. of inner test folds) %.2f%%'
              % (gs_est.best_score_ * 100))
        print('        Best parameters:', gs_est.best_params_)

        # Score the refit best model on the held-out outer test fold
        outer_scores.append(gs_est.best_estimator_.score(
            X_train[valid_idx], y_train[valid_idx]))
        print('        ACC (on outer test fold) %.2f%%'
              % (outer_scores[-1] * 100))

    print('\n    Outer Loop:')
    print('        ACC %.2f%% +/- %.2f' %
          (np.mean(outer_scores) * 100, np.std(outer_scores) * 100))

-------------------------------------------------- 

Algorithm: DTree
    Inner loop:

        Best ACC (avg. of inner test folds) 72.28%
        Best parameters: {'criterion': 'entropy', 'max_depth': 8}
        ACC (on outer test fold) 81.25%

        Best ACC (avg. of inner test folds) 75.34%
        Best parameters: {'criterion': 'gini', 'max_depth': 9}
        ACC (on outer test fold) 76.12%

        Best ACC (avg. of inner test folds) 72.72%
        Best parameters: {'criterion': 'entropy', 'max_depth': 7}
        ACC (on outer test fold) 78.88%

        Best ACC (avg. of inner test folds) 74.16%
        Best parameters: {'criterion': 'entropy', 'max_depth': 7}
        ACC (on outer test fold) 73.12%

        Best ACC (avg. of inner test folds) 74.25%
        Best parameters: {'criterion': 'entropy', 'max_depth': None}
        ACC (on outer test fold) 77.25%

    Outer Loop:
        ACC 77.33% +/- 2.72
-------------------------------------------------- 

Algorithm: KNN
    Inner loop:

    [...]

        Best ACC (avg. of inner test folds) 88.00%
        Best parameters: {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
        ACC (on outer test fold) 91.88%

        Best ACC (avg. of inner test folds) 88.25%
        Best parameters: {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
        ACC (on outer test fold) 90.62%

        Best ACC (avg. of inner test folds) 89.09%
        Best parameters: {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
        ACC (on outer test fold) 90.38%

        Best ACC (avg. of inner test folds) 89.72%
        Best parameters: {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
        ACC (on outer test fold) 88.12%

        Best ACC (avg. of inner test folds) 88.22%
        Best parameters: {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
        ACC (on outer test fold) 90.62%

    Outer Loop:
        ACC 90.32% +/- 1.22
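
As a quick check of how the summary line is formed, the sketch below recomputes it from the five outer-fold accuracies printed directly above, using only the standard library. The notebook itself uses `np.mean` and `np.std`; `np.std` defaults to the population standard deviation (`ddof=0`), which is why `pstdev` is the matching function here. The sample standard deviation is shown only for contrast.

```python
from statistics import mean, pstdev, stdev

# Outer-fold accuracies (in %) as printed in the output above
outer_scores = [91.88, 90.62, 90.38, 88.12, 90.62]

mean_acc = mean(outer_scores)
pop_std = pstdev(outer_scores)    # divides by n, like np.std with ddof=0
sample_std = stdev(outer_scores)  # divides by n - 1, slightly larger

print('ACC %.2f%% +/- %.2f' % (mean_acc, pop_std))  # ACC 90.32% +/- 1.22
print('sample std for comparison: %.2f' % sample_std)
```

With only five outer folds the two definitions differ noticeably, so it is worth stating which one a reported "+/-" refers to.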

------
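
The explicit outer loop used in this verbose version can also be written more compactly by passing a `GridSearchCV` object to `cross_val_score` (imported earlier in this notebook, but not used so far). The sketch below is illustrative only and rests on assumptions that are not part of this notebook: it uses the small Iris dataset instead of the MNIST subset, a single decision tree, and a reduced hypothetical grid so that it runs in a few seconds.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris stands in for the MNIST subset to keep the example fast
X_demo, y_demo = load_iris(return_X_y=True)

inner_cv_demo = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
outer_cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: hyperparameter tuning via grid search
gcv_demo = GridSearchCV(estimator=DecisionTreeClassifier(random_state=1),
                        param_grid={'max_depth': [1, 2, 3, None]},
                        scoring='accuracy',
                        cv=inner_cv_demo,
                        refit=True)

# Outer loop: each outer training split is handed to the grid search,
# and the refit best model is scored on the matching outer test fold
nested_scores = cross_val_score(gcv_demo, X_demo, y_demo,
                                scoring='accuracy', cv=outer_cv_demo)

print('ACC %.2f%% +/- %.2f'
      % (nested_scores.mean() * 100, nested_scores.std() * 100))
```

The compact form returns only the outer-fold scores. The verbose loop remains useful when the per-fold best parameters and inner-fold accuracies should be inspected as well.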

- Determine the best algorithm from the experiment above; e.g., we find that Random Forest is performing best
- Now, select the hyperparameters for this model via regular k-fold cross-validation on the whole training set

In [10]:
gcv_model_select = GridSearchCV(estimator=clf5,
                                param_grid=param_grid5,
                                scoring='accuracy',
                                n_jobs=-1,
                                cv=inner_cv,
                                verbose=1,
                                refit=True)

# The search must be fit before best_score_ and best_params_ are available
gcv_model_select.fit(X_train, y_train)
print('Best CV accuracy: %.2f%%' % (gcv_model_select.best_score_*100))
print('Best parameters:', gcv_model_select.best_params_)

Best CV accuracy: 93.42%
Best parameters: {'n_estimators': 1000}


- Using these settings, we can now fit the best model to the whole training set

In [11]:
## We can skip the next step because we set refit=True
## so scikit-learn has already fit the model to the
## whole training set

# gcv_model_select.fit(X_train, y_train)

train_acc = accuracy_score(y_true=y_train, y_pred=gcv_model_select.predict(X_train))
test_acc = accuracy_score(y_true=y_test, y_pred=gcv_model_select.predict(X_test))

print('Training Accuracy: %.2f%%' % (100 * train_acc))
print('Test Accuracy: %.2f%%' % (100 * test_acc))

Training Accuracy: 100.00%
Test Accuracy: 93.90%


For comparison, recall that the average outer-fold accuracy of this algorithm in the nested cross-validation was

    ACC 93.90% +/- 0.70
    
and the accuracy that the inner folds reported for the settings selected as best was


    Best ACC (avg. of inner test folds) 93.31%
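
The three numbers can be put side by side with a few lines of arithmetic. This is only a rough sanity check using the values reported above; treating mean +/- one standard deviation as a band is a heuristic assumption here, not a formal statistical test.

```python
# Values (in %) copied from the results reported in this notebook
outer_mean, outer_std = 93.90, 0.70  # nested CV, outer folds
inner_best = 93.31                   # avg. of inner test folds for the best setting
test_acc = 93.90                     # independent test set

lower, upper = outer_mean - outer_std, outer_mean + outer_std
within_band = lower <= test_acc <= upper
gap_to_inner = test_acc - inner_best

print('Outer-fold band: [%.2f%%, %.2f%%]' % (lower, upper))
print('Test accuracy inside band:', within_band)
print('Test minus inner-fold estimate: %.2f points' % gap_to_inner)
```

Here the test accuracy coincides with the outer-fold average, and the inner-fold estimate is also within one standard deviation of it.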