
- Implementing nested cross-validation in scikit-learn.

In [None]:
pip install watermark

In [3]:
%load_ext watermark
%watermark -a 'ishan dahal' -d -p sklearn,mlxtend -v

ishan dahal 2020-11-26 

CPython 3.6.9
IPython 5.5.0

sklearn 0.0
mlxtend 0.14.0


- Setting up classifiers and the parameters for model tuning
- Hyperparameter tuning takes place in the inner loop

In [7]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from mlxtend.data import mnist_data
from sklearn.metrics import accuracy_score


X, y = mnist_data()
X = X.astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=1,
                                                    stratify=y)
# Initializing classifiers
clf1 = LogisticRegression(multi_class='multinomial',
                          solver='newton-cg',
                          random_state=1)
clf2 = KNeighborsClassifier(algorithm='ball_tree',
                            leaf_size=50)
clf3 = DecisionTreeClassifier(random_state=1)
clf4 = SVC(random_state=1)
clf5 = RandomForestClassifier(random_state=1)

# Building the pipelines
pipe1 = Pipeline([('std', StandardScaler()),
                  ('clf1', clf1)])

pipe2 = Pipeline([('std', StandardScaler()),
                  ('clf2', clf2)])

pipe4 = Pipeline([('std', StandardScaler()),
                  ('clf4', clf4)])

# Setting up the parameter grids
param_grid1 = [{'clf1__penalty': ['l2'],
                'clf1__C': np.power(10., np.arange(-4, 4))}]

param_grid2 = [{'clf2__n_neighbors': list(range(1, 10)),
                'clf2__p': [1, 2]}]

param_grid3 = [{'max_depth': list(range(1, 10)) + [None],
                'criterion': ['gini', 'entropy']}]

param_grid4 = [{'clf4__kernel': ['rbf'],
                'clf4__C': np.power(10., np.arange(-4, 4)),
                'clf4__gamma': np.power(10., np.arange(-5, 0))},
               {'clf4__kernel': ['linear'],
                'clf4__C': np.power(10., np.arange(-4, 4))}]

param_grid5 = [{'n_estimators': [10, 100, 500, 1000, 10000]}]
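
The grid keys above such as `clf1__C` follow scikit-learn's `<step name>__<parameter>` convention for estimators wrapped in a `Pipeline`, while bare estimators (`clf3`, `clf5`) use plain names such as `max_depth`. A minimal sketch of that routing, where `demo_pipe` is a hypothetical pipeline built only for illustration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical pipeline for illustration; the step names mirror pipe1
demo_pipe = Pipeline([('std', StandardScaler()),
                      ('clf1', LogisticRegression(random_state=1))])

# Every tunable parameter of a step is exposed as '<step name>__<parameter>'
prefixed = [key for key in demo_pipe.get_params() if key.startswith('clf1__')]
print('clf1__C' in prefixed)

# set_params routes the value to the LogisticRegression step,
# which is what GridSearchCV does for each candidate in the grid
demo_pipe.set_params(clf1__C=0.01)
print(demo_pipe.named_steps['clf1'].C)
```

A key that does not match any step name or parameter raises an error at `set_params` time, so a typo in a grid is caught before any model is fit.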

In [8]:
## Setting up multiple GridSearchCV objects, 1 for each algorithm
gridcvs = {}
inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)

for pgrid, est, name in zip((param_grid1, param_grid2, param_grid3, param_grid4,
                             param_grid5), (pipe1, pipe2, clf3, pipe4, clf5),
                            ('Softmax', 'KNN', 'DTree', 'SVM', 'RForest')):
    gcv = GridSearchCV(estimator=est,
                       param_grid=pgrid,
                       scoring='accuracy',
                       n_jobs=-1,
                       cv=inner_cv,
                       verbose=0,
                       refit=True)
    gridcvs[name] = gcv

- Next, we define the outer loop
- The training folds from the outer loop will be used in the inner loop for model tuning
- The inner loop selects the best hyperparameter setting
- The selected setting is scored twice: by its average accuracy over the inner test folds, and by its accuracy on the corresponding test fold of the outer loop
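
The same inner/outer structure can be sketched compactly by passing a `GridSearchCV` object to `cross_val_score`, which clones and refits it on every outer training fold. This is only an illustrative sketch, not the notebook's method: it assumes scikit-learn's small `load_digits` dataset as a stand-in for `mnist_data`, a reduced `max_depth` grid, and hypothetical `*_demo` names.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): 8x8 digits instead of mlxtend's mnist_data
X_demo, y_demo = load_digits(return_X_y=True)

inner_cv_demo = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
outer_cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: grid search selects max_depth using only the data it is given
gcv_demo = GridSearchCV(estimator=DecisionTreeClassifier(random_state=1),
                        param_grid={'max_depth': [3, 5, None]},
                        scoring='accuracy',
                        cv=inner_cv_demo,
                        refit=True)

# Outer loop: each outer training fold goes through the full grid search,
# and the refitted best model is scored on the held-out outer test fold
nested_scores = cross_val_score(gcv_demo, X_demo, y_demo,
                                scoring='accuracy', cv=outer_cv_demo)

print(f'Nested CV ACC {nested_scores.mean() * 100:.2f}% '
      f'+/- {nested_scores.std() * 100:.2f}')
```

The explicit loop used in this notebook gives the same kind of estimate but also exposes the best parameters chosen in each outer fold, which `cross_val_score` does not report.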

In [9]:
for name, gs_est in sorted(gridcvs.items()):

    print(50 * '-', '\n')
    print('Algorithm: ', name)
    print('    Inner loop:')

    outer_scores = []
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

    for train_idx, valid_idx in outer_cv.split(X_train, y_train):

        # Inner loop: tune hyperparameters on the outer training fold only
        gs_est.fit(X_train[train_idx], y_train[train_idx])
        print(f'\n       Best ACC (avg. of inner test folds) {gs_est.best_score_ * 100:.2f}%')
        print('         Best parameters: ', gs_est.best_params_)

        # Score the refitted best model on the held-out outer test fold (valid_idx)
        outer_scores.append(gs_est.best_estimator_.score(X_train[valid_idx], y_train[valid_idx]))
        print(f'         ACC (on outer test fold) {outer_scores[-1] * 100}')

        # Running mean and standard deviation over the outer folds scored so far
        print('\n       Outer Loop:')
        print(f'            ACC {np.mean(outer_scores) * 100:.2f}% +/- {np.std(outer_scores) * 100:.2f}')

-------------------------------------------------- 

Algorithm:  DTree
    Inner loop:

       Best ACC (avg. of inner test folds) 72.59%
         Best parameters:  {'criterion': 'gini', 'max_depth': None}
         ACC (on outer test fold) 75.5

       Outer Loop:
            ACC 75.50% +/- 0.00

       Best ACC (avg. of inner test folds) 74.03%
         Best parameters:  {'criterion': 'entropy', 'max_depth': 7}
         ACC (on outer test fold) 78.25

       Outer Loop:
            ACC 76.88% +/- 1.37

       Best ACC (avg. of inner test folds) 73.88%
         Best parameters:  {'criterion': 'entropy', 'max_depth': 9}
         ACC (on outer test fold) 77.375

       Outer Loop:
            ACC 77.04% +/- 1.15

       Best ACC (avg. of inner test folds) 73.38%
         Best parameters:  {'criterion': 'entropy', 'max_depth': 8}
         ACC (on outer test fold) 74.875

       Outer Loop:
            ACC 76.50% +/- 1.37

       Best ACC (avg. of inner test folds) 73.91%
         Best par




       Best ACC (avg. of inner test folds) 88.91%
         Best parameters:  {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
         ACC (on outer test fold) 90.0

       Outer Loop:
            ACC 90.00% +/- 0.00

       Best ACC (avg. of inner test folds) 88.75%
         Best parameters:  {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
         ACC (on outer test fold) 91.0

       Outer Loop:
            ACC 90.50% +/- 0.50

       Best ACC (avg. of inner test folds) 89.31%
         Best parameters:  {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
         ACC (on outer test fold) 90.0

       Outer Loop:
            ACC 90.33% +/- 0.47

       Best ACC (avg. of inner test folds) 88.59%
         Best parameters:  {'clf1__C': 0.1, 'clf1__penalty': 'l2'}
         ACC (on outer test fold) 89.375

       Outer Loop:
            ACC 90.09% +/- 0.58

       Best ACC (avg. of inner test folds) 88.66%
         Best parameters:  {'clf1__C': 0.01, 'clf1__penalty': 'l2'}
         ACC (on outer test fold) 89.5

       Outer Loop:
            ACC 89.97% +/- 0.57
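
The `+/-` figure in the output is `np.std` of the outer-fold scores collected so far, and `np.std` defaults to the population standard deviation (`ddof=0`). As a small worked check, the sketch below recomputes the summary after the first four DTree folds printed above using the standard library's `pstdev`, which is the same population formula; the variable names are illustrative only.

```python
from statistics import mean, pstdev, stdev

# Outer-fold accuracies (in %) copied from the first four DTree folds above
dtree_outer_acc = [75.5, 78.25, 77.375, 74.875]

running_mean = mean(dtree_outer_acc)
population_std = pstdev(dtree_outer_acc)   # same formula as np.std (ddof=0)
sample_std = stdev(dtree_outer_acc)        # ddof=1 alternative, for contrast

print(f'ACC {running_mean:.2f}% +/- {population_std:.2f}')
print(f'sample std would be {sample_std:.2f}')
```

With only five outer folds the two conventions differ noticeably, so it helps to state which one a reported spread uses.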




- Determine the best algorithm from the experiment above; e.g., here we find that the Random Forest performs best
- Now select hyperparameters for that model with regular k-fold cross-validation on the whole training set

In [11]:
gcv_model_select = GridSearchCV(estimator=clf5,
                                param_grid=param_grid5,
                                scoring='accuracy',
                                n_jobs=-1,
                                cv=inner_cv,
                                verbose=1,
                                refit=True)

gcv_model_select.fit(X_train, y_train)
print(f'Best CV accuracy: {gcv_model_select.best_score_ * 100:.2f}%')
print('Best parameters:', gcv_model_select.best_params_)

Fitting 2 folds for each of 5 candidates, totalling 10 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:  3.4min finished


Best CV accuracy: 93.30%
Best parameters: {'n_estimators': 10000}


- Using these settings, we can now fit the best model on the whole training set

In [12]:
# We can skip refitting the model because we set refit=True,
# so sklearn has already fit the model to the entire training set

train_acc = accuracy_score(y_true=y_train, y_pred=gcv_model_select.predict(X_train))
test_acc = accuracy_score(y_true=y_test, y_pred=gcv_model_select.predict(X_test))

print(f"Training Accuracy: {100 * train_acc:.2f}%")
print(f"Test Accuracy: {100 * test_acc:.2f}%")

Training Accuracy: 100.00%
Test Accuracy: 94.00%


The outer-fold accuracy from nested cross-validation was 93.98% +/- 0.98, which is very close to the 94.00% test accuracy obtained here.
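
As a quick arithmetic check of that comparison, the sketch below tests whether the held-out test accuracy falls within one standard deviation of the nested cross-validation estimate. The numbers are copied from the text and output above; the variable names are illustrative.

```python
# Figures copied from the notebook (all in percent)
nested_cv_mean = 93.98
nested_cv_std = 0.98
test_acc_pct = 94.00

# Distance between the two estimates, in percentage points
gap = abs(test_acc_pct - nested_cv_mean)
within_one_std = gap <= nested_cv_std

print(f'gap = {gap:.2f} points, within one std: {within_one_std}')
```

A gap this small relative to the fold-to-fold spread suggests the nested estimate was not noticeably optimistic for this model, although a single test split cannot confirm that on its own.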