# Nested k-Fold CV

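Here is a minimal example of nested k-fold cross-validation applied to a classification problem. The model is a pipeline that standardizes the features, reduces them with PCA, and classifies them with a Random Forest. The tuned hyperparameters are the number of PCA components, the number of trees, and the maximum depth of each tree.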


```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Scale -> reduce with PCA -> classify. PCA(n_components=2) is only a starting
# value; the grid search below overrides it.
pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', RandomForestClassifier())
])
```
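The grid keys used later (`pca__n_components`, `classifier__max_depth`, ...) follow scikit-learn's `<step name>__<parameter>` convention for pipelines. The sketch below shows that convention directly with `set_params` and `get_params`. The specific values (5 components, depth 10) are arbitrary and chosen only for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', RandomForestClassifier()),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
# Illustrative values only; GridSearchCV sets these the same way for each candidate.
pipeline.set_params(pca__n_components=5, classifier__max_depth=10)

print(pipeline.named_steps['pca'].n_components)              # 5
print(pipeline.named_steps['classifier'].max_depth)          # 10
print('classifier__n_estimators' in pipeline.get_params())   # True
```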

### Steps:

1. generate or load the dataset
2. define the hyperparameter grid of values
3. define the inner and outer CV loops (see below)
4. perform the nested CV using `cross_val_score`

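- Outer CV loop: splits the data into training and test folds. The model tuned on each training fold is scored once on the corresponding held-out test fold.
- Inner CV loop: tunes the hyperparameters (e.g., with `GridSearchCV` or `RandomizedSearchCV`) using only the outer training fold, so the test fold never influences model selection.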
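To make the two loops explicit, here is a sketch that writes the outer loop by hand instead of delegating it to `cross_val_score`. To keep it fast, it assumes a smaller dataset (300 samples), a reduced grid, 3 folds at both levels, shuffling in the outer `KFold`, and fixed `random_state` values. None of these choices come from the example below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Smaller, reproducible setup (assumed for speed; not the original settings)
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('pca', PCA()),
    ('classifier', RandomForestClassifier(n_estimators=50, random_state=0)),
])
param_grid = {
    'pca__n_components': [2, 5],
    'classifier__max_depth': [None, 5],
}

outer_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_scores = []

for train_idx, test_idx in outer_cv.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Inner loop: the hyperparameter search sees only the outer training fold.
    search = GridSearchCV(pipeline, param_grid, cv=3)
    search.fit(X_train, y_train)

    # The best model (refit on the whole training fold) is scored once
    # on the untouched outer test fold, using accuracy by default.
    outer_scores.append(search.score(X_test, y_test))

print("Outer fold scores:", np.round(outer_scores, 3))
print("Nested CV score:", np.mean(outer_scores))
```

This mirrors what `cross_val_score(grid_search, X, y, cv=outer_cv)` does internally. Each outer fold gets a fresh grid search, and the reported number is the mean of the outer test-fold scores.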

```python
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold
from sklearn.datasets import make_classification

# Example dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hyperparameters to tune; keys use the "<step>__<parameter>" pipeline naming
param_grid = {
    'pca__n_components': [2, 5, 10],
    'classifier__n_estimators': [50, 100],
    'classifier__max_depth': [None, 10, 20],
}

# Inner CV for hyperparameter tuning (5 folds, run on each outer training set)
grid_search = GridSearchCV(pipeline, param_grid, cv=5)

# Outer CV for model evaluation
outer_cv = KFold(n_splits=5)

# Nested CV: each outer fold refits the whole grid search on its training split
nested_score = cross_val_score(grid_search, X, y, cv=outer_cv)

print("Nested CV Score: ", nested_score.mean())
```

```
Nested CV Score:  0.7850000000000001
```
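Nested CV is expensive. With the grid above (3 × 2 × 3 = 18 candidates), 5 inner folds and 5 outer folds, it fits the pipeline 5 × (18 × 5 + 1) = 455 times, counting one refit per outer fold. The sketch below computes that count for a smaller, assumed configuration. It then uses `cross_validate(..., return_estimator=True)` instead of `cross_val_score` so that you can see which hyperparameters each outer fold selected. The dataset size, grid, fold counts, shuffling, and `random_state` values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, ParameterGrid, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Smaller, reproducible setup (assumed for speed; not the original settings)
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('pca', PCA()),
    ('classifier', RandomForestClassifier(n_estimators=50, random_state=0)),
])
param_grid = {
    'pca__n_components': [2, 5, 10],
    'classifier__max_depth': [None, 10],
}

inner_k, outer_k = 3, 3
n_candidates = len(ParameterGrid(param_grid))  # 3 * 2 = 6 combinations
# Each outer fold: n_candidates * inner_k search fits plus one refit of the best model.
n_fits = outer_k * (n_candidates * inner_k + 1)
print("Candidates:", n_candidates, "| total pipeline fits:", n_fits)  # Candidates: 6 | total pipeline fits: 57

grid_search = GridSearchCV(pipeline, param_grid, cv=inner_k)
results = cross_validate(
    grid_search, X, y,
    cv=KFold(n_splits=outer_k, shuffle=True, random_state=0),
    return_estimator=True,  # keep the fitted GridSearchCV from each outer fold
)

# Different outer folds may pick different hyperparameters.
for fold, (score, est) in enumerate(zip(results['test_score'], results['estimator'])):
    print(f"Outer fold {fold}: accuracy={score:.3f}, best params={est.best_params_}")
```

If the selected hyperparameters vary a lot between outer folds, the tuning itself is unstable. The nested score then estimates the performance of the whole tuning procedure rather than of a single configuration.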
