# Manual k-Fold Cross-Validation
The gold standard for machine learning model evaluation is k-fold cross-validation. It provides
a robust estimate of the performance of a model on unseen data. It does this by splitting
the training dataset into k subsets, taking turns training models on all subsets except one,
which is held out, and evaluating model performance on the held-out validation dataset. The
process is repeated until all subsets are given an opportunity to be the held-out validation
set. The performance measure is then averaged across all models that are created.
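
The splitting procedure described above can be sketched with NumPy alone. This is a minimal illustration rather than the implementation used later in this section; the helper name `k_fold_indices` and the toy sizes (10 samples, 5 folds) are assumptions made for the example.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle the sample indices once
    folds = np.array_split(indices, k)     # k nearly equal-sized subsets
    for i in range(k):
        val_idx = folds[i]                 # the held-out subset for this round
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print("train:", np.sort(train_idx), "validation:", np.sort(val_idx))
```

Each sample appears in exactly one validation set, so every subset gets its turn as the held-out data.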

It is important to understand that cross-validation evaluates a model design (e.g., a
3-layer vs. a 4-layer neural network) rather than a specific fitted model. You do not want to
fit competing designs on a single dataset and compare the results, since any difference may
simply reflect that particular dataset suiting one design better (a form of overfitting). Instead,
you want to fit the same design on multiple datasets, producing multiple fitted models, and
use the average performance measure for comparison.
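
As a rough sketch of comparing designs by their average score, the snippet below uses scikit-learn's `MLPClassifier` and `cross_val_score` as lightweight stand-ins for the Keras models used later. The synthetic dataset, the two layer configurations, and the 5-fold setting are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data (an assumption, not the Pima dataset)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Two candidate designs that differ only in the number of hidden layers
designs = {
    "one hidden layer": (8,),
    "two hidden layers": (8, 8),
}

mean_scores = {}
for name, hidden_layers in designs.items():
    model = MLPClassifier(hidden_layer_sizes=hidden_layers, max_iter=1000, random_state=0)
    # One accuracy score per fold; each fold fits a fresh copy of the model
    fold_scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = fold_scores.mean()
    print(f"{name}: {fold_scores.mean():.3f} (+/- {fold_scores.std():.3f})")
```

The comparison is between the averaged scores in `mean_scores`, not between any single fitted model.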

Cross-validation is often not used for evaluating deep learning models because of the
greater computational expense. k-fold cross-validation is typically run with 5 or
10 folds, so 5 or 10 models must be constructed and evaluated, significantly adding to
the evaluation time of a model. Nevertheless, when the problem is small enough or if you have
sufficient computing resources, k-fold cross-validation can give you a less-biased estimate of
the performance of your model.

In the example below, you will use the handy `StratifiedKFold` class from the scikit-learn
Python machine learning library to split the training dataset into 10 folds. The folds are
stratified, meaning that the algorithm attempts to balance the number of instances of each
class in each fold. The example creates and evaluates 10 models using the 10 splits of the
data and collects all the scores. Per-epoch output is suppressed by passing
`verbose=0` to the model's `fit()` and `evaluate()` methods. The performance of each
model is printed and stored. The mean and standard deviation of the scores
are then printed at the end of the run to provide a robust estimate of model accuracy.
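
To see what stratification does in isolation, here is a small sketch that applies `StratifiedKFold` to a made-up, imbalanced label vector (80 negatives and 20 positives, an assumption for illustration) and records the fraction of positives in each test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up labels: 80 of class 0 and 20 of class 1 (20% positive overall)
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((len(y), 1))  # placeholder features; only the labels drive the split

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

positive_fractions = []
for train_idx, test_idx in skf.split(X, y):
    # Share of class 1 in this held-out fold
    positive_fractions.append(y[test_idx].mean())

print(positive_fractions)
```

Each test fold should hold about the same share of positives as the full dataset, which keeps the per-fold accuracy scores comparable.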

In [1]:
# MLP for Pima Indians Dataset with 10-fold cross validation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy as np

In [2]:
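# fix random seed for reproducibility
import tensorflow as tf
seed = 7
np.random.seed(seed)
tf.random.set_seed(seed)  # also seed TensorFlow, which initializes the Keras weights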

In [4]:
# load pima indians dataset
dataset = np.loadtxt("../7_first_nn/pima-indians-diabetes.data.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

In [5]:
# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []

In [6]:
for train, test in kfold.split(X, Y):
    # create model
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    # Fit the model
    model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)

    # evaluate the model
    scores = model.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)

print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))

accuracy: 74.03%
accuracy: 77.92%
accuracy: 76.62%
accuracy: 66.23%
accuracy: 64.94%
accuracy: 67.53%