# K-Fold cross-validation

[Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html)

`KFold` provides train/test indices that split the data into k consecutive folds (without shuffling by default). There are variations, such as [leave one out](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html), [leave p out](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeavePOut.html), and [group k-fold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html).

Each fold is then used once as the validation set while the remaining k - 1 folds form the training set.
The scikit-learn documentation has [visuals](https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html#sphx-glr-auto-examples-model-selection-plot-cv-indices-py) of the different splitting strategies.
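
To make the fold rotation concrete, here is a minimal sketch on a toy array (`X_toy`, `kf_toy` and `test_folds` are illustrative names, not part of the notebook). With 6 samples and 3 folds, each sample should land in the validation fold exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 6 samples, 2 features
X_toy = np.arange(12).reshape(6, 2)
kf_toy = KFold(n_splits=3)  # shuffle=False by default, so folds are consecutive

test_folds = []
for train_index, test_index in kf_toy.split(X_toy):
    print(f"train: {train_index}, validation: {test_index}")
    test_folds.append(test_index)

# train: [2 3 4 5], validation: [0 1]
# train: [0 1 4 5], validation: [2 3]
# train: [0 1 2 3], validation: [4 5]
```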

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import KFold

iris = datasets.load_iris(as_frame=True)
X, y = iris['data'], iris['target']

Keep in mind two things:
1) `KFold` does not split the data itself; it only yields the row indices of each split.
2) It works for both supervised and unsupervised learning, because `split` only needs `X`.
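
A small sketch of both points, assuming made-up data (`X_demo`, `y_demo` and the other names are hypothetical): `split` returns index arrays rather than data, and for plain `KFold` passing `y` should not change those indices.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical data: 10 samples, 3 features, and an optional label vector
X_demo = np.random.default_rng(0).normal(size=(10, 3))
y_demo = np.arange(10) % 2

kf_demo = KFold(n_splits=5)

# Unsupervised style: only X is passed
splits_without_y = list(kf_demo.split(X_demo))
# Supervised style: y is accepted, but plain KFold does not use it to build folds
splits_with_y = list(kf_demo.split(X_demo, y_demo))

same = all(
    np.array_equal(tr_a, tr_b) and np.array_equal(te_a, te_b)
    for (tr_a, te_a), (tr_b, te_b) in zip(splits_without_y, splits_with_y)
)
print(f"Same indices with and without y: {same}")
```

Splitters such as `StratifiedKFold` do use `y`, so this equivalence is specific to `KFold`.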

#### Supervised case:

In [None]:
kf = KFold(n_splits=5, shuffle=False)

# kf.split produces a generator that yields two numpy arrays each time: 
# (train_index0, test_index0), (train_index1, test_index1), (train_index2, test_index2)...

# enumerate(kf.split(X)) basically places an index for each of these:
# (0, (train_index0, test_index0)), (1,(train_index1, test_index1)), (2,(train_index2, test_index2))...
 
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    print(f"\n Fold: {fold}")

    # X and y are pandas objects (as_frame=True), so select rows by position with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    print(f"Train index: {train_index}")
    print(f"X train: {X_train}")
    print(f"y train: {y_train}")

    print(f"Test index: {test_index}")
    print(f"X test: {X_test}")
    print(f"y test: {y_test}")
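
As a follow-up sketch of how the per-fold indices are typically used in the supervised case, the example below fits a classifier on each training fold and scores it on the held-out fold. The choice of `LogisticRegression`, `shuffle=True` and `random_state=0` are assumptions made for illustration, not part of the notebook; shuffling is used here because the iris rows are ordered by class, so consecutive folds would have very uneven class mixes.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

iris = datasets.load_iris(as_frame=True)
X, y = iris["data"], iris["target"]

# Assumption: shuffle the rows before folding, with a fixed seed for repeatability
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    # Illustrative model choice; any estimator with fit/score would do
    model = LogisticRegression(max_iter=1000)
    model.fit(X.iloc[train_index], y.iloc[train_index])
    fold_score = model.score(X.iloc[test_index], y.iloc[test_index])
    scores.append(fold_score)
    print(f"Fold {fold}: accuracy = {fold_score:.3f}")

print(f"Mean accuracy: {np.mean(scores):.3f}")
```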