# k-fold cross-validation

<b>Cross-validation</b>, sometimes called <b>rotation estimation</b>, is a technique for evaluating predictive models by partitioning the original sample into a training set used to fit the model and a test set used to evaluate it.

In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. One subsample is held out as the validation data for testing the model, and the remaining k-1 subsamples are used as training data. The process is then repeated k times (once per fold), with each of the k subsamples used exactly once as the validation data. The k results can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method is that every observation is used for both training and validation, and each observation is used for validation exactly once.
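The partitioning described above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it: the helper name `k_fold_indices` and its `seed` parameter are hypothetical, and it assumes that when k does not divide the sample size evenly, fold sizes may differ by one.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition of the sample

    # Assumption: if k does not divide n_samples, the first few folds
    # get one extra observation each.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]                  # held-out fold
        train_idx = indices[:start] + indices[start + size:]   # remaining k-1 folds
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print("validate on", sorted(val_idx), "train on", len(train_idx), "samples")
```

Each index appears in exactly one validation fold, which is the property the paragraph above relies on.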

In [1]:
from sklearn.datasets import load_digits

In [2]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

In [3]:
digits = load_digits()

In [4]:
from sklearn.model_selection import cross_val_score

### Decision Tree

In [5]:
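# cv=3 is set explicitly to match the three fold scores shown below
# (older scikit-learn releases defaulted to 3 folds, newer ones to 5)
dt_score = cross_val_score(DecisionTreeClassifier(), digits.data, digits.target, cv=3)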
dt_score

array([0.75747508, 0.8230384 , 0.75671141])

### Logistic Regression

In [6]:
lr_score = cross_val_score(LogisticRegression(), digits.data, digits.target, cv=3)
lr_score

array([0.89534884, 0.94991653, 0.90939597])

### Support Vector Machine

In [7]:
svm_score = cross_val_score(SVC(), digits.data, digits.target, cv=3)
svm_score

array([0.39368771, 0.41068447, 0.45973154])

### Random Forest

In [8]:
rf_score = cross_val_score(RandomForestClassifier(n_estimators=60), digits.data, digits.target, cv=3)
rf_score

array([0.93355482, 0.94991653, 0.92449664])

In [9]:
rf_score = cross_val_score(RandomForestClassifier(n_estimators=50), digits.data, digits.target, cv=3)
rf_score

array([0.92857143, 0.94490818, 0.92785235])
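The two Random Forest cells try different values of `n_estimators` by hand. One possible way to make that comparison systematic is to loop over candidate values and keep the mean score for each. The candidate values, the fixed `random_state`, and the explicit `cv=3` are assumptions made for illustration. The scores it produces depend on the scikit-learn version and will not necessarily match the outputs above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

digits = load_digits()

# Candidate values are illustrative; random_state is fixed only so that
# repeated runs give the same numbers.
mean_scores = {}
for n_estimators in (40, 50, 60):
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    scores = cross_val_score(model, digits.data, digits.target, cv=3)
    mean_scores[n_estimators] = scores.mean()  # average over the folds

best_n = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best n_estimators:", best_n)
```

Differences between nearby values of `n_estimators` are often smaller than the fold-to-fold variation, so a small gap in mean score is weak evidence on its own.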

# Result: Random Forest has the best score so far (mean accuracy of about 0.93 over three folds)
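As a quick check of this conclusion, the per-fold scores printed in the cells above can be averaged into a single estimate per model. The sketch below copies those scores as literals rather than re-running the models; the dictionary and variable names are chosen here for illustration.

```python
from statistics import mean

# Fold scores copied from the cell outputs above.
reported_scores = {
    "Decision Tree": [0.75747508, 0.8230384, 0.75671141],
    "Logistic Regression": [0.89534884, 0.94991653, 0.90939597],
    "Support Vector Machine": [0.39368771, 0.41068447, 0.45973154],
    "Random Forest (n_estimators=60)": [0.93355482, 0.94991653, 0.92449664],
    "Random Forest (n_estimators=50)": [0.92857143, 0.94490818, 0.92785235],
}

# Average the k fold scores to get one estimate per model.
mean_scores = {name: mean(scores) for name, scores in reported_scores.items()}

# Print models from best to worst mean score.
for name, score in sorted(mean_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")

best_model = max(mean_scores, key=mean_scores.get)
print("best:", best_model)
```

The two Random Forest runs are very close in mean score, so their ordering should not be read as a meaningful difference between 50 and 60 trees.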