# Validation
## Why validation?
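After we have trained a model, we have to test it with test data. This is called <i>validation</i>. It provides a measure of the quality of our model. <br>
<b>Never validate a model with data it has already been trained on!</b> The model has already seen the correct labels for that data, so the resulting score would be overly optimistic.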
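
The effect of this mistake can be made visible with a minimal sketch. It assumes the iris data and the 1-nearest-neighbor classifier used in the rest of this notebook; the names `train_acc` and `holdout_acc` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Wrong: score the model on the same data it was fitted on
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)
train_acc = accuracy_score(y, model.predict(X))

# Better: hold out half of the data and score on that half only
X1, X2, y1, y2 = train_test_split(X, y, random_state=0, test_size=0.5)
model.fit(X1, y1)
holdout_acc = accuracy_score(y2, model.predict(X2))

print('Accuracy on training data:', train_acc)
print('Accuracy on held-out data:', holdout_acc)
```

With a single neighbor, each training point is its own nearest neighbor, so the first score says nothing about how the model handles unseen data.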

## Split train and test data

In [9]:
# import data
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

In [10]:
# Choose model
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=1)

In [11]:
from sklearn.model_selection import train_test_split

# Split data: X1, y1 for training and X2, y2 for testing (50% each)
X1, X2, y1, y2 = train_test_split(X, y, random_state=0, test_size=0.5)

# Fit model to train data
model.fit(X1, y1)

# Predict labels of test data
y_pred = model.predict(X2)

In [12]:
# Get accuracy (true labels first, then predicted labels)
from sklearn.metrics import accuracy_score
accuracy_score(y2, y_pred)

0.9066666666666666

## k-fold cross validation

In [13]:
from sklearn.model_selection import cross_val_score

# Split the data into k folds; each fold is used once as the test set
k = 5
scores = cross_val_score(model, X, y, cv=k)
print(scores)

[0.96666667 0.96666667 0.93333333 0.93333333 1.        ]
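
What `cross_val_score` does for each fold can be sketched by hand. The sketch relies on scikit-learn's documented behavior that, for a classifier and an integer `cv`, the data is split with an unshuffled `StratifiedKFold`; the name `manual_scores` is illustrative only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=1)
k = 5

manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=k).split(X, y):
    # Fit a fresh copy of the model on k-1 folds ...
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    # ... and measure its accuracy on the remaining fold
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

print(manual_scores)
print('Mean accuracy:', manual_scores.mean())
```

Every instance ends up in the test set exactly once, and the mean of the k scores is commonly reported as a single summary value.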


## Leave-one-out cross validation
Leave-one-out cross validation is an extreme case of k-fold cross validation with $k =$ <i>number of instances</i>. Just like with k-fold cross validation, `cross_val_score(..., cv=LeaveOneOut())` returns an array with the score of each trial. But since the model is then tested on only one element per trial, each element in the array is either 0 or 1. A good measure for the overall success rate is the mean of all values.

In [14]:
from sklearn.model_selection import LeaveOneOut
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print('Scores:')
print(scores)
print('\nSuccess rate:', scores.mean())

Scores:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1.
 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1.]

Success rate: 0.96
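
The claim that leave-one-out is k-fold cross validation with $k$ equal to the number of instances can be checked directly. This is a minimal sketch that assumes an unshuffled `KFold` with `n_splits=len(X)`; the names `loo_scores` and `kfold_scores` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=1)

# One trial per instance: train on all others, test on the one left out
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())

# k-fold with as many folds as instances yields the same single-element test sets
kfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=len(X)))

print('Identical results:', np.array_equal(loo_scores, kfold_scores))
print('Number of misclassified instances:', int((loo_scores == 0).sum()))
```

Because the model is refitted once per instance, this approach becomes expensive on large data sets.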
