# Cross-validation

**• Uses multiple train-test splits, not just a single one. Each split is used to train and evaluate a separate model.**

• Why is this better?

– The accuracy score of a supervised learning method can vary, depending on which samples happen to end up in the training set.

– Using multiple train-test splits gives more stable and reliable estimates for how the classifier is likely to perform on average.

– Results are averaged over multiple different training sets instead of relying on a single model trained on a particular training set.
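To make the "one model per split" idea concrete, here is a minimal sketch of the loop that `cross_val_score` runs. For a classifier and an integer `cv`, scikit-learn splits the data with `StratifiedKFold` (no shuffling), so the manual loop below should reproduce its fold scores. The Iris dataset is an assumption, used only as a self-contained stand-in for the fruit data in this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Iris stands in for the notebook's fruit data so the example runs on its own
X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

# For a classifier with cv=3, cross_val_score uses StratifiedKFold(n_splits=3)
skf = StratifiedKFold(n_splits=3)

manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    # A fresh model is trained on this split's training rows only...
    model = KNeighborsClassifier(n_neighbors=5).fit(X[train_idx], y[train_idx])
    # ...and evaluated (accuracy) on the rows held out in this split
    manual_scores.append(model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(clf, X, y, cv=3)

print('Manual fold scores:', manual_scores)
print('cross_val_score:   ', auto_scores)
print('Mean accuracy: {:.3f}'.format(manual_scores.mean()))
```

The individual fold scores illustrate the point above: each depends on which samples landed in that fold's training set, and the mean smooths out that variation.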

In [None]:
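import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=5)
X = X_fruits_2d.to_numpy()  # X_fruits_2d, y_fruits_2d: fruit data loaded in an earlier cell
y = y_fruits_2d.to_numpy()  # .as_matrix() was removed in pandas 1.0

cv_scores = cross_val_score(clf, X, y, cv=3)  # explicit 3 folds; the default became 5 in scikit-learn 0.22

print('Cross-validation scores (3-fold):', cv_scores)
print('Mean cross-validation score (3-fold): {:.3f}'
     .format(np.mean(cv_scores)))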

**A note on performing cross-validation for more advanced scenarios.**

In some cases (e.g., when feature values have very different ranges), we've seen the need to scale or normalize the training and test sets before using them with a classifier. When scaling is needed, the proper way to do cross-validation is *not* to fit a single scaler on the entire dataset up front. A scaler fit that way computes its statistics (such as each feature's minimum and maximum, or mean and standard deviation) from every sample, including the samples that will later serve as test data in some fold, so information about the test data leaks into the training data.
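A tiny illustration of that leak, assuming `MinMaxScaler` as the scaler and a hypothetical five-row dataset whose last row is the held-out test sample. When the scaler is fit on all rows, the test row's value sets the maximum and therefore changes how every training row is scaled.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: one feature; the last row is held out as the test set
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
X_train, X_test = X[:4], X[4:]

# Leaky: scaler fit on the whole dataset, so the test row determines the max
leaky = MinMaxScaler().fit(X)
# Correct: scaler fit on the training rows only
clean = MinMaxScaler().fit(X_train)

print('max seen by leaky scaler:', leaky.data_max_)  # includes the test row's value
print('max seen by clean scaler:', clean.data_max_)  # training rows only
print('training rows, leaky scaling:', leaky.transform(X_train).ravel())
print('training rows, clean scaling:', clean.transform(X_train).ravel())
```

With the leaky scaler, the training features are squeezed toward 0 because of a value the model should never have seen during training.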

Instead, the scaler must be fit separately within each cross-validation fold: fit on that fold's training portion only, then applied to both its training and test portions. The easiest way to do this in scikit-learn is to **use pipelines**.
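A minimal pipeline sketch, assuming `MinMaxScaler` plus the same k-NN classifier as above, with Iris again standing in for the fruit data. Because the scaler is a pipeline step, `cross_val_score` refits the whole pipeline on each fold's training portion, so the scaler never sees that fold's test samples while it is being fit.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Iris stands in for the notebook's fruit data
X, y = load_iris(return_X_y=True)

# Each fold: scaler.fit_transform on the training part, scaler.transform on the test part
pipe = Pipeline([
    ('scaler', MinMaxScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
])

cv_scores = cross_val_score(pipe, X, y, cv=3)
print('Cross-validation scores (3-fold):', cv_scores)
print('Mean cross-validation score (3-fold): {:.3f}'.format(np.mean(cv_scores)))
```

`sklearn.pipeline.make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))` builds the same pipeline with auto-generated step names.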

## Validation curve example

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve

param_range = np.logspace(-3, 3, 4)  # gamma values 0.001, 0.1, 10, 1000
train_scores, test_scores = validation_curve(SVC(), X, y,
                                            param_name='gamma',
                                            param_range=param_range, cv=3)

print(train_scores)  # shape (4, 3): one row per gamma value, one column per fold
print(test_scores)

For a worked example that plots a validation curve, see: http://scikit-learn.org/stable/auto_examples/model_selection/plot_validation_curve.html
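The raw arrays above hold one row per parameter value and one column per fold. A common next step is to average across folds so each `gamma` gets a single training and validation score. The sketch below assumes Iris as stand-in data, and picking the value with the highest mean validation score is just one simple selection rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

# Iris stands in for the notebook's fruit data
X, y = load_iris(return_X_y=True)
param_range = np.logspace(-3, 3, 4)
train_scores, test_scores = validation_curve(SVC(), X, y,
                                             param_name='gamma',
                                             param_range=param_range, cv=3)

# Average over folds (columns) to get one score per gamma value (rows)
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)

for gamma, tr, te in zip(param_range, train_mean, test_mean):
    print('gamma={:<8g} train={:.3f}  validation={:.3f}'.format(gamma, tr, te))

# Simple rule (an assumption): keep the gamma with the best mean validation score
best_gamma = param_range[np.argmax(test_mean)]
print('gamma with best mean validation score:', best_gamma)
```

A large gap between the training and validation means for a given `gamma` suggests overfitting at that setting.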