# Cross-Validation

## The Basic Machine Learning Process
To evaluate our supervised models, we split our dataset into a training set and a test set using the train_test_split function.

We then build a model on the training set by calling the fit method, and evaluate it on the test set using the score method, which for classification computes the fraction of correctly classified samples.

Below is an example of that process: 

In [1]:
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# create a synthetic dataset
X, y = make_blobs(random_state=0)
# split data and labels into a training and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=44)
# instantiate a model and fit it to the training set
logreg = LogisticRegression().fit(X_train, y_train)
# evaluate the model on the test set
print("Test set score: {:.2f}".format(logreg.score(X_test, y_test)))

Test set score: 1.00
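
To make concrete what the score method reports for a classifier, the sketch below recomputes the accuracy by hand as the fraction of test samples whose predicted label matches the true label, and compares it with score. It reuses the same dataset and split as the cell above; the variable names manual_accuracy and builtin_accuracy are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# same synthetic dataset and split as in the example above
X, y = make_blobs(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=44)
logreg = LogisticRegression().fit(X_train, y_train)

# accuracy by hand: fraction of test samples whose prediction equals the true label
manual_accuracy = np.mean(logreg.predict(X_test) == y_test)
# accuracy as reported by the estimator's score method
builtin_accuracy = logreg.score(X_test, y_test)

print("Manual accuracy: {:.2f}".format(manual_accuracy))
print("score() accuracy: {:.2f}".format(builtin_accuracy))
```

Both numbers should agree, since for classifiers score is defined as the mean accuracy on the given data.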


## Why we split the data
We are interested in measuring how well our model generalizes to new, previously unseen data.

We are not interested in how well our model fits the training set, but rather in how well it can make predictions for data that was not observed during training.
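
One way to see why the training score is a poor guide is to use a model that can memorize its training data. The sketch below is an illustration under assumptions that are not part of the example above: it uses a one-nearest-neighbor classifier and blobs with a larger cluster_std so the classes overlap. Each training point is its own nearest neighbor, so the training accuracy is perfect, while the test accuracy is typically lower.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# overlapping clusters (cluster_std=3.0 is an assumed value chosen for illustration)
X, y = make_blobs(centers=3, cluster_std=3.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=44)

# a 1-nearest-neighbor classifier effectively memorizes the training set
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

train_score = knn.score(X_train, y_train)  # evaluated on data seen during fit
test_score = knn.score(X_test, y_test)     # evaluated on held-out data

print("Training set score: {:.2f}".format(train_score))
print("Test set score: {:.2f}".format(test_score))
```

Only the test set score says anything about how the model behaves on data it has not seen.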

## Better Evaluation Approaches

### Cross-Validation (A More Robust Way to Assess Generalization Performance)

### Grid Search (An Effective Method for Adjusting Parameters in Supervised Models for the Best Generalization Performance)
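
As a rough sketch of what a grid search can look like in scikit-learn, the example below uses GridSearchCV to try several values of the regularization parameter C for LogisticRegression on the iris dataset. The candidate values in param_grid, the max_iter setting, and the split are assumptions chosen for illustration, not recommendations from this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
# hold out a test set that the parameter search never sees
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# candidate values for C (assumed for illustration)
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

# each candidate is scored with 5-fold cross-validation on the training set
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters: {}".format(grid.best_params_))
print("Best cross-validation score: {:.2f}".format(grid.best_score_))
# the refit best model is evaluated once on the held-out test set
print("Test set score: {:.2f}".format(grid.score(X_test, y_test)))
```

Keeping the test set out of the search means the final score is not influenced by the parameter selection.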

### Cross-Validation


In [8]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

In [19]:
iris = load_iris()
logreg = LogisticRegression(max_iter=400)

In [20]:
scores = cross_val_score(logreg, iris.data, iris.target, cv=5)
for score in scores:
    print(score)
print("Average cross-validation score: {:.2f}".format(scores.mean()))

0.9666666666666667
1.0
0.9333333333333333
0.9666666666666667
1.0
Average cross-validation score: 0.97
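
To show what cross_val_score is doing with cv=5, the sketch below repeats the computation with an explicit loop: the data is split into five folds, and each fold serves once as the test set while the model is trained on the other four. It assumes StratifiedKFold without shuffling, which is what scikit-learn uses for a classifier when cv is an integer; the names manual_scores and reference_scores are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = load_iris()
X, y = iris.data, iris.target
logreg = LogisticRegression(max_iter=400)

manual_scores = []
# each of the 5 folds is used once as the test set
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    # fit a fresh, unfitted copy of the model on the other 4 folds
    model = clone(logreg).fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# the same evaluation done by cross_val_score in one call
reference_scores = cross_val_score(logreg, X, y, cv=5)

print("Manual fold scores: {}".format(manual_scores))
print("cross_val_score:    {}".format(reference_scores))
print("Mean: {:.2f}, standard deviation: {:.2f}".format(
    manual_scores.mean(), manual_scores.std()))
```

Reporting the spread across folds alongside the mean gives a sense of how much the score depends on the particular split.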
