## CHAPTER 11
---
# MODEL EVALUATION

---
- In this chapter we will examine strategies for evaluating the quality of models created through our learning algorithms. 
- It might appear strange to discuss model evaluation before discussing how to create models, but there is a method to our madness.
- Models are only as useful as the quality of their predictions, and thus fundamentally our goal is not to create models (which is easy) but to create high-quality models (which is hard). 
- Therefore, before we explore the myriad learning algorithms, we first set up how we can evaluate the models they produce.

## 11.1 Cross-Validating Models
*Problem:* We want to evaluate how well our model will work in the real world.

*Solution:* We create a pipeline that:
- preprocesses the data,
- trains the model, and then
- evaluates it using cross-validation.

```python
# Load libraries
from sklearn import datasets
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# load digits dataset
digits = datasets.load_digits()

# create features matrix
features = digits.data

# create target vector
target = digits.target

# create standardizer
standardizer = StandardScaler()

# create logistic regression object
logit = LogisticRegression()

# create a pipeline that standardizes, then runs logistic regression
pipeline = make_pipeline(standardizer, logit)

# create k-fold cross-validation
kf = KFold(n_splits=10, shuffle=True, random_state=1)

# conduct k-fold cross-validation
cv_results = cross_val_score(pipeline,  # pipeline
                             features,  # feature matrix
                             target,  # target vector
                             cv=kf,  # cross-validation technique
                             scoring="accuracy",  # performance metric
                             n_jobs=-1)  # use all CPU cores

# calculate mean
cv_results.mean()
```

```
0.9693916821849783
```
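For intuition, the loop below is a rough hand-written equivalent of the `cross_val_score` call above: for each of the k folds it clones the pipeline, fits the scaler and the classifier on the other k-1 folds, scores accuracy on the held-out fold, and finally averages the fold scores. It is a sketch, not scikit-learn's internal implementation. `max_iter=1000` is an assumption added to avoid convergence warnings, so the mean may differ slightly from the output shown above.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
features, target = digits.data, digits.target

# max_iter=1000 is an assumption (the original uses the default) to avoid convergence warnings
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
kf = KFold(n_splits=10, shuffle=True, random_state=1)

fold_scores = []
for train_idx, test_idx in kf.split(features):
    # fresh, unfitted copy of the pipeline for every fold
    model = clone(pipeline)
    # fit the scaler and the classifier on the k-1 training folds only
    model.fit(features[train_idx], target[train_idx])
    # accuracy on the single held-out fold
    fold_scores.append(model.score(features[test_idx], target[test_idx]))

fold_scores = np.array(fold_scores)

# the library version on the same folds, for comparison
reference = cross_val_score(pipeline, features, target, cv=kf, scoring="accuracy")

print(len(fold_scores))                      # one score per fold
print(fold_scores.mean(), reference.mean())  # the two averages
```

Because `kf` uses an integer `random_state`, both computations see identical folds in the same order, so the per-fold scores and their averages should agree.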

### Discussion:
- Our goal is to evaluate how well our model does on data it has never seen before (e.g., a new customer, a new crime, a new image). 
- **The validation (often called holdout) approach:**
    - split the data into a training set and a test set
    - set the test set aside and pretend it has never been seen before
    - train the model on the training set so it learns to make the best predictions it can
    - evaluate the model on the test set to see how well it does
- The validation approach has two major weaknesses:
    - First, the performance of the model can be highly dependent on which few observations were selected for the test set.
    - Second, the model is neither trained on all the available data nor evaluated on all of it.
- **The k-fold cross-validation (KFCV) strategy:**
    - the data is split into k parts, called *"folds"*
    - the model is trained using k-1 folds, combined into a training set
    - the remaining fold is used as the test set
    - this is repeated k times, each time using a different fold as the test set
    - the performance of the model across the k iterations is then averaged to produce an overall measurement
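The first weakness of the validation approach is easy to observe. The sketch below repeats a single train/test split of the digits data with five different random seeds and records the test-set accuracy of the same standardize-then-logistic-regression pipeline each time. The seeds, the 10% test size (chosen to match one fold of 10-fold cross-validation), and `max_iter=1000` are illustrative assumptions. How large the spread is depends on the data, but whatever spread appears comes purely from which observations landed in the test set; k-fold cross-validation dampens this by averaging over k test folds that together cover every observation.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
features, target = digits.data, digits.target

holdout_scores = []
for seed in range(5):
    # a different random split for each seed; 10% of the observations become the test set
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.1, random_state=seed
    )
    # max_iter=1000 is an assumption to avoid convergence warnings
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)                         # train on the training set only
    holdout_scores.append(model.score(X_test, y_test))  # accuracy on the held-out test set

holdout_scores = np.array(holdout_scores)
print(np.round(holdout_scores, 3))                  # one accuracy per split
print(holdout_scores.max() - holdout_scores.min())  # spread caused by the choice of split alone
```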