# Model Evaluation Using Cross-Validation
This notebook discusses the practical aspects of assessing the generalization performance of a model via cross-validation instead of a single train-test split.

## Data Preparation

In [1]:
import pandas as pd

adult_census = pd.read_csv("../adult.csv")

# Separate the target column from the features.
target_name = "class"
target = adult_census[target_name]
data = adult_census.drop(columns=target_name)

# Restrict the features to the numerical columns.
numerical_columns = ["age", "capital-gain", "capital-loss", "hours-per-week"]
data_numeric = data[numerical_columns]

In [3]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(), LogisticRegression())

## Cross-Validation
When we split the data into a single train-test pair, the score of the model depends on how the split is made. A single split also gives no information about how much that score varies from one split to another.
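To make the dependence on the split concrete, here is a minimal sketch that scores the same pipeline on several different random train-test splits. It assumes a synthetic dataset from `make_classification` (so that it runs without `adult.csv`); the sample size and the seeds are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the census data (an assumption for this sketch).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

scores = []
for seed in range(5):
    # Each seed produces a different random train-test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=seed
    )
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print(scores)
```

The printed accuracies generally differ from one seed to the next, which is the variability that a single split hides.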

**Cross-Validation:** a technique that repeats the fit-and-score procedure so that the training and testing sets are different each time. Generalization performance metrics are collected for each repetition and then aggregated. *This can be computationally expensive, as it requires training several models instead of one*.

One such strategy is **K-fold**, where the entire dataset is split into `K` partitions. The fitting and scoring procedure is repeated `K` times, where at each iteration `K-1` partitions are used to fit the model and `1` partition is used to score.
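The K-fold partitioning described above can be inspected directly with scikit-learn's `KFold` splitter. The following is a small sketch on a toy array of 10 samples (an assumption chosen so that the folds are easy to read), not on the census data.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples with 2 features each (illustrative only).
X = np.arange(20).reshape(10, 2)

# K = 5 without shuffling, so each test fold is a contiguous block of samples.
kfold = KFold(n_splits=5)
splits = list(kfold.split(X))

for fold, (train_idx, test_idx) in enumerate(splits):
    # K-1 partitions are used for fitting, 1 partition for scoring.
    print(f"fold {fold}: train={train_idx}, test={test_idx}")
```

Each sample appears in the test partition exactly once across the `K` iterations.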

In [4]:
from sklearn.model_selection import cross_validate

cv_result = cross_validate(
    model,
    data_numeric,
    target,
    cv=5
)
cv_result

{'fit_time': array([0.03228688, 0.02814007, 0.0289371 , 0.02435994, 0.02446795]),
 'score_time': array([0.0071609 , 0.0058589 , 0.00586581, 0.00609303, 0.00567818]),
 'test_score': array([0.79557785, 0.80049135, 0.79965192, 0.79873055, 0.80456593])}

`cross_validate` performs cross-validation by taking a model, the data, the target, and a strategy (`cv`).

The output is a `dict`, which contains the `fit_time`, the `score_time`, and the `test_score` for each fold.
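The per-fold scores are usually summarized by their mean and standard deviation. As a short sketch, the snippet below copies the `test_score` values from the output above into a standalone array so that it runs on its own; in the notebook, `cv_result["test_score"]` could be used directly.

```python
import numpy as np

# test_score values copied from the cross_validate output shown above.
test_score = np.array(
    [0.79557785, 0.80049135, 0.79965192, 0.79873055, 0.80456593]
)

mean_score = test_score.mean()
# NumPy's default is the population standard deviation (ddof=0).
std_score = test_score.std()

print(f"Accuracy: {mean_score:.3f} ± {std_score:.3f}")
```

With only five folds, the standard deviation is a rough indication of the variability rather than a precise estimate.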

By default, the `cross_validate` function discards the `K` models that were trained on the different subsets of the dataset. This is because *the goal of cross-validation is not to train a model, but to estimate the generalization performance of a model that would have been trained on the full training set.*
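If the fitted models are needed for inspection, `cross_validate` accepts `return_estimator=True`, which adds an `estimator` entry to the returned `dict`. The sketch below assumes a synthetic dataset from `make_classification` in place of the census data so that it is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the census data (an assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression())

# return_estimator=True keeps the model fitted on each training fold.
cv_result = cross_validate(model, X, y, cv=5, return_estimator=True)
estimators = cv_result["estimator"]

print(len(estimators))
```

Each element of `estimators` is a pipeline fitted on `K-1` partitions only, so these models are useful for studying how the fit varies across folds rather than as a final model.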