#### Introduction to Statistical Learning, Lab 5.2

# Leave-One-Out & $k$-Fold Cross Validation

We use leave-one-out cross validation (LOOCV) and $k$-fold cross validation to estimate the test mean squared error (MSE) of several linear models on the `Auto` data set.

We will use the linear models and tools from the `sklearn` library in this lab.
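
As a reminder of what both procedures compute, the $k$-fold estimate of the test error is the average of the per-fold errors. The notation below ($\mathrm{MSE}_i$ for the error on the $i$-th held-out fold) is an assumption chosen for this note:

$$\mathrm{CV}_{(k)} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{MSE}_i$$

LOOCV is the special case $k = n$, where each fold holds out a single observation.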

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn.linear_model as skl_lm
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from islpy import datasets, utils, lmplots
sns.set()
%matplotlib inline

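We first load the data set, take `horsepower` as the predictor and `mpg` as the response, and create an unfitted linear regression estimator for the model `mpg ~ horsepower`.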

In [None]:
auto = datasets.Auto()
x = auto[['horsepower']]
y = auto['mpg']
model = skl_lm.LinearRegression()

#### Leave-One-Out

LOOCV can be automated with `sklearn`'s `LeaveOneOut` splitter, which is passed to `cross_val_score` through the `cv` argument.

In [None]:
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import KFold

In [None]:
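# One split per observation: n - 1 rows for training, 1 row for testing.
loocv = LeaveOneOut()
# The scorer returns negated MSE values (higher is better), one per split.
scores = cross_val_score(model, x, y, scoring="neg_mean_squared_error", cv=loocv)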

In [None]:
print(f'Folds: {len(scores)}, MSE: {np.mean(np.abs(scores)):.2f}, STD: {np.std(scores):.2f}')

Equivalently, we can use `sklearn`'s `KFold` with $k = n$, since each fold then contains exactly one observation.

In [None]:
cv = KFold(n_splits=auto.shape[0], random_state=None, shuffle=False)
scores = cross_val_score(model, x, y, scoring="neg_mean_squared_error", cv=cv)

In [None]:
print(f'Folds: {len(scores)}, MSE: {np.mean(np.abs(scores)):.2f}, STD: {np.std(scores):.2f}')
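
For least-squares fits, LOOCV does not actually require $n$ refits: a known identity gives the same estimate from a single fit and the leverages $h_i$, as $\frac{1}{n}\sum_i \left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2$. The sketch below checks this numerically with plain `numpy`. The synthetic data and the helper names `loocv_brute_force` and `loocv_shortcut` are illustrative assumptions, not part of the lab or of `sklearn`.

```python
import numpy as np

# Synthetic stand-in for (horsepower, mpg) pairs; this is NOT the Auto data.
rng = np.random.default_rng(0)
n = 50
x_demo = rng.uniform(50, 200, size=n)
y_demo = 40 - 0.15 * x_demo + rng.normal(0, 3, size=n)

# Design matrix with an explicit intercept column.
X = np.column_stack([np.ones(n), x_demo])


def loocv_brute_force(X, y):
    """Refit the least-squares model n times, holding out one row each time."""
    sq_errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sq_errors.append((y[i] - X[i] @ beta) ** 2)
    return float(np.mean(sq_errors))


def loocv_shortcut(X, y):
    """Single fit plus leverages: mean(((y_i - yhat_i) / (1 - h_i)) ** 2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Diagonal of the hat matrix H = X (X^T X)^{-1} X^T.
    leverage = np.einsum('ij,ji->i', X, np.linalg.pinv(X))
    return float(np.mean((residuals / (1 - leverage)) ** 2))


brute = loocv_brute_force(X, y_demo)
shortcut = loocv_shortcut(X, y_demo)
print(f'brute force: {brute:.4f}, shortcut: {shortcut:.4f}')
```

The two numbers should agree up to floating-point error. The shortcut holds only for models fit by ordinary least squares, so it applies to the polynomial fits in this lab but not to arbitrary estimators.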

#### $k$-Fold

We estimate the test MSE of polynomial models of degree 1 through 10 using $k$-fold cross validation with $k = 10$.

In [None]:
errors = []
# The same shuffled 10-fold split is reused for every degree (fixed seed).
cv = KFold(n_splits=10, random_state=1, shuffle=True)
for degree in range(1, 11):
    # LinearRegression fits its own intercept, so the bias column is omitted.
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    x_poly = poly.fit_transform(x)
    scores = cross_val_score(model, x_poly, y, scoring="neg_mean_squared_error", cv=cv)
    errors.append(np.mean(np.abs(scores)))
    print(f'Degree: {degree:2d}, MSE: {errors[-1]:.2f}')
print(errors)
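
To make the procedure less of a black box, here is a minimal by-hand sketch of $k$-fold cross validation for polynomial fits, using only `numpy`. The function name `kfold_mse`, the synthetic quadratic data, and the seeds are assumptions for illustration. The shuffling also differs from `KFold(random_state=1)`, so the folds will not match `sklearn`'s.

```python
import numpy as np


def kfold_mse(x, y, degree, k=10, seed=1):
    """Estimate the test MSE of a polynomial fit of the given degree by k-fold CV."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))    # shuffle the row indices once
    folds = np.array_split(indices, k)   # k folds of nearly equal size
    fold_mse = []
    for held_out in folds:
        # Train on every row that is not in the held-out fold.
        train = np.setdiff1d(indices, held_out)
        coefs = np.polyfit(x[train], y[train], deg=degree)
        pred = np.polyval(coefs, x[held_out])
        fold_mse.append(np.mean((y[held_out] - pred) ** 2))
    return float(np.mean(fold_mse))


# Synthetic data with a quadratic trend; this is NOT the Auto data.
rng = np.random.default_rng(0)
x_demo = rng.uniform(-2, 2, size=200)
y_demo = 1 + 2 * x_demo - 1.5 * x_demo**2 + rng.normal(0, 0.5, size=200)

demo_errors = {d: kfold_mse(x_demo, y_demo, d) for d in range(1, 4)}
for d, mse in demo_errors.items():
    print(f'Degree: {d}, MSE: {mse:.3f}')
```

On data like this, the estimated MSE should drop sharply from degree 1 to degree 2 and change little afterwards, which is the pattern one looks for when choosing a degree.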