# Leave-One-Out Cross-Validation

For ridge regression, the leave-one-out cross-validation (LOOCV) error can be computed with a closed-form shortcut that needs only a single fit on the full data set, instead of refitting the model once for each held-out data point.
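
Written in the notation of the code in this notebook (the symbols $A$, $\hat{b}$, $\hat{y}_i$ and $h_i$ mirror the variables `A`, `b_hat`, `y_hat` and `h`; $x_i$ is the $i$-th row of $X$), and assuming the ridge penalty has the form $\lVert \Gamma b \rVert^2$, the shortcut can be stated as:

$$
A = X^\top X + \Gamma^\top \Gamma, \qquad
\hat{b} = A^{-1} X^\top y, \qquad
\hat{y}_i = x_i^\top \hat{b}, \qquad
h_i = x_i^\top A^{-1} x_i
$$

$$
\mathrm{LOOCV} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^2
$$

The identity behind it is that the prediction error for point $i$ from a model fitted without that point equals $(y_i - \hat{y}_i) / (1 - h_i)$. Removing row $i$ changes $A$ only by the rank-one term $x_i x_i^\top$, so the result follows from the Sherman-Morrison formula.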

*This notebook verifies the more efficient computation against the brute-force approach.*

## Import Dependencies

In [1]:
import numpy as np

## Generate Random Data

In [2]:
np.random.seed(0)
n, p = 25, 3
sigma = 0.1
X = np.random.random_sample((n, p))
beta = np.random.random_sample(p)
y = np.dot(X, beta) + np.random.normal(scale=sigma, size=n)
Gamma = np.diag(np.random.random_sample(p))

## Compute LOOCV the Slow Way

In [3]:
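def compute_loocv_slow(X, y, Gamma):
    """Brute-force LOOCV: refit the ridge model once per held-out row."""
    result = 0
    for i in range(len(y)):
        # Held-out observation.
        x_i = X[i,:]
        y_i = y[i]
        # Training data with row i removed.
        X_mi = np.delete(X, i, 0)
        y_mi = np.delete(y, i)
        # Ridge fit on the remaining rows.
        A_mi = np.dot(X_mi.T, X_mi) + np.dot(Gamma.T, Gamma)
        b_hat_mi = np.dot(np.linalg.inv(A_mi), np.dot(X_mi.T, y_mi))
        # Squared prediction error on the held-out observation.
        y_hat_i = np.dot(x_i, b_hat_mi)
        result += (y_i - y_hat_i)**2
    return result / len(y)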
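
Each pass through the loop above builds a matrix `A_mi` that differs from the full-data matrix only by the outer product of the held-out row with itself. The sketch below illustrates that rank-one relationship numerically with the Sherman-Morrison formula. It is an illustration rather than part of the original notebook: the data come from `np.random.default_rng` (so they differ from the notebook's data), and the row index `i = 4` is an arbitrary choice.

```python
import numpy as np

# Illustrative data only; not the same random stream as the notebook.
rng = np.random.default_rng(0)
X = rng.random((25, 3))
Gamma = np.diag(rng.random(3))

# Full-data matrix and its inverse.
A = X.T @ X + Gamma.T @ Gamma
A_inv = np.linalg.inv(A)

i = 4  # arbitrary held-out row, chosen for illustration
x_i = X[i]
X_mi = np.delete(X, i, 0)

# Direct route: rebuild the matrix from the remaining rows and invert it.
A_mi_inv_direct = np.linalg.inv(X_mi.T @ X_mi + Gamma.T @ Gamma)

# Sherman-Morrison route: A_mi = A - x_i x_i^T, so
# A_mi^{-1} = A^{-1} + (A^{-1} x_i)(A^{-1} x_i)^T / (1 - h_i).
u = A_inv @ x_i
h_i = x_i @ u
A_mi_inv_update = A_inv + np.outer(u, u) / (1 - h_i)

# Largest entrywise difference between the two routes.
max_abs_diff = np.max(np.abs(A_mi_inv_direct - A_mi_inv_update))
print(max_abs_diff)
```

The printed difference should be at the level of floating-point rounding, which is what lets a single inverse of the full-data matrix stand in for all of the per-row inverses.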

## Compute LOOCV the Fast Way

In [4]:
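def compute_loocv(X, y, Gamma):
    """LOOCV from a single ridge fit on all rows."""
    A = np.dot(X.T, X) + np.dot(Gamma.T, Gamma)
    A_inv = np.linalg.inv(A)
    b_hat = np.dot(A_inv, np.dot(X.T, y))
    y_hat = np.dot(X, b_hat)
    # h[i] = x_i^T A^{-1} x_i, the i-th diagonal entry of X A^{-1} X^T.
    h = np.array([np.dot(x_i, np.dot(A_inv, x_i)) for x_i in X])
    # Dividing each residual by (1 - h[i]) gives the held-out residual.
    return np.sum(((y - y_hat) / (1 - h))**2) / len(y)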
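
As a possible variation on the function above, the same quantity can be computed without forming the inverse explicitly and without a Python-level loop over rows. The sketch below is one way to do that, assuming `np.linalg.solve` and `np.einsum` are acceptable substitutes; the name `compute_loocv_vectorized` is hypothetical and does not appear in the original notebook.

```python
import numpy as np

def compute_loocv_vectorized(X, y, Gamma):
    """Hypothetical vectorized variant of the single-fit LOOCV formula."""
    A = X.T @ X + Gamma.T @ Gamma
    # Solve A Z = X^T instead of inverting A; Z = A^{-1} X^T has shape (p, n).
    Z = np.linalg.solve(A, X.T)
    b_hat = Z @ y
    residuals = y - X @ b_hat
    # h[i] = sum_j X[i, j] * Z[j, i], i.e. the diagonal of X A^{-1} X^T,
    # computed without building the full n-by-n matrix.
    h = np.einsum("ij,ji->i", X, Z)
    return np.mean((residuals / (1 - h)) ** 2)
```

Called as `compute_loocv_vectorized(X, y, Gamma)` with the arrays generated earlier, it should agree with `compute_loocv` up to floating-point rounding, since both evaluate the same formula.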

## Verify Both Approaches Result in the Same Number

In [5]:
"%f ~ %f" % (compute_loocv_slow(X, y, Gamma), compute_loocv(X, y, Gamma))

'0.007926 ~ 0.007926'
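
The comparison above matches the two averages to six printed decimals. A stricter check, sketched below, compares the individual held-out residuals with a floating-point tolerance via `np.allclose`. This is an assumption-laden illustration rather than part of the original notebook: it regenerates its own data with `np.random.default_rng`, so the values differ from those used above.

```python
import numpy as np

# Illustrative data only; not the same random stream as the notebook.
rng = np.random.default_rng(0)
n, p = 25, 3
X = rng.random((n, p))
y = X @ rng.random(p) + rng.normal(scale=0.1, size=n)
Gamma = np.diag(rng.random(p))
P = Gamma.T @ Gamma  # penalty matrix

# Held-out residuals from n separate refits.
held_out = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_mi = np.linalg.solve(X[keep].T @ X[keep] + P, X[keep].T @ y[keep])
    held_out[i] = y[i] - X[i] @ b_mi

# Scaled residuals from a single fit on all rows.
Z = np.linalg.solve(X.T @ X + P, X.T)
h = np.einsum("ij,ji->i", X, Z)
scaled = (y - X @ (Z @ y)) / (1 - h)

# Compare point by point rather than only the averaged error.
print(np.allclose(held_out, scaled))
```

Checking each residual separately rules out the possibility that per-point discrepancies cancel in the average.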