Given a feature matrix $X$ (whose $i$th row is $x_i^T$), a target vector $y$, and a regularization matrix $\Gamma$, let

$$\hat{b}_\Gamma=\left( X^T X + \Gamma^T \Gamma\right)^{-1}X^T y$$

denote the ridge regression model fit to the training data, and let $\hat{b}_{-i, \Gamma}$ denote the model fit to the data with the $i$th observation (row $x_i^T$ and target $y_i$) removed.
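As a minimal sketch, the estimator above can be computed without forming an explicit inverse by solving the regularized normal equations $\left(X^T X + \Gamma^T \Gamma\right) b = X^T y$ with `np.linalg.solve`. The helper name `ridge_fit` and the random data below are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def ridge_fit(X, y, Gamma):
    """Hypothetical helper: ridge estimate (X^T X + Gamma^T Gamma)^{-1} X^T y.

    Solves the linear system directly instead of inverting the matrix.
    """
    A = X.T @ X + Gamma.T @ Gamma
    return np.linalg.solve(A, X.T @ y)

# Illustrative random data (assumed shapes: n = 25 rows, k = 3 features)
rng = np.random.default_rng(0)
X = rng.random((25, 3))
y = rng.random(25)
Gamma = np.diag(rng.random(3))
b_hat = ridge_fit(X, y, Gamma)
```

Solving the system is generally preferred over `np.linalg.inv` for accuracy; for a 3-by-3 problem like this one, both give essentially the same answer.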

This notebook verifies that the leave-one-out cross-validation (LOOCV) error

$$\sum_i \left| y_i - x_i^T \hat{b}_{-i,\Gamma} \right|^2$$

can be computed from a single fit on the full data via the more efficient formulation

$$\sum_i \left| \frac{y_i - \hat{y}_i}{1 - h_i} \right|^2$$

where

$$\hat{y} = X \hat{b}_\Gamma$$

and

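$$h_i = x_i^T \left(X^T X + \Gamma^T \Gamma\right)^{-1} x_i$$

is the $i$th diagonal entry of the hat matrix $X\left(X^T X + \Gamma^T \Gamma\right)^{-1} X^T$, i.e. the leverage of observation $i$.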
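The identity can be derived with the Sherman-Morrison formula. Write $A = X^T X + \Gamma^T \Gamma$ (shorthand introduced here) and note that the penalty $\Gamma^T \Gamma$ does not depend on which row is held out, so removing observation $i$ changes only the data terms. Assuming $h_i < 1$:

$$A_{-i} = A - x_i x_i^T, \qquad X_{-i}^T y_{-i} = X^T y - x_i y_i$$

$$A_{-i}^{-1} = A^{-1} + \frac{A^{-1} x_i x_i^T A^{-1}}{1 - h_i} \quad\Longrightarrow\quad x_i^T A_{-i}^{-1} = \frac{x_i^T A^{-1}}{1 - h_i}$$

$$x_i^T \hat{b}_{-i,\Gamma} = \frac{x_i^T A^{-1}\left(X^T y - x_i y_i\right)}{1 - h_i} = \frac{\hat{y}_i - h_i y_i}{1 - h_i}$$

$$y_i - x_i^T \hat{b}_{-i,\Gamma} = \frac{(1 - h_i)\, y_i - \hat{y}_i + h_i y_i}{1 - h_i} = \frac{y_i - \hat{y}_i}{1 - h_i}$$

Squaring and summing over $i$ gives the efficient formulation.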

In [1]:
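import numpy as np
np.random.seed(0)
# Synthetic problem: n observations, k features, noise standard deviation sigma
n, k = 25, 3
sigma = 0.1
X = np.random.random_sample((n,k))
beta = np.random.random_sample(k)
y = np.dot(X, beta) + np.random.normal(scale=sigma, size=n)
# Random diagonal regularization matrix
Gamma = np.diag(np.random.random_sample(k))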

In [2]:
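def compute_loocv_slow(X, y, Gamma):
    # Brute force: refit the ridge model n times, holding out one observation each time
    result = 0
    for i in range(len(y)):
        x_i = X[i,:]
        y_i = y[i]
        # Training data with row i removed
        X_mi = np.delete(X, i, 0)
        y_mi = np.delete(y, i)
        # Ridge fit on the remaining n - 1 rows
        A_mi = np.dot(X_mi.T, X_mi) + np.dot(Gamma.T, Gamma)
        b_hat_mi = np.dot(np.linalg.inv(A_mi), np.dot(X_mi.T, y_mi))
        # Squared error of the prediction for the held-out row
        y_hat_i = np.dot(x_i, b_hat_mi)
        result += (y_i - y_hat_i)**2
    return result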

In [3]:
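def compute_loocv(X, y, Gamma):
    # Single ridge fit on the full data
    A = np.dot(X.T, X) + np.dot(Gamma.T, Gamma)
    A_inv = np.linalg.inv(A)
    b_hat = np.dot(A_inv, np.dot(X.T, y))
    y_hat = np.dot(X, b_hat)
    # Leverages h_i = x_i^T A^{-1} x_i
    h = np.array([np.dot(x_i, np.dot(A_inv, x_i)) for x_i in X])
    # Rescale each full-data residual by 1 / (1 - h_i) to get the held-out error
    return np.sum(((y - y_hat) / (1 - h))**2)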
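The list comprehension for `h` can also be vectorized. One possible sketch (the name `compute_loocv_vectorized` is hypothetical) solves $A Z = X^T$ once and reads off the leverages as the diagonal of $X Z$ with `np.einsum`, without forming the full $n \times n$ hat matrix:

```python
import numpy as np

def compute_loocv_vectorized(X, y, Gamma):
    # Regularized Gram matrix A = X^T X + Gamma^T Gamma
    A = X.T @ X + Gamma.T @ Gamma
    # Z = A^{-1} X^T (shape k x n), obtained by solving rather than inverting
    Z = np.linalg.solve(A, X.T)
    # Fitted values: y_hat = X (A^{-1} X^T y) = X (Z y)
    y_hat = X @ (Z @ y)
    # h_i = sum_j X[i, j] * Z[j, i] = x_i^T A^{-1} x_i (diagonal of X Z)
    h = np.einsum("ij,ji->i", X, Z)
    return np.sum(((y - y_hat) / (1 - h)) ** 2)
```

This costs one $k \times k$ solve with $n$ right-hand sides plus $O(nk)$ work for the diagonal, instead of $n$ separate refits.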

In [4]:
compute_loocv_slow(X, y, Gamma)

0.1981535144765843

In [5]:
compute_loocv(X, y, Gamma)

0.19815351447658439
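The two values agree to within floating-point rounding. Since the notebook checks a single random instance, a sketch like the following repeats the comparison for several regularization strengths; the scale values and the helper names `loocv_brute_force` and `loocv_shortcut` are illustrative assumptions.

```python
import numpy as np

def loocv_brute_force(X, y, Gamma):
    # Refit the ridge model n times, each time holding out one row
    G = Gamma.T @ Gamma
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        X_mi, y_mi = X[keep], y[keep]
        b_mi = np.linalg.solve(X_mi.T @ X_mi + G, X_mi.T @ y_mi)
        total += (y[i] - X[i] @ b_mi) ** 2
    return total

def loocv_shortcut(X, y, Gamma):
    # One full-data fit plus the leverage correction (y_i - y_hat_i) / (1 - h_i)
    A = X.T @ X + Gamma.T @ Gamma
    Z = np.linalg.solve(A, X.T)
    h = np.einsum("ij,ji->i", X, Z)
    residuals = y - X @ (Z @ y)
    return np.sum((residuals / (1 - h)) ** 2)

rng = np.random.default_rng(0)
X = rng.random((25, 3))
y = X @ rng.random(3) + rng.normal(scale=0.1, size=25)

results = []
for scale in [1e-3, 0.1, 1.0, 10.0]:
    # Assumed family of diagonal regularizers of increasing strength
    Gamma = scale * np.diag(rng.random(3))
    slow, fast = loocv_brute_force(X, y, Gamma), loocv_shortcut(X, y, Gamma)
    results.append((scale, slow, fast))
    print(f"scale={scale:g}: brute force={slow:.12f}, shortcut={fast:.12f}")
```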