"One of the main advantages of the maximum likelihood technique is that
it provides a test of the hypothesis $H_k$ that $k$ common factors are
sufficient to describe the data against the alternative that $\Sigma$ has no
constraints." (Section 9.5)

$$
-2 \log \lambda = n F(\hat\Lambda, \hat\Psi) = n \left[ \text{tr}\left((\hat{\Lambda}\hat{\Lambda}' + \hat\Psi)^{-1} S\right) - \log\left|(\hat{\Lambda}\hat{\Lambda}' + \hat\Psi)^{-1} S\right| - p \right]
$$

$-2 \log \lambda$ has an asymptotic $\chi^2_s$ distribution under $H_k$, with degrees of freedom
$$
s = \frac{1}{2}(p - k)^2 - \frac{1}{2}(p + k)
$$
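
As a quick check of the formula above, the degrees of freedom can be tabulated for a few values of $k$. This is a minimal sketch; the helper name `degrees_of_freedom` is an assumption and is not part of `utils`. Note that $s$ must be positive for the test to be usable, which limits how many factors can be tested for a given $p$.

```python
def degrees_of_freedom(p: int, k: int) -> int:
    """s = ((p - k)^2 - (p + k)) / 2 for the test of H_k.

    (p - k)^2 and (p + k) always have the same parity, so the
    integer division below is exact.
    """
    return ((p - k) ** 2 - (p + k)) // 2


# Example with p = 5 observed variables (hypothetical loop for illustration)
for k in (1, 2, 3):
    s = degrees_of_freedom(5, k)
    status = "testable" if s > 0 else "not testable (s <= 0)"
    print(f"k = {k}: s = {s}, {status}")
```

With $p = 5$, only $k = 1$ and $k = 2$ leave a positive number of degrees of freedom.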

The approximation improves if $n$ is replaced by
$$
n' = n - 1 - \frac{1}{6}(2p + 5) - \frac{2}{3}k
$$

$$
U = n' F(\hat\Lambda, \hat\Psi) \sim \chi^2_s
$$
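
The two formulas above can be sketched directly in NumPy. This is an illustrative sketch only: the function names `discrepancy` and `corrected_statistic` and the toy loadings are assumptions, and the notebook itself relies on `calculate_objective` from `utils`, whose internals are not shown here.

```python
import numpy as np


def discrepancy(Lambda, Psi, S):
    """F(Lambda, Psi) = tr(Sigma^{-1} S) - log|Sigma^{-1} S| - p,
    where Sigma = Lambda Lambda' + Psi."""
    p = S.shape[0]
    Sigma = Lambda @ Lambda.T + Psi
    M = np.linalg.solve(Sigma, S)        # Sigma^{-1} S without an explicit inverse
    _, logdet = np.linalg.slogdet(M)     # log of |det(M)|, numerically stable
    return np.trace(M) - logdet - p


def corrected_statistic(F, n, p, k):
    """U = n' F with n' = n - 1 - (2p + 5)/6 - 2k/3."""
    n_prime = n - 1 - (2 * p + 5) / 6 - 2 * k / 3
    return n_prime * F


# Toy one-factor example (made-up loadings): when S equals the fitted
# covariance exactly, the discrepancy is zero up to rounding error.
Lambda = np.array([[0.8], [0.6], [0.5]])
Psi = np.diag(1 - (Lambda ** 2).ravel())
S = Lambda @ Lambda.T + Psi
F_perfect_fit = discrepancy(Lambda, Psi, S)
print(F_perfect_fit)
```

$F$ is non-negative and vanishes only when the fitted covariance reproduces $S$, so larger values of $U$ count as evidence against $H_k$.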

---

In [1]:
import numpy as np
from utils import open_closed_data, calculate_objective, factor_model_solution

In [2]:
X = open_closed_data()
k = 1

psi_hat, lambda_hat = factor_model_solution(X, k=k)
objective = calculate_objective(psi_hat[np.diag_indices_from(psi_hat)], X, k=k)
objective

0.10319722696589828

In [3]:
n = X.shape[0]
p = X.shape[1]

n_mark = n - 1 - 1/6 * (2 * p + 5) - 2/3 * k
n_mark

83.83333333333333

In [4]:
U = objective * n_mark
U

8.651367527307805

$$
U = 8.65 \sim \chi^2_5
$$

In [5]:
from scipy.stats import chi2
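s = 1/2 * (p - k) ** 2 - 1/2 * (p + k)  # degrees of freedom: 5 for p = 5, k = 1
chi2.ppf(0.95, df=s)  # 95% quantile of chi-square with s degrees of freedom (inverse of the cdf)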

11.070497693516351

Since $U = 8.65 < \chi^2_{5; 0.05} = 11.1$, we do not reject $H_1$ at the 5% level and accept the one-factor solution as adequate for this data.

Equivalently, in terms of the p-value:

In [6]:
1 - chi2.cdf(U, df=s)

0.12380436101001835
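
The critical-value comparison and the p-value comparison always lead to the same decision. The sketch below illustrates this with the numbers obtained above; the variable names are assumptions, and `chi2.sf` is used for the upper tail as a numerically safer alternative to `1 - chi2.cdf`.

```python
from scipy.stats import chi2

U = 8.651367527307805   # test statistic from the cells above
s = 5                   # degrees of freedom for p = 5, k = 1
alpha = 0.05            # significance level

critical_value = chi2.ppf(1 - alpha, df=s)  # upper 5% point of chi-square(s)
p_value = chi2.sf(U, df=s)                  # P(chi-square(s) > U)

reject_by_critical_value = U > critical_value
reject_by_p_value = p_value < alpha

# Both rules agree: H_1 is not rejected at the 5% level
print(reject_by_critical_value, reject_by_p_value)
```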

Let's generalize the process in a function.

In [7]:
def factor_goodness_of_fit_test(X, k):
    """
    Calculate the p-value for the null hypothesis that k factors are sufficient to describe the data,
    against the alternative that Sigma has no constraints.

    Parameters:
    ---
    X:  (n, p) matrix
        Data matrix

    k:  Integer
        Number of factors to test for
    
    Returns:
    ---
    p-value: float
        The p-value for the U statistic under the null hypothesis
    """
    
    psi_hat, _ = factor_model_solution(X, k)
    objective = calculate_objective(psi_hat[np.diag_indices_from(psi_hat)], X, k)
    n = X.shape[0]
    p = X.shape[1]

    n_mark = n - 1 - 1/6 * (2 * p + 5) - 2/3 * k

    U = objective * n_mark
    s = 1/2 * (p - k) ** 2 - 1/2 * (p + k)
    return chi2.sf(U, df=s)

In [8]:
factor_goodness_of_fit_test(X=X, k=k)

0.12380436101001834