In [1]:
import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF

Load our results for the Pearson categorical divergences from the CTPMHg simulations.

In [2]:
npzfile = np.load('categorical_divergences_and_chi2_approx_pvals.npz')
pearson_categorical_divergences = npzfile['pearson_categorical_divergences']

Load our simulated empirical distribution of the Pearson categorical divergence statistic under the null multinomial distribution. In more standard terminology, this is a Monte Carlo approximation to an exact multinomial test based on Pearson's $\chi^2$ statistic.

In [3]:
mc_npzfile = np.load('../monte_carlo_results/complete_chi2_simulation.npz')
monte_carlo_vals = mc_npzfile['chi2_stats']
monte_carlo_vals.shape

(1000000000,)
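
For readers who want to see how a null sample like this could be produced, here is a minimal sketch. It is not the code that generated `complete_chi2_simulation.npz`; the function name `simulate_pearson_chi2`, the category probabilities, the number of observations, and the number of simulations are all hypothetical values chosen for illustration.

```python
import numpy as np


def simulate_pearson_chi2(null_probs, n_obs, n_sims, seed=0):
    """Draw Pearson chi-squared statistics under a multinomial null.

    All arguments are illustrative; the real simulation settings are
    not stated in this notebook.
    """
    rng = np.random.default_rng(seed)
    null_probs = np.asarray(null_probs, dtype=float)
    expected = n_obs * null_probs  # expected count per category
    # One row of category counts per simulated data set: shape (n_sims, k).
    counts = rng.multinomial(n_obs, null_probs, size=n_sims)
    # Pearson statistic: sum over categories of (O - E)^2 / E.
    return ((counts - expected) ** 2 / expected).sum(axis=1)


# Hypothetical toy setting: 3 categories, 50 observations per data set.
toy_stats = simulate_pearson_chi2([0.5, 0.3, 0.2], n_obs=50, n_sims=10_000)
```

Under the multinomial null the Pearson statistic has expectation equal to the number of categories minus one, so `toy_stats.mean()` should be close to 2 in this toy setting, which gives a quick sanity check on a simulation of this kind.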

In [4]:
monte_carlo_ecdf = ECDF(monte_carlo_vals)

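The $p$-values correspond to the survival function, i.e. $1 - \mathrm{CDF}$: the probability of a value _more_ extreme than the observed one, whereas the CDF gives the probability of a value no more extreme. With `ECDF`'s default `side='right'`, this is the fraction of simulated statistics strictly greater than each observed value.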

In [5]:
monte_carlo_pvals = 1. - monte_carlo_ecdf(pearson_categorical_divergences)
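
The survival-function computation above can also be sketched with plain NumPy, which makes the tie-handling explicit. This is an illustrative sketch, not a replacement for the cell above: the helper `empirical_pvals` and the toy arrays are hypothetical. With `strict=True` it counts simulated values strictly greater than the observation, which should match `1 - ECDF(x)` with the default `side='right'`; with `strict=False` it also counts ties.

```python
import numpy as np


def empirical_pvals(null_sample, observed, strict=True):
    """Empirical survival function of `null_sample` evaluated at `observed`.

    strict=True  -> fraction of null values >  observed
    strict=False -> fraction of null values >= observed
    """
    null_sorted = np.sort(np.asarray(null_sample))
    observed = np.asarray(observed)
    # side="right" counts null values <= observed; side="left" counts < observed.
    side = "right" if strict else "left"
    n_not_exceeding = np.searchsorted(null_sorted, observed, side=side)
    return 1.0 - n_not_exceeding / null_sorted.size


# Hypothetical toy data containing a tie at 1.0.
toy_null = np.array([0.0, 1.0, 1.0, 2.0, 3.0])
toy_obs = np.array([1.0, 2.5])
strict_p = empirical_pvals(toy_null, toy_obs)
inclusive_p = empirical_pvals(toy_null, toy_obs, strict=False)
```

For the observation 1.0, the strict version gives 0.4 and the inclusive version gives 0.8, while both give 0.2 for 2.5, which has no ties. Because the Pearson statistic on count data is discrete, ties with simulated values can occur, so the choice between the two conventions may matter for some observations.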

Save the $p$-values for quick reference and later use.

In [6]:
np.savez_compressed('monte_carlo_pvals.npz', monte_carlo_pvals=monte_carlo_pvals)