# Goodness-of-Fit, its $R$-Dependence, and Gamma Mixtures
This notebook evaluates the goodness-of-fit of the gamma distribution to regional aggregate
heat flow distributions obtained from random global $R$-disk coverings (RGRDCs). The impact
of $R$, the disk radius, is evaluated to clarify whether large regions might be captured less
well by the gamma distribution than smaller regions (assuming the existence of "large scale"
trends in heat flow).

Finally, a mixture of two gamma distributions is used as the model for regional aggregate heat
flow. This corresponds to the case in which the gamma distribution captures the variability
within a "homogeneous" heat flow region well, but the $R$-disk sometimes straddles the
boundary between two such regions.

In [None]:
import json
import numpy as np
from pyproj import Proj
from pickle import Unpickler
from cache import cached_call
from scipy.special import erf
from scipy.stats import fisk  # used by the log-logistic branch of distribution_analysis
import matplotlib.pyplot as plt
from pdtoolbox import normal_pdf
from matplotlib.ticker import FixedLocator
from loaducerf3 import Polygon, PolygonSelector
from matplotlib.patches import Polygon as MPolygon
from pdtoolbox import gamma_pdf
from pdtoolbox import normal_mvue, normal_logL, normal_cdf, \
                      frechet_mle, frechet_logL, frechet_cdf, \
                      gamma_mle, gamma_logL, gamma_cdf, \
                      nakagami_mle, nakagami_logL, nakagami_cdf, \
                      log_logistic_mle, log_logistic_logL, log_logistic_cdf, \
                      shifted_gompertz_mle, shifted_gompertz_logL, shifted_gompertz_cdf, \
                      weibull_mle, weibull_logL, weibull_cdf, \
                      log_normal_mle, log_normal_logL, log_normal_cdf, \
                      inverse_gamma_mle, inverse_gamma_logL, inverse_gamma_cdf
from pdtoolbox.cython.gamma_accel import gamma_ks_ad_batch
from pdtoolbox import GammaDistribution, LogLogisticDistribution
from pdtoolbox.gof import LillieforsTable, AndersonDarlingTable

from reheatfunq.coverings import random_global_R_disk_coverings
from reheatfunq.resilience import generate_synthetic_heat_flow_coverings_mix3, \
                                  generate_normal_mixture_errors_3

Pretty figures on HiDPI:

In [None]:
%config InlineBackend.figure_format = 'retina'

Load the data:

In [None]:
hf_continental = np.load('intermediate/heat-flow-selection-mW_m2.npy')

In [None]:
with open('intermediate/02-Geometry.pickle','rb') as f:
    saf_geometry = Unpickler(f).load()

proj_saf = Proj(saf_geometry["proj_str"])

In [None]:
with open('intermediate/03-Buffered-Poly.pickle','rb') as f:
    buffered_poly = Unpickler(f).load()

In [None]:
mask = np.ones(hf_continental.shape[1], dtype=bool)
hf_xy = np.stack(proj_saf(*hf_continental[1:3,:]), axis=1)

for poly in saf_geometry["selection_polygons_xy"]:
    select = PolygonSelector(Polygon(*poly[:-1].T))
    mask &= ~select.array_mask(hf_xy)
hf_independent = (hf_continental.T)[mask]

In [None]:
with open("intermediate/A1-Critical-Gamma.json", 'r') as f:
    LA = json.load(f)
    LG, ADG = LillieforsTable.from_json(LA[0]), AndersonDarlingTable.from_json(LA[1])

In [None]:
with open("intermediate/A1-Critical-Log-Logistic.json", 'r') as f:
    LA = json.load(f)
    LLL, ADLL = LillieforsTable.from_json(LA[0]), AndersonDarlingTable.from_json(LA[1])

## Behavior for Different Radii
Here we investigate, using goodness-of-fit tests, how well the regional aggregate heat
flow distributions are described by the gamma distribution.

We use the default configuration:

In [None]:
REPETITIONS = 200
SEED = 890128959
DMIN_KM = 20
MIN_POINTS = 10

Now we investigate radii $R$ in a range from $60\,\mathrm{km}$ to $260\,\mathrm{km}$. For each radius, we
generate a number of `REPETITIONS` RGRDCs that overlap in their coverage of the NGHF data set.

In [None]:
sequence = np.random.SeedSequence(SEED)
seeds = sequence.spawn(REPETITIONS)

dist_db = {}
dist_info_db = {}
R_set = [60,70,80,90,100,120,140, 160, 190, 220, 260]
for r in R_set:
    print("r =",r,"km")
    dist_db[r] = []
    dist_info_db[r] = []
    for i in range(REPETITIONS):
        valid_points, _, distributions, distribution_lola, distribution_indices \
           = cached_call(random_global_R_disk_coverings, r*1e3, MIN_POINTS, hf_independent,
                         buffered_poly, saf_geometry["proj_str"], dmin=DMIN_KM*1e3, seed=seeds[i])
        dist_db[r].append([list(dist) for dist in distributions])
        dist_info_db[r].append((distribution_indices, valid_points, distribution_lola))
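
The per-repetition seeds above come from spawning a single `SeedSequence`. As the loop is written, the `i`-th child seed is reused for every radius. The following minimal sketch only illustrates the spawning mechanism and its reproducibility; the variable names `first`, `again`, and `other` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

SEED = 890128959        # same root seed as in the configuration above
REPETITIONS = 200

# Spawn one child SeedSequence per repetition.
children = np.random.SeedSequence(SEED).spawn(REPETITIONS)

# Re-creating the root sequence and spawning again yields the same children,
# so each repetition can be regenerated on its own.
first = np.random.default_rng(children[0]).random(3)
again = np.random.default_rng(
    np.random.SeedSequence(SEED).spawn(REPETITIONS)[0]
).random(3)

# Different children give statistically independent streams.
other = np.random.default_rng(children[1]).random(3)

print(np.array_equal(first, again), np.array_equal(first, other))
```
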

Now investigate each of these RGRDCs using the Lilliefors (Kolmogorov-Smirnov)
and Anderson-Darling goodness-of-fit tests (Stephens, 1980). For each RGRDC,
we compute the rate at which the Lilliefors and AD tests reject the gamma hypothesis
for the $R$-disks at $\alpha=5\%$. The critical table for the gamma distribution
has been computed in notebook `A1-Critical-EDF-Statistics.ipynb`.
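
To make the two test statistics concrete, here is a minimal sketch that computes the Kolmogorov-Smirnov and Anderson-Darling statistics of a sample against a gamma distribution fitted to that same sample. It uses SciPy rather than `pdtoolbox`, and the helper name `gamma_edf_statistics` is an assumption made for illustration. Because the parameters are estimated from the data, the standard critical values for a fully specified distribution do not apply, which is why this notebook relies on the precomputed critical tables.

```python
import numpy as np
from scipy import stats


def gamma_edf_statistics(x):
    """KS and AD statistics for a gamma fit (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Maximum likelihood fit with the location fixed at zero.
    k, _, theta = stats.gamma.fit(x, floc=0)
    F = stats.gamma.cdf(x, k, scale=theta)
    i = np.arange(1, n + 1)
    # KS: largest distance between the empirical and the fitted CDF.
    ks = max(np.max(i / n - F), np.max(F - (i - 1) / n))
    # AD: weighted squared distance that emphasizes the tails.
    Fc = np.clip(F, 1e-12, 1.0 - 1e-12)
    ad = -n - np.mean((2 * i - 1) * (np.log(Fc) + np.log1p(-Fc[::-1])))
    return ks, ad, k, theta


rng = np.random.default_rng(0)
sample = rng.gamma(3.0, 20.0, size=50)   # illustrative data, not NGHF
ks, ad, k, theta = gamma_edf_statistics(sample)
print(ks, ad)
```

A disk would count as rejected if its statistic exceeds the critical value for its sample size and estimated shape parameter.
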

In [None]:
def distribution_analysis(distribution, dist_db_json, R_set, Ni=100,
                          LG_json = json.dumps(LG.to_json(), sort_keys=True),
                          ADG_json = json.dumps(ADG.to_json(), sort_keys=True),
                          LLL_json = json.dumps(LLL.to_json(), sort_keys=True),
                          ADLL_json = json.dumps(ADLL.to_json(), sort_keys=True)):
    """
    Analyzes a distribution for different R.
    """
    # Regenerate key-value pairs to dictionaries:
    dist_db = {int(key) : val for key,val in json.loads(dist_db_json).items()}
    LG   = LillieforsTable.from_json(json.loads(LG_json))
    ADG  = AndersonDarlingTable.from_json(json.loads(ADG_json))
    LLL  = LillieforsTable.from_json(json.loads(LLL_json))
    ADLL = AndersonDarlingTable.from_json(json.loads(ADLL_json))
    print("dist_db.keys():",dist_db.keys())
    
    
    if distribution == "gamma":
        distribution = GammaDistribution
    elif distribution == "log-logistic":
        distribution = LogLogisticDistribution
    else:
        raise ValueError()
    
    # Distribution for the data:
    print("Starting the iterations")
    n_dists = {}
    ad = {}
    ks = {}
    ad_reject = {}
    ks_reject = {}
    M_SET = set()
    for r in R_set:
        n_dists[r] = len(dist_db[r])
        m_set = [[len(dist) for dist in dists] for dists in dist_db[r]]
        ad_r = []
        ks_r = []
        adrej_r = []
        ksrej_r = []
        for dists in dist_db[r]:
            ad_r.append([])
            ks_r.append([])
            adrej_r.append([])
            ksrej_r.append([])
            for dist in dists:
                dist = np.array(dist)
                if distribution == GammaDistribution:
                    adrej_r[-1].append(ADG.test_reject(dist))
                    ksrej_r[-1].append(LG.test_reject(dist))
                elif distribution == LogLogisticDistribution:
                    adrej_r[-1].append(ADLL.test_reject(dist))
                    ksrej_r[-1].append(LLL.test_reject(dist))
                ad_r[-1].append(distribution.anderson_darling_statistic(dist))
                ks_r[-1].append(distribution.kolmogorov_smirnov_statistic(dist))
        M_SET |= set(list(np.concatenate(m_set)))
        ad[r] = (ad_r, m_set)
        ks[r] = (ks_r, m_set)
        ad_reject[r] = adrej_r
        ks_reject[r] = ksrej_r
    
    # Random number generator for reference distribution:
    rng = np.random.default_rng(18298)
    if distribution == LogLogisticDistribution:
        rng_dist = fisk(10)
        get_random = lambda m : rng_dist.rvs(m, random_state=rng)
    elif distribution == GammaDistribution:
        get_random = lambda m : rng.gamma(3.0, size=m)
        
    
    # Reference distribution:
    print("compute reference distributions.")
    ad_reference =  {}
    ks_reference =  {}
    if distribution != GammaDistribution:
        for m in M_SET:
            ad_m = []
            ks_m = []
            for i in range(Ni):
                X = get_random(m)
                ad_m.append(distribution.anderson_darling_statistic(X))
                ks_m.append(distribution.kolmogorov_smirnov_statistic(X))
            ad_reference[m] = np.array(ad_m)
            ks_reference[m] = np.array(ks_m)
    
    return ad, ad_reference, ad_reject, ks, ks_reference, ks_reject

In [None]:
Ni = 100

For fast hash computation in caching, convert the `dist_db` into a key-sorted JSON string:

In [None]:
dist_db_str = json.dumps(dist_db, sort_keys=True)
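
A small sketch of why the key-sorted string is convenient as a cache key, and why `distribution_analysis` converts the keys back with `int(key)`: JSON object keys are always strings, and `sort_keys=True` makes the serialization independent of insertion order. The toy dictionaries `db_a` and `db_b` are illustrative only.

```python
import json

db_a = {100: [[1.0, 2.0]], 60: [[3.0]]}
db_b = {60: [[3.0]], 100: [[1.0, 2.0]]}   # same content, other insertion order

# With sorted keys, both dictionaries serialize to the same string.
s_a = json.dumps(db_a, sort_keys=True)
s_b = json.dumps(db_b, sort_keys=True)

# Integer keys come back as strings and have to be converted again.
restored = {int(key): val for key, val in json.loads(s_a).items()}

print(s_a == s_b, restored == db_a)
```
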

In [None]:
ad1, ad_reference1, ad_reject1, ks1, ks_reference1, ks_reject1 \
    = cached_call(distribution_analysis, "gamma", dist_db_str, R_set, Ni=Ni)

The same analysis can be run for the log-logistic distribution by passing `"log-logistic"` instead of `"gamma"` to `distribution_analysis`.

#### Rejection rates
Now, compute the rejection rates for each RGRDC. The rejection rate is computed for each $R$ and for each of the `REPETITIONS` RGRDCs per $R$.

In [None]:
n_select = np.arange(10,201)
rej_rate = np.zeros((len(R_set), REPETITIONS, 2))
region_count = np.zeros((len(R_set), REPETITIONS))
for i,r in enumerate(R_set):
    ad,m = ad1[r]
    ks,m = ks1[r]
    for l in range(REPETITIONS):
        region_count[i,l] = len(ks_reject1[r][l])
        rej_rate[i,l,0] = np.count_nonzero(ks_reject1[r][l]) / region_count[i,l] 
        rej_rate[i,l,1] = np.count_nonzero(ad_reject1[r][l]) / region_count[i,l] 

### Synthetic rejection rate
Here, we compute a synthetic data base of RGRDCs that corresponds to the empirical
one within the bounds we impose on the synthetic data base:
1) The synthetic data base is derived from gamma-distributed data.
2) The gamma distributions are parameterized from RGRDC disk samples
   using maximum likelihood estimators (the gamma distributions "look like the data")
   and they have the same sample sizes, mimicking the real-world data within the
   boundaries imposed by using a gamma distribution.
3) After drawing from the gamma distribution, a relative error is added to the synthetic
   data that is similar to the relative error observed in the NGHF data set for 'A' quality
   data (wherever an error estimate is available).
4) We also apply the limits $0 < q < 250\,\mathrm{mWm}^{-2}$ to the heat flow data.

With this approach, we aim to understand whether the gamma distribution in addition to some
measurement error could reproduce the test results from above.
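
The four steps can be sketched for a single disk as follows. This is not the implementation of `generate_synthetic_heat_flow_coverings_mix3`, whose internals are not shown here. In particular, applying the relative error as Gaussian noise scaled by the drawn value, and enforcing the limits by redrawing, are assumptions of this sketch, as are the helper name `synthetic_disk_sample` and the constant 10 % error used in the usage lines.

```python
import numpy as np


def synthetic_disk_sample(k, theta, n, rng, draw_rel_error,
                          hfmax=250.0, max_rounds=1000):
    """Draw n synthetic heat flow values for one disk (hypothetical helper)."""
    sample = np.empty(n)
    filled = 0
    for _ in range(max_rounds):
        missing = n - filled
        if missing == 0:
            break
        # Steps 1 and 2: gamma-distributed values with fitted (k, theta).
        q = theta * rng.gamma(k, size=missing)
        # Step 3 (assumption): perturb by Gaussian noise whose standard
        # deviation is the drawn relative error times the value.
        rel = draw_rel_error(missing, rng)
        q = q * (1.0 + rel * rng.standard_normal(missing))
        # Step 4 (assumption): keep values inside (0, hfmax), redraw the rest.
        q = q[(q > 0.0) & (q < hfmax)]
        sample[filled:filled + q.size] = q
        filled += q.size
    if filled < n:
        raise RuntimeError("could not fill the sample within max_rounds")
    return sample


rng = np.random.default_rng(1)
# Illustrative parameters: k = 9, theta = 7 mW/m^2, 25 points, 10 % error.
sample = synthetic_disk_sample(9.0, 7.0, 25, rng,
                               lambda m, rng: np.full(m, 0.1))
print(sample.min(), sample.max())
```
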

First, we rebuild the 'A' quality error distribution using a mixture of three normal distributions:

In [None]:
X00 = 0.08
X01 = 0.30
X02 = 0.34
W0 = 0.39
S0 = 0.08
S1 = 0.08
W1 = 0.3
S2 = 0.24
W2 = 1 - W0 - W1

In [None]:
def mixture_pdf(x):
    SQ2 = np.sqrt(2)
    norm =   W0 * 0.5 * (1.0 - erf(-X00/(SQ2*S0))) \
           + W1 * 0.5 * (1.0 - erf(-X01/(SQ2*S1))) \
           + W2 * 0.5 * (1.0 - erf(-X02/(SQ2*S2)))
    # Evaluate at the argument x rather than the global plotting grid:
    return  (W0 * normal_pdf(x, X00, S0)
             + W1*normal_pdf(x, X01, S1)
             + W2*normal_pdf(x, X02, S2)) / norm

fig = plt.figure()
ax = fig.add_subplot(111)
xplot = np.linspace(0, 0.8, 200)
ax.plot(xplot, mixture_pdf(xplot))
ax.hist(generate_normal_mixture_errors_3(500000, W0, X00, S0, W1, X01, S1, X02, S2, 983928),
        density=True, zorder=0, color='lightgray', bins='auto')
print("This should look like Fig. S4 from the SI (A quality).")
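
The normalization constant in `mixture_pdf` is the probability mass that the three-component mixture places above zero, i.e. the density is that of the mixture truncated to positive errors. One way to draw from such a density is rejection sampling, sketched below with the parameter values from above. Whether `generate_normal_mixture_errors_3` proceeds this way is not shown in this notebook; the function `sample_truncated_mixture` is a hypothetical stand-in.

```python
import math
import numpy as np

W0, W1 = 0.39, 0.3
W = np.array([W0, W1, 1.0 - W0 - W1])   # component weights
X = np.array([0.08, 0.30, 0.34])        # component means
S = np.array([0.08, 0.08, 0.24])        # component standard deviations


def sample_truncated_mixture(n, rng):
    """Rejection sampling from the mixture restricted to x > 0."""
    out = np.empty(n)
    filled = 0
    while filled < n:
        m = n - filled
        comp = rng.choice(3, size=m, p=W)      # pick a component per draw
        x = rng.normal(X[comp], S[comp])       # draw from that component
        x = x[x > 0.0]                         # reject non-positive errors
        out[filled:filled + x.size] = x
        filled += x.size
    return out


# Mass above zero, identical in form to `norm` in mixture_pdf.
norm = sum(w * 0.5 * (1.0 - math.erf(-x0 / (math.sqrt(2) * s)))
           for w, x0, s in zip(W, X, S))

rng = np.random.default_rng(2)
errors = sample_truncated_mixture(10000, rng)
print(norm, errors.mean())
```
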

Generate and analyze the synthetic heat flow data base using the same analysis
approach as for the NGHF data:

In [None]:
def generate_synthetic_heat_flow_database(repetitions, R_set, dist_db_json, rng_seed,
                                          hfmax=250, hfmin=0.0, kmin=1.0, w0=W0, x00=X00, s0=S0,
                                          w1=W1, x01=X01, s1=S1, x02=X02, s2=S2):
    """
    Generate a full synthetic global heat flow data base.
    """
    dist_db = {int(key) : [[np.array(dist) for dist in dists] for dists in val]
               for key,val in json.loads(dist_db_json).items()}
    
    # Compute the gamma distribution fits:
    print("compute gamma fits.")
    gamma_kt = {}
    gamma_N = {}
    for r in R_set:
        gamma_kt[r] = [[GammaDistribution._mle(dist) for dist in dists]
                       for dists in dist_db[r]]
        gamma_N[r] = [[dist.size for dist in dists] for dists in dist_db[r]]
    
    print("generate synthetic database.")
    rng = np.random.default_rng(rng_seed)
    synthetic_db = {}
    for i,R in enumerate(R_set):
        coverings = []
        K = rng.integers(0, len(dist_db[R]), size=repetitions)
        gamma_k = [np.array(gamma_kt[R][k])[:,0] for k in K]
        gamma_t = [np.array(gamma_kt[R][k])[:,1] for k in K]
        gamma_N_i = [np.array(gamma_N[R][k]) for k in K]
        synthetic = generate_synthetic_heat_flow_coverings_mix3(gamma_k, gamma_t, gamma_N_i, hfmax,
                                                                w0, x00, s0, w1, x01, s1, x02, s2,
                                                                rng.integers(2**63), 12)
        coverings += synthetic

        synthetic_db[R] = coverings
    

    
    print("perform AD and KS tests.")
    ad_reject = {}
    ks_reject = {}
    for i,R in enumerate(R_set):
        adrej_r = []
        ksrej_r = []
        for j in range(repetitions):
            # Perform tests:
            distributions = synthetic_db[R][j]
            dist_n = np.array([len(d) for d in distributions])
            ks, ad, k = gamma_ks_ad_batch(distributions, kmin)
            adrej_r.append(ad >= ADG(dist_n, k))
            ksrej_r.append(ks >= LG(dist_n, k))
        ad_reject[R] = adrej_r
        ks_reject[R] = ksrej_r
    
    
    print("computing rejection rates.")
    rej_rate = np.zeros((len(R_set),repetitions,2))
    region_count = np.zeros((len(R_set), repetitions))
    for i,R in enumerate(R_set):
        for j in range(repetitions):
            region_count[i,j] = len(ks_reject[R][j])
            rej_rate[i,j,0] = np.count_nonzero(ks_reject[R][j]) / region_count[i,j] 
            rej_rate[i,j,1] = np.count_nonzero(ad_reject[R][j]) / region_count[i,j]
    
    return synthetic_db, rej_rate

In [None]:
_, synthetic_rej_rate = \
   generate_synthetic_heat_flow_database(1000, R_set, dist_db_str, 209398, hfmin=0.0, hfmax=250)

#### Plot of the NGHF and synthetic data rejection rates for different $R$.

In [None]:
with plt.rc_context({'font.size': 8, 'axes.labelpad': 0.05, 'xtick.major.pad': 1.2, 'ytick.major.pad': 1.2}):
    fig = plt.figure(figsize=(6.975, 3.0), dpi=300)
    ax = fig.add_axes((0.065, 0.16, 0.36, 0.75))
    y = []
    c = []
    res = []
    R_plot = np.array(R_set)
    mask = np.ones(len(R_set), dtype=bool)
    skip = [70,90]
    for i,r in enumerate(R_set):
        if r in skip:
            mask[i] = False
            continue
        yi = []
        ni = 0
        ri = []
        for j in range(REPETITIONS):
            nj = region_count[i,j]
            if nj > 0:
                yi.append(rej_rate[i,j,0])
                ni += nj
                ri.append([1.0 / nj])
        y.append(yi)
        c.append(ni)
        res.append(np.mean(ri) if len(ri) > 0 else 1.0)
    R_plot = R_plot[mask]
    h0 = ax.boxplot(y, positions=np.array(R_plot)+2, widths=2.5, patch_artist=True,
                    boxprops=dict(facecolor='w', linewidth=0.8),
                    whiskerprops=dict(linewidth=0.8),
                    flierprops=dict(markersize=2, markeredgewidth=0.8))
    h1 = ax.boxplot(list(synthetic_rej_rate[mask,:,0]),
                    positions=np.array(R_plot)-2, widths=2.5, patch_artist=True,
                    boxprops=dict(facecolor='w', linewidth=0.8, edgecolor='gray'),
                    whiskerprops=dict(linewidth=0.8, color='gray'),
                    flierprops=dict(markersize=1, markeredgewidth=0.8, markeredgecolor='gray'),
                    capprops=dict(color='gray'), medianprops=dict(color='tab:blue'))
    ax.axhline(0.05, color='k', linewidth=0.5)
    ax.set_xlabel('Disk radius $R$ (km)')
    ax.set_ylim(0,ax.get_ylim()[1])
    # The best selection:
    axtw = ax.twinx()
    h2 = axtw.plot(R_plot, 100*np.array(res), marker='s', markeredgecolor='none',linestyle=':', color='gray',
                   linewidth=0.8, markersize=2)
    ax.set_zorder(1)
    ax.patch.set_visible(False)
    axtw.set_ylabel(r'$\langle 100 / \mathrm{\# disks}\rangle$', fontsize=8, labelpad=2)
    axtw.tick_params(axis='y', which='major', pad=1.5)
    ax.set_yticks((0.0, 0.05, 0.2, 0.4, 0.6),
                  labels=("0", "5", "20", "40", "60"),
                  fontsize=8)
    xticks = [60,80,100,120,140,160,190,220,260]
    ax.set_xticks(xticks, labels=(str(x) for x in xticks), fontsize=6)
    ax.tick_params(axis='y', which='major', pad=0.5)
    ax.set_ylabel('Average rejection rate (%)', fontsize=8, labelpad=2)
    ax.set_title('Kolmogorov-Smirnov test', fontsize=10)
    axtw.set_ylim(0,100*ax.get_ylim()[1])
    ymin,ymax = axtw.get_ylim()
    axtw.add_patch(MPolygon([(76,ymin), (104,ymin), (104,ymax), (76,ymax)],
                            color='antiquewhite', zorder=0))

    ax.legend(handles=((h0["boxes"][0], h0["medians"][0]),
                       (h1["boxes"][0], h1["medians"][0]),
                       h2[0]),
              labels=('NGHF RGRDCs',
                      "Synthetic Gamma\nwith 'A' quality error",
                      "Inverse average number\nof disks in RGRDC"),
              fontsize='small')

    #
    # Axis 2
    #
    ax = fig.add_axes((0.575, 0.16, 0.36, 0.75))
    y = []
    c = []
    for i,r in enumerate(R_set):
        if r in skip:
            continue
        yi = []
        ni = 0
        for j in range(REPETITIONS):
            nj = region_count[i,j]
            if nj > 0:
                yi.append(rej_rate[i,j,1])
                ni += nj
        y.append(yi)
        c.append(ni)
    h0 = ax.boxplot(y, positions=np.array(R_plot)+2, widths=2.5, patch_artist=True,
                    boxprops=dict(facecolor='w', linewidth=0.8),
                    whiskerprops=dict(linewidth=0.8),
                    flierprops=dict(markersize=2, markeredgewidth=0.8))
    h1 = ax.boxplot(list(synthetic_rej_rate[mask,:,1]),
                    positions=np.array(R_plot)-2, widths=2.5, patch_artist=True,
                    boxprops=dict(facecolor='w', linewidth=0.8, edgecolor='gray'),
                    whiskerprops=dict(linewidth=0.8, color='gray'),
                    flierprops=dict(markersize=1, markeredgewidth=0.8, markeredgecolor='gray'),
                    capprops=dict(color='gray'), medianprops=dict(color='tab:blue'))
    ax.axhline(0.05, color='k', linewidth=0.5)
    ax.set_xlabel('Disk radius $R$ (km)')
    ax.set_ylim(0,ax.get_ylim()[1])
    axtw = ax.twinx()
    ax.set_zorder(1)
    ax.patch.set_visible(False)
    h2 = axtw.plot(R_plot, 100 / (np.array(c) / REPETITIONS), marker='x',
                   linestyle=':', color='tab:red', linewidth=0.8, markersize=2)
    # The best selection:
    axtw.set_ylim(0, 100*ax.get_ylim()[1])
    ymin,ymax = axtw.get_ylim()
    axtw.add_patch(MPolygon([(76,ymin), (104,ymin), (104,ymax), (76,ymax)],
                            color='antiquewhite', zorder=0))
    axtw.set_ylabel(r'$\langle 100 / \mathrm{\# disks}\rangle$', fontsize=8, labelpad=2)
    axtw.tick_params(axis='y', which='major', pad=1.5)
    ax.set_yticks((0.0, 0.05, 0.2, 0.4, 0.6),
                  labels=("0", "5", "20", "40", "60"),
                  fontsize=8)
    xticks = [60,80,100,120,140,160,190,220,260]
    ax.set_xticks(xticks, labels=(str(x) for x in xticks), fontsize=6)
    ax.tick_params(axis='y', which='major', pad=0.5)
    ax.set_ylabel('Average rejection rate (%)', fontsize=8, labelpad=2)
    ax.set_title('Anderson-Darling test', fontsize=10);

    ax.legend(handles=((h0["boxes"][0], h0["medians"][0]),
                       (h1["boxes"][0], h1["medians"][0]),
                       h2[0]),
              labels=('NGHF RGRDCs',
                      "Synthetic Gamma\nwith 'A' quality error",
                      "Inverse average number\nof disks in RGRDC"),
              fontsize='small')

    fig.savefig('figures/A2-Rejection-Rates-KS-AD-by-R-real-vs-synthetic.pdf')

## Synthetic Rejection Rate for Two Gamma Mixture
Now we investigate what happens if heat flow is gamma distributed within each regime, but a
single region may contain a mixture of two different regimes (i.e. two gamma distributions).

The size distribution as a function of $R$:

In [None]:
size_distribution = {}
for r in R_set:
    sizer = [len(dist) for dists in dist_db[r] for dist in dists]
    size_distribution[r] = sizer

The parameterization of the gamma mixture distributions:

In [None]:
def density_a(x):
    return gamma_pdf(x, 50, 1.0)

mixture_prob = 0.2
mixture_x0   = 25.0
mixture_x1   = 50.0
mixture_s0   = 5.0
mixture_s1   = 7.0
mixture_k0 = 25.0
mixture_t0 = 1.0
mixture_k1 = 50.0
mixture_t1 = 1.0

mixture2_prob = 0.4
mixture2_k0 = 120.0
mixture2_t0 = 0.57
mixture2_k1 = 50
mixture2_t1 = 1.0

def density_b(x):
    return mixture_prob*gamma_pdf(x, mixture_k0, mixture_t0) \
            + (1.0 - mixture_prob) * gamma_pdf(x, mixture_k1, mixture_t1)

def density_c(x):
    return mixture2_prob*gamma_pdf(x, mixture2_k0, mixture2_t0) \
            + (1.0 - mixture2_prob) * gamma_pdf(x, mixture2_k1, mixture2_t1)
    

def generate_sample_a(size,rng):
    return rng.gamma(50.0, size=size)


def generate_sample_mixture(size, rng, prob, k0, t0, k1, t1):
    mix = rng.random(size) <= prob
    n0 = np.count_nonzero(mix)
    n1 = size - n0
    s = np.empty(size)
    s[mix]  = t0 * rng.gamma(k0, size=n0)
    s[~mix] = t1 * rng.gamma(k1, size=n1)
    return s
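
As a quick orientation for the parameters above: a gamma distribution with shape $k$ and scale $\theta$ has mean $k\theta$, so the components of the first mixture are centered at 25 and 50, and those of the second at about 68.4 and 50 (in the units of the heat flow axis). The short sketch below computes the resulting mixture means; the helper `gamma_mixture_mean` is introduced here for illustration only.

```python
def gamma_mixture_mean(prob, k0, t0, k1, t1):
    """Mean of prob * Gamma(k0, t0) + (1 - prob) * Gamma(k1, t1)."""
    return prob * k0 * t0 + (1.0 - prob) * k1 * t1


# Values copied from the parameterization above.
mean_mix0 = gamma_mixture_mean(0.2, 25.0, 1.0, 50.0, 1.0)
mean_mix1 = gamma_mixture_mean(0.4, 120.0, 0.57, 50.0, 1.0)

print(round(mean_mix0, 2), round(mean_mix1, 2))
```
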

Now repeat the previous analyses for synthetic data from these gamma mixture distributions:

In [None]:
def evaluate_mixture_rejection_power(repetitions, R_set, size_distribution_list, seed,
                                     mix0_prob, mix0_k0, mix0_t0, mix0_k1, mix0_t1,
                                     mix1_prob, mix1_k0, mix1_t0, mix1_k1, mix1_t1,
                                     ADG_json_str):
    """
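    Compute the power of the Anderson-Darling test against two gamma
    mixture alternatives, together with the rejection rate for a pure
    gamma sample. Returns, for each R, the three rejection rates.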
    """
    ADG = AndersonDarlingTable.from_json(json.loads(ADG_json_str))
    perc = 0
    rng = np.random.default_rng(seed)
    power = {}
    for j,r in enumerate(R_set):
        powerr = np.zeros(3)
        sizes = rng.choice(size_distribution_list[j], size=repetitions)
        for i in range(repetitions):
            if j * repetitions + i > 0.01 * perc * repetitions * len(R_set):
                print(" ",perc,"% of samples generated")
                perc += 5
            sai = generate_sample_a(sizes[i], rng)
            sbi = generate_sample_mixture(sizes[i], rng, mix0_prob, mix0_k0, mix0_t0, mix0_k1, mix0_t1)
            sci = generate_sample_mixture(sizes[i], rng, mix1_prob, mix1_k0, mix1_t0, mix1_k1, mix1_t1)
            powerr[0] += ADG.test_reject(sai)
            try:
                powerr[1] += ADG.test_reject(sbi)
            except Exception:
                print("sbi:", sbi.size)
                print(sbi)
                print("gamma:", gamma_mle(sbi, kmin=1.0))
                raise RuntimeError()
            powerr[2] += ADG.test_reject(sci)
        powerr /= repetitions
        power[r] = powerr
    return power

In [None]:
repetitions = 10000
power = cached_call(evaluate_mixture_rejection_power, repetitions, R_set,
                    [np.array(size_distribution[r]) for r in R_set], 8938,
                    mixture_prob, mixture_k0, mixture_t0, mixture_k1, mixture_t1,
                    mixture2_prob, mixture2_k0, mixture2_t0, mixture2_k1, mixture2_t1,
                    json.dumps(ADG.to_json()))

In [None]:
from matplotlib.lines import Line2D

In [None]:
with plt.rc_context({'axes.labelpad': 2.5, 'xtick.major.pad': 1.2, 'ytick.major.pad': 1.2,
                     'xtick.labelsize' : 6, 'ytick.labelsize' : 6, 'axes.labelsize' : 8}):
    fig = plt.figure(figsize=(5.63250, 2.5), dpi=300)
    #ax_bg = fig.add_axes((0,0,1,1))

    color0 = 'tab:blue'
    color1 = 'tab:orange'

    ax = fig.add_axes((0.083, 0.13, 0.5, 0.8))
    #ax.plot(R_set, [power[r][0] for r in R_set])
    h0 = ax.plot(R_set, [power[r][1] for r in R_set], marker='.', color=color0, linewidth=1)
    h1 = ax.plot(R_set, [power[r][2] for r in R_set], marker='.', color=color1, linewidth=1)
    mix = 0.28
    ax.set_ylim(0.0, ax.get_ylim()[1])
    h2 = ax.plot(R_set, mix*np.array([power[r][1] for r in R_set]) + (1.0 - mix)*0.05,
                 color=color0, linestyle='--', linewidth=0.8)
    ax.boxplot(list(rej_rate[:,:,1]),
               positions=np.array(R_set), widths=5, patch_artist=True,
               boxprops=dict(facecolor='w', linewidth=0.8, edgecolor='k'),
               whiskerprops=dict(linewidth=0.8, color='k'),
               flierprops=dict(markersize=1, markeredgewidth=0.8, markeredgecolor='k'),
               capprops=dict(color='k'), medianprops=dict(color='tab:blue'))

    ax.axhline(0.05, color='k', linewidth=0.8)
    ax.set_xlabel("Disk radius $R$ (km)")
    ax.set_ylabel('Anderson-Darling rejection rate', fontsize='small')
    xticks = [60,80,100,120,140,160,190,220,260]
    ax.set_xticks(xticks, labels=(str(x) for x in xticks))
    ax.set_yticks(ax.get_yticks(), labels=(str(round(x,1)) for x in ax.get_yticks()))
    ax.xaxis.set_minor_locator(FixedLocator([70,90]))
    ax.patch.set_facecolor('none')
    ax.set_facecolor('none')
    ax.legend(handles=(h0[0],h1[0],h2[0]),
              labels=('Γ mix 0',
                      'Γ mix 1',
                      f'{int(100*mix)} % Γ mix 0\n+ {int(100*(1-mix))} % pure Γ'),
              fontsize=8,
              loc=('center right'),
              bbox_to_anchor=(0.5, 0.25, 0.5, 0.75))


    xplot = np.linspace(0,110,200)[1:]
    ax = fig.add_axes((0.66, 0.13, 0.33, 0.8))
    h0 = ax.plot(xplot, 100*density_b(xplot), color=color0)
    ax.plot(xplot, 100*mixture_prob * gamma_pdf(xplot, mixture_k0, mixture_t0), linestyle=':', color=color0,
            linewidth=0.8)
    ax.plot(xplot, 100*(1-mixture_prob) * gamma_pdf(xplot, mixture_k1, mixture_t1), linestyle=':', color=color0,
            linewidth=0.8)

    h1 = ax.plot(xplot, 100*density_c(xplot), color=color1)
    ax.plot(xplot, 100*mixture2_prob * gamma_pdf(xplot, mixture2_k0, mixture2_t0), linestyle=':', color=color1,
            linewidth=0.8)
    ax.plot(xplot, 100*(1-mixture2_prob) * gamma_pdf(xplot, mixture2_k1, mixture2_t1), linestyle=':', color=color1,
            linewidth=0.8)
    ax.set_xlabel(r'Heat flow ($\mathrm{mW}\mathrm{m}^{-2}$)');
    ax.set_ylabel(r'Density ($10^{-2}\,\mathrm{m^2}\mathrm{mW}^{-1}$)');
    ax.legend(handles=(h0[0],h1[0],Line2D([], [], color='k', linestyle=':', linewidth=0.8)),
              labels=('Γ mix 0',
                      'Γ mix 1',
                      'Components'),
              fontsize=6)
    fig.savefig('figures/A2-Gamma-Mix-AD-Rejections.pdf')
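
The dashed line in the left panel blends the mixture power with the nominal rejection level. It can be read as the expected rejection rate if a fraction `mix` of all disks followed the first gamma mixture while the remaining disks followed a pure gamma distribution that is rejected at the nominal $\alpha = 5\%$. A minimal sketch of that arithmetic follows; the power value of 0.5 is a made-up illustration, not a result of this notebook.

```python
def blended_rejection_rate(power, mix, alpha=0.05):
    """Expected rejection rate for a population in which a fraction `mix`
    is rejected with probability `power` and the rest with `alpha`."""
    return mix * power + (1.0 - mix) * alpha


# Hypothetical power of 0.5 with the mixing fraction used in the plot.
rate = blended_rejection_rate(0.5, 0.28)
print(round(rate, 3))
```

With `mix = 0` the expression reduces to the nominal level, and with `mix = 1` to the mixture power itself.
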

### References
> Stephens, M. A. (1980). "Tests based on EDF statistics". In: Stephens, M. A. \& D'Agostino, R. B. *Goodness-of-Fit Techniques*. Marcel Dekker, Inc.; New York.

### License
```
A notebook to evaluate the goodness-of-fit and its R-dependence
of the gamma distribution model for regional aggregate heat flow
distributions.

This file is part of the REHEATFUNQ model.

Author: Malte J. Ziebarth (ziebarth@gfz-potsdam.de)

Copyright © 2019-2022 Deutsches GeoForschungsZentrum Potsdam,
            2022 Malte J. Ziebarth
            

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.
```