<div class='alert alert-warning'>

SciPy's interactive examples with JupyterLite are experimental and may not always work as expected. Executing cells that contain imports may trigger large downloads (up to 60MB of content for the first import from SciPy), and importing from SciPy may take roughly 10-20 seconds. If you notice any problems, feel free to open an [issue](https://github.com/scipy/scipy/issues/new/choose).

</div>

A well-known test of the null hypothesis that data were drawn from a
given distribution is the Kolmogorov-Smirnov (KS) test, available in SciPy
as `scipy.stats.ks_1samp`. Suppose we wish to test whether the following
data:


In [None]:
import numpy as np
from scipy import stats
rng = np.random.default_rng()
x = stats.uniform.rvs(size=75, random_state=rng)

were sampled from a normal distribution. To perform a KS test, the
empirical distribution function of the observed data will be compared
against the (theoretical) cumulative distribution function of a normal
distribution. Of course, to do this, the normal distribution under the null
hypothesis must be fully specified. This is commonly done by first fitting
the ``loc`` and ``scale`` parameters of the distribution to the observed
data, then performing the test.


In [None]:
loc, scale = np.mean(x), np.std(x, ddof=1)
cdf = stats.norm(loc, scale).cdf
stats.ks_1samp(x, cdf)

KstestResult(statistic=0.1119257570456813,
             pvalue=0.2827756409939257,
             statistic_location=0.7751845155861765,
             statistic_sign=-1)
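
To make the comparison concrete, the KS statistic can be recomputed by hand
as the largest gap between the empirical distribution function and the fitted
normal CDF. The following is a minimal sketch for illustration, not SciPy's
implementation; the seed and the names ``d_plus``, ``d_minus`` and
``manual_statistic`` are assumptions introduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed, only for reproducibility
x = np.sort(stats.uniform.rvs(size=75, random_state=rng))
n = x.size

# Fully specify the null distribution by fitting loc and scale to the data.
loc, scale = np.mean(x), np.std(x, ddof=1)
cdf = stats.norm(loc, scale).cdf
cdf_vals = cdf(x)

# The empirical distribution function jumps from (i - 1)/n to i/n at the
# i-th sorted observation, so both sides of each jump are checked.
d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals)
d_minus = np.max(cdf_vals - np.arange(0, n) / n)
manual_statistic = max(d_plus, d_minus)

reference = stats.ks_1samp(x, cdf).statistic
print(manual_statistic, reference)
```

The two printed values should agree to floating-point precision.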

An advantage of the KS test is that the p-value - the probability of
obtaining, under the null hypothesis, a value of the test statistic at
least as extreme as the value obtained from the observed data - can be
calculated exactly and efficiently. `goodness_of_fit` can only approximate
these results.


In [None]:
known_params = {'loc': loc, 'scale': scale}
res = stats.goodness_of_fit(stats.norm, x, known_params=known_params,
                            statistic='ks', random_state=rng)
res.statistic, res.pvalue

(0.1119257570456813, 0.2788)

The statistic matches exactly, but the p-value is estimated by forming
a "Monte Carlo null distribution", that is, by explicitly drawing random
samples from `scipy.stats.norm` with the provided parameters and
calculating the statistic for each. The fraction of these statistic values
at least as extreme as ``res.statistic`` approximates the exact p-value
calculated by `scipy.stats.ks_1samp`.
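
The Monte Carlo procedure just described can be sketched in a few lines of
NumPy. This is an illustration under stated assumptions, not the code that
`goodness_of_fit` runs: the helper ``ks_statistic``, the seed, the number of
Monte Carlo samples ``n_mc``, and the ``+ 1`` adjustment (a common convention
for Monte Carlo p-values) are all choices made here.

```python
import numpy as np
from scipy import stats


def ks_statistic(samples, cdf):
    """Two-sided KS statistic along the last axis of `samples`."""
    samples = np.sort(samples, axis=-1)
    n = samples.shape[-1]
    cdf_vals = cdf(samples)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals, axis=-1)
    d_minus = np.max(cdf_vals - np.arange(0, n) / n, axis=-1)
    return np.maximum(d_plus, d_minus)


rng = np.random.default_rng(2024)  # arbitrary seed
x = stats.uniform.rvs(size=75, random_state=rng)
loc, scale = np.mean(x), np.std(x, ddof=1)
null_dist = stats.norm(loc, scale)  # parameters treated as known

observed = ks_statistic(x, null_dist.cdf)

# Draw samples from the null distribution and compute the statistic of each
# against the same, fixed CDF.
n_mc = 2000
mc_samples = null_dist.rvs(size=(n_mc, x.size), random_state=rng)
null_statistics = ks_statistic(mc_samples, null_dist.cdf)

# Fraction of null statistics at least as extreme as the observed one.
mc_pvalue = (np.sum(null_statistics >= observed) + 1) / (n_mc + 1)
exact_pvalue = stats.ks_1samp(x, null_dist.cdf).pvalue
print(mc_pvalue, exact_pvalue)
```

With a few thousand samples, the Monte Carlo estimate is expected to land
within a few hundredths of the exact p-value.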

However, in many cases, we would prefer to test only that the data were
sampled from *any* member of the normal distribution family, not
specifically from the normal distribution with the location and scale
fitted to the observed sample. In this case, Lilliefors [6] argued that
the KS test is far too conservative (that is, the p-value overstates
the actual probability of rejecting a true null hypothesis) and thus lacks
power - the ability to reject the null hypothesis when the null hypothesis
is actually false.
Indeed, our p-value above is approximately 0.28, which is far too large
to reject the null hypothesis at any common significance level.

Consider why this might be. Note that in the KS test above, the statistic
always compares data against the CDF of a normal distribution fitted to the
*observed data*. This tends to reduce the value of the statistic for the
observed data, but it is "unfair" when computing the statistic for other
samples, such as those we randomly draw to form the Monte Carlo null
distribution. It is easy to correct for this: whenever we compute the KS
statistic of a sample, we use the CDF of a normal distribution fitted
to *that sample*. The null distribution in this case has not been
calculated exactly and is typically approximated using Monte Carlo methods
as described above. This is where `goodness_of_fit` excels.


In [None]:
res = stats.goodness_of_fit(stats.norm, x, statistic='ks',
                            random_state=rng)
res.statistic, res.pvalue

(0.1119257570456813, 0.0196)

Indeed, this p-value is much smaller, and small enough to (correctly)
reject the null hypothesis at common significance levels, including 5% and
2.5%.
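
The effect of refitting can be reproduced by hand. In the sketch below, every
sample, observed or simulated, is compared against a normal CDF fitted to
*that sample*. It is an illustration rather than SciPy's implementation; the
helper ``fitted_ks_statistic``, the seed, and ``n_mc`` are assumptions
introduced here.

```python
import numpy as np
from scipy import stats


def fitted_ks_statistic(samples):
    """KS statistic of each row against a normal CDF fitted to that row."""
    samples = np.sort(samples, axis=-1)
    n = samples.shape[-1]
    loc = np.mean(samples, axis=-1, keepdims=True)
    scale = np.std(samples, ddof=1, axis=-1, keepdims=True)
    cdf_vals = stats.norm.cdf(samples, loc=loc, scale=scale)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals, axis=-1)
    d_minus = np.max(cdf_vals - np.arange(0, n) / n, axis=-1)
    return np.maximum(d_plus, d_minus)


rng = np.random.default_rng(2024)  # arbitrary seed
x = stats.uniform.rvs(size=75, random_state=rng)
loc, scale = np.mean(x), np.std(x, ddof=1)
observed = fitted_ks_statistic(x)

# Simulate from the fitted normal, then refit each simulated sample before
# computing its statistic.
n_mc = 2000
mc_samples = stats.norm.rvs(loc, scale, size=(n_mc, x.size), random_state=rng)
null_statistics = fitted_ks_statistic(mc_samples)
refit_pvalue = (np.sum(null_statistics >= observed) + 1) / (n_mc + 1)

# For comparison: the p-value that treats the fitted parameters as known.
naive_pvalue = stats.ks_1samp(x, stats.norm(loc, scale).cdf).pvalue
print(refit_pvalue, naive_pvalue)
```

Because refitting pulls each simulated statistic downward, the observed
statistic sits further into the tail of the null distribution and the
resulting p-value is smaller.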

However, the KS statistic is not equally sensitive to all kinds of
deviation from normality. The original advantage of the KS statistic was
the ability to compute the null distribution theoretically, but a more
sensitive statistic - resulting in a higher test power - can be used now
that we can approximate the null distribution computationally. The
Anderson-Darling statistic [1] tends to be more sensitive, and critical
values of this statistic have been tabulated for various significance
levels and sample sizes using Monte Carlo methods.


In [None]:
res = stats.anderson(x, 'norm')
print(res.statistic)

1.2139573337497467

In [None]:
print(res.critical_values)

[0.549 0.625 0.75  0.875 1.041]

In [None]:
print(res.significance_level)

[15.  10.   5.   2.5  1. ]

Here, the observed value of the statistic exceeds the critical value
corresponding to a 1% significance level. This tells us that the p-value
of the observed data is less than 1%, but what is it? We could interpolate
from these (already-interpolated) values, but `goodness_of_fit` can
estimate it directly.
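
Reading a bound on the p-value off the tabulated critical values can be
automated. The sketch below assumes the result attributes printed above
(``statistic``, ``critical_values``, ``significance_level``); the seed and the
name ``pvalue_upper_bound`` are hypothetical and introduced only for
illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed
x = stats.uniform.rvs(size=75, random_state=rng)

res = stats.anderson(x, 'norm')

# Significance levels are reported in percent; convert to probabilities.
levels = np.asarray(res.significance_level) / 100
rejected = res.statistic > np.asarray(res.critical_values)

if rejected.any():
    # The smallest level whose critical value is exceeded bounds the p-value.
    pvalue_upper_bound = levels[rejected].min()
    print(f"p-value is below {pvalue_upper_bound}")
else:
    pvalue_upper_bound = None
    print(f"p-value is above {levels.max()}")
```

This yields only a bound, not the p-value itself.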


In [None]:
res = stats.goodness_of_fit(stats.norm, x, statistic='ad',
                            random_state=rng)
res.statistic, res.pvalue

(1.2139573337497467, 0.0034)

A further advantage is that use of `goodness_of_fit` is not limited to
a particular set of distributions or conditions on which parameters
are known versus which must be estimated from data. Instead,
`goodness_of_fit` can estimate p-values relatively quickly for any
distribution with a sufficiently fast and reliable ``fit`` method. For
instance, here we perform a goodness of fit test using the Cramer-von Mises
statistic against the Rayleigh distribution with known location and unknown
scale.


In [None]:
rng = np.random.default_rng()
x = stats.chi(df=2.2, loc=0, scale=2).rvs(size=1000, random_state=rng)
res = stats.goodness_of_fit(stats.rayleigh, x, statistic='cvm',
                            known_params={'loc': 0}, random_state=rng)

This executes fairly quickly, but to check the reliability of the ``fit``
method, we should inspect the fit result.


In [None]:
res.fit_result  # location is as specified, and scale is reasonable

  params: FitParams(loc=0.0, scale=2.1026719844231243)
 success: True
 message: 'The fit was performed successfully.'

In [None]:
import matplotlib.pyplot as plt  # matplotlib must be installed to plot
res.fit_result.plot()
plt.show()

If the distribution is not fit to the observed data as well as possible,
the test may not control the type I error rate, that is, the chance of
rejecting the null hypothesis even when it is true.
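
One way to probe type I error control is to simulate data for which the null
hypothesis is true and count how often the test rejects. The sketch below is a
rough check under assumptions chosen here for speed (100 trials, samples of
size 50, ``n_mc_samples=199``, an arbitrary seed and scale); a serious study
would use many more trials.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed
alpha = 0.05
n_trials = 100       # kept small so the sketch runs quickly
n_mc_samples = 199   # far fewer than the default, for speed only

rejections = 0
for _ in range(n_trials):
    # Data drawn from the null family: Rayleigh with loc=0 and some scale.
    sample = stats.rayleigh.rvs(loc=0, scale=2, size=50, random_state=rng)
    res = stats.goodness_of_fit(stats.rayleigh, sample, statistic='cvm',
                                known_params={'loc': 0},
                                n_mc_samples=n_mc_samples, random_state=rng)
    rejections += int(res.pvalue <= alpha)

rejection_rate = rejections / n_trials
print(rejection_rate)
```

If the fit is reliable, the printed rejection rate should be close to
``alpha``, up to the sampling noise of so few trials.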

We should also look for extreme outliers in the null distribution that
may be caused by unreliable fitting. These do not necessarily invalidate
the result, but they tend to reduce the test's power.


In [None]:
_, ax = plt.subplots()
ax.hist(np.log10(res.null_distribution))
ax.set_xlabel("log10 of CVM statistic under the null hypothesis")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of the Monte Carlo null distribution")
plt.show()

This plot seems reassuring.

If the ``fit`` method is working reliably, and if the distribution of the
test statistic is not particularly sensitive to the values of the fitted
parameters, then the p-value provided by `goodness_of_fit` is expected to
be a good approximation.


In [None]:
res.statistic, res.pvalue

(0.2231991510248692, 0.0525)