Suppose we wish to test whether two samples are drawn from the same
distribution. Assume that the underlying distributions are unknown to us,
and that before observing the data, we hypothesized that the mean of the
first sample would be less than that of the second sample. We decide that
we will use the difference between the sample means as a test statistic,
and we will consider a p-value below 0.05 to be statistically significant.

For efficiency, we write the function defining the test statistic in a
vectorized fashion: the samples ``x`` and ``y`` can be N-dimensional arrays,
and the statistic will be calculated for each axis-slice along ``axis``.


In [None]:
import numpy as np

def statistic(x, y, axis):
    # difference between the sample means, computed along `axis`
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)
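
To illustrate what "vectorized" means here, the following sketch applies the same function to a small batch of sample pairs at once and compares the result with an explicit loop. The batch shape and the fixed seed are assumptions chosen only for illustration.

```python
import numpy as np

def statistic(x, y, axis):
    # difference between the sample means, computed along `axis`
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

rng = np.random.default_rng(0)      # fixed seed (assumption, for reproducibility)
x_batch = rng.normal(size=(3, 5))   # three samples of size 5
y_batch = rng.normal(size=(3, 6))   # three samples of size 6

# One call computes the statistic for every row of the batch.
batched = statistic(x_batch, y_batch, axis=-1)

# The same values, computed one pair of samples at a time.
looped = np.array([statistic(x_batch[i], y_batch[i], axis=0)
                   for i in range(x_batch.shape[0])])

print(batched.shape)
print(np.allclose(batched, looped))
```

A resampling routine can exploit this by stacking many permuted samples into one array and evaluating the statistic in a single call instead of looping in Python.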

After collecting our data, we calculate the observed value of the test
statistic.


In [None]:
from scipy.stats import norm
rng = np.random.default_rng()
x = norm.rvs(size=5, random_state=rng)
y = norm.rvs(size=6, loc=3, random_state=rng)
statistic(x, y, 0)

-3.5411688580987266

Indeed, the test statistic is negative, suggesting that the true mean of
the distribution underlying ``x`` is less than that of the distribution
underlying ``y``. To determine the probability of this occurring by chance
if the two samples were drawn from the same distribution, we perform
a permutation test.


In [None]:
from scipy.stats import permutation_test
# because our statistic is vectorized, we pass `vectorized=True`
# `n_resamples=np.inf` indicates that an exact test is to be performed
res = permutation_test((x, y), statistic, vectorized=True,
                       n_resamples=np.inf, alternative='less')
print(res.statistic)

-3.5411688580987266

In [None]:
print(res.pvalue)

0.004329004329004329

The probability of obtaining a test statistic less than or equal to the
observed value under the null hypothesis is 0.4329%. This is less than our
chosen threshold of 5%, so we consider this to be significant evidence
against the null hypothesis in favor of the alternative.
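
To make the exact test less of a black box, here is a minimal sketch of the same idea using only NumPy and `itertools`: enumerate every way of splitting the pooled observations into groups of sizes 5 and 6, recompute the statistic for each split, and count how often it is at least as small as the observed value. This is an illustration under stated assumptions (a fixed seed, freshly drawn data, and a small comparison tolerance), not a description of how `permutation_test` is implemented internally.

```python
from itertools import combinations
from math import comb

import numpy as np

rng = np.random.default_rng(12345)   # fixed seed (assumption, for reproducibility)
x = rng.normal(size=5)
y = rng.normal(loc=3, size=6)

pooled = np.concatenate([x, y])
observed = x.mean() - y.mean()

# Enumerate every distinct assignment of 5 pooled observations to the
# first group; the remaining 6 form the second group.
null = []
for idx in combinations(range(pooled.size), x.size):
    mask = np.zeros(pooled.size, dtype=bool)
    mask[list(idx)] = True
    null.append(pooled[mask].mean() - pooled[~mask].mean())
null = np.array(null)

# alternative='less': count statistics at least as small as the observed
# one, with a small tolerance to absorb floating point noise.
tol = max(1e-14, abs(observed) * 1e-14)
manual_pvalue = np.count_nonzero(null <= observed + tol) / null.size

print(null.size == comb(11, 5))
print(manual_pvalue)
```

Because the observed split is itself one of the enumerated splits, the smallest p-value this procedure can produce is one over the number of splits.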

Because the size of the samples above was small, `permutation_test` could
perform an exact test. For larger samples, we resort to a randomized
permutation test.


In [None]:
x = norm.rvs(size=100, random_state=rng)
y = norm.rvs(size=120, loc=0.2, random_state=rng)
res = permutation_test((x, y), statistic, n_resamples=9999,
                       vectorized=True, alternative='less',
                       random_state=rng)
print(res.statistic)

-0.4230459671240913

In [None]:
print(res.pvalue)

0.0015

The approximate probability of obtaining a test statistic less than or
equal to the observed value under the null hypothesis is 0.15%. This is
again less than our chosen threshold of 5%, so again we have significant
evidence to reject the null hypothesis in favor of the alternative.
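
The randomized procedure can be sketched by hand as well. The example below shuffles the pooled data a fixed number of times and estimates the p-value as `(count + 1) / (n_resamples + 1)`, a common convention that counts the observed arrangement as one of the resamples so that the estimate is never exactly zero. The seed, the freshly drawn data, and the tolerance are assumptions for illustration; the numbers will not match the output shown above.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed (assumption, for reproducibility)
x = rng.normal(size=100)
y = rng.normal(loc=0.2, size=120)

pooled = np.concatenate([x, y])
observed = x.mean() - y.mean()
n_resamples = 9999

# Build one row per resample, then shuffle each row independently.
batch = np.tile(pooled, (n_resamples, 1))
shuffled = rng.permuted(batch, axis=1)

# The first 100 entries of each row play the role of x, the rest of y.
null = (shuffled[:, :x.size].mean(axis=1)
        - shuffled[:, x.size:].mean(axis=1))

# alternative='less', with a small tolerance for floating point noise.
tol = max(1e-14, abs(observed) * 1e-14)
count = np.count_nonzero(null <= observed + tol)
approx_pvalue = (count + 1) / (n_resamples + 1)

print(approx_pvalue)
```

With 9999 resamples, the estimated p-value is a multiple of 1/10000, which is why a reported value such as 0.0015 is only accurate to about four decimal places.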

For large samples and a large number of permutations, the result is
comparable to that of the corresponding asymptotic test, the
independent-sample t-test.


In [None]:
from scipy.stats import ttest_ind
res_asymptotic = ttest_ind(x, y, alternative='less')
print(res_asymptotic.pvalue)

0.0014669545224902675

The permutation distribution of the test statistic is provided for
further investigation.


In [None]:
import matplotlib.pyplot as plt
plt.hist(res.null_distribution, bins=50)
plt.title("Permutation distribution of test statistic")
plt.xlabel("Value of Statistic")
plt.ylabel("Frequency")
plt.show()

Inspection of the null distribution is essential if the statistic suffers
from inaccuracy due to limited machine precision. Consider the following
case:


In [None]:
from scipy.stats import pearsonr
x = [1, 2, 4, 3]
y = [2, 4, 6, 8]
def statistic(x, y, axis=-1):
    return pearsonr(x, y, axis=axis).statistic
res = permutation_test((x, y), statistic, vectorized=True,
                       permutation_type='pairings',
                       alternative='greater')
r, pvalue, null = res.statistic, res.pvalue, res.null_distribution

In this case, some elements of the null distribution differ from the
observed value of the correlation coefficient ``r`` due to numerical noise.
We manually inspect the elements of the null distribution that are nearly
the same as the observed value of the test statistic.


In [None]:
r

0.7999999999999999

In [None]:
unique = np.unique(null)
unique

array([-1. , -1. , -0.8, -0.8, -0.8, -0.6, -0.4, -0.4, -0.2, -0.2, -0.2,
    0. ,  0.2,  0.2,  0.2,  0.4,  0.4,  0.6,  0.8,  0.8,  0.8,  1. ,
    1. ])  # may vary

In [None]:
unique[np.isclose(r, unique)].tolist()

[0.7999999999999998, 0.7999999999999999, 0.8]  # may vary

If `permutation_test` were to perform the comparison naively, the
elements of the null distribution with value ``0.7999999999999998`` would
not be considered as extreme as or more extreme than the observed value of
the statistic, so the calculated p-value would be too small.


In [None]:
incorrect_pvalue = np.count_nonzero(null >= r) / len(null)
incorrect_pvalue

0.14583333333333334  # may vary

Instead, `permutation_test` treats elements of the null distribution that
are within ``max(1e-14, abs(r)*1e-14)`` of the observed value of the
statistic ``r`` as equal to ``r``.


In [None]:
correct_pvalue = np.count_nonzero(null >= r - 1e-14) / len(null)
correct_pvalue

0.16666666666666666

In [None]:
res.pvalue == correct_pvalue

True
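
The tolerance rule described above can be written as a small standalone helper. This is a sketch that follows the rule as stated in the text; the helper name `count_at_least` and the toy null distribution are hypothetical and exist only to show the difference between the naive and the tolerant comparison.

```python
import numpy as np

def count_at_least(null, observed):
    # Hypothetical helper: count null values that are greater than or
    # equal to `observed`, treating values within the tolerance as equal.
    tol = max(1e-14, abs(observed) * 1e-14)
    return int(np.count_nonzero(null >= observed - tol))

# Toy null distribution containing three floating point "versions" of 0.8.
null = np.array([0.2, 0.7999999999999998, 0.7999999999999999, 0.8, 1.0])
observed = 0.7999999999999999

naive = int(np.count_nonzero(null >= observed))   # misses 0.7999999999999998
tolerant = count_at_least(null, observed)         # counts all three

print(naive, tolerant)
```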

This method of comparison is expected to be accurate in most practical
situations, but the user is advised to assess this by inspecting the
elements of the null distribution that are close to the observed value
of the statistic. Also, consider the use of statistics that can be
calculated using exact arithmetic (e.g. integer statistics).
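
As one possible illustration of an exact-arithmetic statistic for the example above: when only the pairings are permuted, the means and standard deviations of ``x`` and ``y`` never change, so the Pearson correlation coefficient is an increasing linear function of the sum of products of the paired values. For integer data that sum is an integer, so the comparison with the observed value involves no rounding. The sketch below enumerates all 24 pairings by hand; the function name `integer_statistic` is hypothetical.

```python
from itertools import permutations

import numpy as np

x = np.array([1, 2, 4, 3])
y = np.array([2, 4, 6, 8])

def integer_statistic(x, y):
    # Sum of products of paired values; exact for integer inputs and
    # ordered the same way as Pearson's r when only pairings change.
    return int(np.sum(x * y))

observed = integer_statistic(x, y)

# Enumerate every pairing of x with a reordering of y (4! = 24 pairings).
null = np.array([integer_statistic(x, np.array(p)) for p in permutations(y)])

# alternative='greater': no tolerance is needed for integer comparisons.
exact_pvalue = np.count_nonzero(null >= observed) / null.size

print(observed, null.size, exact_pvalue)
```

Here 4 of the 24 pairings give a sum at least as large as the observed one, which reproduces the p-value of 4/24 obtained above with the tolerance-based comparison.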
