<div class='alert alert-warning'>

SciPy's interactive examples with Jupyterlite are experimental and may not always work as expected. Execution of cells containing imports may result in large downloads (up to 60MB of content for the first import from SciPy). Load times when importing from SciPy may take roughly 10-20 seconds. If you notice any problems, feel free to open an [issue](https://github.com/scipy/scipy/issues/new/choose).

</div>

In [None]:
from scipy.stats.sampling import DiscreteGuideTable
import numpy as np

To create a random number generator using a probability vector, use:


In [None]:
pv = [0.1, 0.3, 0.6]
urng = np.random.default_rng()
rng = DiscreteGuideTable(pv, random_state=urng)

The RNG has been set up. We can now use the `rvs` method to
generate samples from the distribution:


In [None]:
rvs = rng.rvs(size=1000)

To verify that the random variates follow the given distribution, we can
use the chi-squared test (as a measure of goodness-of-fit):


In [None]:
from scipy.stats import chisquare
_, freqs = np.unique(rvs, return_counts=True)
freqs = freqs / np.sum(freqs)
freqs

array([0.092, 0.355, 0.553])

In [None]:
chisquare(freqs, pv).pvalue

0.9987382966178464

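As the p-value is very high, we fail to reject the null hypothesis that
the observed frequencies are the same as the expected frequencies. The
samples are therefore consistent with the given distribution, although
failing to reject the null hypothesis does not prove it. Keep in mind that
the test above is applied to normalized frequencies rather than raw counts,
which shrinks the statistic by a factor of the sample size and inflates the
p-value. Note also that this only checks the correctness of the algorithm
and not the quality of the samples.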
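
As a hedged sketch, the same check can be run on raw counts, so that the statistic reflects the actual sample size of 1000. The seed (`123`) and the names `counts`, `expected` and `result` are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy.stats import chisquare
from scipy.stats.sampling import DiscreteGuideTable

pv = np.array([0.1, 0.3, 0.6])
urng = np.random.default_rng(123)  # arbitrary seed, only for reproducibility
rng = DiscreteGuideTable(pv, random_state=urng)
rvs = rng.rvs(size=1000)

# Count occurrences of 0, 1 and 2; minlength keeps never-drawn values at 0.
counts = np.bincount(np.asarray(rvs, dtype=int), minlength=pv.size)
# Expected counts under the null hypothesis, with the same total as `counts`.
expected = pv * counts.sum()
result = chisquare(counts, expected)
print(result.statistic, result.pvalue)
```

The p-value from this variant is typically less extreme than one computed from normalized frequencies, because the statistic is no longer divided by the sample size.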

If a PV is not available, an instance of a class with a PMF method and a
finite domain can also be passed.


In [None]:
from scipy.stats import binom
urng = np.random.default_rng()
n, p = 10, 0.2
dist = binom(n, p)
rng = DiscreteGuideTable(dist, random_state=urng)

Now, we can sample from the distribution using the `rvs` method
and also measure the goodness-of-fit of the samples:


In [None]:
rvs = rng.rvs(1000)
vals, freqs = np.unique(rvs, return_counts=True)
freqs = freqs / np.sum(freqs)
obs_freqs = np.zeros(n + 1)  # values that were never drawn keep frequency 0
obs_freqs[vals.astype(int)] = freqs  # store each frequency at its own value
pv = dist.pmf(np.arange(n + 1))
pv = pv / np.sum(pv)
chisquare(obs_freqs, pv).pvalue

0.9999999999999989

To check that the samples have been drawn from the correct distribution,
we can visualize the histogram of the samples:


In [None]:
import matplotlib.pyplot as plt
rvs = rng.rvs(1000)
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.arange(0, n+1)
fx = dist.pmf(x)
fx = fx / fx.sum()
ax.plot(x, fx, 'bo', label='true distribution')
ax.vlines(x, 0, fx, lw=2)
ax.hist(rvs, bins=np.r_[x, n+1]-0.5, density=True, alpha=0.5,
        color='r', label='samples')
ax.set_xlabel('x')
ax.set_ylabel('PMF(x)')
ax.set_title('Discrete Guide Table Samples')
plt.legend()
plt.show()
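
The distribution object does not have to be a SciPy frozen distribution. The following is a minimal sketch, assuming a user-defined class with a scalar `pmf` method and an explicitly passed finite `domain`. The class `LinearPMF`, its PMF, and the seed are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.stats.sampling import DiscreteGuideTable


class LinearPMF:
    """Hypothetical distribution on {0, ..., 4} with pmf(k) = (k + 1) / 15."""

    def pmf(self, x):
        return (x + 1) / 15


custom_dist = LinearPMF()
# The finite support is passed explicitly through `domain`.
custom_rng = DiscreteGuideTable(custom_dist, domain=(0, 4),
                                random_state=np.random.default_rng(7))
draws = custom_rng.rvs(size=10_000)

# Empirical frequencies next to the target probabilities.
emp = np.bincount(np.asarray(draws, dtype=int), minlength=5) / draws.size
target = np.arange(1, 6) / 15
print(emp)
print(target)
```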

To set the size of the guide table, use the `guide_factor` keyword argument.
It sets the size of the guide table relative to the length of the probability
vector. Larger guide tables make sampling faster but make the setup more
expensive.


In [None]:
rng = DiscreteGuideTable(pv, guide_factor=1, random_state=urng)
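
To make the role of `guide_factor` more concrete, here is a simplified pure-NumPy sketch of the general guide-table (indexed search) idea. It is an illustration under stated assumptions, not the UNU.RAN code that SciPy wraps. The helper names `build_guide_table` and `sample_one` are hypothetical.

```python
import numpy as np


def build_guide_table(pv, guide_factor=1.0):
    """Return the normalized CDF and a guide table for the vector `pv`."""
    pv = np.asarray(pv, dtype=float)
    cdf = np.cumsum(pv)
    cdf /= cdf[-1]  # normalize so that the last entry is exactly 1.0
    # Table size relative to the length of the probability vector.
    size = max(1, int(np.ceil(guide_factor * pv.size)))
    # guide[j] is the smallest index i with cdf[i] > j / size.
    guide = np.searchsorted(cdf, np.arange(size) / size, side="right")
    return cdf, guide


def sample_one(cdf, guide, u):
    """Invert the CDF at u in [0, 1), starting from the guide table entry."""
    j = min(int(u * guide.size), guide.size - 1)
    i = guide[j]
    while cdf[i] <= u:  # short sequential search from the guided start
        i += 1
    return int(i)


cdf, guide = build_guide_table([0.1, 0.3, 0.6], guide_factor=1.0)
us = np.random.default_rng(0).random(10_000)
draws = np.array([sample_one(cdf, guide, u) for u in us])

# Reference: plain inversion by binary search on the CDF.
reference = np.searchsorted(cdf, us, side="right")
print(np.array_equal(draws, reference))
```

With a larger `guide_factor` the table has more entries, so each lookup starts closer to the correct index and the sequential search is shorter, at the cost of more memory and setup work.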

To calculate the PPF of a binomial distribution with $n=4$ and
$p=0.1$, we can set up a guide table as follows:


In [None]:
n, p = 4, 0.1
dist = binom(n, p)
rng = DiscreteGuideTable(dist, random_state=42)
rng.ppf(0.5)

0.0
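
As a quick sanity check, the PPF from the guide table can be compared with the PPF of the frozen distribution at several probabilities. This is a sketch; the evaluation points in `u` and the names `from_table` and `exact` are illustrative assumptions, and the points are chosen away from the CDF jump values to avoid ties.

```python
import numpy as np
from scipy.stats import binom
from scipy.stats.sampling import DiscreteGuideTable

dist = binom(4, 0.1)
rng = DiscreteGuideTable(dist, random_state=42)

u = np.array([0.1, 0.5, 0.7, 0.95, 0.999])
from_table = rng.ppf(u)  # PPF evaluated through the guide table
exact = dist.ppf(u)      # PPF of the frozen binomial distribution
print(from_table)
print(exact)
```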