
Suppose we wish to fit a distribution to the following data.


In [None]:
import numpy as np
from scipy import stats
rng = np.random.default_rng()
dist = stats.nbinom
shapes = (5, 0.5)
data = dist.rvs(*shapes, size=1000, random_state=rng)

Suppose we do not know how the data were generated, but we suspect that
they follow a negative binomial distribution with parameters *n* and *p*.
(See `scipy.stats.nbinom`.) We believe that the parameter *n* was less
than 30, and we know that the parameter *p* must lie on the interval
[0, 1]. We record this information in a variable `bounds` and pass
it to `fit`.


In [None]:
bounds = [(0, 30), (0, 1)]
res = stats.fit(dist, data, bounds)

`fit` searches within the user-specified `bounds` for the
values that best match the data (in the sense of maximum likelihood
estimation). In this case, it found shape values similar to those
from which the data were actually generated.


In [None]:
res.params

FitParams(n=5.0, p=0.5028157644634368, loc=0.0)  # may vary
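To make "best match in the sense of maximum likelihood" concrete, the following sketch evaluates the negative log-likelihood by hand and compares the fitted parameters against a deliberately poor guess. The seed, the helper `neg_log_likelihood`, and the comparison value `p = 0.3` are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
data = stats.nbinom.rvs(5, 0.5, size=1000, random_state=rng)
res = stats.fit(stats.nbinom, data, [(0, 30), (0, 1)])


def neg_log_likelihood(n, p):
    """Negative log-likelihood of `data` under nbinom(n, p) (hypothetical helper)."""
    return -np.sum(stats.nbinom.logpmf(data, n, p))


# `res.params` is a named tuple, so the fitted shapes can be read by name.
n_hat, p_hat = res.params.n, res.params.p

nll_fit = neg_log_likelihood(n_hat, p_hat)
nll_off = neg_log_likelihood(n_hat, 0.3)  # deliberately poor guess for p
print(nll_fit, nll_off)
```

A smaller negative log-likelihood means the parameters explain the data better, so the fitted values should score lower than the poor guess.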

We can visualize the results by superposing the probability mass function
of the distribution (with the shapes fit to the data) over a normalized
histogram of the data.


In [None]:
import matplotlib.pyplot as plt  # matplotlib must be installed to plot
res.plot()
plt.show()
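If matplotlib is not available, roughly the same comparison can be made numerically. The sketch below tabulates the relative frequency of each observed count and the fitted PMF at the same counts; the seed and the variable names are assumptions made for illustration, and this is not a description of how `plot` is implemented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
data = stats.nbinom.rvs(5, 0.5, size=1000, random_state=rng)
res = stats.fit(stats.nbinom, data, [(0, 30), (0, 1)])

# Relative frequency of each count 0, 1, ..., max(data).
k = np.arange(data.max() + 1)
empirical = np.bincount(data) / data.size

# PMF of the fitted distribution at the same counts.
fitted = stats.nbinom.pmf(k, res.params.n, res.params.p)

max_gap = np.abs(empirical - fitted).max()
print(f"largest gap between histogram and PMF: {max_gap:.3f}")
```

A small gap at every count corresponds to the PMF markers sitting close to the tops of the histogram bars in the plot.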

Note that the estimate for *n* was exactly integral; this is because
the domain of the `nbinom` PMF includes only integral *n*, and the `nbinom`
object "knows" that. `nbinom` also knows that the shape *p* must be a
value between 0 and 1. In such a case - when the domain of the distribution
with respect to a parameter is finite - we are not required to specify
bounds for the parameter.


In [None]:
bounds = {'n': (0, 30)}  # omit parameter p using a `dict`
res2 = stats.fit(dist, data, bounds)
res2.params

FitParams(n=5.0, p=0.5016492009232932, loc=0.0)  # may vary
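The restriction on *p* can be observed directly on the distribution object. In this small sketch (the evaluation point `k = 3` and the shape values are arbitrary choices for illustration), the PMF is finite for a valid *p* and `nan` for a *p* outside the allowed range.

```python
import numpy as np
from scipy import stats

valid = stats.nbinom.pmf(3, 5, 0.5)    # p inside the allowed range
invalid = stats.nbinom.pmf(3, 5, 1.5)  # p greater than 1 is not a valid shape

print(valid, invalid)
print(np.isnan(invalid))
```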

If we wish to force the distribution to be fit with *n* fixed at 6, we can
set both the lower and upper bounds on *n* to 6. Note, however, that the
value of the objective function being optimized is typically worse (higher)
in this case.


In [None]:
bounds = {'n': (6, 6)}  # fix parameter `n`
res3 = stats.fit(dist, data, bounds)
res3.params

FitParams(n=6.0, p=0.5486556076755706, loc=0.0)  # may vary

In [None]:
res3.nllf() > res.nllf()

True  # may vary
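As a sanity check on the fixed-*n* fit, note that once *n* is held fixed the maximum likelihood estimate of *p* has a closed form: setting the derivative of the log-likelihood with respect to *p* to zero gives p = n / (n + mean of the data). The sketch below compares that value with the result of `fit`; the seed and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
data = stats.nbinom.rvs(5, 0.5, size=1000, random_state=rng)

n_fixed = 6
# Closed-form maximizer of the likelihood in p when n is held fixed.
p_closed_form = n_fixed / (n_fixed + data.mean())

# Same problem solved numerically: equal lower and upper bounds fix n.
res_fixed = stats.fit(stats.nbinom, data, {'n': (n_fixed, n_fixed)})
print(p_closed_form, res_fixed.params.p)
```

The two values should agree closely, which is consistent with the estimate of roughly 0.55 shown above for data whose mean is near 5.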

Note that the numerical results of the previous examples are typical, but
they may vary because the default optimizer used by `fit`,
`scipy.optimize.differential_evolution`, is stochastic. However, we can
customize the settings used by the optimizer to ensure reproducibility -
or even use a different optimizer entirely - using the `optimizer`
parameter.


In [None]:
from scipy.optimize import differential_evolution
# Seed the generator (the value is arbitrary) so the optimizer is reproducible.
rng = np.random.default_rng(12345)
def optimizer(fun, bounds, *, integrality):
    return differential_evolution(fun, bounds, strategy='best2bin',
                                  seed=rng, integrality=integrality)
bounds = [(0, 30), (0, 1)]
res4 = stats.fit(dist, data, bounds, optimizer=optimizer)
res4.params

FitParams(n=5.0, p=0.5015183149259951, loc=0.0)
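A generator object advances its internal state each time it is used, so one way to get the same answer from repeated calls to `fit` is to hand the optimizer an integer seed instead. The sketch below assumes a hypothetical factory, `make_optimizer`, and an arbitrary seed value; it fits the same data twice and checks that the results are identical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
data = stats.nbinom.rvs(5, 0.5, size=1000, random_state=rng)


def make_optimizer(seed):
    """Return an optimizer for `stats.fit` that reuses `seed` on every call."""
    def optimizer(fun, bounds, *, integrality):
        return differential_evolution(fun, bounds, seed=seed,
                                      integrality=integrality)
    return optimizer


bounds = [(0, 30), (0, 1)]
res_a = stats.fit(stats.nbinom, data, bounds, optimizer=make_optimizer(1234))
res_b = stats.fit(stats.nbinom, data, bounds, optimizer=make_optimizer(1234))
print(res_a.params == res_b.params)
```

Because each call starts the optimizer from the same seed, the two fits follow the same search path on the same data.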