[DOC] Incorrect docstring for boxcox_normmax argument method="mle", should be "maximizes the log-likelihood" #18748

Closed
fkiraly opened this issue Jun 26, 2023 · 1 comment · Fixed by #18756
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats

fkiraly commented Jun 26, 2023

The docstring of boxcox_normmax seems incorrect: in

def boxcox_normmax(x, brack=None, method='pearsonr', optimizer=None):
    """Compute optimal Box-Cox transform parameter for input data.

    Parameters
    ----------
    x : array_like
        Input array.
    brack : 2-tuple, optional, default (-2.0, 2.0)
        The starting interval for a downhill bracket search for the default
        `optimize.brent` solver. Note that this is in most cases not
        critical; the final result is allowed to be outside this bracket.
        If `optimizer` is passed, `brack` must be None.
    method : str, optional
        The method to determine the optimal transform parameter (`boxcox`
        ``lmbda`` parameter). Options are:

        'pearsonr'  (default)
            Maximizes the Pearson correlation coefficient between
            ``y = boxcox(x)`` and the expected values for ``y`` if `x` would be
            normally-distributed.

        'mle'
            Minimizes the log-likelihood `boxcox_llf`.  This is the method used
            in `boxcox`.

        'all'
            Use all optimization methods available, and return all results.
            Useful to compare different methods.
    optimizer : callable, optional
        `optimizer` is a callable that accepts one argument:

        fun : callable
            The objective function to be optimized. `fun` accepts one argument,
            the Box-Cox transform parameter `lmbda`, and returns the negative
            log-likelihood function at the provided value. The job of `optimizer`
            is to find the value of `lmbda` that minimizes `fun`.

        and returns an object, such as an instance of
        `scipy.optimize.OptimizeResult`, which holds the optimal value of
        `lmbda` in an attribute `x`.

        See the example below or the documentation of
        `scipy.optimize.minimize_scalar` for more information.

    Returns
    -------
    maxlog : float or ndarray
        The optimal transform parameter found.  An array instead of a scalar
        for ``method='all'``.

    See Also
    --------
    boxcox, boxcox_llf, boxcox_normplot, scipy.optimize.minimize_scalar

    Examples
    --------
    >>> import numpy as np
    >>> from scipy import stats
    >>> import matplotlib.pyplot as plt

    We can generate some data and determine the optimal ``lmbda`` in various
    ways:

    >>> rng = np.random.default_rng()
    >>> x = stats.loggamma.rvs(5, size=30, random_state=rng) + 5
    >>> y, lmax_mle = stats.boxcox(x)
    >>> lmax_pearsonr = stats.boxcox_normmax(x)

    >>> lmax_mle
    2.217563431465757
    >>> lmax_pearsonr
    2.238318660200961
    >>> stats.boxcox_normmax(x, method='all')
    array([2.23831866, 2.21756343])

    >>> fig = plt.figure()
    >>> ax = fig.add_subplot(111)
    >>> prob = stats.boxcox_normplot(x, -10, 10, plot=ax)
    >>> ax.axvline(lmax_mle, color='r')
    >>> ax.axvline(lmax_pearsonr, color='g', ls='--')

    >>> plt.show()

    Alternatively, we can define our own `optimizer` function. Suppose we
    are only interested in values of `lmbda` on the interval [6, 7], we
    want to use `scipy.optimize.minimize_scalar` with ``method='bounded'``,
    and we want to use tighter tolerances when optimizing the log-likelihood
    function. To do this, we define a function that accepts positional argument
    `fun` and uses `scipy.optimize.minimize_scalar` to minimize `fun` subject
    to the provided bounds and tolerances:

    >>> from scipy import optimize
    >>> options = {'xatol': 1e-12}  # absolute tolerance on `x`
    >>> def optimizer(fun):
    ...     return optimize.minimize_scalar(fun, bounds=(6, 7),
    ...                                     method="bounded", options=options)
    >>> stats.boxcox_normmax(x, optimizer=optimizer)
    6.000...
    """
    # If optimizer is not given, define default 'brent' optimizer.
    if optimizer is None:

        # Set default value for `brack`.
        if brack is None:
            brack = (-2.0, 2.0)

        def _optimizer(func, args):
            return optimize.brent(func, args=args, brack=brack)

    # Otherwise check optimizer.
    else:
        if not callable(optimizer):
            raise ValueError("`optimizer` must be a callable")

        if brack is not None:
            raise ValueError("`brack` must be None if `optimizer` is given")

        # `optimizer` is expected to return a `OptimizeResult` object, we here
        # get the solution to the optimization problem.
        def _optimizer(func, args):
            def func_wrapped(x):
                return func(x, *args)
            return getattr(optimizer(func_wrapped), 'x', None)

    def _pearsonr(x):
        osm_uniform = _calc_uniform_order_statistic_medians(len(x))
        xvals = distributions.norm.ppf(osm_uniform)

        def _eval_pearsonr(lmbda, xvals, samps):
            # This function computes the x-axis values of the probability plot
            # and computes a linear regression (including the correlation) and
            # returns ``1 - r`` so that a minimization function maximizes the
            # correlation.
            y = boxcox(samps, lmbda)
            yvals = np.sort(y)
            r, prob = _stats_py.pearsonr(xvals, yvals)
            return 1 - r

        return _optimizer(_eval_pearsonr, args=(xvals, x))

    def _mle(x):
        def _eval_mle(lmb, data):
            # function to minimize
            return -boxcox_llf(lmb, data)

        return _optimizer(_eval_mle, args=(x,))

    def _all(x):
        maxlog = np.empty(2, dtype=float)
        maxlog[0] = _pearsonr(x)
        maxlog[1] = _mle(x)
        return maxlog

    methods = {'pearsonr': _pearsonr,
               'mle': _mle,
               'all': _all}
    if method not in methods.keys():
        raise ValueError("Method %s not recognized." % method)

    optimfunc = methods[method]
    res = optimfunc(x)
    if res is None:
        message = ("`optimizer` must return an object containing the optimal "
                   "`lmbda` in attribute `x`")
        raise ValueError(message)
    return res

I think it should say "maximizes the log-likelihood", instead of

        'mle'
            Minimizes the log-likelihood `boxcox_llf`.  This is the method used
            in `boxcox`.

(Internally it does minimize, but what it minimizes is the negative log-likelihood, which is the same as maximizing the log-likelihood.)
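
To illustrate the point, here is a minimal standard-library sketch. It does not call SciPy: the `llf` function is a hand-written stand-in for `boxcox_llf`, assuming the usual Box-Cox profile log-likelihood (up to an additive constant), and the sample data and the search grid are made up for illustration. It checks that the `lmbda` minimizing the negated function is the same one that maximizes the log-likelihood itself.

```python
import math
from statistics import pvariance


def boxcox_transform(data, lmbda):
    """Box-Cox transform of strictly positive data (illustrative helper)."""
    if lmbda == 0:
        return [math.log(v) for v in data]
    return [(v ** lmbda - 1.0) / lmbda for v in data]


def llf(lmbda, data):
    """Assumed Box-Cox profile log-likelihood, up to an additive constant.

    Stand-in for `boxcox_llf`; not the SciPy implementation.
    """
    n = len(data)
    y = boxcox_transform(data, lmbda)
    log_sum = sum(math.log(v) for v in data)
    return (lmbda - 1.0) * log_sum - n / 2.0 * math.log(pvariance(y))


# Made-up, strictly positive sample and a coarse grid of candidate lmbda values.
data = [1.2, 0.8, 3.5, 2.1, 7.9, 1.6, 4.4, 0.9, 2.7, 5.3]
grid = [i / 100.0 for i in range(-200, 201)]

# Maximizing the log-likelihood ...
lmbda_max_llf = max(grid, key=lambda l: llf(l, data))
# ... selects the same lmbda as minimizing the negative log-likelihood,
# which is what `_eval_mle` hands to the optimizer.
lmbda_min_nllf = min(grid, key=lambda l: -llf(l, data))

print(lmbda_max_llf == lmbda_min_nllf)
```

So the code is fine as written; only the docstring wording for `'mle'` describes the wrong direction.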

@mdhaber mdhaber added scipy.stats Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org labels Jun 26, 2023

mdhaber commented Jun 26, 2023

Thanks. I'll fix that tonight.
