[DOC] Incorrect docstring for `boxcox_normmax` argument `method="mle"`, should be "maximizes the log-likelihood" #18748

fkiraly · 2023-06-26T00:39:05Z

The docstring of boxcox_normmax seems incorrect: in

Lines 1115 to 1280 in 694a357

    
    def boxcox_normmax(x, brack=None, method='pearsonr', optimizer=None):
        """Compute optimal Box-Cox transform parameter for input data.

        Parameters
        ----------
        x : array_like
            Input array.
        brack : 2-tuple, optional, default (-2.0, 2.0)
             The starting interval for a downhill bracket search for the default
             `optimize.brent` solver. Note that this is in most cases not
             critical; the final result is allowed to be outside this bracket.
             If `optimizer` is passed, `brack` must be None.
        method : str, optional
            The method to determine the optimal transform parameter (`boxcox`
            ``lmbda`` parameter). Options are:

            'pearsonr'  (default)
                Maximizes the Pearson correlation coefficient between
                ``y = boxcox(x)`` and the expected values for ``y`` if `x` would be
                normally-distributed.

            'mle'
                Minimizes the log-likelihood `boxcox_llf`.  This is the method used
                in `boxcox`.

            'all'
                Use all optimization methods available, and return all results.
                Useful to compare different methods.
        optimizer : callable, optional
            `optimizer` is a callable that accepts one argument:

            fun : callable
                The objective function to be optimized. `fun` accepts one argument,
                the Box-Cox transform parameter `lmbda`, and returns the negative
                log-likelihood function at the provided value. The job of `optimizer`
                is to find the value of `lmbda` that minimizes `fun`.

            and returns an object, such as an instance of
            `scipy.optimize.OptimizeResult`, which holds the optimal value of
            `lmbda` in an attribute `x`.

            See the example below or the documentation of
            `scipy.optimize.minimize_scalar` for more information.

        Returns
        -------
        maxlog : float or ndarray
            The optimal transform parameter found.  An array instead of a scalar
            for ``method='all'``.

        See Also
        --------
        boxcox, boxcox_llf, boxcox_normplot, scipy.optimize.minimize_scalar

        Examples
        --------
        >>> import numpy as np
        >>> from scipy import stats
        >>> import matplotlib.pyplot as plt

        We can generate some data and determine the optimal ``lmbda`` in various
        ways:

        >>> rng = np.random.default_rng()
        >>> x = stats.loggamma.rvs(5, size=30, random_state=rng) + 5
        >>> y, lmax_mle = stats.boxcox(x)
        >>> lmax_pearsonr = stats.boxcox_normmax(x)

        >>> lmax_mle
        2.217563431465757
        >>> lmax_pearsonr
        2.238318660200961
        >>> stats.boxcox_normmax(x, method='all')
        array([2.23831866, 2.21756343])

        >>> fig = plt.figure()
        >>> ax = fig.add_subplot(111)
        >>> prob = stats.boxcox_normplot(x, -10, 10, plot=ax)
        >>> ax.axvline(lmax_mle, color='r')
        >>> ax.axvline(lmax_pearsonr, color='g', ls='--')

        >>> plt.show()

        Alternatively, we can define our own `optimizer` function. Suppose we
        are only interested in values of `lmbda` on the interval [6, 7], we
        want to use `scipy.optimize.minimize_scalar` with ``method='bounded'``,
        and we want to use tighter tolerances when optimizing the log-likelihood
        function. To do this, we define a function that accepts positional argument
        `fun` and uses `scipy.optimize.minimize_scalar` to minimize `fun` subject
        to the provided bounds and tolerances:

        >>> from scipy import optimize
        >>> options = {'xatol': 1e-12}  # absolute tolerance on `x`
        >>> def optimizer(fun):
        ...     return optimize.minimize_scalar(fun, bounds=(6, 7),
        ...                                     method="bounded", options=options)
        >>> stats.boxcox_normmax(x, optimizer=optimizer)
        6.000...
        """
        # If optimizer is not given, define default 'brent' optimizer.
        if optimizer is None:

            # Set default value for `brack`.
            if brack is None:
                brack = (-2.0, 2.0)

            def _optimizer(func, args):
                return optimize.brent(func, args=args, brack=brack)

        # Otherwise check optimizer.
        else:
            if not callable(optimizer):
                raise ValueError("`optimizer` must be a callable")

            if brack is not None:
                raise ValueError("`brack` must be None if `optimizer` is given")

            # `optimizer` is expected to return a `OptimizeResult` object, we here
            # get the solution to the optimization problem.
            def _optimizer(func, args):
                def func_wrapped(x):
                    return func(x, *args)
                return getattr(optimizer(func_wrapped), 'x', None)

        def _pearsonr(x):
            osm_uniform = _calc_uniform_order_statistic_medians(len(x))
            xvals = distributions.norm.ppf(osm_uniform)

            def _eval_pearsonr(lmbda, xvals, samps):
                # This function computes the x-axis values of the probability plot
                # and computes a linear regression (including the correlation) and
                # returns ``1 - r`` so that a minimization function maximizes the
                # correlation.
                y = boxcox(samps, lmbda)
                yvals = np.sort(y)
                r, prob = _stats_py.pearsonr(xvals, yvals)
                return 1 - r

            return _optimizer(_eval_pearsonr, args=(xvals, x))

        def _mle(x):
            def _eval_mle(lmb, data):
                # function to minimize
                return -boxcox_llf(lmb, data)

            return _optimizer(_eval_mle, args=(x,))

        def _all(x):
            maxlog = np.empty(2, dtype=float)
            maxlog[0] = _pearsonr(x)
            maxlog[1] = _mle(x)
            return maxlog

        methods = {'pearsonr': _pearsonr,
                   'mle': _mle,
                   'all': _all}
        if method not in methods.keys():
            raise ValueError("Method %s not recognized." % method)

        optimfunc = methods[method]
        res = optimfunc(x)
        if res is None:
            message = ("`optimizer` must return an object containing the optimal "
                       "`lmbda` in attribute `x`")
            raise ValueError(message)
        return res

I think it should say "maximizes the log-likelihood", instead of

        'mle'
            Minimizes the log-likelihood `boxcox_llf`.  This is the method used
            in `boxcox`.

(Internally it does minimize, but what it minimizes is the negative log-likelihood, see `_eval_mle` above.)
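To make the sign convention concrete, here is a minimal standard-library sketch. It is not SciPy's code: `llf` is a hypothetical stand-in for `boxcox_llf` written from the usual Box-Cox profile log-likelihood, `golden_section_min` is an illustrative replacement for the `optimize.brent` call, and the lognormal sample is made up for the demonstration. The search assumes the objective is unimodal on the chosen interval.

```python
import math
import random


def boxcox(value, lmbda):
    """Box-Cox transform of one positive value."""
    if lmbda == 0:
        return math.log(value)
    return (value ** lmbda - 1.0) / lmbda


def llf(lmbda, data):
    """Box-Cox log-likelihood (hypothetical stand-in for `boxcox_llf`)."""
    n = len(data)
    y = [boxcox(v, lmbda) for v in data]
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    log_sum = sum(math.log(v) for v in data)
    return (lmbda - 1.0) * log_sum - n / 2.0 * math.log(var_y)


def neg_llf(lmbda, data):
    """The objective that is actually handed to the minimizer."""
    return -llf(lmbda, data)


def golden_section_min(fun, lo, hi, tol=1e-6):
    """Minimize a function assumed unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fun(c) < fun(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2.0


# Illustrative data only: a seeded lognormal sample.
rng = random.Random(0)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

# Minimizing the negative log-likelihood ...
lmbda_hat = golden_section_min(lambda l: neg_llf(l, data), -2.0, 2.0)

# ... lands on the maximum of the log-likelihood itself.
grid = [-2.0 + 0.01 * i for i in range(401)]
best_on_grid = max(llf(l, data) for l in grid)
print(lmbda_hat, llf(lmbda_hat, data), best_on_grid)
```

The value found by minimizing `neg_llf` is at least as good as any grid point when scored with `llf`, which is why "maximizes the log-likelihood" is the accurate description of `method='mle'`.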


mdhaber · 2023-06-26T02:37:49Z

Thanks. I'll fix that tonight.

mdhaber added the scipy.stats and Documentation labels Jun 26, 2023

mdhaber mentioned this issue Jun 26, 2023

DOC: stats.boxcox_normmax: correct minimize -> maximize #18756

Merged

tupui closed this as completed in #18756 Jun 26, 2023

j-bowhay added this to the 1.12.0 milestone Jun 26, 2023
