BUG (?): box-cox related `BracketError`-s downstream in `sktime` #18761

fkiraly · 2023-06-26T21:38:33Z

Describe your issue.

Since 1.11.0, the BoxCoxTransformer in sktime is failing with a BracketError, which ultimately comes from optimize.brent and optimize.fminbound.

We would appreciate help with diagnosing the issue, as none of the more obvious fixes have helped.

Further details:

the original code was a copy of boxcox_normmax in scipy. Replacing it with current boxcox_normmax and/or using custom optimize.minimize_scalar does not fix the error, see [BUG] fix BoxCoxTransformer failure after scipy 1.11.0 sktime/sktime#4770.
the code works without any problems on < 1.11.0
specifying an initial bracket that should theoretically work (e.g., (1.0, 2.0)) does not seem to help either.

PS: the docstrings in the current main version were not too clear about the "should", i.e., when should scipy be called and when a de-novo implementation. The PR sktime/sktime#4770 also fixes that.

Reproducing Code Example

With sktime 0.20.0 or earlier versions or current main, and scipy 1.11.0:

from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.utils._testing.panel import _make_panel_X

X = _make_panel_X(
    n_instances=7, n_columns=1, n_timepoints=10, random_state=42
)

est = BoxCoxTransformer()

est.fit(X)

sktime bug report: sktime/sktime#4769

Error message

BracketError: The algorithm terminated without finding a valid bracket. Consider trying different initial points.

Full traceback:

---------------------------------------------------------------------------
BracketError                              Traceback (most recent call last)
Cell In[2], line 10
      4 X = _make_panel_X(
      5     n_instances=7, n_columns=1, n_timepoints=10, random_state=42
      6 )
      8 est = BoxCoxTransformer()
---> 10 est.fit(X)

File C:\Workspace\sktime\sktime\transformations\base.py:439, in BaseTransformer.fit(self, X, y)
    436     self._fit(X=X_inner, y=y_inner)
    437 else:
    438     # otherwise we call the vectorized version of fit
--> 439     self._vectorize("fit", X=X_inner, y=y_inner)
    441 # this should happen last: fitted state is set to True
    442 self._is_fitted = True

File C:\Workspace\sktime\sktime\transformations\base.py:1205, in BaseTransformer._vectorize(self, methodname, **kwargs)
   1202     else:
   1203         transformers_ = self.transformers_
-> 1205     self.transformers_ = X.vectorize_est(
   1206         transformers_, method=methodname, **kwargs
   1207     )
   1208     return self
   1210 if methodname in TRAFO_METHODS:
   1211     # loop through fitted transformers one-by-one, and transform series/panels

File C:\Workspace\sktime\sktime\datatypes\_vectorize.py:584, in VectorizedDF.vectorize_est(self, estimator, method, args, args_rowvec, return_type, rowname_default, colname_default, varname_of_self, **kwargs)
    581     args_i[varname_of_self] = group
    583 est_i_method = getattr(est_i, method)
--> 584 est_i_result = est_i_method(**args_i)
    586 if group_name is None:
    587     group_name = rowname_default

File C:\Workspace\sktime\sktime\transformations\base.py:436, in BaseTransformer.fit(self, X, y)
    434 # we call the ordinary _fit if no looping/vectorization needed
    435 if not vectorization_needed:
--> 436     self._fit(X=X_inner, y=y_inner)
    437 else:
    438     # otherwise we call the vectorized version of fit
    439     self._vectorize("fit", X=X_inner, y=y_inner)

File C:\Workspace\sktime\sktime\transformations\series\boxcox.py:156, in BoxCoxTransformer._fit(self, X, y)
    154 X = X.flatten()
    155 if self.method != "guerrero":
--> 156     self.lambda_ = _boxcox_normmax(X, bounds=self.bounds, method=self.method)
    157 else:
    158     self.lambda_ = _guerrero(X, self.sp, self.bounds)

File C:\Workspace\sktime\sktime\transformations\series\boxcox.py:377, in _boxcox_normmax(x, bounds, brack, method)
    374     raise ValueError("Method %s not recognized." % method)
    376 optimfunc = methods[method]
--> 377 return optimfunc(x)

File C:\Workspace\sktime\sktime\transformations\series\boxcox.py:364, in _boxcox_normmax.._mle(x)
    360 def _eval_mle(lmb, data):
    361     # function to minimize
    362     return -boxcox_llf(lmb, data)
--> 364 return optimizer(_eval_mle, args=(x,))

File C:\Workspace\sktime\sktime\transformations\series\boxcox.py:322, in _make_boxcox_optimizer..optimizer(func, args)
    321 def optimizer(func, args):
--> 322     return optimize.brent(func, brack=brack, args=args)

File c:\ProgramData\Anaconda3\envs\sktime-skbase-311\Lib\site-packages\scipy\optimize\_optimize.py:2641, in brent(func, args, brack, tol, full_output, maxiter)
   2569 """
   2570 Given a function of one variable and a possible bracket, return
   2571 a local minimizer of the function isolated to a fractional precision
   (...)
   2637 
   2638 """
   2639 options = {'xtol': tol,
   2640            'maxiter': maxiter}
-> 2641 res = _minimize_scalar_brent(func, brack, args, **options)
   2642 if full_output:
   2643     return res['x'], res['fun'], res['nit'], res['nfev']

File c:\ProgramData\Anaconda3\envs\sktime-skbase-311\Lib\site-packages\scipy\optimize\_optimize.py:2678, in _minimize_scalar_brent(func, brack, args, xtol, maxiter, disp, **unknown_options)
   2675 brent = Brent(func=func, args=args, tol=tol,
   2676               full_output=True, maxiter=maxiter, disp=disp)
   2677 brent.set_bracket(brack)
-> 2678 brent.optimize()
   2679 x, fval, nit, nfev = brent.get_result(full_output=True)
   2681 success = nit < maxiter and not (np.isnan(x) or np.isnan(fval))

File c:\ProgramData\Anaconda3\envs\sktime-skbase-311\Lib\site-packages\scipy\optimize\_optimize.py:2448, in Brent.optimize(self)
   2445 def optimize(self):
   2446     # set up for optimization
   2447     func = self.func
-> 2448     xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
   2449     _mintol = self._mintol
   2450     _cg = self._cg

File c:\ProgramData\Anaconda3\envs\sktime-skbase-311\Lib\site-packages\scipy\optimize\_optimize.py:2417, in Brent.get_bracket_info(self)
   2415     xa, xb, xc, fa, fb, fc, funcalls = bracket(func, args=args)
   2416 elif len(brack) == 2:
-> 2417     xa, xb, xc, fa, fb, fc, funcalls = bracket(func, xa=brack[0],
   2418                                                xb=brack[1], args=args)
   2419 elif len(brack) == 3:
   2420     xa, xb, xc = brack

File c:\ProgramData\Anaconda3\envs\sktime-skbase-311\Lib\site-packages\scipy\optimize\_optimize.py:3047, in bracket(func, xa, xb, args, grow_limit, maxiter)
   3045     e = BracketError(msg)
   3046     e.data = (xa, xb, xc, fa, fb, fc, funcalls)
-> 3047     raise e
   3049 return xa, xb, xc, fa, fb, fc, funcalls

BracketError: The algorithm terminated without finding a valid bracket. Consider trying different initial points.

SciPy/NumPy/Python version and system information

1.11.0 1.25.0 sys.version_info(major=3, minor=11, micro=3, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /c/opt/64/include
    lib directory: /c/opt/64/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SKYLAKEX MAX_THREADS=2
    pc file directory: c:/opt/64/lib/pkgconfig
    version: 0.3.21.dev
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /c/opt/64/include
    lib directory: /c/opt/64/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SKYLAKEX MAX_THREADS=2
    pc file directory: c:/opt/64/lib/pkgconfig
    version: 0.3.21.dev
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.10.4
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 0.29.35
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  pythran:
    include directory: C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-12ywd98f\overlay\Lib\site-packages/pythran
    version: 0.13.1
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: windows
  cross-compiled: false
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: windows
Python Information:
  path: C:\Users\runneradmin\AppData\Local\Temp\cibw-run-uqg6lkbl\cp311-win_amd64\build\venv\Scripts\python.exe
  version: '3.11'

The text was updated successfully, but these errors were encountered:

fkiraly · 2023-06-26T21:45:10Z

@mdhaber, not sure whether the root of the problem is in stats, in optimize, or in sktime proper...

The upstream change condition is fully localized within optimize afaik, not stats.

mdhaber · 2023-06-26T21:55:05Z

Oops, right. You wrote:

specifying an initial bracket that should theoretically work (e.g., (1.0, 2.0)) does not seem to help either.

Do you mean that

these are valid bounds for the parameter that is being optimized (i.e. the minimizer is between 1 and 2) or
that a downhill bracket search that begins with (1, 2) should find a three-point scalar minimization bracket?

(My guess is that you mean 1; the SciPy documentation was confused about this point before. In the stack trace I see _boxcox_normmax(X, bounds=self.bounds, method=self.method), suggesting that the bounds for the optimal lmda are known, but then brent gets called with a bracket argument. bounds and bracket are not the same, and often bounds are not good values for bracket.)

Do you know what the optimal value of the parameter (lmda) was in previous versions?

fkiraly · 2023-06-26T22:05:50Z

@mdhaber, no, I mean a bracket (in the sense of 2), not bounds.

You can see me grappling with the confusion in this post: sktime/sktime#4770 (comment)

The lambda_ found pre-0.11.0 for all seven instances is 8.47. To see this, execute est.get_fitted_params() after the MRE on scipy<0.11.0.

mdhaber · 2023-06-26T22:08:15Z

OK. The change in behavior is due to #17704.

Is is possible for you to post the values of the arguments that _boxcox_normmax is being called with?

fkiraly · 2023-06-26T22:09:21Z

Do you mean the current scipy boxcox_normmax in the attempted fix-PR, or the old/frozen _boxcox_normmax that is on sktime main?

(both result in the same error, albeit with a different traceback in sktime)

mdhaber · 2023-06-26T22:12:19Z

Ultimately what I want to do locally is replicate the call to brent. I suspect that the bracket search always failed silently in the past, just returning an invalid bracket, and I want to know what brent did in that case. It sounds like it also returned without error, but that doesn't necessarily mean that it returned a valid minimizer since the bracket was invalid. So if you can send what I would need to replicate the call to brent (either before or after your attempted fix, or both), that would be even better.

fkiraly · 2023-06-26T22:17:42Z

uh-oh, I just spotted that the array passed contains negative values, which could be upsetting the box-cox transform. Passing positive values does not cause the error. Still I wonder why it did not result in an exception, pre-1.11.0.

Arguments of boxcox_normmax for the failing call:

import numpy as np
x = np.array([0.24835708, -0.06913215, 0.32384427, 0.76151493, -0.11707669, -0.11706848, 0.78960641, 0.38371736, -0.23473719, 0.27128002])
method = "mle"
brack = None

mdhaber · 2023-06-26T22:23:04Z

Sorry, is that the call to SciPy's boxcox_normmax or sktime's _boxcox_normmax? Passing brack=None would do different things depending on which, it seems.

Yeah, with that data, scipy.stats.boxcox_normax would have always raised ValueError: array must not contain infs or NaNs before it even gets to brent.

mdhaber · 2023-06-26T22:31:16Z

Ok here is what I have:

import numpy as np
from scipy import stats, optimize
x = np.array([0.24835708, -0.06913215, 0.32384427, 0.76151493, -0.11707669, -0.11706848, 0.78960641, 0.38371736, -0.23473719, 0.27128002])

from scipy.stats import boxcox_llf
def _eval_mle(lmb, data):
    # function to minimize
    return -boxcox_llf(lmb, data)

optimize.brent(_eval_mle, brack=(-2, 2), args=(x,))  # 8.472135811722177

fkiraly · 2023-06-26T22:33:13Z

Well, I get BracketError: The algorithm terminated without finding a valid bracket. Consider trying different initial points. on the same code as above. So does the sktime CI (which runs on all python versions 3.8 - 3.11, and three operating systems)

fkiraly · 2023-06-26T22:34:32Z

Yeah, with that data, scipy.stats.boxcox_normax would have always raised ValueError: array must not contain infs or NaNs before it even gets to brent.

I think it doesn't, because there are no infs or nans - only negative values. Just tested it, I think it doesn't in any of the recent scipy versions.

mdhaber · 2023-06-26T22:36:35Z

Right, in SciPy 1.11.0 you get that. I mean previously.

I'm just checking that I have the call to brent correct, since it's returning the optimal lmda you told me. But take a look:

lmbda = optimize.brent(_eval_mle, brack=(-2, 2), args=(x,))  # 8.472135811722177
_eval_mle(lmbda, x)  # nan

So it looks like that "optimal" lmbda was garbage before. Now you get an error and it's because a bracket couldn't be found to start minimization.

fkiraly · 2023-06-26T22:38:34Z

yes, the garbage is due to negative values in the input.

The reason that this was not spotted is that this just was test data in tests for interface conformance, return type and such - the resulting lambda is not tested for sensibility but only for the right type.

Now, here's the prize question: why did it produce 8.47 in the first place, making it look like non-garbage?

mdhaber · 2023-06-26T22:39:07Z

I think it doesn't, because there are no infs or nans - only negative values. Just tested it, I think it doesn't in any of the recent scipy versions.

Negative values cause NaNs in boxcox_llf.

from scipy import stats
x = np.array([0.24835708, -0.06913215, 0.32384427, 0.76151493, -0.11707669, -0.11706848, 0.78960641, 0.38371736, -0.23473719, 0.27128002])
stats.boxcox_normmax(x)
# ValueError: array must not contain infs or NaNs
# because
stats.boxcox_llf(8.47, x)  # nan

fkiraly · 2023-06-26T22:42:10Z

ah, yes, it does

mdhaber · 2023-06-26T22:43:11Z

Now, here's the prize question: why did it produce 8.47 in the first place, making it look like non-garbage?

Haha I have so many good answers for this : ) but I'll keep it tame: because the bug reported by gh-14858 was not fixed until gh-17704.

Does this resolve the issue? Thanks for the fast iteration.

fkiraly · 2023-06-26T22:46:28Z

Yes, I think it does, we will have to see whether enforcing positive values fixes this in all occurrences throughout the tests.
Thanks for the swift iteration!

Here is my attempt at a fix: sktime/sktime#4770

Btw, some feedback from this: the error message is not very user friendly - instead of complaining about the intermediate result (there are infs and nans! but there aren't in any user-visible object), boxcox_normmax should complain about non-positive values imo.

fkiraly · 2023-06-26T22:53:08Z

Haha I have so many good answers for this : ) but I'll keep it tame: because the bug reported by #14858 was not fixed until #17704.

I can't really tell whether this qualifies you for the prize, but have one nonetheless 🏆

(it will be sent by mail and will arrive in approximately 2 years)

mdhaber · 2023-06-26T22:55:06Z

boxcox_normmax should complain about non-positive values IMO.

Yes. That could be fixed by catching and re-raising. Care to submit a PR? If not, I can do that for you.

fkiraly · 2023-06-26T23:06:57Z

Sure, why not: #18764

fkiraly · 2023-06-27T00:47:46Z

@mdhaber, now this is curious - in the CI of this #18764, the error is again BracketError rather than ValueError - despite passing negative values and NaN. I'm slightly confused.

mdhaber · 2023-06-27T01:00:35Z

What happens locally?

mdhaber · 2023-06-27T01:25:14Z

Locally, with two elements in bad_x, I get a BracketError, and with more, I get a ValueError. It comes down to different behaviors in pearsonr.

from scipy import stats
x = [1, 2]
y = [0, np.nan]
stats.pearsonr(x, y)  # PearsonRResult(statistic=nan, pvalue=1.0)

as it should, if it's consistent with most other stats functions, whereas

from scipy import stats
x = [1, 2, 3]
y = [0, np.nan, 1]
stats.pearsonr(x, y)  # ValueError: array must not contain infs or NaNs

With your test, there are two elements. pearsonr doesn't raise, so you get the BracketError when bracket finds that the algorithm didn't produce a valid bracket. So you could change your test to have three elements in bad_x, and it would work. That's what I'd suggest that you do. As for making pearsonr more consistent, that is covered by gh-14651, and will be fixed in due time.

fkiraly · 2023-06-27T09:32:02Z

Ok, @mdhaber, so can you confirm your suggestion: I should just make the negative examples length 3?

fkiraly · 2023-06-30T08:18:15Z

happy for this to be closed, as this turned out not to be bug in scipy, nor a bug in the sktime BoxCoxTransformer but a problem in the sktime tests masked by another unfixed bug in scipy (right)?

mdhaber · 2023-06-30T13:32:18Z

Yup, I think that's right. When the bug in SciPy was fixed, it revealed the issue in the test input. Thanks @fkiraly!

Fixes #4769 by using `scipy`'s on-board box-cox-fitter for methods `"mle"` and `"pearsonr"` and enforcing positive values. The reason for the failure was use of negative values in some interface test cases, which previously would produce garbage results but `scipy` would not complain. As of 1.11.0, it complains (albeit with a confusing error message that makes it hard to track down that it is in fact from negative values in some test cases). See further discussion here: scipy/scipy#18761 This is now solved by introducing an `enforce_positive` argument to `BoxCoxTransformer`, which ensures positive values before fitting. This can be turned off to obtain the previous behaviour, but for positive values it will be the same (although with an additional `np.sign` and `np.abs` calculation that should be fast). Also makes the following changes, while we're cleaning this up: * The `sktime` box-cox-fitter in `BoxCoxTransformer` was not using `scipy`'s since the latter did not have settable bounds when the former was originally implemented. I have hence replaced it with `scipy`'s own box-cox-fitter and using a bounded optimizer if bounds are passed, from `scipy` 1.7.0 on where it exists in the form we need. Pre-1.7.0, I've left the old behaviour. * improved docstring - general improvements, removed some wrong math statements, made clear what the choice of method means and when it interfaces `scipy` * added a `"fixed"` method, which allows to use the transformer with a fixed parameter - or tune it via grid search. * input checks on `method` Note: the fix is behaviour changing, but imo does not require deprecation because it changes some behaviour from buggy/misleading (only in the sub-case where negative values were passed) to sensible.

fkiraly added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Jun 26, 2023

mdhaber added the scipy.stats label Jun 26, 2023

mdhaber added scipy.optimize and removed scipy.stats labels Jun 26, 2023

fkiraly mentioned this issue Jun 26, 2023

[BUG] fix BoxCoxTransformer failure after scipy 1.11.0 sktime/sktime#4770

Merged

fkiraly mentioned this issue Jun 26, 2023

ENH: stats: add informative error message to boxcox_normmax for non-positive input #18764

Merged

mdhaber closed this as completed Jun 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG (?): box-cox related `BracketError`-s downstream in `sktime` #18761

BUG (?): box-cox related `BracketError`-s downstream in `sktime` #18761

fkiraly commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023

fkiraly commented Jun 27, 2023

mdhaber commented Jun 27, 2023

mdhaber commented Jun 27, 2023

fkiraly commented Jun 27, 2023 •

edited

fkiraly commented Jun 30, 2023 •

edited

mdhaber commented Jun 30, 2023

BUG (?): box-cox related BracketError-s downstream in sktime #18761

BUG (?): box-cox related BracketError-s downstream in sktime #18761

Comments

fkiraly commented Jun 26, 2023 • edited

Describe your issue.

Reproducing Code Example

Error message

SciPy/NumPy/Python version and system information

fkiraly commented Jun 26, 2023 • edited

mdhaber commented Jun 26, 2023 • edited

fkiraly commented Jun 26, 2023

mdhaber commented Jun 26, 2023 • edited

fkiraly commented Jun 26, 2023 • edited

mdhaber commented Jun 26, 2023 • edited

fkiraly commented Jun 26, 2023 • edited

mdhaber commented Jun 26, 2023 • edited

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023 • edited

fkiraly commented Jun 26, 2023

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023 • edited

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023

mdhaber commented Jun 26, 2023 • edited

fkiraly commented Jun 26, 2023 • edited

fkiraly commented Jun 26, 2023 • edited

mdhaber commented Jun 26, 2023

fkiraly commented Jun 26, 2023

fkiraly commented Jun 27, 2023

mdhaber commented Jun 27, 2023

mdhaber commented Jun 27, 2023

fkiraly commented Jun 27, 2023 • edited

fkiraly commented Jun 30, 2023 • edited

mdhaber commented Jun 30, 2023

BUG (?): box-cox related `BracketError`-s downstream in `sktime` #18761

BUG (?): box-cox related `BracketError`-s downstream in `sktime` #18761

fkiraly commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

mdhaber commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

fkiraly commented Jun 26, 2023 •

edited

fkiraly commented Jun 27, 2023 •

edited

fkiraly commented Jun 30, 2023 •

edited