BUG: fix negative overflow in stats.boxcox_normmax #19691

xuefeng-xu · 2023-12-13T14:06:04Z

Reference issue

Follow up #19604 (The prior fix positive overflow, this fix negative overflow)
Towards #19016

What does this implement/fix?

Negative overflow #19631 (comment)

There are two cases to check, see #19631 (comment) for more detail.

When xmax > 1 and lmb > 1, check if BC(xmax) > ymax
When xmin < 1 and lmb < 0, check if BC(xmin) < -ymax

Additional information

The second check is improved from xmin < 1 and lmb < 1 to xmin < 1 and lmb < 0
See #19631 (comment) for explanation and

from scipy.special import boxcox
print(boxcox(np.finfo(np.float64).smallest_normal, 0))
# -708.3964185322641

mdhaber · 2023-12-13T16:07:53Z

@xuefeng-xu can you explain the changes after the first commit? What cases did it not consider?

xuefeng-xu · 2023-12-13T16:55:38Z

The following mainly use the fact that the transformation function is increasing in both lmb and x

The possible positive overflow case is when xmax > 1 and lmb > 1
The other cases are impossible to cause positive overflow

xmax < 1, BC(xmax) < BC(lmb, x=1) = 0 (the upper bound is 0)
xmax > 1 and lmb < 1: BC(xmax) < BC(lmb=1, xmax) = xmax-1 (the upper bound is xmax-1)
The output is finite in these cases since the input x must be finite.

The possible negtive overflow case is when xmin < 1 and lmb < 0
The other cases are impossible to cause negative overflow

xmin > 1, BC(xmin) > BC(lmb, x=1) = 0 (the lower bound is 0)
xmin < 1 and lmb > 0, BC(xmin) > BC(lmb=0, xmin) = log xmin (the lower bound is log xmin)
Let xmin be the smallest positive representable value, we can see the minimum value is finite (approximatly -709).

xuefeng-xu · 2023-12-13T17:05:46Z

Actually, I want to improve the negative overflow case from xmin < 1 and lmb < 0 to xmin < 1 and lmb < -1
Since BC(xmin) > BC(lmb=-1, xmin) = 1-1/xmin, the lower bound is 1-1/xmin.

But because smallest_normal is not actually the smallest positive representable value in a NumPy floating point type, it might still cause negative overflow.

print(np.finfo(np.float64).smallest_normal) # 2.2250738585072014e-308
print(boxcox(np.finfo(np.float64).smallest_normal, -1)) # -4.4942328371556665e+307
print(boxcox(1e-323,-1)) # -inf

mdhaber · 2023-12-13T17:13:49Z

Thanks, but that says what you did, whereas I'm asking about the difference between what you did and what I had. For instance, is there an example case that your commits fixed that f393477 did not? Also, why move the test case from the class that tests boxcox_normmax, where these changes are actually made, to the class that tests boxcox?

xuefeng-xu · 2023-12-13T17:21:14Z

is there an example case that your commits fixed that f393477 did not?

No. The first commit consider the following two cases, I just narrow the scope.

When lmb > 0, check if BC(xmax) > ymax
When lmb < 0, check if BC(xmin) < -ymax

why move the test case from the class that tests boxcox_normmax, where these changes are actually made, to the class that tests boxcox?

Previously in #19604, I added the test in for boxcox. I will move this test to boxcox_normmax.

mdhaber · 2023-12-13T17:25:28Z

I just narrow the scope.

Do you mean it's more efficient (e.g. it doesn't perform an expensive calculation if it doesn't need to)? Before I review your code in detail, I need to better understand the motivation for the extra complexity. I try to avoid quadruply-nested if statements when possible. If there was a four-line change that fixed the bug, I need to understand why the 20-line change is better before I dig into it.

xuefeng-xu · 2023-12-13T17:34:59Z

it doesn't perform expensive calculations if it doesn't need to

Yes, that is my intention. Although it seems that the calculation might not be expensive.

mdhaber · 2023-12-13T17:53:41Z

Yes, that is my intention. Although it seems that the calculation might not be expensive.

I see. So you were using theory to avoid actually performing a boxcox transformation when you could tell just from the data that the boxcox transformation would not overflow.

Now that I understand the intent, I could take the time to check that this implementation is indeed faster, verify the mathematical argument for avoiding the direct overflow check, and then I could review the code to see that it correctly implements the math. But I think there is a balance to be struck between code performance, code complexity, and review time. If you'd like to go this route, can you demonstrate that the performance improvement is worth that extra work?

Considering that the function also performs a comparatively expensive optimization, my guess is that these extra checks to avoid a call to boxcox with scalar arguments turns will not noticably reduce execution time. If that's correct, I would prefer to go with a simpler solution that I have already verified.

xuefeng-xu · 2023-12-13T18:03:38Z

Thanks, I'll also do some checks. And I'm happy to switch to the simpler version if there's no noticable performance improvement.

xuefeng-xu · 2023-12-18T05:57:08Z

Hi @mdhaber, I've simplified the code. The main difference from f393477 is to check the two conditions that may cause overflow.

sign_lmbm1 = np.sign(res - 1)
x_treme = np.max(x) if np.any(sign_lmbm1 > 0) else np.min(x)

# There are two conditions of overflow to check
# 1. x>1, lmb>1; 2. x<1, lmb<1
mask = False
if np.any((x_treme - 1) * sign_lmbm1 > 0):
    mask = special.boxcox(x_treme, res) * sign_lmbm1 > ymax

For the negtive overflow case, I didn't use the previous proprosed condition (x<1, lmd<0) because it will introduce code complexity.

mdhaber · 2023-12-18T06:06:48Z

I see, but again, if f393477 worked and has already been reviewed by me, you need to explain why / demonstrate that the modifications are beneficial. We seem to be having trouble communicating since we've gone back and forth a few times on this point. I'll try a few different ways of phrasing my request. Answers to any of these questions might get at what I'm looking for.

What motivated you to change f393477? Why did that not solve the problem completely?
You said: "I'm happy to switch to the simpler version if there's no noticable performance improvement." So is there a notable performance improvement? (If so, show me, please!)
In your opinion, is 4c0c0c8 simpler, faster, more readable, or (insert good quality here) than f393477? How do these changes improve what was there?

Thanks!

xuefeng-xu · 2023-12-18T06:51:45Z

What motivated you to change f393477?

One advantage I think is that the overflow analysis is base on theory, i.e. the choice of x_treme and lmb is based on the lemma of Box-Cox function. First, check if lmb is large than 1 instead of 0 to decide positive or negative overflow (see #19631 (comment) lemma 2). Second, the if statement (x_treme - 1) * sign_lmbm1 > 0 didn't bring much complexity in the code.

Moreover, it can also applied in YJ with only little modifications. Since the YJ transformation function is more complicate. If we can leverage some nice properties, the analysis would be much eaiser.

if np.any(x_treme * sign_lmbm1 > 0): # 1. x>0, lmb>1; 2. x<0, lmb<1
    mask = _yeojohnson_transform(x_treme, res) * sign_lmbm1 > ymax

So is there a notable performance improvement?

I didn't find a notable speed improvement.

mdhaber · 2023-12-18T08:33:43Z

the overflow analysis is base on theory

Both are based on theory. I think that a consequence of lemma 5 is that the maximum value of $y$ with negative $\lambda$ occurs when $x$ is as large as possible and $\lambda \approx 0$. Likewise, the minimum value of $y$ with positive $\lambda$ occurs when $x$ is as small as possible and $\lambda \approx 0$.

import special
special.boxcox(1e300, -1e-300) # ~690
special.boxcox(1e-300, 1e-300) # ~-690

Neither of these are close to overflowing. Consequently, there cannot be positive overflow when $\lambda < 0$, and there cannot be negative overflow when $\lambda > 0$. This is why f393477 used the sign of lmbda.

Moreover, it can also applied in YJ with only little modifications.

We're trying backport a patch into 1.12. Since we're short on time, I want to do the simplest thing possible that is correct rather than exploring lots of options. Does that make sense?

In the interest of quickly backporting this fix before release of 1.12, please use f393477 if it is correct, making only necessary changes (like adjusting the warning). If it's correct, it seems only natural to do so, since it existed first, the diff is minimal, and i have already reviewed it. We can adapt to the case of user-specified ymax and harmonize with YJ later.

xuefeng-xu · 2023-12-18T08:55:19Z

Ok, let's get this move forward. I have reverted to f393477 and only updated the warning messge.

mdhaber · 2023-12-19T00:59:35Z

Thanks @xuefeng-xu for the improvements! If there is a better way to perform the checks, you can change it (with objective justification) in gh-19631, but it's good to keep this simple for the backport. I'll just run the Cirrus checks and squash merge when they pass.

* MAINT: stats.boxcox_normmax: avoid negative overflow Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

mdhaber and others added 3 commits December 8, 2023 10:48

MAINT: stats.boxcox_normmax: avoid negative overflow

f393477

BUG: fix negative overflow in stats.boxcox_normmax

fcf6ac3

fix lint issue

9cd247e

xuefeng-xu mentioned this pull request Dec 13, 2023

ENH: add parameter ymax in stats.boxcox_normmax #19631

Merged

lucascolley added maintenance Items related to regular maintenance tasks scipy.stats labels Dec 13, 2023

lucascolley requested a review from mdhaber December 13, 2023 14:26

xuefeng-xu added 4 commits December 15, 2023 11:40

move test to boxcox_normmax

0780fe8

simplify code

9d46c6c

fix bug and add comment

1f0dec7

fix bug and update test

4c0c0c8

revert to f393477

6ec9057

fix error

939d0b1

mdhaber added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Dec 19, 2023

mdhaber added this to the 1.12.0 milestone Dec 19, 2023

mdhaber merged commit a725b5a into scipy:main Dec 19, 2023
29 checks passed

tylerjereddy pushed a commit to tylerjereddy/scipy that referenced this pull request Dec 19, 2023

BUG: fix negative overflow in stats.boxcox_normmax (scipy#19691)

c0555e1

* MAINT: stats.boxcox_normmax: avoid negative overflow Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

tylerjereddy mentioned this pull request Dec 19, 2023

MAINT: version bounds for 1.12.0rc1 #19670

Merged

tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Dec 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: fix negative overflow in stats.boxcox_normmax #19691

BUG: fix negative overflow in stats.boxcox_normmax #19691

xuefeng-xu commented Dec 13, 2023 •

edited

mdhaber commented Dec 13, 2023 •

edited

xuefeng-xu commented Dec 13, 2023

xuefeng-xu commented Dec 13, 2023

mdhaber commented Dec 13, 2023 •

edited

xuefeng-xu commented Dec 13, 2023 •

edited

mdhaber commented Dec 13, 2023 •

edited

xuefeng-xu commented Dec 13, 2023

mdhaber commented Dec 13, 2023 •

edited

xuefeng-xu commented Dec 13, 2023

xuefeng-xu commented Dec 18, 2023

mdhaber commented Dec 18, 2023 •

edited

xuefeng-xu commented Dec 18, 2023 •

edited

mdhaber commented Dec 18, 2023 •

edited

xuefeng-xu commented Dec 18, 2023

mdhaber commented Dec 19, 2023 •

edited

BUG: fix negative overflow in stats.boxcox_normmax #19691

BUG: fix negative overflow in stats.boxcox_normmax #19691

Conversation

xuefeng-xu commented Dec 13, 2023 • edited

Reference issue

What does this implement/fix?

Additional information

mdhaber commented Dec 13, 2023 • edited

xuefeng-xu commented Dec 13, 2023

xuefeng-xu commented Dec 13, 2023

mdhaber commented Dec 13, 2023 • edited

xuefeng-xu commented Dec 13, 2023 • edited

mdhaber commented Dec 13, 2023 • edited

xuefeng-xu commented Dec 13, 2023

mdhaber commented Dec 13, 2023 • edited

xuefeng-xu commented Dec 13, 2023

xuefeng-xu commented Dec 18, 2023

mdhaber commented Dec 18, 2023 • edited

xuefeng-xu commented Dec 18, 2023 • edited

mdhaber commented Dec 18, 2023 • edited

xuefeng-xu commented Dec 18, 2023

mdhaber commented Dec 19, 2023 • edited

xuefeng-xu commented Dec 13, 2023 •

edited

mdhaber commented Dec 13, 2023 •

edited

mdhaber commented Dec 13, 2023 •

edited

xuefeng-xu commented Dec 13, 2023 •

edited

mdhaber commented Dec 13, 2023 •

edited

mdhaber commented Dec 13, 2023 •

edited

mdhaber commented Dec 18, 2023 •

edited

xuefeng-xu commented Dec 18, 2023 •

edited

mdhaber commented Dec 18, 2023 •

edited

mdhaber commented Dec 19, 2023 •

edited