BUG: fix negative overflow in stats.boxcox_normmax #19691
Conversation
@xuefeng-xu can you explain the changes after the first commit? What cases did it not consider? |
The following mainly uses the fact that the transformation function is increasing in both `lmb` and `x`.

The possible positive overflow case is when `xmax > 1` and `lmb > 1`.
The possible negative overflow case is when `xmin < 1` and `lmb < 0`.
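Both regimes can be reproduced directly with `scipy.special.boxcox` (a small sketch, not the PR's code; the specific values are chosen only to trigger each branch):

```python
import numpy as np
from scipy import special

ymax = np.finfo(np.float64).max  # largest finite float64, ~1.8e308

# Positive overflow: xmax > 1 and lmb > 1, since boxcox grows like x**lmb / lmb
print(special.boxcox(1e5, 70))    # inf  (1e5**70 ~ 1e350 overflows)

# Negative overflow: xmin < 1 and lmb < 0, since x**lmb blows up as x -> 0
print(special.boxcox(1e-5, -70))  # -inf

# In between, everything stays finite
print(special.boxcox(2.0, 2.0))   # 1.5
```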
|
Actually, I want to improve the negative overflow case from `xmin < 1` and `lmb < 0` to `xmin < 1` and `lmb < -1`. But because `smallest_normal` is not actually the smallest positive representable value in a NumPy floating-point type (subnormals are smaller), it might still cause negative overflow.

```python
print(np.finfo(np.float64).smallest_normal)              # 2.2250738585072014e-308
print(boxcox(np.finfo(np.float64).smallest_normal, -1))  # -4.4942328371556665e+307
print(boxcox(1e-323, -1))                                # -inf
``` |
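For reference, the gap between the smallest normal and the smallest representable positive value can be read directly off `np.finfo` (a sketch; the `smallest_subnormal` attribute assumes a reasonably recent NumPy, 1.22+):

```python
import numpy as np

fi = np.finfo(np.float64)
print(fi.smallest_normal)     # 2.2250738585072014e-308
print(fi.smallest_subnormal)  # 5e-324, the true smallest positive float64
print(fi.smallest_subnormal < fi.smallest_normal)  # True
```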
Thanks, but that says what you did, whereas I'm asking about the difference between what you did and what I had. For instance, is there an example case that your commits fixed that f393477 did not? Also, why move the test case from the class that tests |
No. The first commit considered the following two cases; I just narrowed the scope.
Previously in #19604, I added the test in for |
Do you mean it's more efficient (e.g. it doesn't perform an expensive calculation if it doesn't need to)? Before I review your code in detail, I need to better understand the motivation for the extra complexity. I try to avoid quadruply-nested |
Yes, that is my intention. Although it seems that the calculation might not be expensive. |
I see. So you were using theory to avoid actually performing a

Now that I understand the intent, I could take the time to check that this implementation is indeed faster, verify the mathematical argument for avoiding the direct overflow check, and then I could review the code to see that it correctly implements the math.

But I think there is a balance to be struck between code performance, code complexity, and review time. If you'd like to go this route, can you demonstrate that the performance improvement is worth that extra work? Considering that the function also performs a comparatively expensive optimization, my guess is that these extra checks to avoid a call to |
Thanks, I'll also do some checks. And I'm happy to switch to the simpler version if there's no noticeable performance improvement. |
Hi @mdhaber, I've simplified the code. The main difference from f393477 is to check the two conditions that may cause overflow.

```python
sign_lmbm1 = np.sign(res - 1)
x_treme = np.max(x) if np.any(sign_lmbm1 > 0) else np.min(x)
# There are two conditions of overflow to check
# 1. x > 1, lmb > 1; 2. x < 1, lmb < 1
mask = False
if np.any((x_treme - 1) * sign_lmbm1 > 0):
    mask = special.boxcox(x_treme, res) * sign_lmbm1 > ymax
```

For the negative overflow case, I didn't use the previously proposed condition (`x < 1`, `lmb < 0`) because it would introduce code complexity. |
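Roughly, the snippet can be wrapped into a self-contained check like this (a sketch; the function name `boxcox_overflow_mask` and the use of a scalar `lmb` in place of the optimizer result `res` are illustrative assumptions, not the PR's code):

```python
import numpy as np
from scipy import special

def boxcox_overflow_mask(x, lmb):
    # Box-Cox is increasing in both x and lmb, so only one extreme
    # data point needs to be evaluated.
    ymax = np.finfo(np.float64).max
    sign_lmbm1 = np.sign(lmb - 1)
    # lmb > 1: the largest transform comes from the largest x;
    # lmb < 1: the most negative transform comes from the smallest x.
    x_treme = np.max(x) if sign_lmbm1 > 0 else np.min(x)
    # Overflow is only possible when (x_treme - 1) and (lmb - 1)
    # share a sign: 1. x > 1, lmb > 1;  2. x < 1, lmb < 1.
    if (x_treme - 1) * sign_lmbm1 > 0:
        return special.boxcox(x_treme, lmb) * sign_lmbm1 > ymax
    return False

x = np.array([1e-5, 0.5, 2.0, 1e5])
print(boxcox_overflow_mask(x, 70.0))   # True: boxcox(1e5, 70) overflows
print(boxcox_overflow_mask(x, -70.0))  # True: boxcox(1e-5, -70) overflows
print(boxcox_overflow_mask(x, 2.0))    # False: everything stays finite
```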
I see, but again, if f393477 worked and has already been reviewed by me, you need to explain why / demonstrate that the modifications are beneficial. We seem to be having trouble communicating since we've gone back and forth a few times on this point. I'll try a few different ways of phrasing my request. Answers to any of these questions might get at what I'm looking for.
Thanks! |
One advantage, I think, is that the overflow analysis is based on theory, i.e. the choice of

Moreover, it can also be applied to YJ with only small modifications. Since the YJ transformation function is more complicated, if we can leverage some nice properties, the analysis is much easier.

```python
if np.any(x_treme * sign_lmbm1 > 0):  # 1. x > 0, lmb > 1; 2. x < 0, lmb < 1
    mask = _yeojohnson_transform(x_treme, res) * sign_lmbm1 > ymax
```

I didn't find a notable speed improvement. |
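A runnable sketch of the same idea for Yeo-Johnson, using the public `scipy.stats.yeojohnson` with a fixed `lmbda` instead of the private `_yeojohnson_transform` (the wrapper name and scalar `lmb` argument are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def yeojohnson_overflow_mask(x, lmb):
    # Yeo-Johnson is increasing in x and its output has the sign of x,
    # so overflow is only possible when x_treme and (lmb - 1) share a
    # sign: 1. x > 0, lmb > 1;  2. x < 0, lmb < 1.
    ymax = np.finfo(np.float64).max
    sign_lmbm1 = np.sign(lmb - 1)
    x_treme = np.max(x) if sign_lmbm1 > 0 else np.min(x)
    if x_treme * sign_lmbm1 > 0:
        yj = stats.yeojohnson(np.array([x_treme]), lmbda=lmb)[0]
        return yj * sign_lmbm1 > ymax
    return False

x = np.array([-1e5, -0.5, 0.5, 1e5])
print(yeojohnson_overflow_mask(x, 70.0))   # True: (1e5 + 1)**70 overflows
print(yeojohnson_overflow_mask(x, -70.0))  # True: the x < 0 branch overflows
print(yeojohnson_overflow_mask(x, 1.5))    # False
```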
Both are based on theory. I think that a consequence of lemma 5 is that the maximum value of

```python
from scipy import special
special.boxcox(1e300, -1e-300)  # ~690
special.boxcox(1e-300, 1e-300)  # ~-690
```

Neither of these is close to overflowing. Consequently, there cannot be positive overflow when
We're trying to backport a patch into 1.12. Since we're short on time, I want to do the simplest thing possible that is correct rather than exploring lots of options. Does that make sense?

In the interest of quickly backporting this fix before the release of 1.12, please use f393477 if it is correct, making only necessary changes (like adjusting the warning). If it's correct, it seems only natural to do so, since it existed first, the diff is minimal, and I have already reviewed it. We can adapt to the case of user-specified |
Ok, let's get this moving forward. I have reverted to f393477 and only updated the warning message. |
Thanks @xuefeng-xu for the improvements! If there is a better way to perform the checks, you can change it (with objective justification) in gh-19631, but it's good to keep this simple for the backport. I'll just run the Cirrus checks and squash merge when they pass. |
* MAINT: stats.boxcox_normmax: avoid negative overflow Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Reference issue
Follow up to #19604 (the prior PR fixed positive overflow; this one fixes negative overflow)
Towards #19016
What does this implement/fix?
Negative overflow #19631 (comment)
There are two cases to check, see #19631 (comment) for more detail.
1. `xmax > 1` and `lmb > 1`: check if `BC(xmax) > ymax`
2. `xmin < 1` and `lmb < 0`: check if `BC(xmin) < -ymax`
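Both conditions can be spot-checked numerically (a sketch; the values are chosen only to trigger each branch):

```python
import numpy as np
from scipy import special

ymax = np.finfo(np.float64).max

# 1. xmax > 1 and lmb > 1: BC(xmax) can exceed ymax
print(special.boxcox(1e200, 2) > ymax)     # True: (1e400 - 1)/2 overflows to inf

# 2. xmin < 1 and lmb < 0: BC(xmin) can fall below -ymax
print(special.boxcox(1e-200, -2) < -ymax)  # True: the result overflows to -inf
```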
Additional information
The second check is improved from `xmin < 1` and `lmb < 1` to `xmin < 1` and `lmb < 0`. See #19631 (comment) for explanation and