Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: fix domain edge error for betainc a or b ==0 case is 1. #11788

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rlucas7
Copy link
Member

@rlucas7 rlucas7 commented Apr 4, 2020

Reference issue

Closes gh-8411

What does this implement/fix?

the betainc function was not providing correct values at boundary values for a and b, when either are 0.0. In the described boundary case the values returned should be 1.0. This PR implements the change. Also adds a test module to confirm the boundary values are as expected.

Additional information

Previously (taken from gh-8411):

import numpy as np
from scipy.special import betainc

X = 0.5
Z = np.array(range(0,11))
W = 3

print(betainc(Z,W,X))
[       nan 0.875      0.6875     0.5        0.34375    0.2265625
 0.14453125 0.08984375 0.0546875  0.03271484 0.01928711]

Now:

(scipy-dev) Lucass-MacBook:scipy rlucas$ python 
Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 14:01:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> from scipy.special import betainc
>>> X = 0.5
>>> Z = np.array(range(0,11))
>>> W = 3
>>> print(betainc(Z,W,X))
[1.         0.875      0.6875     0.5        0.34375    0.2265625
 0.14453125 0.08984375 0.0546875  0.03271484 0.01928711]
>>> 
(scipy-dev) Lucass-MacBook:scipy rlucas$

@rlucas7 rlucas7 added C/C++ Items related to the internal C/C++ code base scipy.special labels Apr 4, 2020
@rlucas7 rlucas7 requested a review from person142 April 4, 2020 00:37
@tylerjereddy tylerjereddy added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Apr 4, 2020
@tylerjereddy tylerjereddy added this to the 1.5.0 milestone Apr 4, 2020
Copy link
Member

@person142 person142 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think through these edge cases more carefully-some theoretical justification is going to be needed to make sure we are correct.

scipy/special/tests/test_betainc.py Outdated Show resolved Hide resolved
goto domerr;

if (aa == 0.0 || bb == 0.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be quite right-now you're going to get

sc.betainc(1e-300, 1e-300, 0) == 0

but

sc.betainc(0, 0, 0) == 1

i.e. there's a discontinuity, which should result in NaN.

I think we're also going to need some justification that this does the right thing for a == 0 but b == inf say. Generally speaking there's a cube

a         b         x
[0, oo] x [0, oo] x [0, 1]

and each of the 6 faces, 12 edges, and 8 vertices needs to be thought about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@person142 Thanks! Your right, I'd been thinking this was just a fix for an isolated point but will require some more thinking, and maybe I need to look at a contour integral or two to understand the behavior along the faces of this space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a consistent way to define the values for a == 0, b > 0 or for a > 0, b == 0, but for a == 0, b == 0, the value is not well-defined. You can see this numerically by computing the function for a series of parameter values (a(t), b(t)) that approach (0, 0) along different straight lines:

In [140]: t                                                                               
Out[140]: array([1.e-06, 1.e-07, 1.e-08, 1.e-09, 1.e-10, 1.e-11, 1.e-12, 1.e-13])

In [141]: betainc(t, t, 0.3)
Out[141]: 
array([0.49999958, 0.49999996, 0.5       , 0.5       , 0.5       ,
       0.5       , 0.5       , 0.5       ])

In [142]: betainc(t, 3*t, 0.3)
Out[142]: 
array([0.74999936, 0.74999994, 0.74999999, 0.75      , 0.75      ,
       0.75      , 0.75      , 0.75      ])

In [143]: betainc(2*t, 3*t, 0.3)
Out[143]: 
array([0.59999898, 0.5999999 , 0.59999999, 0.6       , 0.6       ,
       0.6       , 0.6       , 0.6       ])

Copy link
Member Author

@rlucas7 rlucas7 Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're all in agreement on this one, consistent values here:
a == 0, b > 0, x > 0 -> 1
a > 0, b == 0, x > 0 -> 1
a == 0, b == 0, x> 0 -> Nan

I know Josh would like the inf cases handled too but not clear to me the limit value(s) for those cases so calling those 'out of scope' for this PR.

EDIT was to remove the x==0 case from a == 0, b == 0, x>= 0, It sounds like there was another domain edge error that hadn't been noticed previously that had a == 0, b == 0, x== 0
returning 0 when it seems like the value should be Nan.

@rlucas7
Copy link
Member Author

rlucas7 commented May 16, 2020

@person142 I restricted the fixes the both a and b finite. It's not obvious to be how the regularization interacts in the limit and whether is finite or nan.

@WarrenWeckesser I fixed the edge cases to be a=0, b>0, with b finite and a>0, b=0 with a finite.

I also added a separate branch for the a=b=0 case which returns Nan.

@rlucas7
Copy link
Member Author

rlucas7 commented May 17, 2020

The build failures on travis are real:
https://travis-ci.org/github/scipy/scipy/jobs/687885706#L1379

not clear to me what is the cause though, I always forget about the mpmath tests for special and can never seem to decipher the output, anyone have an idea on the fix for the failing test?

@tylerjereddy
Copy link
Contributor

For the special high-precision tests maybe: @person142 ?

@person142
Copy link
Member

not clear to me what is the cause though

You're hitting a real edge case-when x = 0 you're taking a double limit in a -> 0/x -> 0 or b -> 0/x -> 0. And e.g. as a -> 0 the result goes to 1, but as x -> 0 the result goes to 0.

@person142
Copy link
Member

The reason we weren't hitting this before is because the test:

https://github.com/scipy/scipy/blob/master/scipy/special/tests/test_mpmath.py#L872

uses the dreaded nan_ok=True:

https://github.com/scipy/scipy/blob/master/scipy/special/_mptestutils.py#L183

which hides all sorts of inconvenient truths.

@rlucas7
Copy link
Member Author

rlucas7 commented May 23, 2020

@person142 thanks for the pointers, I'll not have further time this weekend for this one.
@tylerjereddy I will pick back up after the 1.5.0 branch. I'll punt the milestone to 1.6.0

@rlucas7 rlucas7 modified the milestones: 1.5.0, 1.6.0 May 23, 2020
@rlucas7
Copy link
Member Author

rlucas7 commented Jul 2, 2020

You're hitting a real edge case-when x = 0 you're taking a double limit in a -> 0/x -> 0 or b -> 0/x -> 0. And >e.g. as a -> 0 the result goes to 1, but as x -> 0 the result goes to 0.

Right, so there are a couple cases here, IIUC, if x>0, a=0, b=0 the value is nan on that face of the space.

Not clear to me the statement:

And e.g. as a -> 0 the result goes to 1, but as x -> 0 the result goes to 0.
this suggests either a jump discontinuity or a code error not clear to me which is being referred to,
@person142 can you clarify?

My (working) understanding is that the code has (another) edge case here and the value the current code returns is 0 but should, in fact, be nan when lim x-> 0+, a=0,b=0, is that your understanding as well?

The reason we weren't hitting this before is because the test:

https://github.com/scipy/scipy/blob/master/scipy/special/tests/test_mpmath.py#L872

uses the dreaded nan_ok=True:

https://github.com/scipy/scipy/blob/master/scipy/special/_mptestutils.py#L183

which hides all sorts of inconvenient truths.

Yes, I see how that bit of code works now, thanks. If my understanding above is correct (waiting for Josh to confirm or clarify) then we'll need to remove the edge case x=0, a=0,b=0 from this part of the tests.

@tylerjereddy
Copy link
Contributor

@rlucas7 @person142 Sounds like some progress was being made here, but also some potentially tricky things to solve still? As we approach the branching for 1.6.0 let me know if you think you can get this ready in time.

@rlucas7 rlucas7 removed this from the 1.6.0 milestone Nov 14, 2020
@rlucas7
Copy link
Member Author

rlucas7 commented Nov 14, 2020

@rlucas7 @person142 Sounds like some progress was being made here, but also some potentially tricky things to solve still? As we approach the branching for 1.6.0 let me know if you think you can get this ready in time.

Thanks for the ping, I removed the 1.6.0 milestone, I'll come back to this but not likely within the next week or two.

@j-bowhay j-bowhay added the needs-work Items that are pending response from the author label Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C/C++ Items related to the internal C/C++ code base defect A clear bug or issue that prevents SciPy from being installed or used as expected needs-work Items that are pending response from the author scipy.special
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nan with betainc for a=0, b=3 and x=0.5
5 participants