New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: improve performance of the skewnorm cdf computation #15486
Conversation
PS: I cant run the tests locally because of the build issues outlined here: #8325 (comment), but the manual tests I ran as standalone function suggest that the implementation is correct. |
Currently failing for the values (x=-4, a=2):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this could provide a nice speedup!
Is skewnorm
accuracy well tested already? If not, I suppose it's OK not to add new tests. But since this is for speed, can you show some time comparisons in the PR?
Next time I review I'll need to check this against the reference and look at the test failure. Usually nonzero atol
is OK, so I wonder why that test doesn't have one. Depends on what it's testing.
* we use equation 14 when Q < 0, which is an approximation for the | ||
lower tail of the distribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i haven't read the paper yet, but is it more accurate or much faster than equation 3? Is it a good approximation for all values of the shape parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several algorithms outlined in the paper that have differing accuracy and speed, depending on the sign and absolute values of the shape parameter. Unfortunately, It seems rather hard or impractical to account for all different values while choosing the "best" algorithm. Equation 14 is just as accurate as the current implementation but as seen in my previous comment here: #15486 (comment), the accuracy isn't equal. The hard part is knowing which algorithm to choose from given the combination of x
and shape
and whether or not that combination represents a point at the tail of the distribution.
@zoj613 I started working on this to help finish it up, but I noticed that it fails our existing accuracy checks - partially due to the use of the approximation, and partially because of It turns out that Boost has an implementation of the skewnorm distribution that is much faster than we can do here, so I am proposing we use it in gh-16770. We could still use some of the ideas from the paper, but I think it would need to be in a new PR. Would it be ok if we close this one in favor of gh-16770? Boost also has the inverse Gaussian. If it looks like it would take care of gh-13665/gh-13666, shall I close gh-13665? |
Reference issue
closes #15473
What does this implement/fix?
Implements a method described in [1]
Additional information
As noted in the corresponding issue, this approach is much faster since it does not rely on integrating the pdf directly. Plus it does not suffer from the issues of the alternative implementation using Owen's T function, discussed in #8473.
References
[1] Amsler, C., Papadopoulos, A. & Schmidt, P. Evaluating the cdf of the Skew Normal distribution. Empir Econ 60,3171–3202 (2021). https://doi.org/10.1007/s00181-020-01868-6
CC @WarrenWeckesser @mdhaber @ev-br