ENH: improve performance of the skewnorm cdf computation #15486

zoj613 · 2022-01-29T22:34:56Z

Reference issue

What does this implement/fix?

Implements a method described in [1]

Additional information

As noted in the corresponding issue, this approach is much faster since it does not rely on integrating the pdf directly. Plus it does not suffer from the issues of the alternative implementation using Owen's T function, discussed in #8473.

References

[1] Amsler, C., Papadopoulos, A. & Schmidt, P. Evaluating the cdf of the Skew Normal distribution. Empir Econ 60,3171–3202 (2021). https://doi.org/10.1007/s00181-020-01868-6

CC @WarrenWeckesser @mdhaber @ev-br

zoj613 · 2022-01-29T22:37:34Z

PS: I cant run the tests locally because of the build issues outlined here: #8325 (comment), but the manual tests I ran as standalone function suggest that the implementation is correct.

zoj613 · 2022-01-29T23:34:27Z

Currently failing for the values (x=-4, a=2):

AssertionError: 
Not equal to tolerance rtol=1e-08, atol=0

Mismatched elements: 1 / 1 (100%)
Max absolute difference: 3.21992328e-22
Max relative difference: 0.03960623
 x: array(8.451832e-21) # got
 y: array(8.12984e-21)  # expected

mdhaber

Looks like this could provide a nice speedup!

Is skewnorm accuracy well tested already? If not, I suppose it's OK not to add new tests. But since this is for speed, can you show some time comparisons in the PR?

Next time I review I'll need to check this against the reference and look at the test failure. Usually nonzero atol is OK, so I wonder why that test doesn't have one. Depends on what it's testing.

scipy/stats/_continuous_distns.py

mdhaber · 2022-02-21T07:49:45Z

scipy/stats/_continuous_distns.py

+        * we use equation 14 when Q < 0, which is an approximation for the
+            lower tail of the distribution.


i haven't read the paper yet, but is it more accurate or much faster than equation 3? Is it a good approximation for all values of the shape parameter?

There are several algorithms outlined in the paper that have differing accuracy and speed, depending on the sign and absolute values of the shape parameter. Unfortunately, It seems rather hard or impractical to account for all different values while choosing the "best" algorithm. Equation 14 is just as accurate as the current implementation but as seen in my previous comment here: #15486 (comment), the accuracy isn't equal. The hard part is knowing which algorithm to choose from given the combination of x and shape and whether or not that combination represents a point at the tail of the distribution.

scipy/stats/_continuous_distns.py

mdhaber · 2022-08-03T22:07:28Z

@zoj613 I started working on this to help finish it up, but I noticed that it fails our existing accuracy checks - partially due to the use of the approximation, and partially because of (1 - cdf) if orig_shape < 0.

It turns out that Boost has an implementation of the skewnorm distribution that is much faster than we can do here, so I am proposing we use it in gh-16770. We could still use some of the ideas from the paper, but I think it would need to be in a new PR. Would it be ok if we close this one in favor of gh-16770?

Boost also has the inverse Gaussian. If it looks like it would take care of gh-13665/gh-13666, shall I close gh-13665?

zoj613 · 2022-08-03T22:12:22Z

@mdhaber Yes im fine with closing this. #16770 looks a lot simpler.

zoj613 force-pushed the skewnorm_cdf branch from a40d702 to 84e19b8 Compare January 29, 2022 22:44

ENH: improve performance of the skewnorm cdf computation

be1ae99

zoj613 force-pushed the skewnorm_cdf branch from 84e19b8 to be1ae99 Compare January 29, 2022 22:51

tupui added enhancement A new feature or improvement scipy.stats labels Jan 30, 2022

mdhaber reviewed Feb 21, 2022

View reviewed changes

mdhaber mentioned this pull request Aug 3, 2022

ENH: stats: use Boost implementation of skewnorm cdf/ppf #16770

Merged

mdhaber closed this Aug 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: improve performance of the skewnorm cdf computation #15486

ENH: improve performance of the skewnorm cdf computation #15486

zoj613 commented Jan 29, 2022

zoj613 commented Jan 29, 2022 •

edited

zoj613 commented Jan 29, 2022 •

edited

mdhaber left a comment •

edited

mdhaber Feb 21, 2022

zoj613 Feb 21, 2022

mdhaber commented Aug 3, 2022 •

edited

zoj613 commented Aug 3, 2022

		* we use equation 14 when Q < 0, which is an approximation for the
		lower tail of the distribution.

ENH: improve performance of the skewnorm cdf computation #15486

ENH: improve performance of the skewnorm cdf computation #15486

Conversation

zoj613 commented Jan 29, 2022

Reference issue

What does this implement/fix?

Additional information

References

zoj613 commented Jan 29, 2022 • edited

zoj613 commented Jan 29, 2022 • edited

mdhaber left a comment • edited

Choose a reason for hiding this comment

mdhaber Feb 21, 2022

Choose a reason for hiding this comment

zoj613 Feb 21, 2022

Choose a reason for hiding this comment

mdhaber commented Aug 3, 2022 • edited

zoj613 commented Aug 3, 2022

zoj613 commented Jan 29, 2022 •

edited

zoj613 commented Jan 29, 2022 •

edited

mdhaber left a comment •

edited

mdhaber commented Aug 3, 2022 •

edited