ENH: Implement ppf and isf for invgauss #13665
Conversation
Base branch values below; the ones from this PR are shown in the unit tests:

In [4]: invgauss.pdf([-1, 0, 1, 2, np.inf, np.nan], mu=np.inf)
/home/scipy/scipy/stats/_continuous_distns.py:3495: RuntimeWarning: invalid value encountered in true_divide
return 1.0/np.sqrt(2*np.pi*x**3.0)*np.exp(-1.0/(2*x)*((x-mu)/mu)**2)
Out[4]: array([ 0., 0., nan, nan, 0., nan])
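For context on the mu=np.inf special case (my own sanity check, not part of the PR): as mu → ∞ the ((x − mu)/mu)² factor in the exponent tends to 1, so the inverse-Gaussian density tends pointwise to the Lévy(0, 1) density. A small sketch:

```python
import math

def invgauss_pdf(x, mu):
    # inverse-Gaussian density as written in _continuous_distns.py
    return (1.0 / math.sqrt(2 * math.pi * x**3)
            * math.exp(-((x - mu) / mu) ** 2 / (2 * x)))

def levy_pdf(x):
    # pointwise limit of invgauss_pdf as mu -> inf: Levy(0, 1) density
    return 1.0 / math.sqrt(2 * math.pi * x**3) * math.exp(-1.0 / (2 * x))

# the finite-mu density approaches the limit as mu grows
for mu in (1e2, 1e4, 1e6):
    assert abs(invgauss_pdf(1.0, mu) - levy_pdf(1.0)) < 10.0 / mu
```

This is one way to decide what the mu=np.inf branch of the pdf should return for finite positive x, instead of the nan values shown above.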
In [12]: p1 = invgauss.cdf(invgauss.ppf(p, mu=1), mu=1)
...: abs_error = np.abs(p - p1)
...: # five-number summary of the absolute error
...: np.percentile(abs_error, [0, 25, 50, np.mean(abs_error), 75, 100])
Out[12]:
array([0.00000000e+00, 0.00000000e+00, 2.16840434e-19, 0.00000000e+00,
3.46944695e-18, 6.66133815e-16])
In [14]: q = invgauss.ppf(p, mu=1)
...: q1 = invgauss.ppf(invgauss.cdf(q, mu=1), mu=1)
...: rel_error = np.abs(q1 - q) / q
...: np.percentile(rel_error, [0, 25, 50, np.mean(rel_error), 75, 100])
Out[14]:
array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
2.35639297e-16, 3.20712499e-14])
In [18]: invgauss.isf(1e-16, mu=1.05) / 0.7
Out[18]: 99.96907854049056 # incorrect, it is supposed to be 98.47
In [19]: invgauss.isf(1e-17, mu=1.05) / 0.7
Out[19]: 142.85714285714286 # does not overflow, but the value is incorrect; it should be 105.4208

Speed:

In [23]: %timeit invgauss.ppf(p, mu=1)
52.3 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [24]: %timeit invgauss.isf(p, mu=1)
53.8 ms ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Speed of this PR compared to the base branch:

In [4]: %timeit invgauss.ppf(p, mu=1)
15.7 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit invgauss.isf(p, mu=1)
14.2 ms ± 99.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

About 3.3 times faster on my machine.
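For anyone without IPython, the same comparison can be scripted with the stdlib timeit module. The probability grid `p` below is a hypothetical stand-in, since the thread does not show how `p` was built:

```python
import timeit

# hypothetical benchmark grid; the thread's actual `p` is not shown
setup = (
    "import numpy as np\n"
    "from scipy import stats\n"
    "p = np.linspace(1e-6, 1 - 1e-6, 10_000)\n"
)
results = {}
for stmt in ("stats.invgauss.ppf(p, mu=1)", "stats.invgauss.isf(p, mu=1)"):
    # take the minimum of a few repeats to reduce scheduling noise
    t = min(timeit.repeat(stmt, setup=setup, number=3, repeat=3)) / 3
    results[stmt] = t * 1e3  # milliseconds per call
    print(f"{stmt}: {results[stmt]:.2f} ms per call")
```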
@mdhaber I think the latest commit should fix the build errors. Please take a look when you can. The issue was a RuntimeWarning being raised because the test passes NaN where ppf expects a number between 0 and 1. Since passing NaN was intentional, I ignored the RuntimeWarning in the test case. See scipy/stats/_distn_infrastructure.py line 2126 at 15bf22e.
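For anyone reading along, the pattern in question looks roughly like this; `fragile_transform` is a hypothetical stand-in for the real method, not actual SciPy code:

```python
import numpy as np

def fragile_transform(p):
    # hypothetical stand-in for a method whose intermediate math hits 0/0
    p = np.asarray(p, dtype=float)
    return (p - p) / (p - p)  # 0/0 -> nan, with a RuntimeWarning

# the NaN-producing input is intentional, so silence only the expected
# floating-point warning at the call site
with np.errstate(invalid="ignore"):
    out = fragile_transform([0.5, np.nan])
assert np.isnan(out).all()
```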
Are you referring to them not raising the warning in the build environment? I did not see any tests with NaN input in there for cauchy. Note that no RuntimeWarning is raised when calling the method in a local Python session, only in the build environment.
The CircleCI failure is not related to this PR. It seems to be an issue with pythran not being installed in the build environment.
Yes, I've gotten that, too.
Seems like all it needed was a rebase, but the rebase introduced other unrelated failures on Windows.
That one is intermittent, so I re-ran it for you. Fingers crossed. Update: tests passing. I think that will encourage another maintainer to take a look. I will pass on this one as I reviewed/merged gh-13616 independently, and it would be good for another stats maintainer to become familiar with these changes before the next release.
Thanks for the update. In that case I might as well ping @andyfaff and @ev-br for a possible review.
Some non-stats related code comments.
@ilayn I addressed all the comments made, please take a look again when you can.
Also can't build locally due to this error:
Any idea how I can fix this, @tupui?
You might just have to merge master here. I am on master and I can build.
Maybe I'm missing a step here, because even after rebasing on the main branch, clearing the build, and re-building, I get this error when importing:

----> 1 from scipy.stats import invgauss

~/scipy/scipy/stats/__init__.py in <module>
    441 from .kde import gaussian_kde
    442 from . import mstats
--> 443 from . import qmc
    444 from ._multivariate import *
    445 from . import contingency

~/scipy/scipy/stats/qmc.py in <module>
    232
    233 """
--> 234 from ._qmc import *

~/scipy/scipy/stats/_qmc.py in <module>
     14     _categorize, initialize_direction_numbers, _MAXDIM, _MAXBIT
     15 )
---> 16 from scipy.stats._qmc_cy import (
     17     _cy_wrapper_centered_discrepancy,
     18     _cy_wrapper_wrap_around_discrepancy,

ModuleNotFoundError: No module named 'scipy.stats._qmc_cy'

I suppose it doesn't matter since it has nothing to do with this PR.
How are you building, and which platform are you on? I just checked out your branch, built, and I could import.
I think I did pull everything, because I rebased using the upstream master branch and it says this branch is up to date on the forked repo. I'm on Manjaro Linux 64-bit. I guess it's not too big of a deal because the tests run; I'm just unable to import in IPython locally. Another issue is the failure to build docs. I noticed this is the case for other open PRs as well.
The doc build error is unrelated. We are having timeouts in the console and still need to figure this out. Do you actually have the files
Yes, a shell search returns:
@mdhaber I've done some initial updates to the code after going over the paper again. The tests in
Before working too much, maybe try out Boost's implementation? You can follow the example of the new skewnorm PR, or I can try it if you'd prefer. |
assert np.isclose(stats.invgauss.ppf(0.00013, mu=1, scale=3),
                  0.15039762631802803, atol=2.22e-16)
# test if it returns right tail values accurately
assert np.isclose(stats.invgauss.isf(1e-20, mu=1.5, scale=1 / 0.7),
                  126.3493, atol=2.22e-16)
I think there is a different meaning of scale between the paper and SciPy, because these aren't passing, right? SciPy's distribution infrastructure handles scale for all distributions, so there is no need to test with scales other than 1. If you try a scale other than 1 and it doesn't match what is in a paper, it's more likely that the parameterization is different.
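To illustrate the point about scale being a pure linear stretch (my own illustration, using the standard inverse-Gaussian CDF formula rather than SciPy itself):

```python
import math

def norm_cdf(t):
    # standard normal CDF
    return 0.5 * math.erfc(-t / math.sqrt(2))

def invgauss_cdf(x, mu, scale=1.0):
    # SciPy convention: F(x; mu, scale) = F(x / scale; mu, scale=1),
    # with the standard (scale=1) inverse-Gaussian CDF below
    z = x / scale
    return (norm_cdf(math.sqrt(1 / z) * (z / mu - 1))
            + math.exp(2 / mu) * norm_cdf(-math.sqrt(1 / z) * (z / mu + 1)))

# stretching x by s and setting scale=s leaves the CDF value unchanged,
# which is why only scale=1 needs dedicated accuracy tests
for s in (0.5, 1.0, 3.0):
    assert abs(invgauss_cdf(2.0 * s, mu=1.5, scale=s)
               - invgauss_cdf(2.0, mu=1.5)) < 1e-14
```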
@zoj613 For this one, I'd also suggest that we close this, and I'll open a PR that uses Boost for invgauss. The other part of this PR is the behavior when mu is infinite.
If the Boost implementation provides comparable performance, then I'm fine with closing this in favor of a simpler implementation. I do wonder if Boost uses the same implementation from the paper, because if not, accuracy issues might need to be addressed.
4.92818217e-16], atol=1e-12)

# tests if algorithm does not diverge for small probabilities.
assert np.isclose(stats.invgauss.ppf(0.00013, mu=1, scale=3),
My guess for why these fail is that the parameterization used in the paper may be different from ours. That is, scale means something different in the paper. Ours is just a linear stretching of the real line, and the implementation is common to all distributions.
assert_allclose(fns, [0., 0., 1.27054940e-21, 1.96413583e-17,
                2.16840434e-19, 2.22044605e-16], atol=1e-08)
Using Boost, `fns` is:

array([0.00000000e+00, 0.00000000e+00, 2.96461532e-21, 0.00000000e+00,
       2.71050543e-19, 1.66533454e-16])

and `abs_error` is:

array([2.96461532e-21, 5.59041745e-20, 2.71050543e-19, 2.16840434e-19,
       5.20417043e-18, 2.77555756e-17, 1.66533454e-16, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00])
assert_allclose(fns2, [0., 0., 0., 9.15882970e-17, 1.16804249e-16,
                4.92818217e-16], atol=1e-08)
Using Boost, `fns2` is:

array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       3.53458946e-16, 6.57090956e-16])

and `rel_error` is:

array([3.58337988e-16, 4.45142890e-16, 3.53458946e-16, 0.00000000e+00,
       1.15801436e-16, 1.16804249e-16, 6.57090956e-16, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00])
)
assert_allclose(fns2, [0., 0., 1.17819649e-16, 0.,
                3.50412747e-16, 1.68867330e-11], atol=1e-08)
Yes, Boost would pass all the tests added here (except that it doesn't handle the special case of mu=np.inf, which is a separate thing). It's also quite fast:
import numpy as np
from scipy import stats

mu = 1
rng = np.random.default_rng(0)
rvs = stats.invgauss.rvs(mu, size=1000, random_state=rng)
%timeit stats.invgauss.ppf(rvs, mu)  # 885 µs with Boost, 7.26 ms this PR
Boost seems like a big improvement, especially given the simplicity it provides, so I'm happy for us to close this PR in favor of the Boost implementation.
Reference issue

Follow-up to #13616; closes #13666.

What does this implement/fix?

This PR adds ppf and isf methods to the invgauss distribution.

Additional information

This PR adds an implementation of ppf and isf for invgauss, while also improving the pdf method so that it returns the correct value when the mean is infinity. The implementation is described in [1]. The tests are based on the examples presented in the article.

References
[1] Giner, Goknur and G. Smyth. “statmod: Probability Calculations for the Inverse Gaussian Distribution.” R J. 8 (2016): 339.
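For readers who want the gist of [1] without the paper: the quantile is found by Newton iteration on the CDF. The sketch below is my own simplification (a crude starting value and no log-space safeguards), not the PR's actual code; statmod derives much better starting values and iterates on log-probabilities for tail accuracy:

```python
import math

def _norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2))

def invgauss_cdf(x, mu):
    # standard inverse-Gaussian CDF (shape parameter mu, scale 1)
    return (_norm_cdf(math.sqrt(1 / x) * (x / mu - 1))
            + math.exp(2 / mu) * _norm_cdf(-math.sqrt(1 / x) * (x / mu + 1)))

def invgauss_pdf(x, mu):
    return (1.0 / math.sqrt(2 * math.pi * x**3)
            * math.exp(-((x - mu) / mu) ** 2 / (2 * x)))

def invgauss_ppf(p, mu, tol=1e-14, max_iter=100):
    # Newton's method on F(x) - p = 0, using F' = pdf
    x = mu  # crude starting value; the paper derives a sharper one
    for _ in range(max_iter):
        step = (invgauss_cdf(x, mu) - p) / invgauss_pdf(x, mu)
        x_new = x - step
        if x_new <= 0:
            x_new = x / 2.0  # keep the iterate inside the support
        if abs(x_new - x) <= tol * x:
            return x_new
        x = x_new
    return x
```

A quick round-trip check: invgauss_cdf(invgauss_ppf(0.3, mu=2.0), mu=2.0) should return 0.3 to near machine precision.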