
ENH: Implement ppf and isf for invgauss #13665

Closed (25 commits)
Conversation

zoj613
Contributor

@zoj613 zoj613 commented Mar 9, 2021

Reference issue

Follow-up to #13616; closes #13666.

What does this implement/fix?

This PR adds ppf and isf methods to the invgauss distribution.

Additional information

This PR implements ppf and isf for invgauss and also improves the pdf method so that it returns the correct value when the mean is infinite. The implementation is described in [1]. The tests are based on the examples presented in the article.
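For context on the mu = infinity case (a sketch, not code from this PR): as mu grows, the term ((x - mu)/mu)**2 in the pdf tends to 1, so the density converges to the standard Lévy density. A large finite mu already agrees closely:

```python
import numpy as np
from scipy import stats

x = np.array([0.5, 1.0, 2.0, 5.0])

# invgauss pdf: 1/sqrt(2*pi*x**3) * exp(-((x - mu)/mu)**2 / (2*x)).
# As mu -> inf the exponent tends to -1/(2*x), which gives the
# standard Levy pdf 1/sqrt(2*pi*x**3) * exp(-1/(2*x)).
near_limit = stats.invgauss.pdf(x, mu=1e9)
levy = stats.levy.pdf(x)

print(np.max(np.abs(near_limit / levy - 1)))  # relative difference on the order of 1/mu
```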

References

[1] Giner, Goknur and G. Smyth. “statmod: Probability Calculations for the Inverse Gaussian Distribution.” R J. 8 (2016): 339.
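For comparison with what the PR replaces: when a distribution defines no dedicated _ppf, SciPy falls back to numerically inverting the cdf. A simplified sketch of that idea using bracketing root-finding (this is illustrative only; the algorithm in [1] instead uses Newton iterations with carefully chosen starting values):

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

def invgauss_ppf_numeric(p, mu):
    """Invert the invgauss cdf by bracketing root-finding (illustrative)."""
    def solve_one(pi):
        # Grow the bracket until cdf(hi) >= pi, then solve cdf(x) = pi.
        lo, hi = 1e-12, 1.0
        while stats.invgauss.cdf(hi, mu) < pi:
            hi *= 2.0
        return brentq(lambda x: stats.invgauss.cdf(x, mu) - pi, lo, hi)
    return np.array([solve_one(pi) for pi in np.atleast_1d(p)])

p = np.array([0.1, 0.5, 0.9])
print(invgauss_ppf_numeric(p, mu=1.0))
print(stats.invgauss.ppf(p, mu=1.0))  # should agree closely
```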

@zoj613 zoj613 marked this pull request as ready for review March 9, 2021 14:48
@zoj613
Contributor Author

zoj613 commented Mar 9, 2021

Base branch values are shown below; the values from this PR appear in the unit tests:

pdf

In [4]: invgauss.pdf([-1, 0, 1, 2, np.inf, np.nan], mu=np.inf)
/home/scipy/scipy/stats/_continuous_distns.py:3495: RuntimeWarning: invalid value encountered in true_divide
  return 1.0/np.sqrt(2*np.pi*x**3.0)*np.exp(-1.0/(2*x)*((x-mu)/mu)**2)
Out[4]: array([ 0.,  0., nan, nan,  0., nan])

ppf

In [12]:         p1 = invgauss.cdf(invgauss.ppf(p, mu=1), mu=1)
    ...:         abs_error = np.abs(p - p1)
    ...:         # five-number summary of the absolute error
    ...:         np.percentile(abs_error, [0, 25, 50, np.mean(abs_error), 75, 100])
Out[12]: 
array([0.00000000e+00, 0.00000000e+00, 2.16840434e-19, 0.00000000e+00,
       3.46944695e-18, 6.66133815e-16])

In [14]:         q = invgauss.ppf(p, mu=1)
    ...:         q1 = invgauss.ppf(invgauss.cdf(q, mu=1), mu=1)
    ...:         rel_error = np.abs(q1 - q) / q
    ...:         np.percentile(rel_error, [0, 25, 50, np.mean(rel_error), 75, 100])
Out[14]: 
array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       2.35639297e-16, 3.20712499e-14])

isf

In [18]: invgauss.isf(1e-16, mu=1.05) / 0.7
Out[18]: 99.96907854049056  # incorrect, it is supposed to be 98.47

In [19]: invgauss.isf(1e-17, mu=1.05) / 0.7
Out[19]: 142.85714285714286  # does not overflow but incorrect value. it should be 105.4208

speed

In [23]: %timeit invgauss.ppf(p, mu=1)
52.3 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [24]: %timeit invgauss.isf(p, mu=1)
53.8 ms ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@zoj613
Contributor Author

zoj613 commented Mar 9, 2021

Speed of this PR compared to base branch:

In [4]: %timeit invgauss.ppf(p, mu=1)
15.7 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit invgauss.isf(p, mu=1)
14.2 ms ± 99.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

about 3.3 times faster on my machine.

@zoj613
Contributor Author

zoj613 commented Mar 10, 2021

@mdhaber I think the latest commit should fix the build errors. Please take a look when you can. The issue was a RuntimeWarning raised because the test passes a NaN where ppf expects a number between 0 and 1. Since passing a NaN was intentional, I ignored the RuntimeWarning in the test case.

See

cond1 = (0 < q) & (q < 1)
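That mask is what routes out-of-domain arguments to NaN while keeping the boundary values exact. For illustration, the expected behavior of ppf across the (0, 1) boundary:

```python
import numpy as np
from scipy import stats

q = np.array([np.nan, -0.5, 0.0, 0.5, 1.0, 1.5])
out = stats.invgauss.ppf(q, mu=1)

# q outside [0, 1] (and NaN) -> NaN; q == 0 -> lower support bound (0);
# q == 1 -> upper support bound (inf); interior q -> a finite quantile.
print(out)
```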

@mdhaber
Contributor

mdhaber commented Mar 10, 2021

norm and cauchy don't seem to produce warnings with nan input, so it might be better to do the comparison with np.errstate(invalid='ignore') to avoid the warning being raised.
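Concretely, the suggestion is to wrap just the NaN-involving comparison in an errstate context, so any floating-point "invalid" warning is suppressed locally (a sketch, not the PR's code):

```python
import numpy as np

q = np.array([np.nan, 0.3, 0.7])

# Comparisons against NaN can emit "invalid value encountered" warnings
# in some environments; errstate silences them only inside this block.
with np.errstate(invalid='ignore'):
    cond1 = (0 < q) & (q < 1)

print(cond1)  # NaN compares False, so only the two valid entries pass
```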

@zoj613
Contributor Author

zoj613 commented Mar 10, 2021

norm and cauchy don't seem to produce warnings with nan input, so it might be better to do the comparison with np.errstate(invalid='ignore') to avoid the warning being raised.

Are you referring to them not raising the warning in the build environment? I did not see a test with NaN input in there for cauchy. Note that no RuntimeWarning is raised when calling the method in a Python session locally, only in the build environment.

@zoj613
Contributor Author

zoj613 commented Mar 14, 2021

The CircleCI failure is not related to this PR. It seems to be an issue with pythran not being installed in the build environment.

@mdhaber
Contributor

mdhaber commented Mar 14, 2021

Yes, I've gotten that, too.

@zoj613
Contributor Author

zoj613 commented Mar 14, 2021

Yes, I've gotten that, too.

Seems like all it needed was a rebase, but the rebase introduced other unrelated failures on Windows.

@mdhaber
Contributor

mdhaber commented Mar 16, 2021

That one is intermittent, so I re-ran it for you. Fingers crossed.

Update: tests passing. I think that will encourage another maintainer to take a look. I will pass on this one as I reviewed/merged gh-13616 independently, and it would be good for another stats maintainer to become familiar with these changes before the next release.

@rgommers rgommers added the enhancement (A new feature or improvement) and scipy.stats labels Mar 20, 2021
@zoj613
Contributor Author

zoj613 commented Apr 9, 2021

That one is intermittent, so I re-ran it for you. Fingers crossed.

Update: tests passing. I think that will encourage another maintainer to take a look. I will pass on this one as I reviewed/merged gh-13616 independently, and it would be good for another stats maintainer to become familiar with these changes before the next release.

Thanks for the update. In that case I might as well ping @andyfaff and @ev-br for a possible review.

Member

@ilayn ilayn left a comment


Some non-stats related code comments.

scipy/stats/_continuous_distns.py: 8 review comment threads (outdated, resolved)
@zoj613
Contributor Author

zoj613 commented Apr 10, 2021

@ilayn I addressed all the comments; please take another look when you can.

@zoj613
Contributor Author

zoj613 commented Apr 10, 2021

Also, I can't build locally due to this error:

---> 16 from scipy.stats._qmc_cy import (
     17     _cy_wrapper_centered_discrepancy,
     18     _cy_wrapper_wrap_around_discrepancy,

ModuleNotFoundError: No module named 'scipy.stats._qmc_cy'

Any idea how I can fix this @tupui ?

@tupui
Member

tupui commented Apr 10, 2021

ModuleNotFoundError: No module named 'scipy.stats._qmc_cy'

Any idea how I can fix this @tupui ?

You might just have to merge master here. I am on master and I can build.

@zoj613
Contributor Author

zoj613 commented Apr 10, 2021

ModuleNotFoundError: No module named 'scipy.stats._qmc_cy'
Any idea how I can fix this @tupui ?

You might just have to merge master here. I am on master and I can build.

Maybe I'm missing a step, because even after rebasing on the main branch, clearing the build, and re-building, I get this error when importing invgauss locally.

----> 1 from scipy.stats import invgauss

~/scipy/scipy/stats/__init__.py in <module>
    441 from .kde import gaussian_kde
    442 from . import mstats
--> 443 from . import qmc
    444 from ._multivariate import *
    445 from . import contingency

~/scipy/scipy/stats/qmc.py in <module>
    232 
    233 """
--> 234 from ._qmc import *

~/scipy/scipy/stats/_qmc.py in <module>
     14     _categorize, initialize_direction_numbers, _MAXDIM, _MAXBIT
     15 )
---> 16 from scipy.stats._qmc_cy import (
     17     _cy_wrapper_centered_discrepancy,
     18     _cy_wrapper_wrap_around_discrepancy,

ModuleNotFoundError: No module named 'scipy.stats._qmc_cy'

I suppose it doesn't matter since it has nothing to do with this PR.

@tupui
Member

tupui commented Apr 10, 2021

How are you building and which platform are you on?

I just checked out your branch, built it, and could run from scipy.stats import invgauss.
Are you certain you have pulled everything from master?

@zoj613
Contributor Author

zoj613 commented Apr 10, 2021

How are you building and which platform are you on?

I just checked out your branch, built it, and could run from scipy.stats import invgauss.
Are you certain you have pulled everything from master?

I think I did pull everything, because I rebased on the upstream master branch and it says this branch is up to date on the forked repo. I'm on Manjaro Linux 64-bit. I guess it's not too big of a deal because the tests run; I'm just unable to import in IPython locally. Another issue is the failure to build docs; I noticed this is the case for other open PRs as well.

@tupui
Member

tupui commented Apr 10, 2021

How are you building and which platform are you on?
I just checked out your branch, built and I could import from scipy.stats import invgauss.
Are you certain you have pulled everything from master?

I think I did pull everything, because I rebased on the upstream master branch and it says this branch is up to date on the forked repo. I'm on Manjaro Linux 64-bit. I guess it's not too big of a deal because the tests run; I'm just unable to import in IPython locally. Another issue is the failure to build docs; I noticed this is the case for other open PRs as well.

The doc build error is unrelated. We are having timeouts in the console. We still need to figure this out.

Do you actually have the files scipy/stats/_qmc_cy.pyx and the .cxx, .pyi ones?

@zoj613
Contributor Author

zoj613 commented Apr 10, 2021

Do you actually have the files scipy/stats/_qmc_cy.pyx and the .cxx, .pyi ones?

Yes, a shell listing returns:

$ ls scipy/stats/_qmc*
_qmc_cy.cxx  _qmc_cy.pyi  _qmc_cy.pyx  _qmc.py      _qmc.pyi  

@zoj613
Contributor Author

zoj613 commented Aug 3, 2022

@mdhaber I've done some initial updates to the code after going over the paper again. The tests in test_ppf_isf pass except for the cases starting at line 2276. For some reason I don't get the correct values; I'll investigate why later in the week and also address the comments you left.

@mdhaber
Contributor

mdhaber commented Aug 3, 2022

Before working too much, maybe try out Boost's implementation? You can follow the example of the new skewnorm PR, or I can try it if you'd prefer.

Comment on lines 2276 to 2280
assert np.isclose(stats.invgauss.ppf(0.00013, mu=1, scale=3),
                  0.15039762631802803, atol=2.22e-16)
# test if it returns right tail values accurately
assert np.isclose(stats.invgauss.isf(1e-20, mu=1.5, scale=1 / 0.7),
                  126.3493, atol=2.22e-16)
Contributor


I think there is a different meaning of scale between the paper and SciPy, because these aren't passing, right? SciPy's distribution infrastructure handles scale for all distributions. There is no need to test with scales other than 1. If you try a scale other than 1 and it doesn't match what is in a paper, it's more likely that the parameterization is different.
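The claim that SciPy's scale is a plain linear stretch is easy to verify: for a loc=0 distribution, ppf with scale=s equals s times the unscaled ppf, so a mismatch against a paper at scale != 1 points to a different parameterization rather than a bug. A quick check:

```python
import numpy as np
from scipy import stats

p, mu, s = 0.25, 1.5, 3.0

# SciPy applies scale after the shape: X_scaled = scale * X,
# so every quantile is multiplied by the scale.
scaled = stats.invgauss.ppf(p, mu, scale=s)
stretched = s * stats.invgauss.ppf(p, mu)

print(scaled, stretched)  # agree to floating-point precision
```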

@mdhaber
Contributor

mdhaber commented Aug 3, 2022

@zoj613 For this one, I'd also suggest that we close this and I'll open a PR that would use Boost for ppf/isf. I don't think I'd add the tests because the distribution infrastructure already performs the basic test to confirm that the methods are consistent with one another (e.g. see the ppf/cdf test here), tests of the domain (e.g. what happens with input NaNs), etc. Beyond that, it's a test of whether Boost's distributions are accurate or not. We can add additional tests, if we are concerned about the accuracy, but we don't need to add all the basic checks for each distribution separately.

The other part of this PR is behavior when np.isposinf(mu). This PR only adds it for pdf, ppf, and isf, it looks like? What about the cdf, moments, rvs, etc.? I'd suggest that we not worry about making this change. It is very time-consuming and error-prone to add special cases like this to all methods of all distributions, especially when the changes are mixed in with others (like adding ppf/isf; it should be a separate PR, if anything). I think the time is better spent fixing other issues and higher-impact enhancements for now, and then building a mechanism for these special cases into the overhauled distribution infrastructure (see gh-15928). When that's done, we can define special cases like this for the distribution at a high level (e.g. if np.isposinf, use invgauss methods instead) so that we won't need to add _lazywhere to every method and we won't have to write tests for every distribution separately.
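For illustration of the kind of per-method special-casing being discussed (hypothetical code, not from this PR or SciPy's internals), a mask-based dispatch that falls back to the Lévy density when mu is infinite, which is the mu -> inf limit of invgauss:

```python
import numpy as np
from scipy import stats

def invgauss_pdf_special(x, mu):
    """Hypothetical sketch: handle mu = inf in a single method.

    The point above is that repeating this masking in every method of
    every distribution is error-prone, hence the preference for a
    central mechanism in the distribution infrastructure.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    x, mu = np.broadcast_arrays(x, mu)
    out = np.empty_like(x)
    inf_mask = np.isposinf(mu)
    # The mu -> inf limit of invgauss is the standard Levy distribution.
    out[inf_mask] = stats.levy.pdf(x[inf_mask])
    out[~inf_mask] = stats.invgauss.pdf(x[~inf_mask], mu[~inf_mask])
    return out

print(invgauss_pdf_special([1.0, 2.0], mu=np.inf))
```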

@zoj613
Contributor Author

zoj613 commented Aug 4, 2022

@zoj613 For this one, I'd also suggest that we close this and I'll open a PR that would use Boost for ppf/isf. I don't think I'd add the tests because the distribution infrastructure already performs the basic test to confirm that the methods are consistent with one another (e.g. see the ppf/cdf test here), tests of the domain (e.g. what happens with input NaNs), etc. Beyond that, it's a test of whether Boost's distributions are accurate or not. We can add additional tests, if we are concerned about the accuracy, but we don't need to add all the basic checks for each distribution separately.

The other part of this PR is behavior when np.isposinf(mu). This PR only adds it for pdf, ppf, and isf, it looks like? What about the cdf, moments, rvs, etc? I'd suggest that we not worry about making this change. It is very time-consuming and error-prone to add special cases like this to all methods of all distributions, especially with the changes are mixed in with others (like adding ppf/isf - it should be a separate PR, if anything). I think the time is better spent fixing other issues and higher-impact enhancements for now, then build a mechanism for these special cases into the overhauled distribution infrastructure (see gh-15928). When that's done, we can define special cases like this for the distribution at a high level (e.g. if np.isposinf, use invgauss methods instead) so that we won't need to add _lazywhere to every method and we won't have to write tests for every distribution separately.

If the Boost implementation provides comparable performance, then I'm fine with closing this in favor of a simpler implementation. I do wonder whether Boost uses the same algorithm as the paper, because if not, accuracy issues might need to be addressed.

4.92818217e-16], atol=1e-12)

# tests if algorithm does not diverge for small probabilities.
assert np.isclose(stats.invgauss.ppf(0.00013, mu=1, scale=3),
Contributor


My guess for why these fail is that the parameterization used in the paper may be different than ours. That is, scale means something different in the paper. Ours is just a linear stretching of the real line, and the implementation is common to all distributions.

Comment on lines +2263 to +2264
assert_allclose(fns, [0., 0., 1.27054940e-21, 1.96413583e-17,
2.16840434e-19, 2.22044605e-16], atol=1e-08)
Contributor


Using Boost, fns is:

array([0.00000000e+00, 0.00000000e+00, 2.96461532e-21, 0.00000000e+00,
       2.71050543e-19, 1.66533454e-16])

abs_error is:

array([2.96461532e-21, 5.59041745e-20, 2.71050543e-19, 2.16840434e-19,
       5.20417043e-18, 2.77555756e-17, 1.66533454e-16, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00])

Comment on lines +2272 to +2273
assert_allclose(fns2, [0., 0., 0., 9.15882970e-17, 1.16804249e-16,
4.92818217e-16], atol=1e-08)
Contributor

@mdhaber mdhaber Aug 5, 2022


Using Boost, fns2, is:

array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       3.53458946e-16, 6.57090956e-16])

rel_error is:

array([3.58337988e-16, 4.45142890e-16, 3.53458946e-16, 0.00000000e+00,
       1.15801436e-16, 1.16804249e-16, 6.57090956e-16, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00])

)
assert_allclose(fns2, [0., 0., 1.17819649e-16, 0.,
3.50412747e-16, 1.68867330e-11], atol=1e-08)

Contributor

@mdhaber mdhaber Aug 5, 2022


Yes, Boost would pass all the tests added here (except it doesn't handle the special case of mu='inf', which is a separate thing). It's also quite fast:

import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
mu = 1
rvs = stats.invgauss.rvs(mu, size=1000, random_state=rng)
%timeit stats.invgauss.ppf(rvs, mu)  # 885 µs with Boost, 7.26 ms this PR

@zoj613
Contributor Author

zoj613 commented Aug 5, 2022

Boost seems like a big improvement, especially given the simplicity it provides, so I'm happy for us to close this PR in favor of the Boost implementation.

@mdhaber
Contributor

mdhaber commented Aug 5, 2022

Here's a somewhat brutal test:

import numpy as np
from scipy import stats

az = np.logspace(-3, 3, 11)
pz = np.logspace(-8, -1, 8)
pz = np.concatenate([pz, [0.5], 1-pz[::-1]])[:, np.newaxis]

def relative_error(res, ref):
    return np.float64(abs((res-ref)/ref))

xz = stats.invgauss.ppf(pz, az)  # time this step
res = stats.invgauss.cdf(xz, az)
err = relative_error(res, pz)

Times:
Main: 399 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This PR: 151 ms ± 891 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Boost: 482 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Histogram of the log of the roundtrip relative error (image not reproduced here).

Number of NaNs produced by ppf:
Main: 0
This PR: 65
Boost: 26 # it has trouble when mu is too small

I think I'll open the Boost PR and, when the result is NaN, revert to the default implementation.
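The "revert to the default on NaN" idea can be sketched generically (the names and the brentq fallback here are illustrative, not the actual PR code): evaluate the fast path first, then recompute only the entries that came back NaN from valid inputs:

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

def ppf_with_fallback(p, mu):
    p = np.asarray(p, dtype=float)
    # Fast path: the vectorized ppf (Boost-backed in the proposed PR).
    res = stats.invgauss.ppf(p, mu)
    # NaN outputs for valid in-domain inputs mean the fast path failed there.
    bad = np.isnan(res) & ~np.isnan(p) & (p > 0) & (p < 1)
    for i in np.flatnonzero(bad):
        # Robust (slow) path: bracketed inversion of the cdf.
        res[i] = brentq(lambda t: stats.invgauss.cdf(t, mu) - p[i],
                        1e-300, 1e300)
    return res

p = np.array([0.1, 0.5, 0.9, np.nan])
print(ppf_with_fallback(p, mu=1.0))
```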

Successfully merging this pull request may close these issues.

ENH: invgauss.pdf should return correct output when mu=infinity
7 participants