ENH: special: use boost in `nctdtr` #21728

dschmitz89 · 2024-10-20T20:42:03Z

Reference issue

What does this implement/fix?

Uses boost for nctdtr. The PR is modeled after #21505. The trickiest part was the mpmath reference implementation,.

Additional information

I did not migrate the noncentral t quantile nctdtrit yet as all default tests (no xslow etc.) ran successfully locally and I have to adjust the size of my PRs to the sleeping cycles of my little one, so trying to keep the small.

…assed to nctdtr

dschmitz89 · 2024-10-20T20:44:23Z

scipy/special/tests/test_cdflib.py

+
+@pytest.mark.parametrize(
+    "df,nc,x,expected,rtol",
+    [[3000., 3., 0.1, 0.0018657780826323328, 1e-13],


This is the test of the parameter range in the linked issue.

fancidev

Looks good to me! Just some small comments below.

scipy/special/boost_special_functions.h

Co-authored-by: Albert Steppi <albert.steppi@gmail.com>

dschmitz89 · 2024-10-21T19:15:08Z

@josef-pkt @bashtage : would migrating nctdtr to boost potentially cause issues for statsmodels? Like with ncfdtr, for the vast majority of cases we get much better precision.

In case you have roundtrip tests for nctdtrit, nctdtridf or nctdtrinc, they might fail for extreme inputs though. Don't worry though, scipy.special contains its own low level root finder now, so whatever CDF we use, we can invert it with respect to any parameter we want in principle.

josef-pkt · 2024-10-21T19:31:39Z

I switched away from scipy stats distributions for both nct and ncf to use scipy special
statsmodels/statsmodels#8659

AFAIR, AFAICS for nct it was only a runtime warning that showed up during rootfinding in sample size computation.
It did not affect the final results. So, I don't know which corner/extreme case evaluated during the rootfinding caused the Runtime Warning.

(Because I always use older versions of scipy, lagging 1 to 3 years behind, I cannot investigate new problems when they show up.)

dschmitz89 · 2024-10-21T19:48:35Z

I switched away from scipy stats distributions for both nct and ncf to use scipy special statsmodels/statsmodels#8659

AFAIR, AFAICS for nct it was only a runtime warning that showed up during rootfinding in sample size computation. It did not affect the final results. So, I don't know which corner/extreme case evaluated during the rootfinding caused the Runtime Warning.

(Because I always use older versions of scipy, lagging 1 to 3 years behind, I cannot investigate new problems when they show up.)

Looking at statsmodels/statsmodels#8624, it would be interesting to know which parameters caused Boost's noncentral t CDF to go wild.

josef-pkt · 2024-10-21T20:34:31Z

I have now scipy 0.10

>>> stats.nct.cdf(4.30265273, 2., 0.978343)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    stats.nct.cdf(4.30265273, 2., 0.978343)
  File "...\WPy64-31140\python-3.11.4.amd64\Lib\site-packages\scipy\stats\_distn_infrastructure.py", line 2186, in cdf
    place(output, cond, self._cdf(*goodargs))
  File "...\WPy64-31140\python-3.11.4.amd64\Lib\site-packages\scipy\stats\_continuous_distns.py", line 6826, in _cdf
    return np.clip(_boost._nct_cdf(x, df, nc), 0, 1)
RuntimeWarning: invalid value encountered in _nct_cdf

this is just looking at the values in the debugger of reported case in statsmodels issue
(I change the statsmodels code to use scipy.stats nct.cdf)

update
If I don't force the warning to error, then I get the number after the warning

>>> stats.nct.cdf(4.30265273, 2., 0.978343)

Warning (from warnings module):
  File "...\WPy64-31140\python-3.11.4.amd64\Lib\site-packages\scipy\stats\_continuous_distns.py", line 6826
    return np.clip(_boost._nct_cdf(x, df, nc), 0, 1)
RuntimeWarning: invalid value encountered in _nct_cdf
0.9107619358152059

correction: initial warning in statsmodels example is in sf, but that raises the same warning as cdf
stats.nct.sf(4.30265273, 2., 0.978343)

...
    return np.clip(_boost._nct_sf(x, df, nc), 0, 1)
RuntimeWarning: invalid value encountered in _nct_sf

bashtage · 2024-10-21T21:46:29Z

I don't recall the specifics TBH. It was failing in pre-release testing which led to the previous bug report.

dschmitz89 · 2024-10-22T04:45:10Z

stats.nct.cdf(4.30265273, 2., 0.978343)

Both warnings do not appear anymore with a recent scipy version. Maybe you could just suppress it.

This PR is ready from my side.

josef-pkt · 2024-10-22T15:35:01Z

Thanks,
We can keep using the scipy special version of nct in statsmodels for now, then users that upgrade to the next scipy will get the fixed nct boost version.

dschmitz89 · 2024-10-25T04:36:53Z

@steppi: if you could provide an example of how to use the new rootfinder with boost functions, I could tackle migrating a lot of cdflib functions over to boost over the next months piece by piece. Such PRs are not really popular as they are tedious but they fit my current time budget for individual PRs reasonably well.

steppi · 2024-10-28T20:36:35Z

scipy/special/tests/test_cdflib.py

               82.35640899964173, 84.45263768373256]
    assert_allclose(res, res_exp)

-@pytest.mark.xfail_on_32bit("32bit fails due to algorithm threshold")


Why are we removing these test cases? Were they failing because the old cdflib reference is inaccurate for some of them? The reference values could be replaced with mpmath then. This seems like a good opportunity to remove the xfail_on_32bit and show that they now pass.

Yes, the old reference values were very inaccurate for some cases. Allright, I will integrate them in the new tests against the mpmath reference. I might make a subjective selection though if that's okay with you.

For high noncentrality values of 38 such as in the test originally, the mpmath reference computes CDF values of the order of 1e-310. There also boost fails to get close or returns nan, so I dropped these for noncentrality of 15 instead.

Cases like this where the reference output is subnormal are why it makes sense to use an atol in tests. In cases like this, it's fine to return zero and signal underflow through sf_error. Is Boost returning nan where the previous method used to return 0? I'd count that as a bug.

Fully agree that the nan feels like a bug.

@steppi : I do , am just short on time at the moment. Not sure I can make it till the 1.15 release as I am also going on holiday at the end of the month. I want to spend my scipy activities until then mostly on review. If you can/want to take over the last todos here, that would be much appreciated as I think this PR is almost ready to merge.

Edit: I can also file an issue upstream about the nan values but I doubt that they will be fixed in a short time frame.

@steppi : I just had another look at the test cases for nc=38. Even for small values within the normal floating point range, we get high discrepancies.

For df, nc, t = 0.98, 38, 1.5 the mpmath reference computes 1.546906350750292e-295 but boost results in 2.5919953604832226e-97. Would you prefer to get the test to pass with something like atol=1e-90? I would simply ignore these edge cases for now to be honest. In the parameter ranges that matter most we get much better results with boost.

It looks like you set mp.dps too low to evaluate the sum. I set mp.dps=500 and got the result 2.591995360483094e-97 from your reference implementation. Much closer to Boost. If you're evaluating a sum with mpmath, you may need to set dps pretty high.

Ouch, thanks for spotting this. With mp.dps=500, the reference calculation did not finish within 10 minutes on my laptop though. It has performance issues at the moment but this is crazy. How long does it take on your machine? Might try out my Mac instead.

With 400 significant digits I also get 2.591995360483094e-97 within 12 minutes of computation time.

steppi · 2024-10-31T22:23:36Z

@steppi: if you could provide an example of how to use the new rootfinder with boost functions, I could tackle migrating a lot of cdflib functions over to boost over the next months piece by piece. Such PRs are not really popular as they are tedious but they fit my current time budget for individual PRs reasonably well.

Sure, that would be great. I'll do one when I get bandwidth for it, and will ping you on the PR. This would likely be after the xsf migration, which I hope to complete before we branch.

dschmitz89 · 2024-11-18T07:15:15Z

Allright, stars aligned and I had a bit of time yesterday.

The nan value originates from nctdtr(980, 38, 1.5), here boost computes a tiny negative value of app. -1e-70. In the wrapper, this is set to nan. I worked around it in the test by ignoring it. We could theoretically also clip for tiny negative values to zero but who knows if that is the better option for other cases.

There is now one failing parameter combination for 32bit CPUs. Let's skip the test again on 32bit then?

@steppi : let me know if you have a strong opinion here about any of the above points, otherwise I will add the test skip.

steppi · 2024-11-18T17:22:02Z

@dschmitz89, instead of removing the failing test case, mark it as xfail with a reason, and then do a conditional xfail for only the one failing case on 32bit.

Syntax is, instead of having a test case

[0.98, -3.8, 0.15, 0.9999528361700505, 1e-13], in the pytest.markparametrize` list, put

            pytest.param(
                [0.98, -3.8, 0.15, 0.9999528361700505, 1e-13],
                marks=pytest.mark.xfail(reason="Bug in underlying Boost math implementation"),
            )

for the failing test case on 32bit, your pytest.param will look like

pytest.param(
    test_case,  # Subsitute in your actual test case
    marks=pytest.mark.xfail(condition=sys.maxsize < 2**32, reason="Fails on 32 bit.")
)

I don't think it's a good idea to just get rid of failing test cases, or skip the whole test on 32bit because of only one failing case.

…ns negative value

scipy/special/tests/test_cdflib.py

dschmitz89 · 2024-11-21T07:11:46Z

Interesting that the 32bit job succeeds now. Maybe it had an issue in the way I handled the nan case which is now xfailed. Will push the latest stylistic updates without rerunning CI.

[skip ci]

dschmitz89 · 2024-11-21T11:37:19Z

The green CI results do not show up here anymore due to my last commit: for the reviewer, they can easily be found here.

dschmitz89 added 4 commits October 19, 2024 20:40

MAINT: migrate nctdtr to boost

ee3b896

MAINT: simplify unlikely case that neither float or single input is p…

1d1c896

…assed to nctdtr

DOC: update nctdtr acc. to boost capabilities

571c395

TST: add nctdtr tests

ffc69f4

dschmitz89 requested review from person142 and steppi as code owners October 20, 2024 20:42

github-actions bot added scipy.stats scipy.special C/C++ Items related to the internal C/C++ code base Cython Issues with the internal Cython code base enhancement A new feature or improvement labels Oct 20, 2024

dschmitz89 commented Oct 20, 2024

View reviewed changes

fancidev reviewed Oct 21, 2024

View reviewed changes

scipy/special/boost_special_functions.h Show resolved Hide resolved

scipy/special/boost_special_functions.h Show resolved Hide resolved

scipy/special/boost_special_functions.h Show resolved Hide resolved

steppi reviewed Oct 21, 2024

View reviewed changes

scipy/special/boost_special_functions.h Outdated Show resolved Hide resolved

dschmitz89 and others added 3 commits October 21, 2024 16:25

Fix behaviour for -inf [skip ci]

0b33349

Co-authored-by: Albert Steppi <albert.steppi@gmail.com>

TST: add more paramter combinations for nctdtr

4986d3a

MAINT: remove tests against unused old Fortran code

7ddeb68

TST: bump tolerance to pass on some platforms

a91a12a

steppi reviewed Oct 28, 2024

View reviewed changes

TST: refactor nctdtr tests in own class and readded gh19896 tests

6551972

lucascolley changed the title ~~ENH:use boost in special.nctdtr~~ ENH: special: use boost in nctdtr Nov 15, 2024

dschmitz89 added 2 commits November 16, 2024 08:57

MAINT: remove errorstate

f970ded

Merge branch 'main' into noncentralt_special_boost_migration

99e913f

dschmitz89 added this to the 1.15.0 milestone Nov 16, 2024

TST: fix mpmath reference values and ignore edge case

f763e4a

TST: use pytest parametrization and xfail test case where boost retur…

c2e000c

…ns negative value

dschmitz89 commented Nov 21, 2024

View reviewed changes

scipy/special/tests/test_cdflib.py Outdated Show resolved Hide resolved

scipy/special/tests/test_cdflib.py Outdated Show resolved Hide resolved

scipy/special/tests/test_cdflib.py Outdated Show resolved Hide resolved

scipy/special/tests/test_cdflib.py Outdated Show resolved Hide resolved

Apply suggestions from code review

1de5e38

[skip ci]

dschmitz89 mentioned this pull request Nov 21, 2024

Noncentral F distribution CDF edge cases boostorg/math#1198

Closed

steppi merged commit c75f588 into scipy:main Nov 22, 2024

dschmitz89 deleted the noncentralt_special_boost_migration branch November 22, 2024 17:00

dschmitz89 mentioned this pull request Nov 28, 2024

ENH: special: migrate cdflib functions to boost #21966

Open

26 tasks

Uh oh!

ENH: special: use boost in nctdtr #21728

ENH: special: use boost in nctdtr #21728

Uh oh!

Conversation

dschmitz89 commented Oct 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference issue

What does this implement/fix?

Additional information

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fancidev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dschmitz89 commented Oct 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josef-pkt commented Oct 21, 2024

Uh oh!

dschmitz89 commented Oct 21, 2024

Uh oh!

josef-pkt commented Oct 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bashtage commented Oct 21, 2024

Uh oh!

dschmitz89 commented Oct 22, 2024

Uh oh!

josef-pkt commented Oct 22, 2024

Uh oh!

dschmitz89 commented Oct 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steppi Oct 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dschmitz89 Oct 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dschmitz89 Nov 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steppi commented Oct 31, 2024

Uh oh!

dschmitz89 commented Nov 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steppi commented Nov 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dschmitz89 commented Nov 21, 2024

Uh oh!

dschmitz89 commented Nov 21, 2024

ENH: special: use boost in `nctdtr` #21728

ENH: special: use boost in `nctdtr` #21728

dschmitz89 commented Oct 20, 2024 •

edited

Loading

dschmitz89 commented Oct 21, 2024 •

edited

Loading

josef-pkt commented Oct 21, 2024 •

edited

Loading

dschmitz89 commented Oct 25, 2024 •

edited

Loading

steppi Oct 28, 2024 •

edited

Loading

dschmitz89 Oct 29, 2024 •

edited

Loading

dschmitz89 Nov 12, 2024 •

edited

Loading

dschmitz89 commented Nov 18, 2024 •

edited

Loading

steppi commented Nov 18, 2024 •

edited

Loading