ENH: add stirling2 function to scipy.special
#18103
Conversation
It looks like the Windows builds are failing:
I suspect this is because I'm using
Also note this failure:
You can fix it by adding the new file to the listing in
Got it, removed.
Done in commit
scipy/special/cephes/stirling.c
Outdated
```c
int arraySize = n + 1;
int prev[arraySize];
int curr[arraySize];
```
Just a quick comment; I won't have time to review the math until the weekend.

We should `calloc` these instead of declaring them on the stack (making sure to call `free` before returning). VLAs are part of the C99 standard, but became optional as of C11. I checked the toolchain roadmap, and it looks like we shouldn't be using optional features like this.
Also, by declaring them on the stack, we can blow up the stack and get a segfault even with sane input. On my 64-bit Linux machine I found that

```python
import scipy.special as sc
sc.stirling2(2000000, 1999999)
# Segmentation fault (core dumped)
```
> Also, by declaring them on the stack, we can blow up the stack and get a segfault even with sane input. On my 64-bit Linux machine I found that `stirling2(2000000, 1999999)` gives `Segmentation fault (core dumped)`.
Ok. FWIW the calculation will overflow long before this point, so maybe it makes sense to raise an error if n > X, where X is the value at which overflow first occurs (17 on my machine, with `int` being 32-bit, IIRC)?
> We should calloc these instead of declaring them on the stack (making sure to call free before returning). VLAs are part of the C99 standard, but became optional as of C11. I checked the toolchain roadmap and it looks like we shouldn't be using optional features like this.
Hmm, I'm somewhat confused by the toolchain roadmap... I'll look at the `calloc` implementation.
@steppi Thanks very much for your comments
I switched to heap memory (calling `free` before returning) per @steppi's comments. I ran the new version through valgrind locally, so I think everything should be good from a memory-leak perspective AFAICS. Still not sure what we want to do about the overflow issue (large n or k input), though?
FWIW it looks like the failing Windows builds are the same issue for which Tyler opened #18108.
Actually, I'm pretty frightened by having all these allocations in the inner loop of a ufunc. This Stack Exchange answer seems to give an efficient solution based on the inclusion-exclusion formula.
I had initially looked into IE (see the issue). In practice IE doesn't seem to be faster, because authors ignore the cost of computing the binomial coefficients: one is needed for each entry from i=1 to k. As for the memory allocations, it's really only one per call, which doesn't seem too bad to me. The alternative is to do everything in pure Python and compute up through an array. We are duplicating work in the ufunc approach, though, if an array contains two of the same input.
The IE-based implementation from this gist recomputes the binomial coefficient from scratch at each iteration instead of using the recurrence from the above link to update it with a few arithmetic operations at each step. The algorithm from the CS Stack Exchange link uses fewer overall operations than the allocation-heavy approach.
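For concreteness, here's a sketch of the IE formula with the binomial coefficient updated by the one-step recurrence rather than recomputed each iteration. It uses exact Python integers, so overflow is not a concern here; `stirling2_ie` is just an illustrative name, not anything in the PR.

```python
import math

def stirling2_ie(n, k):
    # S(n, k) = (1/k!) * sum_{j=0}^{k-1} (-1)^j * C(k, j) * (k - j)^n,
    # with C(k, j) updated in O(1) per term via
    # C(k, j+1) = C(k, j) * (k - j) / (j + 1), instead of being
    # recomputed from scratch at each iteration.
    if n == 0 and k == 0:
        return 1
    if k <= 0 or k > n:
        return 0
    total = 0
    binom = 1  # C(k, 0)
    for j in range(k):  # the j = k term vanishes since 0^n = 0 for n > 0
        total += (-1)**j * binom * (k - j)**n
        binom = binom * (k - j) // (j + 1)  # -> C(k, j+1), exact division
    return total // math.factorial(k)  # the sum is exactly divisible by k!
```

With big ints the division by k! is exact, so this is handy as a reference implementation for testing the C code.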
I've come up with a plan that might work.
OK, so about the size of the integer type: the expression won't overflow when using 64-bit ints, but would overflow when using 32-bit ints. We will want to support 64-bit. The current special infrastructure allows types `int` and `long`. On 64-bit Linux a `long` is 64 bits, but on Windows it's still 32 bits. We could support both.

I still think we should be using the fast IE-based algorithm from here, not the allocation-based one. If one wants to compute many Stirling numbers, the allocation-based one will duplicate too much effort, so it isn't very efficient actually.

For overflow, I think we can check for it inside the compiled function and return a sentinel value.

Thoughts?
Oh, I see. Even when speeding up the binomial coefficient calculation, the IE formula is still unworkable because you need to divide by the factorial at the end, and the numerator and denominator will both overflow even for reasonable input.
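Concretely (the particular n and k below are chosen just for illustration): in double precision, a power term like k^n from the IE numerator, and k! in the denominator, both leave the representable range (~1.8e308) well within the regime this function should support.

```python
import math

# A numerator term like k**n overflows a double for moderate n:
try:
    20.0 ** 300
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)  # True

# ... and k! itself exceeds the double range once k > 170:
print(math.factorial(170) < 2**1024)   # True: 170! (~7.3e306) still fits in a double
print(math.factorial(171) > 2**1024)   # True: 171! cannot be stored in a double
```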
I'm open to other opinions on this, but I think we should make the return type `long` instead of `int`, so we can handle larger return values on platforms where `long` is 64-bit.
Also, we don't actually need to compute the full triangle, so we can save some effort and memory. There's a way to set it up so only the necessary values are computed; see the relevant comment. I've posted a pure Python implementation you can refer to.
I'm open to more suggestions on this too, but I think it would be good to return `-1` in case of overflow or failure to allocate memory. Since the actual result should always be non-negative, this would signal to users that the value is incorrect. We'd just have to document this behavior well.
scipy/special/cephes/stirling.c
Outdated
```c
for (int i=1; i<arraySize; i++){
    // build up next row in curr
    for (int j=1; j<arraySize; j++){
        curr[j] = prev[j - 1];
        curr[j] += prev[j] * j;
    }
```
We're doing too much work here and allocating more than we need to. To compute S(n, k) we don't need the full triangle.

If k ≤ n − k + 1, we can allocate two arrays of size k and compute the values in the parallelogram left to right and then top to bottom. If k > n − k + 1, we can allocate two arrays of size n − k + 1 and compute the values top to bottom and then left to right.

This also has the benefit of making it so that if an overflow occurs at any time during execution, then the final result will overflow. If we compute the full triangle this won't be the case. We need to be able to detect overflow, and this will simplify that.
```python
import numpy as np

def stirling2(n, k):
    if k <= 0 or k > n or n < 0:
        if n == 0 and k == 0:
            return 1
        return 0
    if k <= n - k + 1:
        current = [1]*k
        for i in range(1, n - k + 1):
            for j in range(1, k):
                current[j] = (j+1) * current[j] + current[j-1]
    else:
        current = [1]*(n - k + 1)
        for i in range(1, k):
            for j in range(1, n - k + 1):
                current[j] = (i+1) * current[j-1] + current[j]
    return current[-1]
```
@rlucas7 Here's a Python implementation that only computes the values in the triangle that are actually needed. It should be relatively straightforward to translate into C. Note that we will want `malloc` instead of `calloc` now, because we no longer initialize with zero entries.
I've just updated the implementation above to only use one array instead of two.
> We're doing too much work here and allocating more than we need to. To compute S(n, k) we don't need the full triangle. ... If k ≤ n − k + 1 we can allocate two arrays of size k and compute the values in the parallelogram left to right and then top to bottom. If k > n − k + 1 we can allocate two arrays of size n − k + 1 and compute the values top to bottom and then left to right.
I understand this as taking the LHS or the top (angular) array of `1`s in the Stirling triangle and proceeding to do the calculations these ways, along the angled movement of a row.

Clever.
> This also has the benefit of making it so that if an overflow occurs at any time during execution, then the final result will overflow.
I agree.
> If we compute the full triangle this won't be the case. We need to be able to detect overflow, and this will simplify that.
I don't follow the reasoning of these last two sentences. It seems to me that in either case we need to check for overflow. Unless I'm mistaken, the same n, k values overflow in either case (using array-based dynamic programs). Did I miss something?
> I don't follow the reasoning of these last 2 sentences. It seems to me that either case it's relatively straightforward to check. Unless I'm mistaken the same n,k values overflow in either case (using array based dynamic programs). Did I miss something?
I just mean that in the reduced schedule, where we only compute Stirling numbers over the parallelogram, we just need to check whether overflow occurs, since every value computed will take part in the final result. If we do the whole triangle up to
... it's just that we need to check for overflow and also that the value will be used in the final result. I prefer the simpler condition where we just have to check for overflow.

Got it, thanks.
FWIW, after switching to `long` outputs in C (on my Mac), the overflow bumps up from first happening at 17-18 to 25-26:
```
stirling2(25,0) = 0
stirling2(25,1) = 1
stirling2(25,2) = 16777215
stirling2(25,3) = 141197991025
stirling2(25,4) = 46771289738810
stirling2(25,5) = 2436684974110751
stirling2(25,6) = 37026417000002430
stirling2(25,7) = 227832482998716310
stirling2(25,8) = 690223721118368580
stirling2(25,9) = 1167921451092973005
stirling2(25,10) = 1203163392175387500
stirling2(25,11) = 802355904438462660
stirling2(25,12) = 362262620784874680
stirling2(25,13) = 114485073343744260
stirling2(25,14) = 25958110360896000
stirling2(25,15) = 4299394655347200
stirling2(25,16) = 526655161695960
stirling2(25,17) = 48063331393110
stirling2(25,18) = 3275678594925
stirling2(25,19) = 166218969675
stirling2(25,20) = 6220194750
stirling2(25,21) = 168519505
stirling2(25,22) = 3200450
stirling2(25,23) = 40250
stirling2(25,24) = 300
stirling2(25,25) = 1
stirling2(25,26) = 0
stirling2(26,0) = 0
stirling2(26,1) = 1
stirling2(26,2) = 33554431
stirling2(26,3) = 423610750290
stirling2(26,4) = 187226356946265
stirling2(26,5) = 12230196160292565
stirling2(26,6) = 224595186974125331
stirling2(26,7) = 1631853797991016600
stirling2(26,8) = 5749622251945664950
stirling2(26,9) = -7245227292754425991
stirling2(26,10) = -5247188700862703611
stirling2(26,11) = -8417665732711074856
stirling2(26,12) = 5149507353856958820
stirling2(26,13) = 1850568574253550060
stirling2(26,14) = 477898618396288260
stirling2(26,15) = 90449030191104000
stirling2(26,16) = 12725877242482560
stirling2(26,17) = 1343731795378830
stirling2(26,18) = 107025546101760
stirling2(26,19) = 6433839018750
stirling2(26,20) = 290622864675
stirling2(26,21) = 9759104355
stirling2(26,22) = 238929405
stirling2(26,23) = 4126200
stirling2(26,24) = 47450
stirling2(26,25) = 325
stirling2(26,26) = 1
stirling2(26,27) = 0
```
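As a sanity check with Python's arbitrary-precision integers (`stirling2_row` here is a throwaway helper, not the scipy function), the negative entries above appear exactly where the exact values leave the signed 64-bit range and wrap modulo 2^64:

```python
def stirling2_row(n):
    # Build row n of the Stirling triangle with the recurrence
    # S(n, k) = k*S(n-1, k) + S(n-1, k-1), using Python's big ints.
    row = [1]  # row 0: S(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            above = row[k] if k < len(row) else 0
            new[k] = k * above + row[k - 1]
        row = new
    return row

INT64_MAX = 2**63 - 1
row25, row26 = stirling2_row(25), stirling2_row(26)
print(max(row25) <= INT64_MAX)  # True: all of row 25 fits in a 64-bit signed long
print(row26[9] > INT64_MAX)     # True: S(26, 9) overflows
print(row26[9] - 2**64)         # -7245227292754425991, the wrapped value printed above
```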
Yes. I noticed this too, the
@steppi The latest commit has your memory optimization, using only one array. I'll think a bit more about the overflow checking. Also, to disambiguate the potential (albeit unlikely) malloc error, I'm thinking of returning a
Overflow of signed integers in C leads to undefined behavior. See the suggestion for how to check properly. I also suggested a long comment to explain the reduced version of the dynamic programming algorithm. Let me know what you think of it.

Also, can you add some tests that check that overflow is being handled properly?
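To make the "check properly" idea concrete, here's a Python sketch that mirrors what the guarded C loop would do with a 64-bit `long`: test each multiply-add against the maximum value *before* performing it, and return `-1` (the proposed sentinel) on overflow. The function name is illustrative, and the two branches follow the reduced parallelogram schedule discussed above.

```python
INT64_MAX = 2**63 - 1

def stirling2_checked(n, k):
    # Guard before each (c)*a + b step: a > (MAX - b) // c  iff  c*a + b > MAX.
    if k <= 0 or k > n or n < 0:
        return 1 if n == 0 and k == 0 else 0
    if k <= n - k + 1:
        current = [1] * k
        for i in range(1, n - k + 1):
            for j in range(1, k):
                if current[j] > (INT64_MAX - current[j - 1]) // (j + 1):
                    return -1  # (j+1)*current[j] + current[j-1] would overflow
                current[j] = (j + 1) * current[j] + current[j - 1]
    else:
        current = [1] * (n - k + 1)
        for i in range(1, k):
            for j in range(1, n - k + 1):
                if current[j - 1] > (INT64_MAX - current[j]) // (i + 1):
                    return -1  # (i+1)*current[j-1] + current[j] would overflow
                current[j] = (i + 1) * current[j - 1] + current[j]
    return current[-1]
```

Because every value computed in the reduced schedule feeds the final result, the guard only fires when the answer genuinely cannot be represented, which is exactly the property discussed earlier.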
@rlucas7, I thought it would be good to also support arbitrarily large ints using a pure Python implementation. I've submitted a nested PR rlucas7#1 to make
I implemented the first and second order asymptotic approximations from Temme in Python just to see how well they work. I've been having fun with this. With the second order approximation the relative error doesn't seem terrible (on the order of ...). To implement this, we can use `lambertw`:

```python
import numpy as np
import scipy.special as sc

def stirling2_asymptotic(n, k):
    mu = k/n
    k, n = float(k), float(n)
    def f(x):
        return mu * x - 1 + np.exp(-x)
    def df(x):
        return mu - np.exp(-x)
    def d2f(x):
        return np.exp(-x)
    delta = 1/mu * np.exp(-1/mu)
    x0 = 1/mu + sc.lambertw(-delta)
    t0 = (n - k) / k
    F = np.sqrt(t0/((1 + t0)*(x0 - t0)))
    A = -n*np.log(x0) + k*np.log(np.exp(x0) - 1) - k*t0 + (n - k)*np.log(t0)
    try:
        result = np.exp(A)*k**(n - k)*F*sc.binom(n, k)
    except OverflowError:
        result = np.inf
    return result

def stirling2_asymptotic_order2(n, k):
    mu = k/n
    k, n = float(k), float(n)
    def f(x):
        return mu * x - 1 + np.exp(-x)
    def df(x):
        return mu - np.exp(-x)
    def d2f(x):
        return np.exp(-x)
    delta = 1/mu * np.exp(-1/mu)
    x0 = 1/mu + sc.lambertw(-delta)
    t0 = (n - k) / k
    F = np.sqrt(t0/((1 + t0)*(x0 - t0)))
    A = -n*np.log(x0) + k*np.log(np.exp(x0) - 1) - k*t0 + (n - k)*np.log(t0)
    F1 = (
        -2*x0**3 + 2*t0**5 + 4*t0**3 + 4*t0**4 + 3*x0**2 * t0 - 6*x0*t0**4
        - 5*x0**2 * t0**2 + 2*x0**4 * t0 + x0**3*t0 - 6*x0**3 * t0**2
        + 8*x0**2 * t0**3
    ) / (24*F*(1 + t0)**2*(x0 - t0)**4)
    try:
        result = np.exp(A)*k**(n - k)*sc.binom(n, k)*(F - F1/k)
    except OverflowError:
        result = np.inf
    return result

for n, k in ((100, 20), (112, 45), (200, 33)):
    exact = sc.stirling2(n, k)
    asymptotic = stirling2_asymptotic(n, k)
    asymptotic2 = stirling2_asymptotic_order2(n, k)
    rel_error = abs(float(exact) - asymptotic) / abs(float(exact))
    order2_rel_error = abs(float(exact) - asymptotic2) / abs(float(exact))
    print("-------------------")
    print(f"stirling2({n}, {k})")
    print("--------------------")
    print("rounded exact val:", float(exact))
    print("asymptotic:", asymptotic)
    print("relative error:", rel_error)
    print("asymptotic order2:", asymptotic2)
    print("relative error order2:", order2_rel_error)
```
The output is:
IIRC the docstring does need to be raw style.
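To illustrate why (the snippet is generic, not the actual docstring): in a non-raw string, sequences like `\a` in `\atop` and `\r` in `\right` are interpreted as escape characters, so the LaTeX brace notation never reaches Sphinx intact.

```python
# Non-raw string: "\a" becomes a BEL character and "\r" a carriage return.
plain = ":math:`\left\{ {n \atop k} \right\}`"
# Raw string: every backslash is kept literal, as Sphinx needs.
raw = r":math:`\left\{ {n \atop k} \right\}`"

print(plain == raw)   # False: the escapes mangled the markup
print("\r" in plain)  # True: \right lost its backslash to a carriage return
print("\r" in raw)    # False
```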
Docs failures have been resolved. The arpack test,
Hopefully this can help to get started: #18736 :)
@steppi I worked through Temme's paper this weekend and reproduced the results in that paper. In particular, his equation 4.4 on page 10 matches what you have here in the details code dropdown. I reproduced the max error values with my own implementation (I missed the code in the dropdown tab in your comment), so I independently reproduced both the code and the max error values Temme notes for n=10, 20, etc. for the second order approximation. I think that's what we should use here, unless you've got a more accurate one. The choice of threshold comes down to the (max rel error, K that achieves max error, N) values ... so somewhere between 275 and 300 seems OK as a threshold. Incidentally, if you want me to look at a PR for the Temme approximation, just drop me an email once it's up; I'm comfortable reviewing a PR on that now. Here is the code I used to generate the values noted here:

```python
import numpy as np
from scipy import optimize
from math import sqrt, comb

## calculated via the exact method using the dp soln above
## (`stirling2` is the exact implementation from this PR)...
# setting n=10 we can verify that the max relative error occurs @ m=4
# w/value = 0.000466 (rounds to the value given in the Temme paper)
# also n=20 verifies the max relative error in the paper @ m=3
# w/value = 0.000116 (rounds to 0.00012 in the paper)
n = 275
actual = stirling2(n, range(n+1))
m2 = m1 = (-np.inf, -1)
for m in range(1, n):
    # calculate t0 as described on page 235 of Temme;
    # this is noted in the writing just after equation 2.6 on page 235
    t0 = (n-m)/m
    def func(x):
        return (m/n)*x - 1 + np.exp(-x)
    # calculate x0 by solving the root finding problem, equation 2.5 in Temme
    res = optimize.root(func, n/m, method='broyden1', tol=1e-14)
    x0 = res.x.take(0)
    def phi(x):
        return -n * np.log(x) + m*np.log(np.exp(x) - 1)
    A = phi(x0) - m*t0 + (n-m)*np.log(t0)
    # typically f() will be called w/argument t0
    def f(x):
        return sqrt(x / ((1+x) * (x0-x)))
    # needed for the second order approx
    def f1(x):
        num = (-2*x0**3 + 2*x**5 + 4*x**4 + 4*x**3 + 3*x*x0**2 - 6*x0*x**4
               - 5*x0**2*x**2 + 2*x0**4*x + x0**3*x - 6*x0**3*x**2
               + 8*x0**2*x**3)
        denom = 24*f(x)*(1+x)**2*(x0-x)**4
        return num/denom
    # first order approx gives
    first = np.exp(A)*m**(n-m)*f(t0)*comb(n,m)
    # second order approx gives
    second = np.exp(A)*m**(n-m)*(f(t0)-f1(t0)/m)*comb(n,m)
    rel_err1 = (actual[m] - first) / first
    rel_err2 = (actual[m] - second) / second
    if abs(rel_err1) > m1[0]:
        m1 = (abs(rel_err1), m)
    if abs(rel_err2) > m2[0]:
        m2 = (abs(rel_err2), m)
    print(f"n={n}, m={m}, first={first}, second={second}, "
          f"actual={actual[m]}, rel_err1={rel_err1}, rel_err2={rel_err2}")
print(m1, m2)
```
@rlucas7 Cool. Yeah, I agree we should start with the second order approximation. Later on, if we get bandwidth, we can try to independently work out higher order approximations. I also have the following approximation based on the inclusion/exclusion formula, using the Lanczos approximation to tame the factorial and cancelling as much as possible. This can be used for smaller values. It doesn't work as k approaches n; I think we can just do a floating point version of the recurrence in those cases, when n is still too small for the asymptotic expansion to be usable.

```python
def stirling2_double(n, k):
    result = 0
    k_choose_j = 1
    for j in range(k):
        term = (
            (-1)**j * k_choose_j * (1 - j/k)**(k + 0.5)
            * (k - j)**(n - k - 0.5)
        )
        result += term
        k_choose_j = k_choose_j * (k - j) / (j + 1)
    # lanczos_g and _lanczos_sum_expg_scaled come from scipy's
    # Lanczos approximation machinery
    return (
        result
        * (np.e / (1 + (lanczos_g + 0.5)/k))**(k + 0.5)
        / _lanczos_sum_expg_scaled(k + 1)
    )
```

Let's merge the current PR this week, and I'll submit a PR for the approximation next weekend.

Update: Improved the approximation above by pulling a constant factor out of the sum.
It looks like the docs still aren't rendering the brace notation (and the ...). There are also approximation method(s) to be added. @steppi, if you want, I can put together what I had for Temme's second order and add it here to the existing PR?

I spent some time studying the Lanczos approaches in Pugh's thesis. I'm still working through the Chebyshev approximation part, but I get the gist of that particular approach. I didn't have access to the reference, so I needed to work out the derivation for Lemma 3.2. In any case, I feel comfortable reviewing a PR for that work if you want to put up something for lower values of ...

I'm also not sure what the consensus is on the 0-dimension numpy array; do we need to change the behavior in this PR? I think the answer is yes, but want to confirm before adding another change.
Let’s follow @h-vetinari’s precedent for the factorial functions. He made a good case for it in gh-18768.
Let’s save it for a followup PR. Once the doc issues are resolved and the behavior lines up with the factorial functions regarding 0d arrays, I think it’ll be ready to merge.
Besides the 0d array thing, we made
The reason we used
Sorry, that’s not what I meant. The factorial functions also work with Python ints internally and return an object array, but the input array must be some kind of numpy int dtype. I’d made a suggestion earlier that would let
Ah, I see what you mean. IIRC the current implementation raises an error when the input is an object array. There is a test that confirms this behavior in this PR. Is there something that needs to be done for object input beyond what's already done?
should fix the 0-dim case and not break the handling for the existing scalar case.

Co-authored-by: Daniel Schmitz <40656107+dschmitz89@users.noreply.github.com>
Co-authored-by: Albert Steppi <albert.steppi@gmail.com>

change test w/decision to *not* accept input array w/dtype object
If it helps (for example with the whole 0d vs. scalar thing), you can probably take a look at the test infrastructure for factorial2 (factorial itself has a bunch of legacy stuff in there and is a bit more distracting), particularly the test for corner cases.

PS. The factorial functions (with exact=True) also switch to object dtype when the respective integer types are too small.
Looks good, I think this is ready. Thanks @rlucas7. If no one objects I'll merge tomorrow evening UTC-4.
Yes, that would be awesome.
Reference issue
Closes gh-17890
What does this implement/fix?
Implements a ufunc in scipy.special for Stirling numbers of the second kind, in scipy/special/cephes.
.Additional information
The method follows the discussion in the linked issue.