
ENH: add stirling2 function to scipy.special #18103

Merged · 36 commits · Jul 21, 2023

Conversation

@rlucas7 (Member) commented Mar 5, 2023

Reference issue

Closes gh-17890

What does this implement/fix?

Implements a ufunc in scipy.special for Stirling numbers of the second kind, in scipy/special/cephes.

Additional information

The method follows the discussion in the linked issue.
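For reference, the implementation builds on the standard recurrence for Stirling numbers of the second kind, S(n, k) = k·S(n−1, k) + S(n−1, k−1). A minimal Python sketch of that dynamic program (illustrative only, not the cephes C code itself; the function name is mine):

```python
def stirling2_dp(n, k):
    """Stirling number of the second kind via the recurrence
    S(n, k) = k*S(n-1, k) + S(n-1, k-1), keeping one row at a time."""
    if n == 0 and k == 0:
        return 1
    if k <= 0 or k > n:
        return 0
    prev = [1] + [0] * k              # row for n' = 0: only S(0, 0) = 1
    for m in range(1, n + 1):
        curr = [0] * (k + 1)          # S(m, 0) = 0 for m >= 1
        for j in range(1, min(m, k) + 1):
            curr[j] = j * prev[j] + prev[j - 1]
        prev = curr
    return prev[k]
```

Python's arbitrary-precision ints sidestep overflow here; the C ufunc has to worry about fixed-width integer types, which is where much of the discussion below comes from.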

@rlucas7 rlucas7 requested a review from person142 as a code owner March 5, 2023 20:30
@rlucas7 rlucas7 requested review from rgommers and removed request for rgommers March 5, 2023 20:30
@tupui tupui added the enhancement and scipy.special labels Mar 5, 2023
@j-bowhay j-bowhay added the C/C++ (internal C/C++ code base) label Mar 5, 2023
@rlucas7 (Member, Author) commented Mar 6, 2023

It looks like the Windows builds are failing:

2023-03-06T02:27:11.4935123Z error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\HostX86\x86\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -D_USE_MATH_DEFINES -ID:\a\1\s\scipy\special -IC:\hostedtoolcache\windows\Python\3.9.13\x86\lib\site-packages\numpy\core\include -IC:\hostedtoolcache\windows\Python\3.9.13\x86\include -Iscipy\_lib -Iscipy\_lib\boost_math\include -Iscipy\_build_utils\src -IC:\hostedtoolcache\windows\Python\3.9.13\x86\lib\site-packages\numpy\core\include -Ibuild\src.win32-3.9\numpy\distutils\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include -IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\cppwinrt -IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE -IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt /Tcscipy\special\cephes\stirling.c /Fobuild\temp.win32-3.9\scipy\special\cephes\stirling.obj /arch:SSE2" failed with exit status 2
2023-03-06T02:27:11.4939270Z scipy\special\cephes\stirling.c(32): error C2057: expected constant expression
2023-03-06T02:27:11.4940435Z scipy\special\cephes\stirling.c(32): error C2466: cannot allocate an array of constant size 0
2023-03-06T02:27:11.4941418Z scipy\special\cephes\stirling.c(32): error C2133: 'prev': unknown size
2023-03-06T02:27:11.4942535Z scipy\special\cephes\stirling.c(33): error C2057: expected constant expression
2023-03-06T02:27:11.4943476Z scipy\special\cephes\stirling.c(33): error C2466: cannot allocate an array of constant size 0
2023-03-06T02:27:11.4944425Z scipy\special\cephes\stirling.c(33): error C2133: 'curr': unknown size
2023-03-06T02:27:11.4945275Z INFO: 

I suspect this is because I'm using const int arraySize = n+1; as the array size inside the function. This works fine for me locally when I include the C99 flag to gcc, but I'm on a Mac and don't have access to a Windows machine. Is there something different that needs to be done for the Windows compilers to handle variable-length arrays?

@rgommers (Member) commented Mar 7, 2023

const int arraySize = n + 1;

I suspect this is because I'm using const int arraysize = n+1; as the array size inside the function. This works fine for me locally when I include the c99 flag to gcc but I'm using a mac and I don't have access to a windows machine. Is there something different that needs to be done for the windows compilers to handle variable length arrays in windows?

The const here is wrong - the value must be constant for the duration of the program, and here it depends on n which is a parameter to the function. So just drop the const I'd say. Why it's different across platforms is probably just because it's undefined behavior.

Also note this failure:

 Exception: /home/runner/work/scipy/scipy/scipy/special/tests/test_stirling.py is not installed

You can fix it by adding the new file to the listing in scipy/special/tests/meson.build.

@rlucas7 (Member, Author) commented Mar 7, 2023

The const here is wrong ... So just drop the const I'd say. Why it's different across platforms is probably just because it's undefined behavior.

Got it, removed const in latest commit e6bb72d...

You can fix it by adding the new file to the listing in scipy/special/tests/meson.build.

Done in commit e6bb72d...

Comment on lines 31 to 33
int arraySize = n + 1;
int prev[arraySize];
int curr[arraySize];
@steppi (Contributor) commented Mar 7, 2023

Just a quick comment, I won't have time to review the math until the weekend.

We should calloc these instead of declaring them on the stack (making sure to call free before returning). VLAs are part of the C99 standard, but became optional as of C11. I checked the toolchain roadmap and it looks like we shouldn't be using optional features like this.

Also, by declaring them on the stack, we can blow up the stack and get a segfault even with sane input. On my 64bit Linux machine I found that

import scipy.special as sc

sc.stirling2(2000000, 1999999)
# Segmentation fault (core dumped)

@rlucas7 (Member, Author) commented Mar 8, 2023

Also, by declaring them on the stack, we can blow up the stack and get a segfault even with sane input. On my 64bit Linux machine I found that

import scipy.special as sc

sc.stirling2(2000000, 1999999)
# Segmentation fault (core dumped)

OK. FWIW the calculation will overflow long before this point, so maybe it makes sense to raise an error if n > X, where X is the value at which overflow first occurs (17 on my machine, with int being 32-bit, IIRC)?

We should calloc these instead of declaring them on the stack (making sure to call free before returning). VLAs are part of the C99 standard, but became optional as of C11. I checked the toolchain roadmap and it looks like we shouldn't be using optional features like this.

Hmm, I'm somewhat confused by the toolchain roadmap... I'll look at the calloc implementation.

@steppi Thanks very much for your comments

@rlucas7 (Member, Author) commented Mar 8, 2023

Following @steppi's comments, I switched to heap memory (calling free before returning). I ran the new version through valgrind locally:

==12039== 
==12039== HEAP SUMMARY:
==12039==     in use at exit: 0 bytes in 0 blocks
==12039==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==12039== 
==12039== All heap blocks were freed -- no leaks are possible
==12039== 
==12039== Use --track-origins=yes to see where uninitialised values come from
==12039== For lists of detected and suppressed errors, rerun with: -s

So I think everything should be good from a memory-leak perspective, AFAICS.
All returned values look the same as with VLAs (this is good).

Still not sure what we want to do about the overflow issue (large n or k input) though?

@rlucas7 (Member, Author) commented Mar 8, 2023

FWIW it looks like the failing Windows builds are the same issue that Tyler opened in #18108.
[screenshot: failing Windows CI checks]

@steppi (Contributor) commented Mar 8, 2023

Actually, I'm pretty frightened by having all these allocations in the inner loop of a ufunc. This Stack Exchange answer seems to give an efficient solution based on the inclusion-exclusion formula.

@rlucas7 (Member, Author) commented Mar 8, 2023

Actually, I'm pretty frightened by having all these allocations in the inner loop of a ufunc. This Stack Exchange answer seems to give an efficient solution based on the inclusion-exclusion formula.

I had initially looked into IE (see the issue). In practice IE doesn't seem to be faster, because authors ignore the cost of computing the binomial coefficients; one is needed for each entry from i=1 to k. As for the memory allocations, it's really only one per call, which doesn't seem too bad to me.

The alternative is to do everything in pure Python and compute values up through an array of (n, k) pairs; see this comment from the original issue for that approach. Another benefit of the pure Python approach is greater precision (overflow is less of a concern) if we stay with the Python int type.

We are duplicating work in the ufunc approach though if an array contains two of the same n values, we'd recompute the array for each call. You can easily imagine cases where this becomes inefficient. The challenge here is that the ufunc mechanism (AFAIK) does not provide for sharing computation across function calls. The assumption is data parallelism for each entry in the array, which an array of n,k parameters for Stirling numbers of the second kind clearly violates.

@steppi (Contributor) commented Mar 8, 2023

The IE-based implementation from this gist recomputes the binomial coefficient from scratch at each iteration instead of using the recurrence from the above link to update it with a few arithmetic operations per step. The algorithm from the CompSci Stack Exchange link uses fewer overall operations than the allocation-heavy approach.
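The incremental update mentioned here can be sketched as follows (an illustrative Python version, not the linked C code; the function name is mine). The identity C(k, j+1) = C(k, j)·(k−j)/(j+1) turns each binomial coefficient into a couple of arithmetic operations:

```python
from math import factorial

def stirling2_ie(n, k):
    """Inclusion-exclusion formula for Stirling numbers of the second kind:
    S(n, k) = (1/k!) * sum_{j=0}^{k} (-1)**j * C(k, j) * (k - j)**n,
    updating the binomial coefficient incrementally per term."""
    total = 0
    k_choose_j = 1                      # C(k, 0)
    for j in range(k + 1):
        total += (-1) ** j * k_choose_j * (k - j) ** n
        # C(k, j+1) = C(k, j) * (k - j) / (j + 1); exact integer division
        k_choose_j = k_choose_j * (k - j) // (j + 1)
    return total // factorial(k)
```

With fixed-width integers both `total` and `factorial(k)` overflow quickly, which is exactly the problem discussed further down the thread.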

@rgommers rgommers changed the title add stirling2 function to scipy special cephes ENH: add stirling2 function to scipy.special Mar 8, 2023
@steppi (Contributor) commented Mar 8, 2023

I've come up with a plan that might work.

Ok. FWIW the calculation will overflow long before this point so mayvbe it makes sense to raise an error if n > X where X is the value of overflow (17 on my machine w/int being 32 bit IIRc)?

OK, so the size of an int is platform dependent, but for compatibility reasons it usually will be 32 bits, even on 64-bit machines.

The expression

$${2000000 \brace 1999999} = {2000000 \choose 2} = 1999999000000$$

won't overflow when using 64-bit ints, but would overflow when using 32-bit ints. We will want to support 64-bit. The current special infrastructure allows types int and long. On 64-bit Linux a long is 64 bits, but on Windows it's still 32 bits. We could either support both int and long, maybe using Cython with fused types, or just support long.

I still think we should be using the fast IE based algorithm from here, not the allocation based one. If one wants to compute many Stirling numbers, the allocation based one will duplicate too much effort, so isn't very efficient actually.

For overflow, I think we can check for it inside the compiled function and return -1 if there's been overflow. We can also have a pure Python implementation which can handle arbitrarily sized ints. There would be a Python wrapper which first calls the vectorized ufunc. We then check for -1s; if any exist, we convert the array to type object and use the pure Python implementation to fill the -1s with the correct values.

Thoughts?
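A rough Python sketch of that plan (all names here are hypothetical; `_stirling2_fast` stands in for the compiled ufunc, with Python's big ints playing the role of the exact fallback):

```python
import numpy as np

INT64_MAX = 2**63 - 1

def _stirling2_exact(n, k):
    """Exact value via the recurrence, using arbitrary-precision ints."""
    if n == 0 and k == 0:
        return 1
    if k <= 0 or k > n:
        return 0
    prev = [1] + [0] * k
    for m in range(1, n + 1):
        curr = [0] * (k + 1)
        for j in range(1, min(m, k) + 1):
            curr[j] = j * prev[j] + prev[j - 1]
        prev = curr
    return prev[k]

def _stirling2_fast(n, k):
    """Stand-in for the compiled ufunc: -1 signals 64-bit overflow."""
    v = _stirling2_exact(n, k)
    return v if v <= INT64_MAX else -1

def stirling2(n, k):
    n_arr, k_arr = np.broadcast_arrays(np.asarray(n), np.asarray(k))
    pairs = list(zip(n_arr.ravel(), k_arr.ravel()))
    flat = [_stirling2_fast(int(a), int(b)) for a, b in pairs]
    if -1 in flat:  # sentinel seen: redo those entries exactly, object dtype
        flat = [v if v != -1 else _stirling2_exact(int(a), int(b))
                for v, (a, b) in zip(flat, pairs)]
        return np.array(flat, dtype=object).reshape(n_arr.shape)
    return np.array(flat, dtype=np.int64).reshape(n_arr.shape)
```

The sentinel is unambiguous because true Stirling numbers are never negative; the wrapper only pays the object-dtype cost when at least one entry actually overflows.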

@steppi (Contributor) commented Mar 8, 2023

Oh, I see. Even when speeding up the binomial coefficient calculation, the IE formula is still unworkable, because you need to divide by k! at the end, and the numerator and denominator will both overflow even for reasonable input.

@steppi (Contributor) left a comment:

I'm open to other opinions on this, but I think we should make the return type long instead of int, so we can handle larger return values on platforms where long is 64 bit.

Also, we don't actually need to compute the full triangle, so we can save some effort and memory. There's a way to set it up so only the necessary values are computed; see the relevant comment. I've posted a pure Python implementation you can refer to.

I'm open to more suggestions on this too, but I think it would be good to return -1 in case of overflow or failure to allocate memory. Since the actual result should always be non-negative, this would signal to users that the value is incorrect. We'd just have to document this behavior well.

Comment on lines 38 to 43
for (int i=1; i<arraySize; i++){
// build up next row in curr
for (int j=1; j<arraySize; j++){
curr[j] = prev[j - 1];
curr[j] += prev[j] * j;
}
@steppi (Contributor) commented Mar 8, 2023

We're doing too much work here and allocating more than we need to. To compute ${n \brace k}$ we don't need the full triangle. We only need a parallelogram of dimensions $k \times (n - k + 1)$ like in the image below.

[image: parallelogram of entries needed to compute ${n \brace k}$ within the Stirling triangle]

If $k \leq n - k + 1$ we can allocate two arrays of size $k$ and compute the values in the parallelogram left to right and then top to bottom. If $k > n - k + 1$ we can allocate two arrays of size $n - k + 1$ and compute the values top to bottom and then left to right.

This also has the benefit of making it so that if an overflow occurs at any time during execution, then the final result will overflow. If we compute the full triangle this won't be the case. We need to be able to detect overflow, and this will simplify that.

@steppi (Contributor) commented Mar 9, 2023

def stirling2(n, k):
    # S(0, 0) = 1; S(0, k) = 0 for k > 0; other out-of-range (n, k) give 0
    if n == 0 and k == 0:
        return 1
    if k <= 0 or k > n or n < 0:
        return 0
    if k <= n - k + 1:
        current = [1]*k
        for i in range(1, n - k + 1):
            for j in range(1, k):
                current[j] = (j+1) * current[j] + current[j-1]
    else:
        current = [1]*(n - k + 1)
        for i in range(1, k):
            for j in range(1, n - k + 1):
                current[j] = (i+1) * current[j-1] + current[j]
    return current[-1]

@rlucas7 Here's a Python implementation that only computes the values in the triangle that are actually needed. It should be relatively straightforward to translate into C. Note that we will want malloc instead of calloc now because we no longer initialize with zero entries.

@steppi (Contributor):

I've just updated the implementation above to only use one array instead of two.

@rlucas7 (Member, Author) commented Mar 10, 2023

We're doing too much work here and allocating more than we need to. To compute ${n \brace k}$ we don't need the full triangle. ... If $k \leq n - k + 1$ we can allocate two arrays of size $k$ and compute the values in the parallelogram left to right and then top to bottom. If $k > n - k + 1$ we can allocate two arrays of size $n - k + 1$ and compute the values top to bottom and then left to right.

I understand this as taking the left-hand or top (angled) array of 1s in the Stirling triangle and proceeding with the calculations in these directions, moving along an angled row.

Clever.

This also has the benefit of making it so that if an overflow occurs at any time during execution, then the final result will overflow.

I agree.

If we compute the full triangle this won't be the case. We need to be able to detect overflow, and this will simplify that.

I don't follow the reasoning of these last two sentences. It seems to me that in either case we need to check for overflow. Unless I'm mistaken, the same (n, k) values overflow in either case (using array-based dynamic programs). Did I miss something?

@steppi (Contributor):

I don't follow the reasoning of these last 2 sentences. It seems to me that either case it's relatively straightforward to check. Unless I'm mistaken the same n,k values overflow in either case (using array based dynamic programs). Did I miss something?

I just mean that in the reduced schedule, where we only compute Stirling numbers over the parallelogram, we only need to check whether overflow occurs, since every value computed takes part in the final result. If we do the whole triangle up to ${n \brace k}$, some entries of the array will overflow without being used in the final result. It's not that difficult to check with the full triangle; it's just that we'd need to check both for overflow and that the value is used in the final result. I prefer the simpler condition where we just have to check for overflow.

@rlucas7 (Member, Author):

... it's just that we need to check for overflow and also that the value will be used in the final result. I like the simpler condition where we just have to check for overflow better.

Got it, thanks.

@rlucas7 (Member, Author):

FWIW, after switching to long outputs in C (on my Mac), the first overflow moves from n=17-18 up to n=25-26:

stirling2(25,0) = 0 
stirling2(25,1) = 1 
stirling2(25,2) = 16777215 
stirling2(25,3) = 141197991025 
stirling2(25,4) = 46771289738810 
stirling2(25,5) = 2436684974110751 
stirling2(25,6) = 37026417000002430 
stirling2(25,7) = 227832482998716310 
stirling2(25,8) = 690223721118368580 
stirling2(25,9) = 1167921451092973005 
stirling2(25,10) = 1203163392175387500 
stirling2(25,11) = 802355904438462660 
stirling2(25,12) = 362262620784874680 
stirling2(25,13) = 114485073343744260 
stirling2(25,14) = 25958110360896000 
stirling2(25,15) = 4299394655347200 
stirling2(25,16) = 526655161695960 
stirling2(25,17) = 48063331393110 
stirling2(25,18) = 3275678594925 
stirling2(25,19) = 166218969675 
stirling2(25,20) = 6220194750 
stirling2(25,21) = 168519505 
stirling2(25,22) = 3200450 
stirling2(25,23) = 40250 
stirling2(25,24) = 300 
stirling2(25,25) = 1 
stirling2(25,26) = 0 

stirling2(26,0) = 0 
stirling2(26,1) = 1 
stirling2(26,2) = 33554431 
stirling2(26,3) = 423610750290 
stirling2(26,4) = 187226356946265 
stirling2(26,5) = 12230196160292565 
stirling2(26,6) = 224595186974125331 
stirling2(26,7) = 1631853797991016600 
stirling2(26,8) = 5749622251945664950 
stirling2(26,9) = -7245227292754425991 
stirling2(26,10) = -5247188700862703611 
stirling2(26,11) = -8417665732711074856 
stirling2(26,12) = 5149507353856958820 
stirling2(26,13) = 1850568574253550060 
stirling2(26,14) = 477898618396288260 
stirling2(26,15) = 90449030191104000 
stirling2(26,16) = 12725877242482560 
stirling2(26,17) = 1343731795378830 
stirling2(26,18) = 107025546101760 
stirling2(26,19) = 6433839018750 
stirling2(26,20) = 290622864675 
stirling2(26,21) = 9759104355 
stirling2(26,22) = 238929405 
stirling2(26,23) = 4126200 
stirling2(26,24) = 47450 
stirling2(26,25) = 325 
stirling2(26,26) = 1 
stirling2(26,27) = 0 

@rlucas7 (Member, Author) commented Mar 10, 2023

the IE formula is still unworkable ... overflow

Yes, I noticed this too. The +/-1s in the IE formulation indicate that something is too large (at some point) and needs to be subtracted (or vice versa), so overflows are likely to occur sooner. With arbitrary precision (GMP or pure Python) we wouldn't have this concern, but that's not the case here.
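That cancellation is easy to demonstrate with exact Python integers (the reference value for S(25, 10) is taken from the table above):

```python
from math import comb, factorial

n, k = 25, 10
# Inclusion-exclusion terms: (-1)**j * C(k, j) * (k - j)**n
terms = [(-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1)]
value = sum(terms) // factorial(k)    # exact IE result

# The final answer fits comfortably in a signed 64-bit integer...
assert value == 1203163392175387500 and value < 2**63 - 1
# ...but the individual terms do not: the j=0 term alone is 10**25.
assert max(abs(t) for t in terms) > 2**63 - 1
```

So even where the dynamic program stays in range, the IE formula's intermediate terms overflow a 64-bit type.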

@rlucas7 (Member, Author) commented Mar 10, 2023

@steppi The latest commit has your memory optimization, using only one array. For overflow checking, from my local testing/spot-checking it seems like a check for curr[j] <= 0 in the inner for loop would work, so I added that too.

Also, to disambiguate the potential (albeit unlikely) malloc error, I'm thinking of returning -2 for overflow while keeping the -1 return for the malloc error.

@steppi (Contributor) left a comment:

Overflow of signed integers in C leads to undefined behavior. See the suggestion for how to check properly. I also suggested a long comment to explain the reduced version of the dynamic programming algorithm. Let me know what you think of it.

Also, can you add some tests that check overflow is being handled properly?
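The shape of such a pre-arithmetic guard can be sketched in Python, simulating 64-bit limits (the function name is mine; in C the check must come before the multiply-add, since signed overflow there is undefined behavior):

```python
INT64_MAX = 2**63 - 1

def checked_update(j, curr_j, curr_j_minus_1):
    """One recurrence step curr[j] = j*curr[j] + curr[j-1], guarded so the
    result is only computed when it fits in a signed 64-bit integer.
    Assumes j >= 1 and both curr values are non-negative."""
    # j*curr_j + curr_j_minus_1 <= INT64_MAX   is equivalent to
    # curr_j <= (INT64_MAX - curr_j_minus_1) // j   for non-negative values
    if curr_j > (INT64_MAX - curr_j_minus_1) // j:
        raise OverflowError("stirling2 result exceeds the 64-bit range")
    return j * curr_j + curr_j_minus_1
```

The C version would return the overflow sentinel instead of raising, but the guard condition is the same.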

@steppi (Contributor) commented Mar 11, 2023

@rlucas7, I thought it would be good to also support arbitrarily large ints using a pure Python implementation. I've submitted a nested PR rlucas7#1 to make stirling2 work similarly to scipy.special.comb. It is unfinished: it still needs an implementation of an asymptotic approximation for the exact=False case, needs updated documentation, and probably needs some polishing. Please take a look and let me know what you think.

@steppi (Contributor) commented Mar 13, 2023

I implemented the first- and second-order asymptotic approximations from Temme in Python just to see how well they work. I've been having fun with this. With the second-order approximation, the relative error doesn't seem terrible: on the order of 1e-6 to 1e-7 for the few test cases below. I think we'll need to go to 3rd or 4th order to get reasonably close to machine precision.

To implement this, we can use binom from special/orthogonal_eval.pxd and lambertw_scalar from special/_lambertw.pxd. I don't know how to call these from C, but we could call them from Cython. @rlucas7, I originally asked whether you'd be willing to translate special/cephes/stirling.c into Cython at special/_stirling2.pxd, but never mind: we can just keep your stirling2 in stirling.c, and I can move stirling2_approx into Cython in a future PR for the asymptotics.

import numpy as np
import scipy.special as sc


def stirling2_asymptotic(n, k):
    mu = k/n
    k, n = float(k), float(n)

    delta = 1/mu * np.exp(-1/mu)
    x0 = 1/mu + sc.lambertw(-delta)
    t0 = (n - k) / k
    F = np.sqrt(t0/((1 + t0)*(x0 - t0)))
    A = -n*np.log(x0) + k*np.log(np.exp(x0) - 1) - k*t0 + (n - k)*np.log(t0)
    try:
        result = np.exp(A)*k**(n - k)*F*sc.binom(n, k)
    except OverflowError:
        result = np.inf
    return result


def stirling2_asymptotic_order2(n, k):
    mu = k/n
    k, n = float(k), float(n)

    delta = 1/mu * np.exp(-1/mu)
    x0 = 1/mu + sc.lambertw(-delta)
    t0 = (n - k) / k
    F = np.sqrt(t0/((1 + t0)*(x0 - t0)))
    A = -n*np.log(x0) + k*np.log(np.exp(x0) - 1) - k*t0 + (n - k)*np.log(t0)
    F1 = (
        -2*x0**3 + 2*t0**5 + 4*t0**3 + 4*t0**4 + 3*x0**2 * t0 - 6*x0*t0**4
        - 5*x0**2 * t0**2 + 2*x0**4 * t0 + x0**3*t0 - 6*x0**3 * t0**2
        + 8*x0**2 * t0**3
    ) / (24*F*(1 + t0)**2*(x0 - t0)**4)
    try:
        result = np.exp(A)*k**(n - k)*sc.binom(n, k)*(F - F1/k)
    except OverflowError:
        result = np.inf
    return result


for n, k in ((100, 20), (112, 45), (200, 33)):
    exact = sc.stirling2(n, k)
    asymptotic = stirling2_asymptotic(n, k)
    asymptotic2 = stirling2_asymptotic_order2(n, k)
    rel_error = abs(float(exact) - asymptotic) / abs(float(exact))
    order2_rel_error = abs(float(exact) - asymptotic2) / abs(float(exact))
    print("-------------------")
    print(f"stirling2({n}, {k})")
    print("--------------------")
    print("rounded exact val:", float(exact))
    print("asymptotic:", asymptotic)
    print("relative error:", rel_error)
    print("asymptotic order2:", asymptotic2)
    print("relative error order2:", order2_rel_error)

The output is

-------------------
stirling2(100, 20)
--------------------
rounded exact val: 4.619255470442159e+111
asymptotic: (4.621063648187713e+111+0j)
relative error: 0.0003914435469360925
asymptotic order2: (4.6192562713383553e+111+0j)
relative error order2: 1.7338209619962726e-07
-------------------
stirling2(112, 45)
--------------------
rounded exact val: 1.619249775669471e+127
asymptotic: (1.6200869069683281e+127+0j)
relative error: 0.0005169871328289221
asymptotic order2: (1.6192552623088327e+127+0j)
relative error order2: 3.3883835861936724e-06
-------------------
stirling2(200, 33)
--------------------
rounded exact val: 5.41301537436116e+266
asymptotic: (5.413628464612622e+266+0j)
relative error: 0.00011326224092513798
asymptotic order2: (5.413011555128007e+266+0j)
relative error order2: 7.055648079593478e-07

@rlucas7 (Member, Author) left a comment:

IIRC the docstring does need to be raw style.

@steppi (Contributor) commented Jun 23, 2023

Docs failures have been resolved. The arpack test, test_hermitian_mode, is still failing for Linux Meson tests / Meson build (3.11, false) (pull_request); I plan to look into that. The other failure in Linux Meson tests / Prerelease deps and 64-bit BLAS (3.9) (pull_request) is gh-18721.

@alugowski (Contributor) commented:

The arpack test, test_hermitian_mode, is still failing for Linux Meson tests / Meson build (3.11, false) (pull_request); I plan to look into that.

Hopefully #18736 can help to get started :)

@rlucas7 (Member, Author) commented Jun 26, 2023

@steppi I worked through Temme's paper this weekend and reproduced the results in that paper. In particular, his equation 4.4 on page 10 matches what you have here in the details code dropdown. I had missed the code in the dropdown tab of your comment, so I independently reproduced both the code and the max error values Temme notes for n=10, 20, etc. for the second-order approximation. I think that's what we should use here, unless you've got a more accurate one. The choice of N above which we use the approximation instead of the exact value still needs to be made. With the second order, the max relative error for N=50 is already 0.00002 (at m=7), and for N=100 it is 4.7973885812281365e-06 (at m=13), so something at or above N=100 is probably fine. For larger values of N:

(max rel error, K that achieves max error, N)
(1.1865872292929588e-06, 26, 200)
(9.36707897293228e-07, 29, 225)
(1.62298310561457e-07, 20, 250)
(2.1612156823753847e-08, 14, 275)
(5.521646313770846e-11, 11, 300)

so somewhere between 275 and 300 seems ok as a threshold.

Incidentally, if you want me to look at a PR for the Temme approximation, just drop me an email once it's up. I'm comfortable reviewing a PR on that now.

Here is the code I used to generate the values noted here:

import numpy as np
from scipy import optimize
from math import sqrt, comb
## calculated via exact method using dp soln above...
# setting n=10 we can verify that max relative error occurs @ m=4 w/value =0.000466 (rounds to value given in Temme paper)
# also n=20 verifies max relative error in paper too @ m=3 w/value = (0.000116, rounds to 0.00012 in paper)
n = 275
actual = stirling2(n, range(n+1))
m2 = m1 = (-np.inf, -1)
for m in range(1, n):
    # calculate t0 as described on page 235 of temme
    # this is noted in the writing just after equation 2.6 on page 235
    t0 = (n-m)/m
    def func(x):
        return (m/n)*x - 1 + np.exp(-x)
    # calculate x0 by solving root finding problem
    # equation 2.5 in Temme
    res = optimize.root(func, n/m, method='broyden1', tol=1e-14)
    x0 = res.x.take(0)
    # now assemble phi and the exponent A
    def phi(x):
        return -n * np.log(x) + m*np.log(np.exp(x) - 1)
    A = phi(x0) - m*t0 + (n-m)*np.log(t0)
    # typically f() will be called w/argument t0
    def f(x):
        return sqrt(x / ((1+x) * (x0-x)) )
    # needed for second order approx 
    def f1(x):
        num = -2*x0**3 + 2*x**5 + 4*x**4 + 4*x**3 + 3*x*x0**2 - 6*x0*x**4 - 5*x0**2*x**2 + 2*x0**4*x + x0**3*x - 6*x0**3*x**2 + 8*x0**2*x**3
        denom = 24*f(x)*(1+x)**2*(x0-x)**4
        return num/denom
    # first order approx gives
    first = np.exp(A)*m**(n-m)*f(t0)*comb(n,m)
    # second order approx gives 
    second = np.exp(A)*m**(n-m)*(f(t0)-f1(t0)/m)*comb(n,m)
    rel_err1 = (actual[m] - first) / first
    rel_err2 = (actual[m] - second) / second
    if abs(rel_err1) > m1[0]:
        m1 = (abs(rel_err1), m)
    if abs(rel_err2) > m2[0]:
        m2 = (abs(rel_err2), m)
    print(f"n={n}, m={m}, first = {first}, second={second}, actual={actual[m]}, rel_err1={rel_err1}, rel_err2={rel_err2}")

print(m1, m2)

@steppi (Contributor) commented Jun 26, 2023

@rlucas7 Cool. Yeah, I agree we should start with the second order approximation. Later on if we get bandwidth we can try to independently work out higher order approximations.

I also have the following approximation based on the inclusion/exclusion formula, using the Lanczos approximation to tame the factorial and cancelling as much as possible. This can be used for smaller values. It doesn't work as k approaches n; I think we can just do a floating-point version of the recurrence in those cases, when n is still too small for the asymptotic expansion to be usable.

# Note: np is numpy; lanczos_g and _lanczos_sum_expg_scaled refer to
# SciPy's internal (cephes) Lanczos approximation constants/helpers.
def stirling2_double(n, k):
    result = 0
    k_choose_j = 1
    for j in range(k):
        term = (
            (-1)**j * k_choose_j * (1 - j/k)**(k + 0.5)
            * (k - j)**(n - k - 0.5)
        )
        result += term
        k_choose_j = k_choose_j * (k - j) / (j + 1)
    return (
        result
        * (np.e / (1 + (lanczos_g + 0.5)/k))**(k + 0.5)
        / _lanczos_sum_expg_scaled(k + 1)
    )

Let's merge the current PR this week, and I'll submit a PR for the approximation next weekend.

Update: Improved approximation above by pulling a constant factor out of the sum.

@rlucas7 (Member, Author) commented Jul 17, 2023

It looks like the docs still aren't rendering the brace notation (and the r""" is now visible in the docs). Let me know if anyone has ideas on how to fix that.
[screenshot: rendered stirling2 docstring showing the literal r""" prefix]
If you want to check yourself, the link is:
https://output.circle-artifacts.com/output/job/17f05edd-f24b-4c0d-ad13-89a0030bf835/artifacts/0/html/reference/generated/scipy.special.stirling2.html
It looks like there are a few outstanding items on this one still, besides the docstring rendering properly.

There are also approximation method(s) to be added.

@steppi if you want I can put together what I had for Temme's second order and add it here to the existing PR?

I spent some time studying the Lanczos approaches in Pugh's thesis. I'm still working through the Chebyshev approximation part, but I get the gist of that particular approach. I didn't have access to the reference, so I needed to work out the derivation for Lemma 3.2 myself. In any case, I feel comfortable reviewing a PR for that work if you want to put up something for lower values of (n, k), e.g. where n < 300, using the inclusion-exclusion and Lanczos approximation you indicated. I haven't tested either of the numerical approaches from the thesis, so I'm not sure of their strengths/weaknesses on actual values.

I'm also not sure what the consensus is on the 0-dimension numpy array, do we need to change the behavior in this PR? I think the answer is yes but want to confirm before adding another change.

@steppi
Contributor

steppi commented Jul 17, 2023

I'm also not sure what the consensus is on the 0-dimension numpy array, do we need to change the behavior in this PR? I think the answer is yes but want to confirm before adding another change.

Let’s follow @h-vetinari’s precedent for the factorial functions. He made a good case for it in gh-18768.

@steppi if you want I can put together what I had for Temme's second order and add it here to the existing PR?

Let’s save it for a followup PR. Once the doc issues are resolved and the behavior lines up with the factorial functions regarding 0d arrays I think it’ll be ready to merge.

@steppi
Contributor

steppi commented Jul 17, 2023

Besides the 0d array thing, we made stirling2 accept object array of int input and the factorial functions don’t. We should probably make stirling2 match the factorial functions with respect to this too. @h-vetinari do you have any thoughts?

@rlucas7
Member Author

rlucas7 commented Jul 17, 2023

Besides the 0d array thing, we made stirling2 accept object array of int input and the factorial functions don’t. We should probably make stirling2 match the factorial functions with respect to this too. @h-vetinari do you have any thoughts?

The reason we used object as the dtype was to get arbitrary-precision integers, because the Stirling numbers of the second kind overflow quickly for both 32- and 64-bit integers. If we switched back, we'd be dealing with the precision issue again, and it would severely limit the usefulness of this function. E.g., IIRC all n > 27 would overflow in 64-bit.
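To make the overflow point concrete, here is a quick check using the standard recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1) with Python's arbitrary-precision integers (an illustrative sketch, not the PR's C implementation):

```python
def stirling2_exact(n, k):
    # Row-by-row DP over the recurrence; Python ints never overflow.
    row = [1] + [0] * k              # row for n = 0: S(0, 0) = 1
    for i in range(1, n + 1):
        new = [0] * (k + 1)          # S(i, 0) = 0 for i >= 1
        for j in range(1, min(i, k) + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]

print(stirling2_exact(10, 5))                     # 42525
big = max(stirling2_exact(30, j) for j in range(31))
print(big > 2**63 - 1)                            # True: far beyond int64
```

By n = 30 the largest row entry is far past the int64 maximum of 2**63 - 1, consistent with the mid-20s cutoff mentioned above.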

@steppi
Contributor

steppi commented Jul 17, 2023

Besides the 0d array thing, we made stirling2 accept object array of int input and the factorial functions don’t. We should probably make stirling2 match the factorial functions with respect to this too. @h-vetinari do you have any thoughts?

The reason we used object as the dtype was for arbitrary precision integers b/c the stirling numbers of the second kind overflow quickly for both 32 and 64 bit integers. If we switched back we'd be dealing with the precision issue again and it would severely limit the usefulness of this function. E.g. all n>27 IIRC.

Sorry. That’s not what I mean. The factorial functions also work with Python ints internally and return an object array, but the input array must be some kind of numpy int dtype. I’d made a suggestion earlier that would let stirling2 accept object arrays as input which might have been bad from a consistency standpoint.

@rlucas7
Member Author

rlucas7 commented Jul 17, 2023

Besides the 0d array thing, we made stirling2 accept object array of int input and the factorial functions don’t. We should probably make stirling2 match the factorial functions with respect to this too. @h-vetinari do you have any thoughts?

The reason we used object as the dtype was for arbitrary precision integers b/c the stirling numbers of the second kind overflow quickly for both 32 and 64 bit integers. If we switched back we'd be dealing with the precision issue again and it would severely limit the usefulness of this function. E.g. all n>27 IIRC.

Sorry. That’s not what I mean. The factorial functions also work with Python ints internally and return an object array, but the input array must be some kind of numpy int dtype. I’d made a suggestion earlier that would let stirling2 accept object arrays as input which might have been bad from a consistency standpoint.

Ah, I see what you mean. IIRC the current implementation raises an error when the input is an object array; there is a test in this PR that confirms this behavior. Is there something that needs to be done for object input beyond what's already done?

should fix the 0-dim case and not break the handling for the existing scalar case.

Co-authored-by: Daniel Schmitz <40656107+dschmitz89@users.noreply.github.com>
Co-authored-by: Albert Steppi <albert.steppi@gmail.com>
change test w/decision to *not* accept input array w/dtype object
@h-vetinari
Member

If it helps (for example with the whole 0d vs. scalar thing), you can probably take a look at the test infrastructure for factorial2 (factorial itself has a bunch of legacy stuff in there and is a bit more distracting), particularly the test for corner cases.

PS. The factorial functions (with exact=True) also switch to object dtype when the respective integer types are too small.
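A quick illustration of that behavior (a sketch assuming a recent SciPy; the exact dtype-promotion rules are whatever the factorial functions do at the time of writing):

```python
import math
import numpy as np
from scipy.special import factorial

out = factorial(np.array([5, 30]), exact=True)
# exact=True computes with Python's arbitrary-precision ints, so 30!
# (~2.65e32, far beyond int64) comes out exactly rather than overflowing.
print(int(out[1]) == math.factorial(30))   # True
```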


@steppi steppi left a comment


Looks good, I think this is ready. Thanks @rlucas7. If no one objects I'll merge tomorrow evening UTC-4.

@rlucas7
Member Author

rlucas7 commented Jul 21, 2023

Looks good, I think this is ready. Thanks @rlucas7. If no one objects I'll merge tomorrow evening UTC-4.

@steppi SGTM would you like me to put together a separate PR for the Temme approximation?

@steppi steppi merged commit eff66d0 into scipy:main Jul 21, 2023
24 of 25 checks passed
@steppi
Contributor

steppi commented Jul 21, 2023

@steppi SGTM would you like me to put together a separate PR for the Temme approximation?

Yes, that would be awesome.

@steppi steppi added this to the 1.12.0 milestone Dec 4, 2023
@steppi steppi mentioned this pull request Dec 4, 2023
9 tasks
Labels
enhancement A new feature or improvement scipy.special
Successfully merging this pull request may close these issues.

ENH: Stirling Numbers of the second Kind
10 participants