ENH: add exact=False support for stirling2 #19287

rlucas7 · 2023-09-24T01:16:26Z

adding preliminary support for exact=False support to stirling2 for now only adds inclusion-exclusion via Lanczos which works well up to around N=33 and then has overflow issues.

Reference issue

Adds preliminary exact=False support for Stirling numbers of the second kind
#17890

What does this implement/fix?

adding preliminary support for exact=False support to stirling2

Additional information

@steppi During review from prior PR we decided to include exact=False support.

This PR adds initial support for inclusion-exclusion via Lanczos approximation.

Still needs to add:

support for Temme for n > 33 and other places where it is more accurate than Lanczos

steppi · 2023-09-24T01:47:06Z

Thanks @rlucas7! Some quick comments/suggestions

I think the exact=False case should be handled in a ufunc with the kernel written in C, C++, or Cython.
Depending on the parameters one should use either:
- The inclusion/exclusion formula with Lanczos approximation like in this PR
- The dynamic programming algorithm in C calculated with doubles instead of ints
- Temme's asymptotic expansion. 2nd order for now since that's what we have, but ideally we'd work out higher order expansions.
The key bit of work here is finding out for which inputs one should use each of the above three methods, trading off on accuracy, time, and memory. Numerical experiments seems to be the way to go here.

rlucas7 · 2023-09-24T02:21:09Z

Thanks @rlucas7! Some quick comments/suggestions

* I think the `exact=False` case should be handled in a ufunc with the kernel written in C, C++, or Cython.

* Depending on the parameters one should use either:
      
  ....
  * The dynamic programming algorithm in C calculated with doubles instead of ints

I don't think the DP approach really fits the workflow of a ufunc though, IIUC every value is computed in a parallel way. Is there some point I'm missing?

steppi · 2023-09-24T03:01:11Z

Thanks @rlucas7! Some quick comments/suggestions
* I think the `exact=False` case should be handled in a ufunc with the kernel written in C, C++, or Cython.

* Depending on the parameters one should use either:
      
  ....
  * The dynamic programming algorithm in C calculated with doubles instead of ints
I don't think the DP approach really fits the workflow of a ufunc though, IIUC every value is computed in a parallel way. Is there some point I'm missing?

NumPy ufuncs don't employ parallelization. They just loop through and calculate values in a compiled language without any Python overhead. At one point while working on stirling2 you had a ufunc which used the dynamic programming algorithm, but the computation was with 64 bit ints not doubles. For not too large values of n, the dp algorithm is so fast that any redundant computation will be negligible.

rlucas7 · 2023-10-03T03:05:45Z

... At one point while working on stirling2 you had a ufunc which used the dynamic programming algorithm, but the computation was with 64 bit ints not doubles. For not too large values of n, the dp algorithm is so fast that any redundant computation will be negligible.

@steppi Looking into the DP in ufunc and I had a question

In the stirling2 PR there was also some concern about doing a memory allocation (specifically malloc) inside each ufunc call-we need one to store the solution to sub-problems.

Is there a different way to do the memory allocation or are we ok with doing it that way?

rlucas7 · 2023-10-04T03:09:46Z

This passes builds for me locally but seems like different math.h constants in the test-runners on gh don't have M_E as the symbol for the constant?

looks like I can use the e constant here which should fix all but one of the build issues.

Other one is a linter issue from the rebase I just did to get the action to run.

rlucas7 · 2023-10-05T13:36:25Z

Ok, looks like the e part is fixed. There is still some doc build issue, I always get confused on which thing needs to go where here in special.

@steppi | @person142 It looks like I need to add function names to some constituent part of __all__ but I don't think we want to expose the inexact dp, nor the lanczos c ufuncs. I think I need to modify the function names to prefix with an _stirling2_{inexact,lanczos} but not sure where. Do you know where I need to modify the function?

steppi

Just a few brief comments, I haven't looked at this closely yet.

Also just a reminder, before this goes in I'd like to see

At least the second order asymptotic expansion implemented.
Some experiments showing how the regimes where the different implementations are used was decided.

steppi · 2023-10-05T13:45:51Z

scipy/_lib/unuran

Looks like this change got pulled in somehow.

on the rebase-will remove

steppi · 2023-10-05T13:50:00Z

scipy/special/_add_newdocs.py

+add_newdoc("stirling2_inexact",
+    r"""
+    private method do not use
+    """)
+
+
+add_newdoc("stirling2_lanczos",
+    r"""
+    private method do not use
+    """)


Private functions in special should have a leading underscore in their names. Also, the language used for other private functions is

"Internal function, do not use."

scipy/special/cephes/stirling2.c

steppi · 2023-10-05T14:20:40Z

I think I need to modify the function names to prefix with an _stirling2_{inexact,lanczos} but not sure where. Do you know where I need to modify the function?

They need to be modified in functions.json (and correspondingly, in _add_newdocs.py.)

rlucas7 · 2023-10-07T03:46:29Z

Just a few brief comments, I haven't looked at this closely yet.

Also just a reminder, before this goes in I'd like to see

At least the second order asymptotic expansion implemented.

Not clear to me how to get access to lambertw_scalar in the stirling2.c code

Some experiments showing how the regimes where the different implementations are used was decided.

Yes. I will do these once I have all 3 methods implemented at C level. I plan to use inexact and a dispatch table to call out to temme/lanczos/dp by (n,k). my plan is to first implement all 3 methods as separate functions at c level and expose to python level. Doing it this way makes it easier (for me) to run the experiments for switching between the 3 approaches based on different n, k values.

Let me know if you've got a better way.

steppi · 2023-10-07T12:07:23Z

Not clear to me how to get access to lambertw_scalar in the stirling2.c code

I’ll post a small example when I have time later today or tomorrow.

Doing it this way makes it easier (for me) to run the experiments for switching between the 3 approaches based on different n, k values.

Ok, yeah that makes sense, just mark your PR as draft until you think it’s ready to go in.

dschmitz89

Hi @rlucas7 , I appreciate the review request but this is way outside my expertise. I had not even heard of Stirling numbers before.

rlucas7 · 2023-10-09T02:40:01Z

C/H files are up in top level of special and found there after build.

Not clear to me how to get access to lambertw_scalar in the stirling2.c code

I’ll post a small example when I have time later today or tomorrow.

Once you are able to post the code for this I can finish the Temme approximation code in stirling.c.

Then I'll do the experiments to determine when to switch between Lanczos/DP(doubles)/Temme and add the dispatch to stirling_inexact().

steppi · 2023-10-09T21:37:15Z

Once you are able to post the code for this I can finish the Temme approximation code in stirling.c.

Sorry, it's trickier than I thought because of the complexity of SciPy's build system (or maybe just my lack of familiarity). I'm not sure what changes would need to be made here to make lambertw_scalar (and also binom from _orthogonal_eval.pxd) available in stirling.c. If we do figure that out, it would be as simple as adding

#include "_lambertw.h"
#include "_orthogonal_eval.h"

I've only done this in small projects with a setup.py build, in that case if you have an extension where the sources contain a .c and a .pxd, you can directly include from the .h generated from the .pxd.

I think the simplest thing to do is just to write stirling2 in Cython instead of C. Then you can just do

from ._lambertw cimport lambertw_scalar
from ._orthogonal_eval cimport binom

scipy/special/stirling2.c

steppi · 2023-10-16T17:23:19Z

@rlucas7, long term, I think the best course of action will be to rewrite lambertw, binom, (and all other special functions in SciPy that are implemented in Cython) into C++. This will make it easier to call them from other functions, and also opens up the possibility of using them on the GPU in CuPy, and other array computing libraries.

@izaid has kindly offered to help review PRs to rewrite such functions in C++. I plan to submit PRs for these two first. At the moment I think your options are to either follow through implementing the expansion in Cython, and we can convert later, or to wait until the lambertw and binom have been rewritten.

rlucas7 · 2023-10-18T01:52:01Z

@steppi thanks for letting me know. I’m not opposed to switching to c/c++ for Cython special functions but we should probably have a dev list discussion about that-unless we already have. Just to make sure everyone is on board, e.g. I think the docs still read to prefer cython for new stuff, which hasn’t really been followed strictly since I’ve been contributing.Do you have a time estimate on the lambertw and binom functions? (Just approximate timeline is all I’m asking) o/w it’s difficult for me to make the decision of cython or c. Alternately, we could keep the Temme in Python for now and move it down to c later once the cython -> c++ migrations are merged, Please let me know if you have opinions on proposed third way.Also, happy to be included in any reviews touching those functions. -Lucas Roberts On Oct 16, 2023, at 1:23 PM, Albert Steppi ***@***.***> wrote: @rlucas7, long term, I think the best course of action will be to rewrite lambertw, binom, (and all other special functions in SciPy that are implemented in Cython) into C++. This will make it easier to call them from other functions, and also opens up the possibility of using them on the GPU in CuPy, and other array computing libraries. @izaid has kindly offered to help review PRs to rewrite such functions in C++. I plan to submit PRs for these two first. At the moment I think your options are to either follow through implementing the expansion in Cython, and we can convert later, or to wait until the lambertw and binom have been rewritten. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

steppi · 2023-10-18T03:01:39Z

@rlucas7. I plan to have PRs submitted for binom and lambertw within the next two weeks and plan to make an RFC issue and dev list posting for the Cython->C++ proposal tomorrow. I'll ping you on the two PRs and the issue when I make them.

steppi · 2023-10-19T02:11:14Z

@rlucas7 while you’re blocked due to the current state of lambertw and binom I think the best thing to do would be to just use a Python implementation of the asymptotic expansion in your experiments to help determine how the choice of method should depend on n and k.

rlucas7 · 2023-10-23T01:09:56Z

Also just a reminder, before this goes in I'd like to see

At least the second order asymptotic expansion implemented.

Some experiments showing how the regimes where the different implementations are used was decided.

Sure, here you go. My decision based on looking at these is to NOT USE either Lanczos or Temme (this was a bit of a surprise for me). The DP approach is pretty accurate up through the space of values that are covered by valid doubles. Neither Lanczos (applied to inclusion-exclusion formula) nor the Temme 2nd order approximation come close in terms of accuracy.

stirling2_approx_comparisons.txt

Let me know if you disagree.

Here's the source I used for approximations:

# bring into dev env via `python dev.py python`
import numpy as np
import scipy.special as sc

def stirling2_asymptotic_order2(n, k):
    n, k = np.asarray(n,dtype=float), np.asarray(k,dtype=float)
    mu = k/n
    delta = 1/mu * np.exp(-1/mu)
    x0 = 1/mu + sc.lambertw(-delta)
    t0 = (n - k) / k
    F = np.sqrt(t0/((1 + t0)*(x0 - t0)))
    A = -n*np.log(x0) + k*np.log(np.exp(x0) - 1) - k*t0 + (n - k)*np.log(t0)
    F1 = (
        -2*x0**3 + 2*t0**5 + 4*t0**3 + 4*t0**4 + 3*x0**2 * t0 - 6*x0*t0**4
        - 5*x0**2 * t0**2 + 2*x0**4 * t0 + x0**3*t0 - 6*x0**3 * t0**2
        + 8*x0**2 * t0**3
    ) / (24*F*(1 + t0)**2*(x0 - t0)**4)
    try:
        result = np.exp(A)*k**(n - k)*sc.binom(n, k)*(F - F1/k)
    except OverflowError:
        result = np.inf
    return result.real


rows = []
for n in range(3, 221): # I get an overflow error at n=220
    args = ([n], list(range(1,n))) # the last value is n-1, including `n` value gives a nan
    exact = sc.stirling2(*args, exact=True)
    asymptotic2 = stirling2_asymptotic_order2(n=args[0], k=args[1])
    rel_error =  np.abs(exact - asymptotic2) / np.abs(exact)
    print(f"n: {n} (Temme-2) max rel_error: {max(rel_error)}")
    dp2 = sc.stirling2_dp(args[0], args[1])
    rel_error =  np.abs(exact - dp2) / np.abs(exact)
    print(f"n: {n} (dp) max rel_error: {max(rel_error)}")
    lanczos = sc.stirling2_lanczos(args[0], args[1])
    rel_error =  np.abs(exact - lanczos) / np.abs(exact)
    print(f"n: {n} (Lanczos) max rel_error: {max(rel_error)}")
    # we also have errors on the dp approx and on the Lanczos approx

that has some changes I'm going to push now (nothing substantive changed from previous commit though).

rlucas7 · 2023-10-23T01:14:20Z

@steppi unless you disagree w/findings in the previous post, I can delete all the other C stuff and leave only the DP approach. This may also impact the prioritization of the C++ migration of lambertw and binom from cython discussed in
#19404

steppi · 2023-10-23T01:25:59Z

@rlucas7, we need to take into account time and memory efficiency for exact=False. The asymptotic expansion will be much faster for large parameters and not require large memory allocations. Inclusion exclusion should only work well for small k, I’m not sure if we’d want to use it anywhere. The 2nd order expansion from Temme is pretty inaccurate in the regimes we’re looking at. Eventually we should try higher order.

rlucas7 · 2023-11-25T21:34:08Z

How’s this coming along @rlucas7? I’m willing to merge as soon as I verify you have it working regardless of C++ style considerations.

I've been working on timings as a follow up to this comment. When we compare only the approximate versions of DP and temme, the runtime of the approximate versions do start to scale linearly with n once we get above n=45 or so.

so for the dispatch I'll use n>50 to cut over to Temme.

from scipy.special import stirling2_dp, stirling2_temme, stirling2
import matplotlib.pyplot as plt
import timeit

y_temme = []
y_dp = []
x = list(range(10, 105, 5))
for n in x:
    s_temme = f"""n={n}; stirling2_temme([{n}], list(range(1,{n})))"""
    y_temme.append(timeit.timeit(stmt=s_temme, number=10000, setup="from scipy.special import stirling2_temme"))
    s_dp = f"""n={n}; stirling2_dp([{n}], list(range(1,{n})))"""
    y_dp.append(timeit.timeit(stmt=s_dp, number=10000, setup="from scipy.special import stirling2_dp"))


fig, ax = plt.subplots()
line1, = ax.plot(x, y_temme, linewidth=2.0, label="Temme")
line2, = ax.plot(x, y_dp, linewidth=2.0, label="DP")
ax.legend(handles=[line1, line2])
plt.title(label="Runtimes")
plt.show()

^ code used to generate the figure

rlucas7 · 2023-11-25T21:55:18Z

Looks like the _lambertw.h code isn't getting picked up in the build process on github, this works for me locally, @steppi is there something that looks incorrect here to you?

scipy/special/stirling2.h

* adding preliminary support for exact=False support to stirling2 for now only adds partial Temme and no Lanczos b/c dp works well for small N and Lancozs has overflow issuesfor N>32. add sitrling2_inexact dp ufunc add stirling2_lanczos c version fix dtype interactive rebase error move c code out of cephes to top level of special add more cleanup code attempt cxx move inline stirling2 for g++ and other related changes cleanup and apply horner

rlucas7 · 2023-11-26T05:18:17Z

I'm seeing

FAILED special/tests/test_basic.py::TestStirling2::test_numpy_array_unsigned_int_dtype -
 TypeError: ufunc '_stirling2_inexact' not supported for the input types, and the 
inputs could not be safely coerced to any supported types according to the 
casting rule ''safe''

both locally and in the test harness, I'm not sure how we handle this with the c++ setup here in special, @steppi have you seen this before? (any ideas?)

steppi · 2023-11-26T12:23:11Z

I'm seeing
FAILED special/tests/test_basic.py::TestStirling2::test_numpy_array_unsigned_int_dtype -
 TypeError: ufunc '_stirling2_inexact' not supported for the input types, and the 
inputs could not be safely coerced to any supported types according to the 
casting rule ''safe''
both locally and in the test harness, I'm not sure how we handle this with the c++ setup here in special, @steppi have you seen this before? (any ideas?)

unsigned int dtype isn't supported in the ufunc infrastructure. These are the allowed typecodes in functions.json

The input parameter types are denoted by single character type
codes, according to

   'f': 'float'
   'd': 'double'
   'g': 'long double'
   'F': 'float complex'
   'D': 'double complex'
   'G': 'long double complex'
   'i': 'int'
   'l': 'long'
   'v': 'void'

I think just get rid of that test for exact=False and clearly document which input types work for exact=True and exact=False.

steppi · 2023-11-26T12:49:57Z

~~Or I think it might just work if you replace "ii->d" in functions.json with "ll->d" since a 32 bit unsigned int can be safely cast to a 64 bit int.~~

Nevermind that, the test has 64 bit unsigned longs. I think the "ii" should be replaced by "ll" anyway though because I know stirling2(n, n-1) at least will fit in a double for n pretty large.

rlucas7 · 2023-11-26T18:00:52Z

~~Or I think it might just work if you replace "ii->d" in functions.json with "ll->d" since a 32 bit unsigned int can be safely cast to a 64 bit int.~~

Nevermind that, the test has 64 bit unsigned longs. I think the "ii" should be replaced by "ll" anyway though because I know stirling2(n, n-1) at least will fit in a double for n pretty large.

The build fails for me locally if I swap "ll" for "ii", so I don't think that type is currently supported here. I seem to recall this issue was surfaced a few years back when I was doing other special changes but I don't recall the details.

rlucas7 · 2023-11-26T18:07:39Z

... unsigned int dtype isn't supported in the ufunc infrastructure.
...
I think just get rid of that test for exact=False and clearly document which input types work for exact=True and exact=False.

Should we add an input validation and raise a TypeError for this?
(inside the python portion of stirling2())

steppi · 2023-11-26T18:39:09Z

Should we add an input validation and raise a TypeError for this?
(inside the python portion of stirling2())

Yeah. And if we do that we can make the ufunc take doubles and do input validation and convert to double before passing. I think the overhead should be negligible.

rlucas7 · 2023-11-27T03:18:08Z

Should we add an input validation and raise a TypeError for this?
(inside the python portion of stirling2())

Yeah. And if we do that we can make the ufunc take doubles and do input validation and convert to double before passing. I think the overhead should be negligible.

Ok I think I understand, with this way we can handle the uint cases too. This means that there is no need to remove the uint dtypes test case for the exact=False case. Instead we typecast the uint arguments (n,k) to doubles and pass them into the dispatcher for routing. This seems to work fine so I left it all in with a note to describe what's going on in this scenario in case it surfaces in future maintenance work.

If no major issues from test harness-I'll change from draft status tomorrow.

rlucas7 · 2023-11-27T04:37:54Z

...
If no major issues from test harness-I'll change from draft status tomorrow.

I'm a few minutes before tomorrow (20 or so) but everything is green in the test harness. @steppi I think this is ready for a closer scrutiny.

steppi

Nice job getting this working. Still needs some cleaning up, but nothing too major.

scipy/special/_add_newdocs.py

scipy/special/_basic.py

scipy/special/_cephes.pxd

scipy/special/stirling2.h

scipy/special/tests/test_basic.py

steppi · 2023-11-30T21:06:44Z

Nice work @rlucas7! Thanks for all your patience here.

rlucas7 requested review from person142 and steppi as code owners September 24, 2023 01:16

rlucas7 added enhancement A new feature or improvement scipy.special labels Sep 24, 2023

rlucas7 force-pushed the s2a branch from ec3076e to ac730c4 Compare October 4, 2023 02:59

rlucas7 requested a review from dschmitz89 October 5, 2023 13:38

steppi requested changes Oct 5, 2023

View reviewed changes

steppi changed the title ~~ENH: add Lanczos support for stirling2~~ ENH: add exact=False support for stirling2 Oct 5, 2023

dschmitz89 reviewed Oct 7, 2023

View reviewed changes

rlucas7 marked this pull request as draft October 9, 2023 01:52

steppi reviewed Oct 9, 2023

View reviewed changes

scipy/special/stirling2.c Outdated Show resolved Hide resolved

steppi mentioned this pull request Oct 18, 2023

RFC: seeking thoughts on rewriting special functions written in Cython into C++ #19404

Closed

rlucas7 force-pushed the s2a branch from 7ca1aa6 to 373c551 Compare November 25, 2023 21:36

steppi reviewed Nov 25, 2023

View reviewed changes

scipy/special/stirling2.h Outdated Show resolved Hide resolved

rlucas7 added 4 commits November 25, 2023 20:55

use specials binom

3949652

fix horner scheme et al

8caed8b

update header for lambertw

56f4f68

rlucas7 force-pushed the s2a branch from c80ee90 to 56f4f68 Compare November 26, 2023 02:12

make stirling2 inexact functions private

8f54373

handle uints and update docstring

f9a5637

rlucas7 marked this pull request as ready for review November 27, 2023 04:36

steppi requested changes Nov 27, 2023

View reviewed changes

rlucas7 added 4 commits November 27, 2023 23:11

address gh review comments

2fe884e

remove unused import for lanczos scaled g

ea7bd03

parametrize unit tests

1a4e5ad

use full vectorization in ufunc

5b70fd7

steppi approved these changes Nov 30, 2023

View reviewed changes

steppi merged commit 7da0e75 into scipy:main Nov 30, 2023
26 of 28 checks passed

steppi added this to the 1.12.0 milestone Nov 30, 2023

steppi mentioned this pull request Dec 4, 2023

DOC: 1.12.0 release notes #19628

Merged

9 tasks

ENH: add exact=False support for stirling2 #19287

ENH: add exact=False support for stirling2 #19287

Conversation

rlucas7 commented Sep 24, 2023 • edited

Reference issue

What does this implement/fix?

Additional information

steppi commented Sep 24, 2023

rlucas7 commented Sep 24, 2023

steppi commented Sep 24, 2023

rlucas7 commented Oct 3, 2023 • edited

rlucas7 commented Oct 4, 2023

rlucas7 commented Oct 5, 2023 • edited

steppi left a comment

Choose a reason for hiding this comment

steppi Oct 5, 2023

Choose a reason for hiding this comment

rlucas7 Oct 7, 2023

Choose a reason for hiding this comment

steppi Oct 5, 2023

Choose a reason for hiding this comment

steppi commented Oct 5, 2023

rlucas7 commented Oct 7, 2023

steppi commented Oct 7, 2023

dschmitz89 left a comment

Choose a reason for hiding this comment

rlucas7 commented Oct 9, 2023

steppi commented Oct 9, 2023 • edited

steppi commented Oct 16, 2023

rlucas7 commented Oct 18, 2023 via email

steppi commented Oct 18, 2023

steppi commented Oct 19, 2023

rlucas7 commented Oct 23, 2023 • edited

rlucas7 commented Oct 23, 2023

steppi commented Oct 23, 2023 • edited

rlucas7 commented Nov 25, 2023

rlucas7 commented Nov 25, 2023

rlucas7 commented Nov 26, 2023 • edited

steppi commented Nov 26, 2023

steppi commented Nov 26, 2023 • edited

rlucas7 commented Nov 26, 2023

rlucas7 commented Nov 26, 2023

steppi commented Nov 26, 2023 • edited

rlucas7 commented Nov 27, 2023

rlucas7 commented Nov 27, 2023 • edited

steppi left a comment

Choose a reason for hiding this comment

steppi commented Nov 30, 2023

rlucas7 commented Sep 24, 2023 •

edited

rlucas7 commented Oct 3, 2023 •

edited

rlucas7 commented Oct 5, 2023 •

edited

steppi commented Oct 9, 2023 •

edited

rlucas7 commented Oct 23, 2023 •

edited

steppi commented Oct 23, 2023 •

edited

rlucas7 commented Nov 26, 2023 •

edited

steppi commented Nov 26, 2023 •

edited

steppi commented Nov 26, 2023 •

edited

rlucas7 commented Nov 27, 2023 •

edited