Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add exact=False support for stirling2 #19287

Merged
merged 10 commits into from Nov 30, 2023
Merged

ENH: add exact=False support for stirling2 #19287

merged 10 commits into from Nov 30, 2023

Conversation

rlucas7
Copy link
Member

@rlucas7 rlucas7 commented Sep 24, 2023

  • adding preliminary support for exact=False support to stirling2 for now only adds inclusion-exclusion via Lanczos which works well up to around N=33 and then has overflow issues.

Reference issue

Adds preliminary exact=False support for Stirling numbers of the second kind
#17890

What does this implement/fix?

adding preliminary support for exact=False support to stirling2

Additional information

@steppi During review from prior PR we decided to include exact=False support.

This PR adds initial support for inclusion-exclusion via Lanczos approximation.

Still needs to add:

  • support for Temme for n > 33 and other places where it is more accurate than Lanczos

@rlucas7 rlucas7 added enhancement A new feature or improvement scipy.special labels Sep 24, 2023
@steppi
Copy link
Contributor

steppi commented Sep 24, 2023

Thanks @rlucas7! Some quick comments/suggestions

  • I think the exact=False case should be handled in a ufunc with the kernel written in C, C++, or Cython.
  • Depending on the parameters one should use either:
    • The inclusion/exclusion formula with Lanczos approximation like in this PR
    • The dynamic programming algorithm in C calculated with doubles instead of ints
    • Temme's asymptotic expansion. 2nd order for now since that's what we have, but ideally we'd work out higher order expansions.
  • The key bit of work here is finding out for which inputs one should use each of the above three methods, trading off on accuracy, time, and memory. Numerical experiments seems to be the way to go here.

@rlucas7
Copy link
Member Author

rlucas7 commented Sep 24, 2023

Thanks @rlucas7! Some quick comments/suggestions

* I think the `exact=False` case should be handled in a ufunc with the kernel written in C, C++, or Cython.

* Depending on the parameters one should use either:
      
  ....
  * The dynamic programming algorithm in C calculated with doubles instead of ints

I don't think the DP approach really fits the workflow of a ufunc though, IIUC every value is computed in a parallel way. Is there some point I'm missing?

@steppi
Copy link
Contributor

steppi commented Sep 24, 2023

Thanks @rlucas7! Some quick comments/suggestions

* I think the `exact=False` case should be handled in a ufunc with the kernel written in C, C++, or Cython.

* Depending on the parameters one should use either:
      
  ....
  * The dynamic programming algorithm in C calculated with doubles instead of ints

I don't think the DP approach really fits the workflow of a ufunc though, IIUC every value is computed in a parallel way. Is there some point I'm missing?

NumPy ufuncs don't employ parallelization. They just loop through and calculate values in a compiled language without any Python overhead. At one point while working on stirling2 you had a ufunc which used the dynamic programming algorithm, but the computation was with 64 bit ints not doubles. For not too large values of n, the dp algorithm is so fast that any redundant computation will be negligible.

@rlucas7
Copy link
Member Author

rlucas7 commented Oct 3, 2023

... At one point while working on stirling2 you had a ufunc which used the dynamic programming algorithm, but the computation was with 64 bit ints not doubles. For not too large values of n, the dp algorithm is so fast that any redundant computation will be negligible.

@steppi Looking into the DP in ufunc and I had a question

In the stirling2 PR there was also some concern about doing a memory allocation (specifically malloc) inside each ufunc call-we need one to store the solution to sub-problems.

Is there a different way to do the memory allocation or are we ok with doing it that way?

@rlucas7
Copy link
Member Author

rlucas7 commented Oct 4, 2023

This passes builds for me locally but seems like different math.h constants in the test-runners on gh don't have M_E as the symbol for the constant?

looks like I can use the e constant here which should fix all but one of the build issues.

Other one is a linter issue from the rebase I just did to get the action to run.

@rlucas7
Copy link
Member Author

rlucas7 commented Oct 5, 2023

Ok, looks like the e part is fixed. There is still some doc build issue, I always get confused on which thing needs to go where here in special.

@steppi | @person142 It looks like I need to add function names to some constituent part of __all__ but I don't think we want to expose the inexact dp, nor the lanczos c ufuncs. I think I need to modify the function names to prefix with an _stirling2_{inexact,lanczos} but not sure where. Do you know where I need to modify the function?

Copy link
Contributor

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few brief comments, I haven't looked at this closely yet.

Also just a reminder, before this goes in I'd like to see

  1. At least the second order asymptotic expansion implemented.
  2. Some experiments showing how the regimes where the different implementations are used was decided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this change got pulled in somehow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the rebase-will remove

Comment on lines 14458 to 14473
add_newdoc("stirling2_inexact",
r"""
private method do not use
""")


add_newdoc("stirling2_lanczos",
r"""
private method do not use
""")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private functions in special should have a leading underscore in their names. Also, the language used for other private functions is

"Internal function, do not use."

scipy/special/cephes/stirling2.c Outdated Show resolved Hide resolved
scipy/special/cephes/stirling2.c Outdated Show resolved Hide resolved
@steppi
Copy link
Contributor

steppi commented Oct 5, 2023

I think I need to modify the function names to prefix with an _stirling2_{inexact,lanczos} but not sure where. Do you know where I need to modify the function?

They need to be modified in functions.json (and correspondingly, in _add_newdocs.py.)

@steppi steppi changed the title ENH: add Lanczos support for stirling2 ENH: add exact=False support for stirling2 Oct 5, 2023
@rlucas7
Copy link
Member Author

rlucas7 commented Oct 7, 2023

Just a few brief comments, I haven't looked at this closely yet.

Also just a reminder, before this goes in I'd like to see

  1. At least the second order asymptotic expansion implemented.

Not clear to me how to get access to lambertw_scalar in the stirling2.c code

  1. Some experiments showing how the regimes where the different implementations are used was decided.

Yes. I will do these once I have all 3 methods implemented at C level. I plan to use inexact and a dispatch table to call out to temme/lanczos/dp by (n,k). my plan is to first implement all 3 methods as separate functions at c level and expose to python level. Doing it this way makes it easier (for me) to run the experiments for switching between the 3 approaches based on different n, k values.

Let me know if you've got a better way.

@steppi
Copy link
Contributor

steppi commented Oct 7, 2023

Not clear to me how to get access to lambertw_scalar in the stirling2.c code

I’ll post a small example when I have time later today or tomorrow.

Doing it this way makes it easier (for me) to run the experiments for switching between the 3 approaches based on different n, k values.

Ok, yeah that makes sense, just mark your PR as draft until you think it’s ready to go in.

Copy link
Contributor

@dschmitz89 dschmitz89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rlucas7 , I appreciate the review request but this is way outside my expertise. I had not even heard of Stirling numbers before.

@rlucas7 rlucas7 marked this pull request as draft October 9, 2023 01:52
@rlucas7
Copy link
Member Author

rlucas7 commented Oct 9, 2023

C/H files are up in top level of special and found there after build.

Not clear to me how to get access to lambertw_scalar in the stirling2.c code

I’ll post a small example when I have time later today or tomorrow.

Once you are able to post the code for this I can finish the Temme approximation code in stirling.c.

Then I'll do the experiments to determine when to switch between Lanczos/DP(doubles)/Temme and add the dispatch to stirling_inexact().

@steppi
Copy link
Contributor

steppi commented Oct 9, 2023

Once you are able to post the code for this I can finish the Temme approximation code in stirling.c.

Sorry, it's trickier than I thought because of the complexity of SciPy's build system (or maybe just my lack of familiarity). I'm not sure what changes would need to be made here to make lambertw_scalar (and also binom from _orthogonal_eval.pxd) available in stirling.c. If we do figure that out, it would be as simple as adding

#include "_lambertw.h"
#include "_orthogonal_eval.h"

I've only done this in small projects with a setup.py build, in that case if you have an extension where the sources contain a .c and a .pxd, you can directly include from the .h generated from the .pxd.

I think the simplest thing to do is just to write stirling2 in Cython instead of C. Then you can just do

from ._lambertw cimport lambertw_scalar
from ._orthogonal_eval cimport binom

scipy/special/stirling2.c Outdated Show resolved Hide resolved
@steppi
Copy link
Contributor

steppi commented Oct 16, 2023

@rlucas7, long term, I think the best course of action will be to rewrite lambertw, binom, (and all other special functions in SciPy that are implemented in Cython) into C++. This will make it easier to call them from other functions, and also opens up the possibility of using them on the GPU in CuPy, and other array computing libraries.

@izaid has kindly offered to help review PRs to rewrite such functions in C++. I plan to submit PRs for these two first. At the moment I think your options are to either follow through implementing the expansion in Cython, and we can convert later, or to wait until the lambertw and binom have been rewritten.

@rlucas7
Copy link
Member Author

rlucas7 commented Oct 18, 2023 via email

@steppi
Copy link
Contributor

steppi commented Oct 18, 2023

@rlucas7. I plan to have PRs submitted for binom and lambertw within the next two weeks and plan to make an RFC issue and dev list posting for the Cython->C++ proposal tomorrow. I'll ping you on the two PRs and the issue when I make them.

@steppi
Copy link
Contributor

steppi commented Oct 19, 2023

@rlucas7 while you’re blocked due to the current state of lambertw and binom I think the best thing to do would be to just use a Python implementation of the asymptotic expansion in your experiments to help determine how the choice of method should depend on n and k.

@rlucas7
Copy link
Member Author

rlucas7 commented Oct 23, 2023

Also just a reminder, before this goes in I'd like to see

  1. At least the second order asymptotic expansion implemented.
  2. Some experiments showing how the regimes where the different implementations are used was decided.

Sure, here you go. My decision based on looking at these is to NOT USE either Lanczos or Temme (this was a bit of a surprise for me). The DP approach is pretty accurate up through the space of values that are covered by valid doubles. Neither Lanczos (applied to inclusion-exclusion formula) nor the Temme 2nd order approximation come close in terms of accuracy.

stirling2_approx_comparisons.txt

Let me know if you disagree.

Here's the source I used for approximations:

# bring into dev env via `python dev.py python`
import numpy as np
import scipy.special as sc

def stirling2_asymptotic_order2(n, k):
    n, k = np.asarray(n,dtype=float), np.asarray(k,dtype=float)
    mu = k/n
    delta = 1/mu * np.exp(-1/mu)
    x0 = 1/mu + sc.lambertw(-delta)
    t0 = (n - k) / k
    F = np.sqrt(t0/((1 + t0)*(x0 - t0)))
    A = -n*np.log(x0) + k*np.log(np.exp(x0) - 1) - k*t0 + (n - k)*np.log(t0)
    F1 = (
        -2*x0**3 + 2*t0**5 + 4*t0**3 + 4*t0**4 + 3*x0**2 * t0 - 6*x0*t0**4
        - 5*x0**2 * t0**2 + 2*x0**4 * t0 + x0**3*t0 - 6*x0**3 * t0**2
        + 8*x0**2 * t0**3
    ) / (24*F*(1 + t0)**2*(x0 - t0)**4)
    try:
        result = np.exp(A)*k**(n - k)*sc.binom(n, k)*(F - F1/k)
    except OverflowError:
        result = np.inf
    return result.real


rows = []
for n in range(3, 221): # I get an overflow error at n=220
    args = ([n], list(range(1,n))) # the last value is n-1, including `n` value gives a nan
    exact = sc.stirling2(*args, exact=True)
    asymptotic2 = stirling2_asymptotic_order2(n=args[0], k=args[1])
    rel_error =  np.abs(exact - asymptotic2) / np.abs(exact)
    print(f"n: {n} (Temme-2) max rel_error: {max(rel_error)}")
    dp2 = sc.stirling2_dp(args[0], args[1])
    rel_error =  np.abs(exact - dp2) / np.abs(exact)
    print(f"n: {n} (dp) max rel_error: {max(rel_error)}")
    lanczos = sc.stirling2_lanczos(args[0], args[1])
    rel_error =  np.abs(exact - lanczos) / np.abs(exact)
    print(f"n: {n} (Lanczos) max rel_error: {max(rel_error)}")
    # we also have errors on the dp approx and on the Lanczos approx

that has some changes I'm going to push now (nothing substantive changed from previous commit though).

@rlucas7
Copy link
Member Author

rlucas7 commented Oct 23, 2023

@steppi unless you disagree w/findings in the previous post, I can delete all the other C stuff and leave only the DP approach. This may also impact the prioritization of the C++ migration of lambertw and binom from cython discussed in
#19404

@steppi
Copy link
Contributor

steppi commented Oct 23, 2023

@rlucas7, we need to take into account time and memory efficiency for exact=False. The asymptotic expansion will be much faster for large parameters and not require large memory allocations. Inclusion exclusion should only work well for small k, I’m not sure if we’d want to use it anywhere. The 2nd order expansion from Temme is pretty inaccurate in the regimes we’re looking at. Eventually we should try higher order.

@rlucas7
Copy link
Member Author

rlucas7 commented Nov 25, 2023

How’s this coming along @rlucas7? I’m willing to merge as soon as I verify you have it working regardless of C++ style considerations.

I've been working on timings as a follow up to this comment. When we compare only the approximate versions of DP and temme, the runtime of the approximate versions do start to scale linearly with n once we get above n=45 or so.

runtimes_temme_dp

so for the dispatch I'll use n>50 to cut over to Temme.

from scipy.special import stirling2_dp, stirling2_temme, stirling2
import matplotlib.pyplot as plt
import timeit

y_temme = []
y_dp = []
x = list(range(10, 105, 5))
for n in x:
    s_temme = f"""n={n}; stirling2_temme([{n}], list(range(1,{n})))"""
    y_temme.append(timeit.timeit(stmt=s_temme, number=10000, setup="from scipy.special import stirling2_temme"))
    s_dp = f"""n={n}; stirling2_dp([{n}], list(range(1,{n})))"""
    y_dp.append(timeit.timeit(stmt=s_dp, number=10000, setup="from scipy.special import stirling2_dp"))


fig, ax = plt.subplots()
line1, = ax.plot(x, y_temme, linewidth=2.0, label="Temme")
line2, = ax.plot(x, y_dp, linewidth=2.0, label="DP")
ax.legend(handles=[line1, line2])
plt.title(label="Runtimes")
plt.show()

^ code used to generate the figure

@rlucas7
Copy link
Member Author

rlucas7 commented Nov 25, 2023

Looks like the _lambertw.h code isn't getting picked up in the build process on github, this works for me locally, @steppi is there something that looks incorrect here to you?

Screen Shot 2023-11-25 at 4 54 41 PM

scipy/special/stirling2.h Outdated Show resolved Hide resolved
  * adding preliminary support for exact=False support to stirling2
    for now only adds partial Temme and no Lanczos b/c dp works well
    for small N and Lancozs has overflow issuesfor N>32.

add sitrling2_inexact dp ufunc

add stirling2_lanczos c version

fix dtype interactive rebase error

move c code out of cephes to top level of special

add more cleanup code

attempt cxx move

inline stirling2 for g++ and other related changes

cleanup and apply horner
@rlucas7
Copy link
Member Author

rlucas7 commented Nov 26, 2023

I'm seeing

FAILED special/tests/test_basic.py::TestStirling2::test_numpy_array_unsigned_int_dtype -
 TypeError: ufunc '_stirling2_inexact' not supported for the input types, and the 
inputs could not be safely coerced to any supported types according to the 
casting rule ''safe''

both locally and in the test harness, I'm not sure how we handle this with the c++ setup here in special, @steppi have you seen this before? (any ideas?)

@steppi
Copy link
Contributor

steppi commented Nov 26, 2023

I'm seeing

FAILED special/tests/test_basic.py::TestStirling2::test_numpy_array_unsigned_int_dtype -
 TypeError: ufunc '_stirling2_inexact' not supported for the input types, and the 
inputs could not be safely coerced to any supported types according to the 
casting rule ''safe''

both locally and in the test harness, I'm not sure how we handle this with the c++ setup here in special, @steppi have you seen this before? (any ideas?)

unsigned int dtype isn't supported in the ufunc infrastructure. These are the allowed typecodes in functions.json

The input parameter types are denoted by single character type
codes, according to

   'f': 'float'
   'd': 'double'
   'g': 'long double'
   'F': 'float complex'
   'D': 'double complex'
   'G': 'long double complex'
   'i': 'int'
   'l': 'long'
   'v': 'void'

I think just get rid of that test for exact=False and clearly document which input types work for exact=True and exact=False.

@steppi
Copy link
Contributor

steppi commented Nov 26, 2023

Or I think it might just work if you replace "ii->d" in functions.json with "ll->d" since a 32 bit unsigned int can be safely cast to a 64 bit int.

Nevermind that, the test has 64 bit unsigned longs. I think the "ii" should be replaced by "ll" anyway though because I know stirling2(n, n-1) at least will fit in a double for n pretty large.

@rlucas7
Copy link
Member Author

rlucas7 commented Nov 26, 2023

Or I think it might just work if you replace "ii->d" in functions.json with "ll->d" since a 32 bit unsigned int can be safely cast to a 64 bit int.

Nevermind that, the test has 64 bit unsigned longs. I think the "ii" should be replaced by "ll" anyway though because I know stirling2(n, n-1) at least will fit in a double for n pretty large.

The build fails for me locally if I swap "ll" for "ii", so I don't think that type is currently supported here. I seem to recall this issue was surfaced a few years back when I was doing other special changes but I don't recall the details.

@rlucas7
Copy link
Member Author

rlucas7 commented Nov 26, 2023

... unsigned int dtype isn't supported in the ufunc infrastructure.
...
I think just get rid of that test for exact=False and clearly document which input types work for exact=True and exact=False.

Should we add an input validation and raise a TypeError for this?
(inside the python portion of stirling2())

@steppi
Copy link
Contributor

steppi commented Nov 26, 2023

Should we add an input validation and raise a TypeError for this?
(inside the python portion of stirling2())

Yeah. And if we do that we can make the ufunc take doubles and do input validation and convert to double before passing. I think the overhead should be negligible.

@rlucas7
Copy link
Member Author

rlucas7 commented Nov 27, 2023

Should we add an input validation and raise a TypeError for this?
(inside the python portion of stirling2())

Yeah. And if we do that we can make the ufunc take doubles and do input validation and convert to double before passing. I think the overhead should be negligible.

Ok I think I understand, with this way we can handle the uint cases too. This means that there is no need to remove the uint dtypes test case for the exact=False case. Instead we typecast the uint arguments (n,k) to doubles and pass them into the dispatcher for routing. This seems to work fine so I left it all in with a note to describe what's going on in this scenario in case it surfaces in future maintenance work.

If no major issues from test harness-I'll change from draft status tomorrow.

@rlucas7 rlucas7 marked this pull request as ready for review November 27, 2023 04:36
@rlucas7
Copy link
Member Author

rlucas7 commented Nov 27, 2023

...
If no major issues from test harness-I'll change from draft status tomorrow.

I'm a few minutes before tomorrow (20 or so) but everything is green in the test harness. @steppi I think this is ready for a closer scrutiny.

Copy link
Contributor

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job getting this working. Still needs some cleaning up, but nothing too major.

scipy/special/_add_newdocs.py Outdated Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
scipy/special/_cephes.pxd Outdated Show resolved Hide resolved
scipy/special/stirling2.h Show resolved Hide resolved
scipy/special/stirling2.h Outdated Show resolved Hide resolved
scipy/special/tests/test_basic.py Outdated Show resolved Hide resolved
scipy/special/tests/test_basic.py Outdated Show resolved Hide resolved
scipy/special/tests/test_basic.py Show resolved Hide resolved
@steppi
Copy link
Contributor

steppi commented Nov 30, 2023

Nice work @rlucas7! Thanks for all your patience here.

@steppi steppi merged commit 7da0e75 into scipy:main Nov 30, 2023
26 of 28 checks passed
@steppi steppi added this to the 1.12.0 milestone Nov 30, 2023
@steppi steppi mentioned this pull request Dec 4, 2023
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.special
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants