ENH: Make special C++ implementations work on CUDA (and beyond!) #19601
Conversation
We'll need to figure out if / how to test this. The long-term goal is to propose a special extension for the array API, but we need to be a bit of the change we want to see. Here is a script I've been using that shows how to use CuPy to turn this into a CuPy ufunc and run it:

import cupy as cp
import numpy as np
import scipy as sp
import scipy.special
preamble = '#include <special/lambertw.h>'
kern = cp.ElementwiseKernel(
    'complex128 z, int64 k, float64 tol', 'complex128 out',
    'out = special::lambertw(z, k, tol)', 'lambertw',
    preamble=preamble,
    options=(f'--include-path={sp.special._get_include()}', '-std=c++14'))
# Reference result on the host with NumPy inputs and SciPy
z = np.arange(10, dtype=complex)
k = np.asarray(10, dtype=np.int64)
tol = np.asarray(0.01, dtype=float)
out_sp = sp.special.lambertw(z, k, tol)

# Same computation on the GPU with CuPy inputs and the generated kernel
z = cp.arange(10, dtype=complex)
k = cp.asarray(10, dtype=cp.int64)
tol = cp.asarray(0.01, dtype=float)
out_cp = kern(z, k, tol)
print()
print(out_sp)
print(out_cp)
A huge thank you for this! (I recently had to copy-paste some Cephes code into https://github.com/cupy/cupy/tree/main/cupyx/scipy/special --- it would be great to avoid copy-pasting going forward.)
That is the plan! I believe @steppi has his 👀 on all of Cephes, possibly quite soon.
Very nice @izaid! One small thing: can you put the formatting changes in a separate commit so it's easier to look at the diffs? For the test, I think we can just add a test in … I'll give this a closer look tomorrow.
I'll do my best to have a PR ready this coming week.
@steppi I have reverted the formatting and will save that for a follow-up PR. I have some thoughts on how to test sensibly, but I'd actually like to leave that for a follow-up PR as well. It will require some effort, and I'd like to get these C++ changes in so as not to conflict with your and others' work. Of course, there is already testing in this PR by virtue of the host code being the default SciPy implementation.
Ok. I can accept adding the CuPy tests later as long as this PR doesn't break anything in SciPy.
I'm seeing relevant test failures due to redefinitions of constants.
Nice. Looks like that did it.
Yeah, I think something unusual was going on in the build system for special, but I sort of guessed how to sidestep it.
I had a synchronous conversation with @izaid, who explained how all of this works, since I don't have experience with CUDA programming.
In summary, when run on CUDA, we need to add __host__ and __device__ to the function signatures to tell the compiler we want them to be able to run on the CPU and GPU respectively. There are also constants and standard library functions that are missing on CUDA, so if the preprocessor detects these are missing, the code defines them. Thrust is used for missing standard library functions, and only those actually needed for lambertw are included. We'll add more as they are needed. It will be up to downstream users of this library to ensure the needed dependencies are available and linked.
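To make the annotation pattern concrete, here is a minimal sketch; the macro and function names below are illustrative, not necessarily the ones used in this PR. The idea is that a macro expands to __host__ __device__ only when a CUDA compiler is in use, so the same header builds unchanged for plain CPU SciPy:

#ifdef __CUDACC__
// Compiling with nvcc / NVRTC: mark functions callable from host and device.
#define SPECIAL_HOST_DEVICE __host__ __device__
#else
// Plain C++ compiler: the annotation disappears.
#define SPECIAL_HOST_DEVICE
#endif

namespace special {

// The same source compiles for the CPU and, under CUDA, for the GPU as well.
SPECIAL_HOST_DEVICE inline double square(double x) { return x * x; }

}  // namespace special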
This PR has no impact on the behavior of this code on the SciPy side, so I think it's fine for us not to add new tests. izaid has verified that this code will work with CuPy locally.
Amazing work izaid, really happy to see this!
Howdy, I lead the team that maintains https://github.com/NVIDIA/cccl and I wanted to follow up on deferring to Thrust for the missing standard library functionality. I'd highly recommend using libcudacxx instead of Thrust, as libcudacxx is intended specifically for this kind of use case, where you want a heterogeneous C++ standard library implementation that works with NVRTC. So instead of pulling in Thrust, you'd include the equivalent libcudacxx headers. I'd also strongly recommend against injecting things into the std namespace.
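For illustration, the libcudacxx route looks roughly like the sketch below. Which headers lambertw would actually need is an assumption here, but the idea is that the cuda/std/ headers from https://github.com/NVIDIA/cccl provide std-equivalent functionality that also works under NVRTC:

#ifdef __CUDACC__
// Heterogeneous standard-library headers from libcudacxx (NVIDIA/cccl);
// which ones lambertw actually needs is an assumption in this sketch.
#include <cuda/std/cmath>
#include <cuda/std/complex>
namespace cstd = cuda::std;
#else
#include <cmath>
#include <complex>
namespace cstd = std;
#endif

// Downstream code can then write cstd::complex<double>, cstd::abs, ... and
// have it work on both host and device without touching namespace std.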
Hi @jrhemstad! Thanks for the helpful thoughts. Indeed, I'm keeping an eye on libcudacxx. Of course, noted about the std namespace. Would certainly be great to be in touch! Am sure a lot of the work you are doing will be quite useful here.
cc @leofang (who may know)
Also noting that I do intend to add tests to SciPy as soon as I can, essentially by generating CuPy ufuncs with ElementwiseKernel. I think this is non-trivial, as it means testing across multiple array APIs without a specific extension existing. I have some thoughts on how to do this reasonably.
Welcome @jrhemstad! I'm so happy you found your way here. We'd been meaning to try to reach out to someone at NVIDIA about the work we're beginning. I now recall @izaid had told me about …
Happy to help any way I can! Let me know if you want to set up a time to chat more in depth and I'd be happy to answer any questions or explain where we're going.
Reference issue
RFC 19404
What does this implement/fix?
This is the next step in the C++ migration of sp.special. In particular, it updates the new C++ lambertw to work on CUDA. This required a bit of extra work and boilerplate, but once that was done, the code looks almost the same. By doing this now, we'll make more special functions CUDA-compatible (and array-API compatible more generally) as we go.
What you will note is that NVRTC lacks some of the standard library. People are aware of that, and there are many ways to work around it. What I've done, and will advocate for here, is a simple config.h file that checks whether certain standard include headers exist. If they don't, we either alias existing functions or types, or simply add them ourselves. A small amount of work can take us a long way here. I have chosen to define them directly in std, rather than declare some other namespace, and we can talk more about that if people want. NVIDIA is working on this more generally, see https://github.com/nvidia/cccl
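To make the config.h idea concrete, here is a minimal sketch of what such a header could look like; the header and function choices are illustrative, not the exact contents of the file in this PR:

#pragma once

// If the real <cmath> exists (any normal host build), just use it.
#if __has_include(<cmath>)
#include <cmath>
#else
// Under NVRTC there is no <cmath>, but the CUDA math builtins (::exp, ::log,
// ::fabs, ...) are available, so alias the ones we need into namespace std
// so that the special function code can stay unchanged.
namespace std {
using ::exp;
using ::log;
using ::fabs;
}  // namespace std
#endif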