[CUDA][Win32] Add `fma(long double,..)` to math forward declares. #73756

fodinabor · 2023-11-29T07:02:41Z

As per AdaptiveCpp/AdaptiveCpp#1256 - we are missing the fma long double variant for Cpp20 compat with MS-STL.

See also #49853.
-> I'm wondering if we should include this for all long double variants as defined in https://github.com/microsoft/STL/blob/730af178f7c79bd75d414627f969550775e14994/stl/inc/cmath#L291 ?

CC @blinkfrog

llvmbot · 2023-11-29T07:03:09Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: None (fodinabor)

Changes

As per AdaptiveCpp/AdaptiveCpp#1256 - we are missing the fma long double variant for Cpp20 compact with MS-STL.

See also #49853.
-> I'm wondering if we should include this for all long double variants as defined in https://github.com/microsoft/STL/blob/730af178f7c79bd75d414627f969550775e14994/stl/inc/cmath#L291 ?

CC @blinkfrog

Full diff: https://github.com/llvm/llvm-project/pull/73756.diff

1 Files Affected:

(modified) clang/lib/Headers/__clang_cuda_math_forward_declares.h (+3)

diff --git a/clang/lib/Headers/__clang_cuda_math_forward_declares.h b/clang/lib/Headers/__clang_cuda_math_forward_declares.h
index c0f1f47cc99302a..2d1bdd0f9bb0f41 100644
--- a/clang/lib/Headers/__clang_cuda_math_forward_declares.h
+++ b/clang/lib/Headers/__clang_cuda_math_forward_declares.h
@@ -70,6 +70,9 @@ __DEVICE__ double floor(double);
 __DEVICE__ float floor(float);
 __DEVICE__ double fma(double, double, double);
 __DEVICE__ float fma(float, float, float);
+#ifdef _MSC_VER
+__DEVICE__ long double fma(long double, long double, long double);
+#endif
 __DEVICE__ double fmax(double, double);
 __DEVICE__ float fmax(float, float);
 __DEVICE__ double fmin(double, double);

Artem-B · 2023-12-04T18:41:22Z

clang/lib/Headers/__clang_cuda_math_forward_declares.h

@@ -70,6 +70,9 @@ __DEVICE__ double floor(double);
 __DEVICE__ float floor(float);
 __DEVICE__ double fma(double, double, double);
 __DEVICE__ float fma(float, float, float);
+#ifdef _MSC_VER
+__DEVICE__ long double fma(long double, long double, long double);


CUDA does not provide GPU-side fma for long double.

This declaration probably warrants a comment about this, because whoever ends up using it on a GPU will be surprised when their code compiles, but fails somewhere in ptxas, which will look like a compiler crash.

That's right, CUDA has provided long double fma since 10.2 but for HOST only:

inline _LIBCUDACXX_INLINE_VISIBILITY long double fma(long double __lcpp_x, long double __lcpp_y, long double __lcpp_z) _NOEXCEPT {return ::fmal(__lcpp_x, __lcpp_y, __lcpp_z);}

If a declaration of device long double fma is needed in order to parse it only and with disallowing its usage on GPU, then it is better to add it to clang/lib/Headers/cuda_wrappers/cmath under _LIBCPP_STD_VER and _LIBCPP_HIDE_FROM_ABI; and it would need a corresponding GPU test, of course.

We already have a handful of MSVC-specific no-implementation functions declared here, so one more is OK.
I just want to document it better.

You may want to add a macro with a descriptive name. (E.g. CUDA_ALLOW_LONG_DOUBLE_MATH_FUNCTION_DECLS) and have it defined for MSVC, along with the comment why it's needed. Then replace the existing #if _MSC_VER with it.

Yes, we have. But how do we prevent them from running on GPU?

Given that there's no implementation for these functions, the reference will remain unresolved and GPU-side executable linking will fail.

That is it, it is only a compile-time story, and the situation when the successfully compiled code "fails somewhere in ptxas" on runtime would never happen till CUDA implementation of long double fma appears, or am I missing something?

Correct. If someone ends up using fma(long double) in the GPU-side code, we will fail at build time, until the implementation is actually available. Failure in ptxas is not the best way to diagnose it, but it's better than failing to compile valid code because of a CUDA compatibility quirk in a standard library.

blinkfrog · 2023-12-06T12:01:09Z

I would like to share some thoughts regarding the proposed fix involving the #ifdef _MSC_VER check in the LLVM PR for the fma function ambiguity issue. I am sorry if I say something stupid.

In the typical workflow with AdaptiveCpp and CUDA backend on Windows, MSVC is used only for compiling LLVM and Clang. However, the actual compilation of SYCL programs targeting the CUDA backend is performed using Clang with the AdaptiveCpp plugin.

Given this workflow, the #ifdef _MSC_VER check may not be effective because _MSC_VER is specific to the MSVC compiler and is not defined when using Clang, even on Windows. Consequently, this check might not activate during our compilation process, and the ambiguity issue with the fma function may persist.

A more suitable approach might be to use a different check that can effectively detect the Windows platform in the Clang environment, such as _WIN32, ensuring that the fix is applied in the relevant scenarios. Probably, better would be to check _MSVC_STL_VERSION , but this probably won't work, as standard libraries, where it is defined, are included after LLVM headers I think.

I wanted to bring this to your attention to ensure that the proposed solution addresses the issue in the context of workflows similar to ours.

Thank you for considering this point.

Artem-B · 2023-12-07T23:44:45Z

I'm not familiar enough with MSVC.

@rnk -- what's the best way to check for compilation with microsoft's stardard C++ library?

rnk · 2023-12-08T00:20:38Z

@rnk -- what's the best way to check for compilation with microsoft's stardard C++ library?

If Clang is compiling with the MSVC STL headers, it should be defining _MSC_VER, usually -fms-compatibilty has to be enabled to compile MSVC STL headers. However, I searched, and it looks like the MSVC STL defines this macro, _MSVC_STL_VERSION . I don't know when they started defining that, but it has the right meaning.

emankov · 2024-01-09T12:33:07Z

Hello, @fodinabor and @blinkfrog

the #ifdef _MSC_VER check may not be effective because _MSC_VER is specific to the MSVC compiler and is not defined when using Clang, even on Windows

Does it mean that this fix doesn't solve AdaptiveCpp/AdaptiveCpp#1256? Or, is blinkfrog saying above about yet other cases?

[CUDA][Win32] Add fma(long double,..) to math forward declares.

25b8080

fodinabor requested review from jdoerfert, emankov and aaronenyeshi November 29, 2023 07:02

llvmbot added clang Clang issues not falling into any other category backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Nov 29, 2023

fodinabor requested a review from Artem-B December 4, 2023 13:37

Artem-B reviewed Dec 4, 2023

View reviewed changes

aaronenyeshi removed their request for review February 15, 2024 16:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CUDA][Win32] Add `fma(long double,..)` to math forward declares. #73756

[CUDA][Win32] Add `fma(long double,..)` to math forward declares. #73756

fodinabor commented Nov 29, 2023 •

edited

llvmbot commented Nov 29, 2023 •

edited

Artem-B Dec 4, 2023 •

edited

emankov Dec 4, 2023

Artem-B Dec 4, 2023

emankov Dec 5, 2023

Artem-B Dec 5, 2023

emankov Dec 6, 2023

Artem-B Dec 6, 2023

blinkfrog commented Dec 6, 2023

Artem-B commented Dec 7, 2023

rnk commented Dec 8, 2023

emankov commented Jan 9, 2024

[CUDA][Win32] Add fma(long double,..) to math forward declares. #73756

Are you sure you want to change the base?

[CUDA][Win32] Add fma(long double,..) to math forward declares. #73756

Conversation

fodinabor commented Nov 29, 2023 • edited

llvmbot commented Nov 29, 2023 • edited

Artem-B Dec 4, 2023 • edited

Choose a reason for hiding this comment

emankov Dec 4, 2023

Choose a reason for hiding this comment

Artem-B Dec 4, 2023

Choose a reason for hiding this comment

emankov Dec 5, 2023

Choose a reason for hiding this comment

Artem-B Dec 5, 2023

Choose a reason for hiding this comment

emankov Dec 6, 2023

Choose a reason for hiding this comment

Artem-B Dec 6, 2023

Choose a reason for hiding this comment

blinkfrog commented Dec 6, 2023

Artem-B commented Dec 7, 2023

rnk commented Dec 8, 2023

emankov commented Jan 9, 2024

[CUDA][Win32] Add `fma(long double,..)` to math forward declares. #73756

[CUDA][Win32] Add `fma(long double,..)` to math forward declares. #73756

fodinabor commented Nov 29, 2023 •

edited

llvmbot commented Nov 29, 2023 •

edited

Artem-B Dec 4, 2023 •

edited