-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CUDA][Win32] Add fma(long double,..)
to math forward declares.
#73756
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-x86 Author: None (fodinabor) ChangesAs per AdaptiveCpp/AdaptiveCpp#1256 - we are missing the See also #49853. CC @blinkfrog Full diff: https://github.com/llvm/llvm-project/pull/73756.diff 1 Files Affected:
diff --git a/clang/lib/Headers/__clang_cuda_math_forward_declares.h b/clang/lib/Headers/__clang_cuda_math_forward_declares.h
index c0f1f47cc99302a..2d1bdd0f9bb0f41 100644
--- a/clang/lib/Headers/__clang_cuda_math_forward_declares.h
+++ b/clang/lib/Headers/__clang_cuda_math_forward_declares.h
@@ -70,6 +70,9 @@ __DEVICE__ double floor(double);
__DEVICE__ float floor(float);
__DEVICE__ double fma(double, double, double);
__DEVICE__ float fma(float, float, float);
+#ifdef _MSC_VER
+__DEVICE__ long double fma(long double, long double, long double);
+#endif
__DEVICE__ double fmax(double, double);
__DEVICE__ float fmax(float, float);
__DEVICE__ double fmin(double, double);
|
@@ -70,6 +70,9 @@ __DEVICE__ double floor(double); | |||
__DEVICE__ float floor(float); | |||
__DEVICE__ double fma(double, double, double); | |||
__DEVICE__ float fma(float, float, float); | |||
#ifdef _MSC_VER | |||
__DEVICE__ long double fma(long double, long double, long double); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUDA does not provide GPU-side fma
for long double
.
This declaration probably warrants a comment about this, because whoever ends up using it on a GPU will be surprised when their code compiles, but fails somewhere in ptxas, which will look like a compiler crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, CUDA has provided long double fma
since 10.2 but for HOST
only:
inline _LIBCUDACXX_INLINE_VISIBILITY long double fma(long double __lcpp_x, long double __lcpp_y, long double __lcpp_z) _NOEXCEPT {return ::fmal(__lcpp_x, __lcpp_y, __lcpp_z);}
If a declaration of device long double fma
is needed in order to parse it only and with disallowing its usage on GPU, then it is better to add it to clang/lib/Headers/cuda_wrappers/cmath
under _LIBCPP_STD_VER
and _LIBCPP_HIDE_FROM_ABI
; and it would need a corresponding GPU
test, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a handful of MSVC-specific no-implementation functions declared here, so one more is OK.
I just want to document it better.
You may want to add a macro with a descriptive name. (E.g. CUDA_ALLOW_LONG_DOUBLE_MATH_FUNCTION_DECLS
) and have it defined for MSVC, along with the comment why it's needed. Then replace the existing #if _MSC_VER
with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have. But how do we prevent them from running on GPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there's no implementation for these functions, the reference will remain unresolved and GPU-side executable linking will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is it, it is only a compile-time story, and the situation when the successfully compiled code "fails somewhere in ptxas" on runtime would never happen till CUDA implementation of long double fma
appears, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. If someone ends up using fma(long double) in the GPU-side code, we will fail at build time, until the implementation is actually available. Failure in ptxas is not the best way to diagnose it, but it's better than failing to compile valid code because of a CUDA compatibility quirk in a standard library.
I would like to share some thoughts regarding the proposed fix involving the #ifdef _MSC_VER check in the LLVM PR for the fma function ambiguity issue. I am sorry if I say something stupid. In the typical workflow with AdaptiveCpp and CUDA backend on Windows, MSVC is used only for compiling LLVM and Clang. However, the actual compilation of SYCL programs targeting the CUDA backend is performed using Clang with the AdaptiveCpp plugin. Given this workflow, the #ifdef _MSC_VER check may not be effective because _MSC_VER is specific to the MSVC compiler and is not defined when using Clang, even on Windows. Consequently, this check might not activate during our compilation process, and the ambiguity issue with the A more suitable approach might be to use a different check that can effectively detect the Windows platform in the Clang environment, such as _WIN32, ensuring that the fix is applied in the relevant scenarios. Probably, better would be to check I wanted to bring this to your attention to ensure that the proposed solution addresses the issue in the context of workflows similar to ours. Thank you for considering this point. |
I'm not familiar enough with MSVC. @rnk -- what's the best way to check for compilation with microsoft's stardard C++ library? |
If Clang is compiling with the MSVC STL headers, it should be defining |
Hello, @fodinabor and @blinkfrog
Does it mean that this fix doesn't solve AdaptiveCpp/AdaptiveCpp#1256? Or, is blinkfrog saying above about yet other cases? |
As per AdaptiveCpp/AdaptiveCpp#1256 - we are missing the
fma
long double variant for Cpp20 compat with MS-STL.See also #49853.
-> I'm wondering if we should include this for all long double variants as defined in https://github.com/microsoft/STL/blob/730af178f7c79bd75d414627f969550775e14994/stl/inc/cmath#L291 ?
CC @blinkfrog