Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA][Win32] Add fma(long double,..) to math forward declares. #73756

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fodinabor
Copy link
Contributor

@fodinabor fodinabor commented Nov 29, 2023

As per AdaptiveCpp/AdaptiveCpp#1256 - we are missing the fma long double variant for Cpp20 compat with MS-STL.

See also #49853.
-> I'm wondering if we should include this for all long double variants as defined in https://github.com/microsoft/STL/blob/730af178f7c79bd75d414627f969550775e14994/stl/inc/cmath#L291 ?

CC @blinkfrog

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Nov 29, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 29, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: None (fodinabor)

Changes

As per AdaptiveCpp/AdaptiveCpp#1256 - we are missing the fma long double variant for Cpp20 compact with MS-STL.

See also #49853.
-> I'm wondering if we should include this for all long double variants as defined in https://github.com/microsoft/STL/blob/730af178f7c79bd75d414627f969550775e14994/stl/inc/cmath#L291 ?

CC @blinkfrog


Full diff: https://github.com/llvm/llvm-project/pull/73756.diff

1 Files Affected:

  • (modified) clang/lib/Headers/__clang_cuda_math_forward_declares.h (+3)
diff --git a/clang/lib/Headers/__clang_cuda_math_forward_declares.h b/clang/lib/Headers/__clang_cuda_math_forward_declares.h
index c0f1f47cc99302a..2d1bdd0f9bb0f41 100644
--- a/clang/lib/Headers/__clang_cuda_math_forward_declares.h
+++ b/clang/lib/Headers/__clang_cuda_math_forward_declares.h
@@ -70,6 +70,9 @@ __DEVICE__ double floor(double);
 __DEVICE__ float floor(float);
 __DEVICE__ double fma(double, double, double);
 __DEVICE__ float fma(float, float, float);
+#ifdef _MSC_VER
+__DEVICE__ long double fma(long double, long double, long double);
+#endif
 __DEVICE__ double fmax(double, double);
 __DEVICE__ float fmax(float, float);
 __DEVICE__ double fmin(double, double);

@@ -70,6 +70,9 @@ __DEVICE__ double floor(double);
__DEVICE__ float floor(float);
__DEVICE__ double fma(double, double, double);
__DEVICE__ float fma(float, float, float);
#ifdef _MSC_VER
__DEVICE__ long double fma(long double, long double, long double);
Copy link
Member

@Artem-B Artem-B Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA does not provide GPU-side fma for long double.

This declaration probably warrants a comment about this, because whoever ends up using it on a GPU will be surprised when their code compiles, but fails somewhere in ptxas, which will look like a compiler crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, CUDA has provided long double fma since 10.2 but for HOST only:

inline _LIBCUDACXX_INLINE_VISIBILITY long double fma(long double __lcpp_x, long double __lcpp_y, long double __lcpp_z) _NOEXCEPT {return ::fmal(__lcpp_x, __lcpp_y, __lcpp_z);}

If a declaration of device long double fma is needed in order to parse it only and with disallowing its usage on GPU, then it is better to add it to clang/lib/Headers/cuda_wrappers/cmath under _LIBCPP_STD_VER and _LIBCPP_HIDE_FROM_ABI; and it would need a corresponding GPU test, of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a handful of MSVC-specific no-implementation functions declared here, so one more is OK.
I just want to document it better.

You may want to add a macro with a descriptive name. (E.g. CUDA_ALLOW_LONG_DOUBLE_MATH_FUNCTION_DECLS) and have it defined for MSVC, along with the comment why it's needed. Then replace the existing #if _MSC_VER with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have. But how do we prevent them from running on GPU?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that there's no implementation for these functions, the reference will remain unresolved and GPU-side executable linking will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is it, it is only a compile-time story, and the situation when the successfully compiled code "fails somewhere in ptxas" on runtime would never happen till CUDA implementation of long double fma appears, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. If someone ends up using fma(long double) in the GPU-side code, we will fail at build time, until the implementation is actually available. Failure in ptxas is not the best way to diagnose it, but it's better than failing to compile valid code because of a CUDA compatibility quirk in a standard library.

@blinkfrog
Copy link

I would like to share some thoughts regarding the proposed fix involving the #ifdef _MSC_VER check in the LLVM PR for the fma function ambiguity issue. I am sorry if I say something stupid.

In the typical workflow with AdaptiveCpp and CUDA backend on Windows, MSVC is used only for compiling LLVM and Clang. However, the actual compilation of SYCL programs targeting the CUDA backend is performed using Clang with the AdaptiveCpp plugin.

Given this workflow, the #ifdef _MSC_VER check may not be effective because _MSC_VER is specific to the MSVC compiler and is not defined when using Clang, even on Windows. Consequently, this check might not activate during our compilation process, and the ambiguity issue with the fma function may persist.

A more suitable approach might be to use a different check that can effectively detect the Windows platform in the Clang environment, such as _WIN32, ensuring that the fix is applied in the relevant scenarios. Probably, better would be to check _MSVC_STL_VERSION , but this probably won't work, as standard libraries, where it is defined, are included after LLVM headers I think.

I wanted to bring this to your attention to ensure that the proposed solution addresses the issue in the context of workflows similar to ours.

Thank you for considering this point.

@Artem-B
Copy link
Member

Artem-B commented Dec 7, 2023

I'm not familiar enough with MSVC.

@rnk -- what's the best way to check for compilation with microsoft's stardard C++ library?

@rnk
Copy link
Collaborator

rnk commented Dec 8, 2023

@rnk -- what's the best way to check for compilation with microsoft's stardard C++ library?

If Clang is compiling with the MSVC STL headers, it should be defining _MSC_VER, usually -fms-compatibilty has to be enabled to compile MSVC STL headers. However, I searched, and it looks like the MSVC STL defines this macro, _MSVC_STL_VERSION . I don't know when they started defining that, but it has the right meaning.

@emankov
Copy link
Contributor

emankov commented Jan 9, 2024

Hello, @fodinabor and @blinkfrog

the #ifdef _MSC_VER check may not be effective because _MSC_VER is specific to the MSVC compiler and is not defined when using Clang, even on Windows

Does it mean that this fix doesn't solve AdaptiveCpp/AdaptiveCpp#1256? Or, is blinkfrog saying above about yet other cases?

@aaronenyeshi aaronenyeshi removed their request for review February 15, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants