[AMDGPU] Allow w64 ballot to be used on w32 targets #80183

jhuber6 · 2024-01-31T19:20:57Z

Summary:
Currently we cannot compile __builtin_amdgcn_ballot_w64 on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.
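For illustration only, here is a minimal sketch of the kind of wave-size-agnostic library code this enables. The helper names are hypothetical and not part of the patch. On the device the mask would come from `__builtin_amdgcn_ballot_w64(pred)`; here it is passed in as a plain argument so the logic can be checked on the host. It assumes a compiler that provides the GCC/Clang bit builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of the patch. On the device, `mask` would
 * come from __builtin_amdgcn_ballot_w64(pred); it is a plain argument here so
 * the logic can be exercised on the host. */

/* Number of lanes whose predicate was true. */
static inline unsigned active_lane_count(uint64_t mask) {
  return (unsigned)__builtin_popcountll(mask);
}

/* Index of the lowest lane whose predicate was true; mask must be non-zero. */
static inline unsigned first_active_lane(uint64_t mask) {
  assert(mask != 0 && "no active lanes");
  return (unsigned)__builtin_ctzll(mask);
}

/* True if `lane` is the lowest active lane, e.g. to elect a single lane. */
static inline int is_leader(uint64_t mask, unsigned lane) {
  return mask != 0 && first_active_lane(mask) == lane;
}
```

Because every helper takes a 64-bit mask, the same code serves a 32-wide and a 64-wide wavefront, provided the upper 32 bits are zero in the 32-wide case.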

llvmbot · 2024-01-31T19:21:25Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/80183.diff

2 Files Affected:

(modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+1-1)
(modified) clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl (+2-5)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 74dfd1d214e84..a32238381312c 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 //===----------------------------------------------------------------------===//
 
 TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc", "wavefrontsize64")
+BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc")
 
 // Deprecated intrinsics in favor of __builtin_amdgn_ballot_{w32|w64}
 BUILTIN(__builtin_amdgcn_uicmp, "WUiUiUiIi", "nc")
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
index 99e93acd9a213..99aa2d634b2a0 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
@@ -4,13 +4,10 @@
 // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s
 // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s
 
+// expected-no-diagnostics
+
 typedef unsigned long ulong;
 
 void test_ballot_wave64(global ulong* out, int a, int b) {
-  *out = __builtin_amdgcn_ballot_w64(a == b);  // expected-error {{'__builtin_amdgcn_ballot_w64' needs target feature wavefrontsize64}}
-}
-
-__attribute__((target("wavefrontsize64")))
-void test_ballot_wave64_target_attr(global ulong* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w64(a == b);
 }

yxsamliu · 2024-01-31T19:39:31Z

clang/include/clang/Basic/BuiltinsAMDGPU.def

@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 //===----------------------------------------------------------------------===//

 TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc", "wavefrontsize64")
+BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc")


I am not sure whether this will cause the intrinsic to be removed by the backend if the required target feature is missing. I remember a pass being added for that.

Removing the check also increases the chance of misuse in HIP or OpenCL.

The device libs bypass this requirement by adding __attribute__((target("wavefrontsize64"))) to the callers. Can you do the same?

The only difference is the return value of the intrinsic. Type promotion is always legal, so on a target with a 32-wide wavefront the result simply gets promoted to a 64-bit value, which will be correct.
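As a rough host-side model of that promotion argument (a sketch, not the backend's actual lowering): the function names are hypothetical, and the model assumes that lanes beyond the wavefront size contribute zero bits, which is what this thread expects but does not prove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side model only (hypothetical name): bit i of the result is set when
 * lane i's predicate is true. Lanes at or beyond wave_size do not exist in
 * this model, so their bits stay zero. */
static uint64_t model_ballot_w64(const bool *pred, unsigned wave_size) {
  assert(wave_size == 32 || wave_size == 64);
  uint64_t mask = 0;
  for (unsigned lane = 0; lane < wave_size; ++lane)
    if (pred[lane])
      mask |= (uint64_t)1 << lane;
  return mask;
}

/* Same model for the 32-bit builtin on a 32-wide wavefront. */
static uint32_t model_ballot_w32(const bool *pred) {
  return (uint32_t)model_ballot_w64(pred, 32);
}
```

Under this model, the 64-bit result on a 32-wide wavefront equals the 32-bit result zero-extended, so a caller that only ever handles the 64-bit type sees the same lane bits either way.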

The ROCm-Device-Libs can only do this because they have the __oclc_wavefrontsize64 variable, which I don't want to copy.

For reference, the main difference is whether or not we read exec_hi; here's some code: https://godbolt.org/z/z64Yoo7ve. I believe @arsenm said that exec_hi simply reads as zero in wave32 mode? I could double-check that.

The problem with using the attribute is that it attaches the target feature to the function itself, which prevents the function from being called correctly from wave32 code. Working around that would require some compile-time switch like the one in the ROCm libs, or perhaps a runtime check on the wave size.

Did some basic checks and it works as expected when called from w32 code.

The target attribute is just a giant footgun for AMDGPU. This one only works in the library with special handholding by the compiler.

jayfoad · 2024-02-01T08:58:33Z

After this change, is there any value in having two different builtins? You could just have one that always returns 64 bits.

arsenm · 2024-02-01T09:00:37Z

clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl

-__attribute__((target("wavefrontsize64")))
-void test_ballot_wave64_target_attr(global ulong* out, int a, int b) {


Should keep the target_attr test

If I kept the attribute it would result in

(frontend): invalid feature combination: 'wavefrontsize32' and 'wavefrontsize64' are mutually exclusive

I just got rid of the +w32 run lines.

jhuber6 · 2024-02-01T13:19:43Z

> After this change, is there any value in having two different builtins? You could just have one that always returns 64 bits.

I personally think it would be better to just have the one, but I figured that decision was made earlier and it would break backwards compatibility.

Summary: Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64 targets even though it is valid. This is relevant for making library code that can handle both without needing to check the wavefront size. This patch relaxes the semantic check for w64 so it can be used normally. (cherry picked from commit 5249379) Change-Id: Ief434d2bc0928a5bdc3149651357b94706611f09

jhuber6 requested review from arsenm, b-sumner, jayfoad and yxsamliu January 31, 2024 19:20

llvmbot added the clang, backend:AMDGPU, and clang:frontend labels Jan 31, 2024

yxsamliu reviewed Jan 31, 2024

jhuber6 requested a review from shiltian January 31, 2024 19:44

arsenm reviewed Feb 1, 2024

jhuber6 force-pushed the BallotW64 branch from 5153ba3 to 26b75cd Compare February 1, 2024 13:47

arsenm approved these changes Feb 5, 2024

jhuber6 merged commit 5249379 into llvm:main Feb 5, 2024
3 of 4 checks passed
