-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Allow w64 ballot to be used on w32 targets #80183
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang Author: Joseph Huber (jhuber6) ChangesSummary: Full diff: https://github.com/llvm/llvm-project/pull/80183.diff 2 Files Affected:
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 74dfd1d214e84..a32238381312c 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
//===----------------------------------------------------------------------===//
TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc", "wavefrontsize64")
+BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc")
// Deprecated intrinsics in favor of __builtin_amdgn_ballot_{w32|w64}
BUILTIN(__builtin_amdgcn_uicmp, "WUiUiUiIi", "nc")
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
index 99e93acd9a213..99aa2d634b2a0 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
@@ -4,13 +4,10 @@
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s
+// expected-no-diagnostics
+
typedef unsigned long ulong;
void test_ballot_wave64(global ulong* out, int a, int b) {
- *out = __builtin_amdgcn_ballot_w64(a == b); // expected-error {{'__builtin_amdgcn_ballot_w64' needs target feature wavefrontsize64}}
-}
-
-__attribute__((target("wavefrontsize64")))
-void test_ballot_wave64_target_attr(global ulong* out, int a, int b) {
*out = __builtin_amdgcn_ballot_w64(a == b);
}
|
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") | |||
//===----------------------------------------------------------------------===// | |||
|
|||
TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32") | |||
TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc", "wavefrontsize64") | |||
BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether this will cause the intrinsic being removed by backend if the required target feature is missing. I remember some pass is added for that.
Also removing the check increases chance of miss-use in HIP or OpenCL.
device libs bypass this requirement by adding __attribute__((target("wavefrontsize64")))
to the callers. can you do the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is simply on the return value from the intrinsic. It's always legal to do type promotion, so on a system with a 32 wide wavefront it will just get promoted to a 64-bit value which will be correct.
The ROCm-Device-Libs can only do this because they have the __oclc_wavefrontsize64
variable, which I don't want to copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for reference, the main difference is whether or not we read exec_hi
, here's some code https://godbolt.org/z/z64Yoo7ve. I believe @arsenm said that exec_hi
is simply zero when read from wave32 mode? I could double check that.
The problem with using the attributes is that it puts the attribute into the function, which prevents it from being called in wave32
correctly. This would require some compile-time switch like the ROCm libs, or perhaps a runtime check on the wave size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some basic checks and it works as expected when called from w32 code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target attribute is just a giant footgun for AMDGPU. This one only works in the library with special handholding by the compiler
After this change is there any value in having two different builtins? You could just have one that always return 64 bits. |
__attribute__((target("wavefrontsize64"))) | ||
void test_ballot_wave64_target_attr(global ulong* out, int a, int b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should keep the target_attr test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I kept the attribute it would result in
(frontend): invalid feature combination: 'wavefrontsize32' and 'wavefrontsize64' are mutually exclusive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just got rid of the +w32
run lines.
I personally think it would be better to just have the one, but I figured that decision was made earlier and it would break backwards compatibility. |
Summary: Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64 targets even though it is valid. This is relevant for making library code that can handle both without needing to check the wavefront size. This patch relaxes the semantic check for w64 so it can be used normally.
Summary: Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64 targets even though it is valid. This is relevant for making library code that can handle both without needing to check the wavefront size. This patch relaxes the semantic check for w64 so it can be used normally.
Summary: Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64 targets even though it is valid. This is relevant for making library code that can handle both without needing to check the wavefront size. This patch relaxes the semantic check for w64 so it can be used normally. (cherry picked from commit 5249379) Change-Id: Ief434d2bc0928a5bdc3149651357b94706611f09
Summary: Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64 targets even though it is valid. This is relevant for making library code that can handle both without needing to check the wavefront size. This patch relaxes the semantic check for w64 so it can be used normally. (cherry picked from commit 5249379) Change-Id: Ief434d2bc0928a5bdc3149651357b94706611f09
Summary:
Currently we cannot compile
__builtin_amdgcn_ballot_w64
on non-wave64targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.