Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Allow w64 ballot to be used on w32 targets #80183

Merged
merged 1 commit into from
Feb 5, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jan 31, 2024

Summary:
Currently we cannot compile __builtin_amdgcn_ballot_w64 on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 31, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 31, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
Currently we cannot compile __builtin_amdgcn_ballot_w64 on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.


Full diff: https://github.com/llvm/llvm-project/pull/80183.diff

2 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+1-1)
  • (modified) clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl (+2-5)
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 74dfd1d214e84..a32238381312c 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
 //===----------------------------------------------------------------------===//
 
 TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc", "wavefrontsize64")
+BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc")
 
 // Deprecated intrinsics in favor of __builtin_amdgn_ballot_{w32|w64}
 BUILTIN(__builtin_amdgcn_uicmp, "WUiUiUiIi", "nc")
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
index 99e93acd9a213..99aa2d634b2a0 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
@@ -4,13 +4,10 @@
 // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s
 // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s
 
+// expected-no-diagnostics
+
 typedef unsigned long ulong;
 
 void test_ballot_wave64(global ulong* out, int a, int b) {
-  *out = __builtin_amdgcn_ballot_w64(a == b);  // expected-error {{'__builtin_amdgcn_ballot_w64' needs target feature wavefrontsize64}}
-}
-
-__attribute__((target("wavefrontsize64")))
-void test_ballot_wave64_target_attr(global ulong* out, int a, int b) {
   *out = __builtin_amdgcn_ballot_w64(a == b);
 }

@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
//===----------------------------------------------------------------------===//

TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc", "wavefrontsize64")
BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether this will cause the intrinsic being removed by backend if the required target feature is missing. I remember some pass is added for that.

Also removing the check increases chance of miss-use in HIP or OpenCL.

device libs bypass this requirement by adding __attribute__((target("wavefrontsize64"))) to the callers. can you do the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is simply on the return value from the intrinsic. It's always legal to do type promotion, so on a system with a 32 wide wavefront it will just get promoted to a 64-bit value which will be correct.

The ROCm-Device-Libs can only do this because they have the __oclc_wavefrontsize64 variable, which I don't want to copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for reference, the main difference is whether or not we read exec_hi, here's some code https://godbolt.org/z/z64Yoo7ve. I believe @arsenm said that exec_hi is simply zero when read from wave32 mode? I could double check that.

The problem with using the attributes is that it puts the attribute into the function, which prevents it from being called in wave32 correctly. This would require some compile-time switch like the ROCm libs, or perhaps a runtime check on the wave size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some basic checks and it works as expected when called from w32 code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target attribute is just a giant footgun for AMDGPU. This one only works in the library with special handholding by the compiler

@jhuber6 jhuber6 requested a review from shiltian January 31, 2024 19:44
@jayfoad
Copy link
Contributor

jayfoad commented Feb 1, 2024

After this change is there any value in having two different builtins? You could just have one that always return 64 bits.

Comment on lines 13 to 14
__attribute__((target("wavefrontsize64")))
void test_ballot_wave64_target_attr(global ulong* out, int a, int b) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should keep the target_attr test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I kept the attribute it would result in

(frontend): invalid feature combination: 'wavefrontsize32' and 'wavefrontsize64' are mutually exclusive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just got rid of the +w32 run lines.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 1, 2024

After this change is there any value in having two different builtins? You could just have one that always return 64 bits.

I personally think it would be better to just have the one, but I figured that decision was made earlier and it would break backwards compatibility.

Summary:
Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.
@jhuber6 jhuber6 merged commit 5249379 into llvm:main Feb 5, 2024
3 of 4 checks passed
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
Summary:
Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request May 1, 2024
Summary:
Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.

(cherry picked from commit 5249379)
Change-Id: Ief434d2bc0928a5bdc3149651357b94706611f09
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request Jun 4, 2024
Summary:
Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.

(cherry picked from commit 5249379)
Change-Id: Ief434d2bc0928a5bdc3149651357b94706611f09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants