Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." #80303
Merged
…ve32 mode." (llvm#78429) This reverts commit 9791e54. Added lit test constraint: REQUIRES: amdgpu-registered-target
llvmbot added the clang, backend:AMDGPU, and llvm:transforms labels on Feb 1, 2024
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-amdgpu
Author: Valery Pykhtin (vpykhtin)
Changes
This reverts #78429 and reapplies #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`
Full diff: https://github.com/llvm/llvm-project/pull/80303.diff 4 Files Affected:
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
index 43553131f63c5..da1ae24443155 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
@@ -1,3 +1,4 @@
+// REQUIRES: amdgpu-registered-target
// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -D__AMDGCN_WAVEFRONT_SIZE=32 -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
@@ -24,13 +25,11 @@ void test_ballot_wave32_target_attr(global uint* out, int a, int b)
}
// CHECK-LABEL: @test_read_exec(
-// CHECK: call i64 @llvm.amdgcn.ballot.i64(i1 true)
+// CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 true)
void test_read_exec(global uint* out) {
*out = __builtin_amdgcn_read_exec();
}
-// CHECK: declare i64 @llvm.amdgcn.ballot.i64(i1) #[[$NOUNWIND_READONLY:[0-9]+]]
-
// CHECK-LABEL: @test_read_exec_lo(
// CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 true)
void test_read_exec_lo(global uint* out) {
@@ -38,9 +37,7 @@ void test_read_exec_lo(global uint* out) {
}
// CHECK-LABEL: @test_read_exec_hi(
-// CHECK: call i64 @llvm.amdgcn.ballot.i64(i1 true)
-// CHECK: lshr i64 [[A:%.*]], 32
-// CHECK: trunc i64 [[B:%.*]] to i32
+// CHECK: store i32 0, ptr addrspace(1) %out
void test_read_exec_hi(global uint* out) {
*out = __builtin_amdgcn_read_exec_hi();
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 07997b7e1678a..5278b552a6551 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -2391,7 +2391,7 @@ void AMDGPUDAGToDAGISel::SelectBRCOND(SDNode *N) {
auto CC = cast<CondCodeSDNode>(Cond->getOperand(2))->get();
if ((CC == ISD::SETEQ || CC == ISD::SETNE) &&
isNullConstant(Cond->getOperand(1)) &&
- // TODO: make condition below an assert after fixing ballot bitwidth.
+ // We may encounter ballot.i64 in wave32 mode on -O0.
VCMP.getValueType().getSizeInBits() == ST->getWavefrontSize()) {
// %VCMP = i(WaveSize) AMDGPUISD::SETCC ...
// %C = i1 ISD::SETCC %VCMP, 0, setne/seteq
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index cf1e62151046c..703b0738a3855 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -990,6 +990,19 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
return IC.replaceInstUsesWith(II, Constant::getNullValue(II.getType()));
}
}
+ if (ST->isWave32() && II.getType()->getIntegerBitWidth() == 64) {
+ // %b64 = call i64 ballot.i64(...)
+ // =>
+ // %b32 = call i32 ballot.i32(...)
+ // %b64 = zext i32 %b32 to i64
+ Value *Call = IC.Builder.CreateZExt(
+ IC.Builder.CreateIntrinsic(Intrinsic::amdgcn_ballot,
+ {IC.Builder.getInt32Ty()},
+ {II.getArgOperand(0)}),
+ II.getType());
+ Call->takeName(&II);
+ return IC.replaceInstUsesWith(II, Call);
+ }
break;
}
case Intrinsic::amdgcn_wqm_vote: {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
index 804283cc20cd6..94c32e3cbe99f 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
@@ -2599,7 +2599,8 @@ declare i32 @llvm.amdgcn.ballot.i32(i1) nounwind readnone convergent
define i64 @ballot_nocombine_64(i1 %i) {
; CHECK-LABEL: @ballot_nocombine_64(
-; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 [[I:%.*]])
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.ballot.i32(i1 [[I:%.*]])
+; CHECK-NEXT: [[B:%.*]] = zext i32 [[TMP1]] to i64
; CHECK-NEXT: ret i64 [[B]]
;
%b = call i64 @llvm.amdgcn.ballot.i64(i1 %i)
@@ -2616,7 +2617,8 @@ define i64 @ballot_zero_64() {
define i64 @ballot_one_64() {
; CHECK-LABEL: @ballot_one_64(
-; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.ballot.i32(i1 true)
+; CHECK-NEXT: [[B:%.*]] = zext i32 [[TMP1]] to i64
; CHECK-NEXT: ret i64 [[B]]
;
%b = call i64 @llvm.amdgcn.ballot.i64(i1 1)
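The rewrite in the diff above relies on a wave32 wavefront having only 32 lanes, so a 64-bit ballot can never set bits 32..63. The following is a minimal plain C++ model of that reasoning, not LLVM code: `LanePredicates`, `ballot32`, `ballot64`, `ballot64ViaZext`, `readExecHi`, and `checkRewrite` are hypothetical names introduced for illustration, and the model assumes every lane is active.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a wave32 wavefront: one predicate per lane.
// Assumption: all 32 lanes are active (the exec mask is ignored).
constexpr unsigned kWaveSize = 32;
using LanePredicates = std::array<bool, kWaveSize>;

// Models llvm.amdgcn.ballot.i32: bit i is set when lane i's predicate is true.
std::uint32_t ballot32(const LanePredicates &pred) {
  std::uint32_t mask = 0;
  for (unsigned lane = 0; lane < kWaveSize; ++lane)
    if (pred[lane])
      mask |= std::uint32_t{1} << lane;
  return mask;
}

// Models llvm.amdgcn.ballot.i64 in wave32 mode: lanes 32..63 do not exist,
// so only the low 32 bits can ever be set.
std::uint64_t ballot64(const LanePredicates &pred) {
  std::uint64_t mask = 0;
  for (unsigned lane = 0; lane < kWaveSize; ++lane)
    if (pred[lane])
      mask |= std::uint64_t{1} << lane;
  return mask;
}

// Models the rewritten form: %b64 = zext i32 (ballot.i32 ...) to i64.
std::uint64_t ballot64ViaZext(const LanePredicates &pred) {
  return static_cast<std::uint64_t>(ballot32(pred));
}

// Models the read_exec_hi pattern (lshr by 32, then trunc to i32) applied
// to the zero-extended value; the result is always 0 in this model.
std::uint32_t readExecHi(const LanePredicates &pred) {
  return static_cast<std::uint32_t>(ballot64ViaZext(pred) >> 32);
}

// Checks that both forms agree for one set of lane predicates.
void checkRewrite(const LanePredicates &pred) {
  assert(ballot64(pred) == ballot64ViaZext(pred));
  assert(readExecHi(pred) == 0u);
}
```

In this model the high half of the zero-extended value is always zero, which is consistent with the updated `test_read_exec_hi` check expecting `store i32 0`.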
arsenm approved these changes on Feb 2, 2024
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request on Feb 5, 2024
…ve32 mode." (llvm#80303) Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e54.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request on May 1, 2024
…ve32 mode." (llvm#80303) Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e54. (cherry picked from commit b8025d1) Change-Id: I03aafda08ca433456f6e82accd6b702a307bfd0b
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request on Jun 4, 2024
…ve32 mode." (llvm#80303) Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e54. (cherry picked from commit b8025d1) Change-Id: I03aafda08ca433456f6e82accd6b702a307bfd0b
This reverts #78429 and reapplies #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`