Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." #80303

Merged
merged 1 commit into from
Feb 2, 2024

Conversation

vpykhtin
Copy link
Contributor

@vpykhtin vpykhtin commented Feb 1, 2024

This reverts #78429 and reapplies #71556 with added lit test constraint:

REQUIRES: amdgpu-registered-target

…ve32 mode." (llvm#78429)

This reverts commit 9791e54.

Added lit test constraint: REQUIRES: amdgpu-registered-target
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU llvm:transforms labels Feb 1, 2024
@vpykhtin vpykhtin requested a review from arsenm February 1, 2024 15:56
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 1, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Valery Pykhtin (vpykhtin)

Changes

This reverts #78429 and reapplies #71556 with added lit test constraint:

REQUIRES: amdgpu-registered-target


Full diff: https://github.com/llvm/llvm-project/pull/80303.diff

4 Files Affected:

  • (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl (+3-6)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+13)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll (+4-2)
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
index 43553131f63c5..da1ae24443155 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave32.cl
@@ -1,3 +1,4 @@
+// REQUIRES: amdgpu-registered-target
 // RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -D__AMDGCN_WAVEFRONT_SIZE=32 -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
 // RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
 // RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -target-feature +wavefrontsize32 -S -emit-llvm -o - %s | FileCheck -enable-var-scope %s
@@ -24,13 +25,11 @@ void test_ballot_wave32_target_attr(global uint* out, int a, int b)
 }
 
 // CHECK-LABEL: @test_read_exec(
-// CHECK: call i64 @llvm.amdgcn.ballot.i64(i1 true)
+// CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 true)
 void test_read_exec(global uint* out) {
   *out = __builtin_amdgcn_read_exec();
 }
 
-// CHECK: declare i64 @llvm.amdgcn.ballot.i64(i1) #[[$NOUNWIND_READONLY:[0-9]+]]
-
 // CHECK-LABEL: @test_read_exec_lo(
 // CHECK: call i32 @llvm.amdgcn.ballot.i32(i1 true)
 void test_read_exec_lo(global uint* out) {
@@ -38,9 +37,7 @@ void test_read_exec_lo(global uint* out) {
 }
 
 // CHECK-LABEL: @test_read_exec_hi(
-// CHECK: call i64 @llvm.amdgcn.ballot.i64(i1 true)
-// CHECK: lshr i64 [[A:%.*]], 32
-// CHECK: trunc i64 [[B:%.*]] to i32
+// CHECK: store i32 0, ptr addrspace(1) %out
 void test_read_exec_hi(global uint* out) {
   *out = __builtin_amdgcn_read_exec_hi();
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 07997b7e1678a..5278b552a6551 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -2391,7 +2391,7 @@ void AMDGPUDAGToDAGISel::SelectBRCOND(SDNode *N) {
     auto CC = cast<CondCodeSDNode>(Cond->getOperand(2))->get();
     if ((CC == ISD::SETEQ || CC == ISD::SETNE) &&
         isNullConstant(Cond->getOperand(1)) &&
-        // TODO: make condition below an assert after fixing ballot bitwidth.
+        // We may encounter ballot.i64 in wave32 mode on -O0.
         VCMP.getValueType().getSizeInBits() == ST->getWavefrontSize()) {
       // %VCMP = i(WaveSize) AMDGPUISD::SETCC ...
       // %C = i1 ISD::SETCC %VCMP, 0, setne/seteq
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index cf1e62151046c..703b0738a3855 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -990,6 +990,19 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
         return IC.replaceInstUsesWith(II, Constant::getNullValue(II.getType()));
       }
     }
+    if (ST->isWave32() && II.getType()->getIntegerBitWidth() == 64) {
+      // %b64 = call i64 ballot.i64(...)
+      // =>
+      // %b32 = call i32 ballot.i32(...)
+      // %b64 = zext i32 %b32 to i64
+      Value *Call = IC.Builder.CreateZExt(
+          IC.Builder.CreateIntrinsic(Intrinsic::amdgcn_ballot,
+                                     {IC.Builder.getInt32Ty()},
+                                     {II.getArgOperand(0)}),
+          II.getType());
+      Call->takeName(&II);
+      return IC.replaceInstUsesWith(II, Call);
+    }
     break;
   }
   case Intrinsic::amdgcn_wqm_vote: {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
index 804283cc20cd6..94c32e3cbe99f 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
@@ -2599,7 +2599,8 @@ declare i32 @llvm.amdgcn.ballot.i32(i1) nounwind readnone convergent
 
 define i64 @ballot_nocombine_64(i1 %i) {
 ; CHECK-LABEL: @ballot_nocombine_64(
-; CHECK-NEXT:    [[B:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 [[I:%.*]])
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.amdgcn.ballot.i32(i1 [[I:%.*]])
+; CHECK-NEXT:    [[B:%.*]] = zext i32 [[TMP1]] to i64
 ; CHECK-NEXT:    ret i64 [[B]]
 ;
   %b = call i64 @llvm.amdgcn.ballot.i64(i1 %i)
@@ -2616,7 +2617,8 @@ define i64 @ballot_zero_64() {
 
 define i64 @ballot_one_64() {
 ; CHECK-LABEL: @ballot_one_64(
-; CHECK-NEXT:    [[B:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.amdgcn.ballot.i32(i1 true)
+; CHECK-NEXT:    [[B:%.*]] = zext i32 [[TMP1]] to i64
 ; CHECK-NEXT:    ret i64 [[B]]
 ;
   %b = call i64 @llvm.amdgcn.ballot.i64(i1 1)

@vpykhtin vpykhtin merged commit b8025d1 into llvm:main Feb 2, 2024
7 checks passed
@vpykhtin vpykhtin deleted the reapply_ballot_width branch February 2, 2024 12:09
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
…ve32 mode." (llvm#80303)

Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.

This reverts commit 9791e54.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request May 1, 2024
…ve32 mode." (llvm#80303)

Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.

This reverts commit 9791e54.

(cherry picked from commit b8025d1)
Change-Id: I03aafda08ca433456f6e82accd6b702a307bfd0b
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request Jun 4, 2024
…ve32 mode." (llvm#80303)

Reapply llvm#71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.

This reverts commit 9791e54.

(cherry picked from commit b8025d1)
Change-Id: I03aafda08ca433456f6e82accd6b702a307bfd0b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang Clang issues not falling into any other category llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants