[AMDGPU] Support bfloat comparison for ballot intrinsic #165495

changpeng · 2025-10-29T00:19:46Z

We do not have native instructions for direct bfloat comparisons.
However, we can expand bfloat to float, and do float comparison instead.

TODO: handle bfloat comparison for ballot intrinsic on global isel path.

Fixes: SWDEV-563403


llvmbot · 2025-10-29T00:20:17Z

@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

We do not have native instructions for direct bfloat comparisons.
However, we can expand bfloat to float, and do float comparison instead.

TODO: handle bfloat comparison for ballot intrinsic on global isel path.

Fixes: SWDEV-563403

Full diff: https://github.com/llvm/llvm-project/pull/165495.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+8-2)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll (+21)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll (+12)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index be4229155c983..2c90ec2088e45 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7035,9 +7035,15 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
   SDLoc SL(N);
 
   if (Src.getOpcode() == ISD::SETCC) {
+    SDValue Op0 = Src.getOperand(0);
+    SDValue Op1 = Src.getOperand(1);
+    // Need to expand bfloat to float for comparison (setcc).
+    if (Op0.getValueType() == MVT::bf16) {
+      Op0 = DAG.getNode(ISD::FP_EXTEND, SL, MVT::f32, Op0);
+      Op1 = DAG.getNode(ISD::FP_EXTEND, SL, MVT::f32, Op1);
+    }
     // (ballot (ISD::SETCC ...)) -> (AMDGPUISD::SETCC ...)
-    return DAG.getNode(AMDGPUISD::SETCC, SL, VT, Src.getOperand(0),
-                       Src.getOperand(1), Src.getOperand(2));
+    return DAG.getNode(AMDGPUISD::SETCC, SL, VT, Op0, Op1, Src.getOperand(2));
   }
   if (const ConstantSDNode *Arg = dyn_cast<ConstantSDNode>(Src)) {
     // (ballot 0) -> 0
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
index e00e1f13b2b77..9940ea70d3467 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll
@@ -591,3 +591,24 @@ exit:
   store i32 %ballot, ptr addrspace(1) %out
   ret void
 }
+
+define amdgpu_cs i32 @compare_bfloats(bfloat %x, bfloat %y) {
+; GFX10-LABEL: compare_bfloats:
+; GFX10:       ; %bb.0:
+; GFX10-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; GFX10-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
+; GFX10-NEXT:    v_cmp_gt_f32_e64 s0, v0, v1
+; GFX10-NEXT:    ; return to shader part epilog
+;
+; GFX11-LABEL: compare_bfloats:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    v_mov_b16_e32 v2.l, 0
+; GFX11-NEXT:    v_mov_b16_e32 v2.h, v1.l
+; GFX11-NEXT:    v_mov_b16_e32 v1.h, v0.l
+; GFX11-NEXT:    v_mov_b16_e32 v1.l, v2.l
+; GFX11-NEXT:    v_cmp_gt_f32_e64 s0, v1, v2
+; GFX11-NEXT:    ; return to shader part epilog
+  %cmp = fcmp ogt bfloat %x, %y
+  %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
+  ret i32 %ballot
+}
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll
index b4adf7f641550..1720a62eb6367 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll
@@ -557,3 +557,15 @@ exit:
   store i64 %ballot, ptr addrspace(1) %out
   ret void
 }
+
+define amdgpu_cs i64 @compare_bfloats(bfloat %x, bfloat %y) {
+; CHECK-LABEL: compare_bfloats:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; CHECK-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
+; CHECK-NEXT:    v_cmp_gt_f32_e64 s[0:1], v0, v1
+; CHECK-NEXT:    ; return to shader part epilog
+  %cmp = fcmp ogt bfloat %x, %y
+  %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
+  ret i64 %ballot
+}
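
For readers wondering why the checks above show only a 16-bit left shift before `v_cmp_gt_f32`: bf16 has the same layout as the high half of an IEEE-754 binary32 value, so widening to f32 is exact and the f32 comparison gives the same answer as a bf16 comparison would. A minimal host-side sketch of that idea follows. It is not the backend code; the helper names `bf16BitsToFloat` and `ballotOGT` and the 32-lane model are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen raw bf16 bits to float. bf16 is the high half of a binary32 value,
// so the extension is a 16-bit left shift and loses no information.
inline float bf16BitsToFloat(uint16_t Bits) {
  uint32_t Wide = static_cast<uint32_t>(Bits) << 16;
  float F;
  std::memcpy(&F, &Wide, sizeof(F));
  return F;
}

// Hypothetical model of a 32-lane ballot over an ordered "x > y" compare.
// Lane i sets bit i of the result when its comparison is true. A NaN in
// either operand makes the comparison false, matching `fcmp ogt`.
inline uint32_t ballotOGT(const std::vector<uint16_t> &X,
                          const std::vector<uint16_t> &Y) {
  assert(X.size() == Y.size() && X.size() <= 32 && "at most 32 lanes");
  uint32_t Mask = 0;
  for (std::size_t Lane = 0; Lane < X.size(); ++Lane)
    if (bf16BitsToFloat(X[Lane]) > bf16BitsToFloat(Y[Lane]))
      Mask |= uint32_t{1} << Lane;
  return Mask;
}
```

In this model, lanes holding (2.0, 1.0), (1.0, 2.0), (NaN, 1.0) and (1.0, -1.0) would set bits 0 and 3 only.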

jayfoad · 2025-10-29T10:00:48Z

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

  if (Src.getOpcode() == ISD::SETCC) {
+    SDValue Op0 = Src.getOperand(0);
+    SDValue Op1 = Src.getOperand(1);
+    // Need to expand bfloat to float for comparison (setcc).


Surely generic legalization of ISD::SETCC should already promote bf16 to f32? And if that doesn't work because the ISD::SETCC hasn't been legalized yet, can't we just use the same generic machinery to promote bf16 AMDGPUISD::SETCC to f32?

Yes, legalization of ISD::SETCC correctly promotes bf16 to f32. But apparently the ballot intrinsic is lowered to AMDGPUISD::SETCC here, so we have to promote bf16 to f32. What is the "generic machinery to promote bf16"?

Look at lowerFCMPIntrinsic above this function: a similar approach was used there to promote f16 to f32 when f16 is not legal.

rampitec

LGTM

arsenm

Description is not really accurate for what this actually does


[AMDGPU] Support bfloat comparison for ballot intrinsic

76e4afb


llvmbot added the backend:AMDGPU label Oct 29, 2025

changpeng requested a review from rampitec October 29, 2025 00:20

changpeng requested review from arsenm, bcahoon and shiltian October 29, 2025 00:20

Fix a typo

a84e5e8

jayfoad reviewed Oct 29, 2025


rampitec approved these changes Oct 29, 2025


changpeng merged commit 6b5afdc into llvm:main Oct 30, 2025
10 checks passed

changpeng deleted the ballot branch October 30, 2025 16:44

arsenm reviewed Oct 30, 2025

