[AMDGPU] Add a trap lowering workaround for gfx11 #85854

epilk · 2024-03-19T19:40:43Z

On gfx11, shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, so it is not a correct lowering for the trap intrinsic. As a workaround, this commit lowers the trap intrinsic to a sequence of instructions that simulates the behavior of `s_trap 2`.

Fixes: SWDEV-438421
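
For readers skimming the diff, here is a small standalone sketch (not part of the patch) of the value that the simulated sequence places in M0 before `S_SENDMSG`. It mirrors the `S_AND_B32 ..., 0x3ff` / `S_OR_B32 ..., 0x400` pair in `insertSimulatedTrap`. The function and constant names are hypothetical; `WaveAbortBit` follows the `SetWaveAbortBit` register name in the patch, and reading the low 10 bits as a doorbell ID is an assumption based on the `DoorbellReg` name.

```cpp
#include <cstdint>

// Hypothetical names, chosen for illustration only.
constexpr std::uint32_t DoorbellIdMask = 0x3ff; // operand of S_AND_B32
constexpr std::uint32_t WaveAbortBit = 0x400;   // operand of S_OR_B32 (bit 10)

// Models the scalar ALU work done on the result of S_SENDMSG_RTN_B32.
constexpr std::uint32_t simulatedTrapM0(std::uint32_t DoorbellReg) {
  return (DoorbellReg & DoorbellIdMask) | WaveAbortBit;
}

static_assert(simulatedTrapM0(0x0) == 0x400, "only bit 10 is set");
static_assert(simulatedTrapM0(0x1234) == 0x634, "upper bits are masked off");
static_assert(simulatedTrapM0(0xffffffff) == 0x7ff, "low 11 bits at most");
```

The gfx1100 check lines in `trap-abis.ll` show the OR folded into `s_bitset1_b32 s0, 10`, which sets the same bit.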

llvmbot · 2024-03-19T19:41:13Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Emma Pilkington (epilk)

Full diff: https://github.com/llvm/llvm-project/pull/85854.diff

11 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h (+3)
(modified) llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td (+2)
(modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+8-2)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+2)
(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+11)
(modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+47)
(modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+9)
(modified) llvm/lib/Target/AMDGPU/SIInstructions.td (+6)
(added) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir (+40)
(modified) llvm/test/CodeGen/AMDGPU/trap-abis.ll (+68)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index bee43b6c18c880..1a4711dc06c4bd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -5376,6 +5376,7 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(RETURN_TO_EPILOG)
   NODE_NAME_CASE(ENDPGM)
   NODE_NAME_CASE(ENDPGM_TRAP)
+  NODE_NAME_CASE(SIMULATED_TRAP)
   NODE_NAME_CASE(DWORDADDR)
   NODE_NAME_CASE(FRACT)
   NODE_NAME_CASE(SETCC)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
index f10a357125e562..72661a8d29f816 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
@@ -407,6 +407,9 @@ enum NodeType : unsigned {
   // s_endpgm, but we may want to insert it in the middle of the block.
   ENDPGM_TRAP,
 
+  // "s_trap 2" equivalent on hardware that does not support it.
+  SIMULATED_TRAP,
+
   // Return to a shader part's epilog code.
   RETURN_TO_EPILOG,
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td b/llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
index 82f58ea38fd0a7..702f6e67c55271 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
@@ -377,6 +377,8 @@ def AMDGPUendpgm : SDNode<"AMDGPUISD::ENDPGM", SDTNone,
     [SDNPHasChain, SDNPOptInGlue]>;
 def AMDGPUendpgm_trap : SDNode<"AMDGPUISD::ENDPGM_TRAP", SDTNone,
     [SDNPHasChain]>;
+def AMDGPUsimulated_trap : SDNode<"AMDGPUISD::SIMULATED_TRAP", SDTNone,
+    [SDNPHasChain]>;
 
 def AMDGPUreturn_to_epilog : SDNode<"AMDGPUISD::RETURN_TO_EPILOG", SDTNone,
     [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 90872516dd6db1..02921bee91a417 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6720,8 +6720,14 @@ bool AMDGPULegalizerInfo::legalizeTrapHsaQueuePtr(
 
 bool AMDGPULegalizerInfo::legalizeTrapHsa(
     MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B) const {
-  B.buildInstr(AMDGPU::S_TRAP)
-      .addImm(static_cast<unsigned>(GCNSubtarget::TrapID::LLVMAMDHSATrap));
+  if (!ST.requiresSimulatedTrap()) {
+    B.buildInstr(AMDGPU::S_TRAP)
+        .addImm(static_cast<unsigned>(GCNSubtarget::TrapID::LLVMAMDHSATrap));
+    MI.eraseFromParent();
+    return true;
+  }
+
+  ST.getInstrInfo()->insertSimulatedTrap(MRI, B.getMBB(), MI, MI.getDebugLoc());
   MI.eraseFromParent();
   return true;
 }
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index ca51da659c3311..a7eab6caa9aba4 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -443,6 +443,8 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
     return isAmdHsaOS() ? TrapHandlerAbi::AMDHSA : TrapHandlerAbi::NONE;
   }
 
+  bool requiresSimulatedTrap() const { return getGeneration() == GFX11; }
+
   bool supportsGetDoorbellID() const {
     // The S_GETREG DOORBELL_ID is supported by all GFX9 onward targets.
     return getGeneration() >= GFX9;
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 5ccf21f76015de..25d8d2f968cdfa 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -5408,6 +5408,14 @@ MachineBasicBlock *SITargetLowering::EmitInstrWithCustomInserter(
     MI.eraseFromParent();
     return SplitBB;
   }
+  case AMDGPU::SIMULATED_TRAP: {
+    assert(Subtarget->requiresSimulatedTrap());
+    MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
+    MachineBasicBlock *SplitBB =
+        TII->insertSimulatedTrap(MRI, *BB, MI, MI.getDebugLoc());
+    MI.eraseFromParent();
+    return SplitBB;
+  }
   default:
     return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
   }
@@ -6621,6 +6629,9 @@ SDValue SITargetLowering::lowerTrapHsa(
   SDLoc SL(Op);
   SDValue Chain = Op.getOperand(0);
 
+  if (Subtarget->requiresSimulatedTrap())
+    return DAG.getNode(AMDGPUISD::SIMULATED_TRAP, SL, MVT::Other, Chain);
+
   uint64_t TrapID = static_cast<uint64_t>(GCNSubtarget::TrapID::LLVMAMDHSATrap);
   SDValue Ops[] = {
     Chain,
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index f4b21b7dfac391..0d5928cc25e3b7 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2026,6 +2026,53 @@ void SIInstrInfo::insertReturn(MachineBasicBlock &MBB) const {
   }
 }
 
+MachineBasicBlock *SIInstrInfo::insertSimulatedTrap(MachineRegisterInfo &MRI,
+                                                    MachineBasicBlock &MBB,
+                                                    MachineInstr &MI,
+                                                    const DebugLoc &DL) const {
+  MachineFunction *MF = MBB.getParent();
+  MachineBasicBlock *SplitBB = MBB.splitAt(MI, /*UpdateLiveIns=*/false);
+  MachineBasicBlock *HaltLoop = MF->CreateMachineBasicBlock();
+  MF->push_back(HaltLoop);
+
+  // Start with a `s_trap 2`, if we're in PRIV=1 and we need the workaround this
+  // will be a nop.
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_TRAP))
+      .addImm(static_cast<unsigned>(GCNSubtarget::TrapID::LLVMAMDHSATrap));
+  Register DoorbellReg = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_SENDMSG_RTN_B32), DoorbellReg)
+      .addImm(AMDGPU::SendMsg::ID_RTN_GET_DOORBELL);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B32), AMDGPU::TTMP2)
+      .addUse(AMDGPU::M0);
+  Register And0x3ff = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_AND_B32), And0x3ff)
+      .addUse(DoorbellReg)
+      .addImm(0x3ff);
+  Register SetWaveAbortBit =
+      MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_OR_B32), SetWaveAbortBit)
+      .addUse(And0x3ff)
+      .addImm(0x400);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B32), AMDGPU::M0)
+      .addUse(SetWaveAbortBit);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_SENDMSG))
+      .addImm(AMDGPU::SendMsg::ID_INTERRUPT);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_MOV_B32), AMDGPU::M0)
+      .addUse(AMDGPU::TTMP2);
+  BuildMI(MBB, MI, DL, get(AMDGPU::S_BRANCH)).addMBB(HaltLoop);
+
+  BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_SETHALT)).addImm(5);
+  BuildMI(*HaltLoop, HaltLoop->end(), DL, get(AMDGPU::S_BRANCH))
+      .addMBB(HaltLoop);
+
+  if (SplitBB != &MBB)
+    MBB.removeSuccessor(SplitBB);
+  MBB.addSuccessor(HaltLoop);
+  HaltLoop->addSuccessor(HaltLoop);
+
+  return SplitBB;
+}
+
 unsigned SIInstrInfo::getNumWaitStates(const MachineInstr &MI) {
   switch (MI.getOpcode()) {
   default:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index a62bf779fe2e2d..1ba16bffa5e28e 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -1194,6 +1194,15 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
                    unsigned Quantity) const override;
 
   void insertReturn(MachineBasicBlock &MBB) const;
+
+  /// Build instructions that simulate the behavior of a `s_trap 2` instructions
+  /// for hardware (namely, gfx11) that runs in PRIV=1 mode. There, s_trap is
+  /// interpreted as a nop.
+  MachineBasicBlock *insertSimulatedTrap(MachineRegisterInfo &MRI,
+                                         MachineBasicBlock &MBB,
+                                         MachineInstr &MI,
+                                         const DebugLoc &DL) const;
+
   /// Return the number of wait states that result from executing this
   /// instruction.
   static unsigned getNumWaitStates(const MachineInstr &MI);
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 1c942dcefdacea..3f9a048a329b2a 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -106,6 +106,12 @@ def ENDPGM_TRAP : SPseudoInstSI<
   let usesCustomInserter = 1;
 }
 
+def SIMULATED_TRAP : SPseudoInstSI<(outs), (ins), [(AMDGPUsimulated_trap)],
+                                   "SIMULATED_TRAP"> {
+  let hasSideEffects = 1;
+  let usesCustomInserter = 1;
+}
+
 def ATOMIC_FENCE : SPseudoInstSI<
   (outs), (ins i32imm:$ordering, i32imm:$scope),
   [(atomic_fence (i32 timm:$ordering), (i32 timm:$scope))],
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
new file mode 100644
index 00000000000000..0795e2b68bec98
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir
@@ -0,0 +1,40 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+# RUN: llc -global-isel=1 -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -o - -run-pass=legalizer %s | FileCheck -check-prefix=GCN %s
+
+---
+name: test_trap
+body: |
+  bb.0:
+    ; GCN-LABEL: name: test_trap
+    ; GCN: successors: %bb.2(0x80000000)
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; GCN-NEXT: [[C1:%[0-9]+]]:_(p1) = G_CONSTANT i64 0
+    ; GCN-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+    ; GCN-NEXT: S_TRAP 2
+    ; GCN-NEXT: [[S_SENDMSG_RTN_B32_:%[0-9]+]]:sreg_32 = S_SENDMSG_RTN_B32 128
+    ; GCN-NEXT: $ttmp2 = S_MOV_B32 $m0
+    ; GCN-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_SENDMSG_RTN_B32_]], 1023, implicit-def $scc
+    ; GCN-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], 1024, implicit-def $scc
+    ; GCN-NEXT: $m0 = S_MOV_B32 [[S_OR_B32_]]
+    ; GCN-NEXT: S_SENDMSG 1, implicit $exec, implicit $m0
+    ; GCN-NEXT: $m0 = S_MOV_B32 $ttmp2
+    ; GCN-NEXT: S_BRANCH %bb.2
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: .1:
+    ; GCN-NEXT: successors:
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: G_STORE [[C]](s32), [[C1]](p1) :: (store (s8), addrspace 1)
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: .2:
+    ; GCN-NEXT: successors: %bb.2(0x80000000)
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: S_SETHALT 5
+    ; GCN-NEXT: S_BRANCH %bb.2
+    %0:_(s8) = G_CONSTANT i8 0
+    %1:_(p1) = G_CONSTANT i64 0
+    G_STORE %0, %1 :: (store 1, addrspace 1)
+    G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.trap)
+    G_STORE %0, %1 :: (store 1, addrspace 1)
+
+...
diff --git a/llvm/test/CodeGen/AMDGPU/trap-abis.ll b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
index 3cd6c98ef4b8e0..8d5d752925742b 100644
--- a/llvm/test/CodeGen/AMDGPU/trap-abis.ll
+++ b/llvm/test/CodeGen/AMDGPU/trap-abis.ll
@@ -3,6 +3,7 @@
 ; RUN: llc %s -o - -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -verify-machineinstrs | FileCheck --check-prefix=HSA-TRAP-GFX803 %s
 ; RUN: llc %s -o - -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs | FileCheck --check-prefix=HSA-TRAP-GFX900 %s
 ; RUN: llc %s -o - -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-trap-handler -verify-machineinstrs | FileCheck --check-prefix=HSA-NOTRAP-GFX900 %s
+; RUN: llc %s -o - -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs | FileCheck --check-prefix=HSA-TRAP-GFX1100 %s
 
 declare void @llvm.trap() #0
 declare void @llvm.debugtrap() #1
@@ -49,6 +50,27 @@ define amdgpu_kernel void @trap(ptr addrspace(1) nocapture readonly %arg0) {
 ; HSA-NOTRAP-GFX900-NEXT:    global_store_dword v0, v1, s[0:1]
 ; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; HSA-NOTRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX1100-LABEL: trap:
+; HSA-TRAP-GFX1100:       ; %bb.0:
+; HSA-TRAP-GFX1100-NEXT:    s_load_b64 s[0:1], s[0:1], 0x0
+; HSA-TRAP-GFX1100-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 1
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT:    s_trap 2
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
+; HSA-TRAP-GFX1100-NEXT:    s_bitset1_b32 s0, 10
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-NEXT:  .LBB0_1: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-NEXT:    s_sethalt 5
+; HSA-TRAP-GFX1100-NEXT:    s_branch .LBB0_1
   store volatile i32 1, ptr addrspace(1) %arg0
   call void @llvm.trap()
   unreachable
@@ -128,6 +150,37 @@ define amdgpu_kernel void @non_entry_trap(ptr addrspace(1) nocapture readonly %a
 ; HSA-NOTRAP-GFX900-NEXT:    s_endpgm
 ; HSA-NOTRAP-GFX900-NEXT:  .LBB1_2: ; %trap
 ; HSA-NOTRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX1100-LABEL: non_entry_trap:
+; HSA-TRAP-GFX1100:       ; %bb.0: ; %entry
+; HSA-TRAP-GFX1100-NEXT:    s_load_b64 s[0:1], s[0:1], 0x0
+; HSA-TRAP-GFX1100-NEXT:    v_mov_b32_e32 v0, 0
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    global_load_b32 v1, v0, s[0:1] glc dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt vmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    v_cmp_eq_u32_e32 vcc_lo, -1, v1
+; HSA-TRAP-GFX1100-NEXT:    s_cbranch_vccz .LBB1_2
+; HSA-TRAP-GFX1100-NEXT:  ; %bb.1: ; %ret
+; HSA-TRAP-GFX1100-NEXT:    v_mov_b32_e32 v1, 3
+; HSA-TRAP-GFX1100-NEXT:    global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT:    s_nop 0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; HSA-TRAP-GFX1100-NEXT:    s_endpgm
+; HSA-TRAP-GFX1100-NEXT:  .LBB1_2: ; %trap
+; HSA-TRAP-GFX1100-NEXT:    s_trap 2
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 ttmp2, m0
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    s_and_b32 s0, s0, 0x3ff
+; HSA-TRAP-GFX1100-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
+; HSA-TRAP-GFX1100-NEXT:    s_bitset1_b32 s0, 10
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, s0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_INTERRUPT)
+; HSA-TRAP-GFX1100-NEXT:    s_mov_b32 m0, ttmp2
+; HSA-TRAP-GFX1100-NEXT:  .LBB1_3: ; =>This Inner Loop Header: Depth=1
+; HSA-TRAP-GFX1100-NEXT:    s_sethalt 5
+; HSA-TRAP-GFX1100-NEXT:    s_branch .LBB1_3
 entry:
   %tmp29 = load volatile i32, ptr addrspace(1) %arg0
   %cmp = icmp eq i32 %tmp29, -1
@@ -197,6 +250,21 @@ define amdgpu_kernel void @debugtrap(ptr addrspace(1) nocapture readonly %arg0)
 ; HSA-NOTRAP-GFX900-NEXT:    global_store_dword v0, v2, s[0:1]
 ; HSA-NOTRAP-GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; HSA-NOTRAP-GFX900-NEXT:    s_endpgm
+;
+; HSA-TRAP-GFX1100-LABEL: debugtrap:
+; HSA-TRAP-GFX1100:       ; %bb.0:
+; HSA-TRAP-GFX1100-NEXT:    s_load_b64 s[0:1], s[0:1], 0x0
+; HSA-TRAP-GFX1100-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 1
+; HSA-TRAP-GFX1100-NEXT:    v_mov_b32_e32 v2, 2
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt lgkmcnt(0)
+; HSA-TRAP-GFX1100-NEXT:    global_store_b32 v0, v1, s[0:1] dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT:    s_trap 3
+; HSA-TRAP-GFX1100-NEXT:    global_store_b32 v0, v2, s[0:1] dlc
+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT:    s_nop 0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; HSA-TRAP-GFX1100-NEXT:    s_endpgm
   store volatile i32 1, ptr addrspace(1) %arg0
   call void @llvm.debugtrap()
   store volatile i32 2, ptr addrspace(1) %arg0

arsenm · 2024-03-20T07:37:49Z

llvm/test/CodeGen/AMDGPU/trap-abis.ll

+; HSA-TRAP-GFX1100-NEXT:    s_waitcnt_vscnt null, 0x0
+; HSA-TRAP-GFX1100-NEXT:    s_nop 0
+; HSA-TRAP-GFX1100-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; HSA-TRAP-GFX1100-NEXT:    s_endpgm
  store volatile i32 1, ptr addrspace(1) %arg0


Should have a -O0 test. Is this emulation safe in the presence of any spills?

epilk · 2024-03-21

Sure, I just added an -O0 test in the update. I'm not sure I understand why spills would potentially be a problem. Are you concerned about the use of the M0 register?
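
To make the M0 handling concrete, the following toy model (an illustration, not code from the patch) walks through the register traffic of the emitted sequence: M0 is copied to `ttmp2`, overwritten with the message payload, read implicitly by `S_SENDMSG`, and then restored. `ToyWave` and `runSimulatedTrapPrologue` are hypothetical names, and the model says nothing about whether spill code could interfere; it only shows that the sequence leaves M0 as it found it.

```cpp
#include <cstdint>

// Hypothetical register file with only the state this sequence touches.
struct ToyWave {
  std::uint32_t M0 = 0;
  std::uint32_t TTMP2 = 0;
  std::uint32_t LastInterruptPayload = 0; // M0 as seen by the modeled S_SENDMSG
};

// DoorbellReg stands in for the result of S_SENDMSG_RTN_B32.
inline void runSimulatedTrapPrologue(ToyWave &W, std::uint32_t DoorbellReg) {
  W.TTMP2 = W.M0;                         // s_mov_b32 ttmp2, m0
  std::uint32_t Tmp = DoorbellReg & 0x3ff; // s_and_b32 tmp, doorbell, 0x3ff
  Tmp |= 0x400;                           // s_or_b32 tmp, tmp, 0x400
  W.M0 = Tmp;                             // s_mov_b32 m0, tmp
  W.LastInterruptPayload = W.M0;          // s_sendmsg reads m0 implicitly
  W.M0 = W.TTMP2;                         // s_mov_b32 m0, ttmp2
}
```

The halting loop (`s_sethalt 5` followed by a branch to itself) is not modeled here.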

slinder1 · 2024-03-21T17:10:03Z

What are the usual criteria for deciding whether a workaround should be based on a target feature or a GFXIP# check?

Is there a chance some gfx11 revision fixes the underlying bug and we will need to switch to a feature? Can we just do that when/if it happens?
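
The trade-off in this question can be sketched with a toy subtarget (hypothetical types, not the real `GCNSubtarget`). The generation-based predicate matches the `getGeneration() == GFX11` check in the diff above; the feature-based one assumes a per-target bug bit, which is the direction the later commits in this PR take.

```cpp
enum class Generation { GFX9, GFX10, GFX11, GFX12 };

// Hypothetical stand-in for a subtarget; member names are illustrative.
struct ToySubtarget {
  Generation Gen;
  bool HasPrivEnabledTrap2NopBug; // would be set from a subtarget feature

  // Style 1: every gfx11 target gets the workaround, fixed or not.
  bool requiresSimulatedTrapByGeneration() const {
    return Gen == Generation::GFX11;
  }

  // Style 2: only targets that list the bug feature get the workaround.
  bool requiresSimulatedTrapByFeature() const {
    return HasPrivEnabledTrap2NopBug;
  }
};
```

For a hypothetical gfx11 revision without the bug, `ToySubtarget{Generation::GFX11, false}` would still take the workaround under the first style but not under the second.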

github-actions · 2024-03-22T19:20:35Z

✅ With the latest revision this PR passed the C/C++ code formatter.

github-actions · 2024-03-22T19:20:35Z

✅ With the latest revision this PR passed the Python code formatter.

epilk · 2024-04-17T19:03:15Z

Ping!

arsenm · 2024-04-17T19:16:58Z

llvm/lib/Target/AMDGPU/AMDGPU.td

@@ -303,6 +303,12 @@ def FeatureMSAALoadDstSelBug : SubtargetFeature<"msaa-load-dst-sel-bug",
  "MSAA loads not honoring dst_sel bug"
 >;

+def FeaturePrivEnabledBug : SubtargetFeature<"priv-enabled-bug",


Should this mention trap, or is it more general?

epilk · 2024-04-17

The reason we need a simulated trap is that PRIV=1 is enabled, but AFAIK the simulated trap is the only effect of that which we care about. Would you prefer naming this FeatureTrap2NopBug or something like that?

arsenm · 2024-04-17

Well, I think it should be more descriptive than "priv enabled bug" in case there are different priv enabled bugs in the future. That doesn't necessarily mean it involves trap.

epilk · 2024-04-23

I renamed it FeaturePrivEnabledTrap2NopBug in the update. WDYT?

Incidentally, I noticed that FeatureMADIntraFwdBug isn't included in the gfx11-generic FeatureSet. Should that be updated too?

jayfoad · 2024-04-23

> Incidentally, I noticed that FeatureMADIntraFwdBug isn't included in the gfx11-generic FeatureSet. Should that be updated too?

I guess so, but not in this patch. @Pierre-vh?

epilk · 2024-04-24

Ok thanks, I just created #89936 to fix this.

…91652) This was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384. Fixes a bug in #85854.
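
One way to picture the CFG problem mentioned in that follow-up is a toy successor map (an illustration only; it does not reproduce what the follow-up patch changed). It applies the same three successor updates as the end of `insertSimulatedTrap` in this PR and then checks reachability. `ToyCFG`, `applySimulatedTrapEdges`, and `isReachable` are hypothetical names, and the starting state (the original block has the split block as its only successor) is an assumption about what `splitAt` leaves behind.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Block name -> set of successor block names.
using ToyCFG = std::map<std::string, std::set<std::string>>;

// Mirrors the successor updates at the end of insertSimulatedTrap.
inline ToyCFG applySimulatedTrapEdges(ToyCFG CFG, const std::string &MBB,
                                      const std::string &SplitBB,
                                      const std::string &HaltLoop) {
  if (SplitBB != MBB)
    CFG[MBB].erase(SplitBB);      // MBB.removeSuccessor(SplitBB)
  CFG[MBB].insert(HaltLoop);      // MBB.addSuccessor(HaltLoop)
  CFG[HaltLoop].insert(HaltLoop); // HaltLoop->addSuccessor(HaltLoop)
  return CFG;
}

// Depth-first search over successor edges.
inline bool isReachable(const ToyCFG &CFG, const std::string &From,
                        const std::string &To) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{From};
  while (!Worklist.empty()) {
    std::string BB = Worklist.back();
    Worklist.pop_back();
    if (BB == To)
      return true;
    if (!Seen.insert(BB).second)
      continue; // already visited
    auto It = CFG.find(BB);
    if (It == CFG.end())
      continue; // block has no recorded successors
    for (const std::string &Succ : It->second)
      Worklist.push_back(Succ);
  }
  return false;
}
```

With blocks named as in `legalize-trap-gfx11.mir`, the code after the trap (`bb.1`) is no longer reachable from `bb.0` once the edges are applied, which matches the empty predecessor set visible in that test's check lines.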

epilk requested review from arsenm, kzhuravl and slinder1 March 19, 2024 19:40

llvmbot added backend:AMDGPU llvm:globalisel labels Mar 19, 2024

arsenm reviewed Mar 20, 2024

slinder1 reviewed Mar 21, 2024

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (outdated, resolved)

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (outdated, resolved)

llvm/lib/Target/AMDGPU/GCNSubtarget.h (outdated, resolved)

arsenm reviewed Mar 22, 2024

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir (outdated, resolved)

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-trap-gfx11.mir (outdated, resolved)

jayfoad reviewed Mar 23, 2024

llvm/lib/Target/AMDGPU/GCNSubtarget.h (outdated, resolved)

epilk added 4 commits March 25, 2024 10:13

Add an -O0 test

4f2c25e

Address review comments

b575811

Use a feature, switch to G_TRAP

258a604

epilk force-pushed the SWDEV-438421 branch from e6fa9ad to 258a604 Compare March 25, 2024 14:51

arsenm reviewed Apr 17, 2024

epilk added 2 commits April 18, 2024 10:59

Use a more descriptive name for the feature

ecf82cb

Add feature to gfx11-generic

ab773eb

arsenm approved these changes Apr 23, 2024

epilk merged commit a047147 into llvm:main Apr 24, 2024
4 checks passed

epilk mentioned this pull request May 9, 2024

[AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering #91652

Merged
