[AMDGPU] More precise limit on SALU cycles in s_delay_alu instructions
This just tweaks the fix for D145232 to make the limit more precise, so
that we could actually emit a delay of 3 SALU cycles (the maximum) if we
had any SALU instructions that required it.
jayfoad committed Mar 5, 2023
1 parent 7abf6c2 commit 7ba61ea
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp
@@ -83,9 +83,9 @@ class AMDGPUInsertDelayAlu : public MachineFunctionPass {
// an s_delay_alu instruction.
static constexpr unsigned TRANS_MAX = 4;

-// The maximum number of SALU cycles we can encode in an s_delay_alu
-// instruction.
-static constexpr unsigned SALU_CYCLES_MAX = 3;
+// One larger than the maximum number of SALU cycles we can encode in an
+// s_delay_alu instruction.
+static constexpr unsigned SALU_CYCLES_MAX = 4;

// If it was written by a (non-TRANS) VALU, remember how many clock cycles
// are left until it completes, and how many other (non-TRANS) VALU we have
@@ -284,6 +284,7 @@ class AMDGPUInsertDelayAlu : public MachineFunctionPass {

// Wait for an SALU instruction.
if (Delay.SALUCycles) {
+assert(Delay.SALUCycles < DelayInfo::SALU_CYCLES_MAX);
if (Imm & 0x780) {
// We have already encoded a VALU and a TRANS delay. There's no room in
// the encoding for an SALU delay as well, so just drop it.
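
The change treats SALU_CYCLES_MAX as an exclusive upper bound: it is one larger than the largest encodable delay, so a delay of 3 cycles passes the new assertion while 4 does not. The following is a minimal standalone sketch of that convention, not code from the pass. The helper names isEncodableSaluDelay, clampSaluDelay and checkedSaluDelay are hypothetical, and the clamping policy is an assumption made for illustration; only the value 4 and its "one larger than the maximum" meaning come from the commit.

```cpp
#include <cassert>

// Exclusive upper bound, as in the commit: one larger than the maximum
// number of SALU cycles that fits in an s_delay_alu instruction.
constexpr unsigned SALU_CYCLES_MAX = 4;

// Hypothetical helper: true for a non-zero delay strictly below the bound,
// i.e. 1, 2 or 3 cycles.
constexpr bool isEncodableSaluDelay(unsigned Cycles) {
  return Cycles != 0 && Cycles < SALU_CYCLES_MAX;
}

// Hypothetical helper (assumed policy): clamp an arbitrary latency to the
// largest value below the bound.
constexpr unsigned clampSaluDelay(unsigned Cycles) {
  return Cycles < SALU_CYCLES_MAX ? Cycles : SALU_CYCLES_MAX - 1;
}

// Hypothetical helper mirroring the shape of the added assertion: the
// caller must already hold a value below the exclusive bound.
unsigned checkedSaluDelay(unsigned Cycles) {
  assert(Cycles < SALU_CYCLES_MAX);
  return Cycles;
}

// Compile-time checks of the convention.
static_assert(isEncodableSaluDelay(3), "3 cycles is the maximum delay");
static_assert(!isEncodableSaluDelay(4), "4 is the bound itself, not a delay");
static_assert(!isEncodableSaluDelay(0), "0 means no SALU delay is needed");
static_assert(clampSaluDelay(100) == 3, "large latencies clamp to 3");
```

With the old value of 3 and the same strict comparison, a delay of 3 cycles would have been rejected, which is the imprecision the commit message describes.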