Skip to content

Conversation

@stepthomas
Copy link
Contributor

SIInsertWaitcnts::getVmemWaitEventType() tests for GLOBAL_INV, GLOBAL_WB, and GLOBAL_WBINV instructions, but in each case it is used, then either a check has already been made for these instructions, or it is known that the instruction definitely will not be one of these.

Move the checks for these instructions out of getVmemWaitEventType() into a new function getInvOrWBWaitEventType(), that returns an optional WaitEventType value, and use that in the situation where it could be one of the instructions.

SIInstrInfo::isGFX12CacheInvOrWBInst() is now itself redundant and is removed.

SIInsertWaitcnts::getVmemWaitEventType() tests for GLOBAL_INV, GLOBAL_WB, and
GLOBAL_WBINV instructions, but in each case it is used, then either a
check has already been made for these instructions, or it is known that
the instruction definitely will not be one of these.

Move the checks for these instructions out of getVmemWaitEventType() into
a new function getInvOrWBWaitEventType(), that returns an optional WaitEventType
value, and use that in the situation where it could be one of the instructions.

SIInstrInfo::isGFX12CacheInvOrWBInst() is now itself redundant and is removed.
@llvmbot
Copy link
Member

llvmbot commented Nov 3, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Stephen Thomas (stepthomas)

Changes

SIInsertWaitcnts::getVmemWaitEventType() tests for GLOBAL_INV, GLOBAL_WB, and GLOBAL_WBINV instructions, but in each case it is used, then either a check has already been made for these instructions, or it is known that the instruction definitely will not be one of these.

Move the checks for these instructions out of getVmemWaitEventType() into a new function getInvOrWBWaitEventType(), that returns an optional WaitEventType value, and use that in the situation where it could be one of the instructions.

SIInstrInfo::isGFX12CacheInvOrWBInst() is now itself redundant and is removed.


Full diff: https://github.com/llvm/llvm-project/pull/166139.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+11-6)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (-5)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index b7fa899678ec7..1fabb3fdabf66 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -527,9 +527,10 @@ class SIInsertWaitcnts {
 #endif // NDEBUG
   }
 
-  // Return the appropriate VMEM_*_ACCESS type for Inst, which must be a VMEM
-  // instruction.
-  WaitEventType getVmemWaitEventType(const MachineInstr &Inst) const {
+  // Return an optional WaitEventType value if Inst is a cache
+  // invalidate or WB instruction.
+  std::optional<WaitEventType>
+  getInvOrWBWaitEventType(const MachineInstr &Inst) const {
     switch (Inst.getOpcode()) {
     case AMDGPU::GLOBAL_INV:
       return VMEM_READ_ACCESS; // tracked using loadcnt
@@ -537,9 +538,13 @@ class SIInsertWaitcnts {
     case AMDGPU::GLOBAL_WBINV:
       return VMEM_WRITE_ACCESS; // tracked using storecnt
     default:
-      break;
+      return {};
     }
+  }
 
+  // Return the appropriate VMEM_*_ACCESS type for Inst, which must be a VMEM
+  // instruction.
+  WaitEventType getVmemWaitEventType(const MachineInstr &Inst) const {
     // Maps VMEM access types to their corresponding WaitEventType.
     static const WaitEventType VmemReadMapping[NUM_VMEM_TYPES] = {
         VMEM_READ_ACCESS, VMEM_SAMPLER_READ_ACCESS, VMEM_BVH_READ_ACCESS};
@@ -2265,8 +2270,8 @@ void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst,
       ScoreBrackets->updateByEvent(LDS_ACCESS, Inst);
     }
   } else if (TII->isFLAT(Inst)) {
-    if (SIInstrInfo::isGFX12CacheInvOrWBInst(Inst.getOpcode())) {
-      ScoreBrackets->updateByEvent(getVmemWaitEventType(Inst), Inst);
+    if (std::optional<WaitEventType> ET = getInvOrWBWaitEventType(Inst)) {
+      ScoreBrackets->updateByEvent(*ET, Inst);
       return;
     }
 
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index dc23a21f959ce..adb6002230fce 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -1089,11 +1089,6 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
            Opcode == AMDGPU::DS_GWS_BARRIER;
   }
 
-  static bool isGFX12CacheInvOrWBInst(unsigned Opc) {
-    return Opc == AMDGPU::GLOBAL_INV || Opc == AMDGPU::GLOBAL_WB ||
-           Opc == AMDGPU::GLOBAL_WBINV;
-  }
-
   static bool isF16PseudoScalarTrans(unsigned Opcode) {
     return Opcode == AMDGPU::V_S_EXP_F16_e64 ||
            Opcode == AMDGPU::V_S_LOG_F16_e64 ||

@stepthomas stepthomas requested a review from jayfoad November 3, 2025 09:26
Copy link
Contributor

@jmmartinez jmmartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment. Please add some tags to the PR title: [NFC][SIInsertWaitcnts]

Comment on lines 532 to 543
std::optional<WaitEventType>
getInvOrWBWaitEventType(const MachineInstr &Inst) const {
switch (Inst.getOpcode()) {
case AMDGPU::GLOBAL_INV:
return VMEM_READ_ACCESS; // tracked using loadcnt
case AMDGPU::GLOBAL_WB:
case AMDGPU::GLOBAL_WBINV:
return VMEM_WRITE_ACCESS; // tracked using storecnt
default:
break;
return {};
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could move this into a static helper outside of the class now. It's not accessing any class members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll do this.

@stepthomas stepthomas changed the title remove redundant checks for invalidate instructions [AMDGPU][NFCemove redundant checks for invalidate instructions Nov 3, 2025
@stepthomas stepthomas changed the title [AMDGPU][NFCemove redundant checks for invalidate instructions [AMDGPU][NFC][SIInsertWaitcnts] Remove redundant checks for invalidate instructions Nov 3, 2025
// Return the appropriate VMEM_*_ACCESS type for Inst, which must be a VMEM
// instruction.
WaitEventType getVmemWaitEventType(const MachineInstr &Inst) const {
switch (Inst.getOpcode()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason why this function should not handle these instructions? The only documented precondition is that Inst should be a VMEM instruction, which these are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of, but way it's currently used means we can statically guarantee that in the remaining two cases where getVmemWaitEventType() is called they never will be these instructions, unless the logic using the function changes substantially.

Perhaps renaming and re-documenting the function would be in order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get the motiviation. Isn't it clearer to keep the purpose of the function as simple as possible, and have it handle all VMEM instructions without exceptions?

If there are some exceptions we should probably have an assert for them, but that would probably make the code at least as complex as not having the exceptions in the first place.

Copy link
Contributor Author

@stepthomas stepthomas Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary motivation was the observation that line 2268 ends up checking for GLOBAL_INV, GLOBAL_WB or GLOBAL_WBINV twice, the first time to check if it is one of those instructions, and the second time to work out which wait counter should be used. I wanted to streamline those two checks into a single check. The secondary motivation is the observation that the remaining uses of getVmemWaitEventType() could not be invalidate/WB instructions, because either they had already been dealt with or excluded by previous checks, so checking for them again is silly.

If it's felt that getVmemWaitEventType() should be fully general purpose, it could use getInvOrWBWaitEventType() to check for those specific instructions, but as of this patch, the remaining use cases mean it would be a redundant check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep getVmemWaitEventType general and instead look at why 2268 handles invalidate instructions specially. I think it could probably use the more general VMEM handling case at line 2277 except that it probably does not want to set IsVMEMAccess, because I think that really means "is this a VMEM access that increments Xcnt", and I guess invalidates do not increment Xcnt because they do not specify any address to translate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also the case that as far as I can tell, mayAccessVMEMThroughFlat() and mayAccessLDSThroughFlat() will by return true for these instructions, because both functions conservatively return that if there are no memory operands.

@github-actions
Copy link

🐧 Linux x64 Test Results

  • 186276 tests passed
  • 4849 tests skipped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants