-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[AMDGPU][NFC][SIInsertWaitcnts] Remove redundant checks for invalidate instructions #166139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[AMDGPU][NFC][SIInsertWaitcnts] Remove redundant checks for invalidate instructions #166139
Conversation
SIInsertWaitcnts::getVmemWaitEventType() tests for GLOBAL_INV, GLOBAL_WB, and GLOBAL_WBINV instructions, but in each case it is used, then either a check has already been made for these instructions, or it is known that the instruction definitely will not be one of these. Move the checks for these instructions out of getVmemWaitEventType() into a new function getInvOrWBWaitEventType(), that returns an optional WaitEventType value, and use that in the situation where it could be one of the instructions. SIInstrInfo::isGFX12CacheInvOrWBInst() is now itself redundant and is removed.
|
@llvm/pr-subscribers-backend-amdgpu Author: Stephen Thomas (stepthomas) ChangesSIInsertWaitcnts::getVmemWaitEventType() tests for GLOBAL_INV, GLOBAL_WB, and GLOBAL_WBINV instructions, but in each case it is used, then either a check has already been made for these instructions, or it is known that the instruction definitely will not be one of these. Move the checks for these instructions out of getVmemWaitEventType() into a new function getInvOrWBWaitEventType(), that returns an optional WaitEventType value, and use that in the situation where it could be one of the instructions. SIInstrInfo::isGFX12CacheInvOrWBInst() is now itself redundant and is removed. Full diff: https://github.com/llvm/llvm-project/pull/166139.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index b7fa899678ec7..1fabb3fdabf66 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -527,9 +527,10 @@ class SIInsertWaitcnts {
#endif // NDEBUG
}
- // Return the appropriate VMEM_*_ACCESS type for Inst, which must be a VMEM
- // instruction.
- WaitEventType getVmemWaitEventType(const MachineInstr &Inst) const {
+ // Return an optional WaitEventType value if Inst is a cache
+ // invalidate or WB instruction.
+ std::optional<WaitEventType>
+ getInvOrWBWaitEventType(const MachineInstr &Inst) const {
switch (Inst.getOpcode()) {
case AMDGPU::GLOBAL_INV:
return VMEM_READ_ACCESS; // tracked using loadcnt
@@ -537,9 +538,13 @@ class SIInsertWaitcnts {
case AMDGPU::GLOBAL_WBINV:
return VMEM_WRITE_ACCESS; // tracked using storecnt
default:
- break;
+ return {};
}
+ }
+ // Return the appropriate VMEM_*_ACCESS type for Inst, which must be a VMEM
+ // instruction.
+ WaitEventType getVmemWaitEventType(const MachineInstr &Inst) const {
// Maps VMEM access types to their corresponding WaitEventType.
static const WaitEventType VmemReadMapping[NUM_VMEM_TYPES] = {
VMEM_READ_ACCESS, VMEM_SAMPLER_READ_ACCESS, VMEM_BVH_READ_ACCESS};
@@ -2265,8 +2270,8 @@ void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst,
ScoreBrackets->updateByEvent(LDS_ACCESS, Inst);
}
} else if (TII->isFLAT(Inst)) {
- if (SIInstrInfo::isGFX12CacheInvOrWBInst(Inst.getOpcode())) {
- ScoreBrackets->updateByEvent(getVmemWaitEventType(Inst), Inst);
+ if (std::optional<WaitEventType> ET = getInvOrWBWaitEventType(Inst)) {
+ ScoreBrackets->updateByEvent(*ET, Inst);
return;
}
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index dc23a21f959ce..adb6002230fce 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -1089,11 +1089,6 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
Opcode == AMDGPU::DS_GWS_BARRIER;
}
- static bool isGFX12CacheInvOrWBInst(unsigned Opc) {
- return Opc == AMDGPU::GLOBAL_INV || Opc == AMDGPU::GLOBAL_WB ||
- Opc == AMDGPU::GLOBAL_WBINV;
- }
-
static bool isF16PseudoScalarTrans(unsigned Opcode) {
return Opcode == AMDGPU::V_S_EXP_F16_e64 ||
Opcode == AMDGPU::V_S_LOG_F16_e64 ||
|
jmmartinez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment. Please add some tags to the PR title: [NFC][SIInsertWaitcnts]
| std::optional<WaitEventType> | ||
| getInvOrWBWaitEventType(const MachineInstr &Inst) const { | ||
| switch (Inst.getOpcode()) { | ||
| case AMDGPU::GLOBAL_INV: | ||
| return VMEM_READ_ACCESS; // tracked using loadcnt | ||
| case AMDGPU::GLOBAL_WB: | ||
| case AMDGPU::GLOBAL_WBINV: | ||
| return VMEM_WRITE_ACCESS; // tracked using storecnt | ||
| default: | ||
| break; | ||
| return {}; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could move this into a static helper outside of the class now. It's not accessing any class members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll do this.
| // Return the appropriate VMEM_*_ACCESS type for Inst, which must be a VMEM | ||
| // instruction. | ||
| WaitEventType getVmemWaitEventType(const MachineInstr &Inst) const { | ||
| switch (Inst.getOpcode()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some reason why this function should not handle these instructions? The only documented precondition is that Inst should be a VMEM instruction, which these are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I know of, but way it's currently used means we can statically guarantee that in the remaining two cases where getVmemWaitEventType() is called they never will be these instructions, unless the logic using the function changes substantially.
Perhaps renaming and re-documenting the function would be in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get the motiviation. Isn't it clearer to keep the purpose of the function as simple as possible, and have it handle all VMEM instructions without exceptions?
If there are some exceptions we should probably have an assert for them, but that would probably make the code at least as complex as not having the exceptions in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary motivation was the observation that line 2268 ends up checking for GLOBAL_INV, GLOBAL_WB or GLOBAL_WBINV twice, the first time to check if it is one of those instructions, and the second time to work out which wait counter should be used. I wanted to streamline those two checks into a single check. The secondary motivation is the observation that the remaining uses of getVmemWaitEventType() could not be invalidate/WB instructions, because either they had already been dealt with or excluded by previous checks, so checking for them again is silly.
If it's felt that getVmemWaitEventType() should be fully general purpose, it could use getInvOrWBWaitEventType() to check for those specific instructions, but as of this patch, the remaining use cases mean it would be a redundant check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep getVmemWaitEventType general and instead look at why 2268 handles invalidate instructions specially. I think it could probably use the more general VMEM handling case at line 2277 except that it probably does not want to set IsVMEMAccess, because I think that really means "is this a VMEM access that increments Xcnt", and I guess invalidates do not increment Xcnt because they do not specify any address to translate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also the case that as far as I can tell, mayAccessVMEMThroughFlat() and mayAccessLDSThroughFlat() will by return true for these instructions, because both functions conservatively return that if there are no memory operands.
🐧 Linux x64 Test Results
|
SIInsertWaitcnts::getVmemWaitEventType() tests for GLOBAL_INV, GLOBAL_WB, and GLOBAL_WBINV instructions, but in each case it is used, then either a check has already been made for these instructions, or it is known that the instruction definitely will not be one of these.
Move the checks for these instructions out of getVmemWaitEventType() into a new function getInvOrWBWaitEventType(), that returns an optional WaitEventType value, and use that in the situation where it could be one of the instructions.
SIInstrInfo::isGFX12CacheInvOrWBInst() is now itself redundant and is removed.