-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[AMDGPU] Enable overwrite ALU bit in sched.barrier mask #160782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The sched.barrier takes a bit mask that determines which instruction categories are allowed to cross the inserted sched.barrier during the igrouplp scheduling pass. Currently, a set ALU bit results in allowing all ALU instructions to move across the barrier, independent of whether more specific bits have been specified. The documentation is silent about the semantics in that case. This PR changes the current handling: When a mask contains both a set ALU bit and a set more-specific bit, the more specific bit is respected and the ALU bit does *not imply* all other bits. Current: 0x00000005 -- 0101 set ALU and SALU bit. Currently the ALU bit implies SALU and VALU and MFMA to be set. New: 0x00000005 -- 0101 set ALU and SALU bit. SALU bit set, therefore ALU bit is ignored and only SALU bit is considered.
@llvm/pr-subscribers-backend-amdgpu Author: Jan Patrick Lehr (jplehr) ChangesThe sched.barrier takes a bit mask that determines which instruction categories are allowed to cross the inserted sched.barrier during the igrouplp scheduling pass. Currently, a set ALU bit results in allowing all ALU instructions to move across the barrier, independent of whether more specific bits have been specified. This PR changes the current handling: When a mask contains both a set ALU bit and a set more-specific bit, the more specific bit is respected and the ALU bit does not imply all other bits. Current: New: Full diff: https://github.com/llvm/llvm-project/pull/160782.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
index dbe74b1b08f8c..edb43627dd51e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
@@ -2615,8 +2615,14 @@ IGroupLPDAGMutation::invertSchedBarrierMask(SchedGroupMask Mask) const {
// allowed past the SCHED_BARRIER.
SchedGroupMask InvertedMask = ~Mask;
+ // When given, specific bits overrule the more general ALU type.
+ bool HasConcreteClassSpecified =
+ (Mask & (SchedGroupMask::SALU | SchedGroupMask::VALU |
+ SchedGroupMask::MFMA)) != SchedGroupMask::NONE;
+
// ALU implies VALU, SALU, MFMA, TRANS.
- if ((InvertedMask & SchedGroupMask::ALU) == SchedGroupMask::NONE)
+ if (!HasConcreteClassSpecified &&
+ (InvertedMask & SchedGroupMask::ALU) == SchedGroupMask::NONE)
InvertedMask &= ~SchedGroupMask::VALU & ~SchedGroupMask::SALU &
~SchedGroupMask::MFMA & ~SchedGroupMask::TRANS;
// VALU, SALU, MFMA, TRANS implies ALU.
|
I am aware that this needs a/several test(s) and I would mostly like to get some feedback on whether this is an acceptable approach. |
LGTM. This is a compatibility breaking (in perf sense) change. Please plan to clarify in https://llvm.org/docs/AMDGPUUsage.html, saying interpretation has changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I think that if we want to do this, we need to be consistent across all different masks. For example, why do we constrain for SALU if our mask has both SALU and ALU, but we don't constrain for VMEM_READ if we have both VMEM_READ and VMEM. Also, I'm not sure what the intended use case is here. Needs tests. |
Excellent points. I agree that it should be handled consistently. If you are ok with the direction that this change is going, then I can go ahead and do the changes and clarifications in the documents.
The intended use is coming from Triton where I think they want to set everything and then selectively unset certain bits, e.g., MFMA. I think the overall expectation is: If you specify "everything" (ALU bit) but then want to selectively unset something, the latter should take precedence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs tests
The sched.barrier takes a bit mask that determines which instruction categories are allowed to cross the inserted sched.barrier during the igrouplp scheduling pass.
Currently, a set ALU bit results in allowing all ALU instructions to move across the barrier, independent of whether more specific bits have been specified.
The documentation is silent about the semantics in that case.
This PR changes the current handling: When a mask contains both a set ALU bit and a set more-specific bit, the more specific bit is respected and the ALU bit does not imply all other bits.
Current:
0x00000005 -- 0101 set ALU and SALU bit. Currently the ALU bit implies
SALU and VALU and MFMA to be set.
New:
0x00000005 -- 0101 set ALU and SALU bit. SALU bit set, therefore ALU bit
is ignored and only SALU bit is considered.