
Conversation

@jplehr (Contributor) commented Sep 25, 2025

The sched.barrier intrinsic takes a bit mask that determines which instruction categories are allowed to cross the inserted sched.barrier during the IGroupLP scheduling pass.

Currently, a set ALU bit allows all ALU instructions to move across the barrier, regardless of whether more specific bits have also been set.
The documentation is silent about the semantics of that case.

This PR changes the handling: when a mask contains both a set ALU bit and a set more-specific bit, the more specific bit is respected and the ALU bit does *not imply* all other bits.

Current:
0x00000005 -- 0101 set ALU and SALU bit. Currently the ALU bit implies
                   SALU and VALU and MFMA to be set.

New:
0x00000005 -- 0101 set ALU and SALU bit. SALU bit set, therefore ALU bit
                   is ignored and only SALU bit is considered.

@llvmbot (Member) commented Sep 25, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Jan Patrick Lehr (jplehr)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/160782.diff

1 file affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp (+7-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
index dbe74b1b08f8c..edb43627dd51e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
@@ -2615,8 +2615,14 @@ IGroupLPDAGMutation::invertSchedBarrierMask(SchedGroupMask Mask) const {
   // allowed past the SCHED_BARRIER.
   SchedGroupMask InvertedMask = ~Mask;
 
+  // When given, specific bits overrule the more general ALU type.
+  bool HasConcreteClassSpecified =
+      (Mask & (SchedGroupMask::SALU | SchedGroupMask::VALU |
+               SchedGroupMask::MFMA)) != SchedGroupMask::NONE;
+
   // ALU implies VALU, SALU, MFMA, TRANS.
-  if ((InvertedMask & SchedGroupMask::ALU) == SchedGroupMask::NONE)
+  if (!HasConcreteClassSpecified &&
+      (InvertedMask & SchedGroupMask::ALU) == SchedGroupMask::NONE)
     InvertedMask &= ~SchedGroupMask::VALU & ~SchedGroupMask::SALU &
                     ~SchedGroupMask::MFMA & ~SchedGroupMask::TRANS;
   // VALU, SALU, MFMA, TRANS implies ALU.

@jplehr (Contributor, Author) commented Sep 25, 2025

I am aware that this needs one or several tests; for now I would mostly like feedback on whether this is an acceptable approach.

@hidekisaito (Contributor) commented:
LGTM. This is a compatibility-breaking change (in a performance sense). Please plan to clarify in https://llvm.org/docs/AMDGPUUsage.html that the interpretation has changed.

@hidekisaito (Contributor) left a review comment:

LGTM

@jrbyrnes (Contributor) commented Oct 3, 2025

I think that if we want to do this, we need to be consistent across all the different masks. For example, why do we constrain for SALU if our mask has both SALU and ALU, but not for VMEM_READ if we have both VMEM_READ and VMEM?

Also, I'm not sure what the intended use case is here.

Needs tests.

@jplehr (Contributor, Author) commented Oct 4, 2025

> I think that if we want to do this, we need to be consistent across all different masks. For example, why do we constrain for SALU if our mask has both SALU and ALU, but we don't constrain for VMEM_READ if we have both VMEM_READ and VMEM.

Excellent points. I agree that it should be handled consistently. If you are OK with the direction this change is going, I can go ahead and make the changes and add the clarifications to the documentation.

> Also, I'm not sure what the intended use case is here.

The intended use comes from Triton, where I think they want to set everything and then selectively unset certain bits, e.g., MFMA.

I think the overall expectation is: If you specify "everything" (ALU bit) but then want to selectively unset something, the latter should take precedence.

@arsenm (Contributor) left a review comment:

Needs tests
