
Conversation

@jplehr (Contributor) commented Sep 25, 2025

The sched.barrier intrinsic takes a bit mask that determines which instruction categories are allowed to cross the inserted sched.barrier during the IGroupLP scheduling pass.

Currently, a set ALU bit allows all ALU instructions to move across the barrier, regardless of whether more specific bits have also been set.
The documentation is silent about the semantics of that case.

This PR changes the handling: when a mask contains both a set ALU bit and a set more-specific bit, the more specific bit is respected and the ALU bit does *not imply* all other bits.

Current:
0x00000005 -- 0101 set ALU and SALU bit. Currently the ALU bit implies
                   SALU and VALU and MFMA to be set.

New:
0x00000005 -- 0101 set ALU and SALU bit. SALU bit set, therefore ALU bit
                   is ignored and only SALU bit is considered.

@llvmbot (Member) commented Sep 25, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Jan Patrick Lehr (jplehr)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/160782.diff

1 file affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp (+7-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
index dbe74b1b08f8c..edb43627dd51e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
@@ -2615,8 +2615,14 @@ IGroupLPDAGMutation::invertSchedBarrierMask(SchedGroupMask Mask) const {
   // allowed past the SCHED_BARRIER.
   SchedGroupMask InvertedMask = ~Mask;
 
+  // When given, specific bits overrule the more general ALU type.
+  bool HasConcreteClassSpecified =
+      (Mask & (SchedGroupMask::SALU | SchedGroupMask::VALU |
+               SchedGroupMask::MFMA)) != SchedGroupMask::NONE;
+
   // ALU implies VALU, SALU, MFMA, TRANS.
-  if ((InvertedMask & SchedGroupMask::ALU) == SchedGroupMask::NONE)
+  if (!HasConcreteClassSpecified &&
+      (InvertedMask & SchedGroupMask::ALU) == SchedGroupMask::NONE)
     InvertedMask &= ~SchedGroupMask::VALU & ~SchedGroupMask::SALU &
                     ~SchedGroupMask::MFMA & ~SchedGroupMask::TRANS;
   // VALU, SALU, MFMA, TRANS implies ALU.

@jplehr (Contributor, Author) commented Sep 25, 2025

I am aware that this needs one or several tests; for now I would mostly like feedback on whether this is an acceptable approach.

@hidekisaito (Contributor) commented:
LGTM. This is a compatibility-breaking change (in a performance sense). Please plan to clarify in https://llvm.org/docs/AMDGPUUsage.html that the interpretation has changed.

@hidekisaito (Contributor) left a review comment:

LGTM

@jrbyrnes (Contributor) commented Oct 3, 2025

I think that if we want to do this, we need to be consistent across all the different masks. For example, why do we constrain for SALU if our mask has both SALU and ALU, but not for VMEM_READ if we have both VMEM_READ and VMEM?

Also, I'm not sure what the intended use case is here.

Needs tests.

@jplehr (Contributor, Author) commented Oct 4, 2025

> I think that if we want to do this, we need to be consistent across all different masks. For example, why do we constrain for SALU if our mask has both SALU and ALU, but we don't constrain for VMEM_READ if we have both VMEM_READ and VMEM.

Excellent points. I agree that it should be handled consistently. If you are OK with the direction this change is going, I can go ahead and make the changes and add the clarifications to the documentation.

> Also, I'm not sure what the intended use case is here.

The intended use comes from Triton, where I think they want to set everything and then selectively unset certain bits, e.g., MFMA.

I think the overall expectation is: If you specify "everything" (ALU bit) but then want to selectively unset something, the latter should take precedence.

@arsenm (Contributor) left a review comment:

Needs tests
