
Conversation

linuxrocks123
Contributor

Addresses SWDEV-549227

@llvmbot
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Patrick Simmons (linuxrocks123)

Changes

Addresses SWDEV-549227


Full diff: https://github.com/llvm/llvm-project/pull/155491.diff

1 file affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp (+10)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
index dbe74b1b08f8c..8c514714bd7dd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
@@ -2391,6 +2391,16 @@ bool SchedGroup::canAddMI(const MachineInstr &MI) const {
   if (MI.isMetaInstruction())
     Result = false;
 
+  else if (MI.isInlineAsm()) {
+    std::string Text = MI.getOperand(0).getSymbolName();
+    if (Text.find("SGMASK:") != std::string::npos) {
+      Text = Text.substr(Text.find("SGMASK:") + strlen("SGMASK:"));
+      Text = Text.substr(0, Text.find_first_of(" \t\r\n"));
+      unsigned long InlineAsmMask = std::stoul(Text, nullptr, 0);
+      Result = ((unsigned long)SGMask & InlineAsmMask) != 0;
+    }
+  }
+
   else if (((SGMask & SchedGroupMask::ALU) != SchedGroupMask::NONE) &&
            (TII->isVALU(MI) || TII->isMFMAorWMMA(MI) || TII->isSALU(MI) ||
             TII->isTRANS(MI)))

@linuxrocks123
Contributor Author

This PR allows the user to specify sched group barrier information for inline asm instructions. The mechanism for doing so is adding a comment containing the text "SGMASK:" inside the inline asm string. The token following "SGMASK:" should be the mask of the sched barrier groups you want to group the inline asm with. For example, if your inline asm is a VALU instruction, you would write an inline asm statement that looks like this:

asm ("v_add_f32_e32 %[result], %[lhs], %[rhs] ; SGMASK:0x2"
     : [result] "=v"(result)
     : [lhs] "v"(lhs), [rhs] "v"(rhs));

Because of the "SGMASK:0x2", sched group barrier instructions will now be able to recognize that the inline asm instruction is a VALU instruction and schedule it accordingly.


arsenm (Contributor) left a comment:


Needs tests. Also, don't use C string functions; StringRef has everything on it directly.

I also don't love the idea of parsing something special out of inline assembly. The general way to handle inline asm is to answer yes for every question, so just if (isInlineAsm()) return true. If you want to be more refined than that, you can try to guess at the contents from the register constraints before dropping to scraping the asm contents.

@linuxrocks123
Contributor Author

linuxrocks123 commented Aug 28, 2025

@arsenm Thanks, I switched to using StringRef.

Regarding the parsing of the inline assembly: the goal of this PR is to let the user optionally specify the schedule group mask for a specific inline assembly instruction so that schedule group barriers work correctly with that instruction. Parsing a special token out of the inline assembly is the only reasonable mechanism I currently see for accomplishing that, but I am open to suggestions.


jplehr (Contributor) left a comment:


Same question for the other attributes and metadata.

Comment on lines +12 to +13
@llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_bffb86447932ec40 to ptr)], section "llvm.metadata"
@__hip_cuid_bffb86447932ec40 = addrspace(1) global i8 0
Contributor

Is this needed for the test to work?

@arsenm
Contributor

arsenm commented Sep 4, 2025

> @arsenm Thanks, I switched to using StringRef.
>
> Regarding the parsing of the inline assembly: the goal of this PR is to let the user optionally specify the schedule group mask for a specific inline assembly instruction so that schedule group barriers work correctly with that instruction. Parsing a special token out of the inline assembly is the only reasonable mechanism I currently see for accomplishing that, but I am open to suggestions.

I am going to outright reject the concept of trying to parse the content of inline assembly. I am firmly in the camp of won't-fixing any performance issue involving inline assembly.

@arsenm
Contributor

arsenm commented Sep 5, 2025

#157080 also proposes codifying not doing things like this.

@linuxrocks123
Contributor Author

@arsenm, I don't think we give users enough alternative ways to control code generation that they could be expected never to reach for inline assembly as a tool to improve performance.

Instead of this, we could add __builtin_sched_group_barrier_override_sgmask() with the semantics that a nonzero argument will override the sgmask for any instructions found prior to it. A call with 0 could clear the custom mask. That way, we're both giving users more control and not directly adding any code to support inline assembly.

@arsenm
Contributor

arsenm commented Sep 6, 2025

> Instead of this, we could add __builtin_sched_group_barrier_override_sgmask() with the semantics that a nonzero argument will override the sgmask for any instructions found prior to it. A call with 0 could clear the custom mask. That way, we're both giving users more control and not directly adding any code to support inline assembly.

All of these built-ins are a hack, and that's adding a hack on top of the hack. All of this stuff is a development resource sink we have to maintain forever, and it's not worth it.

Did you try just interpreting inline asm as matching all classes of instructions, or filtering based on whether there are VGPR or SGPR operands?

@linuxrocks123
Contributor Author

>> Instead of this, we could add __builtin_sched_group_barrier_override_sgmask() with the semantics that a nonzero argument will override the sgmask for any instructions found prior to it. A call with 0 could clear the custom mask. That way, we're both giving users more control and not directly adding any code to support inline assembly.
>
> All of these built-ins are a hack, and that's adding a hack on top of the hack. All of this stuff is a development resource sink we have to maintain forever, and it's not worth it.
>
> Did you try just interpreting inline asm as matching all classes of instructions, or filtering based on whether there are VGPR or SGPR operands?

I haven't tried testing the inline assembly operands. I would be hesitant to do that because, although you know the architecture better than I do, I can't imagine every assembly instruction that uses a VGPR is a VALU instruction, and "there's wrong behavior" seems worse than "there's no way to express what I want."

The original user request was to have inline asm match only its own new sgmask bit, which is similar in expressiveness to having it match all classes of instructions. That approach also has the advantage of not changing the scheduling behavior of any existing code that uses sched group barriers, which matching all classes could do. I feel the current PR provides maximum expressiveness, but since the user was the one who suggested just adding a new sgmask bit for inline assembly, doing that would almost certainly be expressive enough for his needs and would avoid parsing the inline assembly string.

An implementation of the user's original request is in this PR's history at 6f45055. I can revert this PR back to that commit and then add the tests back if we decide to go that route.

@arsenm
Contributor

arsenm commented Sep 15, 2025

> I haven't tried testing the inline assembly operands. I would be hesitant to do that because, although you know the architecture better than I do, I can't imagine every assembly instruction that uses a VGPR is a VALU instruction,

This is the case. There are 0 SALU instructions that can read or write a VGPR. The converse is almost true too; VALU instructions cannot write SGPRs, with the special cases of boolean flags and readlane/writelane. But you don't need a perfect answer; it's still a scheduling heuristic decision.

> and "there's wrong behavior" seems worse than "there's no way to express what I want."

There cannot be wrong behavior in scheduling. The scheduling intrinsics can at most be interpreted as an optimization hint; optimizations cannot be semantics.

> I can revert this PR back to that commit and then add the tests back if we decide to go that route.

Inline asm really is not a tool for improving performance and should not be treated as such. Inline assembly is a hard break of the compiler's optimizations. If anything, we should be de-optimizing it harder (i.e., we could conservatively resolve more hazards inside inline assembly than we do today).

@arsenm
Contributor

arsenm commented Sep 15, 2025

To make my position perfectly explicit, I think this should be abandoned. The asm constraints suggested in #157080 should also not be pursued.

@linuxrocks123
Contributor Author

linuxrocks123 commented Sep 16, 2025

@arsenm,

> There cannot be wrong behavior in scheduling. The scheduling intrinsics can at most be interpreted as an optimization hint; optimizations cannot be semantics.

By "wrong", I meant wrong from an optimal-scheduling point of view, not correctness.

> This is the case. There are 0 SALU instructions that can read or write a VGPR. The converse is almost true too; VALU instructions cannot write SGPRs, with the special cases of boolean flags and readlane/writelane. But you don't need a perfect answer; it's still a scheduling heuristic decision.

That sounds like a promising approach, in that case.

> To make my position perfectly explicit, I think this should be abandoned. The asm constraints suggested in #157080 should also not be pursued.

Would you be willing to remove your block if I pursued the approach of guessing the instruction type based on the operand constraints?

Edit: Po Yen is explaining on the internal issue tracker why guessing the instruction type based on operand constraints will not work.

@linuxrocks123
Contributor Author

Hi @arsenm, thanks for your help reviewing this. I made some changes to the code to try to implement the constraint inference approach. Would you please let me know what you think?

Also, the testcase I added generates a0 instead of a[0-1], which I think is what it should be. Do you know how to fix that?

@linuxrocks123
Contributor Author

@arsenm I intend to add one testcase per sgmask category, but I wanted to get the code solid first.
