Allow Specifying SGMasks for Inline Asm #155491
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Patrick Simmons (linuxrocks123)

Changes: Addresses SWDEV-549227

Full diff: https://github.com/llvm/llvm-project/pull/155491.diff

1 file affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
index dbe74b1b08f8c..8c514714bd7dd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
@@ -2391,6 +2391,16 @@ bool SchedGroup::canAddMI(const MachineInstr &MI) const {
if (MI.isMetaInstruction())
Result = false;
+ else if (MI.isInlineAsm()) {
+ std::string Text = MI.getOperand(0).getSymbolName();
+ if (Text.find("SGMASK:") != std::string::npos) {
+ Text = Text.substr(Text.find("SGMASK:") + strlen("SGMASK:"));
+ Text = Text.substr(0, Text.find_first_of(" \t\r\n"));
+ unsigned long InlineAsmMask = std::stoul(Text, nullptr, 0);
+ Result = ((unsigned long)SGMask & InlineAsmMask) != 0;
+ }
+ }
+
else if (((SGMask & SchedGroupMask::ALU) != SchedGroupMask::NONE) &&
(TII->isVALU(MI) || TII->isMFMAorWMMA(MI) || TII->isSALU(MI) ||
TII->isTRANS(MI)))
This PR allows the user to specify sched group barrier information for inline asm instructions. The mechanism for doing so is adding a comment containing the text "SGMASK:" inside the inline asm string, followed by the mask of the schedule barrier groups you want to group the inline asm with. For example, if your inline asm is a VALU instruction, you would append a comment such as "SGMASK:0x2" to the asm string.
Because of the "SGMASK:0x2", sched group barrier instructions will now be able to recognize that the inline asm instruction is a VALU instruction and schedule it accordingly.
Needs tests. Also don't use c string functions, StringRef has everything on it directly.
I also don't love the idea of parsing something special out of inline assembly. The general way to handle inline asm is to answer yes for every question, so just if (isInlineAsm()) return true. If you want to be more refined than that, you can try to guess at the contents by the register constraints before dropping to scraping the asm contents.
@arsenm Thanks, I switched to using StringRef. Regarding the parsing of the inline assembly, the goal of this PR is to allow the user to optionally specify the schedule group mask for a specific inline assembly instruction so that schedule group barriers work correctly with that instruction. Parsing a special token out of the inline assembly is the only reasonable mechanism I currently see for accomplishing that, but I am open to suggestions.
Same question for the other attributes and metadata.
@llvm.compiler.used = appending addrspace(1) global [1 x ptr] [ptr addrspacecast (ptr addrspace(1) @__hip_cuid_bffb86447932ec40 to ptr)], section "llvm.metadata"
@__hip_cuid_bffb86447932ec40 = addrspace(1) global i8 0
Is this needed for the test to work?
I am going to outright reject the concept of trying to parse the content of inline assembly. I am firmly in the camp of won't-fixing any performance issue involving inline assembly.
#157080 also proposes codifying not doing things like this.
@arsenm, I don't think we give users sufficient alternative methods of controlling code generation that they could be expected to see no need for using inline assembly as a tool to improve performance. Instead of this, we could add …
All of these built-ins are a hack, and that's adding a hack on top of the hack. All of this stuff is a development resource sink we have to maintain forever, and it's not worth it. Did you try just interpreting inline asm as matching all classes of instructions, or filtering based on whether there are VGPR or SGPR operands?
I haven't tried testing the inline assembly operands. I would be hesitant to do that because, although you know the architecture better than I do, I can't imagine every assembly instruction that uses a VGPR is a VALU instruction, and "there's wrong behavior" seems worse than "there's no way to express what I want."

The original user request was to have inline asm match only its own new sgmask bit, which is similar in expressiveness to having it match all classes of instructions. That approach has the advantage of not possibly changing the scheduling behavior of any existing code that uses sched group barriers, which having it match all classes of instructions could do.

Although I feel the current PR provides maximum expressiveness, since the user is the one who suggested just adding a new bit to sgmask for inline assembly, doing that would almost certainly be expressive enough for his needs and would avoid the need to parse the inline assembly string. An implementation of the user's original request is in this PR's history at 6f45055. I can revert this PR back to that commit and then add the tests back if we decide to go that route.
This is the case. There are 0 SALU instructions that can read or write a VGPR. The converse is almost true too; VALU instructions cannot write SGPRs, with the special cases of boolean flags and readlane/writelane.
There cannot be wrong behavior in scheduling. The scheduling intrinsics can at most be interpreted as an optimization hint, and optimization hints cannot carry semantics.
Inline asm really is not a tool for improving performance and should not be treated as such. Inline assembly is a hard break of the compiler's optimizations. If anything, we should be de-optimizing it harder (i.e. we could be conservatively resolving more hazards that exist inside inline assembly than we do today).
To make my position perfectly explicit, I think this should be abandoned. The asm constraints suggested in #157080 should also not be pursued |
By "wrong", I meant wrong from an optimal scheduling point of view, not correctness.
That sounds like a promising approach, in that case.
Edit: Po Yen is explaining on the internal issue tracker why guessing the instruction type based on operand constraints will not work. |
Hi @arsenm, thanks for your help reviewing this. I made some changes to the code to try to implement the constraint-inference approach. Would you please let me know what you think? Also, the testcase I added generates a0 where I believe a[0-1] would be correct. Do you know how to fix that?
@arsenm I intend to add one testcase per sgmask category, but I wanted to get the code solid first. |