-
Notifications
You must be signed in to change notification settings - Fork 15.1k
CodeGen: Handle bundled instructions in two-address-instructions pass #166212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-backend-amdgpu Author: Nicolai Hähnle (nhaehnle) ChangesIf the instruction with tied operands is a BUNDLE instruction and we The test case is degenerate in the sense that the bundle only contains a Stack:
Full diff: https://github.com/llvm/llvm-project/pull/166212.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
index 414e414738b71..1f816b94cf56b 100644
--- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
+++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
@@ -1665,6 +1665,22 @@ void TwoAddressInstructionImpl::processTiedPairs(MachineInstr *MI,
// by SubRegB is compatible with RegA with no subregister. So regardless of
// whether the dest oper writes a subreg, the source oper should not.
MO.setSubReg(0);
+
+ // Update uses of RegB to uses of RegA inside the bundle.
+ if (MI->isBundle()) {
+ for (MachineInstr *InnerMI = MI; InnerMI->isBundledWithSucc();) {
+ InnerMI = InnerMI->getNextNode();
+
+ for (MachineOperand &MO : InnerMI->all_uses()) {
+ if (MO.isReg() && MO.getReg() == RegB) {
+ assert(
+ MO.getSubReg() == 0 &&
+ "tied subregister uses in bundled instructions not supported");
+ MO.setReg(RegA);
+ }
+ }
+ }
+ }
}
if (AllUsesCopied) {
diff --git a/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir b/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir
new file mode 100644
index 0000000000000..696962a88c8b8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir
@@ -0,0 +1,57 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 %s --passes=two-address-instruction -verify-each -o - | FileCheck --check-prefixes=GCN %s
+
+# Exercise very basic handling of BUNDLE'd instructions by the two-address-instruction pass.
+
+# This test is an example where it is best to keep the two-address instruction
+# and resolve the tie with a COPY that is expected to be coalesced.
+---
+name: test_fmac_bundle
+body: |
+ bb.0:
+
+ ; GCN-LABEL: name: test_fmac_bundle
+ ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; GCN-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[COPY1]], 0, implicit $exec
+ ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[V_ADD_U32_e64_]]
+ ; GCN-NEXT: BUNDLE implicit-def [[COPY2]], implicit [[DEF]], implicit [[DEF1]], implicit [[COPY2]](tied-def 0), implicit $mode, implicit $exec {
+ ; GCN-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = V_FMAC_F32_e32 killed [[DEF]], killed [[DEF1]], killed [[COPY2]], implicit $mode, implicit $exec
+ ; GCN-NEXT: }
+ %10:vgpr_32 = COPY $vgpr0
+ %11:vgpr_32 = COPY $vgpr1
+ %2:vgpr_32 = V_ADD_U32_e64 %10, %11, 0, implicit $exec
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = IMPLICIT_DEF
+ BUNDLE implicit-def %3:vgpr_32, implicit %0, implicit %1, implicit killed %2(tied-def 0), implicit $mode, implicit $exec {
+ %3:vgpr_32 = V_FMAC_F32_e32 killed %0, killed %1, killed %2, implicit $mode, implicit $exec
+ }
+
+...
+
+# This test is an example where conversion to three-address form would be beneficial.
+---
+name: test_fmac_reuse_bundle
+body: |
+ bb.0:
+
+ ; GCN-LABEL: name: test_fmac_reuse_bundle
+ ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+ ; GCN-NEXT: BUNDLE implicit-def [[COPY1]], implicit [[DEF]], implicit [[DEF1]], implicit [[COPY1]](tied-def 0), implicit $mode, implicit $exec {
+ ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = V_FMAC_F32_e32 killed [[DEF]], killed [[DEF1]], killed [[COPY1]], implicit $mode, implicit $exec
+ ; GCN-NEXT: }
+ ; GCN-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY1]], [[COPY]], 0, implicit $exec
+ %2:vgpr_32 = COPY $vgpr0
+ %0:vgpr_32 = IMPLICIT_DEF
+ %1:vgpr_32 = IMPLICIT_DEF
+ BUNDLE implicit-def %3:vgpr_32, implicit %0, implicit %1, implicit %2(tied-def 0), implicit $mode, implicit $exec {
+ %3:vgpr_32 = V_FMAC_F32_e32 killed %0, killed %1, killed %2, implicit $mode, implicit $exec
+ }
+ %4:vgpr_32 = V_ADD_U32_e64 %3, %2, 0, implicit $exec
+
+...
|
883e563 to
282ab76
Compare
c3abece to
0470587
Compare
282ab76 to
4af0353
Compare
|
|
||
| // Update uses of RegB to uses of RegA inside the bundle. | ||
| if (MI->isBundle()) { | ||
| for (MachineInstr *InnerMI = MI; InnerMI->isBundledWithSucc();) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use mi_bundle_ops?
0470587 to
f40a20f
Compare
4af0353 to
97427fa
Compare
jayfoad
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered the case where the instructions inside the bundle have two uses of RegB, but only one of them is tied with RegA? I think it is almost impossible to handle that optimally given only the summarised information that you get from the operands of the BUNDLE. It might be worth adding a test case like that, just to check that we don't crash and still generate well formed MIR.
The fundamental question here is, can processTiedPairs really operate at the BUNDLE level (and then fix up the instructions inside)? Or is it going to have to operate on the individual instructions (and then fix up the summary information on the BUNDLE)?
| assert( | ||
| MO.getSubReg() == 0 && | ||
| "tied subregister uses in bundled instructions not supported"); | ||
| MO.setReg(RegA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-bundle code replaces a use of RegB.SubRegB with RegA, so maybe it makes more sense to do exactly the same here?
| assert( | |
| MO.getSubReg() == 0 && | |
| "tied subregister uses in bundled instructions not supported"); | |
| MO.setReg(RegA); | |
| assert( | |
| MO.getSubReg() == SubRegB && | |
| "tied subregister uses in bundled instructions not supported"); | |
| MO.setReg(RegA); | |
| MO.setSubReg(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with this but it got very confusing to me even in the restricted use case that I described in the other comment. I'd prefer to just keep this more restrictive for now, and if we find later on that supporting subregisters is beneficial we relax it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe assert that MO.getSubReg() == 0 && SubRegB == 0? Otherwise this code will replace a use of one reg with a use of another reg with a different size.
f40a20f to
4e20232
Compare
97427fa to
1224dba
Compare
Yes, I thought about it. I did not find a good answer to what tied operands inside of a pre-RA bundle really mean in the general case. Bundles mean different things to different people. The main use of bundles outside of AMDGPU is for VLIW. In AMDGPU so far, it is used for memory clauses (which could potentially have tied operands for atomicrmw, but we only form them post-RA) and for a few niche cases like keeping S_GETPC_B64 together with its uses for PC-relative addressing. What we're working towards here is a new pre-RA use case that can be vaguely described as "decomposing a single instruction into virtual micro-ops during codegen for the purpose of avoiding combinatorial explosion in opcodes etc.". For that use case, the requirements on tied operand support will be fairly restricted, and so I'd rather make this change more conservative and restrictive and not attempt to support something that we don't actually use and don't know how to test properly. And then build on that if and when we actually know what else might be needed. |
1224dba to
b65555b
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
jayfoad
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems OK on the understanding that it is slightly experimental, and after some more experience we may need to change things and/or nail down the exact rules for what cases are and are not supported.
b65555b to
3eb0dc1
Compare
If the instruction with tied operands is a BUNDLE instruction and we handle it by replacing an operand, then we need to update the corresponding internal operands as well. Otherwise, the resulting MIR is invalid. The test case is degenerate in the sense that the bundle only contains a single instruction, but it is sufficient to exercise this issue. commit-id:6760a9b7
3eb0dc1 to
f9253b6
Compare
If the instruction with tied operands is a BUNDLE instruction and we
handle it by replacing an operand, then we need to update the
corresponding internal operands as well. Otherwise, the resulting MIR is
invalid.
The test case is degenerate in the sense that the bundle only contains a
single instruction, but it is sufficient to exercise this issue.
Stack: