Skip to content

Conversation

@nhaehnle
Copy link
Collaborator

@nhaehnle nhaehnle commented Nov 3, 2025

If the instruction with tied operands is a BUNDLE instruction and we
handle it by replacing an operand, then we need to update the
corresponding internal operands as well. Otherwise, the resulting MIR is
invalid.

The test case is degenerate in the sense that the bundle only contains a
single instruction, but it is sufficient to exercise this issue.


Stack:

⚠️ Part of a stack created by spr. Merging this PR using the GitHub UI may have unexpected results.

@llvmbot
Copy link
Member

llvmbot commented Nov 3, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Nicolai Hähnle (nhaehnle)

Changes

If the instruction with tied operands is a BUNDLE instruction and we
handle it by replacing an operand, then we need to update the
corresponding internal operands as well. Otherwise, the resulting MIR is
invalid.

The test case is degenerate in the sense that the bundle only contains a
single instruction, but it is sufficient to exercise this issue.


Stack:

  • [5/5] #166213
  • [4/5] #166212 ⬅
  • [3/5] #166211
  • [2/5] #166210
  • [1/5] #166209

⚠️ Part of a stack created by spr. Merging this PR using the GitHub UI may have unexpected results.


Full diff: https://github.com/llvm/llvm-project/pull/166212.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/TwoAddressInstructionPass.cpp (+16)
  • (added) llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir (+57)
diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
index 414e414738b71..1f816b94cf56b 100644
--- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
+++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
@@ -1665,6 +1665,22 @@ void TwoAddressInstructionImpl::processTiedPairs(MachineInstr *MI,
     // by SubRegB is compatible with RegA with no subregister. So regardless of
     // whether the dest oper writes a subreg, the source oper should not.
     MO.setSubReg(0);
+
+    // Update uses of RegB to uses of RegA inside the bundle.
+    if (MI->isBundle()) {
+      for (MachineInstr *InnerMI = MI; InnerMI->isBundledWithSucc();) {
+        InnerMI = InnerMI->getNextNode();
+
+        for (MachineOperand &MO : InnerMI->all_uses()) {
+          if (MO.isReg() && MO.getReg() == RegB) {
+            assert(
+                MO.getSubReg() == 0 &&
+                "tied subregister uses in bundled instructions not supported");
+            MO.setReg(RegA);
+          }
+        }
+      }
+    }
   }
 
   if (AllUsesCopied) {
diff --git a/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir b/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir
new file mode 100644
index 0000000000000..696962a88c8b8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir
@@ -0,0 +1,57 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 %s --passes=two-address-instruction -verify-each -o - | FileCheck --check-prefixes=GCN %s
+
+# Exercise very basic handling of BUNDLE'd instructions by the two-address-instruction pass.
+
+# This test is an example where it is best to keep the two-address instruction
+# and resolve the tie with a COPY that is expected to be coalesced.
+---
+name:            test_fmac_bundle
+body:             |
+  bb.0:
+
+    ; GCN-LABEL: name: test_fmac_bundle
+    ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[COPY1]], 0, implicit $exec
+    ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[V_ADD_U32_e64_]]
+    ; GCN-NEXT: BUNDLE implicit-def [[COPY2]], implicit [[DEF]], implicit [[DEF1]], implicit [[COPY2]](tied-def 0), implicit $mode, implicit $exec {
+    ; GCN-NEXT:   [[COPY2:%[0-9]+]]:vgpr_32 = V_FMAC_F32_e32 killed [[DEF]], killed [[DEF1]], killed [[COPY2]], implicit $mode, implicit $exec
+    ; GCN-NEXT: }
+    %10:vgpr_32 = COPY $vgpr0
+    %11:vgpr_32 = COPY $vgpr1
+    %2:vgpr_32 = V_ADD_U32_e64 %10, %11, 0, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    BUNDLE implicit-def %3:vgpr_32, implicit %0, implicit %1, implicit killed %2(tied-def 0), implicit $mode, implicit $exec {
+      %3:vgpr_32 = V_FMAC_F32_e32 killed %0, killed %1, killed %2, implicit $mode, implicit $exec
+    }
+
+...
+
+# This test is an example where conversion to three-address form would be beneficial.
+---
+name:            test_fmac_reuse_bundle
+body:             |
+  bb.0:
+
+    ; GCN-LABEL: name: test_fmac_reuse_bundle
+    ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+    ; GCN-NEXT: BUNDLE implicit-def [[COPY1]], implicit [[DEF]], implicit [[DEF1]], implicit [[COPY1]](tied-def 0), implicit $mode, implicit $exec {
+    ; GCN-NEXT:   [[COPY1:%[0-9]+]]:vgpr_32 = V_FMAC_F32_e32 killed [[DEF]], killed [[DEF1]], killed [[COPY1]], implicit $mode, implicit $exec
+    ; GCN-NEXT: }
+    ; GCN-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY1]], [[COPY]], 0, implicit $exec
+    %2:vgpr_32 = COPY $vgpr0
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    BUNDLE implicit-def %3:vgpr_32, implicit %0, implicit %1, implicit %2(tied-def 0), implicit $mode, implicit $exec {
+      %3:vgpr_32 = V_FMAC_F32_e32 killed %0, killed %1, killed %2, implicit $mode, implicit $exec
+    }
+    %4:vgpr_32 = V_ADD_U32_e64 %3, %2, 0, implicit $exec
+
+...

@nhaehnle nhaehnle changed the base branch from users/nhaehnle/spr/main/667859fc to main November 4, 2025 18:59
@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 883e563 to 282ab76 Compare November 4, 2025 18:59
@nhaehnle nhaehnle changed the base branch from main to users/nhaehnle/spr/main/667859fc November 4, 2025 18:59
@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/667859fc branch from c3abece to 0470587 Compare November 5, 2025 01:43
@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 282ab76 to 4af0353 Compare November 5, 2025 01:43

// Update uses of RegB to uses of RegA inside the bundle.
if (MI->isBundle()) {
for (MachineInstr *InnerMI = MI; InnerMI->isBundledWithSucc();) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use mi_bundle_ops?

@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/667859fc branch from 0470587 to f40a20f Compare November 5, 2025 06:15
@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 4af0353 to 97427fa Compare November 5, 2025 06:15
Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered the case where the instructions inside the bundle have two uses of RegB, but only one of them is tied with RegA? I think it is almost impossible to handle that optimally given only the summarised information that you get from the operands of the BUNDLE. It might be worth adding a test case like that, just to check that we don't crash and still generate well formed MIR.

The fundamental question here is, can processTiedPairs really operate at the BUNDLE level (and then fix up the instructions inside)? Or is it going to have to operate on the individual instructions (and then fix up the summary information on the BUNDLE)?

Comment on lines 1676 to 1679
assert(
MO.getSubReg() == 0 &&
"tied subregister uses in bundled instructions not supported");
MO.setReg(RegA);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-bundle code replaces a use of RegB.SubRegB with RegA, so maybe it makes more sense to do exactly the same here?

Suggested change
assert(
MO.getSubReg() == 0 &&
"tied subregister uses in bundled instructions not supported");
MO.setReg(RegA);
assert(
MO.getSubReg() == SubRegB &&
"tied subregister uses in bundled instructions not supported");
MO.setReg(RegA);
MO.setSubReg(0);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with this but it got very confusing to me even in the restricted use case that I described in the other comment. I'd prefer to just keep this more restrictive for now, and if we find later on that supporting subregisters is beneficial we relax it separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe assert that MO.getSubReg() == 0 && SubRegB == 0? Otherwise this code will replace a use of one reg with a use of another reg with a different size.

@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/667859fc branch from f40a20f to 4e20232 Compare November 6, 2025 16:12
@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 97427fa to 1224dba Compare November 6, 2025 16:12
@nhaehnle
Copy link
Collaborator Author

nhaehnle commented Nov 6, 2025

Have you considered the case where the instructions inside the bundle have two uses of RegB, but only one of them is tied with RegA? I think it is almost impossible to handle that optimally given only the summarised information that you get from the operands of the BUNDLE. It might be worth adding a test case like that, just to check that we don't crash and still generate well formed MIR.

The fundamental question here is, can processTiedPairs really operate at the BUNDLE level (and then fix up the instructions inside)? Or is it going to have to operate on the individual instructions (and then fix up the summary information on the BUNDLE)?

Yes, I thought about it. I did not find a good answer to what tied operands inside of a pre-RA bundle really mean in the general case. Bundles mean different things to different people. The main use of bundles outside of AMDGPU is for VLIW. In AMDGPU so far, it is used for memory clauses (which could potentially have tied operands for atomicrmw, but we only form them post-RA) and for a few niche cases like keeping S_GETPC_B64 together with its uses for PC-relative addressing.

What we're working towards here is a new pre-RA use case that can be vaguely described as "decomposing a single instruction into virtual micro-ops during codegen for the purpose of avoiding combinatorial explosion in opcodes etc.". For that use case, the requirements on tied operand support will be fairly restricted, and so I'd rather make this change more conservative and restrictive and not attempt to support something that we don't actually use and don't know how to test properly. And then build on that if and when we actually know what else might be needed.

@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 1224dba to b65555b Compare November 6, 2025 17:09
@github-actions
Copy link

github-actions bot commented Nov 6, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK on the understanding that it is slightly experimental, and after some more experience we may need to change things and/or nail down the exact rules for what cases are and are not supported.

Base automatically changed from users/nhaehnle/spr/main/667859fc to main November 6, 2025 17:19
@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from b65555b to 3eb0dc1 Compare November 6, 2025 17:22
If the instruction with tied operands is a BUNDLE instruction and we
handle it by replacing an operand, then we need to update the
corresponding internal operands as well. Otherwise, the resulting MIR is
invalid.

The test case is degenerate in the sense that the bundle only contains a
single instruction, but it is sufficient to exercise this issue.

commit-id:6760a9b7
@nhaehnle nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 3eb0dc1 to f9253b6 Compare November 6, 2025 17:47
@nhaehnle nhaehnle enabled auto-merge (squash) November 6, 2025 17:48
@nhaehnle nhaehnle merged commit 3c31cde into main Nov 6, 2025
9 of 10 checks passed
@nhaehnle nhaehnle deleted the users/nhaehnle/spr/main/6760a9b7 branch November 6, 2025 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants