CodeGen: Handle bundled instructions in two-address-instructions pass #166212

nhaehnle · 2025-11-03T19:04:03Z

If the instruction with tied operands is a BUNDLE instruction and we
handle it by replacing an operand, then we need to update the
corresponding internal operands as well. Otherwise, the resulting MIR is
invalid.

The test case is degenerate in the sense that the bundle only contains a
single instruction, but it is sufficient to exercise this issue.

Stack:

[2/2] CodeGen/AMDGPU: Allow 3-address conversion of bundled instructions #166213
[1/2] CodeGen: Handle bundled instructions in two-address-instructions pass #166212 ⬅

⚠️ Part of a stack created by spr. Merging this PR using the GitHub UI may have unexpected results.

llvmbot · 2025-11-03T19:04:37Z

@llvm/pr-subscribers-backend-amdgpu

Author: Nicolai Hähnle (nhaehnle)

Changes

If the instruction with tied operands is a BUNDLE instruction and we
handle it by replacing an operand, then we need to update the
corresponding internal operands as well. Otherwise, the resulting MIR is
invalid.

The test case is degenerate in the sense that the bundle only contains a
single instruction, but it is sufficient to exercise this issue.

Stack:

[5/5] #166213
[4/5] #166212 ⬅
[3/5] #166211
[2/5] #166210
[1/5] #166209

⚠️ Part of a stack created by spr. Merging this PR using the GitHub UI may have unexpected results.

Full diff: https://github.com/llvm/llvm-project/pull/166212.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/TwoAddressInstructionPass.cpp (+16)
(added) llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir (+57)

diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
index 414e414738b71..1f816b94cf56b 100644
--- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
+++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
@@ -1665,6 +1665,22 @@ void TwoAddressInstructionImpl::processTiedPairs(MachineInstr *MI,
     // by SubRegB is compatible with RegA with no subregister. So regardless of
     // whether the dest oper writes a subreg, the source oper should not.
     MO.setSubReg(0);
+
+    // Update uses of RegB to uses of RegA inside the bundle.
+    if (MI->isBundle()) {
+      for (MachineInstr *InnerMI = MI; InnerMI->isBundledWithSucc();) {
+        InnerMI = InnerMI->getNextNode();
+
+        for (MachineOperand &MO : InnerMI->all_uses()) {
+          if (MO.isReg() && MO.getReg() == RegB) {
+            assert(
+                MO.getSubReg() == 0 &&
+                "tied subregister uses in bundled instructions not supported");
+            MO.setReg(RegA);
+          }
+        }
+      }
+    }
   }
 
   if (AllUsesCopied) {
diff --git a/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir b/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir
new file mode 100644
index 0000000000000..696962a88c8b8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/twoaddr-bundle.mir
@@ -0,0 +1,57 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1200 %s --passes=two-address-instruction -verify-each -o - | FileCheck --check-prefixes=GCN %s
+
+# Exercise very basic handling of BUNDLE'd instructions by the two-address-instruction pass.
+
+# This test is an example where it is best to keep the two-address instruction
+# and resolve the tie with a COPY that is expected to be coalesced.
+---
+name:            test_fmac_bundle
+body:             |
+  bb.0:
+
+    ; GCN-LABEL: name: test_fmac_bundle
+    ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[COPY1]], 0, implicit $exec
+    ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[V_ADD_U32_e64_]]
+    ; GCN-NEXT: BUNDLE implicit-def [[COPY2]], implicit [[DEF]], implicit [[DEF1]], implicit [[COPY2]](tied-def 0), implicit $mode, implicit $exec {
+    ; GCN-NEXT:   [[COPY2:%[0-9]+]]:vgpr_32 = V_FMAC_F32_e32 killed [[DEF]], killed [[DEF1]], killed [[COPY2]], implicit $mode, implicit $exec
+    ; GCN-NEXT: }
+    %10:vgpr_32 = COPY $vgpr0
+    %11:vgpr_32 = COPY $vgpr1
+    %2:vgpr_32 = V_ADD_U32_e64 %10, %11, 0, implicit $exec
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    BUNDLE implicit-def %3:vgpr_32, implicit %0, implicit %1, implicit killed %2(tied-def 0), implicit $mode, implicit $exec {
+      %3:vgpr_32 = V_FMAC_F32_e32 killed %0, killed %1, killed %2, implicit $mode, implicit $exec
+    }
+
+...
+
+# This test is an example where conversion to three-address form would be beneficial.
+---
+name:            test_fmac_reuse_bundle
+body:             |
+  bb.0:
+
+    ; GCN-LABEL: name: test_fmac_reuse_bundle
+    ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+    ; GCN-NEXT: BUNDLE implicit-def [[COPY1]], implicit [[DEF]], implicit [[DEF1]], implicit [[COPY1]](tied-def 0), implicit $mode, implicit $exec {
+    ; GCN-NEXT:   [[COPY1:%[0-9]+]]:vgpr_32 = V_FMAC_F32_e32 killed [[DEF]], killed [[DEF1]], killed [[COPY1]], implicit $mode, implicit $exec
+    ; GCN-NEXT: }
+    ; GCN-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY1]], [[COPY]], 0, implicit $exec
+    %2:vgpr_32 = COPY $vgpr0
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    BUNDLE implicit-def %3:vgpr_32, implicit %0, implicit %1, implicit %2(tied-def 0), implicit $mode, implicit $exec {
+      %3:vgpr_32 = V_FMAC_F32_e32 killed %0, killed %1, killed %2, implicit $mode, implicit $exec
+    }
+    %4:vgpr_32 = V_ADD_U32_e64 %3, %2, 0, implicit $exec
+
+...

arsenm · 2025-11-05T02:30:10Z

llvm/lib/CodeGen/TwoAddressInstructionPass.cpp

+
+    // Update uses of RegB to uses of RegA inside the bundle.
+    if (MI->isBundle()) {
+      for (MachineInstr *InnerMI = MI; InnerMI->isBundledWithSucc();) {


Use mi_bundle_ops?

jayfoad

Have you considered the case where the instructions inside the bundle have two uses of RegB, but only one of them is tied with RegA? I think it is almost impossible to handle that optimally given only the summarised information that you get from the operands of the BUNDLE. It might be worth adding a test case like that, just to check that we don't crash and still generate well formed MIR.

The fundamental question here is, can processTiedPairs really operate at the BUNDLE level (and then fix up the instructions inside)? Or is it going to have to operate on the individual instructions (and then fix up the summary information on the BUNDLE)?

jayfoad · 2025-11-06T11:32:53Z

llvm/lib/CodeGen/TwoAddressInstructionPass.cpp

+            assert(
+                MO.getSubReg() == 0 &&
+                "tied subregister uses in bundled instructions not supported");
+            MO.setReg(RegA);


The non-bundle code replaces a use of RegB.SubRegB with RegA, so maybe it makes more sense to do exactly the same here?

Suggested change

assert(

MO.getSubReg() == 0 &&

"tied subregister uses in bundled instructions not supported");

MO.setReg(RegA);

assert(

MO.getSubReg() == SubRegB &&

"tied subregister uses in bundled instructions not supported");

MO.setReg(RegA);

MO.setSubReg(0);

I played around with this but it got very confusing to me even in the restricted use case that I described in the other comment. I'd prefer to just keep this more restrictive for now, and if we find later on that supporting subregisters is beneficial we relax it separately.

Then maybe assert that MO.getSubReg() == 0 && SubRegB == 0? Otherwise this code will replace a use of one reg with a use of another reg with a different size.

nhaehnle · 2025-11-06T17:02:50Z

Have you considered the case where the instructions inside the bundle have two uses of RegB, but only one of them is tied with RegA? I think it is almost impossible to handle that optimally given only the summarised information that you get from the operands of the BUNDLE. It might be worth adding a test case like that, just to check that we don't crash and still generate well formed MIR.

The fundamental question here is, can processTiedPairs really operate at the BUNDLE level (and then fix up the instructions inside)? Or is it going to have to operate on the individual instructions (and then fix up the summary information on the BUNDLE)?

Yes, I thought about it. I did not find a good answer to what tied operands inside of a pre-RA bundle really mean in the general case. Bundles mean different things to different people. The main use of bundles outside of AMDGPU is for VLIW. In AMDGPU so far, it is used for memory clauses (which could potentially have tied operands for atomicrmw, but we only form them post-RA) and for a few niche cases like keeping S_GETPC_B64 together with its uses for PC-relative addressing.

What we're working towards here is a new pre-RA use case that can be vaguely described as "decomposing a single instruction into virtual micro-ops during codegen for the purpose of avoiding combinatorial explosion in opcodes etc.". For that use case, the requirements on tied operand support will be fairly restricted, and so I'd rather make this change more conservative and restrictive and not attempt to support something that we don't actually use and don't know how to test properly. And then build on that if and when we actually know what else might be needed.

github-actions · 2025-11-06T17:10:37Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jayfoad

Seems OK on the understanding that it is slightly experimental, and after some more experience we may need to change things and/or nail down the exact rules for what cases are and are not supported.

If the instruction with tied operands is a BUNDLE instruction and we handle it by replacing an operand, then we need to update the corresponding internal operands as well. Otherwise, the resulting MIR is invalid. The test case is degenerate in the sense that the bundle only contains a single instruction, but it is sufficient to exercise this issue. commit-id:6760a9b7

llvmbot added backend:AMDGPU llvm:codegen labels Nov 3, 2025

nhaehnle changed the base branch from users/nhaehnle/spr/main/667859fc to main November 4, 2025 18:59

nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 883e563 to 282ab76 Compare November 4, 2025 18:59

nhaehnle mentioned this pull request Nov 4, 2025

AMDGPU: Pre-commit a test #166414

Merged

nhaehnle changed the base branch from main to users/nhaehnle/spr/main/667859fc November 4, 2025 18:59

nhaehnle force-pushed the users/nhaehnle/spr/main/667859fc branch from c3abece to 0470587 Compare November 5, 2025 01:43

nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 282ab76 to 4af0353 Compare November 5, 2025 01:43

arsenm reviewed Nov 5, 2025

View reviewed changes

nhaehnle force-pushed the users/nhaehnle/spr/main/667859fc branch from 0470587 to f40a20f Compare November 5, 2025 06:15

nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 4af0353 to 97427fa Compare November 5, 2025 06:15

jayfoad reviewed Nov 6, 2025

View reviewed changes

nhaehnle force-pushed the users/nhaehnle/spr/main/667859fc branch from f40a20f to 4e20232 Compare November 6, 2025 16:12

nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 97427fa to 1224dba Compare November 6, 2025 16:12

nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 1224dba to b65555b Compare November 6, 2025 17:09

jayfoad approved these changes Nov 6, 2025

View reviewed changes

Base automatically changed from users/nhaehnle/spr/main/667859fc to main November 6, 2025 17:19

nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from b65555b to 3eb0dc1 Compare November 6, 2025 17:22

nhaehnle force-pushed the users/nhaehnle/spr/main/6760a9b7 branch from 3eb0dc1 to f9253b6 Compare November 6, 2025 17:47

nhaehnle enabled auto-merge (squash) November 6, 2025 17:48

nhaehnle merged commit 3c31cde into main Nov 6, 2025
9 of 10 checks passed

nhaehnle deleted the users/nhaehnle/spr/main/6760a9b7 branch November 6, 2025 18:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CodeGen: Handle bundled instructions in two-address-instructions pass #166212

CodeGen: Handle bundled instructions in two-address-instructions pass #166212

nhaehnle commented Nov 3, 2025 •

edited

Loading

Uh oh!

llvmbot commented Nov 3, 2025

Uh oh!

arsenm Nov 5, 2025

Uh oh!

jayfoad left a comment

Uh oh!

jayfoad Nov 6, 2025

Uh oh!

nhaehnle Nov 6, 2025

Uh oh!

jayfoad Nov 6, 2025

Uh oh!

nhaehnle commented Nov 6, 2025

Uh oh!

github-actions bot commented Nov 6, 2025 •

edited

Loading

Uh oh!

jayfoad left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CodeGen: Handle bundled instructions in two-address-instructions pass #166212

CodeGen: Handle bundled instructions in two-address-instructions pass #166212

Conversation

nhaehnle commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 3, 2025

Uh oh!

arsenm Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

jayfoad left a comment

Choose a reason for hiding this comment

Uh oh!

jayfoad Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

nhaehnle Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

jayfoad Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

nhaehnle commented Nov 6, 2025

Uh oh!

github-actions bot commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jayfoad left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

nhaehnle commented Nov 3, 2025 •

edited

Loading

github-actions bot commented Nov 6, 2025 •

edited

Loading