Conversation

JanekvO
Contributor

@JanekvO JanekvO commented Sep 24, 2025

#154115 Exposed a possible destination misaligned v_mov_b64

si-load-store-opt would emit a REG_SEQUENCE placing a b64 register pair after a b32 register, resulting in a misaligned VGPR pair. machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2. This patch ensures that the b64 v_mov pseudo instruction checks for correct VGPR alignment before expanding to v_mov_b64.

@llvmbot
Member

llvmbot commented Sep 24, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Janek van Oirschot (JanekvO)

Changes

#154115 Exposed a possible destination misaligned v_mov_b64

si-load-store-opt would emit a REG_SEQUENCE placing a b64 register pair after a b32 register, resulting in a misaligned VGPR pair. machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2. This patch ensures that the b64 v_mov pseudo instruction checks for correct VGPR alignment before expanding to v_mov_b64.


Full diff: https://github.com/llvm/llvm-project/pull/160547.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+5-2)
  • (added) llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir (+33)
  • (added) llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir (+31)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 84886d7780888..76a1cce98c75f 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2149,7 +2149,9 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
     const MachineOperand &SrcOp = MI.getOperand(1);
     // FIXME: Will this work for 64-bit floating point immediates?
     assert(!SrcOp.isFPImm());
-    if (ST.hasMovB64()) {
+    MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+    const TargetRegisterClass *RC = RI.getRegClassForReg(MRI, Dst);
+    if (ST.hasMovB64() && RI.isProperlyAlignedRC(*RC)) {
       MI.setDesc(get(AMDGPU::V_MOV_B64_e32));
       if (SrcOp.isReg() || isInlineConstant(MI, 1) ||
           isUInt<32>(SrcOp.getImm()) || ST.has64BitLiterals())
@@ -2159,7 +2161,8 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
       APInt Imm(64, SrcOp.getImm());
       APInt Lo(32, Imm.getLoBits(32).getZExtValue());
       APInt Hi(32, Imm.getHiBits(32).getZExtValue());
-      if (ST.hasPkMovB32() && Lo == Hi && isInlineConstant(Lo)) {
+      if (ST.hasPkMovB32() && Lo == Hi && isInlineConstant(Lo) &&
+          RI.isProperlyAlignedRC(*RC)) {
         BuildMI(MBB, MI, DL, get(AMDGPU::V_PK_MOV_B32), Dst)
           .addImm(SISrcMods::OP_SEL_1)
           .addImm(Lo.getSExtValue())
diff --git a/llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir b/llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir
new file mode 100644
index 0000000000000..a42a74597a1e9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir
@@ -0,0 +1,33 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -start-after=si-load-store-opt %s -o - | FileCheck %s
+
+# CHECK: "misaligned-regsequence":
+# CHECK: ; %bb.0:
+# CHECK:         s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+# CHECK:         s_load_dwordx2 s[0:1], s[4:5], 0x0
+# CHECK:         v_mov_b32_e32 v5, 0
+# CHECK:         v_mov_b32_e32 v4, 0
+# CHECK:         v_mov_b32_e32 v6, 0
+# CHECK:         s_waitcnt lgkmcnt(0)
+# CHECK:         v_mov_b64_e32 v[2:3], s[0:1]
+# CHECK:         flat_store_dwordx3 v[2:3], v[4:6]
+# CHECK:         s_endpgm
+
+--- |
+  define void @misaligned-regsequence() { ret void }
+...
+---
+name: misaligned-regsequence
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $sgpr4_sgpr5
+
+    %3:sgpr_64(p4) = COPY $sgpr4_sgpr5
+    %8:sreg_64_xexec = S_LOAD_DWORDX2_IMM %3:sgpr_64(p4), 0, 0 :: (dereferenceable invariant load (s64), align 16, addrspace 4)
+    %9:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+    %10:vreg_64_align2 = COPY %8:sreg_64_xexec
+    %11:vreg_64_align2 = V_MOV_B64_PSEUDO 0, implicit $exec
+    %13:vreg_96_align2 = REG_SEQUENCE killed %9:vgpr_32, %subreg.sub0, killed %11:vreg_64_align2, %subreg.sub1_sub2
+    FLAT_STORE_DWORDX3 %10:vreg_64_align2, killed %13:vreg_96_align2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into `ptr addrspace(1) undef`, align 4)
+    S_ENDPGM 0
+...
diff --git a/llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir b/llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir
new file mode 100644
index 0000000000000..672a52a0e4bd3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir
@@ -0,0 +1,31 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -start-before=postrapseudos %s -o - | FileCheck %s
+
+# CHECK: v_mov_b64_misalign:
+# CHECK: ; %bb.0:
+# CHECK:         s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+# CHECK:         s_load_dwordx2 s[0:1], s[4:5], 0x0
+# CHECK:         v_mov_b32_e32 v5, 0
+# CHECK:         v_mov_b32_e32 v4, 0
+# CHECK:         v_mov_b32_e32 v6, 0
+# CHECK:         s_waitcnt lgkmcnt(0)
+# CHECK:         v_mov_b64_e32 v[2:3], s[0:1]
+# CHECK:         flat_store_dwordx3 v[2:3], v[4:6]
+# CHECK:         s_endpgm
+
+---
+name:            v_mov_b64_misalign
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $sgpr4_sgpr5
+  
+    frame-setup CFI_INSTRUCTION escape 0x0f, 0x04, 0x30, 0x36, 0xe9, 0x02
+    frame-setup CFI_INSTRUCTION undefined $pc_reg
+    renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64), align 16, addrspace 4)
+    renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
+    renamable $vgpr5_vgpr6 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
+    renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1, implicit $exec
+    FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into `ptr addrspace(1) undef`, align 4)
+    S_ENDPGM 0
+...
+

@jayfoad
Contributor

jayfoad commented Sep 24, 2025

machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2.

That doesn't sound right - if V_MOV_B64_PSEUDO uses aligned register classes then machine-cp should not do this, because it should be checking register class constraints?

@JanekvO
Contributor Author

JanekvO commented Sep 24, 2025

machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2.

That doesn't sound right - if V_MOV_B64_PSEUDO uses aligned register classes then machine-cp should not do this, because it should be checking register class constraints?

The machine-cp of interest happens after RA:

# Machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr4_sgpr5

0B	bb.0.entry:
	  liveins: $sgpr4_sgpr5
32B	  renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64) from %ir..kernarg.offset1, align 16, addrspace 4)
48B	  renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
80B	  renamable $vgpr0_vgpr1 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
96B	  renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1
128B	  renamable $vgpr5_vgpr6 = COPY killed renamable $vgpr0_vgpr1
144B	  FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into %ir..load, align 4)
160B	  S_ENDPGM 0

# End machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_.

# *** IR Dump After Machine Copy Propagation Pass (machine-cp) ***:
# Machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr4_sgpr5

bb.0.entry:
  liveins: $sgpr4_sgpr5
  renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64) from %ir..kernarg.offset1, align 16, addrspace 4)
  renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
  renamable $vgpr5_vgpr6 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
  renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1
  FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into %ir..load, align 4)
  S_ENDPGM 0

Here, renamable $vgpr5_vgpr6 = COPY killed renamable $vgpr0_vgpr1 gets machine-cp'ed into a misaligned AV_MOV_B64_IMM_PSEUDO. This COPY originates from the si-load-store-opt-emitted %13:vreg_96_align2 = REG_SEQUENCE killed %9:vgpr_32, %subreg.sub0, killed %11:vreg_64_align2, %subreg.sub1_sub2, where the vreg96 and vreg64 alignments already don't make sense.

@arsenm
Contributor

arsenm commented Sep 24, 2025

That doesn't sound right - if V_MOV_B64_PSEUDO uses aligned register classes then machine-cp should not do this, because it should be checking register class constraints?

Relaxing this to use unaligned classes is on my todo list, so we should do this anyway. But AV_MOV_B64_IMM_PSEUDO does not require aligned classes as it is:

; CHECK: $vgpr1 = V_MOV_B32_e32 9, implicit $exec, implicit-def $vgpr1_vgpr2
; CHECK-NEXT: $vgpr2 = V_MOV_B32_e32 -16, implicit $exec, implicit-def $vgpr1_vgpr2
$vgpr1_vgpr2 = AV_MOV_B64_IMM_PSEUDO 18446744004990074889, implicit $exec
...
Contributor Author


These seem to have the intent of the tests I'm adding, but appear to elide the verifier error due to the isUInt<32> check on the immediate. Should these be changed or merged with the new tests?

Contributor


AV_MOV_B64_IMM_PSEUDO already doesn't require alignment, I think this already works.

Contributor Author


In the case of gfx942 (and without the changes of this PR) my added test for vgpr5_vgpr6 seems to emit a misaligned v_mov_b64 (https://godbolt.org/z/ch9qcWc3s)

@@ -0,0 +1,33 @@
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -start-after=si-load-store-opt %s -o - | FileCheck %s
Contributor


This is a weird way to write the test. Can you replace this with an end-to-end IR test that hits this problem?

Contributor Author


I can't add the end-to-end IR test since #154115, which exposed the code path to the misaligned v_mov_b64, was reverted. I was planning to add the IR test as part of a reland of #154115 after this lands, but I can add it here as a sort of precommit since it's related.

Comment on lines 97 to 104
# GCN-LABEL: name: v_mov_b64_misalign
# GCN: $vgpr5 = V_MOV_B32_e32 0, implicit $exec, implicit-def $vgpr5_vgpr6
# GCN: $vgpr6 = V_MOV_B32_e32 0, implicit $exec, implicit-def $vgpr5_vgpr6
name: v_mov_b64_misalign
body: |
bb.0:
$vgpr5_vgpr6 = V_MOV_B64_PSEUDO 0, implicit $exec
...
Contributor Author


Because V_MOV_B64_PSEUDO is regclass constrained, even the MIRParser will error on a misaligned register pair. Therefore, removing this test.

Contributor


Please restore the test and add a special case to make sure this supports the unaligned case

Contributor

@jayfoad jayfoad left a comment


As far as I can tell there is no bug here, and if there was a bug it would be in whatever created V_MOV_B64_PSEUDO with unaligned VGPRs in the first place.

@arsenm
Contributor

arsenm commented Sep 29, 2025

As far as I can tell there is no bug here, and if there was a bug it would be in whatever created V_MOV_B64_PSEUDO with unaligned VGPRs in the first place.

V_MOV_B64_PSEUDO should support unaligned registers, any change should be to relax the restriction

@jayfoad
Contributor

jayfoad commented Sep 29, 2025

As far as I can tell there is no bug here, and if there was a bug it would be in whatever created V_MOV_B64_PSEUDO with unaligned VGPRs in the first place.

V_MOV_B64_PSEUDO should support unaligned registers, any change should be to relax the restriction

No objection to that. But that's not what this patch currently does.

@JanekvO
Contributor Author

JanekvO commented Sep 29, 2025

The bug I'm seeing is more with AV_MOV_B64_IMM_PSEUDO than V_MOV_B64_PSEUDO, but they're related: AV_MOV_B64_IMM_PSEUDO falls through into the V_MOV_B64_PSEUDO expansion, which already assumes an aligned pair, whereas AV_MOV_B64_IMM_PSEUDO itself doesn't. This means a v_mov_b64 may be emitted on a misaligned register pair for AV_MOV_B64_IMM_PSEUDO; see https://godbolt.org/z/ch9qcWc3s

Should I include the relaxation of V_MOV_B64_PSEUDO's register alignment restriction in this PR?

@JanekvO JanekvO requested review from arsenm and jayfoad October 1, 2025 11:21
@arsenm
Contributor

arsenm commented Oct 1, 2025

The bug I'm seeing is more with AV_MOV_B64_IMM_PSEUDO than V_MOV_B64_PSEUDO, but they're related: AV_MOV_B64_IMM_PSEUDO falls through into the V_MOV_B64_PSEUDO expansion, which already assumes an aligned pair, whereas AV_MOV_B64_IMM_PSEUDO itself doesn't. This means a v_mov_b64 may be emitted on a misaligned register pair for AV_MOV_B64_IMM_PSEUDO; see https://godbolt.org/z/ch9qcWc3s

Should I include the relaxation of V_MOV_B64_PSEUDO's register alignment restriction in this PR?

Might as well, they're kind of a pair
