Conversation

JanekvO
Contributor

@JanekvO JanekvO commented Sep 24, 2025

#154115 Exposed a possible destination misaligned v_mov_b64

si-load-store-opt would emit a REG_SEQUENCE placing a b64 register pair after a b32 register, resulting in a misaligned VGPR pair. machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2. This patch ensures that the b64 v_mov pseudo instruction checks for correct VGPR alignment before expanding to v_mov_b64.

@llvmbot
Member

llvmbot commented Sep 24, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Janek van Oirschot (JanekvO)

Changes

#154115 Exposed a possible destination misaligned v_mov_b64

si-load-store-opt would emit a REG_SEQUENCE placing a b64 register pair after a b32 register, resulting in a misaligned VGPR pair. machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2. This patch ensures that the b64 v_mov pseudo instruction checks for correct VGPR alignment before expanding to v_mov_b64.


Full diff: https://github.com/llvm/llvm-project/pull/160547.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+5-2)
  • (added) llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir (+33)
  • (added) llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir (+31)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 84886d7780888..76a1cce98c75f 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -2149,7 +2149,9 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
     const MachineOperand &SrcOp = MI.getOperand(1);
     // FIXME: Will this work for 64-bit floating point immediates?
     assert(!SrcOp.isFPImm());
-    if (ST.hasMovB64()) {
+    MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+    const TargetRegisterClass *RC = RI.getRegClassForReg(MRI, Dst);
+    if (ST.hasMovB64() && RI.isProperlyAlignedRC(*RC)) {
       MI.setDesc(get(AMDGPU::V_MOV_B64_e32));
       if (SrcOp.isReg() || isInlineConstant(MI, 1) ||
           isUInt<32>(SrcOp.getImm()) || ST.has64BitLiterals())
@@ -2159,7 +2161,8 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
       APInt Imm(64, SrcOp.getImm());
       APInt Lo(32, Imm.getLoBits(32).getZExtValue());
       APInt Hi(32, Imm.getHiBits(32).getZExtValue());
-      if (ST.hasPkMovB32() && Lo == Hi && isInlineConstant(Lo)) {
+      if (ST.hasPkMovB32() && Lo == Hi && isInlineConstant(Lo) &&
+          RI.isProperlyAlignedRC(*RC)) {
         BuildMI(MBB, MI, DL, get(AMDGPU::V_PK_MOV_B32), Dst)
           .addImm(SISrcMods::OP_SEL_1)
           .addImm(Lo.getSExtValue())
diff --git a/llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir b/llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir
new file mode 100644
index 0000000000000..a42a74597a1e9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir
@@ -0,0 +1,33 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -start-after=si-load-store-opt %s -o - | FileCheck %s
+
+# CHECK: "misaligned-regsequence":
+# CHECK: ; %bb.0:
+# CHECK:         s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+# CHECK:         s_load_dwordx2 s[0:1], s[4:5], 0x0
+# CHECK:         v_mov_b32_e32 v5, 0
+# CHECK:         v_mov_b32_e32 v4, 0
+# CHECK:         v_mov_b32_e32 v6, 0
+# CHECK:         s_waitcnt lgkmcnt(0)
+# CHECK:         v_mov_b64_e32 v[2:3], s[0:1]
+# CHECK:         flat_store_dwordx3 v[2:3], v[4:6]
+# CHECK:         s_endpgm
+
+--- |
+  define void @misaligned-regsequence() { ret void }
+...
+---
+name: misaligned-regsequence
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $sgpr4_sgpr5
+
+    %3:sgpr_64(p4) = COPY $sgpr4_sgpr5
+    %8:sreg_64_xexec = S_LOAD_DWORDX2_IMM %3:sgpr_64(p4), 0, 0 :: (dereferenceable invariant load (s64), align 16, addrspace 4)
+    %9:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+    %10:vreg_64_align2 = COPY %8:sreg_64_xexec
+    %11:vreg_64_align2 = V_MOV_B64_PSEUDO 0, implicit $exec
+    %13:vreg_96_align2 = REG_SEQUENCE killed %9:vgpr_32, %subreg.sub0, killed %11:vreg_64_align2, %subreg.sub1_sub2
+    FLAT_STORE_DWORDX3 %10:vreg_64_align2, killed %13:vreg_96_align2, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into `ptr addrspace(1) undef`, align 4)
+    S_ENDPGM 0
+...
diff --git a/llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir b/llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir
new file mode 100644
index 0000000000000..672a52a0e4bd3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/vgpr-mov64-align.mir
@@ -0,0 +1,31 @@
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -start-before=postrapseudos %s -o - | FileCheck %s
+
+# CHECK: v_mov_b64_misalign:
+# CHECK: ; %bb.0:
+# CHECK:         s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+# CHECK:         s_load_dwordx2 s[0:1], s[4:5], 0x0
+# CHECK:         v_mov_b32_e32 v5, 0
+# CHECK:         v_mov_b32_e32 v4, 0
+# CHECK:         v_mov_b32_e32 v6, 0
+# CHECK:         s_waitcnt lgkmcnt(0)
+# CHECK:         v_mov_b64_e32 v[2:3], s[0:1]
+# CHECK:         flat_store_dwordx3 v[2:3], v[4:6]
+# CHECK:         s_endpgm
+
+---
+name:            v_mov_b64_misalign
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $sgpr4_sgpr5
+  
+    frame-setup CFI_INSTRUCTION escape 0x0f, 0x04, 0x30, 0x36, 0xe9, 0x02
+    frame-setup CFI_INSTRUCTION undefined $pc_reg
+    renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64), align 16, addrspace 4)
+    renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
+    renamable $vgpr5_vgpr6 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
+    renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1, implicit $exec
+    FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into `ptr addrspace(1) undef`, align 4)
+    S_ENDPGM 0
+...
+

@jayfoad
Contributor

jayfoad commented Sep 24, 2025

machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2.

That doesn't sound right - if V_MOV_B64_PSEUDO uses aligned register classes then machine-cp should not do this, because it should be checking register class constraints?

@JanekvO
Contributor Author

JanekvO commented Sep 24, 2025

machine-cp would then copy-propagate the misaligned VGPR pair into a V_MOV_B64_PSEUDO, which requires align2.

That doesn't sound right - if V_MOV_B64_PSEUDO uses aligned register classes then machine-cp should not do this, because it should be checking register class constraints?

The machine-cp of interest happens after RA:

# Machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr4_sgpr5

0B	bb.0.entry:
	  liveins: $sgpr4_sgpr5
32B	  renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64) from %ir..kernarg.offset1, align 16, addrspace 4)
48B	  renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
80B	  renamable $vgpr0_vgpr1 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
96B	  renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1
128B	  renamable $vgpr5_vgpr6 = COPY killed renamable $vgpr0_vgpr1
144B	  FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into %ir..load, align 4)
160B	  S_ENDPGM 0

# End machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_.

# *** IR Dump After Machine Copy Propagation Pass (machine-cp) ***:
# Machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr4_sgpr5

bb.0.entry:
  liveins: $sgpr4_sgpr5
  renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64) from %ir..kernarg.offset1, align 16, addrspace 4)
  renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
  renamable $vgpr5_vgpr6 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
  renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1
  FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into %ir..load, align 4)
  S_ENDPGM 0

Here, renamable $vgpr5_vgpr6 = COPY killed renamable $vgpr0_vgpr1 gets machine-cp'ed into a misaligned AV_MOV_B64_IMM_PSEUDO. This COPY originates from the si-load-store-opt-emitted %13:vreg_96_align2 = REG_SEQUENCE killed %9:vgpr_32, %subreg.sub0, killed %11:vreg_64_align2, %subreg.sub1_sub2, where the vreg96 and vreg64 alignments already don't make sense.

@arsenm
Contributor

arsenm commented Sep 24, 2025

That doesn't sound right - if V_MOV_B64_PSEUDO uses aligned register classes then machine-cp should not do this, because it should be checking register class constraints?

Relaxing this to use unaligned classes is on my todo list, so we should do this anyway. But AV_MOV_B64_IMM_PSEUDO does not require aligned classes as it is:

; CHECK: $vgpr1 = V_MOV_B32_e32 9, implicit $exec, implicit-def $vgpr1_vgpr2
; CHECK-NEXT: $vgpr2 = V_MOV_B32_e32 -16, implicit $exec, implicit-def $vgpr1_vgpr2
$vgpr1_vgpr2 = AV_MOV_B64_IMM_PSEUDO 18446744004990074889, implicit $exec
...
Contributor Author


These seem to have the intent of the tests I'm adding, but appear to elide the verifier error due to the isUInt<32> check on the immediate. Should these be changed or merged with the new tests?

Contributor


AV_MOV_B64_IMM_PSEUDO already doesn't require alignment, I think this already works.

Contributor Author


In the case of gfx942 (and without the changes of this PR) my added test for vgpr5_vgpr6 seems to emit a misaligned v_mov_b64 (https://godbolt.org/z/ch9qcWc3s)

@@ -0,0 +1,33 @@
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -start-after=si-load-store-opt %s -o - | FileCheck %s
Contributor


This is a weird way to write the test. Can you replace this with an end-to-end IR test that hits this problem?

Contributor Author


I can't add the end-to-end IR test since #154115, which exposed the code path to the misaligned v_mov_b64, was reverted. I was planning to add the IR test as part of a reland of #154115 after this lands, but I can add it here as a sort of precommit since it's related.

Comment on lines 97 to 104
# GCN-LABEL: name: v_mov_b64_misalign
# GCN: $vgpr5 = V_MOV_B32_e32 0, implicit $exec, implicit-def $vgpr5_vgpr6
# GCN: $vgpr6 = V_MOV_B32_e32 0, implicit $exec, implicit-def $vgpr5_vgpr6
name: v_mov_b64_misalign
body: |
bb.0:
$vgpr5_vgpr6 = V_MOV_B64_PSEUDO 0, implicit $exec
...
Contributor Author


Because V_MOV_B64_PSEUDO is regclass constrained, even the MIRParser will error on a misaligned register pair. Therefore, removing this test.

Contributor


Please restore the test and add a special case to make sure this supports the unaligned case

Contributor

@jayfoad jayfoad left a comment


As far as I can tell there is no bug here, and if there was a bug it would be in whatever created V_MOV_B64_PSEUDO with unaligned VGPRs in the first place.

@arsenm
Contributor

arsenm commented Sep 29, 2025

As far as I can tell there is no bug here, and if there was a bug it would be in whatever created V_MOV_B64_PSEUDO with unaligned VGPRs in the first place.

V_MOV_B64_PSEUDO should support unaligned registers, any change should be to relax the restriction

@jayfoad
Contributor

jayfoad commented Sep 29, 2025

As far as I can tell there is no bug here, and if there was a bug it would be in whatever created V_MOV_B64_PSEUDO with unaligned VGPRs in the first place.

V_MOV_B64_PSEUDO should support unaligned registers, any change should be to relax the restriction

No objection to that. But that's not what this patch currently does.

@JanekvO
Contributor Author

JanekvO commented Sep 29, 2025

The bug I'm seeing is more with AV_MOV_B64_IMM_PSEUDO than V_MOV_B64_PSEUDO, but they're related: AV_MOV_B64_IMM_PSEUDO falls through into the V_MOV_B64_PSEUDO expansion, which already assumes an aligned pair, whereas AV_MOV_B64_IMM_PSEUDO itself doesn't. This means a v_mov_b64 may be emitted on a misaligned register pair for AV_MOV_B64_IMM_PSEUDO; see https://godbolt.org/z/ch9qcWc3s

Should I include the relaxation of V_MOV_B64_PSEUDO's register alignment restriction in this PR?

@JanekvO JanekvO requested review from arsenm and jayfoad October 1, 2025 11:21
@arsenm
Contributor

arsenm commented Oct 1, 2025

The bug I'm seeing is more with AV_MOV_B64_IMM_PSEUDO than V_MOV_B64_PSEUDO, but they're related: AV_MOV_B64_IMM_PSEUDO falls through into the V_MOV_B64_PSEUDO expansion, which already assumes an aligned pair, whereas AV_MOV_B64_IMM_PSEUDO itself doesn't. This means a v_mov_b64 may be emitted on a misaligned register pair for AV_MOV_B64_IMM_PSEUDO; see https://godbolt.org/z/ch9qcWc3s

Should I include the relaxation of V_MOV_B64_PSEUDO's register alignment restriction in this PR?

Might as well, they're kind of a pair
