[AMDGPU] Initialize FrameOffsetReg for amdgpu_cs_chain functions #165518

jofrn · 2025-10-29T07:02:44Z

Functions with the amdgpu_cs_chain calling convention were not initializing FrameOffsetReg, leaving it as FP_REG. This caused machine code verification failures
as SCRATCH_STORE_DWORD_SADDR instructions require the saddr operand to be in the SReg_32_XEXEC_HI register class.

llvmbot · 2025-10-29T07:03:15Z

@llvm/pr-subscribers-backend-amdgpu

Author: None (jofrn)

Changes

Functions with the amdgpu_cs_chain calling convention were not initializing FrameOffsetReg, leaving it as FP_REG. This caused machine code verification failures
as SCRATCH_STORE_DWORD_SADDR instructions require the saddr operand to be in the SReg_32_XEXEC_HI register class.

Full diff: https://github.com/llvm/llvm-project/pull/165518.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+1)
(added) llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll (+64)

diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index b398db4f7caff..a9f1e9a2ad31d 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -98,6 +98,7 @@ SIMachineFunctionInfo::SIMachineFunctionInfo(const Function &F,
     // set one up. For now, we can use s32 to match what amdgpu_gfx functions
     // would use if called, but this can be revisited.
     // FIXME: Only reserve this if we actually need it.
+    FrameOffsetReg = AMDGPU::SGPR33;
     StackPtrOffsetReg = AMDGPU::SGPR32;
 
     ScratchRSrcReg = AMDGPU::SGPR48_SGPR49_SGPR50_SGPR51;
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll
new file mode 100644
index 0000000000000..3e2ecad67fbec
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx942 -O0 -verify-machineinstrs < %s | FileCheck %s
+
+define amdgpu_cs_chain void @indirect(ptr %callee) {
+; CHECK-LABEL: indirect:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    s_mov_b32 s32, 16
+; CHECK-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; CHECK-NEXT:    scratch_store_dword off, v40, off ; 4-byte Folded Spill
+; CHECK-NEXT:    scratch_store_dword off, v41, off offset:4 ; 4-byte Folded Spill
+; CHECK-NEXT:    s_mov_b64 exec, s[0:1]
+; CHECK-NEXT:    ; implicit-def: $sgpr6_sgpr7
+; CHECK-NEXT:    ; implicit-def: $sgpr10_sgpr11
+; CHECK-NEXT:    ; implicit-def: $sgpr0
+; CHECK-NEXT:    s_mov_b32 s1, 0
+; CHECK-NEXT:    ; implicit-def: $vgpr40 : SGPR spill to VGPR lane
+; CHECK-NEXT:    v_writelane_b32 v40, s1, 0
+; CHECK-NEXT:    v_mov_b32_e32 v0, s1
+; CHECK-NEXT:    v_mov_b32_e32 v1, s1
+; CHECK-NEXT:    s_mov_b64 s[4:5], s[6:7]
+; CHECK-NEXT:    s_mov_b64 s[8:9], 36
+; CHECK-NEXT:    s_mov_b32 s12, s0
+; CHECK-NEXT:    s_mov_b32 s13, s0
+; CHECK-NEXT:    s_mov_b32 s14, s0
+; CHECK-NEXT:    s_mov_b32 s15, s0
+; CHECK-NEXT:    v_mov_b32_e32 v31, s0
+; CHECK-NEXT:    s_getpc_b64 s[0:1]
+; CHECK-NEXT:    s_add_u32 s0, s0, indirect@gotpcrel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s1, s1, indirect@gotpcrel32@hi+12
+; CHECK-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x0
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[0:1]
+; CHECK-NEXT:    v_readlane_b32 s3, v40, 0
+; CHECK-NEXT:    s_nop 1
+; CHECK-NEXT:    v_mov_b32_e32 v0, s3
+; CHECK-NEXT:    s_nop 0
+; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:    v_mov_b32_e32 v0, s3
+; CHECK-NEXT:    s_nop 0
+; CHECK-NEXT:    v_readfirstlane_b32 s1, v0
+; CHECK-NEXT:    v_mov_b32_e32 v0, s3
+; CHECK-NEXT:    s_nop 0
+; CHECK-NEXT:    v_readfirstlane_b32 s2, v0
+; CHECK-NEXT:    v_mov_b32_e32 v8, s3
+; CHECK-NEXT:    s_mov_b32 s4, 0
+; CHECK-NEXT:    v_mov_b32_e32 v9, s4
+; CHECK-NEXT:    v_mov_b32_e32 v10, s3
+; CHECK-NEXT:    v_mov_b32_e32 v11, s3
+; CHECK-NEXT:    s_mov_b64 s[4:5], 0
+; CHECK-NEXT:    v_readlane_b32 s3, v41, 0
+; CHECK-NEXT:    s_xor_saveexec_b64 s[6:7], -1
+; CHECK-NEXT:    scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
+; CHECK-NEXT:    scratch_load_dword v41, off, s33 offset:4 ; 4-byte Folded Reload
+; CHECK-NEXT:    s_mov_b64 exec, s[6:7]
+; CHECK-NEXT:    s_mov_b32 s33, s3
+; CHECK-NEXT:    s_mov_b64 exec, 0
+; CHECK-NEXT:    s_setpc_b64 s[4:5]
+  call void @indirect(ptr null)
+  call void (ptr, i64, <3 x i32>, { i32, ptr addrspace(5), i32, i32 }, i32, ...) @llvm.amdgcn.cs.chain.p0.i64.v3i32.sl_i32p5i32i32s(ptr null, i64 0, <3 x i32> inreg zeroinitializer, { i32, ptr addrspace(5), i32, i32 } zeroinitializer, i32 0)
+  unreachable
+}
+
+declare void @llvm.amdgcn.cs.chain.p0.i64.v3i32.sl_i32p5i32i32s(ptr, i64, <3 x i32>, { i32, ptr addrspace(5), i32, i32 }, i32 immarg, ...)

shiltian · 2025-11-05T04:58:19Z

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

    // set one up. For now, we can use s32 to match what amdgpu_gfx functions
    // would use if called, but this can be revisited.
    // FIXME: Only reserve this if we actually need it.
+    FrameOffsetReg = AMDGPU::SGPR33;


Based on the comment, it doesn't seem like this is needed. The test case has a UB in the first place, because indirect should not be called directly. I'm not sure if we want to do this. @jayfoad @rovka

It still needs to compile without verifier error if it's UB. You don't have the option of rejecting a frame pointer, you can explicitly request it (and things like dynamic alloca force it too)

arsenm · 2025-11-05T07:13:07Z

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

    // set one up. For now, we can use s32 to match what amdgpu_gfx functions
    // would use if called, but this can be revisited.
    // FIXME: Only reserve this if we actually need it.
+    FrameOffsetReg = AMDGPU::SGPR33;


It still needs to compile without verifier error if it's UB. You don't have the option of rejecting a frame pointer, you can explicitly request it (and things like dynamic alloca force it too)

arsenm · 2025-11-05T07:13:39Z

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll

+  call void (ptr, i64, <3 x i32>, { i32, ptr addrspace(5), i32, i32 }, i32, ...) @llvm.amdgcn.cs.chain.p0.i64.v3i32.sl_i32p5i32i32s(ptr null, i64 0, <3 x i32> inreg zeroinitializer, { i32, ptr addrspace(5), i32, i32 } zeroinitializer, i32 0)
+  unreachable
+}
+


Can you add some other straightforward examples that need an FP, like dynamic alloca and/or "frame-pointer"="all"? Do those fail without -O0?

We already have tests for some of these cases in https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll

It would be nice to consolidate all FP-related tests in a single file, so they're easy to find.

The straightforward fp all example failed with -O0, so I added it to amdgpu-cs-chain-fp-nosave.ll. The recursive example is still in its original file, and the chain intrinsic call prevents compilation of it to gfx1200, so because fp-nosave.ll has gfx1200 within one of the RUN lines, then we may want to keep it in a different file for now.

This should definitely work on GFX12 and -O0! If you feel like the issues are not related to your patch, let me know and I can look into fixing them. Thanks :)

I had a typo. I meant to say the recurse test only fails with -O0 and -global-isel=1. The fp all test can be moved because it can fail without -O0; it can go back to the new file too since it fails either way. Sorry about that.

You may try as well. Thank you for the help so far.

rovka · 2025-11-05T09:07:52Z

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll

+; CHECK-NEXT:    s_mov_b64 s[4:5], 0
+; CHECK-NEXT:    v_readlane_b32 s3, v41, 0
+; CHECK-NEXT:    s_xor_saveexec_b64 s[6:7], -1
+; CHECK-NEXT:    scratch_load_dword v40, off, s33 ; 4-byte Folded Reload


This isn't correct, since the stores don't use fp_reg, they just use plain offsets. It looks like this is just an oversight where we used the wrong register for the restores. You should check who's inserting these spills and restores and make sure they're handled the same way, rather than force the FP to s33. (I could've sworn I fixed something like this a while ago but I can't find it right now).

Removed the hardcoding of s33 and added check to avoid restoring FP for chain functions. Thanks.

Functions with the amdgpu_cs_chain calling convention were not initializing FrameOffsetReg, leaving it as FP_REG. This caused machine code verification failures as SCRATCH_STORE_DWORD_SADDR instructions require the saddr operand to be in the SReg_32_XEXEC_HI register class. This LLVM defect was identified via the AMD Fuzzing project.

…andle chain functions in same way as non-FPSaved case, an offset. Remove prior hardcode to s33 and do not restore FP for chain functions.

…fx1200. Move fp_all test to fp-nosave since it compiles on both gfx942 and gfx1200.

arsenm · 2025-11-11T03:52:03Z

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll

+  call void (ptr, i64, <3 x i32>, { i32, ptr addrspace(5), i32, i32 }, i32, ...) @llvm.amdgcn.cs.chain.p0.i64.v3i32.sl_i32p5i32i32s(ptr null, i64 0, <3 x i32> inreg zeroinitializer, { i32, ptr addrspace(5), i32, i32 } zeroinitializer, i32 0)
+  unreachable
+}
+


dynamic alloca would be easier to see

There are dynamic allocas in the other test file.

arsenm · 2025-11-11T03:52:34Z

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll

+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_mov_b32_e32 v0, 0
+; GFX942-NEXT:    scratch_store_dword off, v0, off


This looks like it the frame pointer was just ignored? If we require a frame pointer it still needs to be set up

rovka · 2025-11-11T08:30:44Z

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

+              .getRegisterInfo()
+              ->hasStackRealignment(MF) ||
+          mayReserveScratchForCWSR(MF) ||
+          MF.getTarget().Options.DisableFramePointerElim(MF));


I think chain functions should honour DisableFramePointerElim.

As far as I know, we don't have a test with the frameaddress intrinsic, we should probably add one (if it does the wrong thing, add a FIXME and I can have a look at it if you don't have the time).

I have also just realized that CWSR support is probably broken for chain functions. That has a bit of a bigger scope than your current patch, so I can look into that one too (unless you're very interested in learning about it).

Alright, will add a test for frameaddress. I will take CWSR into consideration...

You may work on it if you'd like as well.

rovka · 2025-11-11T08:34:16Z

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

  }

-  if (FPSaved) {
+  if (FPSaved && !FuncInfo->isChainFunction()) {


What's this for? If you're in a chain function where the FP needs to be spilled, then you'll call emitCSRSpillRestores once in the branch above, and then again from the else on this branch.

To be fair this whole code looks unnecessarily complicated, can we just merge the 2 ifs? They originally had the same condition and it's probably good to keep it that way.

Yes, they can be merged. Thank you!

rovka · 2025-11-11T08:41:09Z

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll

+  call void (ptr, i64, <3 x i32>, { i32, ptr addrspace(5), i32, i32 }, i32, ...) @llvm.amdgcn.cs.chain.p0.i64.v3i32.sl_i32p5i32i32s(ptr null, i64 0, <3 x i32> inreg zeroinitializer, { i32, ptr addrspace(5), i32, i32 } zeroinitializer, i32 0)
+  unreachable
+}
+


This should definitely work on GFX12 and -O0! If you feel like the issues are not related to your patch, let me know and I can look into fixing them. Thanks :)

jofrn · 2025-11-17T15:29:44Z

This should definitely work on GFX12 and -O0! If you feel like the issues are not related to your patch, let me know and I can look into fixing them. Thanks :)

recurse test can fail with or without -O0; I had a typo, clarified above.

The current issue occurs from the fp_all test case. The original patch for recurse requires only

return !MF.getInfo<SIMachineFunctionInfo>()->isChainFunction() && (frameTriviallyRequiresSP(MFI) || MFI.isFrameAddressTaken() ...

fp_all requires perhaps the additional modifications to SIFrameLowering::emitEpilogue.

llvmbot added the backend:AMDGPU label Oct 29, 2025

jofrn force-pushed the users/jofrn/fix-amdgpu-cs-chain-frame-pointer branch from 18331e0 to 76189d6 Compare October 29, 2025 07:04

jofrn requested review from Pierre-vh, arsenm and shiltian October 29, 2025 07:04

shiltian reviewed Nov 5, 2025

View reviewed changes

arsenm reviewed Nov 5, 2025

View reviewed changes

rovka reviewed Nov 5, 2025

View reviewed changes

jofrn added 3 commits November 10, 2025 07:19

For emitEpilogue, emitCSRSpillRestores' argument in FPSaved case to h…

d4eecd0

…andle chain functions in same way as non-FPSaved case, an offset. Remove prior hardcode to s33 and do not restore FP for chain functions.

Added test case to verify no FP setup occurs with "frame-pointer"="all".

820b9c3

jofrn force-pushed the users/jofrn/fix-amdgpu-cs-chain-frame-pointer branch from 6e8105c to 820b9c3 Compare November 10, 2025 13:09

Rename indirect to recurse and keep in original file -- invalid for g…

506b411

…fx1200. Move fp_all test to fp-nosave since it compiles on both gfx942 and gfx1200.

arsenm reviewed Nov 11, 2025

View reviewed changes

rovka reviewed Nov 11, 2025

View reviewed changes

[AMDGPU] Initialize FrameOffsetReg for amdgpu_cs_chain functions #165518

Are you sure you want to change the base?

[AMDGPU] Initialize FrameOffsetReg for amdgpu_cs_chain functions #165518

Conversation

jofrn commented Oct 29, 2025

Uh oh!

llvmbot commented Oct 29, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jofrn Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jofrn Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jofrn commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

jofrn Nov 10, 2025 •

edited

Loading

jofrn Nov 17, 2025 •

edited

Loading

jofrn commented Nov 17, 2025 •

edited

Loading