Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use R12 for indirect tail calls with PACBTI #82661

Merged
merged 1 commit into from
Apr 30, 2024

Conversation

eleanor-arm
Copy link
Contributor

@eleanor-arm eleanor-arm commented Feb 22, 2024

When compiling for thumbv8.1m with +pacbti and making an indirect tail
call, the compiler was free to put the function pointer into R12.

This is incorrect because R12 is restored to contain authentication code
for the caller's return address.

This patch excludes R12 from the set of registers the compiler can put
the function pointer in.

Fixes #75998

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 22, 2024

@llvm/pr-subscribers-backend-arm

Author: Eleanor Bonnici (eleanor-arm)

Changes

When compiling for thumbv8.1m with +pacbti and making an indirect tail
call, the compiler was free to put the function pointer into R12.

This is incorrect because R12 is restored to contain authentication code
for the caller's return address.

This patch excludes R12 from the set of registers the compiler can put
the function pointer in.


Full diff: https://github.com/llvm/llvm-project/pull/82661.diff

2 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMRegisterInfo.td (+2-1)
  • (added) llvm/test/CodeGen/Thumb2/pacbti-m-indirect-tail-call-no-r12.ll (+78)
diff --git a/llvm/lib/Target/ARM/ARMRegisterInfo.td b/llvm/lib/Target/ARM/ARMRegisterInfo.td
index 194d65cad8d170..025ca62d54376b 100644
--- a/llvm/lib/Target/ARM/ARMRegisterInfo.td
+++ b/llvm/lib/Target/ARM/ARMRegisterInfo.td
@@ -369,7 +369,8 @@ def hGPR : RegisterClass<"ARM", [i32], 32, (sub GPR, tGPR)> {
 def tcGPR : RegisterClass<"ARM", [i32], 32, (add R0, R1, R2, R3, R12)> {
   let AltOrders = [(and tcGPR, tGPR)];
   let AltOrderSelect = [{
-      return MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+      return MF.getSubtarget<ARMSubtarget>().isThumb1Only() ||
+        MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true);
   }];
 }
 
diff --git a/llvm/test/CodeGen/Thumb2/pacbti-m-indirect-tail-call-no-r12.ll b/llvm/test/CodeGen/Thumb2/pacbti-m-indirect-tail-call-no-r12.ll
new file mode 100644
index 00000000000000..844baa15500a3d
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb2/pacbti-m-indirect-tail-call-no-r12.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc %s -mtriple=thumbv8.1m.main -mattr=+pacbti -o - | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv8.1m.main-m.main-unknown"
+
+; Function Attrs: minsize nounwind optsize sspstrong
+define dso_local arm_aapcscc i32 @i3c_dev_do_priv_xfers_locked(ptr noundef %dev, ptr noundef %xfers, i32 noundef %nxfers) local_unnamed_addr #0 {
+; CHECK-LABEL: i3c_dev_do_priv_xfers_locked:
+; CHECK:       @ %bb.0: @ %entry
+; CHECK-NEXT:    pacbti r12, lr, sp
+; CHECK-NEXT:    push {r4, lr}
+; CHECK-NEXT:    str r12, [sp, #-4]!
+; CHECK-NEXT:    mov r3, r0
+; CHECK-NEXT:    mvn r0, #21
+; CHECK-NEXT:    cbz r1, .LBB0_5
+; CHECK-NEXT:  @ %bb.1: @ %entry
+; CHECK-NEXT:    ldr r4, [r3]
+; CHECK-NEXT:    cbz r4, .LBB0_5
+; CHECK-NEXT:  @ %bb.2: @ %if.end
+; CHECK-NEXT:    ldr r0, [r4]
+; CHECK-NEXT:    ldr r
+; CHECK-SAME-NOT      12
+; CHECK-SAME             , [r0]
+; CHECK-NEXT:    cbz r0, .LBB0_4
+; CHECK-NEXT:  @ %bb.3: @ %if.end5
+; CHECK-NEXT:    mov r4, r0
+; CHECK-NEXT:    mov r0, r3
+; CHECK-NEXT:    mov r3, r4
+; CHECK-NEXT:    ldr r12, [sp], #4
+; CHECK-NEXT:    pop.w {r4, lr}
+; CHECK-NEXT:    aut r12, lr, sp
+; CHECK-NEXT:    bx r
+; CHECK-SAME-NOT     12
+entry:
+  %0 = load ptr, ptr %dev, align 4, !tbaa !6
+  %tobool = icmp ne ptr %0, null
+  %tobool2 = icmp ne ptr %xfers, null
+  %or.cond = and i1 %tobool2, %tobool
+  br i1 %or.cond, label %if.end, label %cleanup
+
+if.end:                                           ; preds = %entry
+
+  %1 = load ptr, ptr %0, align 4, !tbaa !12
+  %2 = load ptr, ptr %1, align 4, !tbaa !14
+  %tobool3.not = icmp eq ptr %2, null
+  br i1 %tobool3.not, label %cleanup, label %if.end5
+
+if.end5:                                          ; preds = %if.end
+  %call = tail call arm_aapcscc i32 %2(ptr noundef nonnull %dev, ptr noundef nonnull %xfers, i32 noundef %nxfers) #1
+  br label %cleanup
+
+cleanup:                                          ; preds = %if.end, %entry, %if.end5
+  %retval.0 = phi i32 [ %call, %if.end5 ], [ -22, %entry ], [ -138, %if.end ]
+  ret i32 %retval.0
+}
+
+attributes #0 = { minsize sspstrong "target-features"="+armv8.1-m.main" }
+
+!llvm.module.flags = !{!0, !1, !2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 1, !"min_enum_size", i32 1}
+!2 = !{i32 8, !"branch-target-enforcement", i32 1}
+!3 = !{i32 8, !"guarded-control-stack", i32 1}
+!4 = !{i32 8, !"sign-return-address", i32 1}
+!5 = !{!"clang version 18.0.0git (git@github.com:llvm/llvm-project.git 2db9244b6f66a4fd4c937de7b33a5b6dda4a06a8)"}
+!6 = !{!7, !9, i64 0}
+!7 = !{!"i3c_dev_desc", !8, i64 0}
+!8 = !{!"i3c_i2c_dev_desc", !9, i64 0}
+!9 = !{!"any pointer", !10, i64 0}
+!10 = !{!"omnipotent char", !11, i64 0}
+!11 = !{!"Simple C/C++ TBAA"}
+!12 = !{!13, !9, i64 0}
+!13 = !{!"i3c_master_controller", !9, i64 0}
+!14 = !{!15, !9, i64 0}
+!15 = !{!"i3c_master_controller_ops", !9, i64 0}

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure the first line of your commit message describes what you're fixing, instead of pointing at the bug number. "Fixes #75998" should generally be the last line of the commit message.

@@ -369,7 +369,8 @@ def hGPR : RegisterClass<"ARM", [i32], 32, (sub GPR, tGPR)> {
def tcGPR : RegisterClass<"ARM", [i32], 32, (add R0, R1, R2, R3, R12)> {
let AltOrders = [(and tcGPR, tGPR)];
let AltOrderSelect = [{
return MF.getSubtarget<ARMSubtarget>().isThumb1Only();
return MF.getSubtarget<ARMSubtarget>().isThumb1Only() ||
MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the same check we do in ARMTargetLowering::IsEligibleForTailCallOptimization; can we unify them, so it's clear they're checking the same thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are a bit similar but not the same. The optimization filter in ARMTargetLowering::IsEligibleForTailCallOptimization also checks the number of arguments to decide whether it's possible to optimize, which is the most relevant factor for that purpose. Here we just encode which registers to use if the optimization is already happening.

; CHECK-NEXT: @ %bb.2: @ %if.end
; CHECK-NEXT: ldr r0, [r4]
; CHECK-NEXT: ldr r
; CHECK-SAME-NOT 12
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these CHECK-SAME lines are actually being parsed without the ":"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted the hand edits to the auto-generated assertions.

@@ -0,0 +1,78 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to hand-edit the CHECK lines, please mark the file with "NOTE: Do not autogenerate" (see https://llvm.org/docs/TestingGuide.html#generating-assertions-in-regression-tests) .

In this case, I'm not sure it's worth hand-editing the CHECK lines.

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove all the bits that aren't relevant to what this is testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@ostannard ostannard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think changing the AltOrders actually prevents the removed registers in all cases, in this case r12 is still used:

typedef void(func)(void);

void foo(func *fptr_arg) {
  register func *fptr_r12 asm("r12") = fptr_arg;

  // Force the function pointer into r12, and clobber lr to force a stack frame
  // to be created.
  asm("" : "+r" (fptr_r12) : : "lr");

  // Indirect tail call through r12.
  fptr_r12();
}
$ /work/llvm/build/bin/clang --target=arm-none-eabi -march=armv8.1-m.main -c indirect-tail.c -o - -S -O1 -mbranch-protection=pac-ret
...
        ldr     r12, [sp], #4
        pop.w   {r7, lr}
        aut     r12, lr, sp
        bx      r12
...

I recently fixed a similar issue in AArch64 (#81020), would that method of creating multiple pseudo-instructions with different register classes work here too?

@eleanor-arm eleanor-arm changed the title Fix https://github.com/llvm/llvm-project/issues/75998 Do not use R12 for indirect tail calls with PACBTI Mar 26, 2024
@eleanor-arm
Copy link
Contributor Author

I don't think changing the AltOrders actually prevents the removed registers in all cases, in this case r12 is still used:

typedef void(func)(void);

void foo(func *fptr_arg) {
  register func *fptr_r12 asm("r12") = fptr_arg;

  // Force the function pointer into r12, and clobber lr to force a stack frame
  // to be created.
  asm("" : "+r" (fptr_r12) : : "lr");

  // Indirect tail call through r12.
  fptr_r12();
}
$ /work/llvm/build/bin/clang --target=arm-none-eabi -march=armv8.1-m.main -c indirect-tail.c -o - -S -O1 -mbranch-protection=pac-ret
...
        ldr     r12, [sp], #4
        pop.w   {r7, lr}
        aut     r12, lr, sp
        bx      r12
...

I recently fixed a similar issue in AArch64 (#81020), would that method of creating multiple pseudo-instructions with different register classes work here too?

I've reviewed #81020 but didn't find it directly applicable to this problem. The fix code is not run when compiling any C/C++ testcases I have available with Aarch64 target. When I tried to add similar code to ARM the checks were run and the compiler crashed because MF is uninitialized at that point.
It looks like to address this case when R12 is forced in assembly, more refactoring would be needed which I currently don't have time to look into. It's a very uncommon case so I don't think it should blog this fix that will fix the defect for everyone who doesn't try to specify registers explicitly in integrated assembly.

@efriedma-quic
Copy link
Collaborator

I don't think the inline asm is actually necessary to reproduce the issue, it just makes it easier to write a small testcase by controlling the behavior of the register allocator.

The fix code is not run when compiling any C/C++ testcases I have available with Aarch64 target.

I'm not sure what you're referring to here. The important change is the change to AArch64InstrInfo.td.

@eleanor-arm
Copy link
Contributor Author

There are two ways to observe the issue, either through a very specific C/C++ code or the integrated asm code. The difference is that the first code path is easier to fix, it seems much harder to fix the second path. I think this fix should fix it for all C/C++ testcases but it won't work if integrated asm is used.

The important change is the change to AArch64InstrInfo.td.

Yes. I see the checks like MF->getInfo<AArch64FunctionInfo>()->branchTargetEnforcement() && !MF->getInfo<AArch64FunctionInfo>()->branchProtectionPAuthLR(). They are all in computeAvailableFunctionFeatures() (when built) and I struggled to see this function being executed. I only saw it executed when running one of the llc tests added in #81020 when global-isel is enabled. It looks like #81020 might be fixing the issue for the cases when global-isel is enabled only.

@ostannard
Copy link
Collaborator

They are all in computeAvailableFunctionFeatures() (when built) and I struggled to see this function being executed.

I see copies of this code being built into computeAvailableFunctionFeatures (in AArch64GenGlobalISel.inc), and in CheckPatternPredicate (in AArch64GenDAGISel.inc), have you checked that latter? Global isel isn't enabled by default for 32-bit ARM, so I'd expect DAGISel to be the important one here.

Copy link

github-actions bot commented Apr 15, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@eleanor-arm
Copy link
Contributor Author

I made a similar change to #81020 and it now passes the IR tests just like #81020. However, if the register is forced in integrated assembly this fix doesn't work (and neither does #81020 I don't think. It looks like clang doesn't translate this to correct IR.

@ostannard
Copy link
Collaborator

What do you mean by "doesn't work"? I can see that you've added tests which use inline assembly to force the function pointer into r12, and they appear to generate correct code. Does clang generate different IR to pacbti-indirect-tail-calls.ll, and if so what is wrong with it?

@eleanor-arm
Copy link
Contributor Author

Actually the problem looks quite simple. When compiling with -mbranch-protection=standard, I'd expect to see "sign-return-address"="all" as one of the function attributes in the IR but it's not there.

@ostannard
Copy link
Collaborator

It looks like clang emits a module flag instead of a function attribute, so it applies to all functions in the module:

!4 = !{i32 8, !"sign-return-address", i32 1}

The attribute and flag are read by the GetSignReturnAddress function in ARMMachineFunctionInfo.cpp, with the function attribute taking priority over the module flag if both are present. This seems to work correctly for normal code-generation, so I'd expect it to work for these new instruction selection patterns too. Are you saying that this does not work if you test it end-to-end with clang, or when generating IR with clang?

For #81020, do you have a reproducer showing it not working correctly? I've tried a few simple examples and it seems to be working correctly, despite clang using module flags instead of function attributes:

typedef void(func)(void);

void foo(func *fptr_arg) {
  register func *fptr_x16 asm("x16") = fptr_arg;
  asm("" : "+r" (fptr_x16) : : "lr");
  fptr_x16();
}
$ /work/llvm/build/bin/clang --target=aarch64-none-elf -march=armv8.3-a -c indirect-tail.c -o - -S -O1 -mbranch-protection=pac-ret+pc | sed -n '/foo:/,/func_end/p'
foo:                                    // @foo
// %bb.0:                               // %entry
        hint    #39
.Ltmp0:
        paciasp
        str     x30, [sp, #-16]!                // 8-byte Folded Spill
        mov     x16, x0
        //APP
        //NO_APP
        mov     x0, x16
        ldr     x30, [sp], #16                  // 8-byte Folded Reload
        adr     x16, .Ltmp0
        hint    #39
        autiasp
        br      x0
.Lfunc_end0:

@eleanor-arm
Copy link
Contributor Author

You are right, Clang generates module attributes instead. The attribute that needs to be set explicitly for AArch64 to work is branch-protection-pauth-lr in fact, not sign-return-address. I am not sure why sign-return-address is even in the test. The test passes without it being used for any of the functions.

As far as ARM goes, sign-return-address seems to be the attribute that's set when pacbti is enabled, as when that's set as a function attribute the IR test passes. Why the codegen is not correct when it set as a module attribute I am not sure. The MF->getInfo<ARMFunctionInfo>()->shouldSignReturnAddress() check is present in ARMGenDAGIsel.inc and appears to be executing. I don't see any obvious difference between AArch64 and ARM atm.

@eleanor-arm
Copy link
Contributor Author

I think what clang needs to generate is

!4 = !{i32 8, !"sign-return-address", i32 1}                                                                                             
!5 = !{i32 8, !"sign-return-address-all", i32 1} 

then the codegen is correct. Would you agree?

@ostannard
Copy link
Collaborator

The attribute that needs to be set explicitly for AArch64 to work is branch-protection-pauth-lr in fact, not sign-return-address. I am not sure why sign-return-address is even in the test. The test passes without it being used for any of the functions.

That AArch64 change was for a bug in branch-protection-pauth-lr, which is a new feature added in Armv9.5-A, and doesn't have an equivalent in M-profile. For M-profile PAC/BTI, we only have sign-return-address and branch-target-enforcement, and I don't think the latter is relevant to this bug.

Why the codegen is not correct when it set as a module attribute I am not sure.

I can't really help with this if you don't tell me what codegen is incorrect. Do you have an IR file for which LLVM continues to generate code which incorrectly uses r12 in the prologue, or something else?

I think what clang needs to generate is

!4 = !{i32 8, !"sign-return-address", i32 1}
!5 = !{i32 8, !"sign-return-address-all", i32 1} 

then the codegen is correct. Would you agree?

The sign-return-address-all flag should cause the PAC/AUT instructions to be used in all functions, not just ones which spill lr to the stack. That isn't enabled by -mbranch-protection=standard, instead it should only be turned on by -mbranch-protection=pac-ret+leaf.

@eleanor-arm
Copy link
Contributor Author

That AArch64 change was for a bug in branch-protection-pauth-lr

Thanks for the clarification. I don't think you need to use "sign-return-address"="all" in the test here then.

I can't really help with this if you don't tell me what codegen is incorrect. Do you have an IR file for which LLVM continues to generate code which incorrectly uses r12 in the prologue, or something else?

This

define dso_local void @func1(ptr noundef readonly %fptr_arg) local_unnamed_addr {
entry:
  %0 = tail call ptr asm "", "={r12},{r12},~{lr}"(ptr %fptr_arg)
  tail call void %0()
  ret void
}
!llvm.module.flags = !{!4}

!4 = !{i32 8, !"sign-return-address", i32 1}

won't generate correct code.

func1:
	.fnstart
@ %bb.0:                                @ %entry
	pac	r12, lr, sp
	.save	{r7, lr}
	push	{r7, lr}
	.save	{ra_auth_code}
	str	r12, [sp, #-4]!
	mov	r12, r0
	@APP
	@NO_APP
	ldr	r12, [sp], #4
	pop.w	{r7, lr}
	aut	r12, lr, sp
	bx	r12
.Lfunc_end0:

I don't see the r14 spilled here. So does this mean the PAC/AUT should not have been added to this function?

This code is correct

define dso_local void @func2(ptr noundef readonly %fptr_arg) local_unnamed_addr #0 {
entry:
  %0 = tail call ptr asm "", "={r12},{r12},~{lr}"(ptr %fptr_arg)
  tail call void %0()
  ret void
}

!llvm.module.flags = !{!4, !5}

!4 = !{i32 8, !"sign-return-address", i32 1}
!5 = !{i32 8, !"sign-return-address-all", i32 1}
func2:
	.fnstart
@ %bb.0:                                @ %entry
	pac	r12, lr, sp
	.save	{r7, lr}
	push	{r7, lr}
	.save	{ra_auth_code}
	str	r12, [sp, #-4]!
	mov	r12, r0
	@APP
	@NO_APP
	mov	r0, r12
	ldr	r12, [sp], #4
	pop.w	{r7, lr}
	aut	r12, lr, sp
	bx	r0
.Lfunc_end0:

@ostannard
Copy link
Collaborator

I don't think you need to use "sign-return-address"="all" in the test here then.

That's true, I don't think I realised at the time that it wasn't needed, but since sign-return-address is automatically enabled by branch-protection-pauth-lr, I don't think we need change the test.

I don't see the r14 spilled here. So does this mean the PAC/AUT should not have been added to this function?

r14 is being spilled by the push {r7, lr} instruction (lr is an alias for r14), because it is clobbered by the inline assembly (~{lr} in the constraint string), so the compiler is correct to emit the PAC/AUT instructions here. Instead, your change needs to trigger here, and prevent r12 being used by the bx instruction.

@eleanor-arm
Copy link
Contributor Author

r14 is being spilled by the push {r7, lr} instruction

Yes, of course. Thanks for pointing it out.

I think it is difficult with this implementation to check whether LR is actually spilled on not. The instruction selection happens before the register allocation in the pipeline. Therefore taking a conservative approach and not using R12 for tail calls when shouldSignReturnAddress=true, shouldSignReturnAddressAll=false

@@ -10,10 +10,8 @@
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unrelated change should be dropped.

@@ -228,6 +228,10 @@ def DontGenExecuteOnly : Predicate<"!Subtarget->genExecuteOnly()">;
def GenT1ExecuteOnly : Predicate<"Subtarget->genExecuteOnly() && "
"Subtarget->isThumb1Only() && "
"!Subtarget->hasV8MBaselineOps()">;
let RecomputePerFunction = 1 in {
def TailCallNotR12 : Predicate<[{ MF->getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true) }]>;
def TailCall : Predicate<[{ !MF->getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true) }]>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name isn't very descriptive, it doesn't affect the presence of tail-calls, just one detail of them. How about TailCallAnyReg, or describe the condition instead of the effect, like SignRetAddr and NoSignRetAddr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. I've changed the name.

Copy link
Collaborator

@ostannard ostannard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if you remove the unrelated comment/whitespace change.

@@ -10,10 +10,8 @@
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unrelated change should be dropped.

@eleanor-arm eleanor-arm merged commit c12bc57 into llvm:main Apr 30, 2024
3 of 4 checks passed
@eleanor-arm eleanor-arm deleted the TCRETURN branch April 30, 2024 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

In the -Oz optimization mode, when enabling -mbranch-protection=standard, the program encounters an error.
4 participants