-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use R12 for indirect tail calls with PACBTI #82661
Conversation
@llvm/pr-subscribers-backend-arm Author: Eleanor Bonnici (eleanor-arm) ChangesWhen compiling for thumbv8.1m with +pacbti and making an indirect tail This is incorrect because R12 is restored to contain authentication code This patch excludes R12 from the set of registers the compiler can put Full diff: https://github.com/llvm/llvm-project/pull/82661.diff 2 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMRegisterInfo.td b/llvm/lib/Target/ARM/ARMRegisterInfo.td
index 194d65cad8d170..025ca62d54376b 100644
--- a/llvm/lib/Target/ARM/ARMRegisterInfo.td
+++ b/llvm/lib/Target/ARM/ARMRegisterInfo.td
@@ -369,7 +369,8 @@ def hGPR : RegisterClass<"ARM", [i32], 32, (sub GPR, tGPR)> {
def tcGPR : RegisterClass<"ARM", [i32], 32, (add R0, R1, R2, R3, R12)> {
let AltOrders = [(and tcGPR, tGPR)];
let AltOrderSelect = [{
- return MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+ return MF.getSubtarget<ARMSubtarget>().isThumb1Only() ||
+ MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true);
}];
}
diff --git a/llvm/test/CodeGen/Thumb2/pacbti-m-indirect-tail-call-no-r12.ll b/llvm/test/CodeGen/Thumb2/pacbti-m-indirect-tail-call-no-r12.ll
new file mode 100644
index 00000000000000..844baa15500a3d
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb2/pacbti-m-indirect-tail-call-no-r12.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc %s -mtriple=thumbv8.1m.main -mattr=+pacbti -o - | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv8.1m.main-m.main-unknown"
+
+; Function Attrs: minsize nounwind optsize sspstrong
+define dso_local arm_aapcscc i32 @i3c_dev_do_priv_xfers_locked(ptr noundef %dev, ptr noundef %xfers, i32 noundef %nxfers) local_unnamed_addr #0 {
+; CHECK-LABEL: i3c_dev_do_priv_xfers_locked:
+; CHECK: @ %bb.0: @ %entry
+; CHECK-NEXT: pacbti r12, lr, sp
+; CHECK-NEXT: push {r4, lr}
+; CHECK-NEXT: str r12, [sp, #-4]!
+; CHECK-NEXT: mov r3, r0
+; CHECK-NEXT: mvn r0, #21
+; CHECK-NEXT: cbz r1, .LBB0_5
+; CHECK-NEXT: @ %bb.1: @ %entry
+; CHECK-NEXT: ldr r4, [r3]
+; CHECK-NEXT: cbz r4, .LBB0_5
+; CHECK-NEXT: @ %bb.2: @ %if.end
+; CHECK-NEXT: ldr r0, [r4]
+; CHECK-NEXT: ldr r
+; CHECK-SAME-NOT 12
+; CHECK-SAME , [r0]
+; CHECK-NEXT: cbz r0, .LBB0_4
+; CHECK-NEXT: @ %bb.3: @ %if.end5
+; CHECK-NEXT: mov r4, r0
+; CHECK-NEXT: mov r0, r3
+; CHECK-NEXT: mov r3, r4
+; CHECK-NEXT: ldr r12, [sp], #4
+; CHECK-NEXT: pop.w {r4, lr}
+; CHECK-NEXT: aut r12, lr, sp
+; CHECK-NEXT: bx r
+; CHECK-SAME-NOT 12
+entry:
+ %0 = load ptr, ptr %dev, align 4, !tbaa !6
+ %tobool = icmp ne ptr %0, null
+ %tobool2 = icmp ne ptr %xfers, null
+ %or.cond = and i1 %tobool2, %tobool
+ br i1 %or.cond, label %if.end, label %cleanup
+
+if.end: ; preds = %entry
+
+ %1 = load ptr, ptr %0, align 4, !tbaa !12
+ %2 = load ptr, ptr %1, align 4, !tbaa !14
+ %tobool3.not = icmp eq ptr %2, null
+ br i1 %tobool3.not, label %cleanup, label %if.end5
+
+if.end5: ; preds = %if.end
+ %call = tail call arm_aapcscc i32 %2(ptr noundef nonnull %dev, ptr noundef nonnull %xfers, i32 noundef %nxfers) #1
+ br label %cleanup
+
+cleanup: ; preds = %if.end, %entry, %if.end5
+ %retval.0 = phi i32 [ %call, %if.end5 ], [ -22, %entry ], [ -138, %if.end ]
+ ret i32 %retval.0
+}
+
+attributes #0 = { minsize sspstrong "target-features"="+armv8.1-m.main" }
+
+!llvm.module.flags = !{!0, !1, !2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 1, !"min_enum_size", i32 1}
+!2 = !{i32 8, !"branch-target-enforcement", i32 1}
+!3 = !{i32 8, !"guarded-control-stack", i32 1}
+!4 = !{i32 8, !"sign-return-address", i32 1}
+!5 = !{!"clang version 18.0.0git (git@github.com:llvm/llvm-project.git 2db9244b6f66a4fd4c937de7b33a5b6dda4a06a8)"}
+!6 = !{!7, !9, i64 0}
+!7 = !{!"i3c_dev_desc", !8, i64 0}
+!8 = !{!"i3c_i2c_dev_desc", !9, i64 0}
+!9 = !{!"any pointer", !10, i64 0}
+!10 = !{!"omnipotent char", !11, i64 0}
+!11 = !{!"Simple C/C++ TBAA"}
+!12 = !{!13, !9, i64 0}
+!13 = !{!"i3c_master_controller", !9, i64 0}
+!14 = !{!15, !9, i64 0}
+!15 = !{!"i3c_master_controller_ops", !9, i64 0}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure the first line of your commit message describes what you're fixing, instead of pointing at the bug number. "Fixes #75998" should generally be the last line of the commit message.
@@ -369,7 +369,8 @@ def hGPR : RegisterClass<"ARM", [i32], 32, (sub GPR, tGPR)> { | |||
def tcGPR : RegisterClass<"ARM", [i32], 32, (add R0, R1, R2, R3, R12)> { | |||
let AltOrders = [(and tcGPR, tGPR)]; | |||
let AltOrderSelect = [{ | |||
return MF.getSubtarget<ARMSubtarget>().isThumb1Only(); | |||
return MF.getSubtarget<ARMSubtarget>().isThumb1Only() || | |||
MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the same check we do in ARMTargetLowering::IsEligibleForTailCallOptimization; can we unify them, so it's clear they're checking the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are a bit similar but not the same. The optimization filter in ARMTargetLowering::IsEligibleForTailCallOptimization also checks the number of arguments to decide whether it's possible to optimize, which is the most relevant factor for that purpose. Here we just encode which registers to use if the optimization is already happening.
; CHECK-NEXT: @ %bb.2: @ %if.end | ||
; CHECK-NEXT: ldr r0, [r4] | ||
; CHECK-NEXT: ldr r | ||
; CHECK-SAME-NOT 12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these CHECK-SAME lines are actually being parsed without the ":"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the hand edits to the auto-generated assertions.
@@ -0,0 +1,78 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to hand-edit the CHECK lines, please mark the file with "NOTE: Do not autogenerate" (see https://llvm.org/docs/TestingGuide.html#generating-assertions-in-regression-tests) .
In this case, I'm not sure it's worth hand-editing the CHECK lines.
!llvm.module.flags = !{!0, !1, !2, !3, !4} | ||
!llvm.ident = !{!5} | ||
|
||
!0 = !{i32 1, !"wchar_size", i32 4} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all the bits that aren't relevant to what this is testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think changing the AltOrders
actually prevents the removed registers in all cases, in this case r12
is still used:
typedef void(func)(void);
void foo(func *fptr_arg) {
register func *fptr_r12 asm("r12") = fptr_arg;
// Force the function pointer into r12, and clobber lr to force a stack frame
// to be created.
asm("" : "+r" (fptr_r12) : : "lr");
// Indirect tail call through r12.
fptr_r12();
}
$ /work/llvm/build/bin/clang --target=arm-none-eabi -march=armv8.1-m.main -c indirect-tail.c -o - -S -O1 -mbranch-protection=pac-ret
...
ldr r12, [sp], #4
pop.w {r7, lr}
aut r12, lr, sp
bx r12
...
I recently fixed a similar issue in AArch64 (#81020), would that method of creating multiple pseudo-instructions with different register classes work here too?
I've reviewed #81020 but didn't find it directly applicable to this problem. The fix code is not run when compiling any C/C++ testcases I have available with Aarch64 target. When I tried to add similar code to ARM the checks were run and the compiler crashed because |
I don't think the inline asm is actually necessary to reproduce the issue, it just makes it easier to write a small testcase by controlling the behavior of the register allocator.
I'm not sure what you're referring to here. The important change is the change to AArch64InstrInfo.td. |
There are two ways to observe the issue, either through a very specific C/C++ code or the integrated asm code. The difference is that the first code path is easier to fix, it seems much harder to fix the second path. I think this fix should fix it for all C/C++ testcases but it won't work if integrated asm is used.
Yes. I see the checks like |
I see copies of this code being built into |
✅ With the latest revision this PR passed the C/C++ code formatter. |
What do you mean by "doesn't work"? I can see that you've added tests which use inline assembly to force the function pointer into |
Actually the problem looks quite simple. When compiling with |
It looks like clang emits a module flag instead of a function attribute, so it applies to all functions in the module:
The attribute and flag are read by the For #81020, do you have a reproducer showing it not working correctly? I've tried a few simple examples and it seems to be working correctly, despite clang using module flags instead of function attributes: typedef void(func)(void);
void foo(func *fptr_arg) {
register func *fptr_x16 asm("x16") = fptr_arg;
asm("" : "+r" (fptr_x16) : : "lr");
fptr_x16();
}
|
You are right, Clang generates module attributes instead. The attribute that needs to be set explicitly for AArch64 to work is As far as ARM goes, |
I think what clang needs to generate is
then the codegen is correct. Would you agree? |
That AArch64 change was for a bug in
I can't really help with this if you don't tell me what codegen is incorrect. Do you have an IR file for which LLVM continues to generate code which incorrectly uses
The |
Thanks for the clarification. I don't think you need to use
This
won't generate correct code.
I don't see the r14 spilled here. So does this mean the PAC/AUT should not have been added to this function? This code is correct
|
That's true, I don't think I realised at the time that it wasn't needed, but since
|
Yes, of course. Thanks for pointing it out. I think it is difficult with this implementation to check whether LR is actually spilled on not. The instruction selection happens before the register allocation in the pipeline. Therefore taking a conservative approach and not using R12 for tail calls when |
llvm/lib/Target/ARM/ARMInstrInfo.td
Outdated
@@ -10,10 +10,8 @@ | |||
// | |||
//===----------------------------------------------------------------------===// | |||
|
|||
//===----------------------------------------------------------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unrelated change should be dropped.
llvm/lib/Target/ARM/ARMPredicates.td
Outdated
@@ -228,6 +228,10 @@ def DontGenExecuteOnly : Predicate<"!Subtarget->genExecuteOnly()">; | |||
def GenT1ExecuteOnly : Predicate<"Subtarget->genExecuteOnly() && " | |||
"Subtarget->isThumb1Only() && " | |||
"!Subtarget->hasV8MBaselineOps()">; | |||
let RecomputePerFunction = 1 in { | |||
def TailCallNotR12 : Predicate<[{ MF->getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true) }]>; | |||
def TailCall : Predicate<[{ !MF->getInfo<ARMFunctionInfo>()->shouldSignReturnAddress(true) }]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name isn't very descriptive, it doesn't affect the presence of tail-calls, just one detail of them. How about TailCallAnyReg
, or describe the condition instead of the effect, like SignRetAddr
and NoSignRetAddr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks. I've changed the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if you remove the unrelated comment/whitespace change.
llvm/lib/Target/ARM/ARMInstrInfo.td
Outdated
@@ -10,10 +10,8 @@ | |||
// | |||
//===----------------------------------------------------------------------===// | |||
|
|||
//===----------------------------------------------------------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unrelated change should be dropped.
When compiling for thumbv8.1m with +pacbti and making an indirect tail
call, the compiler was free to put the function pointer into R12.
This is incorrect because R12 is restored to contain authentication code
for the caller's return address.
This patch excludes R12 from the set of registers the compiler can put
the function pointer in.
Fixes #75998