Skip to content

Conversation

@bgergely0
Copy link
Contributor

@bgergely0 bgergely0 commented Nov 17, 2025

Added the --force-bti-plt flag to make sure the PLTs emitted by AArch64BtiPac have a BTI c landing pad regardless of indirect calls inside the linked binary.
Post-link optimizations may require adding new thunks. Having this flag would simplify these optimizations by requiring less patching by tools.

Note that always adding a BTI c to PLT stubs is the default behaviour for GNU BFD.

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@bgergely0 bgergely0 marked this pull request as ready for review November 17, 2025 13:28
@bgergely0 bgergely0 requested a review from MaskRay November 17, 2025 13:28
@llvmbot
Copy link
Member

llvmbot commented Nov 17, 2025

@llvm/pr-subscribers-lld-elf

Author: Gergely Bálint (bgergely0)

Changes

Added the --force-bti-plt flag to make sure the PacBtiPlts have a BTI c landing pad regardless of indirect calls inside the linked binary.
Post-link optimizations may require adding new thunks. Having this flag would simplify these optimizations by requiring less patching by tools.

Note that always adding a BTI c to PLT stubs is the default behaviour for GNU BFD.


Full diff: https://github.com/llvm/llvm-project/pull/168365.diff

5 Files Affected:

  • (modified) lld/ELF/Arch/AArch64.cpp (+2-2)
  • (modified) lld/ELF/Config.h (+1)
  • (modified) lld/ELF/Driver.cpp (+1)
  • (modified) lld/ELF/Options.td (+2)
  • (modified) lld/test/ELF/aarch64-feature-bti.s (+7)
diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 2a97df4785ecb..a9702fde05b9b 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -1186,8 +1186,8 @@ void AArch64BtiPac::writePlt(uint8_t *buf, const Symbol &sym,
   // address may escape if referenced by a direct relocation. If relative
   // vtables are used then if the vtable is in a shared object the offsets will
   // be to the PLT entry. The condition is conservative.
-  bool hasBti = btiHeader &&
-                (sym.hasFlag(NEEDS_COPY) || sym.isInIplt || sym.thunkAccessed);
+  bool hasBti = btiHeader && (sym.hasFlag(NEEDS_COPY) || sym.isInIplt ||
+                              sym.thunkAccessed || ctx.arg.forceBtiPlt);
   if (hasBti) {
     memcpy(buf, btiData, sizeof(btiData));
     buf += sizeof(btiData);
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 8ec5a2c04e71c..b1060bf165ac8 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -330,6 +330,7 @@ struct Config {
   bool exportDynamic;
   bool fixCortexA53Errata843419;
   bool fixCortexA8;
+  bool forceBtiPlt = false;
   bool formatBinary = false;
   bool fortranCommon;
   bool gcSections;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 8647752be31fe..14c6b6f8a1dad 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1486,6 +1486,7 @@ static void readConfigs(Ctx &ctx, opt::InputArgList &args) {
   ctx.arg.nmagic = args.hasFlag(OPT_nmagic, OPT_no_nmagic, false);
   ctx.arg.noinhibitExec = args.hasArg(OPT_noinhibit_exec);
   ctx.arg.nostdlib = args.hasArg(OPT_nostdlib);
+  ctx.arg.forceBtiPlt = args.hasArg(OPT_bti_plt);
   ctx.arg.oFormatBinary = isOutputFormatBinary(ctx, args);
   ctx.arg.omagic = args.hasFlag(OPT_omagic, OPT_no_omagic, false);
   ctx.arg.optRemarksFilename = args.getLastArgValue(OPT_opt_remarks_filename);
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index 75184de496448..2059e0c62d748 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -212,6 +212,8 @@ defm eh_frame_hdr: B<"eh-frame-hdr",
 
 def emit_relocs: F<"emit-relocs">, HelpText<"Generate relocations in output">;
 
+def bti_plt: FF<"force-bti-plt">, HelpText<"Force all PLT entries to have BTI">;
+
 def enable_new_dtags: F<"enable-new-dtags">,
   HelpText<"Enable new dynamic tags (default)">;
 
diff --git a/lld/test/ELF/aarch64-feature-bti.s b/lld/test/ELF/aarch64-feature-bti.s
index 8d7c1f2826c17..191e3710054f8 100644
--- a/lld/test/ELF/aarch64-feature-bti.s
+++ b/lld/test/ELF/aarch64-feature-bti.s
@@ -262,6 +262,13 @@
 # REPORT-ERR: error: unknown -z bti-report= value: u{{$}}
 # REPORT-EMPTY:
 
+## Force all PLT entries to start with BTI with the force-bti-plt command line option.
+# RUN: ld.lld %t.o %t2.o -z force-bti --force-bti-plt %t.so -o %tforcebtiplt.exe
+# RUN: llvm-objdump --no-print-imm-hex -d --mattr=+bti --no-show-raw-insn %tforcebtiplt.exe | FileCheck --check-prefix=FORCE-BTI %s
+
+# FORCE-BTI: 00000000002103a0 <func2@plt>:
+# FORCE-BTI-NEXT:   2103a0: bti c
+
 .section ".note.gnu.property", "a"
 .long 4
 .long 0x10

@llvmbot
Copy link
Member

llvmbot commented Nov 17, 2025

@llvm/pr-subscribers-lld

Author: Gergely Bálint (bgergely0)

Changes

Added the --force-bti-plt flag to make sure the PacBtiPlts have a BTI c landing pad regardless of indirect calls inside the linked binary.
Post-link optimizations may require adding new thunks. Having this flag would simplify these optimizations by requiring less patching by tools.

Note that always adding a BTI c to PLT stubs is the default behaviour for GNU BFD.


Full diff: https://github.com/llvm/llvm-project/pull/168365.diff

5 Files Affected:

  • (modified) lld/ELF/Arch/AArch64.cpp (+2-2)
  • (modified) lld/ELF/Config.h (+1)
  • (modified) lld/ELF/Driver.cpp (+1)
  • (modified) lld/ELF/Options.td (+2)
  • (modified) lld/test/ELF/aarch64-feature-bti.s (+7)
diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 2a97df4785ecb..a9702fde05b9b 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -1186,8 +1186,8 @@ void AArch64BtiPac::writePlt(uint8_t *buf, const Symbol &sym,
   // address may escape if referenced by a direct relocation. If relative
   // vtables are used then if the vtable is in a shared object the offsets will
   // be to the PLT entry. The condition is conservative.
-  bool hasBti = btiHeader &&
-                (sym.hasFlag(NEEDS_COPY) || sym.isInIplt || sym.thunkAccessed);
+  bool hasBti = btiHeader && (sym.hasFlag(NEEDS_COPY) || sym.isInIplt ||
+                              sym.thunkAccessed || ctx.arg.forceBtiPlt);
   if (hasBti) {
     memcpy(buf, btiData, sizeof(btiData));
     buf += sizeof(btiData);
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 8ec5a2c04e71c..b1060bf165ac8 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -330,6 +330,7 @@ struct Config {
   bool exportDynamic;
   bool fixCortexA53Errata843419;
   bool fixCortexA8;
+  bool forceBtiPlt = false;
   bool formatBinary = false;
   bool fortranCommon;
   bool gcSections;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 8647752be31fe..14c6b6f8a1dad 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1486,6 +1486,7 @@ static void readConfigs(Ctx &ctx, opt::InputArgList &args) {
   ctx.arg.nmagic = args.hasFlag(OPT_nmagic, OPT_no_nmagic, false);
   ctx.arg.noinhibitExec = args.hasArg(OPT_noinhibit_exec);
   ctx.arg.nostdlib = args.hasArg(OPT_nostdlib);
+  ctx.arg.forceBtiPlt = args.hasArg(OPT_bti_plt);
   ctx.arg.oFormatBinary = isOutputFormatBinary(ctx, args);
   ctx.arg.omagic = args.hasFlag(OPT_omagic, OPT_no_omagic, false);
   ctx.arg.optRemarksFilename = args.getLastArgValue(OPT_opt_remarks_filename);
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index 75184de496448..2059e0c62d748 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -212,6 +212,8 @@ defm eh_frame_hdr: B<"eh-frame-hdr",
 
 def emit_relocs: F<"emit-relocs">, HelpText<"Generate relocations in output">;
 
+def bti_plt: FF<"force-bti-plt">, HelpText<"Force all PLT entries to have BTI">;
+
 def enable_new_dtags: F<"enable-new-dtags">,
   HelpText<"Enable new dynamic tags (default)">;
 
diff --git a/lld/test/ELF/aarch64-feature-bti.s b/lld/test/ELF/aarch64-feature-bti.s
index 8d7c1f2826c17..191e3710054f8 100644
--- a/lld/test/ELF/aarch64-feature-bti.s
+++ b/lld/test/ELF/aarch64-feature-bti.s
@@ -262,6 +262,13 @@
 # REPORT-ERR: error: unknown -z bti-report= value: u{{$}}
 # REPORT-EMPTY:
 
+## Force all PLT entries to start with BTI with the force-bti-plt command line option.
+# RUN: ld.lld %t.o %t2.o -z force-bti --force-bti-plt %t.so -o %tforcebtiplt.exe
+# RUN: llvm-objdump --no-print-imm-hex -d --mattr=+bti --no-show-raw-insn %tforcebtiplt.exe | FileCheck --check-prefix=FORCE-BTI %s
+
+# FORCE-BTI: 00000000002103a0 <func2@plt>:
+# FORCE-BTI-NEXT:   2103a0: bti c
+
 .section ".note.gnu.property", "a"
 .long 4
 .long 0x10

@bgergely0
Copy link
Contributor Author

bgergely0 commented Nov 17, 2025

I'm interested in reviewer's thoughts on the security implications: I'd say that being able to call all PLT entries is not a significant reduction in the security guarantees provided by BTI.

(Especially since this is the default in the GNU toolchain.)

Thoughts?

@smithp35
Copy link
Collaborator

My thoughts on security.

The PLT sequence is itself an indirect jump, so a target function defined in the same executable/shared-library that the PLT entry is transferring control to must have a BTI compatible instruction. If an attacker has an arbitrary write primitive then they could just target the function in question directly. Adding a BTI is not a significant loss in that case.

If the PLT entry is targeting another executable/shared-library then the destination may not have a BTI compatible instruction (BTI property is per shared-object not per program), and with lazy loading the attacker may not know the address of the destination (but they do know the address of the PLT). Not putting a BTI at the start of the PLT entry for "imported" symbols does offer some additional protection.

It is difficult to quantify how much effect on security this would have in practice. Given that on most systems using BTI, the whole program (executable and shared-objects) are likely to have BTI enabled, I suspect not much.

One thing that could be measured is whether the addition of an additional BTI instruction in all PLT sequences has any measurable affect on performance. My expectation is that this will be lost in the noise of an out-of-order implementation, but on the simplest in-order implementations like the A320 it may be noticeable if loops are involved.

Another factor to consider is how much tool implementation detail BOLT is willing to depend on. For example GNU ld happens to put BTI at the start of each PLT entry, but are there any guarantees that will continue? Perhaps this should be written into the ELF conventions https://github.com/ARM-software/abi-aa/tree/main/baabielf64 if BOLT is considering this ABI.

Copy link
Contributor Author

Thanks for the feedback Peter!

how much effect on security this would have in practice. Given that on most systems using BTI, the whole program (executable and shared-objects) are likely to have BTI enabled, I suspect not much.

Yes, this is in line with what I imagined.

on the simplest in-order implementations like the A320 it may be noticeable if loops are involved

thanks, good point to consider!

how much tool implementation detail BOLT is willing to depend on

We can easily check if the PLTs are generated according to expectations in BOLT, so it wouldn't create binaries with incorrect BTI property.
As of writing it into the ELF conventions, I'm not sure if that would be necessary. I believe this problem (patching PLTs) could be solved in BOLT, and if so, changing the ABI would be overkill. (Does it even make sense to add non-default behaviours in the ABI?)

The reason I still opened this PR is that while technically speaking this should be solvable in BOLT, it would definately be a much more involved change than this (+7 line code change, and +6 line test).

@smithp35
Copy link
Collaborator

smithp35 commented Nov 17, 2025

Personally I would prefer not to have a command-line option as I think it would be easily missed by users. For example if I'm using BOLT with -mbranch-protection then I must remember to add a new flag. I'd prefer that we either choose to move LLD to always put BTI, or BOLT assumes that it has to deal with a PLT entry without one. Would like to get some feedback from other reviewers on that.

Looking at GNU ld in more detail it looks like it can generate PLT[N] entries without a BTI in some circumstances. This might tip the scales towards BOLT having to handle PLT entries without one, or else we'll need a corresponding binutils change.

Checking back through the history. When I first wrote the patch in https://reviews.llvm.org/D62609 I had a BTI on each entry with a FIXME in the code to optimise later. I found this line in the review:

ld.bfd does not put "bti c" on the plt[N] entries when --pie is used. This is because ld.bfd can guarantee that the PLT address won't escape, however there is one small case in LLD where an ifunc has its address taken using a non-got reference when the PLT address can escape. There is potential for optimisation there, but at the moment we have to choose the PLT size early so it isn't easy to apply.

I can confirm this with something like:

aarch64-none-linux-gnu-gcc sharedused.c -fpie -mbranch-protection=standard -osharedused.axf shared.so -Wl,-z force-bti --pie

For a trivial shared library with two functions f and g

00000000000006c0 <.plt>:
     6c0: d503245f      bti     c
     6c4: a9bf7bf0      stp     x16, x30, [sp, #-0x10]!
     6c8: f00000f0      adrp    x16, 0x1f000 <__FRAME_END__+0x1e668>
     6cc: f947fe11      ldr     x17, [x16, #0xff8]
     6d0: 913fe210      add     x16, x16, #0xff8
     6d4: d61f0220      br      x17
     6d8: d503201f      nop
     6dc: d503201f      nop
...
0000000000000700 <f@plt>:
     700: 90000110      adrp    x16, 0x20000 <_GLOBAL_OFFSET_TABLE_+0x48>
     704: f9400a11      ldr     x17, [x16, #0x10]
     708: 91004210      add     x16, x16, #0x10
     70c: d61f0220      br      x17
...
0000000000000730 <g@plt>:
     730: 90000110      adrp    x16, 0x20000 <_GLOBAL_OFFSET_TABLE_+0x48>
     734: f9401611      ldr     x17, [x16, #0x28]
     738: 9100a210      add     x16, x16, #0x28

@bgergely0
Copy link
Contributor Author

Thanks. I missed this part. Rechecking now: it's also not true that for BTI executables, PLTs always have BTIs. In smaller examples I don't even see nop placeholders. I assume unless at least one PLT is called indirectly, they are all without BTI to save space. (This is still BFD.)

Conclusion: however tempting this small change was, it is insufficient to solve the problem it aimed to solve.

Thanks for taking a look, and sharing your insights @smithp35!

@bgergely0 bgergely0 closed this Nov 17, 2025
@bgergely0 bgergely0 deleted the users/bgergely0/lld-force-bti-plt branch November 17, 2025 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants