-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MTE] [lld] Don't tag symbols in sections with implicit start/stop #73531
Conversation
Found after a new Mali GPU driver drop into Android, but I'm guessing the reason why the implicit start/stop feature exists is to facilitate exactly this type of iteration. When GVs are explicitly placed into a section, and that section is a valid C identifier, then lld will create synthetic _start and _stop symbols for the bounds of that section. The Mali code in question then iterates over the GVs by using the _start and _stop symbols as bounds. This falls over under MTE, as each individual GV has tags (and the _start and _stop symbols are untagged), and so you get SIGSEGV/MTE[AS]ERR at runtime. Given this is probably a by-design pattern, let's exclude any GVs from being tagged if they're going into a section with implicit start/stop symbols.
@llvm/pr-subscribers-lld-elf @llvm/pr-subscribers-lld Author: Mitch Phillips (hctim) ChangesFound after a new Mali GPU driver drop into Android, but I'm guessing When GVs are explicitly placed into a section, and that section is a Given this is probably a by-design pattern, let's exclude any GVs from Full diff: https://github.com/llvm/llvm-project/pull/73531.diff 2 Files Affected:
diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 048f0ec30ebd283..0dff31c6d0379ed 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -999,6 +999,18 @@ addTaggedSymbolReferences(InputSectionBase &sec,
// like functions or TLS symbols.
if (sym.type != STT_OBJECT)
continue;
+ // Global variables can be explicitly put into sections where the section
+ // name is a valid C identifier. In these cases, the linker will create
+ // implicit __start and __stop symbols for the section. Globals that are in
+ // these special sections often are done there intentionally so they can be
+ // iterated over by going from __start_<secname> -> __stop_<secname>. This
+ // doesn't work under MTE globals, because each GV would have its own unique
+ // tag. So, symbols that are destined for a special section with start/stop
+ // symbols should go untagged implicitly.
+ const Defined* defined_sym = dyn_cast<Defined>(&sym);
+ if (defined_sym && defined_sym->section &&
+ isValidCIdentifier(defined_sym->section->name))
+ continue;
// STB_LOCAL symbols can't be referenced from outside the object file, and
// thus don't need to be checked for references from other object files.
if (sym.binding == STB_LOCAL) {
diff --git a/lld/test/ELF/aarch64-memtag-startstop.s b/lld/test/ELF/aarch64-memtag-startstop.s
new file mode 100644
index 000000000000000..a8bdcd8929d4879
--- /dev/null
+++ b/lld/test/ELF/aarch64-memtag-startstop.s
@@ -0,0 +1,48 @@
+# REQUIRES: aarch64
+# RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux-android %s -o %t
+# RUN: ld.lld %t -o %t.so -shared --android-memtag-mode=sync
+
+## Normally relocations are printed before the symbol tables, so reorder it a
+## bit to make it easier on matching addresses of relocations up with the
+## symbols.
+# RUN: llvm-readelf %t.so -s > %t.out
+# RUN: llvm-readelf %t.so --section-headers --relocs --memtag >> %t.out
+# RUN: FileCheck %s < %t.out
+
+# CHECK: Symbol table '.dynsym' contains
+# CHECK-DAG: [[#%x,GLOBAL:]] 16 OBJECT GLOBAL DEFAULT [[#]] global{{$}}
+# CHECK-DAG: [[#%x,GLOBAL_IN_SECTION:]] 16 OBJECT GLOBAL DEFAULT [[#]] global_in_section{{$}}
+
+# CHECK: Section Headers:
+# CHECK: .memtag.globals.dynamic AARCH64_MEMTAG_GLOBALS_DYNAMIC
+# CHECK-NOT: .memtag.globals.static
+# CHECK-NOT: AARCH64_MEMTAG_GLOBALS_STATIC
+
+# CHECK: Memtag Dynamic Entries
+# CHECK-NEXT: AARCH64_MEMTAG_MODE: Synchronous (0)
+# CHECK-NEXT: AARCH64_MEMTAG_HEAP: Disabled (0)
+# CHECK-NEXT: AARCH64_MEMTAG_STACK: Disabled (0)
+# CHECK-NEXT: AARCH64_MEMTAG_GLOBALS: 0x{{[0-9a-f]+}}
+# CHECK-NEXT: AARCH64_MEMTAG_GLOBALSSZ: 3
+
+# CHECK: Memtag Global Descriptors:
+# CHECK-NEXT: 0x[[#GLOBAL]]: 0x10
+# CHECK-NOT: 0x
+
+ .memtag global
+ .type global,@object
+ .bss
+ .globl global
+ .p2align 4, 0x0
+global:
+ .zero 16
+ .size global, 16
+
+ .memtag global_in_section // @global_in_section
+ .type global_in_section,@object
+ .section metadata_strings,"aw",@progbits
+ .globl global_in_section
+ .p2align 4, 0x0
+global_in_section:
+ .zero 16
+ .size global_in_section, 16
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
This is an interesting case.
Consider just adding a link.
Double underscores: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
|
||
# CHECK: Memtag Global Descriptors: | ||
# CHECK-NEXT: 0x[[#GLOBAL]]: 0x10 | ||
# CHECK-NOT: 0x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# CHECK-NOT: {{.}}
since there is an EOF
@@ -0,0 +1,48 @@ | |||
# REQUIRES: aarch64 | |||
# RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux-android %s -o %t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%t.o
for relocatable files
@@ -0,0 +1,48 @@ | |||
# REQUIRES: aarch64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an extra test for the basic test aarch64-memtag-globals.s
, so the name can be aarch64-memtag-globals-startstop.s
// doesn't work under MTE globals, because each GV would have its own unique | ||
// tag. So, symbols that are destined for a special section with start/stop | ||
// symbols should go untagged implicitly. | ||
const Defined *defined_sym = dyn_cast<Defined>(&sym); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use the a_b
naming. You can just name it d
: const auto *d = dyn_cast<Defined>(&sym)
@@ -999,6 +999,18 @@ addTaggedSymbolReferences(InputSectionBase &sec, | |||
// like functions or TLS symbols. | |||
if (sym.type != STT_OBJECT) | |||
continue; | |||
// Global variables can be explicitly put into sections where the section | |||
// name is a valid C identifier. In these cases, the linker will create | |||
// implicit __start and __stop symbols for the section. Globals that are in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description is inaccurate: __start_/__stop_
are only defined when used. This sentence reads as the linker always defines the symbols.
Suggest: For a global variable placed in a section whose name is a valid C identifier, it's possible that the variable may be accessed using addStartStopSymbols defined symbols. The pointer accessing the global variable is derived from the untagged start/stop symbols, so the global variable should be untagged as well.
.memtag global_in_section // @global_in_section | ||
.type global_in_section,@object | ||
.section metadata_strings,"aw",@progbits | ||
.globl global_in_section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we also need a test case for a STB_LOCAL symbol.
What about things like We hit a similar issue with ELF's lack of expressiveness for CHERI and have to add more heuristics than just this case. |
STT_OBJECT defined in custom section names (possibly non-C-identifier-name) with manually-defined encapsulation symbols. Interesting case. Do you have some examples? I haven't seen such code. |
|
It's best to do this in the LLVM pass instead. Will make the changes over there. |
Thanks. Handling global variable instrumentation with an explicit section in the first place may be more elegant and handle more cases (for example user-defined encapsulation symbols in linker scripts). |
Found after a new Mali GPU driver drop into Android, but I'm guessing
the reason why the implicit start/stop feature exists is to facilitate
exactly this type of iteration.
When GVs are explicitly placed into a section, and that section is a
valid C identifier, then lld will create synthetic _start and _stop
symbols for the bounds of that section. The Mali code in question then
iterates over the GVs by using the _start and _stop symbols as bounds.
This falls over under MTE, as each individual GV has tags (and the
_start and _stop symbols are untagged), and so you get
SIGSEGV/MTE[AS]ERR at runtime.
Given this is probably a by-design pattern, let's exclude any GVs from
being tagged if they're going into a section with implicit start/stop
symbols.