Description
Problem
(This was originally discovered in numba/numba#8738 and is reduced down from there)
In finalizeLoad(), the GOTSectionID and CurrentGOTIndex are reset to 0 with the intention of resetting everything related to GOT sections for the next object, but the GOTOffsetMap isn't cleared:
llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
Lines 2409 to 2410 in ff11d6b
GOTSectionID = 0;
CurrentGOTIndex = 0;
This leaves stale entries in the map, which can prevent the creation of a GOT section and/or entries in the GOT for subsequent objects. If a GOT relocation in a later object happens to match one in the map when findOrAllocateGOTEntry() checks to see if an entry already exists:
llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
Lines 2259 to 2261 in ff11d6b
auto E = GOTOffsetMap.insert({Value, 0});
if (E.second) {
uint64_t GOTOffset = allocateGOTEntries(1);
then it will never allocate a GOT entry for the new relocation (and doesn't allocate a GOT section at all, e.g. if there was just one GOT relocation). GOT relocations then get replaced with references to addresses in the first section (section 0), which is always invalid.
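The failure mode described above can be sketched in isolation. This is a minimal model, not the RuntimeDyldELF code: only the names GOTSectionID, CurrentGOTIndex, GOTOffsetMap, findOrAllocateGOTEntry and finalizeLoad are taken from this report, while the integer map key, the 8-byte entry size, the ClearMap switch and the helper entriesAllocatedForSecondObject are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for the GOT bookkeeping. The integer key
// replaces whatever the real map is keyed on (an assumption for this sketch).
struct GOTState {
  unsigned GOTSectionID = 0;
  unsigned CurrentGOTIndex = 0;
  std::map<std::uint64_t, std::uint64_t> GOTOffsetMap;

  // Same insert-then-allocate shape as the quoted snippet: a new entry is
  // only allocated when the key was not already present in the map.
  std::uint64_t findOrAllocateGOTEntry(std::uint64_t Value) {
    auto E = GOTOffsetMap.insert({Value, 0});
    if (E.second) {
      E.first->second = CurrentGOTIndex * 8; // assumed 8-byte GOT entries
      ++CurrentGOTIndex;
    }
    return E.first->second;
  }

  // ClearMap == false models the current behaviour, true the proposed fix.
  void finalizeLoad(bool ClearMap) {
    GOTSectionID = 0;
    if (ClearMap)
      GOTOffsetMap.clear();
    CurrentGOTIndex = 0;
  }
};

// Loads two "objects" that both need a GOT entry for the same key and reports
// how many GOT entries were allocated while loading the second one.
unsigned entriesAllocatedForSecondObject(bool ClearMap) {
  const std::uint64_t F1 = 0x1000; // made-up key standing in for f1
  GOTState State;
  State.findOrAllocateGOTEntry(F1);
  assert(State.CurrentGOTIndex == 1); // first object gets its entry
  State.finalizeLoad(ClearMap);
  State.findOrAllocateGOTEntry(F1); // second object, same key
  return State.CurrentGOTIndex;
}
```

In this model, entriesAllocatedForSecondObject(false) returns 0, because the stale key suppresses the allocation, and entriesAllocatedForSecondObject(true) returns 1.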
Reproducer
This can manifest on AArch64 with the following example:
accept_pointer.c:
void accept_pointer(void (*f)(void)) { f(); }
send_pointer1.c:
void accept_pointer(void *p);
void f1();
void send_pointer1() {
accept_pointer((void*)f1);
}
void f1() {}
send_pointer2.c:
void accept_pointer(void *p);
void f1();
void send_pointer2() {
accept_pointer((void*)f1);
}
All compiled with:
gcc -fPIC -c accept_pointer.c
gcc -fPIC -c send_pointer1.c
gcc -fPIC -c send_pointer2.c
(gcc --version reports gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on my system, but I don't think it matters too much for these simple files).
send_pointer1() and send_pointer2() each have a GOT relocation to f1() in them:
$ objdump -dr send_pointer1.o
send_pointer1.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <send_pointer1>:
0: a9bf7bfd stp x29, x30, [sp, #-16]!
4: 910003fd mov x29, sp
8: 90000000 adrp x0, 20 <f1>
8: R_AARCH64_ADR_GOT_PAGE f1
c: f9400000 ldr x0, [x0]
c: R_AARCH64_LD64_GOT_LO12_NC f1
10: 94000000 bl 0 <accept_pointer>
10: R_AARCH64_CALL26 accept_pointer
14: d503201f nop
18: a8c17bfd ldp x29, x30, [sp], #16
1c: d65f03c0 ret
0000000000000020 <f1>:
20: d503201f nop
24: d65f03c0 ret
$ objdump -dr send_pointer2.o
send_pointer2.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <send_pointer2>:
0: a9bf7bfd stp x29, x30, [sp, #-16]!
4: 910003fd mov x29, sp
8: 90000000 adrp x0, 0 <f1>
8: R_AARCH64_ADR_GOT_PAGE f1
c: f9400000 ldr x0, [x0]
c: R_AARCH64_LD64_GOT_LO12_NC f1
10: 94000000 bl 0 <accept_pointer>
10: R_AARCH64_CALL26 accept_pointer
14: d503201f nop
18: a8c17bfd ldp x29, x30, [sp], #16
1c: d65f03c0 ret
Loading these objects with llvm-rtdyld and entering send_pointer2 results in a segfault:
$ llvm-rtdyld --execute --entry send_pointer2 accept_pointer.o send_pointer1.o send_pointer2.o
loaded 'send_pointer2' at: 0xffffb18ac000
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/gmarkall/numbadev/install-llvm/main-20230310/bin/llvm-rtdyld --execute --entry send_pointer2 c/accept_pointer.o c/send_pointer1.o c/send_pointer2.o
Segmentation fault (core dumped)
When I attached with GDB and looked at the send_pointer2 function, it looked like:
ffffafb6a078: a9bf7bfd stp x29, x30, [sp, #-16]!
ffffafb6a07c: 910003fd mov x29, sp
ffffafb6a080: 90000000 adrp x0, 0xffffafb6a000
ffffafb6a084: f9400000 ldr x0, [x0]
ffffafb6a088: 97ffffde bl 0xffffafb6a000
ffffafb6a08c: d503201f nop
ffffafb6a090: a8c17bfd ldp x29, x30, [sp], #16
ffffafb6a094: d65f03c0 ret
ffffafb6a098: d503201f nop
ffffafb6a09c: d65f03c0 ret
The address in the adrp instruction, 0xffffafb6a000, is the beginning of the first section, the .text section of accept_pointer():
(gdb) x/2x 0xffffafb6a000
0xffffafb6a000 <accept_pointer>: 0xd10043ff 0xf90007e0
i.e. offset 0 in section 0, which was mistaken for an already-extant GOT entry due to the stale entry in the GOTOffsetMap.
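The address arithmetic behind that observation can be checked with a small sketch. It assumes the usual AArch64 ADRP behaviour (the result is the 4 KiB page of the PC plus a signed page delta); the helper names pageOf, adrpPageDelta and adrpResult are hypothetical and are not LLVM functions.

```cpp
#include <cstdint>

// Address of the 4 KiB page containing Addr.
constexpr std::uint64_t pageOf(std::uint64_t Addr) {
  return Addr & ~std::uint64_t{0xfff};
}

// Signed page delta a linker would encode so that an ADRP located at PC
// yields the page containing Target.
constexpr std::int64_t adrpPageDelta(std::uint64_t PC, std::uint64_t Target) {
  return (static_cast<std::int64_t>(pageOf(Target)) -
          static_cast<std::int64_t>(pageOf(PC))) / 4096;
}

// Value an ADRP located at PC produces for a given page delta.
constexpr std::uint64_t adrpResult(std::uint64_t PC, std::int64_t PageDelta) {
  return pageOf(PC) + static_cast<std::uint64_t>(PageDelta * 4096);
}
```

With the addresses from the GDB session, adrpPageDelta(0xffffafb6a080, 0xffffafb6a000) is 0, which matches the unmodified 90000000 encoding, and adrpResult(0xffffafb6a080, 0) is 0xffffafb6a000. The following ldr x0, [x0] therefore reads from the start of accept_pointer's .text rather than from a GOT slot.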
Files
I couldn't attach the objects here because GitHub doesn't support them, but for convenience the object files can be obtained from https://github.com/gmarkall/numba-issue-8738/tree/main/c.
Patch / fix
I think the way to resolve this is to clear the GOTOffsetMap in finalizeLoad():
diff --git a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
index 3c7f4ec47eb8..205ee5273b27 100644
--- a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
+++ b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
@@ -2407,6 +2407,7 @@ Error RuntimeDyldELF::finalizeLoad(const ObjectFile &Obj,
   }
   GOTSectionID = 0;
+  GOTOffsetMap.clear();
   CurrentGOTIndex = 0;
   return Error::success();
Making this change and rebuilding results in:
- The reproducer above running to completion as expected on AArch64.
- All the failing tests identified in LLVM14 linux-aarch64 blocker numba/numba#8738 running successfully to completion on AArch64.
- All unit and regression tests (ninja check-llvm and ninja check-llvm-unit) green (no unexpected passes / fails) on AArch64 and x86_64.
I had planned to submit a patch - however, I'm not quite sure how to go about adding a test case for this scenario given that it involves multiple objects - can you provide any guidance here please?