RuntimeDyldELF doesn't clear GOTOffsetMap in finalizeLoad(), leading to invalid GOT relocations on AArch64 #61402

@gmarkall

Problem

(This was originally discovered in numba/numba#8738 and has been reduced from there.)

In finalizeLoad() the GOTSectionID and CurrentGOTIndex are reset to 0 with the intention of resetting everything related to GOT sections for the next object, but the GOTOffsetMap isn't cleared:

GOTSectionID = 0;
CurrentGOTIndex = 0;

This leaves stale entries in the map, which can prevent the creation of a GOT section and/or entries in the GOT for subsequent objects. If a GOT relocation in a later object happens to match one in the map when findOrAllocGOTEntry() checks to see if an entry already exists:

auto E = GOTOffsetMap.insert({Value, 0});
if (E.second) {
  uint64_t GOTOffset = allocateGOTEntries(1);

then no GOT entry is allocated for the new relocation, and if that was the object's only GOT relocation, no GOT section is allocated either. Because GOTSectionID is still 0 at that point, the GOT relocations get replaced with references to addresses in the first section (section 0), which is always invalid.
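To make the failure mode concrete, here is a toy model of the bookkeeping described above. It is only a sketch: the member names mirror RuntimeDyldELF, but ToyLoader, GOTRef, secondObjectGOTRef, the string-keyed map, and the 8-byte slot arithmetic are all simplifying assumptions rather than the real LLVM code. The ClearMapOnFinalize flag switches between the current behaviour and a variant that also clears the map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model only: names mirror RuntimeDyldELF, but the types and the
// allocation logic are simplified assumptions, not the LLVM implementation.
struct GOTRef {
  unsigned SectionID; // section the relocation will point into
  uint64_t Offset;    // byte offset of the GOT slot within that section
};

class ToyLoader {
public:
  explicit ToyLoader(bool ClearMapOnFinalize)
      : ClearMapOnFinalize(ClearMapOnFinalize) {}

  // Mimics the lookup: a slot is allocated only if the symbol is new to the map.
  GOTRef findOrAllocGOTEntry(const std::string &Symbol) {
    auto E = GOTOffsetMap.insert({Symbol, 0});
    if (E.second) {
      if (GOTSectionID == 0)
        GOTSectionID = NextSectionID++;        // create a GOT section on demand
      E.first->second = CurrentGOTIndex++ * 8; // assume 8-byte slots
    }
    return {GOTSectionID, E.first->second};
  }

  // Mimics the per-object reset at the end of loading.
  void finalizeLoad() {
    GOTSectionID = 0;
    if (ClearMapOnFinalize)
      GOTOffsetMap.clear();
    CurrentGOTIndex = 0;
  }

private:
  bool ClearMapOnFinalize;
  unsigned GOTSectionID = 0;
  unsigned CurrentGOTIndex = 0;
  unsigned NextSectionID = 1; // section 0 stands for the first object's .text
  std::map<std::string, uint64_t> GOTOffsetMap; // keyed by name for simplicity
};

// Loads two objects that each take the address of the same symbol and
// returns where the second object's GOT relocation ends up pointing.
GOTRef secondObjectGOTRef(bool ClearMapOnFinalize) {
  ToyLoader L(ClearMapOnFinalize);
  GOTRef First = L.findOrAllocGOTEntry("f1"); // first object
  assert(First.SectionID != 0 && "first object always gets a GOT section");
  L.finalizeLoad();
  GOTRef Second = L.findOrAllocGOTEntry("f1"); // second object
  L.finalizeLoad();
  return Second;
}
```

In this model, secondObjectGOTRef(false) returns section 0, offset 0, because the stale map entry suppresses allocation while GOTSectionID has already been reset. secondObjectGOTRef(true) returns a freshly created, non-zero GOT section.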

Reproducer

This can manifest on AArch64 with the following example:

accept_pointer.c:

void accept_pointer(void (*f)(void)) { f(); }

send_pointer1.c:

void accept_pointer(void *p);
void f1();

void send_pointer1() {
  accept_pointer((void*)f1);
}

void f1() {}

send_pointer2.c:

void accept_pointer(void *p);
void f1();

void send_pointer2() {
  accept_pointer((void*)f1);
}

All compiled with:

gcc -fPIC -c accept_pointer.c
gcc -fPIC -c send_pointer1.c
gcc -fPIC -c send_pointer2.c

(gcc --version is gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on my system, but I don't think it matters too much for these simple files).

send_pointer1() and send_pointer2() have a GOT relocation to f1() in them:

$ objdump -dr send_pointer1.o 

send_pointer1.o:     file format elf64-littleaarch64


Disassembly of section .text:

0000000000000000 <send_pointer1>:
   0:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
   4:	910003fd 	mov	x29, sp
   8:	90000000 	adrp	x0, 20 <f1>
			8: R_AARCH64_ADR_GOT_PAGE	f1
   c:	f9400000 	ldr	x0, [x0]
			c: R_AARCH64_LD64_GOT_LO12_NC	f1
  10:	94000000 	bl	0 <accept_pointer>
			10: R_AARCH64_CALL26	accept_pointer
  14:	d503201f 	nop
  18:	a8c17bfd 	ldp	x29, x30, [sp], #16
  1c:	d65f03c0 	ret

0000000000000020 <f1>:
  20:	d503201f 	nop
  24:	d65f03c0 	ret
$ objdump -dr send_pointer2.o 

send_pointer2.o:     file format elf64-littleaarch64


Disassembly of section .text:

0000000000000000 <send_pointer2>:
   0:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
   4:	910003fd 	mov	x29, sp
   8:	90000000 	adrp	x0, 0 <f1>
			8: R_AARCH64_ADR_GOT_PAGE	f1
   c:	f9400000 	ldr	x0, [x0]
			c: R_AARCH64_LD64_GOT_LO12_NC	f1
  10:	94000000 	bl	0 <accept_pointer>
			10: R_AARCH64_CALL26	accept_pointer
  14:	d503201f 	nop
  18:	a8c17bfd 	ldp	x29, x30, [sp], #16
  1c:	d65f03c0 	ret

Loading these objects with llvm-rtdyld and entering send_pointer2 results in a segfault:

$ llvm-rtdyld --execute --entry send_pointer2 accept_pointer.o send_pointer1.o send_pointer2.o
loaded 'send_pointer2' at: 0xffffb18ac000
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/gmarkall/numbadev/install-llvm/main-20230310/bin/llvm-rtdyld --execute --entry send_pointer2 c/accept_pointer.o c/send_pointer1.o c/send_pointer2.o
Segmentation fault (core dumped)

When I attached with GDB and looked at the send_pointer2 function, it looked like:

    ffffafb6a078:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
    ffffafb6a07c:       910003fd        mov     x29, sp
    ffffafb6a080:       90000000        adrp    x0, 0xffffafb6a000
    ffffafb6a084:       f9400000        ldr     x0, [x0]
    ffffafb6a088:       97ffffde        bl      0xffffafb6a000
    ffffafb6a08c:       d503201f        nop
    ffffafb6a090:       a8c17bfd        ldp     x29, x30, [sp], #16
    ffffafb6a094:       d65f03c0        ret
    ffffafb6a098:       d503201f        nop
    ffffafb6a09c:       d65f03c0        ret

The address in the adrp instruction, 0xffffafb6a000, is the beginning of the first section, the .text section of accept_pointer():

(gdb) x/2x 0xffffafb6a000
0xffffafb6a000 <accept_pointer>:        0xd10043ff      0xf90007e0

i.e. offset 0 in section 0, which was mistaken for an already-extant GOT entry due to the stale entry in the GOTOffsetMap.
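As a cross-check of the GDB listing, the targets of the adrp and bl instructions can be recomputed from their raw encodings. The sketch below assumes the standard A64 field layouts for ADRP (immlo in bits 30:29, immhi in bits 23:5) and BL (imm26 in bits 25:0). The helper names signExtend, adrpTarget, and blTarget are made up for illustration and are not LLVM APIs.

```cpp
#include <cstdint>

// Sign-extend the low `Bits` bits of V (V must already be masked to Bits).
constexpr int64_t signExtend(uint64_t V, unsigned Bits) {
  const uint64_t Sign = uint64_t{1} << (Bits - 1);
  return static_cast<int64_t>((V ^ Sign) - Sign);
}

// ADRP: target = page of PC + (signed 21-bit page count) * 4 KiB.
constexpr uint64_t adrpTarget(uint64_t PC, uint32_t Insn) {
  const uint64_t ImmLo = (Insn >> 29) & 0x3;
  const uint64_t ImmHi = (Insn >> 5) & 0x7ffff;
  const int64_t Pages = signExtend((ImmHi << 2) | ImmLo, 21);
  return (PC & ~uint64_t{0xfff}) + static_cast<uint64_t>(Pages) * 4096;
}

// BL: target = PC + (signed 26-bit word count) * 4 bytes.
constexpr uint64_t blTarget(uint64_t PC, uint32_t Insn) {
  const int64_t Words = signExtend(Insn & 0x03ffffff, 26);
  return PC + static_cast<uint64_t>(Words) * 4;
}
```

With the values from the listing, adrpTarget(0xffffafb6a080, 0x90000000) and blTarget(0xffffafb6a088, 0x97ffffde) both evaluate to 0xffffafb6a000. So the GOT page reference lands on the same address as the (correct) call to accept_pointer, i.e. the start of section 0.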

Files

I couldn't attach the objects here because GitHub doesn't support them, but for convenience the object files can be obtained from https://github.com/gmarkall/numba-issue-8738/tree/main/c.

Patch / fix

I think the way to resolve this is to clear the GOTOffsetMap in finalizeLoad():

diff --git a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
index 3c7f4ec47eb8..205ee5273b27 100644
--- a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
+++ b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
@@ -2407,6 +2407,7 @@ Error RuntimeDyldELF::finalizeLoad(const ObjectFile &Obj,
   }

   GOTSectionID = 0;
+  GOTOffsetMap.clear();
   CurrentGOTIndex = 0;

   return Error::success();

Making this change and rebuilding results in:

  • The reproducer above running to completion as expected on AArch64.
  • All the failing tests identified in numba/numba#8738 ("LLVM14 linux-aarch64 blocker") running successfully to completion on AArch64.
  • All unit and regression tests (ninja check-llvm and ninja check-llvm-unit) green (no unexpected passes / fails) on AArch64 and x86_64.

I had planned to submit a patch - however, I'm not quite sure how to go about adding a test case for this scenario given that it involves multiple objects - can you provide any guidance here please?
