LD_PRELOAD jemalloc with LD_AUDIT library segfaults #2472

Open
ianoversbygs opened this issue Jul 5, 2023 · 9 comments

Comments

@ianoversbygs

What version of jemalloc are you using?

Version 5.3.0
Also tested trunk

What operating system and version?

Linux - Red Hat Enterprise Linux 8 (rpm --query redhat-release reports: redhat-release-8.6-0.1.el8.x86_64)
Also tested on Debian

What runtime / compiler are you using?

g++ 6.3.1

What did you do?

Created a simple library suitable for loading with LD_AUDIT
Ran 'ls' with the following LD_PRELOAD and LD_AUDIT:
LD_PRELOAD=$MY_SCRATCH_DIR/mylibs/lib/libjemalloc.so LD_AUDIT=libsimple.so ls

(note: this causes all executables that I have tried to fail in the same way)

What did you expect to see?

The output from ls

What did you see instead?

When jemalloc is compiled with the defaults, ls gives the following error message:

ls: error while loading shared libraries: libjemalloc.so: cannot allocate memory in static TLS block

When jemalloc is compiled with the --disable-initial-exec-tls flag, ls segfaults.
I found the flag mentioned in this issue: #1237

Source Code for repro:

#include <link.h>

extern "C" unsigned int la_version( unsigned int version )
{
    return 1;
}

Compile with: g++ -m64 simple.cpp -shared -ldl -o libsimple.so

The stack trace of the core dump from the test using a jemalloc built with --disable-initial-exec-tls is:

#0  __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#1  0x00007f52de2bb999 in te_malloc_fastpath_ctx (threshold=<optimized out>, allocated=<optimized out>, tsd=<optimized out>)
    at include/jemalloc/internal/thread_event.h:115
#2  imalloc_fastpath (fallback_alloc=0x7f52de2bb050 <je_malloc_default>, size=2632) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:291
#3  malloc (size=2632) at src/jemalloc.c:2746
#4  0x00007f52df86a71e in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
#5  allocate_dtv_entry (size=<optimized out>, alignment=8) at ../elf/dl-tls.c:668
#6  allocate_and_init (map=0x7f52dfa6a210) at ../elf/dl-tls.c:693
#7  tls_get_addr_tail (dtv=0x7f52dfa7f800, the_map=0x7f52dfa6a210, ti=<optimized out>, ti=<optimized out>) at ../elf/dl-tls.c:891
#8  0x00007f52df854dbc in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#9  0x00007f52de2bb999 in te_malloc_fastpath_ctx (threshold=<optimized out>, allocated=<optimized out>, tsd=<optimized out>)
    at include/jemalloc/internal/thread_event.h:115
#10 imalloc_fastpath (fallback_alloc=0x7f52de2bb050 <je_malloc_default>, size=2632) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:291
#11 malloc (size=2632) at src/jemalloc.c:2746
#12 0x00007f52df86a71e in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
#13 allocate_dtv_entry (size=<optimized out>, alignment=8) at ../elf/dl-tls.c:668
#14 allocate_and_init (map=0x7f52dfa6a210) at ../elf/dl-tls.c:693
#15 tls_get_addr_tail (dtv=0x7f52dfa7f800, the_map=0x7f52dfa6a210, ti=<optimized out>, ti=<optimized out>) at ../elf/dl-tls.c:891
#16 0x00007f52df854dbc in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#17 0x00007f52de2bb999 in te_malloc_fastpath_ctx (threshold=<optimized out>, allocated=<optimized out>, tsd=<optimized out>)
    at include/jemalloc/internal/thread_event.h:115
#18 imalloc_fastpath (fallback_alloc=0x7f52de2bb050 <je_malloc_default>, size=2632) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:291
#19 malloc (size=2632) at src/jemalloc.c:2746
#20 0x00007f52df86a71e in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
#21 allocate_dtv_entry (size=<optimized out>, alignment=8) at ../elf/dl-tls.c:668
#22 allocate_and_init (map=0x7f52dfa6a210) at ../elf/dl-tls.c:693
#23 tls_get_addr_tail (dtv=0x7f52dfa7f800, the_map=0x7f52dfa6a210, ti=<optimized out>, ti=<optimized out>) at ../elf/dl-tls.c:891
#24 0x00007f52df854dbc in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55

Note: https://bugzilla.redhat.com/show_bug.cgi?id=1878932 mentions a similar sounding problem.

@interwq
Member

interwq commented Jul 5, 2023

Similar to the issue you linked, this appears to be a circular dependency / init order issue, i.e. jemalloc is used before TLS is initialized, while jemalloc depends on a functioning TLS. The stack trace you shared shows that TLS init calls malloc (which resolves to jemalloc), and then jemalloc tries to access TLS, which in turn triggers TLS init again.

I can't think of a good solution right now. Maybe try avoiding allocations in the AUDIT lib, or make it not use jemalloc? TLS and pthreads are hard dependencies, and using jemalloc before they are initialized will almost certainly cause issues.

@ianoversbygs
Author

The test audit lib just returns 1 from la_version so is it the loader that is allocating?

Would it make sense to have a bootstrap TLS that has a simple allocator that doesn't use malloc, e.g. allocating out of a static char array?
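
The static-array idea could look roughly like the sketch below. It is only an illustration of a bump allocator that needs neither malloc nor TLS. The names bootstrap_alloc, bootstrap_owns and BOOTSTRAP_POOL_SIZE are hypothetical and are not part of jemalloc or glibc, and the pool size is an arbitrary assumption. Whether such a scheme would actually resolve the crash is not established here.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary pool size chosen for illustration only. */
#define BOOTSTRAP_POOL_SIZE 4096

/* Backing storage lives in .bss, so no allocator or TLS is needed to reach it. */
static unsigned char bootstrap_pool[BOOTSTRAP_POOL_SIZE];
static size_t bootstrap_used;

/*
 * Hypothetical bump allocator: hands out aligned chunks of the static pool
 * and never frees them. Returns NULL when the request does not fit or the
 * alignment is not a power of two. Not thread-safe; intended only for the
 * single-threaded startup phase.
 */
void *bootstrap_alloc(size_t size, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    uintptr_t base = (uintptr_t)bootstrap_pool;
    uintptr_t cur = base + bootstrap_used;
    /* Round the current address up to the requested alignment. */
    uintptr_t aligned = (cur + alignment - 1) & ~(uintptr_t)(alignment - 1);
    size_t start = (size_t)(aligned - base);

    if (start > BOOTSTRAP_POOL_SIZE || size > BOOTSTRAP_POOL_SIZE - start)
        return NULL;

    bootstrap_used = start + size;
    return &bootstrap_pool[start];
}

/* Lets a free() implementation recognise pointers it must not release. */
int bootstrap_owns(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t base = (uintptr_t)bootstrap_pool;
    return a >= base && a < base + BOOTSTRAP_POOL_SIZE;
}
```

Memory handed out this way is never returned, so any free or realloc path that can see such a pointer would have to check bootstrap_owns first and leave the pointer alone.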

@interwq
Member

interwq commented Jul 6, 2023

The TLS storage actually isn't managed by jemalloc. On Linux it's accessing the __thread directly, e.g.:

Does this only happen when adding the AUDIT lib? Can you try changing the below to true and see if it fixes the issue?

@ianoversbygs
Author

With that change it still crashes but in a different place:

#0  0x00007fa56fb172c8 in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x00007fa57029d80f in nstime_get (time=0x7ffc2dd3e870) at src/nstime.c:192
#2  nstime_update_impl (time=0x7ffc2dd3e870) at src/nstime.c:268
#3  je_nstime_init_update (time=time@entry=0x7ffc2dd3e870) at src/nstime.c:280
#4  0x00007fa57029d252 in je_malloc_mutex_lock_slow (mutex=mutex@entry=0x7fa56e8000e0) at src/mutex.c:76
#5  0x00007fa570248ba1 in malloc_mutex_lock (mutex=0x7fa56e8000e0, tsdn=0x0) at include/jemalloc/internal/mutex.h:217
#6  base_alloc_impl (esn=0x0, alignment=64, size=<optimized out>, base=0x7fa56e8000c0, tsdn=0x0) at src/base.c:442
#7  je_base_alloc (tsdn=tsdn@entry=0x0, base=base@entry=0x7fa56e8000c0, size=<optimized out>, alignment=alignment@entry=64) at src/base.c:479
#8  0x00007fa5702421ec in je_arena_new (tsdn=tsdn@entry=0x0, ind=ind@entry=0, config=config@entry=0x7fa5704cad30 <je_arena_config_default>) at src/arena.c:1615
#9  0x00007fa570232385 in arena_init_locked (config=0x7fa5704cad30 <je_arena_config_default>, ind=0, tsdn=0x0) at src/jemalloc.c:415
#10 je_arena_init (tsdn=tsdn@entry=0x0, ind=ind@entry=0, config=0x7fa5704cad30 <je_arena_config_default>) at src/jemalloc.c:443
#11 0x00007fa5702325cc in malloc_init_hard_a0_locked () at src/jemalloc.c:1885
#12 0x00007fa57023289e in malloc_init_hard () at src/jemalloc.c:2129
#13 0x00007fa570237b6d in malloc_init () at src/jemalloc.c:298
#14 imalloc_init_check (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2658
#15 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2676
#16 calloc (num=32, size=4) at src/jemalloc.c:2852
#17 0x00007fa5717e0cd9 in calloc (b=<optimized out>, a=32) at ../include/rtld-malloc.h:44
#18 _dl_relocate_object (scope=<optimized out>, reloc_mode=reloc_mode@entry=0, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:292
#19 0x00007fa5717d7535 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2485
#20 0x00007fa5717ed25e in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7ffc2dd41ac0, dl_main=dl_main@entry=0x7fa5717d5410 <dl_main>)
    at ../elf/dl-sysdep.c:253
#21 0x00007fa5717d4fcb in _dl_start_final (arg=0x7ffc2dd41ac0) at rtld.c:487
#22 _dl_start (arg=0x7ffc2dd41ac0) at rtld.c:580
#23 0x00007fa5717d3fe8 in _start () from /lib64/ld-linux-x86-64.so.2

@akostadinov

This is a very nasty issue that prevents reliable jemalloc usage with LD_PRELOAD. It's kind of a gamble whether it will work or not with certain programs.

Couldn't a suitable thread-local storage implementation be statically linked into jemalloc, or a workaround be added that detects whether an allocation was requested during initialization and performs an alternative allocation just for TLS?
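
As a rough illustration of the "detect and divert" idea, the sketch below models an allocator whose setup step allocates again, and breaks the cycle with a re-entrancy flag. All names (guarded_alloc, fallback_alloc, model_lazy_tls_setup, fallback_hits) are hypothetical. The normal path simply calls the C library's malloc as a stand-in, and the fallback uses an anonymous mmap. The flag is a plain global because a thread-local flag would itself need the TLS that is not ready yet, so this is only meaningful while the process is still single-threaded. It is a toy model, not jemalloc code.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Plain global on purpose: a __thread flag would reintroduce the cycle. */
static int inside_alloc;

/* Counts how often the fallback path was taken (for demonstration). */
int fallback_hits;

/* Fallback that depends on neither malloc nor lazily allocated TLS. */
static void *fallback_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

void *guarded_alloc(size_t size);

/*
 * Stands in for a lazy TLS setup step that needs memory for its own
 * bookkeeping. The mapping is deliberately leaked in this toy model.
 */
static void model_lazy_tls_setup(void)
{
    (void)guarded_alloc(64);
}

void *guarded_alloc(size_t size)
{
    if (inside_alloc) {
        /* Re-entered from our own setup step: divert instead of recursing. */
        fallback_hits++;
        return fallback_alloc(size);
    }

    inside_alloc = 1;
    model_lazy_tls_setup();   /* would recurse without the guard above */
    void *p = malloc(size);   /* stand-in for the allocator's normal path */
    inside_alloc = 0;
    return p;
}
```

Calling guarded_alloc(128) once takes the fallback path exactly once, for the nested 64-byte request, and then serves the outer request normally.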

@Atry

Atry commented Mar 27, 2024

I built a simple LD_AUDIT shared library statically linked against musl, using musl-gcc -shared -static. When using LD_PRELOADed jemalloc together with this LD_AUDIT library, I got the following error:

cannot allocate memory in static TLS block

When jemalloc is compiled with the --disable-initial-exec-tls flag, the result is a segmentation fault.

So I believe the issue is not caused by a glibc dependency pulled in by the LD_AUDIT library.

@Atry

Atry commented Mar 27, 2024

> Note: https://bugzilla.redhat.com/show_bug.cgi?id=1878932 mentions a similar sounding problem.

That problem should have been fixed since https://src.fedoraproject.org/rpms/glibc/c/8aee7e3563ec434ce692fbce0b81ef9ba53c2a0a?branch=rawhide

This issue looks like a regression.

@mkenigs

mkenigs commented May 30, 2024

I'm wondering whether this might actually be a glibc bug rather than a jemalloc bug. It seems like the combination of TLS usage in an LD_PRELOAD library with LD_AUDIT causes the failure.

Looking at the glibc-2.36 source on Debian, the TLS error is thrown at elf/dl-reloc.c:140

I set a breakpoint there and printed a backtrace:

cat <<EOF > break-error
set breakpoint pending on
directory ~/glibc-2.36
b elf/dl-reloc.c:140
r
bt
c
quit
EOF

gdb -x break-error --args env LD_PRELOAD=./lib/libjemalloc.so LD_AUDIT=./audit.so ls

The backtrace is:

Breakpoint 1, _dl_allocate_static_tls (map=0xfffff7ff5b10, map@entry=0x0) at ./elf/dl-reloc.c:140
140	      _dl_signal_error (0, map->l_name, NULL, N_("\
#0  _dl_allocate_static_tls (map=0xfffff7ff5b10, map@entry=0x0) at ./elf/dl-reloc.c:140
#1  0x0000fffff7fcc2bc in elf_machine_rela (skip_ifunc=<optimized out>,
    reloc_addr_arg=0xfffff7d7ffc0, version=0xfffff7ff37a0, sym=0xfffff7c00320,
    reloc=0xfffff7c08f00, scope=0x0, map=0xfffff7ff5b10) at ../sysdeps/aarch64/dl-machine.h:273
#2  elf_dynamic_do_Rela (skip_ifunc=<optimized out>, lazy=<optimized out>,
    nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>,
    scope=<optimized out>, map=0xfffff7ff5b10) at ./elf/do-rel.h:147
#3  _dl_relocate_object (l=l@entry=0xfffff7ff5b10, scope=<optimized out>,
    reloc_mode=<optimized out>, consider_profiling=<optimized out>, consider_profiling@entry=0)
    at ./elf/dl-reloc.c:301
#4  0x0000fffff7fd66f8 in dl_main (phdr=<optimized out>, phnum=<optimized out>,
    user_entry=<optimized out>, auxv=<optimized out>) at ./elf/rtld.c:2322
#5  0x0000fffff7fd368c in _dl_sysdep_start (start_argptr=start_argptr@entry=0x0,
    dl_main=dl_main@entry=0xfffff7fd4f40 <dl_main>) at ../sysdeps/unix/sysv/linux/dl-sysdep.c:140
#6  0x0000fffff7fd4c70 in _dl_start_final (arg=0x0) at ./elf/rtld.c:497
#7  _dl_start (arg=<optimized out>) at ./elf/rtld.c:584
#8  0x0000fffff7fd8bd4 in _start () at ../sysdeps/aarch64/dl-start.S:30
ls: error while loading shared libraries: ./lib/libjemalloc.so: cannot allocate memory in static TLS block

I don't claim to understand rtld.c, but hazarding a rough guess from the backtrace, it looks like the problem is caused by _dl_relocate_object.

Here's a smaller reproducer:

cat <<EOF > preload.c
__thread int preload_data[457] __attribute__((tls_model("initial-exec")));

int preload_function() {
    return preload_data[0];
}
EOF

gcc -shared preload.c -o preload.so

cat <<EOF > audit.c
unsigned int
la_version( unsigned int version )
{
  return version;
}
EOF

gcc -shared audit.c -o audit.so

cat <<EOF > main.c
#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
    return 0;
}
EOF

gcc main.c -o main

./main # succeeds
LD_PRELOAD=./preload.so ./main # succeeds
LD_AUDIT=./audit.so ./main # succeeds
LD_PRELOAD=./preload.so LD_AUDIT=./audit.so ./main # fails

This results in:

./main: error while loading shared libraries: ./preload.so: cannot allocate memory in static TLS block

If I make preload_data any smaller than 457 elements, there is no error. Using gdb to print sizeof(tsd_tls), it looks like tsd_tls has a size of 2640 bytes, so it would trigger the error.

This is on aarch64 Debian:

$ uname -a
Linux debian 5.10.0-29-arm64 #1 SMP Debian 5.10.216-1 (2024-05-03) aarch64 GNU/Linux

Is there any limit on TLS size? If not it seems like this may be a glibc bug?
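
One way to put numbers on this question is to list the PT_TLS segment of every object loaded into a process. The following is a minimal sketch using glibc's dl_iterate_phdr. The names report_tls_segments, tls_segment_cb and example_tls are hypothetical, and the 1024-byte array exists only so that the program has a TLS segment of its own. The sketch reports what each object requests, not how much static TLS the loader has reserved, so it cannot by itself confirm where a limit lies.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Example TLS object so that this program has its own PT_TLS segment. */
__thread volatile char example_tls[1024];

/* Called once per loaded object; prints and sums its PT_TLS size, if any. */
static int tls_segment_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    size_t *total = data;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_TLS)
            continue;
        /* An empty name denotes the main program. */
        printf("%-40s PT_TLS memsz=%zu align=%zu\n",
               info->dlpi_name[0] ? info->dlpi_name : "(main program)",
               (size_t)ph->p_memsz, (size_t)ph->p_align);
        *total += (size_t)ph->p_memsz;
    }
    return 0; /* keep iterating */
}

/* Returns the sum of all PT_TLS segment sizes in the current process. */
size_t report_tls_segments(void)
{
    size_t total = 0;
    example_tls[0] = 1; /* touch the variable so it is clearly in use */
    dl_iterate_phdr(tls_segment_cb, &total);
    return total;
}
```

Since the failing LD_PRELOAD plus LD_AUDIT combination aborts before main runs, a program calling report_tls_segments would have to be started with LD_PRELOAD alone. The per-object sizes it prints can then be compared with the 457-element threshold and the 2640-byte tsd_tls mentioned above.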

@martyngigg

martyngigg commented Jun 7, 2024

We've also run into something similar with the LD_PRELOAD behaviour but in our case we don't use LD_AUDIT. We've been using jemalloc, via LD_PRELOAD with a Python application for some time and it's been fine. We used to use system-built packages from CentOS & Ubuntu but now use the Conda package manager, much like #937. Our package had been pinned to 5.2.0 and we were trying to move forward but any Conda package >= 5.2.1 causes a segfault upon startup of our application.

We've stripped things back and built jemalloc from source. The segfault occurs if we just run:

LD_PRELOAD=$HOME/opt/lib/libjemalloc.so python3 -c "import numpy"

gdb gives the following stacktrace:


Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7de310f in _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:823
823	../elf/dl-tls.c: No such file or directory.
Missing separate debuginfos, use: yum debuginfo-install platform-python-3.6.8-62.el8_10.rocky.0.x86_64
(gdb) bt
#0  0x00007ffff7de310f in _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:823
#1  0x00007ffff7de31fc in update_get_addr (ti=0x7ffff7bb9f80, gen=) at ../elf/dl-tls.c:917
#2  0x00007ffff7dcee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#3  0x00007ffff7940da4 in tsd_state_get (tsd=) at include/jemalloc/internal/tsd.h:341
#4  tsd_fast (tsd=) at include/jemalloc/internal/tsd.h:336
#5  free_fastpath (size_hint=false, size=0, ptr=0x0) at src/jemalloc.c:2728
#6  free (ptr=0x0) at src/jemalloc.c:2786
#7  0x00007ffff7de3129 in free (ptr=) at ../include/rtld-malloc.h:50
#8  _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:823
#9  0x00007ffff7de31fc in update_get_addr (ti=0x7ffff7bb9f80, gen=) at ../elf/dl-tls.c:917
#10 0x00007ffff7dcee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#11 0x00007ffff7940da4 in tsd_state_get (tsd=) at include/jemalloc/internal/tsd.h:341
#12 tsd_fast (tsd=) at include/jemalloc/internal/tsd.h:336
#13 free_fastpath (size_hint=false, size=0, ptr=0x0) at src/jemalloc.c:2728
#14 free (ptr=0x0) at src/jemalloc.c:2786
#15 0x00007ffff7de3129 in free (ptr=) at ../include/rtld-malloc.h:50
#16 _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:823
#17 0x00007ffff7de31fc in update_get_addr (ti=0x7ffff7bb9f80, gen=) at ../elf/dl-tls.c:917
#18 0x00007ffff7dcee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#19 0x00007ffff7940da4 in tsd_state_get (tsd=) at include/jemalloc/internal/tsd.h:341
#20 tsd_fast (tsd=) at include/jemalloc/internal/tsd.h:336
#21 free_fastpath (size_hint=false, size=0, ptr=0x0) at src/jemalloc.c:2728
#22 free (ptr=0x0) at src/jemalloc.c:2786
#23 0x00007ffff7de3129 in free (ptr=) at ../include/rtld-malloc.h:50
#24 _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:823
#25 0x00007ffff7de31fc in update_get_addr (ti=0x7ffff7bb9f80, gen=) at ../elf/dl-tls.c:917
#26 0x00007ffff7dcee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#27 0x00007ffff7940da4 in tsd_state_get (tsd=) at include/jemalloc/internal/tsd.h:341
#28 tsd_fast (tsd=) at include/jemalloc/internal/tsd.h:336
#29 free_fastpath (size_hint=false, size=0, ptr=0x0) at src/jemalloc.c:2728
#30 free (ptr=0x0) at src/jemalloc.c:2786
#31 0x00007ffff7de3129 in free (ptr=) at ../include/rtld-malloc.h:50
#32 _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:823
#33 0x00007ffff7de31fc in update_get_addr (ti=0x7ffff7bb9f80, gen=) at ../elf/dl-tls.c:917
#34 0x00007ffff7dcee9c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#35 0x00007ffff7940da4 in tsd_state_get (tsd=) at include/jemalloc/internal/tsd.h:341
#36 tsd_fast (tsd=) at include/jemalloc/internal/tsd.h:336
#37 free_fastpath (size_hint=false, size=0, ptr=0x0) at src/jemalloc.c:2728
#38 free (ptr=0x0) at src/jemalloc.c:2786
#39 0x00007ffff7de3129 in free (ptr=) at ../include/rtld-malloc.h:50

jemalloc was built using the same options as the conda-forge feedstock:

./configure --prefix=$HOME/opt \
            --disable-static \
            --disable-tls \
            --disable-initial-exec-tls

We ran a git bisect from the HEAD of dev back to 5.0.0 (as it turned out that 5.2.0 built using the above options also segfaulted but --disable-initial-exec-tls was only used for conda versions >=5.2.1). This dropped us at 794e29c as the failing commit, a part of #1365.

At this point we are a bit out of our depth. The stacktrace above suggests some type of chicken-and-egg problem with dynamic loading and TLS. We have noticed that if we remove the --disable-initial-exec-tls flag, we do not get a segfault.

Does this look like the same issue to you, or should I open a separate issue for this?

System details

  • lsb_release -a:
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	Rocky
Description:	Rocky Linux release 8.10 (Green Obsidian)
Release:	8.10
Codename:	GreenObsidian
  • uname -a: Linux host-172-16-101-92 4.18.0-553.el8_10.x86_64 #1 SMP Fri May 24 13:05:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  • rpm -q glibc: glibc-2.28-251.el8_10.2.x86_64

7 participants
@Atry @martyngigg @interwq @akostadinov @mkenigs @ianoversbygs and others