Crash in _ULx86_64_dwarf_callback #648

Open
gleocadie opened this issue Oct 25, 2023 · 0 comments
Hi all,
Some context:
I work at Datadog, where we built the .NET continuous profiler, Datadog.Profiler.Native.so. We use libunwind (statically linked) to collect call stacks. In a signal handler, we call unw_backtrace2 with the context provided to the handler to get the instruction pointers.
We also have an LD_PRELOADed library (Datadog.Linux.ApiWrapper.x64.so) that simply wraps functions involved in problematic situations (for example, a thread interrupted while acquiring a lock in dl_iterate_phdr) and tells the profiler whether it is safe to walk the stack.
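To illustrate the wrapper idea, here is a minimal sketch in C. It is not our actual implementation: the names `guarded_dl_iterate_phdr`, `is_safe_to_stackwalk` and `record_safety_cb` are hypothetical, and a real preloaded library would export `dl_iterate_phdr` itself and forward to libc's version rather than use a differently named function.

```c
#define _GNU_SOURCE
#include <link.h>
#include <signal.h>
#include <stddef.h>

/* Per-thread nesting depth: non-zero while this thread is inside the
 * wrapped call (and may therefore hold the loader lock). */
static _Thread_local volatile sig_atomic_t in_dl_iterate_phdr = 0;

/* Hypothetical query for the profiler: 1 if the current thread is not
 * inside the wrapped dl_iterate_phdr, 0 otherwise. */
int is_safe_to_stackwalk(void)
{
    return in_dl_iterate_phdr == 0;
}

/* Hypothetical wrapper: marks the thread as "inside" for the duration
 * of the real call, then forwards the result unchanged. */
int guarded_dl_iterate_phdr(
    int (*callback)(struct dl_phdr_info *, size_t, void *), void *data)
{
    in_dl_iterate_phdr++;
    int result = dl_iterate_phdr(callback, data);
    in_dl_iterate_phdr--;
    return result;
}

/* Demo callback: records what the profiler would see if it asked
 * during the iteration, then stops after the first module. */
int record_safety_cb(struct dl_phdr_info *info, size_t size, void *ptr)
{
    (void)info;
    (void)size;
    *(int *)ptr = is_safe_to_stackwalk();
    return 1;
}
```

A signal handler would check `is_safe_to_stackwalk()` before unwinding and skip the sample when it returns 0.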

Recently we got a crash report from one of our customers:

(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGSEGV
  * frame #0: 0x00007fd13200defb Datadog.Profiler.Native.so`_ULx86_64_dwarf_callback(info=<unavailable>, size=<unavailable>, ptr=<unavailable>) at Gfind_proc_info-lsb.c:0:11
    frame #1: 0x00007fd1ad574ef0 libc.so.6`dl_iterate_phdr + 352
    frame #2: 0x00007fd1adbc71a5 Datadog.Linux.ApiWrapper.x64.so`dl_iterate_phdr(callback=(Datadog.Profiler.Native.so`_ULx86_64_dwarf_callback at Gfind_proc_info-lsb.c:574), data=0x00007fd110135530) at functions_to_wrap.c:73:18
    frame #3: 0x00007fd13200e245 Datadog.Profiler.Native.so`_ULx86_64_dwarf_find_proc_info(as=0x00007fd1323bc600, ip=140538528257024, pi=0x00007fd110135b78, need_unwind_info=1, arg=0x00007fd110136211) at Gfind_proc_info-lsb.c:807:9
    frame #4: 0x00007fd13200a9a6 Datadog.Profiler.Native.so`_ULx86_64_dwarf_step [inlined] fetch_proc_info(c=0x00007fd110135a20, ip=140538528257024) at Gparser.c:473:18
    frame #5: 0x00007fd13200a92d Datadog.Profiler.Native.so`_ULx86_64_dwarf_step at Gparser.c:1021:13
    frame #6: 0x00007fd13200a7b4 Datadog.Profiler.Native.so`_ULx86_64_dwarf_step(c=0x00007fd110135a20) at Gparser.c:1066:14
    frame #7: 0x00007fd132008f89 Datadog.Profiler.Native.so`_ULx86_64_step(cursor=0x00007fd110135a20) at Gstep.c:93:9
    frame #8: 0x00007fd132009baf Datadog.Profiler.Native.so`_ULx86_64_tdep_trace [inlined] trace_init_addr(f=<unavailable>, cursor=0x00007fd110135a20, cfa=140535894602616, rip=<unavailable>, rbp=140527566523136, rsp=140535894602616) at Gtrace.c:249:10
    frame #9: 0x00007fd132009b2d Datadog.Profiler.Native.so`_ULx86_64_tdep_trace [inlined] trace_lookup(cursor=0x00007fd110135a20, cache=<unavailable>, cfa=140535894602616, rip=<unavailable>, rbp=140527566523136, rsp=140535894602616) at Gtrace.c:331:10
    frame #10: 0x00007fd132009b2d Datadog.Profiler.Native.so`_ULx86_64_tdep_trace(cursor=0x00007fd110135a20, buffer=0x000055ac419d81c0, size=0x00007fd110135a0c) at Gtrace.c:449:27
    frame #11: 0x00007fd132008834 Datadog.Profiler.Native.so`unw_backtrace2(buffer=<unavailable>, size=2049, uc2=0x00007fd110136680) at backtrace.c:113:7
    frame #12: 0x00007fd131f92b1a Datadog.Profiler.Native.so`LinuxStackFramesCollector::CollectCallStackCurrentThread(void*) [inlined] LinuxStackFramesCollector::CollectStackWithBacktrace2(this=0x000055ac41869f20, ctx=0x00007fd110136680) at LinuxStackFramesCollector.cpp:250:18
    frame #13: 0x00007fd131f92b04 Datadog.Profiler.Native.so`LinuxStackFramesCollector::CollectCallStackCurrentThread(this=0x000055ac41869f20, ctx=0x00007fd110136680) at LinuxStackFramesCollector.cpp:177:33
    frame #14: 0x00007fd131f9208a Datadog.Profiler.Native.so`LinuxStackFramesCollector::CollectStackSampleSignalHandler(signal=<unavailable>, info=<unavailable>, context=0x00007fd110136680) at LinuxStackFramesCollector.cpp:293:60
    frame #15: 0x00007fd131f94f11 Datadog.Profiler.Native.so`ProfilerSignalManager::SignalHandler(int, siginfo_t*, void*) [inlined] ProfilerSignalManager::CallCustomHandler(this=<unavailable>, signal=10, info=0x00007fd1101367b0, context=0x00007fd110136680) at ProfilerSignalManager.cpp:157:34
    frame #16: 0x00007fd131f94efb Datadog.Profiler.Native.so`ProfilerSignalManager::SignalHandler(signal=10, info=0x00007fd1101367b0, context=0x00007fd110136680) at ProfilerSignalManager.cpp:148:25
    frame #17: 0x00007fd1ad442520 libc.so.6`___lldb_unnamed_symbol3237 + 1
    frame #18: 0x00007fd1ad0dd400 libcoreclr.so`sigsegv_handler(code=11, siginfo=0x00007fd1101374b0, context=0x00007fd110137380) at signal.cpp:548
    frame #19: 0x00007fd13c5556ce
    frame #20: 0x00007fd137b80351

Info:

  • Frames 19 and 20 are managed frames (according to Microsoft engineers, managed call stacks can be unwound using the frame-pointer approach; our tests appear to confirm this).
  • The thread was interrupted while executing the SIGSEGV handler set up by the .NET CLR (frame 18).
  • The .NET CLR uses its SIGSEGV handler to implement NullReferenceException, so in this case the SIGSEGV should be recoverable.
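For readers unfamiliar with the frame-pointer approach mentioned above, here is a minimal sketch in C. It assumes the usual x86-64 layout where each frame record holds the caller's saved rbp followed by the return address; the `frame_record` type and `walk_frame_pointers` function are hypothetical names and are not part of libunwind or the CLR.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 frame record: [rbp] = caller's rbp, [rbp+8] = return address. */
typedef struct frame_record {
    const struct frame_record *caller_fp;
    uintptr_t return_address;
} frame_record;

/* Follows the chain of frame records starting at fp and stores up to
 * max_frames return addresses in out. Every record must lie inside
 * [stack_lo, stack_hi) and the chain must move strictly upward, so a
 * corrupted or cyclic chain ends the walk instead of faulting.
 * Returns the number of addresses written. */
size_t walk_frame_pointers(const frame_record *fp,
                           uintptr_t stack_lo, uintptr_t stack_hi,
                           uintptr_t *out, size_t max_frames)
{
    size_t n = 0;
    while (n < max_frames) {
        uintptr_t addr = (uintptr_t)fp;
        if (addr < stack_lo || addr >= stack_hi ||
            stack_hi - addr < sizeof(frame_record) ||
            addr % sizeof(uintptr_t) != 0) {
            break; /* record is outside the stack or misaligned */
        }
        out[n++] = fp->return_address;
        const frame_record *next = fp->caller_fp;
        if ((uintptr_t)next <= addr) {
            break; /* the stack grows down, so callers live at higher addresses */
        }
        fp = next;
    }
    return n;
}
```

In a signal handler, the starting `fp` would come from the interrupted context's rbp, and the bounds from the interrupted thread's stack.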
(lldb) fr v
(dl_phdr_info *) info = <Could not evaluate DW_OP_entry_value.>
(size_t) size = <Could not evaluate DW_OP_entry_value.>
(void *) ptr = <Could not evaluate DW_OP_entry_value.>
(dwarf_eh_frame_hdr) synth_eh_frame_hdr = {
  version = <read memory from 0x7fd110134400 failed (0 of 1 bytes read)>
  eh_frame_ptr_enc = <read memory from 0x7fd110134401 failed (0 of 1 bytes read)>
  fde_count_enc = <read memory from 0x7fd110134402 failed (0 of 1 bytes read)>
  table_enc = <read memory from 0x7fd110134403 failed (0 of 1 bytes read)>
  eh_frame = <read memory from 0x7fd110134404 failed (0 of 8 bytes read)>
}
(dwarf_callback_data *) cb_data = <variable not available>
(unw_dyn_info_t *) di = <variable not available>
(Elf64_Addr) max_load_addr = 140538530461312
(int) need_unwind_info = 1
(unw_proc_info_t *) pi = 0x00007fd110135b78
(dwarf_eh_frame_hdr *) hdr = <variable not available>
(int) found = 0
(unw_word_t) ip = 140538528257024
(const Elf64_Phdr *) phdr = <variable not available>
(Elf64_Addr) load_base = 140538523156480
(const Elf64_Phdr *) p_text = 0x00007fd1acc00078
(const Elf64_Phdr *) p_eh_hdr = <variable not available>
(const Elf64_Phdr *) p_dynamic = 0x00007fd1acc000e8
(long) n = <variable not available>
(unw_accessors_t *) a = <variable not available>
(unw_word_t) addr = <variable not available>
(unw_word_t) eh_frame_start = <variable not available>
(int) ret = <variable not available>
(unw_word_t) fde_count = <variable not available>
(unw_word_t) eh_frame_end = <variable not available>
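For context on what a dl_iterate_phdr callback of this kind does, here is a simplified sketch in C. It is not libunwind's code: `module_query`, `find_module_cb` and `find_module` are hypothetical names, and the sketch only locates the module whose PT_LOAD segment contains an address and notes whether that module has a PT_GNU_EH_FRAME header.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query/result record (not a libunwind type). */
typedef struct {
    uintptr_t ip;          /* address to look up */
    int found;             /* set to 1 once a module contains ip */
    uintptr_t load_base;   /* dlpi_addr of the matching module */
    int has_eh_frame_hdr;  /* 1 if that module has PT_GNU_EH_FRAME */
} module_query;

/* Called once per loaded module; a non-zero return stops the iteration. */
static int find_module_cb(struct dl_phdr_info *info, size_t size, void *ptr)
{
    module_query *q = ptr;
    int contains_ip = 0;
    int has_eh = 0;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD) {
            uintptr_t start = info->dlpi_addr + ph->p_vaddr;
            if (q->ip >= start && q->ip - start < ph->p_memsz)
                contains_ip = 1;
        } else if (ph->p_type == PT_GNU_EH_FRAME) {
            has_eh = 1;
        }
    }
    if (!contains_ip)
        return 0; /* not this module, keep iterating */

    q->found = 1;
    q->load_base = info->dlpi_addr;
    q->has_eh_frame_hdr = has_eh;
    return 1;
}

/* Looks up the loaded module that contains ip. */
static module_query find_module(uintptr_t ip)
{
    module_query q = { ip, 0, 0, 0 };
    dl_iterate_phdr(find_module_cb, &q);
    return q;
}
```

Calling `find_module` with the address of any function in the running program should report `found == 1`.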

Looking up the address in the ip field gives:

(lldb) image lookup --address 140538528257024
      Address: libcoreclr.so[0x00000000004dd400] (libcoreclr.so.PT_LOAD[0]..text + 4615920)
      Summary: libcoreclr.so`sigsegv_handler(int, siginfo_t*, void*) at signal.cpp:548
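As a cross-check, the decimal values printed by `fr v` can be related to this lookup with simple arithmetic. The sketch below uses only numbers from the output above; the helper names `module_offset` and `ip_in_module` are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Offset of an address relative to the module's load base. */
static uint64_t module_offset(uint64_t ip, uint64_t load_base)
{
    return ip - load_base;
}

/* 1 if ip lies in [load_base, max_load_addr), 0 otherwise. */
static int ip_in_module(uint64_t ip, uint64_t load_base, uint64_t max_load_addr)
{
    return ip >= load_base && ip < max_load_addr;
}

/* Prints the values from the `fr v` output in hexadecimal. */
static void print_ip_summary(void)
{
    const uint64_t ip = UINT64_C(140538528257024);
    const uint64_t load_base = UINT64_C(140538523156480);
    const uint64_t max_load_addr = UINT64_C(140538530461312);

    printf("ip        = 0x%" PRIx64 "\n", ip);
    printf("load_base = 0x%" PRIx64 "\n", load_base);
    printf("offset    = 0x%" PRIx64 "\n", module_offset(ip, load_base));
    printf("in module = %d\n", ip_in_module(ip, load_base, max_load_addr));
}
```

The ip is 0x7fd1ad0dd400, the address of frame 18, and subtracting load_base (0x7fd1acc00000) gives 0x4dd400, which matches the libcoreclr.so[0x00000000004dd400] address reported by image lookup. The ip also lies below max_load_addr, so it falls inside the range computed for that module.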

I'm not sure whether I can share the core dump file, but I can provide libcoreclr.so and the corresponding so.dbg file. Feel free to ask me to run additional lldb commands.

I hope you will be able to help me.
Thanks in advance.
