Hi all,
a bit of context:
I work at Datadog and we built the .NET continuous profiler: Datadog.Profiler.Native.so. We use libunwind (statically linked) to collect callstacks. In a signal handler, we call unw_backtrace2 with the context (provided by the handler) to get the instruction pointers.
We also have an LD_PRELOADed library (Datadog.Linux.ApiWrapper.x64.so), which merely proxies calls that are problematic for stack walking (for example, a thread interrupted while acquiring a lock in dl_iterate_phdr) and tells the profiler whether it is safe to stack walk or not.
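To make the proxy idea concrete, here is a minimal sketch of the guard-counter pattern such a wrapper can use. It is not the actual ApiWrapper code: the names `in_dl_iterate_phdr`, `is_safe_to_stackwalk`, `guarded_dl_iterate_phdr` and `record_safety` are made up for illustration. In a real LD_PRELOAD library the wrapper would itself be named `dl_iterate_phdr` and would look up the libc implementation with `dlsym(RTLD_NEXT, "dl_iterate_phdr")`; the sketch calls libc directly so it can be compiled as a standalone file.

```c
#define _GNU_SOURCE
#include <link.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical per-thread counter: nonzero while this thread is inside
   dl_iterate_phdr and may therefore hold the loader lock. */
static _Thread_local volatile sig_atomic_t in_dl_iterate_phdr = 0;

/* Hypothetical query a profiler's signal handler could make before unwinding. */
int is_safe_to_stackwalk(void)
{
    return in_dl_iterate_phdr == 0;
}

/* Stand-in for the interposed function (assumption: a real LD_PRELOAD proxy
   would export this as dl_iterate_phdr and forward via dlsym(RTLD_NEXT, ...)). */
int guarded_dl_iterate_phdr(int (*callback)(struct dl_phdr_info *, size_t, void *),
                            void *data)
{
    in_dl_iterate_phdr++;                      /* entering the risky region */
    int rc = dl_iterate_phdr(callback, data);  /* forward to libc */
    in_dl_iterate_phdr--;                      /* leaving the risky region */
    return rc;
}

/* Demo callback: records whether a stack walk would be considered safe
   while dl_iterate_phdr is running. */
static int record_safety(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)info;
    (void)size;
    *(int *)data = is_safe_to_stackwalk();
    return 1; /* a nonzero return stops the iteration after the first object */
}
```

Calling `guarded_dl_iterate_phdr(record_safety, &flag)` leaves `flag` at 0, because the counter is raised for the duration of the call. Note that this only protects against interrupting a thread that is already inside the wrapped function; it does not make a dl_iterate_phdr call issued from the signal handler itself safe.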
Recently we got a crash report from one of our customers:
(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGSEGV
* frame #0: 0x00007fd13200defb Datadog.Profiler.Native.so`_ULx86_64_dwarf_callback(info=<unavailable>, size=<unavailable>, ptr=<unavailable>) at Gfind_proc_info-lsb.c:0:11
frame #1: 0x00007fd1ad574ef0 libc.so.6`dl_iterate_phdr + 352
frame #2: 0x00007fd1adbc71a5 Datadog.Linux.ApiWrapper.x64.so`dl_iterate_phdr(callback=(Datadog.Profiler.Native.so`_ULx86_64_dwarf_callback at Gfind_proc_info-lsb.c:574), data=0x00007fd110135530) at functions_to_wrap.c:73:18
frame #3: 0x00007fd13200e245 Datadog.Profiler.Native.so`_ULx86_64_dwarf_find_proc_info(as=0x00007fd1323bc600, ip=140538528257024, pi=0x00007fd110135b78, need_unwind_info=1, arg=0x00007fd110136211) at Gfind_proc_info-lsb.c:807:9
frame #4: 0x00007fd13200a9a6 Datadog.Profiler.Native.so`_ULx86_64_dwarf_step [inlined] fetch_proc_info(c=0x00007fd110135a20, ip=140538528257024) at Gparser.c:473:18
frame #5: 0x00007fd13200a92d Datadog.Profiler.Native.so`_ULx86_64_dwarf_step at Gparser.c:1021:13
frame #6: 0x00007fd13200a7b4 Datadog.Profiler.Native.so`_ULx86_64_dwarf_step(c=0x00007fd110135a20) at Gparser.c:1066:14
frame #7: 0x00007fd132008f89 Datadog.Profiler.Native.so`_ULx86_64_step(cursor=0x00007fd110135a20) at Gstep.c:93:9
frame #8: 0x00007fd132009baf Datadog.Profiler.Native.so`_ULx86_64_tdep_trace [inlined] trace_init_addr(f=<unavailable>, cursor=0x00007fd110135a20, cfa=140535894602616, rip=<unavailable>, rbp=140527566523136, rsp=140535894602616) at Gtrace.c:249:10
frame #9: 0x00007fd132009b2d Datadog.Profiler.Native.so`_ULx86_64_tdep_trace [inlined] trace_lookup(cursor=0x00007fd110135a20, cache=<unavailable>, cfa=140535894602616, rip=<unavailable>, rbp=140527566523136, rsp=140535894602616) at Gtrace.c:331:10
frame #10: 0x00007fd132009b2d Datadog.Profiler.Native.so`_ULx86_64_tdep_trace(cursor=0x00007fd110135a20, buffer=0x000055ac419d81c0, size=0x00007fd110135a0c) at Gtrace.c:449:27
frame #11: 0x00007fd132008834 Datadog.Profiler.Native.so`unw_backtrace2(buffer=<unavailable>, size=2049, uc2=0x00007fd110136680) at backtrace.c:113:7
frame #12: 0x00007fd131f92b1a Datadog.Profiler.Native.so`LinuxStackFramesCollector::CollectCallStackCurrentThread(void*) [inlined] LinuxStackFramesCollector::CollectStackWithBacktrace2(this=0x000055ac41869f20, ctx=0x00007fd110136680) at LinuxStackFramesCollector.cpp:250:18
frame #13: 0x00007fd131f92b04 Datadog.Profiler.Native.so`LinuxStackFramesCollector::CollectCallStackCurrentThread(this=0x000055ac41869f20, ctx=0x00007fd110136680) at LinuxStackFramesCollector.cpp:177:33
frame #14: 0x00007fd131f9208a Datadog.Profiler.Native.so`LinuxStackFramesCollector::CollectStackSampleSignalHandler(signal=<unavailable>, info=<unavailable>, context=0x00007fd110136680) at LinuxStackFramesCollector.cpp:293:60
frame #15: 0x00007fd131f94f11 Datadog.Profiler.Native.so`ProfilerSignalManager::SignalHandler(int, siginfo_t*, void*) [inlined] ProfilerSignalManager::CallCustomHandler(this=<unavailable>, signal=10, info=0x00007fd1101367b0, context=0x00007fd110136680) at ProfilerSignalManager.cpp:157:34
frame #16: 0x00007fd131f94efb Datadog.Profiler.Native.so`ProfilerSignalManager::SignalHandler(signal=10, info=0x00007fd1101367b0, context=0x00007fd110136680) at ProfilerSignalManager.cpp:148:25
frame #17: 0x00007fd1ad442520 libc.so.6`___lldb_unnamed_symbol3237 + 1
frame #18: 0x00007fd1ad0dd400 libcoreclr.so`sigsegv_handler(code=11, siginfo=0x00007fd1101374b0, context=0x00007fd110137380) at signal.cpp:548
frame #19: 0x00007fd13c5556ce
frame #20: 0x00007fd137b80351
Info:
Frames 19 and 20 are managed frames. According to Microsoft engineers, managed call stacks can be unwound using frame pointers, and our tests appear to confirm this.
The thread was interrupted while executing the SIGSEGV handler set up by the .NET CLR (frame 18).
The .NET CLR uses its SIGSEGV handler to implement NullReferenceException, so in this case the SIGSEGV should be recoverable.
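For readers unfamiliar with the frame-pointer approach mentioned above, here is a minimal sketch of how such a walk works on x86-64. It is not libunwind's or the CLR's code: `struct frame_record` and `walk_frame_pointers` are hypothetical names, and the sketch assumes every frame keeps the conventional layout (saved caller RBP at `[rbp]`, return address at `[rbp + 8]`) and that the caller supplies the stack bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame record layout on x86-64 when code maintains a frame pointer:
   [rbp] holds the caller's saved rbp, [rbp + 8] holds the return address. */
struct frame_record {
    struct frame_record *caller_fp;
    uintptr_t return_address;
};

/* Follows the chain of frame records starting at fp and stores up to max
   return addresses in out. Returns the number of addresses stored. */
size_t walk_frame_pointers(const struct frame_record *fp,
                           uintptr_t stack_lo, uintptr_t stack_hi,
                           uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        uintptr_t addr = (uintptr_t)fp;
        /* Stop if the record is misaligned or lies outside the stack bounds. */
        if (addr % sizeof(void *) != 0 ||
            addr < stack_lo || addr + sizeof *fp > stack_hi)
            break;
        out[n++] = fp->return_address;
        /* The stack grows down, so the caller's record must sit at a higher
           address; anything else means the chain ended or is corrupt. */
        if ((uintptr_t)fp->caller_fp <= addr)
            break;
        fp = fp->caller_fp;
    }
    return n;
}
```

In a signal handler, the starting `fp` would come from the RBP value saved in the interrupted context, and the bounds from the thread's stack range. The walk needs no unwind tables, so it never calls dl_iterate_phdr, but it only works for code compiled with frame pointers.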
(lldb) fr v
(dl_phdr_info *) info = <Could not evaluate DW_OP_entry_value.>
(size_t) size = <Could not evaluate DW_OP_entry_value.>
(void *) ptr = <Could not evaluate DW_OP_entry_value.>
(dwarf_eh_frame_hdr) synth_eh_frame_hdr = {
version = <read memory from 0x7fd110134400 failed (0 of 1 bytes read)>
eh_frame_ptr_enc = <read memory from 0x7fd110134401 failed (0 of 1 bytes read)>
fde_count_enc = <read memory from 0x7fd110134402 failed (0 of 1 bytes read)>
table_enc = <read memory from 0x7fd110134403 failed (0 of 1 bytes read)>
eh_frame = <read memory from 0x7fd110134404 failed (0 of 8 bytes read)>
}
(dwarf_callback_data *) cb_data = <variable not available>
(unw_dyn_info_t *) di = <variable not available>
(Elf64_Addr) max_load_addr = 140538530461312
(int) need_unwind_info = 1
(unw_proc_info_t *) pi = 0x00007fd110135b78
(dwarf_eh_frame_hdr *) hdr = <variable not available>
(int) found = 0
(unw_word_t) ip = 140538528257024
(const Elf64_Phdr *) phdr = <variable not available>
(Elf64_Addr) load_base = 140538523156480
(const Elf64_Phdr *) p_text = 0x00007fd1acc00078
(const Elf64_Phdr *) p_eh_hdr = <variable not available>
(const Elf64_Phdr *) p_dynamic = 0x00007fd1acc000e8
(long) n = <variable not available>
(unw_accessors_t *) a = <variable not available>
(unw_word_t) addr = <variable not available>
(unw_word_t) eh_frame_start = <variable not available>
(int) ret = <variable not available>
(unw_word_t) fde_count = <variable not available>
(unw_word_t) eh_frame_end = <variable not available>
I'm not sure I can share the core dump file, but I can provide libcoreclr.so and the corresponding so.dbg file. Feel free to ask me to run additional lldb commands.
I hope you will be able to help me.
Thanks in advance.