Add more information after a segmentation fault #1921

Open
wants to merge 6 commits into
base: master
from

@wezrule
Collaborator

commented Apr 25, 2019

The information gathered from a segmentation fault trace (from dmesg) on Linux without a core dump can be next to useless if the virtual addresses are randomized. Boost has a stacktrace library, and its example catches the SIGSEGV and SIGABRT signals and prints a callstack of memory addresses from the handler. The results depend a lot on compiler intrinsics and debug info options. I tried various stacktrace build options with gcc 8.3 on Ubuntu 19.04 to no avail, and was only able to get output similar to this with nano_node:

0# 0x000055A6F6B57C0F
1# 0x000055A6F6B57C42
2# 0x000055A6F6B57C9D

One of these addresses did match the ip in the segmentation fault message, so it at least gives more information. A lot of people seem to have this issue though.

The boost stacktrace warns that:

Writing a signal handler requires high attention! Only a few system calls allowed in signal handlers, so there's no cross platform way to print a stacktrace without a risk of deadlocking. The only way to deal with the problem - dump raw stacktrace into file/socket and parse it on program restart.

For this reason I am careful about what is done inside the handler. I don't use any live objects, and just write out a separate file for each library (8 in total currently) in the same directory as the executable.

I also tried using backtrace() on Linux directly, which gave much better results (function names + offsets), but it required a lot more build changes (checking if libbacktrace.so is installed etc.). It seems promising at least. I was able to achieve similar results, with no build changes, using dl_iterate_phdr on Linux to get the load addresses of the executable and all libraries. These can then be subtracted from the ip address to find the exact cause. This is only called after a seg fault so that the information is not output while the program is running (for security reasons). I do wonder if this might cause an issue in some circumstances, for instance if a shared library gets unloaded, so results may be better if this is done at startup, at the cost of lowered security.
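
As a rough illustration of the dl_iterate_phdr approach described above, the following sketch collects the load address of the executable and of each loaded shared library. It is not the code from this PR: `ModuleTable`, `kMaxModules`, `collect_module` and `collect_load_addresses` are hypothetical names, and the fixed-size table is only one possible way to avoid allocating inside the callback. The sketch calls dl_iterate_phdr from ordinary code rather than from a signal handler.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc, Linux only)
#include <cstddef>
#include <cstdint>

// Hypothetical upper bound on the number of modules we record.
constexpr std::size_t kMaxModules = 64;

// Fixed-size storage so that the callback never allocates.
struct ModuleTable {
    std::uintptr_t load_address[kMaxModules];
    const char* name[kMaxModules];   // empty string for the main executable
    std::size_t count;
};

// Invoked by dl_iterate_phdr once per loaded object; the main program is
// visited first, followed by the shared libraries.
static int collect_module(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    ModuleTable* table = static_cast<ModuleTable*>(data);
    if (table->count >= kMaxModules) {
        return 1;   // a non-zero return value stops the iteration
    }
    table->load_address[table->count] = static_cast<std::uintptr_t>(info->dlpi_addr);
    table->name[table->count] = info->dlpi_name;
    ++table->count;
    return 0;       // zero means: continue with the next object
}

// Fills the table and returns how many modules were recorded.
std::size_t collect_load_addresses(ModuleTable& table) {
    table.count = 0;
    dl_iterate_phdr(collect_module, &table);
    return table.count;
}
```

Each `load_address[i]` is the base that would be subtracted from an ip falling inside that module; how the PR actually serializes these values to the dump files is not shown here.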

So if the node seg faults on Linux, the process for services would now become:
1 - Do they have core dumps available? These are the most useful to us, but they may not want to send us these files. If possible, they can build the node with debug information (use the exact same -DCMAKE_BUILD_TYPE but add add_compile_options(-g) to the main CMakeLists.txt file), load the core dump into gdb and send us the backtrace. This should point directly to the cause.
2 - Should the service not be willing to do any extra work themselves, they can send us the following:

  • The commit (or tag) used
  • The segmentation fault message in dmesg (if available)
  • The output of running ./nano_node --debug_output_last_backtrace_dump
  • The files located in the same directory as nano_node: nano_node_crash_load_address_dump_*.txt

3 - On our end, compare the ip memory address with the load addresses in all the nano_node_crash_load_address_dump_*.txt files. Find the one which is closest to, but lower than, the ip memory address, and subtract that load address from the ip. Build the same version of the node (the same compile options must be used), adding add_compile_options(-g) to the main CMakeLists.txt file if a release build was used.

Then either run:
addr2line -fCi -e nano_node <insert_hex_address>

or

gdb nano_node
gdb > info line * <insert_hex_address>
gdb > list * <insert_hex_address>

It should give you some useful information about the cause of the issue.
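
Step 3 amounts to picking the module whose load address is the highest one not above the ip, then computing ip minus that load address. A minimal sketch of that arithmetic, assuming the load addresses have already been read from the dump files into memory (the file layout is not described here, and `LoadedModule`, `Resolved` and `resolve_ip` are hypothetical names):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One entry per nano_node_crash_load_address_dump_*.txt file
// (hypothetical in-memory form of its contents).
struct LoadedModule {
    std::string name;
    std::uint64_t load_address;
};

// The module the ip falls into and the offset to hand to addr2line or gdb.
struct Resolved {
    std::string name;
    std::uint64_t offset;
};

// Finds the module with the highest load address that is still <= ip and
// stores ip - load_address in `out`. Returns false if every load address
// is above the ip, in which case `out` is left untouched.
bool resolve_ip(std::uint64_t ip, const std::vector<LoadedModule>& modules, Resolved& out) {
    const LoadedModule* best = nullptr;
    for (const LoadedModule& m : modules) {
        if (m.load_address <= ip &&
            (best == nullptr || m.load_address > best->load_address)) {
            best = &m;
        }
    }
    if (best == nullptr) {
        return false;
    }
    out.name = best->name;
    out.offset = ip - best->load_address;
    return true;
}
```

For example, with the ip 0x000055A6F6B57C0F from the sample output above and a made-up load address of 0x55A6F6A00000 for nano_node, `resolve_ip` would yield the offset 0x157C0F, which is the value to substitute for <insert_hex_address>.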

@wezrule wezrule added this to the V19.0 milestone Apr 25, 2019

@wezrule wezrule requested review from argakiig, SergiySW and cryptocode Apr 25, 2019

@wezrule wezrule self-assigned this Apr 25, 2019

@wezrule wezrule added the experiment label Apr 25, 2019

@zhyatt zhyatt added this to During RC in V19 Apr 30, 2019

wezrule added some commits May 9, 2019
