Linux C++ crashpad stacktrace incorrect #569

Closed
2 tasks
krisprad opened this issue Jul 5, 2021 · 12 comments

Comments

@krisprad

krisprad commented Jul 5, 2021

I am getting incorrect stack traces for the following code. The curious thing is the presence of this unused function: uncommenting_this_functions_scuppers_stackrace_for_release_builds
(obviously named for this example).
If it is present, the stack trace is wrong. If it is absent, the stack trace is somewhat better. In the incorrect one, function names do not come out correctly.
We have a much larger Linux C++ project where stack traces are jumbled up, and we are trying to narrow down the problem with a toy example like this.

(Also posted in the Sentry forums earlier. Apologies if that breaks posting etiquette. I wanted better visibility.)

When does the problem happen

  • During build
  • During run-time
  • [X] When capturing a hard crash

Environment

  • OS: [Ubuntu 20.04.2 LTS]
  • Compiler: [gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0]
  • CMake version and config: [cmake version 3.16.3, SENTRY_BACKEND crashpad]

Steps To Reproduce

Source code:

sentry_example_crashpad_docker.cpp.txt

Log output

The call sequence leading to the crash should be: main → SetupSentry::SetupSentry → trigger_crash_a → trigger_crash_b → trigger_crash

See snapshot of stacktrace screens:

Stacktrace ok

stacktrace_ok

Stacktrace incorrect (the trigger_crash_a function call is missing; we see 'clone .constprop.0' in its place). This happens when uncommenting_this_functions_scuppers_stackrace_for_release_builds is uncommented.

stacktrace_not_ok

Output of crash and crashpad upload (identical log for run in both cases):

upload_log_for_ok_stacktrace.txt

upload_log_for_not_ok_stacktrace.txt

@Swatinem
Member

Swatinem commented Jul 9, 2021

Screenshot from 2021-07-09 11-38-33

The frames from libc/libstdc++ are a bit weird, but it's unwinding correctly. I wonder if all the calls are being optimized away, or what exactly is going on.

image

Here the function has a broken name, but the file/line information is correct. Also note that all of these are inline frames.

In your case specifically, I think you are missing some unwind information (indicated by the red underline). Can you provide a link to your sentry issue so that I can look at the debug files in detail?

@Swatinem
Member

Swatinem commented Jul 9, 2021

It seems that this is rather a UI problem.

image

When I expand the full function names, I do get the correct function name, however collapsing it (it should remove function arguments and generics) removes the function name and just leaves the [clone .constprop.0]

@krisprad
Author

Thanks for responding.

Here is the link to the issue:

https://sentry.io/organizations/colet-systems/issues/2426845845/?project=5752922&query=is%3Aunresolved

This indicates that all the required information is available, including stack unwinding info.

The stack unwinding info was ticked green once I uploaded the executable along with the debug file.

stacktrace_not_ok_with_all_info

Some aspects

  1. One question is why the presence or absence of a function mentioned previously affects the stacktrace:

void uncommenting_this_functions_scuppers_stackrace_for_release_builds(int argc, char **argv)

  2. The application is built using the gcc flags -g3 -O3.

The uploaded debug file app_binary.debug is created as below:

objcopy --only-keep-debug --compress-debug-sections=zlib app_binary app_binary.debug

  3. We have a more complex multi-threaded application, and we don't get correct stack traces there either.

No per-thread stack frames are displayed; everything is shown in a single frame.
I compared this with the gdb stack trace for the same build, and there is no
correspondence between what gdb shows (a correct symbolic trace) and what Sentry shows
(jumbled up, with no symbols).

Since gdb displays the trace correctly, I assume that the necessary debug information is generated correctly.
If so, are there any files other than debug information files that Sentry requires?

Much of the Sentry documentation is common across all platforms, but there may be subtle Linux issues I have missed.

@Swatinem
Member

1. One question is why the presence or absence of a function mentioned previously affects the stacktrace:

I would guess that the compiler can completely inline trigger_crash_a if it sees that it is only used in one single place. Once it is called from two functions, the compiler no longer inlines it. The [clone …] suffix indicates that.
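The pattern described above can be sketched as follows. This is a minimal illustration, not the code from the attached source file: every `*_sketch` name and the `KEEP_FRAME` macro are hypothetical, and the buffer is valid so that the example does not crash. The `noinline` and `noclone` function attributes are documented GCC attributes that stop the optimizer from inlining a function or emitting specialized copies such as `.constprop` clones. Whether using them changes how Sentry renders the frames is an assumption that this thread does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro: on GCC, forbid both inlining and cloning so that the
// function keeps a single out-of-line copy under its own name. Clang does
// not implement noclone, so only noinline is used there.
#if defined(__GNUC__) && !defined(__clang__)
#define KEEP_FRAME __attribute__((noinline, noclone))
#elif defined(__clang__)
#define KEEP_FRAME __attribute__((noinline))
#else
#define KEEP_FRAME
#endif

// Stand-in for trigger_crash: fills a valid buffer instead of writing
// through an invalid pointer, so this sketch runs to completion.
KEEP_FRAME int trigger_crash_sketch(char *buf, std::size_t len) {
    assert(len > 0);
    std::memset(buf, 1, len);
    return buf[0];
}

// The "+ 1" keeps each call from being a tail call, so every function
// in the chain gets its own stack frame.
KEEP_FRAME int trigger_crash_b_sketch(char *buf, std::size_t len) {
    return trigger_crash_sketch(buf, len) + 1;
}

KEEP_FRAME int trigger_crash_a_sketch(char *buf, std::size_t len) {
    return trigger_crash_b_sketch(buf, len) + 1;
}

// Two distinct callers passing compile-time constant sizes. Without the
// attributes, an optimizing build is free to inline the chain or to
// specialize it per call site.
int first_caller_sketch() {
    char buf[100];
    return trigger_crash_a_sketch(buf, sizeof buf);
}

int second_caller_sketch() {
    char buf[16];
    return trigger_crash_a_sketch(buf, sizeof buf);
}
```

Building this with and without the `KEEP_FRAME` attributes at `-O3` and comparing the symbol tables (for example with `nm -C`) is one way to see whether the compiler emitted a clone for a given function.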

2. The application is built using: `-g3 -O3` gcc flags:

That is perfectly fine, also the way you generate the debug file.

there is no correspondence between what gdb shows

gdb has access to all files on the system, and Sentry does not. In particular, you are missing all the system libraries like libc, libpthread, etc. (and also libsentry.so).

We have no built-in symbol sources for well-known Linux distributions, as we do for Apple or Microsoft. So you will have to upload all of your system symbols manually. With that, you should get good stack traces.

@krisprad
Author

It is understandable that system and Sentry libraries do not display symbols; however, this doesn't seem to be the issue.
We are not expecting a symbolic trace from outside the main application, such as from system libraries or Sentry.
If I run this in gdb (the same build that was uploaded to Sentry), here is the stack trace.

# gdb -batch -ex run -ex bt  --args ./sentry_example crash
<relevant crash stack trace>

Thread 1 "sentry_example" received signal SIGSEGV, Segmentation fault.
0x0000563bc6b4f9ee in memset (__len=100, __ch=1, __dest=0x1) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
71	  return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest));
#0  0x0000563bc6b4f9ee in memset (__len=100, __ch=1, __dest=0x1) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
#1  trigger_crash (v=0x7fff83903b48) at /sentry_example/sentry_example_crashpad_docker.cpp:76
#2  trigger_crash_b (v=0x7fff83903b48) at /sentry_example/sentry_example_crashpad_docker.cpp:83
#3  trigger_crash_a (v=0x7fff83903b48) at /sentry_example/sentry_example_crashpad_docker.cpp:90
#4  0x0000563bc6b4ff02 in SetupSentry::SetupSentry (this=<optimized out>, argc=2, argv=0x7fff83903ca8) at /sentry_example/sentry_example_crashpad_docker.cpp:237
#5  0x0000563bc6b4f672 in main (argc=2, argv=0x7fff83903ca8) at /sentry_example/sentry_example_crashpad_docker.cpp:284
# 

Is it reasonable to expect a stack trace similar to gdb's for the part of the code that has debug information files available? If not, what other files does one need to add to help Sentry?

When I expand the full function names, I do get the correct function name, however collapsing it (it should remove function arguments and generics) removes the function name and just leaves the [clone .constprop.0]

How can we avoid clone.* and get correct function names?

@Swatinem
Member

image

The button to expand the function details is a bit hard to spot, unfortunately. The fact that the wrong parts of the complete function name are being hidden/trimmed off is a bug on our end, which we are working on fixing.

@krisprad
Author

Another thing I noticed is that the function argument values do not show up. Is there some sentry magic that can accomplish this?

@Swatinem
Member

Not currently, that would be a fairly big change, and so far we have not scheduled it.

@krisprad
Author

krisprad commented Jul 24, 2021

Getting back to this, this time with results from a real-world example:

We have code (from a very complex project) with steps executed in this sequence:

(1) handleControllerUpdate(...) <--- this function is invoked from somewhere
(2) <create stack trace using libunwind> <-- creates a sentry event with stack trace from libunwind
(3) throw std::runtime_error("exception") <---- crashes the application generating sentry stacktrace

(2) is immediately followed by (3)

I used libunwind to get a stack trace and compared it with Sentry's stack trace at around the same point.

When I look at the stack trace leading to handleControllerUpdate, the Sentry and libunwind versions differ.
The libunwind trace appears more compact and plausible, while the Sentry version has a lot more 'noise', I feel. I am confused about why they differ so much.

The issue is here: https://sentry.io/organizations/colet-systems/issues/2532311290/?project=5868275&query=is%3Aunresolved

Happy to fill you in with details if I am not very clear.

@Swatinem
Member

The problem here is that you haven’t uploaded all the debug/executable files. libunwind has access to all of the unwind information embedded in the executables/libraries since it is running directly in the process. For sentry to do a correct job, it does need access to the same files, including system symbols, such as libc, etc.
The red underlined addresses indicate that there was no unwind information present, and sentry was only "guessing" the functions based on whatever was in stack memory.

@krisprad
Copy link
Author

That makes sense but please clarify these points. Thanks.

Suppose we have an application 'A' linked against system libraries 'S' (libc etc.).

Let 'A' have its debug information file (DIF) uploaded.

I expected the part of the stack trace that features 'A' (i.e. excluding 'S', such as libc) to
be identical to that in the libunwind stack trace. (This is similar to the gdb question you answered previously.)
This is not the case.
Is it because the lack of debug info for 'S'
prevents a proper trace through 'A', even though 'A' has a DIF and there is a minidump?

sentry was only "guessing" the functions based on whatever was in stack memory.

What exactly does this mean? This means stacktrace can be buggy and correct functions are not identified?

Any suggestions on what we can do?
Uploading DIFs for system or third-party libraries may not be feasible.
Would uploading the executable (not just the DIF) help?

@Swatinem
Member

Yes, you do need proper unwind info for both A and S. Also note that strip sometimes splits out only the debug info: the unwind info is needed at runtime, so it stays in the executable or library itself. So for Sentry to function properly, you need both the executable/library and the associated debug files.

For your specific use case above, the topmost frame is from libc, so to unwind through that, you do need to upload your libc version.
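To see why both files matter, one can check which ELF sections actually carry contents in each file. The sketch below is a hypothetical helper, not part of Sentry or its tooling. It assumes a 64-bit ELF file on Linux, and it relies on the fact that unwind tables normally live in `.eh_frame`. Files produced with `objcopy --only-keep-debug` usually keep such runtime sections only as empty `NOBITS` placeholders, so the helper treats those as absent.

```cpp
#include <elf.h>

#include <cassert>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true if the 64-bit ELF file at `path` has a
// section called `name` that actually carries contents (not SHT_NOBITS).
// Returns false on any read or format error. Files that use extended
// section numbering (e_shnum == 0) are not handled by this sketch.
bool elf64_section_has_contents(const std::string &path,
                                const std::string &name) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;

    Elf64_Ehdr header{};
    if (!file.read(reinterpret_cast<char *>(&header), sizeof header))
        return false;
    if (std::memcmp(header.e_ident, ELFMAG, SELFMAG) != 0 ||
        header.e_ident[EI_CLASS] != ELFCLASS64)
        return false;
    if (header.e_shoff == 0 || header.e_shentsize != sizeof(Elf64_Shdr) ||
        header.e_shstrndx >= header.e_shnum)
        return false;

    // Read the whole section header table.
    std::vector<Elf64_Shdr> sections(header.e_shnum);
    file.seekg(static_cast<std::streamoff>(header.e_shoff));
    if (!file.read(reinterpret_cast<char *>(sections.data()),
                   static_cast<std::streamsize>(sections.size() *
                                                sizeof(Elf64_Shdr))))
        return false;

    // Read the section name string table and force NUL termination.
    const Elf64_Shdr &strtab = sections[header.e_shstrndx];
    std::vector<char> names(strtab.sh_size);
    file.seekg(static_cast<std::streamoff>(strtab.sh_offset));
    if (!file.read(names.data(), static_cast<std::streamsize>(names.size())))
        return false;
    names.push_back('\0');

    for (const Elf64_Shdr &section : sections) {
        if (section.sh_name >= names.size()) continue;
        if (name == &names[section.sh_name])
            return section.sh_type != SHT_NOBITS && section.sh_size > 0;
    }
    return false;
}
```

Calling `elf64_section_has_contents("app_binary", ".eh_frame")` and `elf64_section_has_contents("app_binary.debug", ".debug_info")` would then show which of the two files holds the unwind tables and which holds the DWARF debug info, under the assumptions stated above.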

What exactly does this mean? This means stacktrace can be buggy and correct functions are not identified?

It means that it simply scans stack memory and flags anything as a stack frame that happens to point into a valid code region. So it can produce a ton of false positives, and it can also miss correct functions if it then uses the unwind info of the incorrectly identified function address.
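The scanning heuristic described above can be sketched roughly as follows. This is an illustration of the idea only, not Sentry's actual implementation; `CodeRegion` and `scan_stack_for_frames` are hypothetical names, and the stack is modelled as a plain vector of pointer-sized words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one executable mapping: [start, end).
struct CodeRegion {
    std::uintptr_t start;
    std::uintptr_t end;
};

// Walk the captured stack memory word by word and report every value that
// happens to fall inside a known code region. Nothing here proves that a
// reported value is a real return address: stale data left over from
// earlier calls, or plain integers that look like code addresses, are
// reported as well. That is where the false positives come from.
std::vector<std::uintptr_t> scan_stack_for_frames(
    const std::vector<std::uintptr_t> &stack_words,
    const std::vector<CodeRegion> &code_regions) {
    std::vector<std::uintptr_t> candidates;
    for (std::uintptr_t word : stack_words) {
        for (const CodeRegion &region : code_regions) {
            assert(region.start <= region.end);
            if (word >= region.start && word < region.end) {
                candidates.push_back(word);
                break;  // one match is enough for this word
            }
        }
    }
    return candidates;
}
```

With a single region `[0x1000, 0x2000)` and stack words `0x7fff0000, 0x1234, 0x64, 0x1ff0, 0x2000`, the function reports `0x1234` and `0x1ff0`. From the stack contents alone there is no way to tell which of the two, if either, is a genuine return address, which is why proper unwind info gives much more reliable traces.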

I hope this helps. I will close this issue, as the underlying problem seems to be missing unwind info.
