Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes when ocamlopt-generated code calls into Objective-C code that adds an exception handler #7118

Closed
vicuna opened this issue Jan 6, 2016 · 7 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Jan 6, 2016

Original bug ID: 7118
Reporter: bartjacobs
Assigned to: @gasche
Status: resolved (set by @gasche on 2016-03-08T13:36:16Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.02.1
Target version: 4.03.1+dev
Fixed in version: 4.03.0+dev / +beta1
Category: back end (clambda to assembly)
Related to: #7120
Monitored by: @gasche

Bug description

It seems that Objective-C code sometimes performs a stack walk as part of its happy-path normal operation; such stack walks seem to sometimes crash on ocamlopt-generated stack frames.

Evidence for this is that my tool, VeriFast, has been crashing recently. See the attached crash report. VeriFast uses GTK through lablgtk for its GUI. On my OS X machine, I can make VeriFast 15.12 crash reliably (see below). Apparently, the sequence of events is as follows:

  • VeriFast calls into lablgtk
  • lablgtk calls into GTK
  • GTK calls into Cocoa (AppKit)
  • AppKit calls into the Objective-C runtime to add an exception handler.
  • The runtime wants to find the innermost enclosing stack frame that handles Objective-C exceptions. It uses Apple's libunwind implementation to walk the stack to find such a frame.
  • libunwind crashes while advancing the stack walking cursor.

I haven't been able to verify yet that the stack frame that libunwind crashes on is an ocamlopt-generated stack frame (I don't have debug symbols for libunwind and interpreting the machine state manually is a bit daunting), but that seems likely, given that ocamlopt-generated code generates unusually-shaped stack frames.

Note: initially, I though the problem manifested only when calling ocamlopt without the -g option, but recently, I'm seeing the crashing even with VeriFast 15.12, which is built with the -g option.

Unfortunately, it does not seem to be the case that libunwind always crashes on ocamlopt-generated stack frames. Apparently, it only does sometimes. For example, if I run VeriFast IDE inside a debugger (lldb), I don't get crashes. (I can only investigate the problem by generating a core dump.)

Assuming that it is indeed the case that libunwind crashes on ocamlopt-generated stack frames, there seem to be a number of possible solutions:

  • Modify ocamlopt so that it generates stack frames that libunwind doesn't crash on
  • Modify ocamlopt so that on OS X it generates stack walking backstops when calling into external functions.
  • Modify libunwind so that it doesn't crash.
  • Modify lablgtk so that it generates stack walking backstops.
  • Modify GTK so that it generates stack walking backstops when calling into Cocoa.
  • Modify the Objective-C runtime so that it doesn't walk the stack when adding an exception handler.
  • Modify the Objective-C compiler so that it generates a stack walking backstop when C code calls Objective-C code.

Steps to reproduce

(This works on my machine reliably, but it seems rather brittle so it might not work on your machine.)

  1. Download VeriFast 15.12 for OS X from http://www.cs.kuleuven.be/~bartj/
  2. Unzip the tarball.
  3. Run 'bin/vfide examples/termination/ackermann.c'
  4. Issue the Verify command, either by clicking the Play toolbar button, pressing F5, or choosing the Verify menu item.
    VeriFast crashes after verification finishes, when it tries to scroll to the error location.

Additional information

I have also filed the following bugs:

File attachments

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 6, 2016

Comment author: @gasche

One option you could try as a potential workaround/experiment is to use the variant of the OCaml distribution that is configured to add frame pointers (this makes stack easier to walks by some tools). In OPAM, you can use the switches 4.02.1+fp (or 4.02.3+fp) to get an OCaml environment with frame-pointers enabled -- but this is a global change, as it changes the global compilation strategy.

Long-term, if your analysis of the issue is valid, I would guess that your option

Modify ocamlopt so that on OS X it generates stack walking backstops when calling into external functions.

is the more doable options on the OCaml side. I don't know about "stack walking backstops", but if was a clang attribute to generate such backstops at function entry, it would even be possible to add them in the C binding code directly, for example in the C bindings part of LablGtk. We could even modify only the C functions that we know may delegate to the Objective-C runtime, which would lower massively any performance impact of the change.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 6, 2016

Comment author: @yallop

We've seen similar issues with Objective C, libunwind, stack walking and ocamlopt on OS X:

http://lists.ocaml.org/pipermail/ctypes/2013-December/000015.html

I don't think we ever quite got to the bottom of the problem, but if the problems are related then the information in that thread might be helpful. For example, Daniel Buenzli traced the problem to the addition of -no_compact_unwind to the configure script:

http://lists.ocaml.org/pipermail/ctypes/2013-December/000038.html

According to my notes libunwind seemed to have trouble parsing the frame description entries (FDEs) generated with -no_compact_unwind.

We had the additional complication of using libffi, so it's possible there was something else going on.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 7, 2016

Comment author: bartjacobs

Many thanks to gasche and yallop for your help.

I have now familiarized myself a bit with DWARF, used that to read some of the relevant libunwind source code, and used that to make some sense of the core dump of my crashed VeriFast process. I can now confirm that libunwind does indeed crash while trying to step to the caller of an ocamlopt-generated frame. The frame it crashes on is the first ocamlopt-generated frame it encounters. This is a VeriFast frame.

libunwind does find DWARF call frame information for the ocamlopt-generated code, but apparently something goes wrong while interpreting that information, because the CFA computed is outside the stack. (The CFA, Canonical Frame Address, is supposed to be the stack pointer at the call site.) (This seems to confirm yallop's notes, except that I did not yet check that I'm using -no_compact_unwind.)

libunwind then tries to use this CFA to retrieve a register value saved on the stack by dereferencing the address at a given offset from the CFA, whence the EXC_BAD_ACCESS crash.

It just occurred to me that there might be an extremely simple and feasible solution: simply don't emit any unwind information for ocamlopt-generated frames! This will cause libunwind to stop walking the stack. This even reduces the file size! :-)

I will try this once I figure out what options to pass to ocamlopt and/or the linker to achieve this. (Any pointers?) However, this won't work if this information is required for a 'legitimate' purpose. Does the OCaml runtime use this information for its own exception propagation?

In that case, I see another very simple solution: do not emit unwind information for the C modules of lablgtk.

I'll try these approaches now.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 7, 2016

Comment author: bartjacobs

Update: I decided to first check why earlier versions of VeriFast didn't crash. By debugging VeriFast 15.05 with lldb, I could determine that in that version, libunwind doesn't walk into the VeriFast binary. Specifically, it walks through GTK, but it doesn't walk into the lablgtk C glue function. Presumably, in that build the lablgtk C functions don't have unwinding info. I'll try to figure out why in my current builds they do.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 7, 2016

Comment author: @gasche

Mark Shinwell would be the expert on Dwarf information emitted by the compiler -- current and future state.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 8, 2016

Comment author: bartjacobs

Update: I seem to have found a fix for my problem. It seems to be sufficient to add -fno-unwind-tables to the C compiler command line when compiling OCaml-to-GTK glue code. After this fix, my previously reliable way to make VeriFast crash no longer works. I confirmed in the debugger that the Cocoa stack walks no longer walk past the lablgtk C glue functions.

I have reported this on the lablgtk bug tracker.

While, for defense-in-depth purposes, it might be good to also take further action on the OCaml side (such as fixing the unwind info or dropping it), for me, no further action seems to be required. Many thanks again to gasche and yallop for your help!

@vicuna

This comment has been minimized.

Copy link
Author

commented Mar 8, 2016

Comment author: @gasche

This has been fixed by Bart Jacob's PR
#408
which has been merged in 4.03:
59a4fd6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.