Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.Sign up
Crashes when ocamlopt-generated code calls into Objective-C code that adds an exception handler #7118
Original bug ID: 7118
It seems that Objective-C code sometimes performs a stack walk as part of its happy-path normal operation; such stack walks seem to sometimes crash on ocamlopt-generated stack frames.
Evidence for this is that my tool, VeriFast, has been crashing recently. See the attached crash report. VeriFast uses GTK through lablgtk for its GUI. On my OS X machine, I can make VeriFast 15.12 crash reliably (see below). Apparently, the sequence of events is as follows:
I haven't been able to verify yet that the stack frame that libunwind crashes on is an ocamlopt-generated stack frame (I don't have debug symbols for libunwind and interpreting the machine state manually is a bit daunting), but that seems likely, given that ocamlopt-generated code generates unusually-shaped stack frames.
Note: initially, I though the problem manifested only when calling ocamlopt without the -g option, but recently, I'm seeing the crashing even with VeriFast 15.12, which is built with the -g option.
Unfortunately, it does not seem to be the case that libunwind always crashes on ocamlopt-generated stack frames. Apparently, it only does sometimes. For example, if I run VeriFast IDE inside a debugger (lldb), I don't get crashes. (I can only investigate the problem by generating a core dump.)
Assuming that it is indeed the case that libunwind crashes on ocamlopt-generated stack frames, there seem to be a number of possible solutions:
Steps to reproduce
(This works on my machine reliably, but it seems rather brittle so it might not work on your machine.)
I have also filed the following bugs:
Comment author: @gasche
One option you could try as a potential workaround/experiment is to use the variant of the OCaml distribution that is configured to add frame pointers (this makes stack easier to walks by some tools). In OPAM, you can use the switches 4.02.1+fp (or 4.02.3+fp) to get an OCaml environment with frame-pointers enabled -- but this is a global change, as it changes the global compilation strategy.
Long-term, if your analysis of the issue is valid, I would guess that your option
is the more doable options on the OCaml side. I don't know about "stack walking backstops", but if was a clang attribute to generate such backstops at function entry, it would even be possible to add them in the C binding code directly, for example in the C bindings part of LablGtk. We could even modify only the C functions that we know may delegate to the Objective-C runtime, which would lower massively any performance impact of the change.
Comment author: @yallop
We've seen similar issues with Objective C, libunwind, stack walking and ocamlopt on OS X:
I don't think we ever quite got to the bottom of the problem, but if the problems are related then the information in that thread might be helpful. For example, Daniel Buenzli traced the problem to the addition of -no_compact_unwind to the configure script:
According to my notes libunwind seemed to have trouble parsing the frame description entries (FDEs) generated with -no_compact_unwind.
We had the additional complication of using libffi, so it's possible there was something else going on.
Comment author: bartjacobs
Many thanks to gasche and yallop for your help.
I have now familiarized myself a bit with DWARF, used that to read some of the relevant libunwind source code, and used that to make some sense of the core dump of my crashed VeriFast process. I can now confirm that libunwind does indeed crash while trying to step to the caller of an ocamlopt-generated frame. The frame it crashes on is the first ocamlopt-generated frame it encounters. This is a VeriFast frame.
libunwind does find DWARF call frame information for the ocamlopt-generated code, but apparently something goes wrong while interpreting that information, because the CFA computed is outside the stack. (The CFA, Canonical Frame Address, is supposed to be the stack pointer at the call site.) (This seems to confirm yallop's notes, except that I did not yet check that I'm using -no_compact_unwind.)
libunwind then tries to use this CFA to retrieve a register value saved on the stack by dereferencing the address at a given offset from the CFA, whence the EXC_BAD_ACCESS crash.
It just occurred to me that there might be an extremely simple and feasible solution: simply don't emit any unwind information for ocamlopt-generated frames! This will cause libunwind to stop walking the stack. This even reduces the file size! :-)
I will try this once I figure out what options to pass to ocamlopt and/or the linker to achieve this. (Any pointers?) However, this won't work if this information is required for a 'legitimate' purpose. Does the OCaml runtime use this information for its own exception propagation?
In that case, I see another very simple solution: do not emit unwind information for the C modules of lablgtk.
I'll try these approaches now.
Comment author: bartjacobs
Update: I decided to first check why earlier versions of VeriFast didn't crash. By debugging VeriFast 15.05 with lldb, I could determine that in that version, libunwind doesn't walk into the VeriFast binary. Specifically, it walks through GTK, but it doesn't walk into the lablgtk C glue function. Presumably, in that build the lablgtk C functions don't have unwinding info. I'll try to figure out why in my current builds they do.
Comment author: bartjacobs
Update: I seem to have found a fix for my problem. It seems to be sufficient to add -fno-unwind-tables to the C compiler command line when compiling OCaml-to-GTK glue code. After this fix, my previously reliable way to make VeriFast crash no longer works. I confirmed in the debugger that the Cocoa stack walks no longer walk past the lablgtk C glue functions.
I have reported this on the lablgtk bug tracker.
While, for defense-in-depth purposes, it might be good to also take further action on the OCaml side (such as fixing the unwind info or dropping it), for me, no further action seems to be required. Many thanks again to gasche and yallop for your help!