make world.opt seems to crash on tip of trunk on up-to-date OS X #6239
Original bug ID: 6239
I don't know if others can reproduce this, but on my Mac, trunk segfaults when I try to build world.opt. Here's the GitHub id of the version I tried.
Here's the error message I got:
../boot/ocamlrun ../ocamlopt -nostdlib -I ../stdlib -I ../utils -I ../parsing -I ../typing -I ../bytecomp -I ../asmcomp -I ../driver -I ../toplevel -o read_cmt.opt ../utils/misc.cmx ../utils/warnings.cmx ../utils/tbl.cmx ../utils/consistbl.cmx ../utils/config.cmx ../utils/clflags.cmx ../parsing/location.cmx ../parsing/longident.cmx ../parsing/lexer.cmx ../parsing/pprintast.cmx ../parsing/ast_helper.cmx ../parsing/ast_mapper.cmx ../typing/ident.cmx ../typing/path.cmx ../typing/types.cmx ../typing/typedtree.cmx ../typing/btype.cmx ../typing/subst.cmx ../typing/predef.cmx ../typing/datarepr.cmx ../typing/cmi_format.cmx ../typing/env.cmx ../typing/ctype.cmx ../typing/oprint.cmx ../typing/primitive.cmx ../typing/printtyp.cmx ../typing/mtype.cmx ../typing/envaux.cmx ../typing/typedtreeMap.cmx ../typing/typedtreeIter.cmx ../typing/cmt_format.cmx ../typing/stypes.cmx untypeast.cmx tast_iter.cmx cmt2annot.cmx read_cmt.cmx
Steps to reproduce
I've attached the log of the build, as well as some stack traces from re-running the failing command under lldb.
ocaml-trunk $ lldb -- /Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt -nostdlib -I ../stdlib -pp ./remove_DEBUG -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../tools -I ../toplevel/ -I ../stdlib -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/num -I ../otherlibs/graph -warn-error A -c odoc_config.ml
Comment author: yminsky
I'm building with the latest Xcode on OS X 10.9. The same box can build older versions: for example, I built 4.00.1 on it after the build of trunk failed.
I'm not sure what extra debug info would be helpful for tracking this down. It's clearly not an issue with all OS X builds.
Comment author: yminsky
Trying the latest version (14307), I now get it to fail in a different place:
ocaml-trunk $ lldb -- /Users/yminsky/Documents/code/ocaml-trunk/ocamlc.opt -nostdlib -I ../../stdlib -c -w +33..39 -warn-error A -g -nolabels unix.mli
Comment author: @mshinwell
This is a horrid one. I couldn't reproduce it but then realized what's wrong: it's faulting because %rbp isn't 16-byte aligned on that 128-bit move in [large_malloc].
So it looks like this is very similar to mantis 5700. C functions have to be entered with %rsp mod 16 = 8. I have to go now, and I haven't yet identified exactly where this rule is being broken, but it should be enough for you (Alain!) to go on. My suspicion is that the assembly code of [caml_raise_exn] (and perhaps [caml_reraise_exn] in some cases) is being called with the wrong stack alignment.
Comment author: @avsm
I've successfully built trunk (r14390, remove camlp4) on OS X 10.9 and passed all tests with this gcc:
$ gcc -v
I've also tried a build with various Malloc options enabled to see if that would make a difference, which it hasn't. Yaron, how much memory do you have in your laptop? (Mine has 8GB, so I should be in high memory too.)
$ env MallocScribble=1 MallocPreScribble=1 MallocGuardEdges=1 make world.opt
Not sure what else to try to reproduce this one.
Comment author: @xavierleroy
let f x = raise x
Compile this with ocamlopt -g, and you'll see that the stack (initially = 8 mod 16) is not realigned to 0 mod 16 before calling caml_raise_exn.
Why? Because ocamlopt treats this function as a leaf function (!Proc.contains_calls = false), which does not need a proper stack frame to be allocated.
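To make the alignment arithmetic concrete, here is a small sketch in OCaml. It only models the numbers discussed in this thread: a function entered with %rsp = 8 mod 16, an optional stack frame, and the 8-byte return address pushed by the call instruction. The function names and the 8-byte frame size are assumptions chosen for illustration; none of this is code from the compiler.

```ocaml
(* Stack-alignment arithmetic for amd64, as described in this thread.
   All names are illustrative; nothing here is taken from ocamlopt. *)

(* Normalise to 0..15, since OCaml's [mod] can return negative values. *)
let mod16 n = ((n mod 16) + 16) mod 16

(* %rsp mod 16 as seen on entry to the callee: the caller first subtracts
   its frame (the stack grows downwards), then the call instruction
   pushes an 8-byte return address. *)
let rsp_at_callee_entry ~entry ~frame_bytes = mod16 (entry - frame_bytes - 8)

let () =
  (* Leaf function, no frame: the callee is entered at 0 mod 16 (wrong). *)
  Printf.printf "leaf:  %d\n" (rsp_at_callee_entry ~entry:8 ~frame_bytes:0);
  (* Assumed 8-byte frame: the callee is entered at 8 mod 16 (expected). *)
  Printf.printf "frame: %d\n" (rsp_at_callee_entry ~entry:8 ~frame_bytes:8)
```

With no frame, the callee sees %rsp = 0 mod 16 rather than the required 8 mod 16, which is the misalignment behind the faulting 128-bit move.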
The criteria for a leaf function are pretty strict: it should
Can you spot the missing case? Yes, there is one: if the function contains a "raise" and is compiled with -g, a call to a C function (caml_stash_backtrace) can occur, so it must not be treated as a leaf function.
This issue has been with us for a long time, but I believe it shows up only now because of Alain's recent optimization of constant exceptions. Before, raising such an exception would always allocate, causing the enclosing function to lose its leaf status. Now, we have more cases of useful functions that raise exceptions but don't allocate.
The fix is pretty simple: set Proc.contains_calls to true if the function contains a "raise" (not of the "notrace" kind) and is compiled with -g.
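The shape of that rule can be sketched with a toy model. The instruction type, `needs_frame`, and the `~debug` flag (standing in for -g) below are hypothetical and much simpler than the real ocamlopt data structures; the attached patch is the authoritative version.

```ocaml
(* Toy model of the leaf-function test; NOT the real ocamlopt code. *)
type raise_kind = Raise_regular | Raise_notrace

type instr =
  | Move                 (* register-to-register work, never a call *)
  | Alloc of int         (* heap allocation *)
  | Call of string       (* explicit call to another function *)
  | Raise of raise_kind

(* [debug] stands for compiling with -g. A regular raise may then call
   caml_stash_backtrace, so it has to count as a call. *)
let needs_frame ~debug = function
  | Call _ | Alloc _ -> true
  | Raise Raise_regular -> debug
  | Raise Raise_notrace | Move -> false

(* A function is a leaf only if no instruction in its body needs a frame. *)
let contains_calls ~debug body = List.exists (needs_frame ~debug) body
```

In this model, `contains_calls ~debug:true [Move; Raise Raise_regular]` is `true`, so the function body of `let f x = raise x` would get a proper stack frame under -g, while the same body without -g, or with a "notrace" raise, stays a leaf.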
This fix is committed on SVN trunk, r14136, and a patch is attached.
Please let us know if this fixes the crash; then, I'll port it to the 4.01 branch.