make world.opt seems to crash on tip of trunk on up-to-date OS X #6239

Closed
vicuna opened this Issue Nov 16, 2013 · 19 comments

@vicuna commented Nov 16, 2013

Original bug ID: 6239
Reporter: yminsky
Status: closed (set by @xavierleroy on 2015-12-11T18:25:21Z)
Resolution: fixed
Priority: normal
Severity: major
Fixed in version: 4.01.1+dev
Category: ~DO NOT USE (was: OCaml general)

Bug description

I don't know whether others can reproduce this, but on my Mac, trunk segfaults when I try to build world.opt. Here's the GitHub commit ID of the version I tried:

df7e6c1

Here's the error message I got:

../boot/ocamlrun ../ocamlopt -nostdlib -I ../stdlib -I ../utils -I ../parsing -I ../typing -I ../bytecomp -I ../asmcomp -I ../driver -I ../toplevel -o read_cmt.opt ../utils/misc.cmx ../utils/warnings.cmx ../utils/tbl.cmx ../utils/consistbl.cmx ../utils/config.cmx ../utils/clflags.cmx ../parsing/location.cmx ../parsing/longident.cmx ../parsing/lexer.cmx ../parsing/pprintast.cmx ../parsing/ast_helper.cmx ../parsing/ast_mapper.cmx ../typing/ident.cmx ../typing/path.cmx ../typing/types.cmx ../typing/typedtree.cmx ../typing/btype.cmx ../typing/subst.cmx ../typing/predef.cmx ../typing/datarepr.cmx ../typing/cmi_format.cmx ../typing/env.cmx ../typing/ctype.cmx ../typing/oprint.cmx ../typing/primitive.cmx ../typing/printtyp.cmx ../typing/mtype.cmx ../typing/envaux.cmx ../typing/typedtreeMap.cmx ../typing/typedtreeIter.cmx ../typing/cmt_format.cmx ../typing/stypes.cmx untypeast.cmx tast_iter.cmx cmt2annot.cmx read_cmt.cmx
cd ocamldoc && /Applications/Xcode.app/Contents/Developer/usr/bin/make opt.opt
/Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt -nostdlib -I ../stdlib -pp ./remove_DEBUG -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../tools -I ../toplevel/ -I ../stdlib -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/num -I ../otherlibs/graph -warn-error A -c odoc_config.ml
/bin/sh: line 1: 73228 Segmentation fault: 11 ${CAMLOPT_BIN} -nostdlib -I ../stdlib -pp './remove_DEBUG' -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../tools -I ../toplevel/ -I ../stdlib -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/num -I ../otherlibs/graph -warn-error A -c odoc_config.ml
make[3]: *** [odoc_config.cmx] Error 139
make[2]: *** [ocamldoc.opt] Error 2
make[1]: *** [opt.opt] Error 2
make: *** [world.opt] Error 2

Steps to reproduce

I've attached the build log, as well as some stack traces from re-running the failing command under lldb.

Additional information

ocaml-trunk $ lldb -- /Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt -nostdlib -I ../stdlib -pp ./remove_DEBUG -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../tools -I ../toplevel/ -I ../stdlib -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/num -I ../otherlibs/graph -warn-error A -c odoc_config.ml
Current executable set to '/Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt' (x86_64).
(lldb) run
Process 73259 launched: '/Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt' (x86_64)
Process 73259 stopped
* thread #1: tid = 0x4ade4, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
libsystem_malloc.dylib`large_malloc + 50:
-> 0x7fff89c30d49: movaps %xmm0, -64(%rbp)
   0x7fff89c30d4d: cmoveq %r13, %r14
   0x7fff89c30d51: shlq %cl, %r14
   0x7fff89c30d54: cmpq $134217727, %r14
(lldb) bt
* thread #1: tid = 0x4ade4, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
    frame #1: 0x00007fff89c363b6 libsystem_malloc.dylib`szone_malloc_should_clear + 287
    frame #2: 0x00007fff89c3887c libsystem_malloc.dylib`malloc_zone_malloc + 71
    frame #3: 0x00007fff89c39290 libsystem_malloc.dylib`malloc + 42
    frame #4: 0x00000001001c7f6e ocamlopt.opt`caml_stat_alloc + 14
    frame #5: 0x00000001001c3f99 ocamlopt.opt`caml_init_frame_descriptors + 185
    frame #6: 0x00000001001d9e03 ocamlopt.opt`caml_next_frame_descriptor + 35
    frame #7: 0x00000001001d9efd ocamlopt.opt`caml_stash_backtrace + 93
    frame #8: 0x00000001001da95e ocamlopt.opt`caml_raise_exn + 54
    frame #9: 0x000000010019d251 ocamlopt.opt`.L200 + 13
(lldb) frame select 4
frame #4: 0x00000001001c7f6e ocamlopt.opt`caml_stat_alloc + 14
ocamlopt.opt`caml_stat_alloc + 14:
-> 0x1001c7f6e: testq %rax, %rax
   0x1001c7f71: je 0x1001c7f7a ; caml_stat_alloc + 26
   0x1001c7f73: addq $8, %rsp
   0x1001c7f77: popq %rbx
(lldb) register read
General Purpose Registers:
       rbx = 0x0000000000080000
       rbp = 0x00007fff5fbff5a8
       rsp = 0x00007fff5fbff598
       r12 = 0x00007fff5fbff568
       r13 = 0x00000001001da921 ocamlopt.opt`caml_start_program + 165
       r14 = 0x0000000000010000
       r15 = 0x00000001010009a0
       rip = 0x00000001001c7f6e ocamlopt.opt`caml_stat_alloc + 14
13 registers were unavailable.

@vicuna commented Nov 18, 2013

Comment author: @alainfrisch

Can you check that the version you tried is after revision 14294 of the upstream SVN? (That is, check that asmrun/fail.c includes "callback.h".)

@vicuna commented Nov 18, 2013

Comment author: @mshinwell

Also, which version of Mac OS X is this?

@vicuna commented Nov 18, 2013

Comment author: @johnwhitington

SVN 14302 builds fine on OS X 10.9 with the latest Xcode.

@vicuna commented Nov 19, 2013

Comment author: yminsky

I'm building with the latest Xcode on 10.9. The same box can build older versions; for example, I built 4.00.1 on it after the trunk build failed.

I'm not sure what extra debug info would help track this down. It's clearly not an issue with all OS X builds.

@vicuna commented Nov 19, 2013

Comment author: yminsky

And I can confirm for Alain that it was exactly 14294 that I built.

@vicuna commented Nov 19, 2013

Comment author: @mshinwell

I'm going to look at this on yminsky's machine.

@vicuna commented Nov 20, 2013

Comment author: @alainfrisch

This should be fixed by commit 14307.

@vicuna commented Nov 21, 2013

Comment author: yminsky

With the latest version (14307), it now fails in a different place:

ocaml-trunk $ lldb -- /Users/yminsky/Documents/code/ocaml-trunk/ocamlc.opt -nostdlib -I ../../stdlib -c -w +33..39 -warn-error A -g -nolabels unix.mli
Current executable set to '/Users/yminsky/Documents/code/ocaml-trunk/ocamlc.opt' (x86_64).
(lldb) run
Process 22317 launched: '/Users/yminsky/Documents/code/ocaml-trunk/ocamlc.opt' (x86_64)
Process 22317 stopped
* thread #1: tid = 0xd10c9, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
libsystem_malloc.dylib`large_malloc + 50:
-> 0x7fff89c30d49: movaps %xmm0, -64(%rbp)
   0x7fff89c30d4d: cmoveq %r13, %r14
   0x7fff89c30d51: shlq %cl, %r14
   0x7fff89c30d54: cmpq $134217727, %r14
(lldb) bt
* thread #1: tid = 0xd10c9, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
    frame #1: 0x00007fff89c363b6 libsystem_malloc.dylib`szone_malloc_should_clear + 287
    frame #2: 0x00007fff89c3887c libsystem_malloc.dylib`malloc_zone_malloc + 71
    frame #3: 0x00007fff89c39290 libsystem_malloc.dylib`malloc + 42
    frame #4: 0x000000010018c6ce ocamlc.opt`caml_stat_alloc + 14
    frame #5: 0x00000001001886f9 ocamlc.opt`caml_init_frame_descriptors + 185
    frame #6: 0x000000010019e563 ocamlc.opt`caml_next_frame_descriptor + 35
    frame #7: 0x000000010019e65d ocamlc.opt`caml_stash_backtrace + 93
    frame #8: 0x000000010019f1f6 ocamlc.opt`caml_raise_exn + 54
    frame #9: 0x0000000100161f41 ocamlc.opt`.L200 + 13

@vicuna commented Nov 21, 2013

Comment author: @mshinwell

I'm trying to reproduce this now...

@vicuna commented Nov 21, 2013

Comment author: @mshinwell

This is a horrid one. I couldn't reproduce it but then realized what's wrong: it's faulting because %rbp isn't 16-byte aligned on that 128-bit move in [large_malloc].

So it looks like this is very similar to Mantis 5700. C functions have to be entered with %rsp mod 16 = 8. I have to go now, and I haven't yet identified exactly where this rule is being broken, but it should be enough for you (Alain!) to go on. My suspicion is that the assembly code of [caml_raise_exn] (and perhaps [caml_reraise_exn] in some cases) is being called with the wrong stack alignment.

@vicuna commented Nov 27, 2013

Comment author: @mshinwell

I haven't managed to reproduce this yet. If anyone can reproduce it, please let me know. I expect to be able to get access to yminsky's machine in a couple of weeks.

@vicuna commented Nov 27, 2013

Comment author: @avsm

I've successfully built trunk (r14390, remove camlp4) on OS X 10.9 and passed all tests with this gcc:

$ gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix
flick:testsuite avsm$ clang -v
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix

I've also tried a build with various Malloc options enabled to see if that makes a difference, and it doesn't. Yaron, how much memory does your laptop have? (Mine has 8 GB, so I should be in high memory too.)

$ env MallocScribble=1 MallocPreScribble=1 MallocGuardEdges=1 make world.opt

Not sure what else to try to reproduce this one.

@vicuna commented Nov 27, 2013

Comment author: @damiendoligez

I can't reproduce this problem on any of my Macs.
I've even tried enabling the -with-frame-pointers option of configure (I had to patch configure to allow it), but that didn't crash either.

I'm on r14310.

@vicuna commented Nov 27, 2013

Comment author: @damiendoligez

XL found the bug.
To reproduce, launch "OCAMLRUNPARAM=b make world.opt".

Xavier will explain the bug and post a patch soon.

@vicuna commented Nov 27, 2013

Comment author: @xavierleroy

Consider:

let f x = raise x

Compile this with ocamlopt -g, and you'll see that the stack (initially = 8 mod 16) is not realigned to 0 mod 16 before calling caml_raise_exn.

Why? Because ocamlopt treats this function as a leaf function (!Proc.contains_calls = false), which does not need a proper stack frame to be allocated.

The criteria for a leaf function are pretty strict: it should

  • not contain any call to a Caml function (except tail calls)
  • not contain any call to a C function
  • not allocate (because this can call the GC)
  • not spill any temporaries to the stack.

Can you spot the missing case? Yes, there is one: if the function contains a "raise" and is compiled with -g, a call to a C function (caml_stash_backtrace) can occur, so it must not be treated as a leaf function.

This issue has been with us for a long time, but I believe it shows up only now because of Alain's recent optimization of constant exceptions. Before, raising such an exception would always allocate, causing the enclosing function to lose its leaf status. Now, we have more cases of useful functions that raise exceptions but don't allocate.

The fix is pretty simple: set Proc.contains_calls to true if the function contains a "raise" (not of the "notrace" kind) and is compiled with -g.

This fix is committed on SVN trunk as r14316, and a patch is attached.

Please let us know if this fixes the crash; then, I'll port it to the 4.01 branch.

@vicuna commented Nov 27, 2013

Comment author: @avsm

Confirmed the crash and the fix on OSX 10.9 and 4.02.0dev+trunk.

@vicuna commented Nov 27, 2013

Comment author: @johnwhitington

Patch as applied in 14316 fixes the crash here.

@vicuna commented Nov 28, 2013

Comment author: @damiendoligez

Confirmed the fix on OSX 10.7 with Xcode 4.6.3.

@vicuna commented Nov 28, 2013

Comment author: @xavierleroy

Thanks for the confirmations. Fix also applied to 4.01 bugfix branch, r14320. Marking this PR as resolved.

vicuna closed this on Dec 11, 2015

vicuna added the bug label on Mar 20, 2019
