native binaries crash in top-level exception handler #5700

Closed
vicuna opened this issue Jul 26, 2012 · 21 comments

@vicuna vicuna commented Jul 26, 2012

Original bug ID: 5700
Reporter: @avsm
Status: closed (set by @xavierleroy on 2015-12-11T18:25:33Z)
Resolution: fixed
Priority: high
Severity: crash
OS: MacOS X
OS Version: 10.8
Version: 4.00.0+beta2/+rc1
Target version: 4.00.1+dev
Fixed in version: 4.00.1+dev
Category: back end (clambda to assembly)
Monitored by: serp @ygrek "Richard Jones" @dbuenzli

Bug description

With MacOS X 10.8 and the latest Xcode, native-code binaries seem to crash when invoked from a subshell with OCAMLRUNPARAM set to b.

gdb ocamlbuild
GNU gdb 6.3.50-20050815 (Apple version gdb-1820) (Sat Jun 16 02:40:11 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done

(gdb) run -clean
Starting program: /Users/avsm/.opam/4.00.0+rc1/bin/ocamlbuild -clean
Reading symbols for shared libraries +............................. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x00007fff89012f88 in large_malloc ()
(gdb) bt
#0 0x00007fff89012f88 in large_malloc ()
#1 0x00007fff8901974f in szone_malloc_should_clear ()
#2 0x00007fff8900b183 in malloc_zone_malloc ()
#3 0x00007fff8900bbd7 in malloc ()
#4 0x000000010007b7d4 in caml_stat_alloc ()
#5 0x0000000100077ed5 in caml_init_frame_descriptors ()
#6 0x000000010008db66 in caml_stash_backtrace ()
#7 0x000000010008e01d in caml_raise_exn ()
Previous frame inner to this frame (gdb could not unwind past this frame)
(gdb) The program is running. Exit anyway? (y or n) y

Steps to reproduce

I can reproduce this reliably by:

$ git clone http://github.com/mirage/ocaml-cstruct
$ cd ocaml-cstruct/unix
$ make (or make clean for a simpler example).

It doesn't seem to happen when run directly from a shell, nor with a trivial Makefile that invokes ocamlbuild -clean directly. Narrowing it down now...


@vicuna vicuna commented Jul 26, 2012

Comment author: @lefessan

I couldn't reproduce the problem on my Linux computer (I have no Mac OS X computer available). Could you just test this in the same directory:

echo "include Hashtbl" > test.ml
ocamlopt -g -o test test.ml
./test

On my computer, "caml_init_frame_descriptors" is first called from the "String.contains" included in the "randomized_default" initialization of Hashtbl.

Narrowing it down would probably involve compiling "libasmrund.a" in trunk/asmrun and linking the test program above against it (if it fails too), to get better debugging information.

@vicuna vicuna commented Jul 26, 2012

Comment author: @avsm

This only happens on 10.8 x86_64 for me; I can't reproduce it on any other OS. The test case above doesn't crash, and the crash can only be triggered when OCAMLRUNPARAM=b. I've got ocamlbuild in the 4.00 tree crashing with this shell script, run after a 'make world.opt':

[code]

#!/bin/sh -ex
cd asmrun && make libasmrun.a && cp libasmrun.a ../stdlib
cd ../_build
../ocamlcompopt.sh -verbose -nostdlib unix.cmxa -g -I stdlib -I ../otherlibs/unix ocamlbuild/ocamlbuild_executor.cmx ocamlbuild/ocamlbuild_pack.cmx ocamlbuild/ocamlbuild_unix_plugin.cmx ocamlbuild/ocamlbuild.cmx -o ocamlbuild/ocamlbuild.native
export OCAMLRUNPARAM=b
#export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
export MallocGuardEdges=1
export MallocCheckHeapStart=1
export MallocCheckHeapEach=1
export MallocScribble=1
./ocamlbuild/ocamlbuild.native -clean
[/code]

Note that if the DYLD_INSERT_LIBRARIES line is uncommented (to use the debug MacOS X malloc), the program completes fine. The malloc checks make no difference. I'll try with libasmrund.a now and see if that also reproduces it.

@vicuna vicuna commented Jul 26, 2012

Comment author: @avsm

Still happens with libasmrund.a, and here's the more helpful backtrace:

(gdb) run -clean
Starting program: /Users/avsm/src/git/bmeurer/ocaml/_build/ocamlbuild/ocamlbuild.native -clean
bash(69204) malloc: enabling scribbling to detect mods to free blocks
arch(69204) malloc: enabling scribbling to detect mods to free blocks
Reading symbols for shared libraries +............................. done
ocamlbuild.native(69204) malloc: enabling scribbling to detect mods to free blocks

OCaml runtime: debug mode

Initial minor heap size: 2048k bytes
Initial major heap size: 992k bytes
Initial space overhead: 80%
Initial max overhead: 500%
Initial heap increment: 992k bytes
Initial allocation policy: 0

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x00007fff89012f88 in large_malloc ()
(gdb) bt
#0 0x00007fff89012f88 in large_malloc ()
#1 0x00007fff8901974f in szone_malloc_should_clear ()
#2 0x00007fff8900b183 in malloc_zone_malloc ()
#3 0x00007fff8900bbd7 in malloc ()
#4 0x0000000100081918 in caml_stat_alloc (sz=131072) at memory.d.c:529
#5 0x0000000100078ad3 in caml_init_frame_descriptors () at roots.d.c:101
#6 0x00000001000a0f19 in caml_stash_backtrace (exn=4301260168, pc=4295272412, sp=0x7fff5fbff5d0 "?\005`", trapsp=0x7fff5fbff5e0 " ??_?") at backtrace.d.c:73
#7 0x00000001000a18fd in caml_raise_exn ()
Previous frame inner to this frame (gdb could not unwind past this frame)

@vicuna vicuna commented Jul 26, 2012

Comment author: @avsm

I tell a lie; your test case does indeed trigger the error, but not with the debug library in this case:

$ ocamlopt -g -o test test.ml && OCAMLRUNPARAM= ./test
$ ocamlopt -g -o test test.ml && OCAMLRUNPARAM=b ./test
Segmentation fault: 11
$ ocamlopt -g -runtime-variant d -o test test.ml && OCAMLRUNPARAM=b ./test

OCaml runtime: debug mode

Initial minor heap size: 2048k bytes
Initial major heap size: 992k bytes
Initial space overhead: 80%
Initial max overhead: 500%
Initial heap increment: 992k bytes
Initial allocation policy: 0
$ ocamlopt -g -runtime-variant d -o test test.ml && OCAMLRUNPARAM= ./test

OCaml runtime: debug mode

Initial minor heap size: 2048k bytes
Initial major heap size: 992k bytes
Initial space overhead: 80%
Initial max overhead: 500%
Initial heap increment: 992k bytes
Initial allocation policy: 0
$

@vicuna vicuna commented Jul 26, 2012

Comment author: @lefessan

Do you have other versions of OCaml that run this example without crashing? 3.12.1, 4.00 beta1? Any trunk revision?

@vicuna vicuna commented Jul 27, 2012

Comment author: @avsm

I've now uninstalled all previous Homebrew installs and done a fresh install of 3.12.1, and that also crashes on this 10.8 machine in the same way, so this is not a 4.00 regression. Now, however, I cannot get the 'include Hashtbl' test to crash on either 3.12.1 or 4.00.0, whereas my traces above show it segfaulting earlier. The ocaml-cstruct repository example (which uses oasis) continues to segfault on both 3.12 and 4.0.

If this is memory corruption, address-space randomisation could explain the differences in behaviour between runs. I'm upgrading a second Mac Mini to 10.8 so that I can rule out this one machine as the cause. I also noticed a similar report of the same problem on the Caml list (https://sympa.inria.fr/sympa/arc/caml-list/2012-07/msg00142.html), which indicates that it's not just my machine.

@vicuna vicuna commented Jul 27, 2012

Comment author: @lefessan

For 3.12.1, there is no randomization in Hashtbls, so the initialization code won't raise an exception. The problem probably appears later. Maybe simply raising an exception would trigger the bug:

let _ = raise Not_found

and is the simplest reproducible case for all versions.

Anyway, what is weird is that the bug appears inside "malloc", and the only explanation I can see would be corruption of the header/trailer of some block allocated by a previous malloc. Why this problem arises only on Mac OS X is another question...

@vicuna vicuna commented Jul 27, 2012

Comment author: @avsm

The other odd thing is that the various malloc guard variables (which add guard pages per-allocation and scribble over fresh memory, and generally try to detect heap corruption) do not detect any corruption of the malloc structures.

Unfortunately, the pre-10.8 MacOS X method of disabling address-space randomisation (DYLD_NO_PIE) appears to have been removed in this version. I'll keep trying to find a small reproducible case (I can still reproduce it with the cstruct compilation, but no longer with a smaller test case).

@vicuna vicuna commented Jul 30, 2012

Comment author: @avsm

Another Mac (upgraded from 10.7 to 10.8, with a freshly installed OCaml toolchain) exhibits the same behaviour. It's reproducible by simply running 'ocamlopt.opt' (3.12.1 or 4.00.0) with OCAMLRUNPARAM=b:

$ uname -a
Darwin cubik.local 12.0.0 Darwin Kernel Version 12.0.0: Sun Jun 24 23:00:16 PDT 2012; root:xnu-2050.7.91/RELEASE_X86_64 x86_64
$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.11~28/src/configure --disable-checking --enable-werror --prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.11~28/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
$ ocamlopt -v
The Objective Caml native-code compiler, version 3.12.1
Standard library directory: /usr/local/lib/ocaml
$ ocamlopt.opt
Segmentation fault: 11
cubik:x avsm$ gdb ocamlopt.opt
GNU gdb 6.3.50-20050815 (Apple version gdb-1820) (Sat Jun 16 02:40:11 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done

(gdb) run
Starting program: /usr/local/bin/ocamlopt.opt
Reading symbols for shared libraries +............................. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x00007fff89d2ff88 in large_malloc ()
(gdb) bt
#0 0x00007fff89d2ff88 in large_malloc ()
#1 0x00007fff89d3674f in szone_malloc_should_clear ()
#2 0x00007fff89d28183 in malloc_zone_malloc ()
#3 0x00007fff89d28bd7 in malloc ()
#4 0x00000001001523c4 in caml_stat_alloc ()
#5 0x000000010014eb05 in caml_init_frame_descriptors ()
#6 0x0000000100162fd6 in caml_stash_backtrace ()
#7 0x00000001001634fc in .L111 ()
#8 0x000000010014e7cd in caml_raise_constant ()
#9 0x000000010014e7f0 in caml_raise_not_found ()
#10 0x000000010015d313 in caml_sys_getenv ()
#11 0x00000001001633ac in caml_c_call ()
Previous frame inner to this frame (gdb could not unwind past this frame)

I've tried a few smaller test programs to see if I can spot the heap corruption before malloc, but nothing triggers it yet. Using ocamlopt.opt as the failing testcase, the program crashes after startup.c/caml_start_program is called.

If I initialise the frame tables early in startup.c by adding:

#include "roots.h"
if (caml_frame_descriptors == NULL) { caml_init_frame_descriptors(); }

to startup.c just before "res = caml_start_program", then the problem vanishes.

@vicuna vicuna commented Aug 2, 2012

Comment author: @ygrek

As this first appeared in 4.00 and is connected with debugging, a blind guess: maybe CFI is the issue? (E.g. MacOS profiles mallocs and samples the stack at each allocation, and fails badly because of incorrect(?) CFI information.) What happens if one compiles OCaml without CFI enabled (after configure, set ASM_CFI_SUPPORTED=no in Makefile.config)?

PS: Not connected, but I have seen tcmalloc segfault on Linux while probing the stack for profiling.

@vicuna vicuna commented Aug 2, 2012

Comment author: @xavierleroy

I have a possible explanation, but it's a wild shot.

MacOS X is very touchy about the stack pointer being 16-aligned, or more precisely about C functions being entered with rsp mod 16 = 8. That's because their C compiler sometimes emits SSE2 128-bit load and store instructions that demand 16-alignment of their target addresses.

Re-reading the code of caml_raise_exception in asmrun/amd64.S, I see that it violates this alignment constraint: caml_stash_backtrace is entered with rsp mod 16 = 0. Most of the time, caml_stash_backtrace doesn't call any C library function, so no bad things happen. If, however, the frame table was not initialized before, caml_stash_backtrace calls caml_init_frame_descriptors, which does a lot of work, including calling malloc(). And maybe the malloc() in 10.8 happens to use those strictly-aligned SSE2 instructions.

Bottom line: could you please try to apply the patch (attached to this PR and included below for e-mail convenience) to asmrun/amd64.S and let us know if the problem is still here? What the patch does is simply to fix caml_raise_exception so that it maintains proper stack alignment.

Index: asmrun/amd64.S
--- asmrun/amd64.S (revision 12802)
+++ asmrun/amd64.S (working copy)
@@ -483,9 +483,10 @@
 LBL(110):
         movq    %rax, %r12            /* Save exception bucket */
         movq    %rax, C_ARG_1         /* arg 1: exception bucket */
-        movq    0(%rsp), C_ARG_2      /* arg 2: pc of raise */
-        leaq    8(%rsp), C_ARG_3      /* arg 3: sp of raise */
+        popq    C_ARG_2               /* arg 2: pc of raise */
+        movq    %rsp, C_ARG_3         /* arg 3: sp at raise */
         movq    %r14, C_ARG_4         /* arg 4: sp of handler */
+        /* #5700: thanks to popq above, stack is now 16-aligned */
         PREPARE_FOR_C_CALL            /* no need to cleanup after */
         call    GCALL(caml_stash_backtrace)
         movq    %r12, %rax            /* Recover exception bucket */

@vicuna vicuna commented Aug 2, 2012

Comment author: serp

Patch is OK. On Mac OS 10.8, "ocaml 4.00.0" with "OCAMLRUNPARAM=b" compiled successfully. Thanks.

@vicuna vicuna commented Aug 2, 2012

Comment author: @xavierleroy

Patch to amd64.S committed in the 4.00 bugfix branch (r12815) and on trunk (r12816). I'm not 100% sure it fixes avsm's original issue, but I optimistically assume that it does. Please reopen this PR if the problem persists.

@vicuna vicuna commented Aug 2, 2012

Comment author: @damiendoligez

I have uploaded a version of Xavier's patch that applies cleanly to 4.00.0, and I confirm that it fixes at least avsm's "ocamlopt.opt" repro case.

It seems that you will need to "make clean" after applying this patch (instead of rebuilding right away).

@vicuna vicuna commented Aug 2, 2012

Comment author: @avsm

Sorry about the (vacation-induced) delayed response. The patch does indeed eliminate the segfault on 4.00.0 for me also, and the fix is confirmed in gdb:

$ env OCAMLRUNPARAM=b gdb ./ocamlopt-4.00.0.opt.broken
(gdb) break caml_stash_backtrace
Breakpoint 1 at 0x1001a7288: file backtrace.d.c, line 65.
(gdb) run
Breakpoint 1, caml_stash_backtrace (exn=4313839736, pc=4296384572, sp=0x7fff5fbff9b0 "??\037\001\001", trapsp=0x7fff5fbff9c0 "") at backtrace.d.c:65
(gdb) print $rsp
$1 = (void *) 0x7fff5fbff958
(gdb) cont
Continuing.
Program received signal EXC_BAD_ACCESS, Could not access memory.

With the patch, rsp is 16-byte aligned after the CALL instruction to caml_stash_backtrace:

(gdb) break caml_stash_backtrace
Breakpoint 1 at 0x100194ebd
(gdb) run
Starting program: /Users/avsm/src/git/bmeurer/ocaml/ocamlopt-4.00.fixed
Reading symbols for shared libraries +............................. done
Breakpoint 1, 0x0000000100194ebd in caml_stash_backtrace ()
(gdb) print $rsp
$1 = (void *) 0x7fff5fbff950

I backported this to 3.12.1 to help with migrating our repositories, and it gets further but still crashes shortly afterwards from an early caml_c_call:

Reason: 13 at address: 0x0000000000000000
0x00007fff8c3e6f88 in large_malloc ()
(gdb) bt
#0 0x00007fff8c3e6f88 in large_malloc ()
#1 0x00007fff8c3ed74f in szone_malloc_should_clear ()
#2 0x00007fff8c3df183 in malloc_zone_malloc ()
#3 0x00007fff8c3dfbd7 in malloc ()
#4 0x00000001001523c4 in caml_stat_alloc ()
#5 0x000000010014eb05 in caml_init_frame_descriptors ()
#6 0x0000000100162fd6 in caml_stash_backtrace ()
#7 0x00000001001634f8 in .L111 ()
#8 0x000000010014e7cd in caml_raise_constant ()
#9 0x000000010014e7f0 in caml_raise_not_found ()
#10 0x000000010015d313 in caml_sys_getenv ()
#11 0x00000001001633ac in caml_c_call ()

rsp is also misaligned here in caml_c_call:
Breakpoint 1, 0x0000000100163380 in caml_c_call ()
(gdb) print $rsp
$1 = (void *) 0x7fff5fbff9b8

I couldn't spot any differences between the 3.12.1 and 4.00.0 caml_c_call implementations, so I just wanted to check that it isn't just working by a lucky alignment.

@vicuna vicuna commented Aug 3, 2012

Comment author: @xavierleroy

Thanks again for the precious feedback. I don't have MacOS 10.8 installed, so I just instrumented amd64.S to check stack alignment before every call to C functions, and lo and behold, there is another call to caml_stash_backtrace with a misaligned SP...

Attached to this PR (alignment-caml-raise-exception-2.diff) and included below for e-mail convenience is a second patch, to be applied on top of the previous one, which should complete the fix. Let me know how it goes.

(For 3.12.1, just insert "subq $8, %rsp" before "call GCALL(caml_stash_backtrace)" in asmrun/amd64.S, function caml_raise_exception.)

Index: amd64.S
--- amd64.S (revision 12816)
+++ amd64.S (working copy)
@@ -510,6 +510,7 @@
         LOAD_VAR(caml_last_return_address,C_ARG_2)   /* arg 2: pc of raise */
         LOAD_VAR(caml_bottom_of_stack,C_ARG_3)       /* arg 3: sp of raise */
         LOAD_VAR(caml_exception_pointer,C_ARG_4)     /* arg 4: sp of handler */
+        subq    $8, %rsp              /* #5700: maintain stack alignment */
         PREPARE_FOR_C_CALL            /* no need to cleanup after */
         call    GCALL(caml_stash_backtrace)
         movq    %r12, %rax            /* Recover exception bucket */
    

@vicuna vicuna commented Aug 3, 2012

Comment author: @xavierleroy

Second patch committed in the 4.00 bugfix branch (r12817) and in trunk (r12818).

@vicuna vicuna commented Aug 3, 2012

Comment author: @avsm

Perfect! A quick spin sees everything working, and I'll try it more when I'm back next week. For anyone else who needs a quick fix, I've uploaded combined patches against 3.12.1 and 4.00.0 to this ticket, and submitted pull requests to Homebrew:

3.12.1: Homebrew/legacy-homebrew#13913
4.00.0: in my tree in http://github.com/avsm/homebrew (ocaml4-upgrade branch) while I test it more

@vicuna vicuna commented Jan 6, 2014

Comment author: Richard Jones

FYI I have hit the same issue in slightly different circumstances. It's basically an example of this:
http://www.bailopan.net/blog/?p=7

  • Platform is Linux, RHEL 6.

  • 32 bit i386 code generated by the OCaml i386 code generator.

  • OCaml code is calling into the garbage collector in C.

(gdb) bt
#0 0x0818e054 in caml_major_collection_slice (howmuch=howmuch@entry=0)
at major_gc.c:399
#1 0x0818e90a in caml_minor_collection () at minor_gc.c:281
#2 0x0818d653 in caml_garbage_collection () at signals_asm.c:71
#3 0x0819e31e in caml_system__code_begin ()
#4 0x080bd380 in camlEnv__store_value_1751 ()
#5 0x08105251 in camlTypecore__fun_5704 ()
#6 0x0810cb55 in camlTypecore__add_pattern_variables_1753 ()
#7 0x0810cd79 in camlTypecore__type_pattern_list_1778 ()
#8 0x0810e946 in camlTypecore__type_let_2156 ()
#9 0x08115cf8 in camlTypecore__type_binding_2824 ()
#10 0x0812aef9 in camlTypemod__type_struct_1875 ()
#11 0x0812af14 in camlTypemod__type_struct_1875 ()
#12 0x0812af14 in camlTypemod__type_struct_1875 ()
#13 0x0812af14 in camlTypemod__type_struct_1875 ()
#14 0x0812af14 in camlTypemod__type_struct_1875 ()
#15 0x0812af14 in camlTypemod__type_struct_1875 ()
#16 0x0812af14 in camlTypemod__type_struct_1875 ()
#17 0x0812af14 in camlTypemod__type_struct_1875 ()
#18 0x0812af14 in camlTypemod__type_struct_1875 ()
#19 0x0812af14 in camlTypemod__type_struct_1875 ()
#20 0x0812af14 in camlTypemod__type_struct_1875 ()
#21 0x0812af14 in camlTypemod__type_struct_1875 ()
#22 0x0812af14 in camlTypemod__type_struct_1875 ()
#23 0x0812af14 in camlTypemod__type_struct_1875 ()
#24 0x0812af14 in camlTypemod__type_struct_1875 ()
#25 0x0812af14 in camlTypemod__type_struct_1875 ()
#26 0x0812af14 in camlTypemod__type_struct_1875 ()
#27 0x0812af14 in camlTypemod__type_struct_1875 ()
#28 0x0812af14 in camlTypemod__type_struct_1875 ()
#29 0x0812af14 in camlTypemod__type_struct_1875 ()
#30 0x0812ed47 in camlTypemod__type_structure_1820 ()
#31 0x081301b6 in camlTypemod__type_implementation_2125 ()
#32 0x0808cbca in camlOptcompile__implementation_1040 ()
#33 0x0804cd03 in camlOptmain__process_implementation_file_1015 ()
#34 0x0817c144 in camlArg__parse_argv_1093 () at arg.ml:208
#35 0x0817c25e in camlArg__parse_1139 () at arg.ml:216
#36 0x0804d93e in camlOptmain__main_1308 ()
#37 0x0804e847 in camlOptmain__entry ()
#38 0x08049dbd in caml_program ()
#39 0x0819e45a in caml_start_program ()
#40 0x0819e92a in caml_main (argv=0xffc6d724,
argv@entry=<error reading variable: Cannot access memory at address 0xf>)
at startup.c:189
#41 0x08049754 in main (
argc=<error reading variable: Cannot access memory at address 0xb>,

  • gcc appears to assume 16-byte stack alignment, and therefore emits movapd instructions such as:

=> 0x0818e054 <+324>: movapd %xmm1,0x20(%esp)

  • The stack is not aligned (%esp = 0xffc6cf9c), so the instruction segfaults.

I will see if I can make a 32-bit variant of the amd64 patch, and whether that fixes the bug.

@vicuna vicuna commented Jan 6, 2014

Comment author: Richard Jones

There is a long thread/argument here which basically says we need to use 16-byte stack alignment in order to interoperate with gcc:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38496

@vicuna vicuna commented Jan 6, 2014

Comment author: Richard Jones

FWIW, I decided to work around (instead of fixing) this issue by compiling OCaml with:

CFLAGS=-mpreferred-stack-boundary=2 ./configure
