ocamlopt generated code: segmentation fault instead of stack overflow #5064
Original bug ID: 5064
This is related to #4297 , however I didn't find a way to reopen that bug. Please consider fixing this, see a possible solution in the 'Additional Information' section of this bug.
I encountered a segmentation fault when running menhir (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=583300), and I reduced the problem down to the attached testcase.
Problem is that there is not enough stack space available to invoke the C function caml_equal, and ocaml's sigsegv handler only raises Stack_overflow for OCaml code. This is generally a good idea, since the C code could itself be responsible for the overflow, and there is no way to know that raising an OCaml exception (instead of segfaulting) won't result in inconsistent state for the C part.
However the equality comparison is part of the OCaml language (and any native C code invoked by standard libraries would be too), and I think it is important to have a robust protection against stack overflows. Functional programs are recursive by their nature, and getting a segfault due to a simple runaway recursion is not acceptable.
Here are some suggestions:
Implementing 1. is very simple, and doesn't require much additional code:
Then if you don't have at least 8k left on the stack, you'll get a SIGSEGV at the 'movq', which will be considered as still part of the OCaml code, and you get a nice Stack_Overflow instead of a segfault.
I tested this modification, and it works!
Before the modification to y.s I got the segfault. (I captured the .o files by using a bash script that I passed as -cc to ocamlopt).
The text was updated successfully, but these errors were encountered:
Comment author: @edwintorok
I have measured the performance and code size impact of implementing my suggestion #2.
There is no measurable performance difference (all differences are within the standard deviation, and are <1%)
Code size for ocamlopt.opt:
I think this stack check could be eliminated when -unsafe is specified, and then you'd get back the original code size.
Measurements done on Debian unstable, x86_64, Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz, 4GB RAM.
Comment author: @ygrek
Here is another example (from caml-list, "Segmentation faults in unexpected places" by Peter Frey at Tue, 9 Aug 2011 18:16:00 -0400) :
$ cat a.ml
let stol len = (* create some bogus int list of length len *)
let () =
$ uname -rms
Note the random crashing.
In this case offending C function is caml_call_gc and descendants :
Core was generated by `./a 261900'.
Comment author: gerd
This seems to be difficult to solve. I think a strategy could be based on two ideas:
Comment author: @xavierleroy
I implemented a 90% complete fix (commits r12159 and r12160) whereas the stack "touch", checking that 4K words of stack are available and raising Stack_overflow otherwise, is performed in the asm glue between ocamlopt-generated code and the OCaml runtime system, namely in glue functions caml_c_call and caml_call_gc.
A good thing about this solution is that correct stack backtraces are generated, since caml_bottom_of_stack and caml_last_return_address are properly set up before the stack touch. Another good thing is that no extra code is generated by ocamlopt.
A weakness, compared with the original proposal, is that no stack touch is performed (yet) before calls to "noalloc" C functions, which do not go through caml_c_call. I need to run more experiments to assess the overhead of this solution, esp. when calling small functions from the math library.
The fix is currently implemented for x86 (32 and 64 bits) under Linux, BSD and MacOS X, as well as PowerPC under MacOS X. Testing on production code is most welcome.
The 4K words should be enough to run the GC and most of the OCaml runtime functions, with the notable exception of hashing, marshaling and unmarshaling, which are recursive. An iterative reimplementation of these functions can be considered but is nontrivial work.