Illegal recovery of 2 stack overflows with ocamlopt in Mac OS #5976

Closed
vicuna opened this Issue Apr 5, 2013 · 5 comments

vicuna commented Apr 5, 2013

Original bug ID: 5976
Reporter: pboutill
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2015-12-11T18:19:31Z)
Resolution: fixed
Priority: normal
Severity: major
Platform: x86_64
OS: MacOS
OS Version: 10.5-10.8
Version: 4.00.1
Target version: 4.01.0+dev
Fixed in version: 4.01.0+dev
Category: runtime system and C interface

Bug description

The following code produces the output

Illegal instruction: 4

(only when compiled to native code)

Steps to reproduce

(* Compile the following code with ocamlopt. *)
let rec f () = f () ; f ()

let rec loop i =
  if i <= 0 then print_string "OK\n" else
  try
    f ()
  with Stack_overflow -> loop (pred i)

let () = loop 2 (* works for 1 *)

vicuna commented Apr 5, 2013

Comment author: ppedrot

This bug is a real nuisance in Coq: once a computation has raised Stack_overflow, the user has no choice but to restart coqtop, because the next Stack_overflow failure cannot be recovered from properly.
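
To make this use case concrete, here is a minimal sketch of a session loop that guards each command separately. The names `run_guarded` and `run_session` are hypothetical and are not taken from coqtop; the sketch only illustrates that such a loop depends on the runtime recovering from every Stack_overflow, not just the first one.

```ocaml
(* Hypothetical sketch, not coqtop's actual code. *)

(* Run one command and turn Stack_overflow into an error value,
   so that the caller can carry on with the next command. *)
let run_guarded (command : unit -> 'a) : ('a, string) result =
  try Ok (command ())
  with Stack_overflow -> Error "stack overflow"

(* Guard every command of a session independently, in order. *)
let run_session (commands : (unit -> 'a) list) : ('a, string) result list =
  List.map run_guarded commands
```

With the `f` from the report, `run_session [f; f]` should return two errors on a platform where recovery works. On an affected native build, the second command would presumably fail the same way `loop 2` does.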

vicuna commented Jun 6, 2013

Comment author: @alainfrisch

It is also known that stack overflow recovery does not work well under Windows. What about a mode where the runtime would stop cleanly, with a proper error message, upon stack overflow, instead of trying to recover from it?

vicuna commented Jun 7, 2013

Comment author: @xavierleroy

The same Caml code works fine under Linux x86-64, so there's something specific to MacOS X to be investigated.

@Frisch: treating stack overflow as a clean fatal error wouldn't help with the Coq use case mentioned by ppedrot. Also, even printing an error message can be challenging when your program is really out of stack space. But I welcome sample implementations, especially for Windows.

vicuna commented Jun 8, 2013

Comment author: @xavierleroy

Further investigations: I tried to reproduce the problem in pure C code, using setjmp/longjmp to simulate exceptions, and the problem does not show up. Looking further into the implementation of longjmp() on MacOS X, it appears that it goes to great lengths to call the undocumented "sigreturn" syscall when exiting from a signal handler. I have the impression that this is especially important when the signal was taken on an alternate stack.

My theory at this point is as follows: the OCaml runtime exits the handler for the stack overflow signal by raising an OCaml exception. This cuts the stack just fine, but does not call "sigreturn". As a consequence, the alternate stack for this handler may not be reset properly, and taking a second stack overflow signal on this alternate stack causes the kernel to abort the program.

This needs to be confirmed further, knowing that gdb under MacOS X is unable to step through a SIGSEGV signal handler...

A possible workaround would be to simulate the raising of the Stack_overflow exception from within the signal handler, by tweaking the saved registers from the ucontext, then returning "normally". This would be a major hack and I'm unsure it can be done in time for release 4.01.

vicuna commented Jun 9, 2013

Comment author: @xavierleroy

Tentative fix in trunk, commits r13759 and r13760. The fix is to return normally from segv_handler, after changing the PC in the signal context to point to caml_stack_overflow in amd64.S, which actually raises the exception. Whether to use this trick is governed by RETURN_AFTER_STACK_OVERFLOW defined or not in asmrun/signals_osdep.h. For the time being, it is defined only for amd64/macosx.

Note: stack backtraces on Stack_overflow exceptions were not reliably recorded by the old implementation to begin with. This alternate implementation, however, makes it fundamentally impossible to record them, as we don't have the stack space required to do so. This could be an additional reason to stick to the old implementation on all platforms where it works.
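
One way to observe this on a given platform is to ask the runtime how many backtrace slots it recorded for a Stack_overflow. The helper below is a hypothetical sketch using the standard `Printexc` module (`raw_backtrace_length` requires OCaml 4.02 or later); it is not part of the fix itself.

```ocaml
(* Hypothetical helper: run [thunk]; if it raises Stack_overflow, report how
   many backtrace slots the runtime recorded for that exception. *)
let backtrace_slots_on_overflow (thunk : unit -> unit) : int option =
  Printexc.record_backtrace true;
  try thunk (); None
  with Stack_overflow ->
    (* Read the backtrace first, before any other exception can be raised. *)
    Some (Printexc.raw_backtrace_length (Printexc.get_raw_backtrace ()))
```

A result of `Some 0` means that no backtrace was recorded for the exception. The program should be built with `-g`, otherwise backtraces are generally not available at all.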

@vicuna vicuna closed this Dec 11, 2015

@vicuna vicuna added the stdlib label Mar 14, 2019

@vicuna vicuna added this to the 4.01.0 milestone Mar 14, 2019

@vicuna vicuna added the bug label Mar 20, 2019
