Reducing the risk of segfault due to stack overflow #5485

vicuna opened this issue Jan 19, 2012 · 2 comments

@vicuna vicuna commented Jan 19, 2012

Original bug ID: 5485
Reporter: @mjambon
Status: closed (set by @xavierleroy on 2015-12-11T18:19:42Z)
Resolution: fixed
Priority: normal
Severity: feature
Platform: Linux/AMD64
OS Version: 2.6.34
Version: 3.12.1
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5064
Monitored by: @ygrek

Bug description

We just had a segfault due to a stack overflow. It occurred inside the compare_val function during a Hashtbl.find_all on an excessively deep bucket. It took me close to two days to identify the source of the problem and to fix it in our unfriendly environment (Hadoop map/reduce). Having a stack trace in this case would have saved about a day of debugging.

Here is a simple repro case:

----- -----
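let main () =
  let n = 1_000_000 in
  let tbl = Hashtbl.create n in
  let k = "a" in
  (* Add n bindings under the same key: they all land in one bucket. *)
  for i = 1 to n do
    Hashtbl.add tbl k ()
  done;
  print_endline "find_all";
  (* Walking the very deep bucket is what overflows the stack. *)
  ignore (Hashtbl.find_all tbl k)

let () =
  Printexc.record_backtrace true;
  main ()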

$ ocamlopt -o overflow -g
$ ulimit -c unlimited
$ ./overflow
Segmentation fault (core dumped)

Xavier's last comment on a similar bug report is:

"Mark Shinwell's analysis is correct. We can catch SEGV arising from stack overflows in Caml code reasonably well, but we cannot recover from a SEGV arising in the middle of C code. I'm afraid this is a "cannot fix" situation."

(see #4843#c5094)

Here gdb tells us that the crash occurs during compare_val() which is
used to compare the query key with the keys in the hash table's bucket:

$ gdb overflow core
GNU gdb (Gentoo 7.3.1 p2) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /home/martin/tmp/overflow/overflow...(no debugging
symbols found)...done.
[New LWP 7204]

warning: Can't read pathname for load map: Input/output error.
Core was generated by `./overflow'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041affc in compare_val ()

In the absence of a better solution, a compile-time flag that adds a check before each function call would be helpful. A Stack_overflow exception would be raised N bytes before reaching the stack size limit, so any sequence of C function calls would have at least N bytes of stack space to work with. This would make a C function much less likely to trigger a stack overflow.

A compile-time flag causing a reasonable slowdown (< 2x) would greatly facilitate debugging.
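
To illustrate the idea of checking a budget before each call and raising an exception while some headroom remains, here is a minimal user-level sketch. It is not the proposed compiler flag: it counts recursion depth rather than bytes of stack. The names Depth_limit, max_depth, margin, length_from and guarded_length are all hypothetical, chosen only for this example.

```ocaml
(* Hypothetical user-level analogue of the proposed check. It tracks
   recursion depth, not actual stack bytes; all names are illustrative. *)
exception Depth_limit of int

let max_depth = 10_000  (* assumed budget, standing in for the stack size limit *)
let margin = 100        (* headroom "N" kept free for whatever runs at the leaf *)

(* Non-tail-recursive list length that checks the budget before each call. *)
let rec length_from depth = function
  | [] -> 0
  | _ :: tl ->
    if depth >= max_depth - margin then raise (Depth_limit depth);
    1 + length_from (depth + 1) tl

let guarded_length l = length_from 0 l

let () =
  Printf.printf "short: %d\n" (guarded_length [1; 2; 3]);
  let long = List.init 20_000 (fun i -> i) in
  match guarded_length long with
  | n -> Printf.printf "long: %d\n" n
  | exception Depth_limit d -> Printf.printf "stopped at depth %d\n" d
```

The point of the margin is the same as in the proposal: the exception is raised while there is still room left, so code that runs after the check is not the first to hit the hard limit.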

@vicuna vicuna commented Feb 16, 2012

Comment author: @xavierleroy

It might be possible to raise a Stack_overflow in this case (if there is not enough stack space available before calling into a C function), either as suggested in #5064, or (even more cheaply) by putting the "stack touch" sequence in caml_c_call and caml_call_gc. Generating a meaningful stack backtrace is much more difficult, though.

@vicuna vicuna commented Feb 17, 2012

Comment author: @xavierleroy

Reasonable (but not 100% perfect) fix implemented in SVN trunk. It does generate stack backtraces! See #5064 for a discussion.
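
Once the runtime raises Stack_overflow instead of crashing, a program can catch it and report the recorded backtrace. The sketch below shows one possible way to do that, using only Printexc.record_backtrace and Printexc.get_backtrace from the standard library. The helper name run_guarded is hypothetical, and whether a real overflow inside C code reaches this handler depends on the OCaml version and platform.

```ocaml
(* Hypothetical helper: run [f] and turn Stack_overflow into an Error
   carrying whatever backtrace the runtime recorded. *)
let run_guarded (f : unit -> 'a) : ('a, string) result =
  Printexc.record_backtrace true;
  try Ok (f ())
  with Stack_overflow ->
    (* Read the backtrace immediately, before any other exception is raised. *)
    Error ("Stack_overflow\n" ^ Printexc.get_backtrace ())

let () =
  match run_guarded (fun () -> List.length [1; 2; 3]) with
  | Ok n -> Printf.printf "ok: %d\n" n
  | Error trace -> prerr_endline trace
```

The backtrace is only useful if the program was compiled with -g, as in the repro case above.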
