Segmentation error on a simple program #4843

Closed
vicuna opened this Issue Jul 28, 2009 · 10 comments

vicuna commented Jul 28, 2009

Original bug ID: 4843
Reporter: kwakita
Status: closed (set by @xavierleroy on 2009-09-16T09:16:41Z)
Resolution: not fixable
Priority: normal
Severity: crash
Version: 3.11.1
Category: ~DO NOT USE (was: OCaml general)
Monitored by: @mshinwell till @mmottl

Bug description

Attached is a small program (bug2.ml) that crashes with a segmentation violation when it is compiled with ocamlopt and run on Intel/Mac OS X 10.5.7.

Interestingly, if I remove line 11 (the one that starts with 'ignore'), the SEGV disappears and the program reports a Stack_overflow exception, which is the expected behavior.

Additional information

My platform is Intel/Mac OS X 10.5.7.
I downloaded the OCaml 3.11.1 source distribution and compiled it myself.

File attachments

vicuna commented Jul 28, 2009

Comment author: @mshinwell

This looks like a problem with the heuristic that determines whether a faulting address is in the stack. It may be related to issue 4746.

I suggest doing the following, which should collect the necessary information to determine what is wrong. (In case anyone is wondering, I think this will indeed give the stack pointer before it is switched to the alternate signal stack.)

$ ulimit -s
...write down what it reports...
$ gdb
(gdb) handle SIGSEGV stop nopass
(gdb) r
...segfault occurs...
(gdb) info reg
(gdb) p/x system_stack_top

Then before exiting the debugger, in another terminal, use vmmap (I think that's the right program on Mac OS X; it's called pmap on Linux) specifying the process ID of the Caml program -- not the debugger -- and note what it reports as the stack size.

vicuna commented Jul 28, 2009

Comment author: kwakita

Shinwell, thank you for your suggestion. I followed your instructions and got the following output. I hope it is helpful.


dasher:bug$ ulimit -s
8192

dasher:bug$ gdb ./bug2
GNU gdb 6.3.50-20050815 (Apple version gdb-966) (Tue Mar 10 02:43:13 UTC 2009)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries ... done

(gdb) handle SIGSEGV stop nopass
Signal Stop Print Pass to program Description
SIGSEGV Yes Yes No Segmentation fault
(gdb) r
Starting program: /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug2
Reading symbols for shared libraries ++. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xbf7ffffc
0x00007fb6 in compare_val ()
(gdb) info reg
eax 0x3 3
ecx 0x1 1
edx 0x1 1
ebx 0x8544 34116
esp 0xbf800000 0xbf800000
ebp 0xbf800078 0xbf800078
esi 0x3 3
edi 0x0 0
eip 0x7fb6 0x7fb6 <compare_val+9>
eflags 0x10286 66182
cs 0x17 23
ss 0x1f 31
ds 0x1f 31
es 0x1f 31
fs 0x0 0
gs 0x37 55
(gdb) p/x system_stack_top
$1 = Value can't be converted to integer.
(gdb)


Virtual Memory Map of process 17079 (bug2)
Output report format: 2.2 -- 32-bit process

==== Non-writable regions for process 17079
__PAGEZERO 00000000-00001000 [ 4K] ---/--- SM=NUL /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug2
__TEXT 00001000-00014000 [ 76K] r-x/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug2
__LINKEDIT 0003c000-00045000 [ 36K] r--/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug2
STACK GUARD 00045000-00046000 [ 4K] ---/rwx SM=NUL
STACK GUARD 00047000-00048000 [ 4K] ---/rwx SM=NUL
__TEXT 8fe00000-8fe2e000 [ 184K] r-x/rwx SM=COW /usr/lib/dyld
__LINKEDIT 8fe67000-8fe75000 [ 56K] r--/rwx SM=COW /usr/lib/dyld
__TEXT 91eac000-91eb4000 [ 32K] r-x/r-x SM=COW /usr/lib/libgcc_s.1.dylib
__TEXT 93589000-9358e000 [ 20K] r-x/r-x SM=COW /usr/lib/system/libmathCommon.A.dylib
__TEXT 9358e000-936f6000 [ 1440K] r-x/r-x SM=COW /usr/lib/libSystem.B.dylib
__LINKEDIT 9768c000-97a8b000 [ 4092K] r--/r-- SM=COW /usr/lib/system/libmathCommon.A.dylib
__IMPORT a0a2e000-a0a2f000 [ 4K] r-x/rwx SM=COW /usr/lib/libgcc_s.1.dylib
__IMPORT a0a48000-a0a4a000 [ 8K] r-x/rwx SM=COW /usr/lib/libSystem.B.dylib
STACK GUARD bc000000-bf800000 [ 56.0M] ---/rwx SM=NUL

==== Writable regions for process 17079
__DATA 00014000-00016000 [ 8K] rw-/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug2
__DATA 00016000-0003b000 [ 148K] rw-/rwx SM=PRV /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug2
__IMPORT 0003b000-0003c000 [ 4K] rwx/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug2
MALLOC (freed?) 00046000-00047000 [ 4K] rw-/rwx SM=COW
MALLOC_LARGE 00048000-000e7000 [ 636K] rw-/rwx SM=PRV DefaultMallocZone_0x100000
MALLOC_TINY 00100000-00200000 [ 1024K] rw-/rwx SM=COW DefaultMallocZone_0x100000
MALLOC_SMALL 00800000-01000000 [ 8192K] rw-/rwx SM=COW DefaultMallocZone_0x100000
__DATA 8fe2e000-8fe32000 [ 16K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 8fe32000-8fe67000 [ 212K] rw-/rwx SM=ZER /usr/lib/dyld
__DATA a021e000-a021f000 [ 4K] rw-/rw- SM=COW /usr/lib/libgcc_s.1.dylib
shared pmap a0400000-a04dd000 [ 884K] rw-/rwx SM=COW
__DATA a04dd000-a04de000 [ 4K] rw-/rwx SM=COW /usr/lib/system/libmathCommon.A.dylib
__DATA a04de000-a0559000 [ 492K] rw-/rwx SM=COW /usr/lib/libSystem.B.dylib
shared pmap a0559000-a0600000 [ 668K] rw-/rwx SM=COW
Stack bf800000-bfffd000 [ 8180K] rw-/rwx SM=ZER thread 0
Stack bfffd000-bfffe000 [ 4K] rw-/rwx SM=PRV
Stack bfffe000-bffff000 [ 4K] rw-/rwx SM=ZER
Stack bffff000-c0000000 [ 4K] rw-/rwx SM=COW

==== Legend
SM=sharing mode:
COW=copy_on_write PRV=private NUL=empty ALI=aliased
SHM=shared ZER=zero_filled S/A=shared_alias

==== Summary for process 17079
ReadOnly portion of Libraries: Total=5936K resident=5168K(87%) swapped_out_or_unallocated=768K(13%)
Writable regions: Total=18.0M written=44K(0%) resident=8296K(45%) swapped_out=0K(0%) unallocated=9.9M(55%)

REGION TYPE [ VIRTUAL]
=========== [ =======]
MALLOC [ 9856K]
STACK GUARD [ 56.0M]
Stack [ 8192K]
__DATA [ 884K]
__IMPORT [ 16K]
__LINKEDIT [ 4184K]
__PAGEZERO [ 4K]
__TEXT [ 1752K]
shared pmap [ 1552K]
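
As a cross-check, the addresses in the gdb and vmmap output above can be compared with a few lines of OCaml. This is only a sketch of the arithmetic: the constant names are made up for illustration (they are not runtime symbols), and the values are copied from the output above. It uses Int64 so that the literals also fit on a 32-bit OCaml.

```ocaml
(* Values copied from the gdb and vmmap output above; the names are
   illustrative only, not symbols from the OCaml runtime. *)
let fault_addr = 0xbf7ffffcL   (* KERN_PROTECTION_FAILURE address *)
let esp        = 0xbf800000L   (* stack pointer at the fault *)
let guard_lo   = 0xbc000000L   (* STACK GUARD bc000000-bf800000 *)
let guard_hi   = 0xbf800000L
let stack_lo   = 0xbf800000L   (* start of the lowest "Stack" region *)
let stack_hi   = 0xc0000000L   (* end of the highest "Stack" region *)

(* [in_range lo hi a] is true when lo <= a < hi. *)
let in_range lo hi a = Int64.compare lo a <= 0 && Int64.compare a hi < 0

let fault_in_guard = in_range guard_lo guard_hi fault_addr
let bytes_below_esp = Int64.sub esp fault_addr
let stack_kib = Int64.div (Int64.sub stack_hi stack_lo) 1024L

let () =
  Printf.printf "fault in stack guard: %b\n" fault_in_guard;
  Printf.printf "fault is %Ld bytes below esp\n" bytes_below_esp;
  Printf.printf "stack size: %LdK\n" stack_kib
```

The faulting address is the word just below the stack pointer and falls inside the guard region, and the stack regions add up to 8192K, which matches what `ulimit -s` reports. That is consistent with a stack overflow rather than a stray pointer.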

vicuna commented Jul 28, 2009

Comment author: @mshinwell

Can you confirm whether the output you posted is from the program that normally causes a segfault, or from the one which produces a Stack_overflow exception?

Whichever output you posted, please also paste the equivalent output from running the other program in the same manner.

vicuna commented Jul 28, 2009

Comment author: kwakita

I may have been confused when I posted note #5032. To make things clearer, I modified the program slightly, as follows.

---- begin bug3.ml -----------------
(* Select the crashing variant with the first command-line argument. *)
let segv = (Sys.argv.(1) = "segv")

module Make (Ord : Set.OrderedType) =
struct
  type elt = Ord.t
  type t = Empty | Node of t * elt

  let empty = Empty

  (* On a non-empty set the recursive call passes [s] back unchanged,
     so the recursion never terminates and the stack overflows. *)
  let rec add x e =
    function
    | Empty -> Node (Empty, x)
    | Node (l, v) as s ->
        begin
          if segv then ignore (Ord.compare x v);
          Node (add x l s, v)
        end
end

module IntSet =
  Make (struct
    type t = int
    let compare = compare
  end);;

let _ =
  let s = IntSet.add 0 (IntSet.empty) IntSet.empty in
  IntSet.add 1 s s
---- end bug3.ml -----------------

It accepts one command-line argument, either "segv" or "overflow". When I run it as "./bug3 segv", it is killed by a SEGV signal. When I run it as "./bug3 overflow", it terminates with a Stack_overflow exception.

The following is a copy of the gdb interaction and the vmmap output obtained from "./bug3 segv" (the case where the program is killed by a SEGV signal).

---- begin gdb interaction with "./bug3 segv" -----------------
dasher:bug$ ulimit -s
8192

dasher:bug$ gdb ./bug3
GNU gdb 6.3.50-20050815 (Apple version gdb-966) (Tue Mar 10 02:43:13 UTC 2009)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries ... done

(gdb) handle SIGSEGV stop nopass
Signal Stop Print Pass to program Description
SIGSEGV Yes Yes No Segmentation fault
(gdb) r segv
Starting program: /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3 segv
Reading symbols for shared libraries ++. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xbf7ffffc
0x000080be in compare_val ()
(gdb) info reg
eax 0x3 3
ecx 0x1 1
edx 0x1 1
ebx 0x864c 34380
esp 0xbf800000 0xbf800000
ebp 0xbf800078 0xbf800078
esi 0x3 3
edi 0x0 0
eip 0x80be 0x80be <compare_val+9>
eflags 0x10286 66182
cs 0x17 23
ss 0x1f 31
ds 0x1f 31
es 0x1f 31
fs 0x0 0
gs 0x37 55
(gdb) p/x system_stack_top
---- end gdb interaction with "./bug3 segv" -----------------

---- begin vmmap of "./bug3 segv" -----------------
Virtual Memory Map of process 17706 (bug3)
Output report format: 2.2 -- 32-bit process

==== Non-writable regions for process 17706
__PAGEZERO 00000000-00001000 [ 4K] ---/--- SM=NUL /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3
__TEXT 00001000-00014000 [ 76K] r-x/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3
__LINKEDIT 0003c000-00045000 [ 36K] r--/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3
STACK GUARD 00045000-00046000 [ 4K] ---/rwx SM=NUL
STACK GUARD 00047000-00048000 [ 4K] ---/rwx SM=NUL
__TEXT 8fe00000-8fe2e000 [ 184K] r-x/rwx SM=COW /usr/lib/dyld
__LINKEDIT 8fe67000-8fe75000 [ 56K] r--/rwx SM=COW /usr/lib/dyld
__TEXT 91eac000-91eb4000 [ 32K] r-x/r-x SM=COW /usr/lib/libgcc_s.1.dylib
__TEXT 93589000-9358e000 [ 20K] r-x/r-x SM=COW /usr/lib/system/libmathCommon.A.dylib
__TEXT 9358e000-936f6000 [ 1440K] r-x/r-x SM=COW /usr/lib/libSystem.B.dylib
__LINKEDIT 9768c000-97a8b000 [ 4092K] r--/r-- SM=COW /usr/lib/system/libmathCommon.A.dylib
__IMPORT a0a2e000-a0a2f000 [ 4K] r-x/rwx SM=COW /usr/lib/libgcc_s.1.dylib
__IMPORT a0a48000-a0a4a000 [ 8K] r-x/rwx SM=COW /usr/lib/libSystem.B.dylib
STACK GUARD bc000000-bf800000 [ 56.0M] ---/rwx SM=NUL

==== Writable regions for process 17706
__DATA 00014000-00016000 [ 8K] rw-/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3
__DATA 00016000-0003b000 [ 148K] rw-/rwx SM=PRV /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3
__IMPORT 0003b000-0003c000 [ 4K] rwx/rwx SM=COW /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3
MALLOC (freed?) 00046000-00047000 [ 4K] rw-/rwx SM=PRV
MALLOC_LARGE 00048000-000e7000 [ 636K] rw-/rwx SM=PRV DefaultMallocZone_0x100000
MALLOC_TINY 00100000-00200000 [ 1024K] rw-/rwx SM=PRV DefaultMallocZone_0x100000
MALLOC_SMALL 00800000-01000000 [ 8192K] rw-/rwx SM=PRV DefaultMallocZone_0x100000
__DATA 8fe2e000-8fe31000 [ 12K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 8fe31000-8fe67000 [ 216K] rw-/rwx SM=ZER /usr/lib/dyld
__DATA a021e000-a021f000 [ 4K] rw-/rw- SM=COW /usr/lib/libgcc_s.1.dylib
shared pmap a0400000-a04dd000 [ 884K] rw-/rwx SM=COW
__DATA a04dd000-a04de000 [ 4K] rw-/rwx SM=COW /usr/lib/system/libmathCommon.A.dylib
__DATA a04de000-a051d000 [ 252K] rw-/rwx SM=COW /usr/lib/libSystem.B.dylib
shared pmap a051d000-a0600000 [ 908K] rw-/rwx SM=COW
Stack bf800000-bfffd000 [ 8180K] rw-/rwx SM=ZER thread 0
Stack bfffd000-bfffe000 [ 4K] rw-/rwx SM=PRV
Stack bfffe000-bffff000 [ 4K] rw-/rwx SM=ZER
Stack bffff000-c0000000 [ 4K] rw-/rwx SM=COW

==== Legend
SM=sharing mode:
COW=copy_on_write PRV=private NUL=empty ALI=aliased
SHM=shared ZER=zero_filled S/A=shared_alias

==== Summary for process 17706
ReadOnly portion of Libraries: Total=5936K resident=5168K(87%) swapped_out_or_unallocated=768K(13%)
Writable regions: Total=18.0M written=88K(0%) resident=8296K(45%) swapped_out=0K(0%) unallocated=9.9M(55%)

REGION TYPE [ VIRTUAL]
=========== [ =======]
MALLOC [ 9856K]
STACK GUARD [ 56.0M]
Stack [ 8192K]
__DATA [ 644K]
__IMPORT [ 16K]
__LINKEDIT [ 4184K]
__PAGEZERO [ 4K]
__TEXT [ 1752K]
shared pmap [ 1792K]
---- end vmmap of "./bug3 segv" -----------------

Next is the output of the gdb session running "./bug3 overflow". This is the run that normally ends with a Stack_overflow exception.

---- begin gdb interaction with "./bug3 overflow" -----------------
dasher:bug$ gdb ./bug3
GNU gdb 6.3.50-20050815 (Apple version gdb-966) (Tue Mar 10 02:43:13 UTC 2009)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries ... done

(gdb) handle SIGSEGV stop nopass
Signal Stop Print Pass to program Description
SIGSEGV Yes Yes No Segmentation fault
(gdb) r overflow
Starting program: /Users/wakita/Dropbox/work/ocaml/bamodel/set/bug/bug3 overflow
Reading symbols for shared libraries ++. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xbf7ffffc
0x00001b9f in camlBug3__code_begin ()
(gdb) info reg
eax 0x3 3
ecx 0x68e94 429716
edx 0x68ea0 429728
ebx 0x1 1
esp 0xbf800000 0xbf800000
ebp 0xbffff498 0xbffff498
esi 0x3 3
edi 0x0 0
eip 0x1b9f 0x1b9f <camlBug3__code_begin+63>
eflags 0x10246 66118
cs 0x17 23
ss 0x1f 31
ds 0x1f 31
es 0x1f 31
fs 0x0 0
gs 0x37 55
(gdb) p/x system_stack_top
$1 = Value can't be converted to integer.
---- end gdb interaction with "./bug3 overflow" -----------------

The vmmap output for this session is identical to the previous output, except for the process ID.

vicuna commented Jul 29, 2009

Comment author: @mshinwell

Can you post the executable for bug3 itself? I assume it isn't too big, especially if compressed.

vicuna commented Jul 29, 2009

Comment author: kwakita

Yes, bug3.gz is the gzipped native-code executable built from bug3.ml.

vicuna commented Jul 30, 2009

Comment author: @mshinwell

I've reproduced the problem; give me a few days...

vicuna commented Jul 30, 2009

Comment author: kwakita

Thank you very much for your time and enormous effort!

vicuna commented Aug 10, 2009

Comment author: @mshinwell

I realized over the weekend what causes this. The difference in behaviour arises because the faulting instruction lies in Caml compiler-generated code in one case, but in the runtime (C-compiler generated) code in the other. The segfault will only be translated into a Stack_overflow exception in the former case. It isn't clear to me how this behaviour can be improved: if, for example, the fault happened in the middle of the GC, it might not be safe to raise an exception and continue running Caml code. Xavier? :)
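
The rule described above can be modelled in a few lines of OCaml. This is a simplified sketch, not the runtime's actual handler (which is written in C): the type and function names are invented for illustration, and treating every fault outside the stack guard as a plain SEGV is an assumption of the model.

```ocaml
(* Where the faulting instruction lies. *)
type fault_site =
  | Caml_code        (* code generated by ocamlopt *)
  | Runtime_c_code   (* C code of the runtime, e.g. compare_val *)

type outcome =
  | Raise_stack_overflow
  | Die_with_segv

(* Simplified model: a fault is turned into Stack_overflow only when it
   hits the stack guard AND the faulting instruction is in Caml code.
   Everything else is treated as a genuine segfault (an assumption). *)
let outcome ~fault_in_stack_guard site =
  match fault_in_stack_guard, site with
  | true, Caml_code -> Raise_stack_overflow
  | true, Runtime_c_code | false, _ -> Die_with_segv

let describe = function
  | Raise_stack_overflow -> "Stack_overflow exception"
  | Die_with_segv -> "killed by SIGSEGV"

let () =
  (* "./bug3 overflow": the fault is at camlBug3__code_begin+63. *)
  print_endline (describe (outcome ~fault_in_stack_guard:true Caml_code));
  (* "./bug3 segv": the fault is at compare_val+9, inside the C runtime. *)
  print_endline (describe (outcome ~fault_in_stack_guard:true Runtime_c_code))
```

Both runs of bug3 overflow the stack at the same address; only the location of the faulting instruction differs, and that alone decides the outcome in this model.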

vicuna commented Sep 16, 2009

Comment author: @xavierleroy

Mark Shinwell's analysis is correct. We can catch SEGV arising from stack overflows in Caml code reasonably well, but we cannot recover from a SEGV arising in the middle of C code. I'm afraid this is a "cannot fix" situation.
