Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coq (coqtop.byte) hangs when run from ocamldebug #7609

Closed
vicuna opened this issue Aug 18, 2017 · 7 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Aug 18, 2017

Original bug ID: 7609
Reporter: @psteckler
Assigned to: @xavierleroy
Status: resolved (set by @xavierleroy on 2017-09-23T16:12:24Z)
Resolution: fixed
Priority: normal
Severity: major
Platform: PC
OS: Linux Mint x64 (Cinnamon)
OS Version: 18
Version: 4.04.0
Target version: 4.06.0 +dev/beta1/beta2/rc1
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: tools (ocaml{lex,yacc,dep,debug,...})

Bug description

A recent change in the Coq sources results in coqtop.byte to hang when run from ocamldebug. The startup banner appears, then ... nothing. For me, this behavior happens 100% of the time. Running coqtop.byte from the command line works fine.

The Coq sources are (of course) at:

https://github.com/coq/coq

A recent commit that shows the behavior is 16b0b83. A git bisect reveals that the behavior started with commit c3ca30ca41.

I provided a (yet-unmerged) PR to restore the old behavior at coq/coq#977.

All that the PR code does is run a Gc.minor and Gc.allocated_bytes at the toplevel of an OCaml module. Why that makes a difference isn't clear.

Maybe this is a bug in Coq (perhaps some C code is corrupting the heap in a way that a collection masks), but maybe it's something about how ocamldebug runs the byte code.

Perhaps the OCaml team has some insight.

Steps to reproduce

On Linux x64, clone the Coq repo (probably the bug shows up using the tip, but you could reset to the commit mentioned above).

In the coq directory:

./configure -local
make -j && make -j byte
ocamldebug ./bin/coqtop.byte -coqlib pwd

If the bug is present, you'll see a startup banner, and the prompt will not appear.

@vicuna

This comment has been minimized.

Copy link
Author

commented Aug 18, 2017

Comment author: @psteckler

I tried with OCaml 4.05.0 from OPAM, and I get the same hanging behavior.

Note: In my shell, I have "OCAMLRUNPARAM=b". After filing this bug report, I discovered that if I unset this environment variable, the hanging problem disappears.

@vicuna

This comment has been minimized.

Copy link
Author

commented Sep 20, 2017

Comment author: @xavierleroy

I was able to reproduce the hanging behavior with 4.05.0. ocamlrun loops in caml_flush because the channel is ill-formed and has channel->curr < channel->buff, hence caml_flush_partial makes no progress and returns false. Smells like memory corruption. However the issue is very hard to reproduce with ocamlrund running under gdb.

@vicuna

This comment has been minimized.

Copy link
Author

commented Sep 20, 2017

Comment author: @psteckler

Are there any experiments we can run to learn where the heap becomes corrupted?

@vicuna

This comment has been minimized.

Copy link
Author

commented Sep 20, 2017

Comment author: @xavierleroy

I was able to reproduce with ocamlrund under gdb. It's a use-after-free of the "dbg_out" channel, where the free() is caused by finalization after Pervasives.flush_all is called. I'll come up with a fix in a few days.

@vicuna

This comment has been minimized.

Copy link
Author

commented Sep 21, 2017

Comment author: @xavierleroy

The full scenario for the memory corruption:

  • Debugged program starts, opens an out channel "dbg_out" to send data to ocamldebug via the debugging socket.
  • Channel is created by caml_open_descriptor_in with refcount = 0 and is inserted in the doubly-linked list caml_all_opened_channels.
  • Later, debugged program calls Pervasives.flush_all, which calls caml_ml_out_channels_list.
  • caml_ml_out_channels_list wraps dbg_out in a Caml custom object of type "out_channel". dbg_out->refcount is increased to 1.
  • Later, a GC finalizes this Caml custom object. dbg_out->refcount is decreased to 0 and the dbg_out structure is freed via caml_stat_free().
  • Subsequent interactions with ocamldebug access the now-freed dbg_out structure. Disaster ensues.
@vicuna

This comment has been minimized.

Copy link
Author

commented Sep 21, 2017

Comment author: @xavierleroy

Fix is on its way: #1361

@vicuna

This comment has been minimized.

Copy link
Author

commented Sep 23, 2017

Comment author: @xavierleroy

Fixed in the forthcoming 4.06 release and in trunk.

@vicuna vicuna closed this Sep 23, 2017

@vicuna vicuna added the tools label Mar 14, 2019

@vicuna vicuna added this to the 4.06.0 milestone Mar 14, 2019

@vicuna vicuna added the bug label Mar 20, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.