corrupted final_table #6919

Closed
vicuna opened this issue Jun 27, 2015 · 4 comments

Comments

@vicuna

commented Jun 27, 2015

Original bug ID: 6919
Reporter: @ygrek
Status: closed (set by @damiendoligez on 2015-07-10T14:09:54Z)
Resolution: fixed
Priority: urgent
Severity: crash
Version: 4.02.2
Target version: 4.02.3+dev
Fixed in version: 4.02.3+dev
Category: runtime system and C interface
Tags: patch
Monitored by: @ygrek @dbuenzli @yakobowski

Bug description

We are experiencing strange crashes in the GC after switching from 4.02.1 to 4.02.2. I don't have a small repro case for now, so I cannot exclude misbehaving C bindings etc., but the code is stable with 4.02.1. Maybe you have a quick idea based on the symptoms.
My investigation led me to the following changeset :

444d6c2#diff-ff9cb580dcca5bf97a4e407aba803b81R260

AFAIU it changes the behaviour so that the final_table offsets are no longer updated after every minor collection. I do not know whether that is an important invariant.

Here are the details of my issue, in case they are of any use :

It crashes when calling functions from final_table. In my case the function is a Gc alarm registered by ocamlnet, but that alarm just sets one mutable variable, so it is not a suspect.

At the start of the program final_table looks alright, with one entry like this :

(gdb) ml_dump/r final_table 4
*0x1c3c300: Closure( camlGc__call_alarm_1056 , 0x3 )
*0x1c3c308: ( ( 1 ) , Closure( camlNetsys_win32__fun_2219 , 0x3 ) )
*0x1c3c310: NULL

*0x1c3c318: NULL

but at crash time it is obviously wrong :

(gdb) ml_dump final_table 4
*0x227e300: Closure( camlGc__call_alarm_1056 , 0x3 )
*0x227e308: u'Private_Dirty: 12 kB'
*0x227e310: NULL

*0x227e318: NULL

Instead of the "Private_Dirty" string it can be any OCaml value.

Stack trace looks like this :

(gdb) bt
#0 0x00000000005d8467 in camlGc__call_alarm_1056 () at gc.ml:87
#1 0x00000000006633ba in caml_start_program ()
#2 0x000000000065f3db in caml_gc_compaction ()
#3 0x00000000004acb71 in camlMemory__reclaim_s_1540 () at memory.ml:77
#4 0x00000000004acdc5 in camlMemory__reclaim_1555 () at memory.ml:92

When run with the debug runtime it fails on the assertion at line 163 of byterun/finalise.c :

void caml_final_do_strong_roots (scanning_action f)
{
  uintnat i;
  struct to_do *todo;

  Assert (old == young);

I would be very grateful for any pointers on how to debug this or on what further information to provide.

Steps to reproduce

None for now, but I can reproduce it locally in less than 5 minutes.

@vicuna

commented Jun 27, 2015

Comment author: @ygrek

This patch seems to fix it for me

diff --git a/byterun/minor_gc.c b/byterun/minor_gc.c
index 4aaec96..4db3f33 100644
--- a/byterun/minor_gc.c
+++ b/byterun/minor_gc.c
@@ -260,6 +260,10 @@ void caml_empty_minor_heap (void)
     caml_final_empty_young ();
     if (caml_minor_gc_end_hook != NULL) (*caml_minor_gc_end_hook) ();
   }
+  else
+  {
+    caml_final_empty_young ();
+  }
 #ifdef DEBUG
   {
     value *p;

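A minimal sketch of why the extra branch matters, written as a toy model rather than runtime code. It assumes that the enclosing `if` (not shown in the hunk) skips the collection work when the minor heap is empty, and that `caml_final_empty_young` simply marks every registered entry as old. All `toy_` names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins for the bookkeeping of the
   finalisation table, not the runtime's actual data structure. */
struct toy_final_table {
  size_t old;    /* entries [0, old) are treated as already promoted */
  size_t young;  /* entries [old, young) were registered since then */
};

/* Registering a finaliser appends one entry to the young region. */
static void toy_register(struct toy_final_table *t) { t->young++; }

/* Stand-in for caml_final_empty_young: the young region becomes empty. */
static void toy_empty_young(struct toy_final_table *t) { t->old = t->young; }

/* Model of a minor collection. minor_heap_empty selects the path that
   skips the collection work; patched selects whether that path still
   empties the young region, as the added else branch does. */
static void toy_minor_collection(struct toy_final_table *t,
                                 int minor_heap_empty, int patched)
{
  if (!minor_heap_empty) {
    /* ... the real collection work would happen here ... */
    toy_empty_young(t);
  } else if (patched) {
    toy_empty_young(t);
  }
}

/* The condition that Assert (old == young) checks before a major scan. */
static int toy_invariant_holds(const struct toy_final_table *t)
{
  return t->old == t->young;
}
```

In this model, registering one finaliser and then running `toy_minor_collection` with an empty minor heap leaves `old != young` when `patched` is 0, which corresponds to the failing assertion. With `patched` set to 1 the two indices agree again.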
@vicuna

commented Jun 29, 2015

Comment author: @damiendoligez

Thanks for the report. I think you've nailed it, so you shouldn't spend time on a repro case.

@vicuna

commented Jul 1, 2015

Comment author: @edwintorok

FWIW I just ran into this, with various symptoms: the application crashing in the pthread_cancel unwinder on exit, a segfault after fork when Lwt is built with libev but not when built without, or a segfault after fork when using OpenSSL from Lwt even without libev. See ocsigen/lwt#168

I've created the testcase below before finding this bug (indeed from OCamlnet's Netsys_pollset_win32.ml), and I confirm that the patch fixes both the testcase and the segfaults in my application:

let x = ref false
let _ = Gc.create_alarm (fun () -> x := true)
let () =
  Gc.compact ();
  Gc.compact ()

(* ocamlc x.ml -runtime-variant d -o x && ./x
...
file finalise.c; line 163 ### Assertion failed: old == young *)

@vicuna

commented Jul 10, 2015

Comment author: @damiendoligez

Thanks for the report, the fix, and the test case.

Fixed in 4.02 branch (rev 16197).

@vicuna vicuna closed this Jul 10, 2015

@vicuna vicuna added this to the 4.02.3 milestone Mar 14, 2019

@vicuna vicuna added the bug label Mar 20, 2019
