corrupted final_table #6919

@vicuna

Original bug ID: 6919
Reporter: @ygrek
Status: closed (set by @damiendoligez on 2015-07-10T14:09:54Z)
Resolution: fixed
Priority: urgent
Severity: crash
Version: 4.02.2
Target version: 4.02.3+dev
Fixed in version: 4.02.3+dev
Category: runtime system and C interface
Tags: patch
Monitored by: @ygrek @dbuenzli @yakobowski

Bug description

We have been seeing strange crashes in the GC since switching from 4.02.1 to 4.02.2. I don't have a small repro case yet, so I cannot rule out misbehaving C bindings and the like, although the same code is stable with 4.02.1. Maybe you will have a quick idea based on the symptoms.
My investigation led me to the following changeset:

444d6c2#diff-ff9cb580dcca5bf97a4e407aba803b81R260

AFAIU it changes behaviour such that the final_table offsets are no longer updated after every minor collection. I do not know whether that is an important invariant.

Here are the details of my issue, in case they are of any use:

The crash happens when calling functions from final_table. In my case it is a Gc alarm registered by ocamlnet, but that alarm just sets one mutable variable, so it is not a suspect.

At the start of the program, final_table looks fine, with one entry like this:

(gdb) ml_dump/r final_table 4
*0x1c3c300: Closure( camlGc__call_alarm_1056 , 0x3 )
*0x1c3c308: ( ( 1 ) , Closure( camlNetsys_win32__fun_2219 , 0x3 ) )
*0x1c3c310: NULL

*0x1c3c318: NULL

but at crash time it is obviously wrong:

(gdb) ml_dump final_table 4
*0x227e300: Closure( camlGc__call_alarm_1056 , 0x3 )
*0x227e308: u'Private_Dirty: 12 kB'
*0x227e310: NULL

*0x227e318: NULL

Instead of the "Private_Dirty" string, it can be any OCaml value.

The stack trace looks like this:

(gdb) bt
#0 0x00000000005d8467 in camlGc__call_alarm_1056 () at gc.ml:87
#1 0x00000000006633ba in caml_start_program ()
#2 0x000000000065f3db in caml_gc_compaction ()
#3 0x00000000004acb71 in camlMemory__reclaim_s_1540 () at memory.ml:77
#4 0x00000000004acdc5 in camlMemory__reclaim_1555 () at memory.ml:92

When run with the debug runtime, it fails on the assertion on line 163 of byterun/finalize.c:

void caml_final_do_strong_roots (scanning_action f)
{
  uintnat i;
  struct to_do *todo;

  Assert (old == young);
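
The suspicion about the changeset and the failing assertion can be illustrated with a toy model. The sketch below is not the runtime's code. It assumes a finalisation table whose entries in the range [old, young) may still point into the minor heap and must be redirected during each minor collection, after which old is set equal to young. All names (`ref`, `old_n`, `young_n`, `minor_collection`, `demo`, ...) are hypothetical, and OCaml values are reduced to plain ints.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: NOT the code from byterun/finalize.c. */
#define MINOR_WORDS 4
#define MAJOR_WORDS 16
#define TABLE_SIZE  8

/* Hypothetical "pointer": which heap it refers to, and the slot index. */
typedef struct { int is_young; size_t idx; } ref;

static int minor_heap[MINOR_WORDS], major_heap[MAJOR_WORDS];
static size_t minor_used, major_used;

/* Entries [0, old_n) are already known to be outside the minor heap;
   entries [old_n, young_n) were registered since the last minor GC. */
static ref final_table[TABLE_SIZE];
static size_t old_n, young_n;

static void reset(void) { minor_used = major_used = old_n = young_n = 0; }

static ref alloc_minor(int v) {
  assert(minor_used < MINOR_WORDS);
  minor_heap[minor_used] = v;
  return (ref){1, minor_used++};
}

static void register_final(ref r) {
  assert(young_n < TABLE_SIZE);
  final_table[young_n++] = r;
}

static int deref(ref r) {
  return r.is_young ? minor_heap[r.idx] : major_heap[r.idx];
}

/* Minor collection. If update_final_table is 0, the step that fixes up
   the recent entries is skipped (the behaviour suspected in the report). */
static void minor_collection(int update_final_table) {
  if (update_final_table) {
    for (size_t i = old_n; i < young_n; i++) {
      if (final_table[i].is_young) {
        assert(major_used < MAJOR_WORDS);
        major_heap[major_used] = minor_heap[final_table[i].idx];
        final_table[i] = (ref){0, major_used++};
      }
    }
    old_n = young_n;  /* restores the invariant old == young */
  }
  minor_used = 0;     /* minor heap is free for reuse */
}

/* Register one value, collect, allocate again, then read the entry back. */
static int demo(int update_final_table) {
  reset();
  register_final(alloc_minor(42));
  minor_collection(update_final_table);
  alloc_minor(7);     /* reuses the minor-heap slot the entry pointed at */
  return deref(final_table[0]);
}
```

In this model `demo(1)` reads back the registered value 42 and leaves `old_n == young_n`, whereas `demo(0)` reads back 7, an unrelated value that happened to be allocated later, and leaves `old_n != young_n`. That is the same shape as the symptoms above: a table slot that holds an arbitrary value, and a failed `old == young` check.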

I would be very grateful for any pointers on how to debug this, or on what further information to provide.

Steps to reproduce

None for now, but I can reproduce it locally in less than 5 minutes.
