New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC latency improvements #297

Merged
merged 13 commits into from Dec 21, 2015

Conversation

Projects
None yet
@damiendoligez
Member

damiendoligez commented Nov 17, 2015

This patch introduces several changes to the GC, mostly aimed at reducing worst-case latency:

  1. When a GC cycle starts, the global roots must be marked. This is now done incrementally. (asmrun/roots.c -- native-code only because byte-code doesn't have this problem, it has only one global root).
  2. The minor GC and major GC slice are separated and the major slice is run at the mid-point between minor collections (i.e. when the minor heap is half-full).
  3. Major GC slice size is now (under user control) amortized over several minor collections instead of varying directly with the promotion rate.
  4. Large arrays are now marked incrementally instead of blocking the program for a long time.
  5. Allocation of major heap chunks is (optionally) done with mmap using the HUGE_PAGES feature to lower the TLB load on large heap sizes.

minor improvements:

  • better control of GC verbosity in debug runtime
  • added a runtime variant with GC performance instrumentation
  • missing include in instrtrace.c
  • add an inline function for adding an entry to a ref_table
  • add functions to stdlib modules Gc and Sys to control the new features
  • better display of GC stats
@kit-ty-kate

This comment has been minimized.

Show comment
Hide comment
@kit-ty-kate

kit-ty-kate Nov 17, 2015

Have you tried to squash all the commits, stash the resulting commit, pull from trunk and apply the stashed changes ? Maybe it can work.

kit-ty-kate commented Nov 17, 2015

Have you tried to squash all the commits, stash the resulting commit, pull from trunk and apply the stashed changes ? Maybe it can work.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Nov 19, 2015

Member

We discussed this large change at the developer meeting yesterday, and decided to include it for 4.03 -- the latency improvement are rather impressive. Below is a quick summary of the salient points of the discussion.

The flambda people in the room ( @chambart and @mshinwell ) seemed nervous at the mention that the work on incremental marking of GC roots relied on their immutability, so they may want to check the details with Damien.

One thing that was forgotten in the above description is the incremental marking of large arrays.

One other thing that is unclear in the above description is the status of the mmap tricks. My understanding is that they are disabled by default, and can be enabled (but is it a configure-time flag or a program-runtime flag?).

There was discussion of the interaction with @mshinwell 's more ambitious no-naked-pointers work in #203 . One idea mentioned by Damien was to use the mmap trick in addition to no-naked-pointers, to serve as a "debug mode" making it easy to check that the naked accesses are indeed in the heap (but maybe just keeping the pagetable around for this debug mode would suffice?).

Member

gasche commented Nov 19, 2015

We discussed this large change at the developer meeting yesterday, and decided to include it for 4.03 -- the latency improvement are rather impressive. Below is a quick summary of the salient points of the discussion.

The flambda people in the room ( @chambart and @mshinwell ) seemed nervous at the mention that the work on incremental marking of GC roots relied on their immutability, so they may want to check the details with Damien.

One thing that was forgotten in the above description is the incremental marking of large arrays.

One other thing that is unclear in the above description is the status of the mmap tricks. My understanding is that they are disabled by default, and can be enabled (but is it a configure-time flag or a program-runtime flag?).

There was discussion of the interaction with @mshinwell 's more ambitious no-naked-pointers work in #203 . One idea mentioned by Damien was to use the mmap trick in addition to no-naked-pointers, to serve as a "debug mode" making it easy to check that the naked accesses are indeed in the heap (but maybe just keeping the pagetable around for this debug mode would suffice?).

@gasche gasche referenced this pull request Nov 19, 2015

Merged

Add Ephemerons to OCaml #22

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Nov 23, 2015

Member

I have updated the patch to turn it into a single commit, and to remove the mmap code. (BTW the mmap option was neither configure-time nor run-time, but compile-time). The consensus was that no-naked-pointers is the right thing to do, even if the migration path is more dangerous.

Indeed, the mmap trick is not needed to get a debug mode for no-naked-pointers.

I have updated the description above to reflect the current state of the patch (and add the missing bit about large arrays).

Member

damiendoligez commented Nov 23, 2015

I have updated the patch to turn it into a single commit, and to remove the mmap code. (BTW the mmap option was neither configure-time nor run-time, but compile-time). The consensus was that no-naked-pointers is the right thing to do, even if the migration path is more dangerous.

Indeed, the mmap trick is not needed to get a debug mode for no-naked-pointers.

I have updated the description above to reflect the current state of the patch (and add the missing bit about large arrays).

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Nov 23, 2015

Member

@bobot (and everyone else who would volunteer)
This is now ready for review.

Member

damiendoligez commented Nov 23, 2015

@bobot (and everyone else who would volunteer)
This is now ready for review.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Nov 23, 2015

Contributor

@gasche Nitpick on no-naked-pointers: despite the name (which should maybe be fixed), pointers outside the heap are permitted. It's just that they need to point at something that looks like an OCaml value.

Contributor

mshinwell commented Nov 23, 2015

@gasche Nitpick on no-naked-pointers: despite the name (which should maybe be fixed), pointers outside the heap are permitted. It's just that they need to point at something that looks like an OCaml value.

@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Nov 23, 2015

Contributor

It should be a black value, no?

Contributor

bobot commented Nov 23, 2015

It should be a black value, no?

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Nov 24, 2015

Member

The complete no-naked-pointers rule is that every pointer to outside the heap must be at least one of the following:

  • odd (i.e. look like an int)
  • stored in a closure
  • to something that has a black header (i.e. the word before the pointed thing must be readable and have the right values in its two color bits)
Member

damiendoligez commented Nov 24, 2015

The complete no-naked-pointers rule is that every pointer to outside the heap must be at least one of the following:

  • odd (i.e. look like an int)
  • stored in a closure
  • to something that has a black header (i.e. the word before the pointed thing must be readable and have the right values in its two color bits)
@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Nov 25, 2015

Contributor

As I have mentioned to @damiendoligez in private email, I suspect there is a bug in this patch that causes it to fail when merged with the flambda branch. It's probably to do with the multiple roots thing (GPR #262).

Contributor

mshinwell commented Nov 25, 2015

As I have mentioned to @damiendoligez in private email, I suspect there is a bug in this patch that causes it to fail when merged with the flambda branch. It's probably to do with the multiple roots thing (GPR #262).

Show outdated Hide outdated byterun/caml/minor_gc.h
@@ -38,6 +38,7 @@ type control = {
mutable max_overhead : int;
mutable stack_limit : int;
mutable allocation_policy : int;
window_size : int;

This comment has been minimized.

@bobot

bobot Nov 30, 2015

Contributor

Why this one is not mutable as the others?

@bobot

bobot Nov 30, 2015

Contributor

Why this one is not mutable as the others?

This comment has been minimized.

@damiendoligez

damiendoligez Dec 2, 2015

Member

The others are mutable because when this record was introduced, the record update feature { r with lbl = e } didn't exist. You would get, mutate the result, and use it to set. Now you simply set {(get()) with window_size = 5}.

If I could add a "deprecated" annotation on these mutable flags, I would. Since it is deprecated, I see no reason to duplicate it in new fields. I can change it if you think it's too weird.

@damiendoligez

damiendoligez Dec 2, 2015

Member

The others are mutable because when this record was introduced, the record update feature { r with lbl = e } didn't exist. You would get, mutate the result, and use it to set. Now you simply set {(get()) with window_size = 5}.

If I could add a "deprecated" annotation on these mutable flags, I would. Since it is deprecated, I see no reason to duplicate it in new fields. I can change it if you think it's too weird.

This comment has been minimized.

@alainfrisch

alainfrisch Dec 2, 2015

Contributor

If I could add a "deprecated" annotation on these mutable flags

This is tempting! That would probably be an annotation on the field (e.g. [@deprecated_mutable]).

@alainfrisch

alainfrisch Dec 2, 2015

Contributor

If I could add a "deprecated" annotation on these mutable flags

This is tempting! That would probably be an annotation on the field (e.g. [@deprecated_mutable]).

This comment has been minimized.

@alainfrisch

alainfrisch Dec 2, 2015

Contributor

Done! One can now write:

type control = {
   mutable minor_heap_size : int
       [@@mutable_deprecated "Use {ctrl with minor_heap_size = ...} instead."];
   ...

I didn't actually do the change in Gc for now.

@alainfrisch

alainfrisch Dec 2, 2015

Contributor

Done! One can now write:

type control = {
   mutable minor_heap_size : int
       [@@mutable_deprecated "Use {ctrl with minor_heap_size = ...} instead."];
   ...

I didn't actually do the change in Gc for now.

("a=%d,b=%s,H=%lu,i=%lu,l=%lu,o=%lu,O=%lu,p=%d,s=%lu,t=%d,v=%lu,w=%d,W=%lu",
/* a */ caml_allocation_policy,
/* b */ caml_backtrace_active,
/* h */ /* missing */ /* FIXME add when changed to min_heap_size */

This comment has been minimized.

@bobot

bobot Nov 30, 2015

Contributor

Is this FIXME still true?

@bobot

bobot Nov 30, 2015

Contributor

Is this FIXME still true?

This comment has been minimized.

@damiendoligez

damiendoligez Dec 2, 2015

Member

Yes. It's another GC feature that I'll introduce later.

@damiendoligez

damiendoligez Dec 2, 2015

Member

Yes. It's another GC feature that I'll introduce later.

@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Dec 1, 2015

Contributor

I'm adding comments during the review that you could perhaps take if they are interesting. You can find them in my fork branch trunk-gc-patches: eg bobot/ocaml@2023b60

Contributor

bobot commented Dec 1, 2015

I'm adding comments during the review that you could perhaps take if they are interesting. You can find them in my fork branch trunk-gc-patches: eg bobot/ocaml@2023b60

@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Dec 1, 2015

Contributor

The travis build failed https://travis-ci.org/bobot/ocaml/builds/94135050, perhaps because HAS_HUGE_PAGES is activated there.

Contributor

bobot commented Dec 1, 2015

The travis build failed https://travis-ci.org/bobot/ocaml/builds/94135050, perhaps because HAS_HUGE_PAGES is activated there.

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Dec 4, 2015

Member

I've made changes according to the remarks of the reviewers, and fixed a nasty bug in the root scanning (triggered by testing with flambda).

Following a discussion with @xavierleroy I'm going to remove the change to the protocol for calling the GC.

Member

damiendoligez commented Dec 4, 2015

I've made changes according to the remarks of the reviewers, and fixed a nasty bug in the root scanning (triggered by testing with flambda).

Following a discussion with @xavierleroy I'm going to remove the change to the protocol for calling the GC.

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Dec 4, 2015

Member

@bobot could you please review this version with particular attention to asmrun/roots.c? It's quite tricky.

Member

damiendoligez commented Dec 4, 2015

@bobot could you please review this version with particular attention to asmrun/roots.c? It's quite tricky.

Show outdated Hide outdated asmrun/roots.c
@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Dec 10, 2015

Contributor

I tried to review this code in asmrun/roots.c and it is indeed very tricky. Instead of three for loops and three complicated while loops could we use non-natural for loops that feel a lot more natural:
df2d87c

intnat caml_darken_all_roots_slice (intnat work)
{
  mlsize_t i, j;
  value * glob;
  intnat work_done = 0;
  CAML_INSTR_SETUP (tmr, "");

  if(incr_roots_to_resume){
    goto resume;
  }
  /* otherwise we just start darkening */
  incr_roots_count = 0;

  /* The global roots */
  for (i = 0; caml_globals[i] != 0; i++) {
    for(glob = caml_globals[i]; *glob != 0; glob++) {
      for (j = 0; j < Wosize_val(*glob); j++){
        caml_darken (Field (*glob, j), &Field (*glob, j));
        ++work_done;
        if(work <= work_done){
          /** save the state of the loop */
          incr_roots_i = i;
          incr_roots_glob = glob;
          incr_roots_j = j;
          incr_roots_count += work_done;
          incr_roots_to_resume = 1;
          goto finished;
        resume:
          i = incr_roots_i;
          glob = incr_roots_glob;
          j = incr_roots_j;
        }
      }
    }
  }

  /** We finished before the maximum amount of work is reached, so the
      darkening of all roots is done */
  caml_incremental_roots_count = incr_roots_count + work_done;
  incr_roots_to_resume = 0; /** not needed since set during start but clearer */

finished:
  /** In every case */
  CAML_INSTR_TIME (tmr, "major/mark/global_roots_slice");
  return work_done;
}

Do you have a test for the fix in def7f4d ?

EDITED: There was a bug because I assigned -1 to a mlsize_t. I don't understand why caml_do_roots define int i,j; instead of mlsize_t i,j;. Now all the tests pass except the same than without this commit (tests/warnings/deprecated_module*.ml)

Contributor

bobot commented Dec 10, 2015

I tried to review this code in asmrun/roots.c and it is indeed very tricky. Instead of three for loops and three complicated while loops could we use non-natural for loops that feel a lot more natural:
df2d87c

intnat caml_darken_all_roots_slice (intnat work)
{
  mlsize_t i, j;
  value * glob;
  intnat work_done = 0;
  CAML_INSTR_SETUP (tmr, "");

  if(incr_roots_to_resume){
    goto resume;
  }
  /* otherwise we just start darkening */
  incr_roots_count = 0;

  /* The global roots */
  for (i = 0; caml_globals[i] != 0; i++) {
    for(glob = caml_globals[i]; *glob != 0; glob++) {
      for (j = 0; j < Wosize_val(*glob); j++){
        caml_darken (Field (*glob, j), &Field (*glob, j));
        ++work_done;
        if(work <= work_done){
          /** save the state of the loop */
          incr_roots_i = i;
          incr_roots_glob = glob;
          incr_roots_j = j;
          incr_roots_count += work_done;
          incr_roots_to_resume = 1;
          goto finished;
        resume:
          i = incr_roots_i;
          glob = incr_roots_glob;
          j = incr_roots_j;
        }
      }
    }
  }

  /** We finished before the maximum amount of work is reached, so the
      darkening of all roots is done */
  caml_incremental_roots_count = incr_roots_count + work_done;
  incr_roots_to_resume = 0; /** not needed since set during start but clearer */

finished:
  /** In every case */
  CAML_INSTR_TIME (tmr, "major/mark/global_roots_slice");
  return work_done;
}

Do you have a test for the fix in def7f4d ?

EDITED: There was a bug because I assigned -1 to a mlsize_t. I don't understand why caml_do_roots define int i,j; instead of mlsize_t i,j;. Now all the tests pass except the same than without this commit (tests/warnings/deprecated_module*.ml)

Show outdated Hide outdated byterun/compact.c
@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Dec 10, 2015

Contributor

For asmrun/roots.c I'm confident in the simplified version in my fork branch trunk-gc-patches.

Contributor

bobot commented Dec 10, 2015

For asmrun/roots.c I'm confident in the simplified version in my fork branch trunk-gc-patches.

@damiendoligez damiendoligez self-assigned this Dec 15, 2015

@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Dec 17, 2015

Contributor

I think the remaining part of the merge request is fine.

Contributor

bobot commented Dec 17, 2015

I think the remaining part of the merge request is fine.

@damiendoligez damiendoligez added this to the 4.03.0 milestone Dec 17, 2015

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Dec 17, 2015

Member

Do you have a test for the fix in def7f4d ?

It's a complete install of the compiler with flambda merged in... not really usable as a regression test.

Member

damiendoligez commented Dec 17, 2015

Do you have a test for the fix in def7f4d ?

It's a complete install of the compiler with flambda merged in... not really usable as a regression test.

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Dec 17, 2015

Member

Updated with @bobot's suggestions and fixes. If anyone else wants to review, now's the time.

Member

damiendoligez commented Dec 17, 2015

Updated with @bobot's suggestions and fixes. If anyone else wants to review, now's the time.

damiendoligez added a commit that referenced this pull request Dec 21, 2015

@damiendoligez damiendoligez merged commit 423ddc8 into ocaml:trunk Dec 21, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Dec 31, 2015

Contributor

@damiendoligez At posteriori, I'm wondering why you decrement work by Whsize_wosize(size); instead of by 1. Marking this No_scan_tag does not require any work, and so if you have many No_scan_tag the incremental marking will be a lot faster than normal. So that improve latency but not regularity. I though work could be seen at some measure of very rough cpu cycle, do you prefer it to be a measure of the quantity of memory handled?

Contributor

bobot commented on byterun/major_gc.c in 0225ca0 Dec 31, 2015

@damiendoligez At posteriori, I'm wondering why you decrement work by Whsize_wosize(size); instead of by 1. Marking this No_scan_tag does not require any work, and so if you have many No_scan_tag the incremental marking will be a lot faster than normal. So that improve latency but not regularity. I though work could be seen at some measure of very rough cpu cycle, do you prefer it to be a measure of the quantity of memory handled?

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Mar 21, 2016

Member

The idea is that it has to be a measure of memory, because otherwise we don't know the total amount of work to do.

Member

damiendoligez replied Mar 21, 2016

The idea is that it has to be a measure of memory, because otherwise we don't know the total amount of work to do.

@mrvn

This comment has been minimized.

Show comment
Hide comment
@mrvn

mrvn Mar 24, 2016

Contributor
  1. If no naked pointer are allowed then where is the helper function to allocate a chunk of memory outside the ocaml heap with a suitable black colored header? Something like

CAMLextern void * caml_alloc_unmovable(mlsize_t); /* allocate a block of memory outside the ocaml heap suitable for naked C/C++ objects /
CAMLextern void caml_free_unmovale(void *); /
free block allocated by caml_alloc_unmovable */

  1. I frequently have C objects that also need to hold some values. At the moment one has to allocate an ocaml block to hold the values and malloc for the C object. Without naked pointers now one further needs a second ocaml block with No_scan_tag (or the alloc from 1). Instead couldn't we have a single block outside the ocaml heap containing the values at the start and the C structure at the end where the Gc scans only the begining?

/* allocate a block of memory outside the ocaml heap with mixed content

  • Allocate a block with tag containing ocaml values that
  • will be scanned by the Gc if reachable, followed by <no_scan> bytes of
  • of memory that will be ignored by the Gc. The block will never be moved
  • by the Gc and the no_scan portion can be accessed without holding the
  • runtime lock. The values portion must only be accessed while holding the
  • runtime lock.
    */
    CAMLextern void * caml_alloc_unmovable(mlsize_t values, tag_t tag, mlsize_t no_scan);
  1. Alternatively maybe the 'struct custom_operations' could be extended to include a field saying how many words to scan in the custom block. That way one could allocate single blocks containing both ocaml values and raw data and still have them movable by the Gc.
Contributor

mrvn commented Mar 24, 2016

  1. If no naked pointer are allowed then where is the helper function to allocate a chunk of memory outside the ocaml heap with a suitable black colored header? Something like

CAMLextern void * caml_alloc_unmovable(mlsize_t); /* allocate a block of memory outside the ocaml heap suitable for naked C/C++ objects /
CAMLextern void caml_free_unmovale(void *); /
free block allocated by caml_alloc_unmovable */

  1. I frequently have C objects that also need to hold some values. At the moment one has to allocate an ocaml block to hold the values and malloc for the C object. Without naked pointers now one further needs a second ocaml block with No_scan_tag (or the alloc from 1). Instead couldn't we have a single block outside the ocaml heap containing the values at the start and the C structure at the end where the Gc scans only the begining?

/* allocate a block of memory outside the ocaml heap with mixed content

  • Allocate a block with tag containing ocaml values that
  • will be scanned by the Gc if reachable, followed by <no_scan> bytes of
  • of memory that will be ignored by the Gc. The block will never be moved
  • by the Gc and the no_scan portion can be accessed without holding the
  • runtime lock. The values portion must only be accessed while holding the
  • runtime lock.
    */
    CAMLextern void * caml_alloc_unmovable(mlsize_t values, tag_t tag, mlsize_t no_scan);
  1. Alternatively maybe the 'struct custom_operations' could be extended to include a field saying how many words to scan in the custom block. That way one could allocate single blocks containing both ocaml values and raw data and still have them movable by the Gc.
@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Mar 24, 2016

Member

I think (1) is a good idea. cc @mshinwell

For (2): if it's outside the heap and points into the heap, then it has to be a root. It would probably not reasonable to have too many such blocks. We don't currently have a data structure to register blocks of roots, so this option would need big changes to globroots.c.

(3) looks like a good idea, but we have to be careful to remain compatible with C code that uses the current version.

Member

damiendoligez commented Mar 24, 2016

I think (1) is a good idea. cc @mshinwell

For (2): if it's outside the heap and points into the heap, then it has to be a root. It would probably not reasonable to have too many such blocks. We don't currently have a data structure to register blocks of roots, so this option would need big changes to globroots.c.

(3) looks like a good idea, but we have to be careful to remain compatible with C code that uses the current version.

@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Mar 24, 2016

Contributor

An alternative version of 3) is that a custom_operations contains a function pointer that will iterate on the value in the custom block. It is more general and could allow lots more other uses.

Contributor

bobot commented Mar 24, 2016

An alternative version of 3) is that a custom_operations contains a function pointer that will iterate on the value in the custom block. It is more general and could allow lots more other uses.

@let-def

This comment has been minimized.

Show comment
Hide comment
@let-def

let-def Mar 24, 2016

Contributor

Hijacking a bit the conversation, but related to C-FFI and GC:

When dealing with C-FFI needing a back-reference to OCaml values, I felt that a special area in the OCaml heap that is guaranteed to be non-compacted (and as such never moved) would help a lot.
This area would behave exactly like the major heap (for collection), but without compaction. Values would be allocated via a specific function.

The recent PR 7162 reminded me of this. Exchanging the idea with @garrigue, he thought this could globally benefit LablGTK (not having to disable compaction for the whole program).
@yminsky also suggested this could enable working on some OCaml values from C-FFI without locking the runtime (e.g. having C-like arrays / bigstrings in the heap).

@damiendoligez do you think this is reasonable?

Contributor

let-def commented Mar 24, 2016

Hijacking a bit the conversation, but related to C-FFI and GC:

When dealing with C-FFI needing a back-reference to OCaml values, I felt that a special area in the OCaml heap that is guaranteed to be non-compacted (and as such never moved) would help a lot.
This area would behave exactly like the major heap (for collection), but without compaction. Values would be allocated via a specific function.

The recent PR 7162 reminded me of this. Exchanging the idea with @garrigue, he thought this could globally benefit LablGTK (not having to disable compaction for the whole program).
@yminsky also suggested this could enable working on some OCaml values from C-FFI without locking the runtime (e.g. having C-like arrays / bigstrings in the heap).

@damiendoligez do you think this is reasonable?

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Mar 25, 2016

Member

@bobot I've thought of that but won't it be too slow?

@def-lkb It looks doable although it'll need a fair amount of work. Could you open a feature request on Mantis?

Member

damiendoligez commented Mar 25, 2016

@bobot I've thought of that but won't it be too slow?

@def-lkb It looks doable although it'll need a fair amount of work. Could you open a feature request on Mantis?

@jhjourdan

This comment has been minimized.

Show comment
Hide comment
@jhjourdan

jhjourdan Mar 25, 2016

Contributor

@damiendoligez Instead of a general function, we could have a bitset describing which words are pointers and which are raw data.

Contributor

jhjourdan commented Mar 25, 2016

@damiendoligez Instead of a general function, we could have a bitset describing which words are pointers and which are raw data.

@mrvn

This comment has been minimized.

Show comment
Hide comment
@mrvn

mrvn Mar 25, 2016

Contributor

@damiendoligez For 2) allocate a chunk of memory, put a block with N values at the start, add one value pointing at itself and register it as root, remaining space is for C use. The free function would unregister the root and free the block. It's what everyone already does now except that the allocations get merged into one block and the registering is automatic.

@def-lkb I'm not sure a special unmovable heap would be that beneficial. You could only use it from C and with values a module allocates itself because the type system doesn't know about unmovables. And you still have to register a root to keep the value alive. You might as well just allocate the object fully outside the ocaml heap (there should be helper functions for this creating the block header). With the no-naked-pointer change I think that would do exactly what you want without ocaml having a special unmovable heap. Secondly accessing such a block without the runtime lock would only work if all values in the block only point to other unmovable objects, which basically rules out mutable fields. You could only do that in a verry controlled structure that you set up with a deep copy of all fields.

Contributor

mrvn commented Mar 25, 2016

@damiendoligez For 2) allocate a chunk of memory, put a block with N values at the start, add one value pointing at itself and register it as root, remaining space is for C use. The free function would unregister the root and free the block. It's what everyone already does now except that the allocations get merged into one block and the registering is automatic.

@def-lkb I'm not sure a special unmovable heap would be that beneficial. You could only use it from C and with values a module allocates itself because the type system doesn't know about unmovables. And you still have to register a root to keep the value alive. You might as well just allocate the object fully outside the ocaml heap (there should be helper functions for this creating the block header). With the no-naked-pointer change I think that would do exactly what you want without ocaml having a special unmovable heap. Secondly accessing such a block without the runtime lock would only work if all values in the block only point to other unmovable objects, which basically rules out mutable fields. You could only do that in a verry controlled structure that you set up with a deep copy of all fields.

@mrvn

This comment has been minimized.

Show comment
Hide comment
@mrvn

mrvn Mar 25, 2016

Contributor

@damiendoligez

It would probably not reasonable to have too many such blocks.

Is 10-1000 appearing and disapearing per second too many? In my QT5 bindings if you do catch for example mouse events and then react to them by calling Qt functions that return anything but primitive types you get a ton of allocationd and eventual garbage collections verry quickyly. At the moment I register them as generational global root and the referenced objects generally never make it out of the minor heap. Hopefully that means the roots get unregistered before a major collection and their numbers doesn't matter. Right?

Contributor

mrvn commented Mar 25, 2016

@damiendoligez

It would probably not reasonable to have too many such blocks.

Is 10-1000 appearing and disapearing per second too many? In my QT5 bindings if you do catch for example mouse events and then react to them by calling Qt functions that return anything but primitive types you get a ton of allocationd and eventual garbage collections verry quickyly. At the moment I register them as generational global root and the referenced objects generally never make it out of the minor heap. Hopefully that means the roots get unregistered before a major collection and their numbers doesn't matter. Right?

@bobot

This comment has been minimized.

Show comment
Hide comment
@bobot

bobot Mar 25, 2016

Contributor

@bobot I've thought of that but won't it be too slow?
I think only experimenting could say, indirect calls are predicted by cpu. And if caml_darken (or caml_slice_darken) is exported in some way, its call during the iteration will be direct. I will open a mantis feature request.

Contributor

bobot commented Mar 25, 2016

@bobot I've thought of that but won't it be too slow?
I think only experimenting could say, indirect calls are predicted by cpu. And if caml_darken (or caml_slice_darken) is exported in some way, its call during the iteration will be direct. I will open a mantis feature request.

@DemiMarie

This comment has been minimized.

Show comment
Hide comment
@DemiMarie

DemiMarie Mar 25, 2016

Contributor

One feature I would like to see is "mixed blocks": blocks that are in the ordinary OCaml heap, but have only part of themselves scanned by the GC. Applications include:

  • Unboxing of int32, int64, nativeint, double when in records/tuples that also contain other OCaml values
  • Resizable arrays (only the part that is in use needs to be scanned)
  • Storing C-allocated data (such as a callback pointer) in an OCaml block without requiring multiple allocations

@mrvn Accessing unboxed data will still work if you know (through external invariants) that no OCaml thread is accessing it. Applications include zero-copy writing of data to disk or the network.

Contributor

DemiMarie commented Mar 25, 2016

One feature I would like to see is "mixed blocks": blocks that are in the ordinary OCaml heap, but have only part of themselves scanned by the GC. Applications include:

  • Unboxing of int32, int64, nativeint, double when in records/tuples that also contain other OCaml values
  • Resizable arrays (only the part that is in use needs to be scanned)
  • Storing C-allocated data (such as a callback pointer) in an OCaml block without requiring multiple allocations

@mrvn Accessing unboxed data will still work if you know (through external invariants) that no OCaml thread is accessing it. Applications include zero-copy writing of data to disk or the network.

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Apr 4, 2016

Member

@mrvn
It's not a question of how many appear and disappear, but how many are present at any time.
Anyway, your use case looks reasonable.

Member

damiendoligez commented Apr 4, 2016

@mrvn
It's not a question of how many appear and disappear, but how many are present at any time.
Anyway, your use case looks reasonable.

@damiendoligez damiendoligez deleted the damiendoligez:trunk-gc-patches branch Apr 4, 2016

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Apr 15, 2016

Contributor

I suspect this PR is the cause for http://caml.inria.fr/mantis/view.php?id=7228

Contributor

alainfrisch commented Apr 15, 2016

I suspect this PR is the cause for http://caml.inria.fr/mantis/view.php?id=7228

tbrk added a commit to inria-parkas/sundialsml that referenced this pull request Jul 22, 2016

Wrap session pointers in custom blocks
Ensure compatibility with the new OCaml -no-naked-pointers option. See, for
example, ocaml/ocaml#297 (comment).

tbrk added a commit to inria-parkas/sundialsml that referenced this pull request Jul 22, 2016

Add block header to session backrefs
Follow the rules set out in
  ocaml/ocaml#297 (comment)
so that backrefs become compatible with the -no-naked-pointers option.
@hhugo

This comment has been minimized.

Show comment
Hide comment
@hhugo

hhugo Feb 28, 2017

Contributor

@damiendoligez
caml_backtrace_active should not be printed with '%s'.
This causes a segfault.
ocaml 4.03 / 4.04 / 4.05 are affected

Contributor

hhugo commented on byterun/gc_ctrl.c in 0225ca0 Feb 28, 2017

@damiendoligez
caml_backtrace_active should not be printed with '%s'.
This causes a segfault.
ocaml 4.03 / 4.04 / 4.05 are affected

This comment has been minimized.

Show comment
Hide comment
@oandrieu

oandrieu Feb 28, 2017

Contributor

I was trying to fix that (and other similar issues) in #1070

Contributor

oandrieu replied Feb 28, 2017

I was trying to fix that (and other similar issues) in #1070

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment