
Add caml_alloc_custom_mem #1738

Merged: 10 commits, Nov 6, 2018
6 participants
@damiendoligez (Member) commented Apr 25, 2018

This PR is related to #1476 and MPR#7750.

It adds one function to the runtime and three new GC parameters.

The function (caml_alloc_custom_mem) is to be used by C code to allocate custom objects that consist of a small block in the heap pointing to a large amount of memory outside the heap (typically allocated with malloc). This function takes only one size parameter, the number of bytes of memory outside the heap used by this custom block.

The user can then bound how much of this out-of-heap memory is retained by dead blocks that the GC has not yet collected ("floating garbage"). This is done with three new GC parameters:

  • M (custom_major_ratio): a percentage relative to the major heap size. A complete GC cycle is forced every time 2/3 of that much out-of-heap memory has been allocated for custom blocks in the major heap. Assuming constant allocation and deallocation rates, this bounds the floating garbage at roughly (M/100 * major-heap-size) bytes at any time. The factor of 2/3 (i.e. 1/1.5) comes, roughly speaking, from the fact that the major GC takes 1.5 cycles (the remainder of the current cycle plus a full marking phase) before it starts to deallocate dead blocks allocated during the previous cycle.

  • m (custom_minor_ratio): a percentage relative to the minor heap size. A minor GC is forced when the out-of-heap memory attached to custom blocks in the minor heap reaches that fraction of the minor heap size.

  • n (custom_minor_max_size): a number of bytes. Custom blocks allocated in the minor heap whose out-of-heap memory exceeds this size are assumed to be long-lived and are counted immediately against the major heap rather than the minor heap, so that large allocations do not force a minor GC every time.

The function is then used to allocate bigarrays, allowing us to get rid of the infamous CAML_BA_MAX_MEMORY constant.

When applied to the test program of MPR#7750, the results are pretty good: we can make the process size vary from about 6 GB (with M=10) to 15 GB (with M=100) to 25 GB (with M=200).
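
To make the intended usage concrete, here is a minimal, hypothetical C stub that wraps a malloc'd buffer in a custom block via the new function. This is a sketch, not code from this PR: the prototype is assumed to mirror caml_alloc_custom, with the used/max pair replaced by the single out-of-heap byte count described above, and my_buffer, my_buffer_ops and my_buffer_create are made-up names.

#include <stdlib.h>
#include <caml/mlvalues.h>
#include <caml/custom.h>
#include <caml/fail.h>
#include <caml/memory.h>

/* Hypothetical custom block: a small header in the OCaml heap pointing to
   a malloc'd buffer outside the heap. */
struct my_buffer { unsigned char *data; size_t len; };

static void my_buffer_finalize(value v)
{
  free(((struct my_buffer *) Data_custom_val(v))->data);
}

static struct custom_operations my_buffer_ops = {
  "example.my_buffer",
  my_buffer_finalize,
  custom_compare_default,
  custom_hash_default,
  custom_serialize_default,
  custom_deserialize_default,
  custom_compare_ext_default
  /* any remaining fields are left at their defaults */
};

CAMLprim value my_buffer_create(value vlen)
{
  CAMLparam1(vlen);
  CAMLlocal1(res);
  size_t len = Long_val(vlen);
  unsigned char *data = malloc(len);
  if (data == NULL) caml_raise_out_of_memory();
  /* The small block lives in the OCaml heap; the [len] out-of-heap bytes
     are what the new GC parameters account for. */
  res = caml_alloc_custom_mem(&my_buffer_ops, sizeof(struct my_buffer), len);
  ((struct my_buffer *) Data_custom_val(res))->data = data;
  ((struct my_buffer *) Data_custom_val(res))->len = len;
  CAMLreturn(res);
}

With such a stub, the floating garbage from these blocks is bounded by the parameters above; for example, running the program with OCAMLRUNPARAM="M=50" should, per the description of M, keep it at about half of the major heap size.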

Still to be done: documentation.

@alainfrisch (Contributor) commented Apr 25, 2018

This should also hopefully address MPR#7100.

Resolved review thread: byterun/major_gc.c (outdated)

@alainfrisch (Contributor) commented Apr 25, 2018

Not strictly related to the proposed change, but I think it's good to maintain an overall understanding of how custom blocks are handled by the GC. Could you explain the following heuristic in caml_adjust_gc_speed:

  if (caml_extra_heap_resources
      > (double) caml_minor_heap_wsz / 2.0 / (double) caml_stat_heap_wsz) {
    CAML_INSTR_INT ("request_major/adjust_gc_speed_2@", 1);
    caml_request_major_slice ();
  }

As the heap grows during the life of the program, this test will trigger more and more often. For instance, each i/o channel allocation (currently accounted with a ratio of 1/1000) adds 1/1000 to caml_extra_heap_resources, so it will trigger a major slice as soon as:

  caml_stat_heap_wsz > 500 * caml_minor_heap_wsz

and this can easily happen. Does it really make sense to trigger GC slices more often as the heap grows, given a constant allocation rate of custom blocks?

@damiendoligez (Member, Author) commented Apr 26, 2018

A bit of digging with git blame reveals that this heuristic was already present in 1995, in the very first commit in OCaml's history, so it dates back to Caml Light... It's probably time to re-assess it, because I really don't remember the reason for it.

@xavierleroy (Contributor) commented Apr 26, 2018

@stedolan (Contributor) commented May 1, 2018

This looks really nice. The only bit that worries me is custom_minor_max_size, because it looks like performance might change abruptly when allocation sizes go from 8191 to 8193 bytes. Rather than a threshold, how about something smoother, like say adding the first 8k to extra_heap_resources_minor and the rest to extra_heap_resources?
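
A minimal sketch of that smoothing idea, with the two accumulators treated as plain byte counters for illustration (in the real runtime they track ratios relative to the heap sizes, and the helper and parameter names here are only indicative):

#include <caml/config.h>   /* uintnat */

/* Sketch only: split an allocation of [mem] out-of-heap bytes between the
   minor and major accounting, instead of attributing it wholesale based on
   a threshold test against custom_minor_max_size. */
static void account_custom_mem(uintnat mem, uintnat custom_minor_max_size,
                               uintnat *extra_heap_resources_minor,
                               uintnat *extra_heap_resources)
{
  uintnat minor_part =
      mem < custom_minor_max_size ? mem : custom_minor_max_size;
  *extra_heap_resources_minor += minor_part;        /* first 8k by default */
  *extra_heap_resources       += mem - minor_part;  /* remainder, if any */
}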

@alainfrisch (Contributor) commented May 2, 2018

If you're in the mood for archeology, the Caml Light sources (with CVS history) are here.

This leads to:

camllight/camllight@67c1dc3#diff-d0f533866c552cc04626e6d892579278R153

which does not give any rationale for that specific heuristic.

@damiendoligez (Member, Author) commented May 3, 2018

> This looks really nice. The only bit that worries me is custom_minor_max_size, because it looks like performance might change abruptly when allocation sizes go from 8191 to 8193 bytes. Rather than a threshold, how about something smoother, like say adding the first 8k to extra_heap_resources_minor and the rest to extra_heap_resources?

Sounds like a good idea. I'll try it.

damiendoligez force-pushed the damiendoligez:custom-obj-mem branch from 5ad3b36 to 6bab982 on May 3, 2018

@damiendoligez (Member, Author) commented May 4, 2018

@stedolan Done.

@alainfrisch (Contributor) commented May 11, 2018

@damiendoligez Should this new API also be used for i/o channels?

@xavierleroy (Contributor) commented May 20, 2018

@alainfrisch: I/O buffers consume both one file descriptor (a system resource, perhaps more appropriate for the old API?) and an out-of-heap buffer (a memory resource, very appropriate for the new API). So, it's a tie.

This said, I'm very happy to see the new API. If the experts are OK with it, I suggest merging into trunk ASAP.

@alainfrisch (Contributor) commented May 22, 2018

> I/O buffers consume both one file descriptor (a system resource, perhaps more appropriate for the old API?)

I don't think that's the case. The lifetime of the file descriptor does not depend on the GC: it can and must be released explicitly (with close), and the GC finaliser won't touch it (there is no implicit close). So, as far as the GC is concerned, the channel object is equivalent to a pure memory buffer.

@alainfrisch (Contributor) commented May 22, 2018

(About i/o channels: with an extra indirection, one could even destroy (free) the buffer when the channel is (explicitly) closed, but it's tricky to make this thread-safe.)

@damiendoligez (Member, Author) commented May 23, 2018

@alainfrisch you're right, the FD part of the channel is unmanaged, and the memory part is a good fit for the new API, so I'm going to convert it too.

Don't merge before I do that.

damiendoligez force-pushed the damiendoligez:custom-obj-mem branch from 4ab5946 to d278ab1 on Jun 6, 2018

@delamonpansie (Contributor) commented Jun 26, 2018

Why not just use caml_alloc_dependent_memory and caml_free_dependent_memory? Yes, this would require extending caml_ba_proxy with a field holding the actual size of the allocation, but it is simple and relies on already existing GC infrastructure.

Here is a small test case (it requires 4.07):

module Gc_helper = struct
  type void
  external alloc_dependent_memory : (int [@untagged]) -> void = "caml_alloc_dependent_memory"
  external free_dependent_memory : (int [@untagged]) -> void = "caml_free_dependent_memory"

  let allocated = ref 0
  let minor_heap_size = Gc.((get ()).minor_heap_size) / (Sys.word_size / 8)

  let bigarray v =
    let size = Bigarray.(Array1.size_in_bytes v) in
    allocated := !allocated + size;
    if !allocated > minor_heap_size then begin
        allocated := size;
        Gc.major_slice (-1) |> ignore; (* heuristics from https://github.com/ocaml/ocaml/blob/trunk/byterun/memory.c#L522 *)
      end;
    alloc_dependent_memory size |> ignore;
    Gc.finalise_last (fun () -> free_dependent_memory size |> ignore) v; (* this should go to caml_ba_finaliser *)
    v
end

module Memstat = struct
  let word_size = Sys.word_size / 8 (* word size in bytes *)

  let kmgt size =
    let sprintf = Printf.sprintf in
    if size < 512 * 1024 then
      sprintf "%.2fk" (float size /. 0x1.0p10)
    else if size < 512 * 1024 * 1024 then
      sprintf "%.2fm" (float size /. 0x1.0p20)
    else if size < 512 * 1024 * 1024 * 1024 then
      sprintf "%.2fg" (float size /. 0x1.0p30)
    else
      sprintf "%.2ft" (float size /. 0x1.0p40)

  let statm () =
    let open Scanf in
    let fd = Scanning.from_file "/proc/self/statm" in
    let result = bscanf fd "%i %i %i"
                   (fun size resident shared -> (size, resident, shared) ) in
    Scanning.close_in fd;
    result

  let alive_buf_size = ref 0
  let total_buf_size = ref 0

  let alloc size v =
    total_buf_size := !total_buf_size + size;
    alive_buf_size := !alive_buf_size + size;
    Gc.finalise_last (fun () -> total_buf_size := !total_buf_size - size) v
  let free size =
    alive_buf_size := !alive_buf_size - size

  let rec memstat () =
    let stat = Gc.stat () in
    let vss, _, _ = statm () in
    Printf.printf "% 9s vss; % 9s heap; % 9s live; % 9s alive buf; % 9s total buf; % 9.0f%% garbage\n%!"
      (kmgt (vss * 4096))
      (kmgt (stat.heap_words * word_size))
      (kmgt (stat.live_words * word_size))
      (kmgt !alive_buf_size)
      (kmgt !total_buf_size)
      (100. *. float (!total_buf_size - !alive_buf_size) /. float !alive_buf_size);
    Unix.sleepf 0.5;
    memstat ()

  let _ =
    Thread.create memstat ()
end


(* Create some memory pressure, simulating a long-running program with a
   reasonable amount of small objects *)
let tree =
  let limit = 1024 * 1024 * (try int_of_string (Sys.getenv "LIMIT")
                             with _ -> 10) in
  let key size =
    Bytes.init size (fun _ -> Random.int 9 + Char.code '0' |> Char.chr) in
  let module T = Set.Make(Bytes) in
  let rec build total_size t =
    if total_size > limit then
      t
    else
      let size = Random.int 16 in
      build (total_size + size) (T.add (key size) t)
  in
  print_string "Allocating some objects\n";
  build 0 T.empty

let run message alloc =
  Gc.compact ();
  print_string message;
  print_newline ();

  let q = Queue.create () in
  let deadline = Unix.gettimeofday () +. 15.0 in
  while Unix.gettimeofday () < deadline do
    let size = 2 * 1024 * 1024 + Random.int (2 * 1024 * 1024) in
    let buf = alloc size in
    Queue.push (size, buf) q;
    Memstat.alloc size buf;

    (* pretend we have about 30 active buffers at any moment in time *)
    while Queue.length q > 30 do
      let size, _ = Queue.pop q in
      Memstat.free size;
    done;
  done;

  (* drain queue *)
  while Queue.length q > 0 do
    let size, _ = Queue.pop q in
    Memstat.alive_buf_size := !Memstat.alive_buf_size - size
  done


let _ =
  run "Bytes" (fun size -> Bytes.create size);
  run "helped Bigarray" (fun size -> Bigarray.(Array1.create char c_layout size) |> Gc_helper.bigarray);
  run "plain Bigarray" (fun size -> Bigarray.(Array1.create char c_layout size));


(* Local Variables: *)
(* compile-command: "ocamlopt -thread unix.cmxa threads.cmxa bigarray.cmxa gc_sim.ml -o gc_sim" *)
(* End: *)

@damiendoligez (Member, Author) commented Jun 27, 2018

> Why not just use caml_alloc_dependent_memory and caml_free_dependent_memory?

That API is harder to use (it mandates the use of finalizers) and never caught on. It doesn't let the user control the ratio of heap to external memory.
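
For comparison, here is a sketch of what the older API asks of the C author, reusing the hypothetical my_buffer type from the sketch in the description and eliding error handling; the prototypes of caml_alloc_dependent_memory and caml_free_dependent_memory are taken to be the ones declared in caml/memory.h at the time of this PR (each takes a byte count). The external size must be reported by hand, and that report must be undone in a finalizer:

#include <stdlib.h>
#include <caml/mlvalues.h>
#include <caml/custom.h>
#include <caml/memory.h>

struct my_buffer { unsigned char *data; size_t len; };   /* hypothetical */

static void my_buffer_finalize_old(value v)
{
  struct my_buffer *b = Data_custom_val(v);
  caml_free_dependent_memory(b->len);   /* must mirror the alloc-time report */
  free(b->data);
}

static struct custom_operations my_buffer_ops_old = {
  "example.my_buffer.old", my_buffer_finalize_old,
  custom_compare_default, custom_hash_default,
  custom_serialize_default, custom_deserialize_default,
  custom_compare_ext_default
};

CAMLprim value my_buffer_create_old(value vlen)
{
  CAMLparam1(vlen);
  CAMLlocal1(res);
  size_t len = Long_val(vlen);
  res = caml_alloc_custom(&my_buffer_ops_old, sizeof(struct my_buffer), 0, 1);
  ((struct my_buffer *) Data_custom_val(res))->data = malloc(len);
  ((struct my_buffer *) Data_custom_val(res))->len = len;
  caml_alloc_dependent_memory(len);   /* report the external bytes to the GC */
  CAMLreturn(res);
}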

damiendoligez force-pushed the damiendoligez:custom-obj-mem branch from d278ab1 to 2dd846b on Jul 2, 2018

@gasche (Member) commented Sep 14, 2018

As far as I can tell, this PR has not been fully reviewed (nor approved), and @damiendoligez indicates that it is ready. @stedolan, would you maybe be willing to do a review?

damiendoligez force-pushed the damiendoligez:custom-obj-mem branch from 2dd846b to ac908e4 on Sep 14, 2018

@gasche (Member) commented Oct 28, 2018

(@alainfrisch or @jhjourdan might be qualified to review as well.)

@alainfrisch (Contributor) left a comment

LGTM, only cosmetic comments.

@damiendoligez Did you test MPR#7100 against this branch?

Resolved review threads: man/ocamlrun.m (4 threads, outdated), manual/manual/cmds/runtime.etex, runtime/custom.c (outdated), stdlib/gc.mli (2 threads, outdated), testsuite/tests/misc/ephetest2.ml

@alainfrisch (Contributor) commented Oct 29, 2018

You could also use the new API in win32graph/draw.c and graph/image.c, and avoid the magic constants in those files. It seems all other uses in the core distribution use the default values for mem/max (0/1), so they could also be switched to the new API (or not).

@alainfrisch (Contributor) commented Nov 5, 2018

@damiendoligez: careful, I think something went wrong with a merge/rebase on the branch.

damiendoligez force-pushed the damiendoligez:custom-obj-mem branch from 12546c4 to 199120d on Nov 5, 2018

@damiendoligez (Member, Author) commented Nov 5, 2018

I fixed the rebase problem. Now I'll look at graph and other uses in the core distribution.

@damiendoligez (Member, Author) commented Nov 5, 2018

I'm not sure about Win32, but on X Window, the memory held by a Pixmap is in the server, not in the user process, so I don't think it makes sense to use the new API there.

For the other uses, you're right that they all use 0/1, so converting them wouldn't change anything.

@damiendoligez (Member, Author) commented Nov 5, 2018

I tried the test program of MPR#7100 and indeed with this PR it doesn't fail.

Resolved review thread: stdlib/gc.mli (outdated)

@damiendoligez (Member, Author) commented Nov 5, 2018

Note: it will be time to revisit #1961 after this PR is merged and tested in the wild for some time.

@alainfrisch (Contributor) commented Nov 5, 2018

> I tried the test program of MPR#7100 and indeed with this PR it doesn't fail.

(Actually, I'm confused about whether #1476 already fixed that one.)

@alainfrisch (Contributor) left a comment

LGTM

@damiendoligez (Member, Author) commented Nov 6, 2018

> (Actually, I'm confused about whether #1476 already fixed that one.)

I don't think it did. I tried the test program with 4.07.1 and it failed.

@damiendoligez (Member, Author) commented Nov 6, 2018

CI passed, so I'm merging. Thanks for the review!

damiendoligez merged commit 17b64ac into ocaml:trunk on Nov 6, 2018

2 checks passed:

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

damiendoligez deleted the damiendoligez:custom-obj-mem branch on Nov 6, 2018

@alainfrisch (Contributor) commented Nov 9, 2018

Other tickets more or less directly related to this PR:

https://caml.inria.fr/mantis/view.php?id=7198
https://caml.inria.fr/mantis/view.php?id=7676 (somehow)
https://caml.inria.fr/mantis/view.php?id=6294
https://caml.inria.fr/mantis/view.php?id=7158

(I'm not sure it's worth mentioning them in the Changelog, but I'll let @damiendoligez decide.)

@damiendoligez (Member, Author) commented Nov 12, 2018

This PR resolves MPR#7198 but the others will also need changes in other software.

I'll add it to Changes.

@damiendoligez (Member, Author) commented Nov 13, 2018

I added a mention of MPR#7198 (3f46a1d). The others are not directly fixed by this PR.
