Bigarray sub-arrays counted towards total out-of-heap memory usage #12491

Closed
codido opened this issue Aug 22, 2023 · 10 comments · Fixed by #12754
codido commented Aug 22, 2023

The following code snippet demonstrates the issue:

module A1 = Bigarray.Array1

let rec sum_bigarray arr sum =
  match A1.dim arr with
  | 0 -> sum
  | len ->
      let sum = sum + A1.get arr 0 in
      let remaining = A1.sub arr 1 (len - 1) in
      sum_bigarray remaining sum

let () =
  let arr = A1.init Bigarray.Int Bigarray.c_layout 1_000_000 Fun.id in
  Printf.printf "sum: %d\n%!" @@ sum_bigarray arr 0

From a quick look, it appears that creating sub-arrays of bigarrays may result in excessive GC cycles. For example, running the above snippet, the top offenders according to perf are:

    55.29%  main.exe  main.exe              [.] pool_sweep
    12.18%  main.exe  main.exe              [.] caml_darken.part.0
     9.38%  main.exe  main.exe              [.] caml_scan_global_roots
     3.71%  main.exe  main.exe              [.] caml_darken
     2.49%  main.exe  main.exe              [.] mark
     1.99%  main.exe  main.exe              [.] major_collection_slice
     1.06%  main.exe  main.exe              [.] caml_scan_stack

From what I can tell, the out-of-heap amount is increased as a result of calling caml_ba_alloc even if no out-of-heap memory is actually being allocated. To make things worse, since caml_ba_sub calls caml_ba_alloc with the original dimensions, the size of the whole (original) bigarray is counted towards the total out-of-heap memory.

gasche commented Aug 23, 2023

I looked at the code and agree that there is an issue worth fixing; below I discuss some options I see for fixing the bug. Hopefully people who know bigarrays better can chime in with their preferences among the potential fixes.

caml_ba_alloc is defined as follows:

/* [caml_ba_alloc] will allocate a new bigarray object in the heap.
   If [data] is NULL, the memory for the contents is also allocated
   (with [malloc]) by [caml_ba_alloc].
   [data] cannot point into the OCaml heap.
   [dim] may point into an object in the OCaml heap.
*/
CAMLexport value
caml_ba_alloc(int flags, int num_dims, void * data, intnat * dim)

Currently its caml_alloc_custom_mem call counts the bigarray external memory for all GC-tracked bigarrays, which is (as pointed out by @codido) not appropriate for "proxied" bigarrays that share their external memory.

I see two different approaches to fix the issue:

  1. in caml_ba_alloc, we can stop counting the external memory when data is not NULL. Within bigarray.c, this change is correct: a non-NULL data pointer is passed exactly for bigarrays that reuse another bigarray's memory. But this could lead to under-counting external memory for user code that performs its own external allocation before calling caml_ba_alloc -- such code passes an explicit data pointer even though the memory really is owned by the new bigarray.
  2. we could provide a separate caml_ba_alloc_proxied function for bigarrays that share their data with an existing bigarray, and use that in bigarray.c. This is slightly more invasive, but it guarantees the absence of under-counting in user code using caml_ba_alloc.

Note that the same over-counting may occur in user code (just as in the runtime), so (1) would also fix the issue there, while (2) would leave the over-counting in place unless users update their code to call caml_ba_alloc_proxied.

Another solution might be to tune caml_update_proxy to reduce GC pressure from external memory, after caml_ba_alloc increased it -- as if the corresponding external memory had just been deallocated. This sounds bizarre to me.

gasche commented Aug 23, 2023

I thought more about this and now believe that approach (1) is always the right choice when the flags parameter contains the MANAGED flag: in this case, the caller (either the runtime or an external user) claims that the memory passed is already owned by the OCaml heap, so it has already been accounted for and we should not increase the GC pacing.

I propose PR #12493 with a bugfix, which is fairly small and non-invasive. With this bugfix, the proposed testcase terminates immediately.

gasche commented Aug 23, 2023

Thinking more about this, I am not sure what semantics a user should have in mind when calling caml_ba_alloc with the MANAGED flag and a non-NULL data argument:

  1. my interpretation: data is already owned by the OCaml heap
  2. another reasonable interpretation: the user is transferring ownership of the external data to the OCaml heap (they will not free it, it is now the GC's duty)

This bugfix assumes (1), but (2) is also a possibility. Under interpretation (2), the calls to caml_ba_alloc in the subarray/slicing functions are wrong, and they should be fixed by introducing a generalized allocation function that lets the caller tweak the GC-pacing logic.

gasche commented Aug 23, 2023

I looked at some user code found through GitHub code search, and I can see several cases where I believe that the MANAGED flag is used to transfer ownership of newly-allocated memory to the OCaml GC.

The fix proposed in #12493 would be incorrect in this case -- here I believe we should increase GC pacing -- so I closed that PR.

I think that a better fix would be to introduce an alloc function whose caller can indicate that they are reusing external memory already tracked in the OCaml heap.

codido commented Aug 25, 2023

@gasche, thanks so much for the prompt response and fix!

Regarding the ownership transfer use case you mentioned, it seems a bit fragile due to the prerequisites for the transferred memory block (must reside on the same heap, freeable with free()). Case in point, out of the three examples you noted above, it looks like two might be problematic?

For the unison case, unless I'm missing something, the data transferred does not necessarily point directly to memory allocated by malloc, so freeing it later on could result in memory corruption.

For the core case, wouldn't the GC pace be a bit off here? It looks like caml_ba_alloc is called with the original dim, so the GC pace would be increased by that amount and not by the new size, and moreover, not sure the old (freed) allocation size is taken into account?

I'm not suggesting that there's anything wrong with your change, however -- just that it's a tad tricky to get it right :)

gasche commented Aug 25, 2023

Good points.

  1. in the unison case, the memory is allocated by caml_output_value_to_malloc, which one may reasonably expect to just call malloc, but in fact calls caml_stat_alloc_noexc, which may (in pooling mode) return a pointer into a larger allocation with some metadata. I would still claim that the unison code is morally correct, and the runtime should make it work by using a real malloc in the public caml_output_value_to_malloc, and using a separate caml_output_value_to_stat_alloc function for runtime tracking needs.

  2. in the core case, the code allocates a new Bigarray with the same dimension as the previous bigarray, and then mutates its dimension array after the fact. This is a bug in the user code (for external memory accounting), which should instead compute the correct dimensions first and then allocate. Once this independent bug is fixed the code should (I think?) work as expected.

codido commented Aug 25, 2023

Regarding (2), even if it did that, would the old memory block size (that's now freed) be accounted for?

gasche commented Aug 25, 2023

(Indeed, this was an unsatisfying suggestion and I edited it out in the meantime.)

gasche added a commit to gasche/ocaml that referenced this issue Aug 26, 2023
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Aug 31, 2023
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Aug 31, 2023
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Sep 19, 2023
@gasche gasche added the bug label Oct 3, 2023
gasche commented Oct 3, 2023

(Xavier sent me his own fixes for this and I need to do the work of comparing them and blessing one of them for upstreaming. I hope to get to it soon.)

xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Nov 17, 2023
xavierleroy added a commit that referenced this issue Nov 18, 2023
This is achieved by adding a CAML_BA_SUBARRAY flag that is honored by caml_ba_alloc.

Fixes: #12491
Closes: #12500
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Nov 21, 2023
Octachron pushed a commit that referenced this issue Nov 28, 2023
(cherry picked from commit 6125c29)