Bigarray sub-arrays counted towards total out-of-heap memory usage #12491

Closed
codido opened this issue Aug 22, 2023 · 10 comments · Fixed by #12754
codido commented Aug 22, 2023

The following code snippet demonstrates the issue:

module A1 = Bigarray.Array1

let rec sum_bigarray arr sum =
  match A1.dim arr with
  | 0 -> sum
  | len ->
      let sum = sum + A1.get arr 0 in
      let remaining = A1.sub arr 1 (len - 1) in
      sum_bigarray remaining sum

let () =
  let arr = A1.init Bigarray.Int Bigarray.c_layout 1_000_000 Fun.id in
  Printf.printf "sum: %d\n%!" @@ sum_bigarray arr 0

From a quick look, it appears that creating sub-arrays of bigarrays may result in excessive GC cycles. For example, running the above snippet, the top offenders according to perf are:

    55.29%  main.exe  main.exe              [.] pool_sweep
    12.18%  main.exe  main.exe              [.] caml_darken.part.0
     9.38%  main.exe  main.exe              [.] caml_scan_global_roots
     3.71%  main.exe  main.exe              [.] caml_darken
     2.49%  main.exe  main.exe              [.] mark
     1.99%  main.exe  main.exe              [.] major_collection_slice
     1.06%  main.exe  main.exe              [.] caml_scan_stack

From what I can tell, the out-of-heap amount is increased as a result of calling caml_ba_alloc even if no out-of-heap memory is actually being allocated. To make things worse, since caml_ba_sub calls caml_ba_alloc with the original dimensions, the size of the whole (original) bigarray is counted towards the total out-of-heap memory.

gasche commented Aug 23, 2023

I looked at the code and agree that there is an issue worth fixing; below I discuss some options I see for fixing the bug. Hopefully people who know bigarrays better can chime in with their preferences among the potential fixes.

caml_ba_alloc is defined as follows:

/* [caml_ba_alloc] will allocate a new bigarray object in the heap.
   If [data] is NULL, the memory for the contents is also allocated
   (with [malloc]) by [caml_ba_alloc].
   [data] cannot point into the OCaml heap.
   [dim] may point into an object in the OCaml heap.
*/
CAMLexport value
caml_ba_alloc(int flags, int num_dims, void * data, intnat * dim)

Currently its caml_alloc_custom_mem call counts the bigarray external memory for all GC-tracked bigarrays, which is (as pointed out by @codido) not appropriate for "proxied" bigarrays that share their external memory.

I see two different approaches to fix the issue:

  1. in caml_ba_alloc, we can stop counting the external memory when data is not NULL. Within bigarray.c, this change is correct: a non-NULL data pointer is passed exactly for bigarrays that reuse another bigarray's memory. But this could lead to under-counting external memory for user code that performs its own external allocation before calling caml_ba_alloc -- such code passes an explicit data pointer even though the memory really is owned by the new bigarray.
  2. we could provide a separate caml_ba_alloc_proxied function for bigarrays that share their data with an existing bigarray, and use that in bigarray.c. This is slightly more invasive, but it guarantees the absence of under-counting in user code using caml_ba_alloc.

Note that the same over-counting may occur in user code (just as in the runtime), so (1) would also fix the issue there, while (2) would leave the over-counting in place unless users update their code to call caml_ba_alloc_proxied.

Another solution might be to tune caml_update_proxy to reduce GC pressure from external memory, after caml_ba_alloc increased it -- as if the corresponding external memory had just been deallocated. This sounds bizarre to me.

gasche commented Aug 23, 2023

I thought more about this and now believe that approach (1) is always the right choice when the flags parameter contains the MANAGED flag: in this case, the caller (either the runtime or an external user) claims that the memory passed is already owned by the OCaml heap, so it has already been accounted for and we should not increase the GC pacing.

I propose PR #12493 with a bugfix, which is fairly small and non-invasive. With this bugfix, the proposed testcase terminates immediately.

gasche commented Aug 23, 2023

Thinking more about this, I am not sure what semantics a user should have in mind when calling caml_ba_alloc with the MANAGED flag and a non-NULL data argument:

  1. my interpretation: data is already owned by the OCaml heap
  2. another reasonable interpretation: the user is transferring ownership of the external data to the OCaml heap (they will not free it, it is now the GC's duty)

This bugfix assumes (1), but (2) is also a possibility. Under interpretation (2), the calls to caml_ba_alloc in the subarray/slicing functions are wrong, and they should be fixed by introducing a generalized allocation function that lets the caller tweak the GC-pacing logic.

gasche commented Aug 23, 2023

I looked at some user code found through GitHub code search, and I can see several cases where I believe that the MANAGED flag is used to transfer ownership of newly-allocated memory to the OCaml GC.

The fix proposed in #12493 would be incorrect in this case -- here I believe we should increase GC pacing -- so I closed that PR.

I think that a better fix would be to introduce an alloc function whose caller can indicate that they are reusing external memory already tracked in the OCaml heap.

codido commented Aug 25, 2023

@gasche, thanks so much for the prompt response and fix!

Regarding the ownership transfer use case you mentioned, it seems a bit fragile due to the prerequisites for the transferred memory block (must reside on the same heap, freeable with free()). Case in point, out of the three examples you noted above, it looks like two might be problematic?

For the unison case, unless I'm missing something, the data transferred does not necessarily point directly to memory allocated by malloc, so freeing it later on could result in memory corruption.

For the core case, wouldn't the GC pace be a bit off here? It looks like caml_ba_alloc is called with the original dim, so the GC pace would be increased by that amount and not by the new size, and moreover, not sure the old (freed) allocation size is taken into account?

I'm not suggesting that there's anything wrong with your change, however -- just that it's a tad tricky to get it right :)

gasche commented Aug 25, 2023

Good points.

  1. in the unison case, the memory is allocated by caml_output_value_to_malloc, which one may reasonably expect to just call malloc, but in fact calls caml_stat_alloc_noexc, which may (in pooling mode) return a pointer into a larger allocation with some metadata. I would still claim that the unison code is morally correct, and the runtime should make it work by using a real malloc in the public caml_output_value_to_malloc, and using a separate caml_output_value_to_stat_alloc function for runtime tracking needs.

  2. in the core case, the code allocates a new Bigarray with the same dimension as the previous bigarray, and then mutates its dimension array after the fact. This is a bug in the user code (for external memory accounting), which should instead compute the correct dimensions first and then allocate. Once this independent bug is fixed the code should (I think?) work as expected.

codido commented Aug 25, 2023

Regarding (2), even if it did that, would the old memory block size (that's now freed) be accounted for?

gasche commented Aug 25, 2023

(Indeed, this was an unsatisfying suggestion and I edited it out in the meantime.)

gasche added a commit to gasche/ocaml that referenced this issue Aug 26, 2023
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Aug 31, 2023
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Aug 31, 2023
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Sep 19, 2023
@gasche gasche added the bug label Oct 3, 2023
gasche commented Oct 3, 2023

(Xavier sent me his own fixes for this and I need to do the work of comparing them and blessing one of them for upstreaming. I hope to get to it soon.)

xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Nov 17, 2023
xavierleroy added a commit that referenced this issue Nov 18, 2023
This is achieved by adding a CAML_BA_SUBARRAY flag that is honored by caml_ba_alloc.

Fixes: #12491
Closes: #12500
xavierleroy added a commit to xavierleroy/ocaml that referenced this issue Nov 21, 2023
Octachron pushed a commit that referenced this issue Nov 28, 2023
(cherry picked from commit 6125c29)