
Bigarray: do not change GC pace when creating sub-arrays #12500

Closed
4 commits

Conversation

@gasche (Member) commented Aug 24, 2023

This is another attempt to fix #12491 (the first attempt #12493 did not convince me).

This PR introduces a new bigarray-creation function

static value caml_ba_inherit(value vb, int num_dims, intnat * dim)

that creates a sub-array: a new OCaml bigarray that shares its data with an existing OCaml bigarray, and that correctly accounts for external memory usage instead of needlessly increasing the pace of the GC.

The behavior of caml_ba_alloc is unchanged.

This caml_ba_inherit function could be exported to users, who might find it useful for defining their own bigarray-reusing functions, but this PR does not do so -- the function is only visible inside bigarray.c, where it serves the data-reusing functions exposed in the stdlib.

I checked that the reproduction case of #12491 terminates immediately with this PR.

PR structure / review tips

This PR is best reviewed commit by commit.

The first two commits are refactoring commits that should keep the behavior unchanged:

  • the first commit factors out the overflow-resilient size computation,
    to make bigarray-creating functions easier to follow
  • the second introduces the caml_ba_inherit helper and uses it
    (at this point the GC accounting is still wrong)

The last commit fixes the bug by introducing a caml_ba_alloc_gen function that takes an extra parameter, owns_data, indicating whether the external memory should be credited to the OCaml heap, and by using that function in caml_ba_inherit.

@gasche gasche force-pushed the bigarray-proxy-accounting-2 branch from 594f0be to 9df9ee9 Compare August 26, 2023 04:35
@xavierleroy (Contributor)

I think this PR implements the correct behavior, but I have a feeling that it duplicates too much code, even with the factoring of the overflow-avoiding size computation. To make this feeling precise, here are two alternate implementations:

This one just adds a CAML_BA_SUBARRAY flag that lets caml_ba_alloc do the right thing for subarrays. The flag is recorded in the subarray, and might be useful one day (?)

This one adds a caml_ba_sub_alloc function that duplicates a bit of code from caml_ba_alloc, but less than in your solution, because for sub-arrays there is no need to allocate data and no need to compute the total size.

@gasche (Member, Author) commented Aug 31, 2023

Thanks for the alternate implementations.

My implementation is noticeably more invasive, but it has the benefit of factoring out some logic common to all sub-array-construction functions, namely the management of Custom_ops and of the proxy. I find the resulting code a bit cleaner, because it hides proxy management from domain-specific code.

If we decide that we don't care about this factoring/simplification (after all, we are just trying to fix a bug), I am drawn to the minimality of the flag-based approach. (I considered adding new flags, but I thought it would have to be a new flag in the caml_ba_managed enum, and I was worried about the backwards-compatibility implications of changing CAML_BA_MANAGED_MASK.) I agree that keeping the information in the metadata is a nice side effect.

In your second proposal, I am not sure that it is a good idea to skip dimension checking in caml_ba_sub_alloc. I agree that all current callers respect the invariant that the new total size is smaller than or equal to the previous one (and so cannot overflow, etc.), but I think it would be safer (and quite easy) to check this in some way in the array-creation function.

@gasche (Member, Author) commented Aug 31, 2023

(Note: in my mind the next step for this PR is for @xavierleroy to formulate a clear preference among one of the three approaches proposed so far. If it's for one of his, I will force-push the PR to adopt his commit instead and do the reviewing myself.)

@xavierleroy (Contributor)

The flag-based solution is short and sweet. But I notice that in my second approach the caml_ba_sub_alloc function could also take care of

  /* Copy the finalization function from the original array (PR#8568) */
  Custom_ops_val(res) = Custom_ops_val(vb);
  /* Create or update proxy in case of managed bigarray */
  caml_ba_update_proxy(b, Caml_ba_array_val(res));

This code is currently replicated 4 times, which is not nice and is error-prone. Can we take a bit more time to think about this?

> In your second proposal, I am not sure that it is a good idea to skip dimension checking in caml_ba_sub_alloc. I agree that all current callers respect the invariant that the new total size is smaller or equal to the previous one

This is not checked currently either. The only check performed is that the total size of the subarray (as determined from its dimensions) doesn't overflow.

@gasche (Member, Author) commented Aug 31, 2023

If you want to add the custom-ops+proxy handling to caml_ba_sub_alloc, you need to take the bigarray value as a parameter, and this becomes very close to the current PR.

(I am not sure what part of the current PR you count as code duplication. Can you elaborate?)

> This is not checked currently either. The only check performed is that the total size of the subarray (as determined from its dimensions) doesn't overflow.

Right, and in fact this is also true of the code in my PR (in addition to the code in trunk). So maybe we can set this aspect aside for now.

@xavierleroy (Contributor)

Indeed, your caml_ba_inherit function is fine; the code duplication I was unhappy about is in caml_ba_alloc_gen.

@gasche (Member, Author) commented Aug 31, 2023

But caml_ba_alloc_gen replaces caml_ba_alloc (which is just a wrapper around it), the code is moved rather than copied; is it the duplication of documentation and parameter list that you are worried about? (I could probably just remove the documentation of caml_ba_alloc.)

@xavierleroy (Contributor)

Sorry, I've been working from memory on this PR, and clearly my memory isn't good enough. Let me reconsider that later, OK?

@fabbing (Contributor) commented Nov 14, 2023

It looks like your PR solves the performance regression reported in #12460!
I haven't looked at the changes yet, but I will review them in detail (I'm not yet familiar with Bigarray) and also consider @xavierleroy's two alternative implementations.

@fabbing (Contributor) commented Nov 15, 2023

I can confirm that this PR solves the performance issue, or at least drastically reduces it, in @ghennequin's par-matmul-profiling reproduction example.
owl uses Bigarray, and the increased GC pace causes many more major collections (about 7 times more in 5.1 than in 5.0), so more running time is spent on GC work rather than mutator work.

Both the current code of @gasche's PR and @xavierleroy's bigarray-pace-1 bring the GC back to the same number of major collections as in 5.0.0.

GC stats from `par-matmul-profiling`:

|                          | 5.0.0     | 5.1.0    | pr12500   | pace-1    |
|--------------------------|-----------|----------|-----------|-----------|
| minor_collections        | 625       | 310      | 809       | 848       |
| major_collections        | 39        | 285      | 39        | 39        |
| compactions              | 0         | 0        | 0         | 0         |
| forced_major_collections | 1         | 1        | 1         | 1         |
| minor_words              | 127335409 | 54592452 | 115691617 | 110620325 |
| promoted_words           | 96211     | 105612   | 96117     | 95962     |
| major_words              | 103751    | 113691   | 103658    | 103369    |
| top_heap_words           | 1917231   | 1589551  | 1696047   | 1790255   |
| heap_words               | 196508    | 188316   | 204700    | 200604    |
| live_words               | 18897     | 18968    | 18901     | 18901     |
| free_words               | 176704    | 168460   | 184880    | 180786    |
| largest_free             | 0         | 0        | 0         | 0         |
| fragments                | 907       | 888      | 919       | 917       |
| live_blocks              | 1706      | 1716     | 1707      | 1707      |
| free_blocks              | 0         | 0        | 0         | 0         |
| heap_chunks              | 0         | 0        | 0         | 0         |

As for the code, if I had to rank the proposals by preference, I would say:

1. bigarray-pace-1 is appealing because it is particularly concise.
2. @gasche's code is more verbose, but I don't think it duplicates code; rather, it refactors ba_compute_size out of caml_ba_alloc and caml_ba_deserialize. If bigarray-pace-1 is favoured, maybe this idea could be integrated into bigarray-pace-1 or another PR?
3. bigarray-pace-2 feels riskier, as it duplicates code from caml_ba_alloc in caml_ba_sub_alloc.

Speedup on my (noisy) 4-core machine:

| Domains | 5.0     | 5.1      | pr12500  | pace-1   |
|---------|---------|----------|----------|----------|
| 1       | 1.00106 | 1.03771  | 0.988304 | 0.971236 |
| 2       | 1.36456 | 0.980097 | 1.24194  | 1.12929  |
| 4       | 1.97288 | 1.53115  | 1.87095  | 1.83326  |

@gasche (Member, Author) commented Nov 17, 2023

Thanks @fabbing for the detective work that brought this PR/bugfix back into focus.

After giving it a few months of rest, I think that all the proposed options are basically fine. I thus feel tempted by the minimal approach of using a new flag: it is the least amount of code, and thus the easiest to review, and the flag may be useful in the future. If we want to factor the proxy handling out, we can send (you, @fabbing, can send!) a separate PR for that.

@xavierleroy what do you think of submitting trunk...xavierleroy:ocaml:bigarray-pace-1 as a PR of your own? I could help review it and get it merged. (I don't mind replacing my PR with your code, but then I cannot act as reviewer.)

@xavierleroy (Contributor)

Submitted the minimal fix as #12754.

xavierleroy added a commit to xavierleroy/ocaml that referenced this pull request Nov 21, 2023

This is achieved by adding a CAML_BA_SUBARRAY flag that is honored by caml_ba_alloc.

Fixes: ocaml#12491
Closes: ocaml#12500
Octachron pushed a commit that referenced this pull request Nov 28, 2023
This is achieved by adding a CAML_BA_SUBARRAY flag that is honored by caml_ba_alloc.

Fixes: #12491
Closes: #12500
(cherry picked from commit 6125c29)
Development

Successfully merging this pull request may close these issues.

Bigarray sub-arrays counted towards total out-of-heap memory usage
3 participants