Bigarray: do not change GC pace when creating sub-arrays #12500
I think this PR implements the correct behavior, but I have a feeling that it duplicates too much code, even with the factoring of the overflow-avoiding size computation. To make this feeling precise, here are two alternate implementations:
- This one just adds a
- This one adds a
Thanks for the alternate implementations. My implementation is noticeably more invasive, but it has the benefit of factoring some common logic among all sub-array-construction functions, namely the management of

If we decide that we don't care about this factoring/simplification (after all we are just trying to fix a bug), I am seduced by the minimality of the flags-using approach. (I considered adding new flags, but I thought that it should be a new flag in the

In your second proposal, I am not sure that it is a good idea to skip dimension checking in

(Note: in my mind the next step for this PR is for @xavierleroy to formulate a clear preference among the three approaches proposed so far. If it's for one of his, I will force-push the PR to adopt his commit instead and do the reviewing myself.)
The flag-based solution is short and sweet. But I notice that in my second approach the
This code is currently replicated 4 times, which is not nice and is error-prone. Can we take a bit more time to think about this?
This is not checked currently either. The only check performed is that the total size of the subarray (as determined from its dimensions) doesn't overflow.
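To make the overflow check mentioned here concrete, the following is a minimal standalone sketch of an overflow-avoiding size computation. It is not the runtime's code: `checked_byte_size` and its signature are hypothetical names chosen for illustration, and the real code in bigarray.c may be organized differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the OCaml runtime's function): computes
   elt_size * dim[0] * ... * dim[num_dims - 1] and stores it in *out.
   Returns false, leaving *out untouched, if any intermediate product
   would exceed SIZE_MAX. */
static bool checked_byte_size(int num_dims, const size_t *dim,
                              size_t elt_size, size_t *out)
{
  assert(num_dims >= 0);
  size_t total = elt_size;
  for (int i = 0; i < num_dims; i++) {
    /* For total != 0, total * dim[i] overflows exactly when
       dim[i] > SIZE_MAX / total, so test before multiplying. */
    if (total != 0 && dim[i] > SIZE_MAX / total) return false;
    total *= dim[i];
  }
  *out = total;
  return true;
}
```

With dimensions `{3, 4}` and 8-byte elements this yields 96 bytes; with dimensions `{SIZE_MAX, 2}` it reports an overflow rather than wrapping around.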
If you want to add the custom-ops+proxy handling to (I am not sure what part of the current PR you count as code duplication. Can you elaborate?)
Right, and in fact this is also true of the code in my PR (in addition to the code in trunk). So maybe we can set this aspect aside for now.
Indeed, your
But
Sorry, I've been working from memory on this PR, and clearly my memory isn't good enough. Let me reconsider that later, OK?
It looks like your PR solves the performance regression reported in #12460!
I can confirm that this PR solves the performance issue, or at least drastically reduces it, in @ghennequin's
Both @gasche's PR current code and @xavierleroy's
[Collapsed details: GS stats from `par-matmul-profiling`]
As for the code, if I had to rank them by preference, I would say:
[Collapsed details: Speedup on my (noisy) 4-core machine]
Thanks @fabbing for the detective hunt putting this PR/bugfix in focus again. After giving it a few months' rest, I think that all the options proposed are basically fine. I thus feel tempted by the minimal approach of using a new flag: it is the least amount of code and thus the easiest to review, and it may be useful in the future. If we want to factorize the proxy stuff away, we can send (you @fabbing can send!) a separate PR for that. @xavierleroy what do you think of submitting trunk...xavierleroy:ocaml:bigarray-pace-1 as a PR of your own? I could help review it and get it merged. (I don't mind replacing my PR with your code, but then I cannot act as reviewer.)
Submitted the minimal fix as #12754.
This is achieved by adding a CAML_BA_SUBARRAY flag that is honored by caml_ba_alloc. Fixes: ocaml#12491 Closes: ocaml#12500
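As a rough illustration of what a flag "honored by" the allocation function can look like, here is a toy sketch. `TOY_BA_SUBARRAY`, its bit value, and `toy_accounted_bytes` are hypothetical stand-ins, not the definitions from the OCaml runtime headers; the sketch only shows the idea that a sub-array flag makes the allocator report zero new external bytes to the GC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for illustration only; the real
   CAML_BA_SUBARRAY value lives in the runtime headers and may differ. */
#define TOY_BA_SUBARRAY 0x800

/* Toy model: how many bytes of external memory a new bigarray header
   should report to the GC. A sub-array shares its parent's buffer, so
   it reports nothing; any other array reports its full size. */
static size_t toy_accounted_bytes(int flags, size_t size)
{
  assert((flags & ~0xFFF) == 0); /* toy flags fit in 12 bits */
  return (flags & TOY_BA_SUBARRAY) ? 0 : size;
}
```

The appeal of this shape is that callers keep using the same allocation entry point and only pass one extra bit when they build a sub-array.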
This is another attempt to fix #12491 (the first attempt #12493 did not convince me).
This PR introduces a new bigarray-creation function that creates a sub-array: a new OCaml bigarray that shares its data with a previous OCaml bigarray, and that does not increase the pace of the GC (it correctly accounts for external memory usage). The behavior of `caml_ba_alloc` is unchanged.

This `caml_ba_inherit` function could be exported to users, as it could be useful if they want to define their own bigarray-reusing function, but this is not done in this PR: it is only visible inside bigarray.c, for the data-reusing functions exposed in the stdlib.

I checked that the reproduction case of #12491 terminates immediately with this PR.
PR structure / review tips
This PR is best reviewed commit by commit.
The first two commits are refactoring commits that should keep the behavior unchanged:
- the first one refactors the code to make bigarray-creating functions easier to follow
- the second one introduces a `caml_ba_inherit` helper and uses it (at this point the GC accounting is still wrong)
The last commit fixes the bug by introducing a `caml_ba_alloc_gen` function that takes an extra parameter `owns_data`, which indicates whether the external memory should be credited to the OCaml heap, and by using it in `caml_ba_inherit`.
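The structure described above can be pictured with a small self-contained model. This is a sketch under stated assumptions, not the code of the PR: `toy_ba`, `toy_alloc_gen`, `toy_alloc`, `toy_inherit` and the `gc_external_bytes` counter are hypothetical names that only mimic the roles of `caml_ba_alloc_gen`, `caml_ba_alloc`, `caml_ba_inherit` and the GC's accounting of external memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the GC's running total of out-of-heap memory. */
static size_t gc_external_bytes = 0;

/* Toy bigarray header: a pointer into some buffer plus a byte size. */
struct toy_ba { void *data; size_t size; };

/* Analogue of the general allocator: owns_data says whether this
   array's bytes should be credited to the GC. */
static struct toy_ba toy_alloc_gen(void *data, size_t size, bool owns_data)
{
  struct toy_ba b = { data, size };
  if (owns_data) gc_external_bytes += size;
  return b;
}

/* Ordinary allocation: behavior unchanged, the data is always credited. */
static struct toy_ba toy_alloc(void *data, size_t size)
{
  return toy_alloc_gen(data, size, true);
}

/* Sub-array creation: the bytes already belong to the parent, so they
   must not be counted a second time. */
static struct toy_ba toy_inherit(const struct toy_ba *parent,
                                 size_t offset, size_t size)
{
  assert(offset <= parent->size && size <= parent->size - offset);
  return toy_alloc_gen((char *)parent->data + offset, size, false);
}
```

In this model, allocating a 1024-byte array raises the counter by 1024, and taking any number of sub-arrays of it leaves the counter where it was, which is the accounting property the last commit aims for.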