Towards a new closure representation (native code) #8984
Closed
Background
There have been various discussions recently on the subject of the layout of closures. The current layout of these values means that they cannot be traversed by the GC without the use of a page table (or hacks such as in #1156). They also rely on the use of Infix_tag, the only mechanism that permits pointers into the middle of heap blocks.
We can gain benefits by removing reliance on a page table. It produces a runtime performance improvement, typically of a few percent. It also fits in with the desire of the Multicore OCaml devs not to have a page table at all, stemming from the fact that the maintenance of such a table in a parallel environment is more complicated.
Removing reliance on Infix_tag opens the possibility of significant simplification to the GC code. The presence of infix pointers has apparently caused various bugs in the development versions of Multicore OCaml.
This patch
This patch changes the layout of closure values used by the native code compilers on both Closure and Flambda paths. @kayceesrk is working on another patch to do the same for the bytecode compiler. After two such patches, we would be in a position to remove Infix_tag and all associated code.
What I present here is code that should pass the compiler's testsuite, but has not been validated at scale, nor subject to benchmarking. I think some discussion is in order before proceeding any further.
Proposed layout
I have chosen a simple layout. As an example assume variables f, g and h binding mutually-defined closures (at the OCaml source level, mutually-recursive functions) with one, two and three arguments respectively. We say that f, g and h form a "set of closures". The code pointer of f is f_code (respectively for g and h); the union of the free variables of the functions (apart from variables f, g and h) are fv_0 through fv_n. The in-memory layout, of values with tag Closure_tag, is:
Statically-allocated closures, which have no free variables, do not require any runtime patching. Dynamically-allocated closures are allocated with placeholders in the closure slots (here pointing to f, g and h) and then immediately patched using caml_modify to tie the knots. There are no placeholders allocated that point at the same closure (so for example in f, there is no slot that itself points at f). Environments are explicitly de-shared.
IR changes
The Clambda language has been modified to accommodate the new representation. We have drawn on experience from Flambda 2.0, which has taught us that a good way of handling the binding of multiple mutually-recursive functions is to have a binding construct in the IR that binds multiple names at once, eliminating in particular any notion of a "variable pointing at a set of closures". As such we provide:
Ulet_set_of_closures, which produces a mutually-recursive binding of closures to variables, around a body in the style of a normal let-expression;
Uselect_closure, which allows access via the given closure to others in the same set of closures. (For example, above, given f we could get to g and h.) This is equivalent to the Move_within_set_of_closures construct in Flambda (which in Flambda 2.0 is called Select_closure).
Closures within a set are referenced by integer indices, starting at zero, matching up with the order in which the function declarations are found in the appropriate maps during compilation. All offset calculations are done statically at compile time.
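To make the two constructs concrete, here is a toy OCaml model of a set of closures. It is only a sketch: the closure record, let_set_of_closures and select_closure are hypothetical names invented for illustration, not the Clambda definitions or the runtime layout from the patch, and the mapping from closure index to slot index is an assumption.

```ocaml
(* Toy model only: hypothetical types, not the definitions from the patch. *)

(* One closure in a set: a code pointer (modelled as a name), one slot per
   OTHER closure in the set, and its own copy of the free variables. *)
type closure = {
  code : string;
  siblings : closure option array;  (* None models an unpatched placeholder *)
  env : int array;                  (* de-shared copy of the free variables *)
}

(* Models the idea behind Ulet_set_of_closures: allocate every closure with
   placeholder slots, then patch the slots to tie the knots. *)
let let_set_of_closures (codes : string list) (fvs : int array) : closure array =
  let n = List.length codes in
  let set =
    Array.of_list
      (List.map
         (fun code ->
           { code; siblings = Array.make (n - 1) None; env = Array.copy fvs })
         codes)
  in
  Array.iteri
    (fun i c ->
      let slot = ref 0 in
      Array.iteri
        (fun j other ->
          (* A closure never gets a slot pointing at itself. *)
          if i <> j then begin
            c.siblings.(!slot) <- Some other;
            incr slot
          end)
        set)
    set;
  set

(* Models the idea behind Uselect_closure: reach another member of the set
   from a given member, using zero-based indices within the set.
   Assumption: slots are stored in index order, skipping the closure itself. *)
let select_closure ~(from_index : int) ~(to_index : int) (c : closure) : closure =
  if from_index = to_index then c
  else
    let slot = if to_index < from_index then to_index else to_index - 1 in
    match c.siblings.(slot) with
    | Some target -> target
    | None -> failwith "unpatched placeholder"
```

In this model, selecting index 1 from the closure at index 0 of a three-element set returns the second closure without going through any variable that names the whole set. Because the values are cyclic, compare them with physical equality (==) rather than structural equality.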
These constructs supersede Uclosure and Uoffset respectively. There are no changes to the Flambda language, although some as-yet-unpublished work by @xclerc already exists to replace the "set of closures" symbol-binding construct in the static term language with a construct similar to Ulet_set_of_closures.
Some possible improvements
We could perform a simple analysis to avoid having patched pointers to closures that are never called. For example, we could elide g within the closure of h if h never calls g.
We could track the free variables of functions on a per-function basis rather than on a per-set-of-closures basis.
We could have the closure blocks point at a shared environment block. This would need benchmarking. It would probably necessitate ensuring that the indirection through a closure to the environment block was subject to CSE.
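As a rough illustration of the first improvement, the analysis could be as simple as intersecting each function's references with the members of its set. The sketch below is not part of the patch: needed_slots and its input format (a list pairing each function name with the names referenced in its body) are assumptions made for the example, and it conservatively treats any reference as a potential call.

```ocaml
module String_set = Set.Make (String)

(* Hypothetical helper: for each function in a set of closures, compute which
   sibling closures it actually needs a slot for.
   [calls] pairs each function name with the names its body references. *)
let needed_slots (calls : (string * string list) list)
    : (string * string list) list =
  let members = String_set.of_list (List.map fst calls) in
  List.map
    (fun (fn, callees) ->
      let slots =
        String_set.of_list callees
        |> String_set.inter members   (* keep only members of this set *)
        |> String_set.remove fn       (* no slot ever points at itself *)
      in
      (fn, String_set.elements slots))
    calls
```

For instance, if h only references f, then only f is kept for h and the slot for g could be elided. Only direct references are needed here: if h reaches g through f, it does so via the slot held in f's own closure.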
Reading the code
If you don't want to read the whole diff, which you probably do not, then reading the diffs of clambda.mli and cmmgen.ml would be a good start; followed by the diffs of closure.ml and/or flambda_to_clambda.ml.
One test case was removed from the testsuite as it no longer seemed relevant and produces a (reasonable) compile-time performance problem.