Towards a new closure representation (native code) #8984

mshinwell · 2019-09-26T09:22:46Z

Background

There have been various discussions recently on the subject of the layout of closures. The current layout of these values means that they cannot be traversed by the GC without the use of a page table (or hacks such as in #1156). They also rely on the use of Infix_tag, the only mechanism that permits pointers into the middle of heap blocks.

Removing reliance on a page table brings two benefits. It yields a runtime performance improvement, typically of a few percent. It also fits in with the desire of the Multicore OCaml devs not to have a page table at all, since maintaining such a table in a parallel environment is more complicated.

Removing reliance on Infix_tag opens the possibility of significant simplification to the GC code. The presence of infix pointers has apparently caused various bugs in the development versions of Multicore OCaml.

This patch

This patch changes the layout of closure values used by the native code compilers on both Closure and Flambda paths. @kayceesrk is working on another patch to do the same for the bytecode compiler. After two such patches, we would be in a position to remove Infix_tag and all associated code.

What I present here is code that should pass the compiler's testsuite, but has not been validated at scale, nor subject to benchmarking. I think some discussion is in order before proceeding any further.

Proposed layout

I have chosen a simple layout. As an example assume variables f, g and h binding mutually-defined closures (at the OCaml source level, mutually-recursive functions) with one, two and three arguments respectively. We say that f, g and h form a "set of closures". The code pointer of f is f_code (and likewise g_code and h_code for g and h); the union of the free variables of the functions (apart from variables f, g and h) is fv_0 through fv_n. The in-memory layout, of values with tag Closure_tag, is:

    -------------------------------------------------------
f = | f_code      | arity = 1 | g | h | fv_0 | ... | fv_n |
    -------------------------------------------------------

    ----------------------------------------------------------------
g = | caml_curry2 | arity = 2 | g_code | f | h | fv_0 | ... | fv_n |
    ----------------------------------------------------------------

    ----------------------------------------------------------------
h = | caml_curry3 | arity = 3 | h_code | f | g | fv_0 | ... | fv_n |
    ----------------------------------------------------------------

Statically-allocated closures, which have no free variables, do not require any runtime patching. Dynamically-allocated closures are allocated with placeholders in the closure slots (here pointing to f, g and h) and then immediately patched using caml_modify to tie the knots. There are no placeholders allocated that point at the same closure (so for example in f, there is no slot that itself points at f). Environments are explicitly de-shared.
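To make the field order in the diagrams concrete, here is a minimal OCaml sketch that computes the fields of one closure block from a description of the set. It is a toy model, not compiler code: the types `func` and `field` and the function `layout` are hypothetical names introduced for illustration only.

```ocaml
(* Toy model of the proposed layout; hypothetical names, not compiler code. *)
type func = { name : string; arity : int }

type field =
  | Code_pointer of string  (* e.g. "f_code" or "caml_curry2" *)
  | Arity of int
  | Closure_slot of string  (* pointer to another closure in the same set *)
  | Free_var of string

(* Fields of the closure block for [fn], given the whole set of closures
   and the free variables shared by the set. *)
let layout ~(set : func list) ~(free_vars : string list) (fn : func) : field list =
  let header =
    if fn.arity = 1 then [ Code_pointer (fn.name ^ "_code"); Arity 1 ]
    else
      [ Code_pointer (Printf.sprintf "caml_curry%d" fn.arity);
        Arity fn.arity;
        Code_pointer (fn.name ^ "_code") ]
  in
  (* One slot per *other* closure in the set; never a slot pointing at [fn]. *)
  let others =
    List.filter_map
      (fun g -> if g.name = fn.name then None else Some (Closure_slot g.name))
      set
  in
  header @ others @ List.map (fun v -> Free_var v) free_vars

let set =
  [ { name = "f"; arity = 1 }; { name = "g"; arity = 2 }; { name = "h"; arity = 3 } ]

let f_block = layout ~set ~free_vars:[ "fv_0"; "fv_1" ] (List.nth set 0)
let g_block = layout ~set ~free_vars:[ "fv_0"; "fv_1" ] (List.nth set 1)
```

In this model the `Closure_slot` fields are the ones that would start as placeholders and be patched after allocation; the header fields and free variables are known when the block is allocated.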

IR changes

The Clambda language has been modified to accommodate the new representation. We have drawn on experience from Flambda 2.0, which has taught us that a good way of handling the binding of multiple mutually-recursive functions is to have a binding construct in the IR that binds multiple names at once, eliminating in particular any notion of a "variable pointing at a set of closures". As such we provide:

Ulet_set_of_closures, which produces a mutually-recursive binding of closures to variables, around a body in the style of a normal let-expression;
Uselect_closure, which allows access via the given closure to others in the same set of closures. (For example, in the layout above, given f we could get to g and h.) This is equivalent to the Move_within_set_of_closures construct in Flambda (which in Flambda 2.0 is called Select_closure).

Closures within a set are referenced by integer indices, starting at zero, matching up with the order in which the function declarations are found in the appropriate maps during compilation. All offset calculations are done statically at compile time.

These constructs supersede Uclosure and Uoffset respectively. There are no changes to the Flambda language, although some as-yet-unpublished work by @xclerc already exists to replace the "set of closures" symbol-binding construct in the static term language with a construct similar to Ulet_set_of_closures.
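As a rough illustration of the two constructs described in this section, the following sketch defines a miniature IR with a multi-name binding form and an index-based selection form, plus a tiny evaluator. Everything here is an assumption made for the example: the constructor names, their arguments (including `from_index` and `to_index`) and the `value` type are hypothetical, and the real Clambda definitions in clambda.mli differ.

```ocaml
(* Hypothetical mini-IR; the real Clambda definitions differ. *)
type expr =
  | Var of string
  | Let_set_of_closures of (string * string) list * expr
      (* (bound variable, code label) pairs, in index order, then the body *)
  | Select_closure of { from_index : int; to_index : int; closure : expr }

(* A closure value knows its set (an array of code labels) and its index. *)
type value = Closure of { set : string array; index : int }

let rec eval env = function
  | Var x -> List.assoc x env
  | Let_set_of_closures (bindings, body) ->
    (* Bind every name at once; indices start at zero and follow list order. *)
    let set = Array.of_list (List.map snd bindings) in
    let env' =
      List.mapi (fun index (x, _) -> (x, Closure { set; index })) bindings @ env
    in
    eval env' body
  | Select_closure { from_index; to_index; closure } ->
    (match eval env closure with
     | Closure { set; index } ->
       (* The static index must agree with the closure we were given. *)
       assert (index = from_index);
       Closure { set; index = to_index })

let code_of (Closure { set; index }) = set.(index)

(* Bind f, g and h together, then reach h starting from f. *)
let example =
  Let_set_of_closures
    ( [ ("f", "f_code"); ("g", "g_code"); ("h", "h_code") ],
      Select_closure { from_index = 0; to_index = 2; closure = Var "f" } )

let result = code_of (eval [] example)
```

Note that no variable in this sketch ever denotes the set itself: the body only sees f, g and h, and moves between them by index.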

Some possible improvements

We could perform a simple analysis to avoid having patched pointers to closures that are never called. For example, we could elide g within the closure of h if h never calls g.
We could track the free variables of functions on a per-function basis rather than on a per-set-of-closures basis.
We could have the closure blocks point at a shared environment block. This would need benchmarking. It would probably necessitate ensuring that the indirection through a closure to the environment block was subject to CSE.
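The first improvement in this list could be sketched as follows. This is only an illustration under assumptions: the `calls` table (which functions of the set each body mentions) is a hypothetical input that some earlier analysis would have to produce, and `needed_slots` is a made-up name, not part of the patch.

```ocaml
module String_set = Set.Make (String)

(* [calls] maps each function to the functions of the same set that its body
   mentions (hypothetical input).  The result lists the closure slots that
   [fn]'s block would still need; a function never needs a slot for itself. *)
let needed_slots ~(calls : (string * string list) list) (fn : string) : string list =
  match List.assoc_opt fn calls with
  | None -> []
  | Some callees ->
    String_set.elements (String_set.remove fn (String_set.of_list callees))

(* f mentions g and h; g mentions only f; h is only self-recursive. *)
let calls = [ ("f", [ "g"; "h" ]); ("g", [ "f" ]); ("h", [ "h" ]) ]

let f_slots = needed_slots ~calls "f"
let g_slots = needed_slots ~calls "g"
let h_slots = needed_slots ~calls "h"
```

With this input, the block for h would carry no closure slots at all, and the block for g would drop its slot for h.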

Reading the code

If you don't want to read the whole diff, which you probably do not, then reading the diffs of clambda.mli and cmmgen.ml would be a good start; followed by the diffs of closure.ml and/or flambda_to_clambda.ml.

One test case was removed from the testsuite as it no longer seemed relevant and produced a (reasonable) compile-time performance problem.

jordwalke · 2019-10-13T09:55:23Z

Just curious - does this make the native representation of closures more similar or less similar to that of bytecode's form?

sabine · 2020-05-16T09:42:10Z

What's the status on this?

From the point of view of compiling to WebAssembly, getting rid of infix pointers does make things simpler (no matter whether we ship a GC on WebAssembly local memory or whether we use the WebAssembly GC feature).

mshinwell · 2021-03-08T08:54:23Z

New closure representation merged from a different PR for 4.13.

mshinwell added the work-in-progress label Sep 26, 2019

mshinwell force-pushed the closure_rep3 branch from b84ef3a to 2cc61c9 Compare September 26, 2019 09:25

New closure representation without Infix_tag

7fd8728

mshinwell force-pushed the closure_rep3 branch from 8b0b202 to 7fd8728 Compare September 26, 2019 12:37

chambart mentioned this pull request Oct 11, 2019

Removing infix tag, (Bytecode part) #9035

Open

lthls mentioned this pull request Jan 20, 2020

remove unused clos_vars from Uconst_closure and Const_closure #9223

Closed

xavierleroy mentioned this pull request May 29, 2020

A self-describing representation for function closures #9619

Merged

ppedrot mentioned this pull request Aug 20, 2020

Memory corruption while reducing fixpoints in vm_compute coq/coq#12869

Closed

mshinwell closed this Mar 8, 2021
