Linear-time closure computation #12222

lthls · 2023-05-04T15:31:50Z

This PR adapts the algorithms for closure conversion in the bytecode and native compilers so that their complexity stays (pseudo-)linear in the number of mutually-recursive functions that are being compiled.

In essence, the compilation environments previously stored information about the other functions (and free variables) in relative form: the environment for f_i would store the relative offsets from f_i to all other functions f_j.
There is no obvious sharing between these environments, so in practice each individual environment takes linear time to build; since there is a linear number of them, we get quadratic complexity.

With my patch, environments now store a shared part, which holds the absolute offset of each function from the start of the block, and a relative part, which is just the offset of the current function (plus the environment parameter in native mode). This means that we take linear time computing the shared part, then each function takes constant time building its relative part, so the overall time stays linear.
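To make the difference concrete, here is a minimal sketch of the two schemes. It is not the code from this PR: `SMap`, `slot`, `absolute_offsets`, `env_for`, `relative_offset` and `relative_env` are hypothetical names, identifiers are modelled as strings, and giving every function the same `slot` size in the block is a simplifying assumption.

```ocaml
(* Hypothetical model: identifiers are strings, and function number i of a
   recursive set sits at absolute offset [slot * i] in the closure block. *)
module SMap = Map.Make (String)

(* Shared part: the absolute offset of every function, built once for the
   whole set. Map insertion adds a logarithmic factor in this sketch. *)
let absolute_offsets ~slot names =
  let add (pos, env) name = (pos + slot, SMap.add name pos env) in
  snd (List.fold_left add (0, SMap.empty) names)

(* Per-function part: only the absolute offset of the function itself. *)
type env = { shared : int SMap.t; self_pos : int }

(* Constant extra work per function (apart from one lookup). *)
let env_for shared name = { shared; self_pos = SMap.find name shared }

(* A relative offset is recovered on demand with one subtraction. *)
let relative_offset env target = SMap.find target env.shared - env.self_pos

(* Old scheme, for comparison: a full map of relative offsets is rebuilt
   for each function, i.e. n maps of size n for n functions. *)
let relative_env shared name =
  let self_pos = SMap.find name shared in
  SMap.map (fun pos -> pos - self_pos) shared
```

With `relative_env`, compiling n functions builds n maps of n entries each; with `env_for`, the single `shared` map is reused by every function and only `self_pos` changes.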

Fixes #12207.

Note that Flambda is likely still quadratic on sets of recursive functions (at least at toplevel), but not for the same reasons. Instead, an issue with the code that transforms constant expressions (including closed functions) into statically-allocated data makes these examples quadratic. This problem also occurs on nested functions, and is tracked in issue #7826.

lthls · 2023-05-04T17:32:59Z

I'm not completely sure the bootstrap is necessary, but bytecode compilation environments can end up in the cmo files through debug events so it felt safer to do the bootstrap.

Ekdohibs

The code looks good, and I think this fixes the quadratic behaviour that was observed. I was wondering whether there was some way to share the code between bytecode and closure, as parts of it look quite similar, but they seem to be just different enough that sharing would not be worth it.

gasche

Approved on @Ekdohibs' behalf.

gasche · 2023-06-28T13:52:34Z

Changes

@@ -27,6 +27,10 @@ Working version

 ### Bug fixes:

+- #12207, #12222: Make closure computation linear in the number of recursive
+  functions instead of quadratic
+  (Vincent Laviron, report by François Pottier, review by ????)


@lthls, can you add @Ekdohibs as reviewer here?

lthls · 2023-06-28T14:18:20Z

I've rebased the PR and removed the bootstrap. I think it would still be useful to add a bootstrap after this PR (otherwise people using ocamldebug on the compiler itself might get some weird errors), but I suspect it is better not to have the bootstrap in the PR itself.

gasche · 2023-06-30T08:22:48Z

(In the end I am planning to have a look at this PR today, so I removed the merge-me flag -- but I fully expect to merge.)

gasche · 2023-07-01T20:38:20Z

bytecomp/bytegen.ml

+  | Multiple_recursive of Ident.t list
+
+let closure_entries fun_defs fvs =
+  let rec add_positions entries pos_to_entry pos delta = function


pos and delta could be labelled parameters to make callsites easier to read.

gasche · 2023-07-01T20:41:09Z

bytecomp/bytegen.ml

@@ -1124,8 +1162,10 @@ let comp_function tc cont =
    | id :: rem -> Ident.add id pos (positions (pos + delta) delta rem) in
  let env =
    { ce_stack = positions arity (-1) tc.params;


Note: the positions function here is add_positions Ident.empty Fun.id. I wonder if you could make add_positions a more global function to be able to use it here and reduce the total amount of code.
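As an illustration of the two suggestions above (labelled `pos`/`delta` and a shared helper), here is a self-contained sketch. It is an assumption about the shape of the code, not the PR's actual definition: it uses a string map `SMap` in place of the compiler's `Ident` tables, it returns only the table, and the `entry` type is hypothetical.

```ocaml
(* Stand-in for the compiler's Ident tables (assumption for this sketch). *)
module SMap = Map.Make (String)

(* Assign consecutive positions to a list of identifiers, starting at
   [pos] and stepping by [delta]; [pos_to_entry] wraps each position. *)
let rec add_positions entries pos_to_entry ~pos ~delta = function
  | [] -> entries
  | id :: rem ->
    let entries = SMap.add id (pos_to_entry pos) entries in
    add_positions entries pos_to_entry ~pos:(pos + delta) ~delta rem

(* The local [positions] helper becomes a one-line specialisation. *)
let positions ~pos ~delta ids = add_positions SMap.empty Fun.id ~pos ~delta ids

(* Hypothetical entry type, only to show a non-identity [pos_to_entry]. *)
type entry = Free_variable of int | Function of int

let fv_entries =
  add_positions SMap.empty (fun p -> Free_variable p) ~pos:4 ~delta:1
    ["x"; "y"]
```

With labels, a call such as `positions ~pos:arity ~delta:(-1) tc.params` makes it clear at the call site which integer is the start position and which is the step.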

gasche · 2023-07-01T20:54:40Z

middle_end/closure/closure.ml

+          match V.Map.find id entries with
+          | Free_variable fv_pos ->
+            Uprim(P.Pfield(fv_pos - env_pos, Pointer, Immutable),
+                  [Uvar env_param], Debuginfo.none)


Before, this code fragment was computed once per function and free variable, and shared between all occurrences of the free variable. Now it is regenerated afresh on each occurrence. I wonder if this could lead to a noticeable loss of sharing / increase in memory usage.

I can think of several ways of sharing these accessor code fragments, including some that would let you share the code of different free variables with the same fv_pos - env_pos relative offset. I am not sure which one would be simple enough.
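For concreteness, one possible way of sharing accessors by relative offset is a small per-function memo table. This is only a sketch of the idea and was not adopted in the PR: the `expr` type is a simplified, hypothetical stand-in for the compiler's terms, and `make_accessor_cache` is an invented name.

```ocaml
(* Simplified, hypothetical stand-in for the compiler's terms. *)
type expr =
  | Var of string
  | Field of int * expr  (* Field (n, e): read field n of block e *)

(* Build a per-function accessor factory: one term per relative offset,
   created on first use and returned physically shared afterwards. *)
let make_accessor_cache env_param =
  let cache : (int, expr) Hashtbl.t = Hashtbl.create 16 in
  fun ~fv_pos ~env_pos ->
    let offset = fv_pos - env_pos in
    match Hashtbl.find_opt cache offset with
    | Some e -> e
    | None ->
      let e = Field (offset, Var env_param) in
      Hashtbl.add cache offset e;
      e
```

Two requests with the same `fv_pos - env_pos` return the same physical value, so repeated occurrences of a free variable would not allocate a new term each time.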

lthls · 2023-07-01

I've thought about preserving sharing, but in the end I estimated that it wasn't worth it (keep in mind that even before this PR, the sharing would be lost when we translate to Cmm).

You could try to construct a worst case example and see the impact it has (something like let y = Sys.opaque_identity 0 in let f () = y + y + ... + y). I would expect that even there it's only a small amount of extra memory for the whole compiler.

lthls · 2023-07-01

I've tried an example with a free variable occurring 2000 times, and the difference between trunk and this branch is lost in the noise (both in native and bytecode).
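A worst-case input of the kind described above can be generated with a few lines of OCaml. This is a hypothetical generator, not the script that was actually used for the measurement; the name `worst_case` and the wrapping `let g = ... in f` (added so that the output is a complete compilation unit and `y` stays a local free variable of `f`) are assumptions.

```ocaml
(* Generate a source file in which the free variable [y] occurs [n]
   times in the body of the closure [f]. Expects n >= 1. *)
let worst_case n =
  let sum = String.concat " + " (List.init n (fun _ -> "y")) in
  Printf.sprintf
    "let g =\n  let y = Sys.opaque_identity 0 in\n  let f () = %s in\n  f\n"
    sum
```

Writing the result of `worst_case 2000` to a file and compiling that file with both compilers before and after the change gives a comparison of the kind reported above.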

gasche · 2023-07-02

> the sharing would be lost when we translate to Cmm

Duh, of course, I had missed this.

gasche · 2023-07-02T03:46:40Z

The linux-debug github CI configuration fails the finaliser_handover.opt test with

[03] file runtime/domain.c; line 1746 ### Assertion failed: caml_gc_phase == Phase_sweep_and_mark_main

This is unrelated to the present PR and tracked in #12345.

fpottier · 2023-07-03T19:21:39Z

Sounds cool! I am looking forward to trying this out.

Ekdohibs approved these changes May 30, 2023

gasche approved these changes Jun 28, 2023

lthls force-pushed the non-quadratic-closures branch from a9fdf57 to 37fd35b Compare June 28, 2023 14:13

gasche added merge-me and removed merge-me labels Jun 28, 2023

gasche reviewed Jul 1, 2023

gasche added the flaky-ci-failure label Jul 2, 2023

gasche mentioned this pull request Jul 1, 2023

tests/weak-ephe-final/finaliser_handover is flaky #12345

Closed

lthls added 2 commits July 2, 2023 05:51

Closure: linear computation of closure environments

7ea7f35

Bytecode: linear computation of closure environments

657a5b8

gasche force-pushed the non-quadratic-closures branch from 2ba051f to 657a5b8 Compare July 2, 2023 03:52

gasche added the merge-me label Jul 2, 2023

gasche merged commit 2e5df7c into ocaml:trunk Jul 3, 2023
10 of 11 checks passed

jonahbeckford mentioned this pull request May 28, 2024

Backport linear closures bugfix #13204

Open
