Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear-time closure computation #12222

Merged
merged 2 commits into from
Jul 3, 2023
Merged

Conversation

lthls
Copy link
Contributor

@lthls lthls commented May 4, 2023

This PR adapts the algorithms for closure conversion in the bytecode and native compilers so that their complexity stays (pseudo-)linear in the number of mutually-recursive functions that are being compiled.

In essence, the compilation environments used to store information about the other functions (and free variables) in relative form: the environment for f_i would store the relative offsets from f_i to all other functions f_j.
There is no obvious sharing for these environments, so in practice each individual environment takes a linear time to build and there is a linear number of them, so we get quadratic complexity.

With my patch, environments now store a shared part that is the absolute offsets from the start of the block for each function, and a relative part that is just the offset for the current function (and the environment parameter in native mode). This means that we take a linear time computing the shared part, then each function takes a constant time building its relative part, so the overall time stays linear.

Fixes #12207.

Note that Flambda is likely still quadratic on sets of recursive functions (at least at toplevel), but not for the same reasons. Instead, an issue with the code that transforms constant expressions (including closed functions) into statically-allocated data makes these examples quadratic. This problem also occurs on nested functions, and is tracked in issue #7826.

@lthls
Copy link
Contributor Author

lthls commented May 4, 2023

I'm not completely sure the bootstrap is necessary, but bytecode compilation environments can end up in the cmo files through debug events so it felt safer to do the bootstrap.

Copy link
Contributor

@Ekdohibs Ekdohibs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, and I think this fixes the quadratic behaviour that is seen. I was wondering if there was some way to share the code between bytecode and closure as parts of it look quite similar, but it seems like they are just different enough to make sharing them not be worth it.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved on @Ekdohibs' behalf.

Changes Outdated
@@ -27,6 +27,10 @@ Working version

### Bug fixes:

- #12207, #12222: Make closure computation linear in the number of recursive
functions instead of quadratic
(Vincent Laviron, report by François Pottier, review by ????)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lthls, can you add @Ekdohibs as reviewer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@lthls
Copy link
Contributor Author

lthls commented Jun 28, 2023

I've rebased the PR and removed the bootstrap. I think it would still be useful to add a bootstrap after this PR (otherwise people using ocamldebug on the compiler itself might get some weird errors), but I suspect it is better not to have the bootstrap in the PR itself.

@gasche gasche added merge-me and removed merge-me labels Jun 28, 2023
@gasche
Copy link
Member

gasche commented Jun 30, 2023

(In the end I am planning to have a look at this PR today, so I removed the merge-me flag -- but I fully expect to merge.)

| Multiple_recursive of Ident.t list

let closure_entries fun_defs fvs =
let rec add_positions entries pos_to_entry pos delta = function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pos and delta could be labelled parameters to make callsites easier to read.

@@ -1124,8 +1162,10 @@ let comp_function tc cont =
| id :: rem -> Ident.add id pos (positions (pos + delta) delta rem) in
let env =
{ ce_stack = positions arity (-1) tc.params;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the positions function here is add_positions Ident.empty Fun.id. I wonder if you could make add_positions a more global function to be able to use it here and reduce the total amount of code.

match V.Map.find id entries with
| Free_variable fv_pos ->
Uprim(P.Pfield(fv_pos - env_pos, Pointer, Immutable),
[Uvar env_param], Debuginfo.none)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this code fragment would be computed once per function and free variable, and shared between all occurrences of the free variable. Now it is regenerated afresh on each occurrence. I wonder if this could lead to a noticeable loss of sharing / increase in memory usage.

I can think of several ways of sharing these accessor code fragments, including some that would let you share the code of different free variables with the same fv_pos - env_pos relative offset. I am not sure which one would be simple enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about preserving sharing, but in the end I estimated that it wasn't worth it (keep in mind that even before this PR, the sharing would be lost when we translate to Cmm).

You could try to construct a worst case example and see the impact it has (something like let y = Sys.opaque_identity 0 in let f () = y + y + ... + y). I would expect that even there it's only a small amount of extra memory for the whole compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried an example with a free variable occurring 2000 times, and the difference between trunk and this branch is lost in the noise (both in native and bytecode).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sharing would be lost when we translate to Cmm

Duh, of course, I had missed this.

@gasche
Copy link
Member

gasche commented Jul 2, 2023

The linux-debug github CI configuration fails the finaliser_handover.opt test with

[03] file runtime/domain.c; line 1746 ### Assertion failed: caml_gc_phase == Phase_sweep_and_mark_main

This is unrelated to the present PR and tracked in #12345.

@gasche gasche added the merge-me label Jul 2, 2023
@gasche gasche merged commit 2e5df7c into ocaml:trunk Jul 3, 2023
10 of 11 checks passed
@fpottier
Copy link
Contributor

fpottier commented Jul 3, 2023

Sounds cool! I am looking forward to trying this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compiling nest of mutually recursive functions exhibits nonlinear behavior
4 participants