-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linear-time closure computation #12222
Conversation
I'm not completely sure the bootstrap is necessary, but bytecode compilation environments can end up in the cmo files through debug events so it felt safer to do the bootstrap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, and I think this fixes the quadratic behaviour that is seen. I was wondering if there was some way to share the code between bytecode and closure as parts of it look quite similar, but it seems like they are just different enough to make sharing them not be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved on @Ekdohibs' behalf.
Changes
Outdated
@@ -27,6 +27,10 @@ Working version | |||
|
|||
### Bug fixes: | |||
|
|||
- #12207, #12222: Make closure computation linear in the number of recursive | |||
functions instead of quadratic | |||
(Vincent Laviron, report by François Pottier, review by ????) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
a9fdf57
to
37fd35b
Compare
I've rebased the PR and removed the bootstrap. I think it would still be useful to add a bootstrap after this PR (otherwise people using |
(In the end I am planning to have a look at this PR today, so I removed the merge-me flag -- but I fully expect to merge.) |
bytecomp/bytegen.ml
Outdated
| Multiple_recursive of Ident.t list | ||
|
||
let closure_entries fun_defs fvs = | ||
let rec add_positions entries pos_to_entry pos delta = function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pos
and delta
could be labelled parameters to make callsites easier to read.
bytecomp/bytegen.ml
Outdated
@@ -1124,8 +1162,10 @@ let comp_function tc cont = | |||
| id :: rem -> Ident.add id pos (positions (pos + delta) delta rem) in | |||
let env = | |||
{ ce_stack = positions arity (-1) tc.params; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the positions
function here is add_positions Ident.empty Fun.id
. I wonder if you could make add_positions
a more global function to be able to use it here and reduce the total amount of code.
match V.Map.find id entries with | ||
| Free_variable fv_pos -> | ||
Uprim(P.Pfield(fv_pos - env_pos, Pointer, Immutable), | ||
[Uvar env_param], Debuginfo.none) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this code fragment would be computed once per function and free variable, and shared between all occurrences of the free variable. Now it is regenerated afresh on each occurrence. I wonder if this could lead to a noticeable loss of sharing / increase in memory usage.
I can think of several ways of sharing these accessor code fragments, including some that would let you share the code of different free variables with the same fv_pos - env_pos
relative offset. I am not sure which one would be simple enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about preserving sharing, but in the end I estimated that it wasn't worth it (keep in mind that even before this PR, the sharing would be lost when we translate to Cmm).
You could try to construct a worst case example and see the impact it has (something like let y = Sys.opaque_identity 0 in let f () = y + y + ... + y
). I would expect that even there it's only a small amount of extra memory for the whole compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried an example with a free variable occurring 2000 times, and the difference between trunk and this branch is lost in the noise (both in native and bytecode).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sharing would be lost when we translate to Cmm
Duh, of course, I had missed this.
The linux-debug github CI configuration fails the
This is unrelated to the present PR and tracked in #12345. |
2ba051f
to
657a5b8
Compare
Sounds cool! I am looking forward to trying this out. |
This PR adapts the algorithms for closure conversion in the bytecode and native compilers so that their complexity stays (pseudo-)linear in the number of mutually-recursive functions that are being compiled.
In essence, the compilation environments used to store information about the other functions (and free variables) in relative form: the environment for
f_i
would store the relative offsets fromf_i
to all other functionsf_j
.There is no obvious sharing for these environments, so in practice each individual environment takes a linear time to build and there is a linear number of them, so we get quadratic complexity.
With my patch, environments now store a shared part that is the absolute offsets from the start of the block for each function, and a relative part that is just the offset for the current function (and the environment parameter in native mode). This means that we take a linear time computing the shared part, then each function takes a constant time building its relative part, so the overall time stays linear.
Fixes #12207.
Note that Flambda is likely still quadratic on sets of recursive functions (at least at toplevel), but not for the same reasons. Instead, an issue with the code that transforms constant expressions (including closed functions) into statically-allocated data makes these examples quadratic. This problem also occurs on nested functions, and is tracked in issue #7826.