Compiling nest of mutually recursive functions exhibits nonlinear behavior #12207
PS. The bytecode compiler
I've pushed a potential solution here: https://github.com/lthls/ocaml/tree/non-quadratic-closures I don't know if I'll be able to make an actual PR anytime soon, but if someone wants to make a PR out of my branch (possibly with modifications), I wouldn't mind.
Cool, thanks for your quick reaction! I wonder if you could explain why the original code is quadratic and what change you propose? Just by looking at the diff, I cannot tell.
In a set of mutually recursive functions, occurrences of the recursive functions need to be replaced by accesses through the relevant closure. There's a quadratic number of such projections (from The bytecode compiler does something quite similar; although the environment only stores positions and not the actual expressions, the positions are still relative to the current closure so they are recomputed completely for each function. In
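To make the counting argument above concrete, here is a toy model in OCaml. It is not taken from the compiler or from the linked branch: the names `stride`, `env_for` and `total_entries` are hypothetical, and the fixed slot size is a simplifying assumption. It only illustrates why an environment whose positions are relative to the current closure ends up costing N entries for each of the N functions.

```ocaml
(* Toy model, not compiler code: a nest of [n] functions shares one closure
   block, and function [j] is assumed to sit [j * stride] words from the
   start of that block. [stride] is a made-up slot size for illustration. *)
let stride = 3

(* Environment as seen from inside function [i]: for every function [j] of
   the nest, the offset of [j] relative to [i]'s own closure. The list has
   [n] entries and depends on [i], so it is rebuilt for each function. *)
let env_for ~n i = List.init n (fun j -> (j, (j - i) * stride))

(* Total number of entries built while processing the whole nest:
   [n] environments of [n] entries each. *)
let total_entries n =
  List.init n (fun i -> List.length (env_for ~n i))
  |> List.fold_left ( + ) 0
```

In this model, doubling the size of the nest quadruples `total_entries`, which matches the kind of growth reported for compile time in this issue.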
Thanks for the explanation! I don't know about bytecode, but in the native code compiler, I would expect no closures at all to be involved here. These are closed toplevel functions. I would expect all function calls here to be compiled down to calls to known addresses. Am I too optimistic?
All function calls in the examples are eventually compiled into direct calls to known addresses, although the compiler still needs to generate closures (for indirect calls coming from other compilation units).
I think that explanation was given in the later post, #12207 (comment)?
Oh, sorry, I missed that message in the discussion.
With the native code compiler, compiling a collection of `N` mutually recursive toplevel functions requires nonlinear time in `N`.

This script (`gen.sh`) creates such a collection:

Running this script with various values of `N` produces the following kind of output:

Passing `-linscan` does not help; the observed times are the same.

The memory usage, in the last run, does not exceed 500 MB.

The size of the generated `.o` file seems linear, which is good.

This issue is problematic because Menhir's code back-end generates this kind of code. A typical LR automaton for a real-world grammar can have thousands of states, and the generated code can contain thousands of mutually recursive functions.
I note that a similar example involving non-recursive functions does not exhibit this behavior: it seems to behave linearly, as desired.
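For readers who want to reproduce the measurement, a generator for this kind of input could be sketched as follows. This is a hypothetical stand-in written in OCaml, not the `gen.sh` script from the report: the function name `generate`, the names `f0`, `f1`, ... and the shape of each function body are all assumptions chosen for illustration.

```ocaml
(* Hypothetical stand-in for gen.sh: build the source text of a nest of
   [n] mutually recursive toplevel functions f0 ... f(n-1).
   The shape of each body is an assumption, not the script from the report. *)
let generate (n : int) : string =
  if n < 1 then invalid_arg "generate: n must be positive";
  let buf = Buffer.create (64 * n) in
  for i = 0 to n - 1 do
    (* The first definition opens the nest with "let rec"; the rest use "and". *)
    let keyword = if i = 0 then "let rec" else "and" in
    (* Each function calls its successor; the last one calls f0 again,
       so the whole nest forms a single cycle. *)
    let next = (i + 1) mod n in
    Printf.bprintf buf "%s f%d x = if x = 0 then %d else f%d (x - 1)\n"
      keyword i i next
  done;
  Buffer.contents buf

let () =
  (* With exactly one command-line argument N, print the nest on stdout;
     otherwise do nothing. *)
  match Sys.argv with
  | [| _; n |] -> print_string (generate (int_of_string n))
  | _ -> ()
```

Saved as, say, `gen.ml`, this can be run with `ocaml gen.ml 1000 > big.ml`, after which timing `ocamlopt -c big.ml` for a few values of `N` should show whether compile time grows faster than linearly on a given compiler version.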