
Compiling nest of mutually recursive functions exhibits nonlinear behavior #12207

Closed
fpottier opened this issue Apr 25, 2023 · 9 comments · Fixed by #12222
fpottier commented Apr 25, 2023:
With the native-code compiler, compiling a collection of N mutually recursive toplevel functions takes time that grows nonlinearly in N.

This script (gen.sh) creates such a collection:

#!/bin/bash
# Usage: ./gen.sh N
# Emits a chain of mutually recursive functions f0 .. f(N+1), plus main.

N="$1"
echo "let rec main() = f0()"
for (( c=0; c<=N; c++ ))
do
  echo "and f$((c))() = f$((c+1))()"
done
# After the loop, c = N+1, so this defines the last function in the chain.
echo "and f$((c))() = print_endline \"Hello\""
echo "let () = main()"
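For concreteness (this sample is reconstructed from the script above, not taken from the original report), `./gen.sh 2` produces a file like:

```ocaml
(* Generated by gen.sh 2: each fi tail-calls the next one. *)
let rec main() = f0()
and f0() = f1()
and f1() = f2()
and f2() = f3()
and f3() = print_endline "Hello"
let () = main()
```

All the functions live in a single `let rec ... and ...` group, which is what triggers the behavior described below.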

Running this script with various values of N produces the following kind of output:

$ ./gen.sh 1000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m0.459s
user	0m0.393s
sys	0m0.065s
Hello
$ ./gen.sh 2000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m1.280s
user	0m1.201s
sys	0m0.078s
Hello
$ ./gen.sh 5000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m7.913s
user	0m7.770s
sys	0m0.139s
Hello
$ ./gen.sh 10000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m35.636s
user	0m35.360s
sys	0m0.261s
Hello
$ ./gen.sh 20000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	2m54.296s
user	2m53.569s
sys	0m0.632s
Hello
$ ./gen.sh 40000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	11m22.849s
user	11m20.508s
sys	0m1.663s
Hello

Passing -linscan does not help; the observed times are the same.

The memory usage, in the last run, does not exceed 500MB.

The size of the generated .o file appears to grow linearly in N, which is good.

This issue is problematic because Menhir's code back-end generates this kind of code. A typical LR automaton for a real-world grammar can have thousands of states, and the generated code can contain thousands of mutually recursive functions.

I note that a similar example involving non-recursive functions does not exhibit this behavior: it seems to behave linearly, as desired.
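For comparison, a non-recursive variant (my sketch of what such an example could look like, not the reporter's exact script) defines each function with a plain `let`, in dependency order, without `and`:

```ocaml
(* Non-recursive analogue: each fi is defined before its caller,
   so no mutually recursive group is needed. *)
let f3 () = print_endline "Hello"
let f2 () = f3 ()
let f1 () = f2 ()
let f0 () = f1 ()
let () = f0 ()
```

Since there is no mutually recursive group, the compiler never has to relate each function to all the others, which is consistent with the linear behavior observed.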

fpottier commented:

PS. The bytecode compiler ocamlc also has nonlinear behavior, and the times are roughly the same!

$ ./gen.sh 10000 > foo.ml && time ocamlc -o foo foo.ml && ./foo

real	0m30.920s
user	0m30.825s
sys	0m0.068s
Hello


lthls commented Apr 25, 2023

I've pushed a potential solution here: https://github.com/lthls/ocaml/tree/non-quadratic-closures
For now it only handles Closure, but I suspect that the bytecode compiler can be modified in the same way. Flambda may need something different: I suspect that it's the Lift_constants pass that is causing trouble (like in #7826), and the fix is not trivial.

I don't know if I'll be able to make an actual PR anytime soon, but if someone wants to make a PR out of my branch (possibly with modifications) I wouldn't mind.

fpottier commented:

Cool, thanks for your quick reaction! I wonder if you could explain why the original code is quadratic and what change you propose? Just by looking at the diff, I cannot tell.


lthls commented Apr 25, 2023

In a set of mutually recursive functions, occurrences of the recursive functions need to be replaced by accesses through the relevant closure. There is a quadratic number of such projections (from f_i to f_j, for all i and j), and the old code computes all of them unconditionally: when compiling f_i, a map from each f_j to the projection f_i -> f_j is stored in the environment.
My patch splits the data that the environment stores for projections into two parts: an invariant part that is computed once and shared between all the functions in the set (the entries field, which stores the absolute positions of the recursive functions and free variables from the start of the block), and the parts specific to each function (env_param, the parameter to which the closure is bound, and env_pos, the offset for the current closure from the start of the block). The actual projections are then computed on demand, whenever an occurrence of a recursive variable is found, from the information in the environment (in close_approx_var).
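A toy model of the idea described above (the field names `entries` and `env_pos` are borrowed from the comment; this is an illustrative sketch, not the actual code from closure.ml):

```ocaml
(* Shared, invariant part: absolute positions of the recursive
   functions (and free variables) from the start of the closure
   block. Computed once for the whole set. *)
type shared = { entries : (string * int) list }

(* Per-function part: the offset of the current function's closure
   within the block. *)
type env = { shared : shared; env_pos : int }

(* On-demand projection: the offset from the current closure to f_j
   is its absolute position minus env_pos. Each occurrence costs a
   lookup, instead of precomputing all n^2 projections eagerly. *)
let project env name =
  let abs = List.assoc name env.shared.entries in
  abs - env.env_pos
```

The point is that the O(n)-sized `entries` table is built once and shared, while each of the n functions only carries a constant amount of private data, so total work stays linear in the number of occurrences rather than quadratic in the number of functions.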

The bytecode compiler does something quite similar: although the environment only stores positions, not the actual expressions, the positions are still relative to the current closure, so they are recomputed from scratch for each function. In bytegen.ml, this information lives in the ce_rec (recursive functions) and ce_heap (free variables) fields of the environment.
Splitting that information into absolute positions from the start of the block (shared between all functions) and a per-function offset for the current closure should be reasonably easy, and would restore linear complexity.

fpottier commented:

Thanks for the explanation!

I don't know about bytecode, but in the native code compiler, I would expect no closures at all to be involved here. These are closed toplevel functions. I would expect all function calls here to be compiled down to calls to known addresses. Am I too optimistic?


lthls commented Apr 26, 2023

All function calls in the examples are eventually compiled into direct calls to known addresses, although the compiler still needs to generate closures (for indirect calls coming from other compilation units).
More relevant here, the code in closure.ml starts by assuming that the functions are closed, but still populates the environment with the necessary projections (the compilation will be restarted if the result is not actually closed). I think that with some clever tricks we could make the compilation work for closed functions without adding the full projections to the environment, but my proposal has the advantage that it fixes the quadratic behaviour for all cases, including non-closed functions.

xavierleroy commented:

This is probably a duplicate of #7826 . @lthls, could you please explain your analysis of the problem and the gist of your solution? I don't feel like reconstructing them by reading your code.


gasche commented Apr 28, 2023

I think that explanation was given in the later post #12207 (comment) ?

xavierleroy commented:

Oh, sorry, I missed that message in the discussion.
