
Compiling nest of mutually recursive functions exhibits nonlinear behavior #12207

Closed
fpottier opened this issue Apr 25, 2023 · 9 comments · Fixed by #12222
fpottier commented Apr 25, 2023:
With the native-code compiler, compiling a collection of N mutually recursive toplevel functions takes time that grows nonlinearly in N.

This script (gen.sh) creates such a collection:

#!/bin/bash
# Usage: ./gen.sh N
# Emits a chain of mutually recursive functions f0 .. f(N+1), plus main.

N="$1"
echo "let rec main() = f0()"
for (( c=0; c<=N; c++ ))
do
  echo "and f$((c))() = f$((c+1))()"
done
# After the loop, c = N+1, so this defines the last function in the chain.
echo "and f$((c))() = print_endline \"Hello\""
echo "let () = main()"
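For concreteness (this sample is reconstructed from the script above, not taken from the original report), `./gen.sh 2` produces a file like:

```ocaml
(* Generated by gen.sh 2: each fi tail-calls the next one. *)
let rec main() = f0()
and f0() = f1()
and f1() = f2()
and f2() = f3()
and f3() = print_endline "Hello"
let () = main()
```

All the functions live in a single `let rec ... and ...` group, which is what triggers the behavior described below.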

Running this script with various values of N produces the following kind of output:

$ ./gen.sh 1000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m0.459s
user	0m0.393s
sys	0m0.065s
Hello
$ ./gen.sh 2000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m1.280s
user	0m1.201s
sys	0m0.078s
Hello
$ ./gen.sh 5000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m7.913s
user	0m7.770s
sys	0m0.139s
Hello
$ ./gen.sh 10000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	0m35.636s
user	0m35.360s
sys	0m0.261s
Hello
$ ./gen.sh 20000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	2m54.296s
user	2m53.569s
sys	0m0.632s
Hello
$ ./gen.sh 40000 > foo.ml && time ocamlopt -o foo foo.ml && ./foo

real	11m22.849s
user	11m20.508s
sys	0m1.663s
Hello

Passing -linscan does not help; the observed times are the same.

The memory usage, in the last run, does not exceed 500MB.

The size of the generated .o file appears to grow linearly in N, which is good.

This issue is problematic because Menhir's code back-end generates this kind of code. A typical LR automaton for a real-world grammar can have thousands of states, and the generated code can contain thousands of mutually recursive functions.

I note that a similar example involving non-recursive functions does not exhibit this behavior: it seems to behave linearly, as desired.
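For comparison, a non-recursive variant (my sketch of what such an example could look like, not the reporter's exact script) defines each function with a plain `let`, in dependency order, without `and`:

```ocaml
(* Non-recursive analogue: each fi is defined before its caller,
   so no mutually recursive group is needed. *)
let f3 () = print_endline "Hello"
let f2 () = f3 ()
let f1 () = f2 ()
let f0 () = f1 ()
let () = f0 ()
```

Since there is no mutually recursive group, the compiler never has to relate each function to all the others, which is consistent with the linear behavior observed.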

fpottier commented:

PS. The bytecode compiler ocamlc also has nonlinear behavior, and the times are roughly the same!

$ ./gen.sh 10000 > foo.ml && time ocamlc -o foo foo.ml && ./foo

real	0m30.920s
user	0m30.825s
sys	0m0.068s
Hello


lthls commented Apr 25, 2023

I've pushed a potential solution here: https://github.com/lthls/ocaml/tree/non-quadratic-closures
For now it only handles Closure, but I suspect that the bytecode compiler can be modified in the same way. Flambda may need something different: I suspect that it's the Lift_constants pass that is causing trouble (like in #7826), and the fix is not trivial.

I don't know if I'll be able to make an actual PR anytime soon, but if someone wants to make a PR out of my branch (possibly with modifications) I wouldn't mind.

fpottier commented:

Cool, thanks for your quick reaction! I wonder if you could explain why the original code is quadratic and what change you propose? Just by looking at the diff, I cannot tell.


lthls commented Apr 25, 2023

In a set of mutually recursive functions, occurrences of the recursive functions need to be replaced by accesses through the relevant closure. There is a quadratic number of such projections (from f_i to f_j, for all i and j), and the old code computes all of them unconditionally: when compiling f_i, a map from each f_j to the projection f_i -> f_j is stored in the environment.
My patch splits the data that the environment stores for projections into two parts: an invariant part that is computed once and shared between all the functions in the set (the entries field, which stores the absolute positions of the recursive functions and free variables from the start of the block), and the parts specific to each function (env_param, the parameter to which the closure is bound, and env_pos, the offset for the current closure from the start of the block). The actual projections are then computed on demand, whenever an occurrence of a recursive variable is found, from the information in the environment (in close_approx_var).
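A toy model of the idea described above (the field names `entries` and `env_pos` are borrowed from the comment; this is an illustrative sketch, not the actual code from closure.ml):

```ocaml
(* Shared, invariant part: absolute positions of the recursive
   functions (and free variables) from the start of the closure
   block. Computed once for the whole set. *)
type shared = { entries : (string * int) list }

(* Per-function part: the offset of the current function's closure
   within the block. *)
type env = { shared : shared; env_pos : int }

(* On-demand projection: the offset from the current closure to f_j
   is its absolute position minus env_pos. Each occurrence costs a
   lookup, instead of precomputing all n^2 projections eagerly. *)
let project env name =
  let abs = List.assoc name env.shared.entries in
  abs - env.env_pos
```

The point is that the O(n)-sized `entries` table is built once and shared, while each of the n functions only carries a constant amount of private data, so total work stays linear in the number of occurrences rather than quadratic in the number of functions.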

The bytecode compiler does something quite similar: although the environment only stores positions, not the actual expressions, the positions are still relative to the current closure, so they are recomputed from scratch for each function. In bytegen.ml, this information lives in the ce_rec (recursive functions) and ce_heap (free variables) fields of the environment.
Splitting that information into absolute positions from the start of the block (shared between all functions) and a per-function offset for the current closure should be reasonably easy, and would restore linear complexity.

fpottier commented:

Thanks for the explanation!

I don't know about bytecode, but in the native code compiler, I would expect no closures at all to be involved here. These are closed toplevel functions. I would expect all function calls here to be compiled down to calls to known addresses. Am I too optimistic?


lthls commented Apr 26, 2023

All function calls in the examples are eventually compiled into direct calls to known addresses, although the compiler still needs to generate closures (for indirect calls coming from other compilation units).
More relevant here, the code in closure.ml starts by assuming that the functions are closed, but still populates the environment with the necessary projections (the compilation will be restarted if the result is not actually closed). I think that with some clever tricks we could make the compilation work for closed functions without adding the full projections to the environment, but my proposal has the advantage that it fixes the quadratic behaviour for all cases, including non-closed functions.

xavierleroy commented:

This is probably a duplicate of #7826 . @lthls, could you please explain your analysis of the problem and the gist of your solution? I don't feel like reconstructing them by reading your code.


gasche commented Apr 28, 2023

I think that explanation was given in the later post #12207 (comment) ?

xavierleroy commented:

Oh, sorry, I missed that message in the discussion.
