Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slower compilation 4.07 => 4.08 #8776

Open
alainfrisch opened this issue Jun 28, 2019 · 12 comments

Comments

@alainfrisch
Copy link
Contributor

commented Jun 28, 2019

This is to document one case of significant degradation in compilation time between 4.07 and 4.08, on a synthetic example (originally created to stress-tess #8774). In case someone wants to investigate further...

Results:

  • 4.07: 10s
  • 4.08, trunk: 19s

Reproduction:

After a make world, run:

 boot/ocamlrun ./ocaml -I stdlib gen.ml > bar.ml && time boot/ocamlrun ./ocamlc -I stdlib -c bar.ml 

with gen.ml:

let n1 = 200
let n2 = 200
let n3 = 40

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in

  print_endline "type t = A: 'a -> t";
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done;
  print_endline "let f x =";
  for i = 1 to n1 do
    print_endline "match x with A _ ->";
  done;
  for i = 1 to n3 do
    Printf.printf "a%i," i;
  done;
  print_endline "()"
@nojb

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

-dtimings output shows a big increase in the lambda generation phase:
4.07.1:

8.581s bar.ml
  0.084s parsing
    0.084s parser
  8.141s typing
  0.196s transl
  0.157s generate
  0.003s other
0.006s other

4.08.0:

16.359s bar.ml
  00.102s parsing
    00.102s parser
  08.271s typing
  07.814s transl
  00.172s generate
00.012s other
@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

I think at least some of this might be coming from the calls to Typeopt.value_kind (and hence Subst.typexp) introduced in simplify_cases in matching.ml in #2156.

@nojb

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

Am not sure, if I didn't make a mistake, the same slowdown (worse in fact) can be obtained without any pattern matching at all:

let n2 = 200
let n3 = 40

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done
@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

@nojb I see the same degradation in the same function with your second example (indeed it seems worse too).

@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

(gdb) inf break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007e55c0 bytecomp/matching.ml:593
	breakpoint already hit 8000 times
        c
@lpw25

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

Looks like Typeopt.scrape_ty is calling Ctype.correct_levels which means it is making a completely unnecessary copy of the type. It should be adjusted to only do that when there is actually something to expand.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

It should be adjusted to only do that when there is actually something to expand.

Or it could just not do the copy: it's only there to update the level so the expansion won't fail. But we could also have a version of the expansion that doesn't check the levels (we're past typechecking at this point, so all this doesn't matter anymore)

@alainfrisch

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

I confirm that not calling correct_levels removes a good part of the observed slowdown. The call to correct_levels was there already in the first version of typeopt.ml (ea8fe59#diff-5ad81800134284c48d42f76b958990b3), but it is called much more often now, so it is worth optimizing it.

My understanding is that, if we don't introduce a variant of the expansion that ignores levels, not calling correct_levels can only miss some optimizations (but not fail); is that right?

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

My understanding is that, if we don't introduce a variant of the expansion that ignores levels, not calling correct_levels can only miss some optimizations (but not fail); is that right?

That sounds right yes.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

not calling correct_levels can only miss some optimizations (but not fail); is that right?

That's right, but when an optimisation was missed would be very unpredictable and depend on things like if -principal was used. So it'd be better to try and get this right. I think only calling correct_levels and expand_head_opt when the type_desc is actually Tconstr would probably be both correct and quick in most cases.

@kit-ty-kate

This comment has been minimized.

Copy link
Contributor

commented Jul 24, 2019

Any news on this issue before the release of 4.08.1?

@gasche

This comment has been minimized.

Copy link
Member

commented Jul 24, 2019

To my best knowledge the answer is "no". (But I would be surprised if that was the cause of the +164% increase reported #8811, which we also don't know.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.