Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating .cmt files takes a long time, in case of type error #5809

Closed
vicuna opened this Issue Nov 5, 2012 · 8 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link
Collaborator

vicuna commented Nov 5, 2012

Original bug ID: 5809
Reporter: @alainfrisch
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2015-12-11T18:08:22Z)
Resolution: fixed
Priority: normal
Severity: minor
Fixed in version: 4.01.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5814
Monitored by: jmeber

Bug description

When a unit does not type-check, a .cmt file is generated, with one entry per valid sub-node of the typed AST.

I've observed a case where compiling a (quite large, but hand-written) module takes a lot of time in case of type error: about 28 seconds, mostly to generate a 8.8 Mb .cmt file. By setting OCAML_BINANNOT_WITHENV, we ask the compiler not to remove environment for AST node; interestingly, the whole compilation now takes only 0.81 seconds and generates a 2.7 Mb .cmt file (i.e. smaller than the original one, even though environments are not discarded!). Also, I've tried to remove the hash consing in Cmt_format.keep_only_summary. As expected, the size of the .cmt file increases (to 14 Mb) and compilation takes 1.19 seconds.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 5, 2012

Comment author: @alainfrisch

I've also tried replacing hash consing with the following simpler strategy, which avoids repeated successive calls to Env.keep_only_summary on identical environment (compared with physical equality):

===================================================================
--- typing/cmt_format.ml	(revision 50428)
+++ typing/cmt_format.ml	(working copy)
@@ -76,17 +76,19 @@
   try ignore (Sys.getenv "OCAML_BINANNOT_WITHENV"); false
   with Not_found -> true
 
-(* Re-introduce sharing after clearing environments *)
+let last_env = ref None
+
 let env_hcons = Hashtbl.create 133
 let keep_only_summary env =
-  let new_env = Env.keep_only_summary env in
-  try
-    Hashtbl.find env_hcons new_env
-  with Not_found ->
-    Hashtbl.add env_hcons new_env new_env;
-    new_env
-let clear_env_hcons () = Hashtbl.clear env_hcons
+  match !last_env with
+  | Some (e0, e1) when e0 == env -> e1
+  | _ ->
+      let new_env = Env.keep_only_summary env in
+      last_env := Some (env, new_env);
+      new_env
 
+let clear_env_hcons () = last_env := None
+
 module ClearEnv  = TypedtreeMap.MakeMap (struct
   open TypedtreeMap
   include DefaultMapArgument
===================================================================

This gives a 8.9 Mb .cmt file, generated in 1.16 seconds.

I'm still confused as to why setting OCAML_BINANNOT_WITHENV generates a much smaller .cmt file.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 6, 2012

Comment author: @alainfrisch

The explanation for the increase of size of the .cmt file is actually quite simple: in "error mode", the cmt file lists all type-checked nodes; mapping over them to simplify the environment loses all the sharing between them. I can see two simple work-arounds:

  • Do not export the children of an exported node. (When a node is to be added to the list of exported nodes, this list is first back-tracked to its content before the node was considered.)

  • In error mode, never discard the environment.

  • Keep sharing when mapping the AST to simplify the environment. (Likely to reduce the size of .cmt file but increase even more the compilation time.)

One should also consider displaying the error message before creating the .cmt file.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 6, 2012

Comment author: @alainfrisch

A concrete reproduction case with typing/typecore.ml from the OCaml compiler, compiled with ocamlc:

0.54s without -bin-annot
0.98s with -bin-annot -> 1.1 Mb .cmt file
0.54s with -bin-annot and OCAML_BINANNOT_WITHENV=1 -> 1.4 Mb .cmt file
0.57s with -bin-annot and the "simpler hash-consing" patch -> 1.1 Mb

Now, add a type error at the end of the file:

0.37s without -bin-annot
7.82s with -bin-annot -> 4.6 Mb .cmt file
0.39s with -bin-annot and OCAML_BINANNOT_WITHENV=1 -> 1.5 Mb .cmt file
0.54s with -bin-annot and the "simpler hash-consing" patch -> 4.8 Mb

What about always keeping the environment in the cmt file? This is much faster, and does not seem to produce huge cmt files.

If we insist on discarding the environment, I'd suggest to adopt a more efficient way to implement that, maybe with my "simpler hash-consing" patch (less sharing, slight increase of size of .cmt files, but more more efficient) and/or by making the env fields
in nodes mutable (so that discarding the environment can be done in place without loosing sharing between nodes in the error case),
or even by making the fields of the env record themselves mutable (so that all sharing can be kept without a costly hash-table).

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 6, 2012

Comment author: @alainfrisch

Keeping environments would also solve #5814.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 8, 2012

Comment author: @alainfrisch

Commit 13078: remove hash-consing of environments, replaced by a much cheaper one-slot memoization.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 8, 2012

Comment author: @alainfrisch

Commit 13080: do not keep both an expression and its sub-expressions in a partial .cmt file (in case of type error).

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 8, 2012

Comment author: @alainfrisch

Commit 13081: do not store a structure item together with sub-components in partial cmt files.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Nov 8, 2012

Comment author: @alainfrisch

New timings for compiling typing/typecore.ml with ocamlc (not ocamlc.opt):

0.54s without -bin-annot
0.56s with -bin-annot -> 1162282 bytes for the .cmt file

With a type error inserted at the end of the module:

0.35s without -bin-annot
0.36s with -bin-annot -> 1170623 bytes for the .cmt file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.