Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong calls to Env.normalize_path on non-module paths #2131

Merged
merged 6 commits into from
Nov 22, 2018

Conversation

alainfrisch
Copy link
Contributor

Env.normalize_path is supposed to be applied on module paths; in particular, it may call find_module, which of course does not make sense on other kinds of paths, and is not completely cheap. This PR renames this function to normalize_module_path to make its intended use more explicit, adds normalize_type_path (which takes care of the encoding in paths introduced by inline records), and fixes several wrong calls to normalize_path to use either normalize_type_path or normalize_path_prefix (all other kinds of paths). There is also a new fast shortcut for persistent modules (i.e. compilation units) which cannot be aliases. (@lpw25 already confirmed to me that some of these calls were indeed wrong).

This results in a noticeable speedup of the typechecker (about 10% when compiling typing/typecore.ml with ocamlc).

I suspect that normalize_path still accounts for a non-negligible fraction of type-checking time and would be worth looking at more closely.

(Another source of improvement would be to avoid always calling twice expand_head_unif in Ctype.unify2 on each type. This is by tracking initially this one that I realized that normalize_path and find_module were called on "wrong" paths.)

@garrigue
Copy link
Contributor

garrigue commented Nov 5, 2018

It seems that this change requires a bootstrap. This is a bit surprising at first, but this may just be a problem of loss of sharing before the change (supposing you end up copying less now).

This looks like a good idea. Indeed, calling normalize_module_path may attempt more expansions, and this has a cost. I'm surprised the cost was so high, but then the original change (introduction of module aliases) included performance improvements, so that this may have gone unnoticed (I believed I did some benchmarks, but do not remember exactly on what).

As for the repeated calls to expand_head_unif, there is an explicit comment by Jerome Vouillon, so they seem really needed. The question may rather be whether there should be a caching mechanism to avoid trying to expand repeatedly. I.e., a new kind of abbreviation saying that there is no expansion (expansions are already cached, but their absence is not). But then, is the extra complexity worth it.

@alainfrisch
Copy link
Contributor Author

There was at least one real bug; when passing the following to the compiler:

module X = struct
  module B = List
  exception B of {x:int}
end
let _ = X.B {x=2}

It used to fail with an assertion failure (LocalExt case in find_type_full). I will add a test for that.

;;
[%%expect{|
+module X : sig module B = List exception B of { x : int; } end
+- : exn = X.B {x = 2}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On trunk, the output is:

module X : sig module B = List exception B of { x : int; } end
Uncaught exception: File "typing/env.ml", line 942, characters 26-32: Assertion failed

@alainfrisch
Copy link
Contributor Author

As for the repeated calls to expand_head_unif, there is an explicit comment by Jerome Vouillon, so they seem really needed.

Honestly, unify2 is a bit mysterious to me, but do you agree that if the first pair:

  ignore (expand_head_unif !env t1);
  ignore (expand_head_unif !env t2);

does nothing, then the second pair is also necessarily a no-op:

  let t1' = expand_head_unif !env t1 in
  let t2' = expand_head_unif !env t2 in

?
Even without a full caching, one could detect this situation.

@lpw25
Copy link
Contributor

lpw25 commented Nov 5, 2018

Note that checking whether they've changed using univ_eq before calling the second pair is what the code used to do back when it was a fixed point rather than two iterations. So it should be safe.

@alainfrisch
Copy link
Contributor Author

So, anybody willing to review this? @lpw25 or @garrigue?

@alainfrisch
Copy link
Contributor Author

This fixes both an actual bug, and a performance bug as well, so I think it's worth looking at for 4.08.

@alainfrisch alainfrisch added this to the 4.08 milestone Nov 15, 2018
Copy link
Contributor

@garrigue garrigue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me.
I added just a request for comment, and a possible change of function name, to make the code more readable.

else normalize_module_path0 lax env (Papply(p1', p2'))
| Pident _ as path ->
normalize_module_path0 lax env path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about calling this function expand_module_path rather normalize_module_path0 ?

typing/env.ml Outdated
| _ -> false

let normalize_type_path oloc env path =
(* Inlined version of Path.constructor_typath *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you're inlining is_constructor_typath.
It would be good to add a comment about the structure of those paths: a regular type path is followed by a capitalized constructor name, hence the choice based on the capitalization.

@alainfrisch
Copy link
Contributor Author

Thanks for the review! Comments taken into account and branch rebased. Waiting for a green light for CI before merging.

(* Inlined version of Path.is_constructor_typath:
constructor type paths (i.e. path pointing to an inline
record argument of a constructpr) are built as a regular
type path followed by a capitalized constructor name. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Ext paths?
According to Subst.type_path, it seems that if p is a module path, it should be normalized as such.
Or is there some invariant that ensures that this cannot happen?

@garrigue
Copy link
Contributor

OK, if I understand correctly, it seems that in the following example, your code would fail to normalize N.E into M.E.
I ignore whether this can create problems. Basically, I don't see how one could create the type path N.E in the first place, since the constructor N.E just refers to M.E.

module M = struct exception E of {x:int; y:int} end
module N = M

@alainfrisch
Copy link
Contributor Author

To be on the safe side, I added explicit support for paths of inline records under extension constructor.

@garrigue
Copy link
Contributor

Ok. I let you merge it.

@alainfrisch alainfrisch merged commit 4c130ca into ocaml:trunk Nov 22, 2018
@gasche gasche mentioned this pull request Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants