PR#6416 et al.: injective mapping between identifiers and printed names #1120

Octachron · 2017-03-23T16:02:46Z

MPR#6416, MPR#6634, MPR#4791 ( see also MPR#6323, MPR7402).

This PR aims at getting rid of all instances where different identifiers are printed with the same name in warnings, errors, or inferred signatures; leading to the infamous

type t = A
module M: sig type t val x:t end = struct
  type t = B
  let x = A
end;;
…
Values do not match: val x : t is not included in t

and variations.

Since there are many instances where transforming the above message to use valid types is difficult or impossible, this PR takes the easy road and uses unique but not syntactically valid names to disambiguate between identifiers. For instance, the previous error messages becomes:

Values do not match: val x : t/1212 is not included in t/1215

Note that this naming scheme "identifier.name/identifier.stamp" extends the one already used for unification error messages. These unique names are not particularly clear, but they at least convey the fact there was a problem with some type redefinition, and they could be improved later on.

Moreover, once different identifiers with the same name are disambiguated, it becomes relatively straightforward to detect when a signature printed with ocamlc -i a.ml does not corresponds to the real signature of a compilation unit. Therefore, this PR adds a new warning for unprintable signature.

For instance, consider the following compilation unit

(* pr6323.ml *)
type 'a t = B of 'a t list

let rec foo f = function
  | B(v)::tl -> B(foo f v)::foo f tl
  | [] -> []

module DT = struct
  type 'a t = {bar : 'a}
  let p t = foo (fun x -> x) t
end

Currently using ocamlc -i pr6323.ml prints

type 'a t = B of 'a t list
val foo : 'a -> 'b t list -> 'c t list
module DT :
sig type 'a t = { bar : 'a; } val p : 'a t list -> 'b t list end

which is the wrong signature, since the true type of p is 'a Pr6323.t list -> 'a Pr6323.t list, not 'a Dt.t list -> 'a DT.t list.

With this PR applied, ocamlc -w +63 -i pr6323.ml yields

type 'a t = B of 'a t list
val foo : 'a -> 'b t list -> 'c t list
module DT :
 sig type 'a t = { bar : 'a; } val p : 'a t/1204 list -> 'b t/1204 list end

and emits a warning

File "pr6323.ml", line 1:
Warning 63: A signature contained items which could not be printed properly,
due to name collisions between identifiers.
The resulting printed signature is not compilable.

Implementation overview

On the implementation side, the first commit in this PR introduces a notion of naming context in printtyp.ml. This naming context is used to detect name collision between identifiers when building outcome tree. When such a collision is detected, the identifiers is renamed on the fly to a unique name fo the form identifier.name / identifier.stamp.

Moreover, a Deferred printing submodule make this naming context easier to use by splitting the rendering of types in two distinct phases:

First, build all the concerned outcome trees, accumulating information
in the naming context
Then, render the outcome tree, once all name collisions have been resolved

Note that this mechanism replaces the specific mechanism used previously for unification error messages.

The second commit make existing error and warning messages uses this new deferred printing module when appropriate.

The third commit introduces the new warning for unprintable signatures, which simply detects if there was any collision event when printing the signature with ocamlc -i.

The fourth commit extends the naming context collision mechanism to detect name collision with identifiers presents in the printing environment but absent from the current warnings, error messages or signature items. With this extension, most (all?) unprintable signature should be detected.

More examples:

signature mismatch [1], [2]
ambiguous type [1]
printed signature [1], [2], [3], [4],
unification error [1]

hcarty · 2017-03-23T16:15:22Z

Is meaningful location information available when these errors are caught? If so it would be useful to display a summary of where each unique type name comes from after the warning.

Octachron · 2017-03-23T17:23:51Z

I like the idea of adding location information as a footnote. I will have a look.

garrigue · 2017-03-24T02:02:22Z

This certainly looks useful. The general design seems fine (in particular the idea of replacing Oide_ident's argument by a reference), but I'm still surprised at the sheer size of the patch.
At the API level, I'm not completely clear about the need of the Deferred submodule: intuitively the outcometree just does that, but you apparently need one more layer.

Octachron · 2017-03-24T04:24:33Z

The Deferred submodule is not absolutely required, its existence merely stems from my attempt to make it easier to compose error messages without explicitly building all outcome trees before hand. Consequently, it could be replaced by self-enforced discipline and avoiding the printing function defined in printtyp. For instance, looking at the code change for the "values does not match" error, currently in trunk, we have

fprintf ppf
"@[<hv 2>Values do not match:@ %a@;<1 -2>is not included in@ %a@]"
(value_description id) d1 (value_description id) d2;
show_locs ppf (d1.val_loc, d2.val_loc);;

where the outcome trees associated with d1 and d2 are printed as soon as constructed, thus the identifier names in d1 cannot be expanded when a name conflict arises within d2 outcome tree.

After the second commit in this PR, the code is altered to

let open Deferred in
fprintf ppf
"@[<hv 2>Values do not match:@ %t@;<1 -2>is not included in@ %t@]"
(value_description id d1) (value_description id d2);
show_locs ppf (d1.val_loc, d2.val_loc);

in which the outcome trees for d1 and d2 are implicitly built by Deferred.value_description, consequently both outcome trees are already constructed and the subsequent name conflicts resolved when the printing starts.

The more explicit version would be

reset_naming_context ();
let td1 = tree_of_value_description id d1 and td2 = tree_of_value_description id d2 in
fprintf ppf
"@[<hv 2>Values do not match:@ %a@;<1 -2>is not included in@ %a@]"
!Oprint.out_sig_item td1 !Oprint.out_sig_item d2;
show_locs ppf (d1.val_loc, d2.val_loc);

Octachron · 2017-03-25T17:01:44Z

SInce it was a low hanging fruit, I have added a name conflict explanation footnote to the warning messages, that details the location of the definition of every conflicting identifiers. For instance,

type t = A
module M: sig type t val x:t end = struct
  type t = B
  let x = A
end;;

yields now

Values do not match: val x : t/1212 is not included in t/1215
Type t/1212 was defined at line 1, characters 0-10
Type t/1215 was defined at line 3, characters 2-12

Similarly, the error message for MPR#6323 example has been extended to

Warning 63: A signature contained items which could not be printed properly,
due to name collisions between identifiers:
Type t/1204 was defined at line 1, characters 0-26
Type t/1212 was defined at line 8, characters 2-24
The resulting printed signature is not compilable.

Octachron · 2017-03-25T20:35:10Z

Since using the internal identifier stamp was making the test suite brittle and was not particularly readable, I have udapted the code to use namespace/name specific counter for unique names, i.e:

module O = struct
  module type s
  type t = A
  module M: sig val f: (module s) -> t -> t end =
  struct module type s type t = B let f (module X:s) A = B end
end;;

emits the following error message

Error: Signature mismatch:
       Modules do not match:
         sig module type s type t = B val f : (module s) -> t/2 -> t/1 end
       is not included in
         sig val f : (module s) -> t -> t end
       Values do not match:
         val f : (module s/1) -> t/2 -> t/1
       is not included in
         val f : (module s/2) -> t/2 -> t/2
       Type t/1 was defined at line 5, characters 23-33
       Type t/2 was defined at line 3, characters 2-12
       Module type s/1 was defined at line 5, characters 9-22
       Module type s/2 was defined at line 2, characters 2-15

hcarty · 2017-03-25T20:51:25Z

Thank you, the namespace/name-specific counters are much easier to read and decipher.

lpw25 · 2017-03-27T09:08:53Z

I'm not sure about this new warning. On the one hand I can see the benefit of warning users that the signature isn't valid -- although it is worth noting that there are other ways that -i can produce an invalid signature and I don't think it is really supposed to be used for generating .mli files anyway. On the other hand, IIRC the printer doesn't make any effort to look for other ways to print the type if it is ambiguous (unless -short-paths is on) so whether the warning triggers for a particular file is going to depend on the non-principal details of the inference algorithm.

Octachron · 2017-03-27T10:00:04Z

It is certainly true that -i is not supposed to generate exact signature, however it can be a good starting point for writing .mli after some manual editing. The manual and man pages mention this workflow:

-i
Cause the compiler to print all defined names (with their inferred types or their definitions) when compiling an implementation (.ml file). No compiled files (.cmo and .cmi files) are produced. This can be useful to check the types inferred by the compiler. Also, since the output follows the syntax of interfaces, it can help in writing an explicit interface (.mli file) for a file: just redirect the standard output of the compiler to a .mli file, and edit that file to remove all declarations of unexported names.

In this context, I find quite logical to warn users when more manual editing would be needed to
obtain a valid signature. Similarly, it seems fine that the printed signature is not flawless and depends on the interference algorithm if it is only intended as a starting point.

Maybe the problem is more the wording of the warning, which should put more emphasis on the "manual intervention needed" interpretation rather than the non-compilable notice.

lpw25 · 2017-03-27T11:10:00Z

In this context, I find quite logical to warn users when more manual editing would be needed to
obtain a valid signature. Similarly, it seems fine that the printed signature is not flawless and depends on the interference algorithm if it is only intended as a starting point.

Possibly. I'm just a little concerned that people will think they can use -i as part of their actual build -- under the assumption that they will get a warning if there is an issue. Then any change to inference could break their builds.

It's probably fine, but it does seem worth bearing in mind when considering this change.

damiendoligez · 2017-03-27T14:15:10Z

       Type t/1 was defined at line 5, characters 23-33
       Type t/2 was defined at line 3, characters 2-12
       Module type s/1 was defined at line 5, characters 9-22
       Module type s/2 was defined at line 2, characters 2-15

You should consider using gcc's trick to print these locations in an emacs-friendly way:

file "foo.ml", line 5, characters 23-33:
  Definition of type t/1
file "foo.ml", line 3, characters 2-12:
  Definition of type t/2
file "foo.ml", line 5, characters 9-22:
  Definition of module type s/1
file "foo.ml", line 2, characters 2-15:
  Definition of module type s/2

This is slightly ugly but much easier to navigate once you get used to it.

Octachron · 2017-03-28T14:45:14Z

@lpw25 : I agree that is important to not give the impression that this warning can be relied upon by build systems. I have added some words of caution at the end of the warning to make this point more explicit:

Warning 63: The printed interface differs from the inferred interface.
The inferred interface contained items which could not be printed
properly due to name collisions between identifiers.
file "pr4791.ml", line 5, characters 2-12:
  Definition of type t/1
file "pr4791.ml", line 1, characters 0-10:
  Definition of type t/2
Beware that this warning is purely informational and will not catch all
instances of erroneous printed interface.

@damiendoligez, good point. I have sacrificed some readability to please emacs in f4f5576

Armael · 2017-03-29T07:03:20Z

typing/printtyp.ml

+
+let print_conflicts ?(sep=fun ppf -> Format.fprintf ppf "@;") ppf =
+  let l =
+    List.filter (* remove toplevle location, which are not really useful *)


"toplevle" -> "toplevel"

Fixed, thanks.

damiendoligez · 2018-06-05T13:28:11Z

This PR is slowly rotting away in the backyard, and that makes me sad.

gasche · 2018-06-05T13:50:28Z

We need to find good souls to rebase it and to review it. @Armael, @Drup, @trefis, could you be convinced to look at it as reviewer(s)?

trefis · 2018-06-05T14:23:06Z

I haven't followed the discussion, but I'm a bit surprised that we'd need new reviewers considering that both Jacques and Leo looked at this.
That said I don't mind having a look once it's rebased / cleaned up and @Octachron says it ready.

lpw25 · 2018-06-05T14:31:15Z

IIRC I didn't review the code, I just commented on the idea.

garrigue · 2018-06-06T04:43:17Z

ocamldoc/odoc_print.ml

@@ -14,6 +14,7 @@
 (**************************************************************************)

 open Format
+;; Printtyp.Naming_context.enable false


Unusual style in the compiler: the standard style is let () = ..., which ensures the absence of partial application.

Fixed, thanks.

garrigue · 2018-06-06T04:54:56Z

I actually, I had only looked at the first version of the PR.
The current version removes the Deferred module, and seems clean to me (just see my 2 comments).
I suppose it just has to be rebased before being approved.
Sorry not to have followed.

garrigue · 2018-06-06T05:01:04Z

typing/printtyp.ml

+
+module Namespace = struct
+
+  type t = [`Type|`Module|`Module_type|`Class|`Class_type|`Other]


Why use a polymorphic variant here?

I think the main reason was because env.ml is already using polymorphic variant for namespaces (when handling shadowing). I can revert to simple variants.

Octachron · 2018-06-06T22:41:58Z

@garrigue , I think it was largely a communication failure on my part.
I have squashed and rebased the PR and I shall do another round of self-review by the end of the week.

Octachron · 2018-06-21T21:06:01Z

Ok, I have removed the polymorphic variants and cleaned-up the rebase. @garrigue , would you agree to formally (review and) approve the PR?

garrigue

Just one small request, to have no regrets.

garrigue · 2018-06-22T09:36:43Z

typing/outcometree.mli

-  | Oide_ident of string
+  | Oide_ident of string ref
+  (** Beware that identifier names can be renamed on the fly to avoid
+      name collision between different identifiers, see {!Printtyp} *)



Wouldn't it be better to define a new type here, rather than use ref.
Say, define

type out_name = {mutable printed_name: string} type out_ident = ... | Oide_ident of out_name

My concern is mainly about making the code more robust for maintenance, since you end up adding ref in many places, and this could become hard to track.

A new record type (or maybe an abstract type) does make the intent clearer here indeed. I have pushed a new commit to this effect.

Octachron · 2018-06-26T20:04:01Z

Merged, thanks for the review.

v-gb · 2018-07-05T12:49:44Z

This change breaks some opam packages, by creating spurious warnings 49.

For instance this code, reduced from functoria:

module Foo = struct
  type info = { doc : unit }
  type t = { info : info }
end
let add_extra_info arg = arg.Foo.info.doc

~/ocaml/installed-349db3d869c8659b10f019af92f8dc497012d7ec/bin/ocamlc -nopervasives -c -no-alias-deps -intf-suffix .no-mli lib/functoria_key.ml  
File "_none_", line 1:
Warning 49: no cmi file was found in path for module Foo

It seems like several things are wrong here, though some of them may predate this change:

obviously, the warning makes no sense
then the code that causes this warning to be printed has no reason to run at all, because it's only trying to print another warning (42) that's disabled. Plus it might impact performance, because type-based disambiguation is used everywhere
finally, the compiler is actually looking for foo.cmi on disk. That seems like that could create races similar to the ones that got fixed by Env.without_cmis, because even if there a foo.ml in the same directory, then the build system knows the source file above doesn't depend on it, and so if the compiler's behavior depends on this file, its behavior will be non deterministic.

gasche · 2018-07-05T12:58:28Z

Interesting! Thanks a lot for the report.

Octachron · 2018-07-05T13:34:43Z

I will have a look.

damiendoligez · 2018-07-09T10:58:59Z

@Octachron don't hesitate to create an MPR to keep track of this problem.

lpw25 · 2018-10-16T13:05:58Z

typing/printtyp.ml

+
+(** Same as {!ident_name_simple} but lookup to existing named identifiers
+    in the current {!printing_env} *)
+let ident_name namespace id =


Why is this function needed? I'm refactoring the various lookup functions in the type-checker, and I'm not sure what to do with the one's used to implement env_ident.

The best I can work out is that this is trying to force us to print t/2 when t has been shadowed even though we are not printing any references to the shadowing type. This doesn't seem desirable to me, especially printing the associated "conflict" explanation. It's also not implemented correctly anyway.

If it's not really needed I'll just remove it as part of my refactoring.

This function is indeed trying to take in account names that have been already bound inside the environment. This is mostly useful when printing signature with ocamlc -i for cases like

type t = A module M = struct type t let x = A end

or having meaningful error message slightly earlier. For intance, the code below

type t = A module M:sig type t val x:t end = struct type t let x = A end;;

raises

Modules do not match:
sig type t val x : t/2 end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

without this function, the first part of the error message is useless

Modules do not match:
sig type t val x : t end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

If the feature is broken or too hard to implement, I imagine that it could be removed (temporarily or not).

But why does this require checking for shadowing? I would have expected the naming context to contain all the ts that we print and so to add the extra markers in all those cases.

Besides:

Modules do not match: sig type t val x : t/2 end is not included in sig type t val x : t end Values do not match: val x : t/2 is not included in val x : t/1

doesn't seem to be much of an improvement: is t/1 the same as one of the type ts or is it some third unrelated type?

Because when printing a signature, we don't want to print extra-markers if they are not strictly required?

type t module M = struct type t = A val x: t end

should be printed as

type t module M: sig type t = A val x: t end

thus checking which t is in scope and which one is shadowed.

Fix: do not forget some updates in to_cmm_static

Octachron force-pushed the deferred_ident_names branch from 85ff82b to 5357b17 Compare March 25, 2017 19:51

Octachron force-pushed the deferred_ident_names branch from b63578e to 42f673a Compare March 28, 2017 14:42

Armael reviewed Mar 29, 2017

View reviewed changes

Octachron force-pushed the deferred_ident_names branch from 8439a59 to fe9b290 Compare April 27, 2017 15:03

Octachron mentioned this pull request Aug 9, 2017

Avoid formatter flushes inside exported printers in Location #1281

Merged

Octachron force-pushed the deferred_ident_names branch from fe9b290 to 0945af5 Compare August 22, 2017 13:41

damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017

damiendoligez removed this from the consider-for-4.07 milestone Jun 5, 2018

garrigue reviewed Jun 6, 2018

View reviewed changes

Octachron force-pushed the deferred_ident_names branch from 0945af5 to a503bfa Compare June 6, 2018 22:41

Octachron force-pushed the deferred_ident_names branch from a503bfa to a2408fc Compare June 6, 2018 22:44

Octachron force-pushed the deferred_ident_names branch from a2408fc to 15fb7e5 Compare June 11, 2018 12:13

Octachron force-pushed the deferred_ident_names branch from 15fb7e5 to 47ea8ed Compare June 21, 2018 19:42

GPR#1120: unique printed identifiers

4d375fd

Octachron force-pushed the deferred_ident_names branch from 47ea8ed to 3d1cf26 Compare June 21, 2018 20:52

GPR#1120: use simple variants

c005973

Octachron force-pushed the deferred_ident_names branch from 3d1cf26 to c005973 Compare June 21, 2018 22:51

garrigue requested changes Jun 22, 2018

View reviewed changes

GPR1120: out_name type

f07d73a

garrigue approved these changes Jun 22, 2018

View reviewed changes

GPR1120: fix incrementation for identifier names

af8d99d

Octachron merged commit 349db3d into ocaml:trunk Jun 26, 2018

This was referenced Jul 10, 2018

warning 40: fix missing wrap_printing_env #1893

Merged

Fix 7821 #1908

Merged

lpw25 reviewed Oct 16, 2018

View reviewed changes

This was referenced Mar 14, 2019

ocamlc -i produces incorrect result #4791

Closed

The same string can mean different types in the same error message #6416

Closed

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023

Bugfix: do not forget some updates in to_cmm_static (ocaml#1120)

919c4db

Fix: do not forget some updates in to_cmm_static

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

catch all exceptions in River.fetch and just log them (ocaml#1120)

97fd5e1


		module Namespace = struct

		type t = [`Type\|`Module\|`Module_type\|`Class\|`Class_type\|`Other]

PR#6416 et al.: injective mapping between identifiers and printed names #1120

PR#6416 et al.: injective mapping between identifiers and printed names #1120

Conversation

Octachron commented Mar 23, 2017 • edited Loading

Implementation overview

More examples:

hcarty commented Mar 23, 2017

Octachron commented Mar 23, 2017

garrigue commented Mar 24, 2017

Octachron commented Mar 24, 2017

Octachron commented Mar 25, 2017

Octachron commented Mar 25, 2017 • edited Loading

hcarty commented Mar 25, 2017

lpw25 commented Mar 27, 2017

Octachron commented Mar 27, 2017

lpw25 commented Mar 27, 2017

damiendoligez commented Mar 27, 2017

Octachron commented Mar 28, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

damiendoligez commented Jun 5, 2018

gasche commented Jun 5, 2018

trefis commented Jun 5, 2018

lpw25 commented Jun 5, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

garrigue commented Jun 6, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron commented Jun 6, 2018

Octachron commented Jun 21, 2018

garrigue left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron commented Jun 26, 2018

v-gb commented Jul 5, 2018

gasche commented Jul 5, 2018

Octachron commented Jul 5, 2018

damiendoligez commented Jul 9, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron commented Mar 23, 2017 •

edited

Loading

Octachron commented Mar 25, 2017 •

edited

Loading