Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR#6416 et al.: injective mapping between identifiers and printed names #1120

Merged
merged 4 commits into from Jun 26, 2018

Conversation

Projects
None yet
9 participants
@Octachron
Copy link
Contributor

commented Mar 23, 2017

MPR#6416, MPR#6634, MPR#4791 ( see also MPR#6323, MPR7402).

This PR aims at getting rid of all instances where different identifiers are printed with the same name in warnings, errors, or inferred signatures; leading to the infamous

type t = A
module M: sig type t val x:t end = struct
  type t = B
  let x = A
end;;
…
Values do not match: val x : t is not included in t

and variations.

Since there are many instances where transforming the above message to use valid types is difficult or impossible, this PR takes the easy road and uses unique but not syntactically valid names to disambiguate between identifiers. For instance, the previous error messages becomes:

Values do not match: val x : t/1212 is not included in t/1215

Note that this naming scheme "identifier.name/identifier.stamp" extends the one already used for unification error messages. These unique names are not particularly clear, but they at least convey the fact there was a problem with some type redefinition, and they could be improved later on.

Moreover, once different identifiers with the same name are disambiguated, it becomes relatively straightforward to detect when a signature printed with ocamlc -i a.ml does not corresponds to the real signature of a compilation unit. Therefore, this PR adds a new warning for unprintable signature.

For instance, consider the following compilation unit

(* pr6323.ml *)
type 'a t = B of 'a t list

let rec foo f = function
  | B(v)::tl -> B(foo f v)::foo f tl
  | [] -> []

module DT = struct
  type 'a t = {bar : 'a}
  let p t = foo (fun x -> x) t
end

Currently using ocamlc -i pr6323.ml prints

type 'a t = B of 'a t list
val foo : 'a -> 'b t list -> 'c t list
module DT :
sig type 'a t = { bar : 'a; } val p : 'a t list -> 'b t list end

which is the wrong signature, since the true type of p is 'a Pr6323.t list -> 'a Pr6323.t list, not 'a Dt.t list -> 'a DT.t list.

With this PR applied, ocamlc -w +63 -i pr6323.ml yields

type 'a t = B of 'a t list
val foo : 'a -> 'b t list -> 'c t list
module DT :
 sig type 'a t = { bar : 'a; } val p : 'a t/1204 list -> 'b t/1204 list end

and emits a warning

File "pr6323.ml", line 1:
Warning 63: A signature contained items which could not be printed properly,
due to name collisions between identifiers.
The resulting printed signature is not compilable.

Implementation overview

On the implementation side, the first commit in this PR introduces a notion of naming context in printtyp.ml. This naming context is used to detect name collision between identifiers when building outcome tree. When such a collision is detected, the identifiers is renamed on the fly to a unique name fo the form identifier.name / identifier.stamp.

Moreover, a Deferred printing submodule make this naming context easier to use by splitting the rendering of types in two distinct phases:

  • First, build all the concerned outcome trees, accumulating information
    in the naming context

  • Then, render the outcome tree, once all name collisions have been resolved

Note that this mechanism replaces the specific mechanism used previously for unification error messages.

The second commit make existing error and warning messages uses this new deferred printing module when appropriate.

The third commit introduces the new warning for unprintable signatures, which simply detects if there was any collision event when printing the signature with ocamlc -i.

The fourth commit extends the naming context collision mechanism to detect name collision with identifiers presents in the printing environment but absent from the current warnings, error messages or signature items. With this extension, most (all?) unprintable signature should be detected.

More examples:

@hcarty

This comment has been minimized.

Copy link
Contributor

commented Mar 23, 2017

Is meaningful location information available when these errors are caught? If so it would be useful to display a summary of where each unique type name comes from after the warning.

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Mar 23, 2017

I like the idea of adding location information as a footnote. I will have a look.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Mar 24, 2017

This certainly looks useful. The general design seems fine (in particular the idea of replacing Oide_ident's argument by a reference), but I'm still surprised at the sheer size of the patch.
At the API level, I'm not completely clear about the need of the Deferred submodule: intuitively the outcometree just does that, but you apparently need one more layer.

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2017

The Deferred submodule is not absolutely required, its existence merely stems from my attempt to make it easier to compose error messages without explicitly building all outcome trees before hand. Consequently, it could be replaced by self-enforced discipline and avoiding the printing function defined in printtyp. For instance, looking at the code change for the "values does not match" error, currently in trunk, we have

fprintf ppf
"@[<hv 2>Values do not match:@ %a@;<1 -2>is not included in@ %a@]"
(value_description id) d1 (value_description id) d2;
show_locs ppf (d1.val_loc, d2.val_loc);;

where the outcome trees associated with d1 and d2 are printed as soon as constructed, thus the identifier names in d1 cannot be expanded when a name conflict arises within d2 outcome tree.

After the second commit in this PR, the code is altered to

let open Deferred in
fprintf ppf
"@[<hv 2>Values do not match:@ %t@;<1 -2>is not included in@ %t@]"
(value_description id d1) (value_description id d2);
show_locs ppf (d1.val_loc, d2.val_loc);

in which the outcome trees for d1 and d2 are implicitly built by Deferred.value_description, consequently both outcome trees are already constructed and the subsequent name conflicts resolved when the printing starts.

The more explicit version would be

reset_naming_context ();
let td1 = tree_of_value_description id d1 and td2 = tree_of_value_description id d2 in
fprintf ppf
"@[<hv 2>Values do not match:@ %a@;<1 -2>is not included in@ %a@]"
!Oprint.out_sig_item td1 !Oprint.out_sig_item d2;
show_locs ppf (d1.val_loc, d2.val_loc);
@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Mar 25, 2017

SInce it was a low hanging fruit, I have added a name conflict explanation footnote to the warning messages, that details the location of the definition of every conflicting identifiers. For instance,

type t = A
module M: sig type t val x:t end = struct
  type t = B
  let x = A
end;;

yields now

Values do not match: val x : t/1212 is not included in t/1215
Type t/1212 was defined at line 1, characters 0-10
Type t/1215 was defined at line 3, characters 2-12

Similarly, the error message for MPR#6323 example has been extended to

Warning 63: A signature contained items which could not be printed properly,
due to name collisions between identifiers:
Type t/1204 was defined at line 1, characters 0-26
Type t/1212 was defined at line 8, characters 2-24
The resulting printed signature is not compilable.

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from 85ff82b to 5357b17 Mar 25, 2017

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Mar 25, 2017

Since using the internal identifier stamp was making the test suite brittle and was not particularly readable, I have udapted the code to use namespace/name specific counter for unique names, i.e:

module O = struct
  module type s
  type t = A
  module M: sig val f: (module s) -> t -> t end =
  struct module type s type t = B let f (module X:s) A = B end
end;;

emits the following error message

Error: Signature mismatch:
       Modules do not match:
         sig module type s type t = B val f : (module s) -> t/2 -> t/1 end
       is not included in
         sig val f : (module s) -> t -> t end
       Values do not match:
         val f : (module s/1) -> t/2 -> t/1
       is not included in
         val f : (module s/2) -> t/2 -> t/2
       Type t/1 was defined at line 5, characters 23-33
       Type t/2 was defined at line 3, characters 2-12
       Module type s/1 was defined at line 5, characters 9-22
       Module type s/2 was defined at line 2, characters 2-15
@hcarty

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2017

Thank you, the namespace/name-specific counters are much easier to read and decipher.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2017

I'm not sure about this new warning. On the one hand I can see the benefit of warning users that the signature isn't valid -- although it is worth noting that there are other ways that -i can produce an invalid signature and I don't think it is really supposed to be used for generating .mli files anyway. On the other hand, IIRC the printer doesn't make any effort to look for other ways to print the type if it is ambiguous (unless -short-paths is on) so whether the warning triggers for a particular file is going to depend on the non-principal details of the inference algorithm.

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Mar 27, 2017

It is certainly true that -i is not supposed to generate exact signature, however it can be a good starting point for writing .mli after some manual editing. The manual and man pages mention this workflow:

-i
Cause the compiler to print all defined names (with their inferred types or their definitions) when compiling an implementation (.ml file). No compiled files (.cmo and .cmi files) are produced. This can be useful to check the types inferred by the compiler. Also, since the output follows the syntax of interfaces, it can help in writing an explicit interface (.mli file) for a file: just redirect the standard output of the compiler to a .mli file, and edit that file to remove all declarations of unexported names.

In this context, I find quite logical to warn users when more manual editing would be needed to
obtain a valid signature. Similarly, it seems fine that the printed signature is not flawless and depends on the interference algorithm if it is only intended as a starting point.

Maybe the problem is more the wording of the warning, which should put more emphasis on the "manual intervention needed" interpretation rather than the non-compilable notice.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2017

In this context, I find quite logical to warn users when more manual editing would be needed to
obtain a valid signature. Similarly, it seems fine that the printed signature is not flawless and depends on the interference algorithm if it is only intended as a starting point.

Possibly. I'm just a little concerned that people will think they can use -i as part of their actual build -- under the assumption that they will get a warning if there is an issue. Then any change to inference could break their builds.

It's probably fine, but it does seem worth bearing in mind when considering this change.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Mar 27, 2017

       Type t/1 was defined at line 5, characters 23-33
       Type t/2 was defined at line 3, characters 2-12
       Module type s/1 was defined at line 5, characters 9-22
       Module type s/2 was defined at line 2, characters 2-15

You should consider using gcc's trick to print these locations in an emacs-friendly way:

file "foo.ml", line 5, characters 23-33:
  Definition of type t/1
file "foo.ml", line 3, characters 2-12:
  Definition of type t/2
file "foo.ml", line 5, characters 9-22:
  Definition of module type s/1
file "foo.ml", line 2, characters 2-15:
  Definition of module type s/2

This is slightly ugly but much easier to navigate once you get used to it.

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from b63578e to 42f673a Mar 28, 2017

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Mar 28, 2017

@lpw25 : I agree that is important to not give the impression that this warning can be relied upon by build systems. I have added some words of caution at the end of the warning to make this point more explicit:

Warning 63: The printed interface differs from the inferred interface.
The inferred interface contained items which could not be printed
properly due to name collisions between identifiers.
file "pr4791.ml", line 5, characters 2-12:
  Definition of type t/1
file "pr4791.ml", line 1, characters 0-10:
  Definition of type t/2
Beware that this warning is purely informational and will not catch all
instances of erroneous printed interface.

@damiendoligez, good point. I have sacrificed some readability to please emacs in f4f5576


let print_conflicts ?(sep=fun ppf -> Format.fprintf ppf "@;") ppf =
let l =
List.filter (* remove toplevle location, which are not really useful *)

This comment has been minimized.

Copy link
@Armael

Armael Mar 29, 2017

Member

"toplevle" -> "toplevel"

This comment has been minimized.

Copy link
@Octachron

Octachron Mar 29, 2017

Author Contributor

Fixed, thanks.

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from 8439a59 to fe9b290 Apr 27, 2017

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from fe9b290 to 0945af5 Aug 22, 2017

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017

@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone Jun 5, 2018

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Jun 5, 2018

This PR is slowly rotting away in the backyard, and that makes me sad.

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 5, 2018

We need to find good souls to rebase it and to review it. @Armael, @Drup, @trefis, could you be convinced to look at it as reviewer(s)?

@trefis

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2018

I haven't followed the discussion, but I'm a bit surprised that we'd need new reviewers considering that both Jacques and Leo looked at this.
That said I don't mind having a look once it's rebased / cleaned up and @Octachron says it ready.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2018

IIRC I didn't review the code, I just commented on the idea.

@@ -14,6 +14,7 @@
(**************************************************************************)

open Format
;; Printtyp.Naming_context.enable false

This comment has been minimized.

Copy link
@garrigue

garrigue Jun 6, 2018

Contributor

Unusual style in the compiler: the standard style is let () = ..., which ensures the absence of partial application.

This comment has been minimized.

Copy link
@Octachron

Octachron Jun 6, 2018

Author Contributor

Fixed, thanks.

@garrigue

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

I actually, I had only looked at the first version of the PR.
The current version removes the Deferred module, and seems clean to me (just see my 2 comments).
I suppose it just has to be rebased before being approved.
Sorry not to have followed.


module Namespace = struct

type t = [`Type|`Module|`Module_type|`Class|`Class_type|`Other]

This comment has been minimized.

Copy link
@garrigue

garrigue Jun 6, 2018

Contributor

Why use a polymorphic variant here?

This comment has been minimized.

Copy link
@Octachron

Octachron Jun 6, 2018

Author Contributor

I think the main reason was because env.ml is already using polymorphic variant for namespaces (when handling shadowing). I can revert to simple variants.

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from 0945af5 to a503bfa Jun 6, 2018

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2018

@garrigue , I think it was largely a communication failure on my part.
I have squashed and rebased the PR and I shall do another round of self-review by the end of the week.

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from a503bfa to a2408fc Jun 6, 2018

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from a2408fc to 15fb7e5 Jun 11, 2018

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from 15fb7e5 to 47ea8ed Jun 21, 2018

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from 47ea8ed to 3d1cf26 Jun 21, 2018

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jun 21, 2018

Ok, I have removed the polymorphic variants and cleaned-up the rebase. @garrigue , would you agree to formally (review and) approve the PR?

@Octachron Octachron force-pushed the Octachron:deferred_ident_names branch from 3d1cf26 to c005973 Jun 21, 2018

@garrigue
Copy link
Contributor

left a comment

Just one small request, to have no regrets.

| Oide_ident of string
| Oide_ident of string ref
(** Beware that identifier names can be renamed on the fly to avoid
name collision between different identifiers, see {!Printtyp} *)

This comment has been minimized.

Copy link
@garrigue

garrigue Jun 22, 2018

Contributor

Wouldn't it be better to define a new type here, rather than use ref.
Say, define

type out_name = {mutable printed_name: string}
type out_ident =
  ...
  | Oide_ident of out_name

My concern is mainly about making the code more robust for maintenance, since you end up adding ref in many places, and this could become hard to track.

This comment has been minimized.

Copy link
@Octachron

Octachron Jun 22, 2018

Author Contributor

A new record type (or maybe an abstract type) does make the intent clearer here indeed. I have pushed a new commit to this effect.

@Octachron Octachron merged commit 349db3d into ocaml:trunk Jun 26, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2018

Merged, thanks for the review.

@sliquister

This comment has been minimized.

Copy link
Contributor

commented Jul 5, 2018

This change breaks some opam packages, by creating spurious warnings 49.

For instance this code, reduced from functoria:

module Foo = struct
  type info = { doc : unit }
  type t = { info : info }
end
let add_extra_info arg = arg.Foo.info.doc
~/ocaml/installed-349db3d869c8659b10f019af92f8dc497012d7ec/bin/ocamlc -nopervasives -c -no-alias-deps -intf-suffix .no-mli lib/functoria_key.ml  
File "_none_", line 1:
Warning 49: no cmi file was found in path for module Foo

It seems like several things are wrong here, though some of them may predate this change:

  • obviously, the warning makes no sense
  • then the code that causes this warning to be printed has no reason to run at all, because it's only trying to print another warning (42) that's disabled. Plus it might impact performance, because type-based disambiguation is used everywhere
  • finally, the compiler is actually looking for foo.cmi on disk. That seems like that could create races similar to the ones that got fixed by Env.without_cmis, because even if there a foo.ml in the same directory, then the build system knows the source file above doesn't depend on it, and so if the compiler's behavior depends on this file, its behavior will be non deterministic.
@gasche

This comment has been minimized.

Copy link
Member

commented Jul 5, 2018

Interesting! Thanks a lot for the report.

@Octachron

This comment has been minimized.

Copy link
Contributor Author

commented Jul 5, 2018

I will have a look.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Jul 9, 2018

@Octachron don't hesitate to create an MPR to keep track of this problem.

This was referenced Jul 10, 2018


(** Same as {!ident_name_simple} but lookup to existing named identifiers
in the current {!printing_env} *)
let ident_name namespace id =

This comment has been minimized.

Copy link
@lpw25

lpw25 Oct 16, 2018

Contributor

Why is this function needed? I'm refactoring the various lookup functions in the type-checker, and I'm not sure what to do with the one's used to implement env_ident.

The best I can work out is that this is trying to force us to print t/2 when t has been shadowed even though we are not printing any references to the shadowing type. This doesn't seem desirable to me, especially printing the associated "conflict" explanation. It's also not implemented correctly anyway.

If it's not really needed I'll just remove it as part of my refactoring.

This comment has been minimized.

Copy link
@Octachron

Octachron Oct 16, 2018

Author Contributor

This function is indeed trying to take in account names that have been already bound inside the environment. This is mostly useful when printing signature with ocamlc -i for cases like

type t = A
module M = struct
  type t
  let x = A
end

or having meaningful error message slightly earlier. For intance, the code below

type t = A
module M:sig type t val x:t end = struct type t let x = A end;;

raises

Modules do not match:
sig type t val x : t/2 end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

without this function, the first part of the error message is useless

Modules do not match:
sig type t val x : t end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

If the feature is broken or too hard to implement, I imagine that it could be removed (temporarily or not).

This comment has been minimized.

Copy link
@lpw25

lpw25 Oct 16, 2018

Contributor

But why does this require checking for shadowing? I would have expected the naming context to contain all the ts that we print and so to add the extra markers in all those cases.

Besides:

Modules do not match:
sig type t val x : t/2 end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

doesn't seem to be much of an improvement: is t/1 the same as one of the type ts or is it some third unrelated type?

This comment has been minimized.

Copy link
@Octachron

Octachron Oct 16, 2018

Author Contributor

Because when printing a signature, we don't want to print extra-markers if they are not strictly required?

type t
module M = struct
  type t = A
  val x: t
end

should be printed as

type t
module M: sig
  type t = A
  val x: t
end

thus checking which t is in scope and which one is shadowed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.