Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR#6416 et al.: injective mapping between identifiers and printed names #1120

Merged
merged 4 commits into from
Jun 26, 2018

Conversation

Octachron
Copy link
Member

@Octachron Octachron commented Mar 23, 2017

MPR#6416, MPR#6634, MPR#4791 ( see also MPR#6323, MPR7402).

This PR aims at getting rid of all instances where different identifiers are printed with the same name in warnings, errors, or inferred signatures; leading to the infamous

type t = A
module M: sig type t val x:t end = struct
  type t = B
  let x = A
end;;
…
Values do not match: val x : t is not included in t

and variations.

Since there are many instances where transforming the above message to use valid types is difficult or impossible, this PR takes the easy road and uses unique but not syntactically valid names to disambiguate between identifiers. For instance, the previous error messages becomes:

Values do not match: val x : t/1212 is not included in t/1215

Note that this naming scheme "identifier.name/identifier.stamp" extends the one already used for unification error messages. These unique names are not particularly clear, but they at least convey the fact there was a problem with some type redefinition, and they could be improved later on.

Moreover, once different identifiers with the same name are disambiguated, it becomes relatively straightforward to detect when a signature printed with ocamlc -i a.ml does not corresponds to the real signature of a compilation unit. Therefore, this PR adds a new warning for unprintable signature.

For instance, consider the following compilation unit

(* pr6323.ml *)
type 'a t = B of 'a t list

let rec foo f = function
  | B(v)::tl -> B(foo f v)::foo f tl
  | [] -> []

module DT = struct
  type 'a t = {bar : 'a}
  let p t = foo (fun x -> x) t
end

Currently using ocamlc -i pr6323.ml prints

type 'a t = B of 'a t list
val foo : 'a -> 'b t list -> 'c t list
module DT :
sig type 'a t = { bar : 'a; } val p : 'a t list -> 'b t list end

which is the wrong signature, since the true type of p is 'a Pr6323.t list -> 'a Pr6323.t list, not 'a Dt.t list -> 'a DT.t list.

With this PR applied, ocamlc -w +63 -i pr6323.ml yields

type 'a t = B of 'a t list
val foo : 'a -> 'b t list -> 'c t list
module DT :
 sig type 'a t = { bar : 'a; } val p : 'a t/1204 list -> 'b t/1204 list end

and emits a warning

File "pr6323.ml", line 1:
Warning 63: A signature contained items which could not be printed properly,
due to name collisions between identifiers.
The resulting printed signature is not compilable.

Implementation overview

On the implementation side, the first commit in this PR introduces a notion of naming context in printtyp.ml. This naming context is used to detect name collision between identifiers when building outcome tree. When such a collision is detected, the identifiers is renamed on the fly to a unique name fo the form identifier.name / identifier.stamp.

Moreover, a Deferred printing submodule make this naming context easier to use by splitting the rendering of types in two distinct phases:

  • First, build all the concerned outcome trees, accumulating information
    in the naming context

  • Then, render the outcome tree, once all name collisions have been resolved

Note that this mechanism replaces the specific mechanism used previously for unification error messages.

The second commit make existing error and warning messages uses this new deferred printing module when appropriate.

The third commit introduces the new warning for unprintable signatures, which simply detects if there was any collision event when printing the signature with ocamlc -i.

The fourth commit extends the naming context collision mechanism to detect name collision with identifiers presents in the printing environment but absent from the current warnings, error messages or signature items. With this extension, most (all?) unprintable signature should be detected.

More examples:

@hcarty
Copy link
Member

hcarty commented Mar 23, 2017

Is meaningful location information available when these errors are caught? If so it would be useful to display a summary of where each unique type name comes from after the warning.

@Octachron
Copy link
Member Author

I like the idea of adding location information as a footnote. I will have a look.

@garrigue
Copy link
Contributor

This certainly looks useful. The general design seems fine (in particular the idea of replacing Oide_ident's argument by a reference), but I'm still surprised at the sheer size of the patch.
At the API level, I'm not completely clear about the need of the Deferred submodule: intuitively the outcometree just does that, but you apparently need one more layer.

@Octachron
Copy link
Member Author

The Deferred submodule is not absolutely required, its existence merely stems from my attempt to make it easier to compose error messages without explicitly building all outcome trees before hand. Consequently, it could be replaced by self-enforced discipline and avoiding the printing function defined in printtyp. For instance, looking at the code change for the "values does not match" error, currently in trunk, we have

fprintf ppf
"@[<hv 2>Values do not match:@ %a@;<1 -2>is not included in@ %a@]"
(value_description id) d1 (value_description id) d2;
show_locs ppf (d1.val_loc, d2.val_loc);;

where the outcome trees associated with d1 and d2 are printed as soon as constructed, thus the identifier names in d1 cannot be expanded when a name conflict arises within d2 outcome tree.

After the second commit in this PR, the code is altered to

let open Deferred in
fprintf ppf
"@[<hv 2>Values do not match:@ %t@;<1 -2>is not included in@ %t@]"
(value_description id d1) (value_description id d2);
show_locs ppf (d1.val_loc, d2.val_loc);

in which the outcome trees for d1 and d2 are implicitly built by Deferred.value_description, consequently both outcome trees are already constructed and the subsequent name conflicts resolved when the printing starts.

The more explicit version would be

reset_naming_context ();
let td1 = tree_of_value_description id d1 and td2 = tree_of_value_description id d2 in
fprintf ppf
"@[<hv 2>Values do not match:@ %a@;<1 -2>is not included in@ %a@]"
!Oprint.out_sig_item td1 !Oprint.out_sig_item d2;
show_locs ppf (d1.val_loc, d2.val_loc);

@Octachron
Copy link
Member Author

SInce it was a low hanging fruit, I have added a name conflict explanation footnote to the warning messages, that details the location of the definition of every conflicting identifiers. For instance,

type t = A
module M: sig type t val x:t end = struct
  type t = B
  let x = A
end;;

yields now

Values do not match: val x : t/1212 is not included in t/1215
Type t/1212 was defined at line 1, characters 0-10
Type t/1215 was defined at line 3, characters 2-12

Similarly, the error message for MPR#6323 example has been extended to

Warning 63: A signature contained items which could not be printed properly,
due to name collisions between identifiers:
Type t/1204 was defined at line 1, characters 0-26
Type t/1212 was defined at line 8, characters 2-24
The resulting printed signature is not compilable.

@Octachron
Copy link
Member Author

Octachron commented Mar 25, 2017

Since using the internal identifier stamp was making the test suite brittle and was not particularly readable, I have udapted the code to use namespace/name specific counter for unique names, i.e:

module O = struct
  module type s
  type t = A
  module M: sig val f: (module s) -> t -> t end =
  struct module type s type t = B let f (module X:s) A = B end
end;;

emits the following error message

Error: Signature mismatch:
       Modules do not match:
         sig module type s type t = B val f : (module s) -> t/2 -> t/1 end
       is not included in
         sig val f : (module s) -> t -> t end
       Values do not match:
         val f : (module s/1) -> t/2 -> t/1
       is not included in
         val f : (module s/2) -> t/2 -> t/2
       Type t/1 was defined at line 5, characters 23-33
       Type t/2 was defined at line 3, characters 2-12
       Module type s/1 was defined at line 5, characters 9-22
       Module type s/2 was defined at line 2, characters 2-15

@hcarty
Copy link
Member

hcarty commented Mar 25, 2017

Thank you, the namespace/name-specific counters are much easier to read and decipher.

@lpw25
Copy link
Contributor

lpw25 commented Mar 27, 2017

I'm not sure about this new warning. On the one hand I can see the benefit of warning users that the signature isn't valid -- although it is worth noting that there are other ways that -i can produce an invalid signature and I don't think it is really supposed to be used for generating .mli files anyway. On the other hand, IIRC the printer doesn't make any effort to look for other ways to print the type if it is ambiguous (unless -short-paths is on) so whether the warning triggers for a particular file is going to depend on the non-principal details of the inference algorithm.

@Octachron
Copy link
Member Author

It is certainly true that -i is not supposed to generate exact signature, however it can be a good starting point for writing .mli after some manual editing. The manual and man pages mention this workflow:

-i
Cause the compiler to print all defined names (with their inferred types or their definitions) when compiling an implementation (.ml file). No compiled files (.cmo and .cmi files) are produced. This can be useful to check the types inferred by the compiler. Also, since the output follows the syntax of interfaces, it can help in writing an explicit interface (.mli file) for a file: just redirect the standard output of the compiler to a .mli file, and edit that file to remove all declarations of unexported names.

In this context, I find quite logical to warn users when more manual editing would be needed to
obtain a valid signature. Similarly, it seems fine that the printed signature is not flawless and depends on the interference algorithm if it is only intended as a starting point.

Maybe the problem is more the wording of the warning, which should put more emphasis on the "manual intervention needed" interpretation rather than the non-compilable notice.

@lpw25
Copy link
Contributor

lpw25 commented Mar 27, 2017

In this context, I find quite logical to warn users when more manual editing would be needed to
obtain a valid signature. Similarly, it seems fine that the printed signature is not flawless and depends on the interference algorithm if it is only intended as a starting point.

Possibly. I'm just a little concerned that people will think they can use -i as part of their actual build -- under the assumption that they will get a warning if there is an issue. Then any change to inference could break their builds.

It's probably fine, but it does seem worth bearing in mind when considering this change.

@damiendoligez
Copy link
Member

       Type t/1 was defined at line 5, characters 23-33
       Type t/2 was defined at line 3, characters 2-12
       Module type s/1 was defined at line 5, characters 9-22
       Module type s/2 was defined at line 2, characters 2-15

You should consider using gcc's trick to print these locations in an emacs-friendly way:

file "foo.ml", line 5, characters 23-33:
  Definition of type t/1
file "foo.ml", line 3, characters 2-12:
  Definition of type t/2
file "foo.ml", line 5, characters 9-22:
  Definition of module type s/1
file "foo.ml", line 2, characters 2-15:
  Definition of module type s/2

This is slightly ugly but much easier to navigate once you get used to it.

@Octachron
Copy link
Member Author

@lpw25 : I agree that is important to not give the impression that this warning can be relied upon by build systems. I have added some words of caution at the end of the warning to make this point more explicit:

Warning 63: The printed interface differs from the inferred interface.
The inferred interface contained items which could not be printed
properly due to name collisions between identifiers.
file "pr4791.ml", line 5, characters 2-12:
  Definition of type t/1
file "pr4791.ml", line 1, characters 0-10:
  Definition of type t/2
Beware that this warning is purely informational and will not catch all
instances of erroneous printed interface.

@damiendoligez, good point. I have sacrificed some readability to please emacs in f4f5576


let print_conflicts ?(sep=fun ppf -> Format.fprintf ppf "@;") ppf =
let l =
List.filter (* remove toplevle location, which are not really useful *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"toplevle" -> "toplevel"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@damiendoligez
Copy link
Member

This PR is slowly rotting away in the backyard, and that makes me sad.

@gasche
Copy link
Member

gasche commented Jun 5, 2018

We need to find good souls to rebase it and to review it. @Armael, @Drup, @trefis, could you be convinced to look at it as reviewer(s)?

@trefis
Copy link
Contributor

trefis commented Jun 5, 2018

I haven't followed the discussion, but I'm a bit surprised that we'd need new reviewers considering that both Jacques and Leo looked at this.
That said I don't mind having a look once it's rebased / cleaned up and @Octachron says it ready.

@lpw25
Copy link
Contributor

lpw25 commented Jun 5, 2018

IIRC I didn't review the code, I just commented on the idea.

@@ -14,6 +14,7 @@
(**************************************************************************)

open Format
;; Printtyp.Naming_context.enable false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unusual style in the compiler: the standard style is let () = ..., which ensures the absence of partial application.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@garrigue
Copy link
Contributor

garrigue commented Jun 6, 2018

I actually, I had only looked at the first version of the PR.
The current version removes the Deferred module, and seems clean to me (just see my 2 comments).
I suppose it just has to be rebased before being approved.
Sorry not to have followed.


module Namespace = struct

type t = [`Type|`Module|`Module_type|`Class|`Class_type|`Other]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a polymorphic variant here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main reason was because env.ml is already using polymorphic variant for namespaces (when handling shadowing). I can revert to simple variants.

@Octachron
Copy link
Member Author

@garrigue , I think it was largely a communication failure on my part.
I have squashed and rebased the PR and I shall do another round of self-review by the end of the week.

@Octachron
Copy link
Member Author

Ok, I have removed the polymorphic variants and cleaned-up the rebase. @garrigue , would you agree to formally (review and) approve the PR?

Copy link
Contributor

@garrigue garrigue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small request, to have no regrets.

| Oide_ident of string
| Oide_ident of string ref
(** Beware that identifier names can be renamed on the fly to avoid
name collision between different identifiers, see {!Printtyp} *)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to define a new type here, rather than use ref.
Say, define

type out_name = {mutable printed_name: string}
type out_ident =
  ...
  | Oide_ident of out_name

My concern is mainly about making the code more robust for maintenance, since you end up adding ref in many places, and this could become hard to track.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new record type (or maybe an abstract type) does make the intent clearer here indeed. I have pushed a new commit to this effect.

@Octachron Octachron merged commit 349db3d into ocaml:trunk Jun 26, 2018
@Octachron
Copy link
Member Author

Merged, thanks for the review.

@v-gb
Copy link
Contributor

v-gb commented Jul 5, 2018

This change breaks some opam packages, by creating spurious warnings 49.

For instance this code, reduced from functoria:

module Foo = struct
  type info = { doc : unit }
  type t = { info : info }
end
let add_extra_info arg = arg.Foo.info.doc
~/ocaml/installed-349db3d869c8659b10f019af92f8dc497012d7ec/bin/ocamlc -nopervasives -c -no-alias-deps -intf-suffix .no-mli lib/functoria_key.ml  
File "_none_", line 1:
Warning 49: no cmi file was found in path for module Foo

It seems like several things are wrong here, though some of them may predate this change:

  • obviously, the warning makes no sense
  • then the code that causes this warning to be printed has no reason to run at all, because it's only trying to print another warning (42) that's disabled. Plus it might impact performance, because type-based disambiguation is used everywhere
  • finally, the compiler is actually looking for foo.cmi on disk. That seems like that could create races similar to the ones that got fixed by Env.without_cmis, because even if there a foo.ml in the same directory, then the build system knows the source file above doesn't depend on it, and so if the compiler's behavior depends on this file, its behavior will be non deterministic.

@gasche
Copy link
Member

gasche commented Jul 5, 2018

Interesting! Thanks a lot for the report.

@Octachron
Copy link
Member Author

I will have a look.

@damiendoligez
Copy link
Member

@Octachron don't hesitate to create an MPR to keep track of this problem.

This was referenced Jul 10, 2018

(** Same as {!ident_name_simple} but lookup to existing named identifiers
in the current {!printing_env} *)
let ident_name namespace id =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function needed? I'm refactoring the various lookup functions in the type-checker, and I'm not sure what to do with the one's used to implement env_ident.

The best I can work out is that this is trying to force us to print t/2 when t has been shadowed even though we are not printing any references to the shadowing type. This doesn't seem desirable to me, especially printing the associated "conflict" explanation. It's also not implemented correctly anyway.

If it's not really needed I'll just remove it as part of my refactoring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is indeed trying to take in account names that have been already bound inside the environment. This is mostly useful when printing signature with ocamlc -i for cases like

type t = A
module M = struct
  type t
  let x = A
end

or having meaningful error message slightly earlier. For intance, the code below

type t = A
module M:sig type t val x:t end = struct type t let x = A end;;

raises

Modules do not match:
sig type t val x : t/2 end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

without this function, the first part of the error message is useless

Modules do not match:
sig type t val x : t end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

If the feature is broken or too hard to implement, I imagine that it could be removed (temporarily or not).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why does this require checking for shadowing? I would have expected the naming context to contain all the ts that we print and so to add the extra markers in all those cases.

Besides:

Modules do not match:
sig type t val x : t/2 end
is not included in
sig type t val x : t end
Values do not match: val x : t/2 is not included in val x : t/1

doesn't seem to be much of an improvement: is t/1 the same as one of the type ts or is it some third unrelated type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because when printing a signature, we don't want to print extra-markers if they are not strictly required?

type t
module M = struct
  type t = A
  val x: t
end

should be printed as

type t
module M: sig
  type t = A
  val x: t
end

thus checking which t is in scope and which one is shadowed.

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
Fix: do not forget some updates in to_cmm_static
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants