NameChoice (disambiguation): more refactoring #9182

gasche · 2019-12-12T20:41:18Z

This PR sits on top of of #9178. It contains 2 commits:

"minor clarifications" has been written with @garrigue and @trefis;
@Octachron, would you be willing to review it?
"implementation comments" has been written with @garrigue and @Octachron;
@trefis, would you be willing to review it?

gasche · 2019-12-16T10:24:40Z

Now that #9178 is merged, this PR is rebased on top of trunk and contains only the PR-specific changes.

trefis · 2019-12-16T10:33:42Z

typing/typecore.ml

+        | exception Not_found ->
+        match lookup_from_type env tpath usage lid with
+        | lbl ->


I summon @Drup to fight this battle for me.

Taratata !

"Justification-strengths should match change-importance." The strength of the justification required for a code-change decision is proportional to the importance of the change. A change with a large impact should be very strongly justified in discussions with reviewers, but for a nitpick-level change the author needs few justifications (possibly just personal preference) to apply or ignore a change suggestion.

This PR is joint work with @Octachron and @garrigue and both saw the code style in real time. OCtachron didn't say anything, and @garrigue approved.

Please.

This does not mean that whitespace or comment changes can be made completely on whims, as different whitespace/comment changes have different impact and importance.

You're trying to sneak in a new way to indent things, and:

everyone can already predict that you'll try to leverage it in other places later ("well if it was fine for that PR on typecore, why wouldn't it be nice for this other thing here...")

it has an impact on readability.

You seem to have contradictory desires @gasche: on the one hand you want your refactor PRs to be merged quickly, on the other hand you insist on adding friction in every one of them.

:)

There's an .ocp-indent file in the project isn't that supposed to be applied ? At least I did in the past even if that was not to my liking.

typing/typecore.ml

gasche · 2019-12-16T21:31:36Z

I pushed a change suggested by @trefis and two further cleanups.

typing/typecore.ml

gasche

Questions on @garrigue's proposal for a documentation comment of disambiguate.

gasche · 2019-12-17T14:55:30Z

typing/typecore.ml

@@ -670,92 +674,109 @@ end) = struct
        in
        List.find check_type lbls

+  (** [disambiguate] selects a concrete description for [lid] using some
+      contextual information: an optional [expected_type], and a list of candidates [lbls].


@garrigue: in fact lbls is not a list but in some result type. Can you explain why?

Because, when the list is empty, we may have an error message, which we may want to use later.

If I understand the code correctly, the error comes from some label-filtering logic that was done by the caller, and we could get rid of this complication by passing the filter to the function instead.

I'm afraid it would just increase the complication, as the caller would have to understand how this function is going to be called. But do we agree than any change in the API would have to be another PR anyway?

I asked about how to proceed with further changes below. I agree with the idea of sending separate PRs, but I wished we would converge more quickly on each PR, because otherwise this could drag on indefinitely.

Concretely, would you be willing to approve the PR in its current state, even though we agree it's imperfect, so that follow-up PRs can start from trunk again? (I think we can iterate more on the documentation comment and I would like to experiment with the filtering logic, but those can go into separate PRs; the problem is that today they would conflict with the changes already in the PR.)

gasche · 2019-12-17T14:59:05Z

typing/typecore.ml

+      contextual information: an optional [expected_type], and a list of candidates [lbls].
+      If [expected_type] is [None], it just returns the head of [lbls].
+      Otherwise, it extracts the description directly from the corresponding type definition,
+      except for extension types, where it has to choose it inside [lbls].


I don't see any special logic to handle extension types here, where does that come from? (Also, can we explain why?)

Actually, the code first tries to look up the answer among lbls, so no specific code path is needed for that. The point I was making here is that, for all other types, the answer will be the one found through the type definition, and the change in order is only for warning purposes.

I still don't understand what happens for extension types and why that needs to be precised.

Just because for extension types there is no information in the type definitions, so the only possible approach is to collect the candidates in scope.

except for extension types, for which the information is not available, so that one has to select it among [lbls] instead.

What happens when an extension type is passed as expected type to disambiguate? The only control path I can see that matches your explanation is that it takes the Some(tpath, tpath0, principal) branch, for some reason principal is false, so we match on lbls, we take the first label suggested, it is in fact at the right path, it is not ambiguous, so no warning is shown.

Is that hypothesis correct?

(And also: if this reasoning holds, why would extension types satisfy not principal in the first place?)

I find the location of the comment about extension types a bit misleading: the disambiguate function has no special logic for them itself. It is the Constructor module that does not return any candidates for extension in its lookup_all_from_type implementations, and similarly it is Env.lookup_all_constructors that filters duplicate extension constructions from its return.

I improve on this in #9196 with a better comment and better placement.

gasche · 2019-12-17T15:02:23Z

typing/typecore.ml

+      except for extension types, where it has to choose it inside [lbls].
+      This code is careful to warn in a number of situations:
+      * if a non principal type was used for disambiguation [Not_principal]
+      * if [lbls] contained several valid candidates [Ambiguous_name]


Why would that matter if type-based selection is used?

This matters because, when the type is not known principally, resolution without the type could have given a different result.

gasche · 2019-12-17T15:08:10Z

typing/typecore.ml

+      * if [lbls] contained several valid candidates [Ambiguous_name]
+      * if the name was chosen out of [scope] [Name_out_of_scope]
+      Here [scope] contains the labels equal to [lid] in the current environment,
+      and [lbls] is a (potentially strict) subset of [scope], see [disambiguate_label_by_ids] *)


I think that these two remarks could be helpful to have above the rest of the specification, otherwise it's hard to follow the explanation if we assume that lbls is just an arbitrary list of labels (where are they coming from?).

Note: reading this API suggests that the code could be clearer if instead of taking a scope, lbls pair whose relation requires explanations, the code of disambiguate was parameterized over a filtering function labels -> (labels, labels) result that it would call itself on scope.

scope only appears on the previous line, and I'm not sure this information is needed to understand anything else.
The semantics of disambiguate does not require this inclusion, but if it is not true the fact we first look up among lbls matters (i.e. we could have a solution in lbls that would cause a scoping warning if reached through the type).

We can remove the explanation relating lbls and scope, and rewrite the explanation above to follow strictly the code:

Otherwise we try to find a description in `lbls` with a matching type, or, if this fails, extract a description from the type definition itself.

Then we don't have to refer to extension types at all.

By the way, I forgot the last warning:

* if the selected name was not the same as before disambiguation was introduced [Disambiguated_name]

We can remove the explanation relating lbls and scope, and rewrite the explanation above to follow strictly the code:

Otherwise we try to find a description in `lbls` with a matching type, or, if this fails, extract a description from the type definition itself.

Then we don't have to refer to extension types at all.

That would be a mistake: if you do that, you're not explaining anything, you're just paraphrasing the code. In this instance I think that the behavior was already obvious from the code (but others might disagree, in which case we can try and clarify the code), but I didn't know why that behavior was chosen.
A good explanation (mentioning extensible types) would be useful.

I agree with @trefis' point that the comment is closer to a description of the implementation (than to a specification). For me the parts that really helps reading the code is the explanation of what lbls and scope are (and maybe we could avoid need with clearer names, eg. candidates_in_scope and candidates_selection).

The part on the warnings is somewhat repetitive of the warnings themselves, I will get rid of it.

I'm interested in feedback on the other parts. (Personally I think that having some description is good, it could be shorter, but there is also some confusion, for example calling lbls a list when their type is not a list.)

The result types were introduced by @lpw25 in c19e8b2.

Yes, it allows to keep all lookup logic in Env, previously it was spread all over the place. You call lookup_all_foo and get a (foo list, lookup_error) result. Either you have some foos (ideally it would use a non-empty list type, but we don't have a convenient one of those lying around), or you have an error describing why we couldn't find any.

The point here, I think, is that the caller passes a ('a, error) result instead of forcing the 'a itself, because we don't know yet whether the value will in fact be needed (it is only needed if type-based disambiguation is not possible or is not principal), and we want to error only when necessary.

Also, I don't think that removing the part on warnings is a good idea: a function name is not a specification. You might rather argue that my description was not precise enough.

and we want to error only when necessary.

That's right yes. Some of this complication is because the code mixes up the disambiguation code, which can work on long identifiers like M.Foo, with the code for type-based lookup, which only works on short identifiers like Foo. The long case potentially has interesting errors -- maybe it is M that doesn't exist rather than Foo -- but should always print the error if it exists. The short case has only a single error -- i.e. there is no Foo in scope -- but can ignore the error if it can find the constructor via the type.

trefis · 2019-12-17T17:14:14Z

typing/typecore.ml

+    let warn_if_ambiguous lbl rest =
+      Printtyp.Conflicts.reset ();
+      let paths = ambiguous_types env lbl rest in
+      let expansion =
+        Format.asprintf "%t" Printtyp.Conflicts.print_explanations in
+      (* warn if there were multiple candidates in scope *)
+      if paths <> [] then
+        warn lid.loc
+          (Warnings.Ambiguous_name ([Longident.last lid.txt],
+                                    paths, false, expansion))
+    in
+    let warn_non_principal () =
+      let name = Datatype_kind.label_name kind in
+      warn lid.loc
+        (Warnings.Not_principal
+           ("this type-based " ^ name ^ " disambiguation"))
+    in
+    let warn_out_of_scope tpath =
+      let path_s =
+        Printtyp.wrap_printing_env ~error:true env
+          (fun () -> Printtyp.string_of_path tpath) in
+      warn lid.loc
+        (Warnings.Name_out_of_scope (path_s, [Longident.last lid.txt], false))
+    in
+    let warn_disambiguated_name lbl scope =
+      match scope with
+      | Ok ((lab1,_) :: _) when lab1 == lbl -> ()
+      | _ ->
+          Location.prerr_warning lid.loc
+            (Warnings.Disambiguated_name (get_name lbl))
+    in


FTR: I refrained from saying so earlier, but I'm curious to see what the code would look like if these helpers were lifted out. Right now they are introducing 30 lines of noise between the code and its explanation.

I pushed a commit that does it. It's not very nice because of the "warn" parameter -- it feels like code that was lambda-lifted by an automated tool rather than written by a human. I'm not convinced this is the best move, but let's wait to see how we like it.

For once, I agree with @gasche: this code is about the warnings, which are only meaningful in that context, so moving them out does not improve things. Maybe we would be better off with a where syntax, but it was dropped in ocaml.

(I think that @trefis' idea was also that sometimes, conciseness brings clarity, and the version with the warning code lifted out could result in a more brain-sized function that is easier to read and improve further.) I did think about using let rec ... and ... but I think it could be confusing.

gasche · 2019-12-17T17:42:42Z

The discussion keeps suggesting small refactorings to do, but I am worried that we will end up with a PR with too many commits that result in a large diff that people don't want to review from scratch.

Could we either:

have people approve current commits, so that they can focus on reviewing only the new commits if we add more, or
work on converging on the current PR, and delay further transformations to another PR?

garrigue · 2019-12-18T10:13:24Z

typing/typecore.ml

+      (Warnings.Name_out_of_scope (path_s, [Longident.last lid.txt], false))
+
+  (* warn if the selected name is not the last introduced in scope
+     -- in thes cases the resolution is different from pre-disambiguation OCaml


gasche · 2019-12-18T10:41:47Z

I pushed a new PR, #9196, which implements my idea of turning the lbls/scope pair into a scope / filter_candidates pair. I do believe that it makes the code clearer.

Note: merging the current PR would make it sensibly easier to work on #9196 and also for you to review it. My personal assessment is that the remaining questions in the current discussion are small (or a matter of personal preference, should the warning functions be in or out?), compared to the improvement in the PR itself, so that we could and should stop here for this PR and iterate in further PRs. If you agree with this assessment, please approve the PR.

trefis · 2019-12-18T10:48:59Z

I'm willing to approve provided you fix the typo @garrigue just spotted. (And that you do not use the indentation style that is generating pushback? :>)

trefis · 2019-12-18T10:49:52Z

To make my last message clearer: I agree with your assessment that this PR is already a noticeable improvement.

garrigue

I'm happy with the current state, including the indentation.
But please get a consensus or revert to standard indentation before merging.

(Joint work with Jacques Garrigue and Thomas Refis)

(Joint work with Jacques Garrigue and Florian Angeletti)

(request from Jacques Garrigue)

Octachron · 2020-01-30T15:00:57Z

typing/typecore.ml

+  match List.filter check_closed labels with
+  | [] -> Error labels
+  | labels ->
+  Ok labels


Why are the two checks not fused together into List.filter (fun x -> check_ids x && check_closed x)?

This is not equivalent because of the error discipline, where we return the last non-empty subset. If the first check is non-empty but the second is empty, your proposal would return the whole list instead of the first filter.

Hm, I missed the shadowing of labels, this is a point against your indentation style here: the binding of labels should not be more indented that the body of the cases.

Octachron · 2020-02-18T11:25:31Z

Merged. This PR has been blocked for too long and there are worse flight of fancies than this ephemeral indentation style.

gasche force-pushed the clarify-disambiguation-2 branch 3 times, most recently from 22ef457 to 8a83bf7 Compare December 16, 2019 10:24

trefis reviewed Dec 16, 2019

View reviewed changes

gasche commented Dec 16, 2019

View reviewed changes

typing/typecore.ml Outdated Show resolved Hide resolved

trefis reviewed Dec 16, 2019

View reviewed changes

typing/typecore.ml Outdated Show resolved Hide resolved

trefis reviewed Dec 17, 2019

View reviewed changes

typing/typecore.ml Outdated Show resolved Hide resolved

garrigue reviewed Dec 17, 2019

View reviewed changes

typing/typecore.ml Show resolved Hide resolved

gasche force-pushed the clarify-disambiguation-2 branch from 242c364 to a464b79 Compare December 17, 2019 14:50

gasche commented Dec 17, 2019

View reviewed changes

trefis reviewed Dec 17, 2019

View reviewed changes

garrigue reviewed Dec 18, 2019

View reviewed changes

gasche mentioned this pull request Dec 18, 2019

disambiguation: improve the interface of NameChoice.disambiguate #9196

Merged

gasche force-pushed the clarify-disambiguation-2 branch from 231bdb9 to 7d98133 Compare December 18, 2019 12:39

garrigue approved these changes Dec 18, 2019

View reviewed changes

gasche force-pushed the clarify-disambiguation-2 branch from 7d98133 to 2eee3ca Compare December 19, 2019 12:25

gasche and others added 8 commits December 19, 2019 13:28

disambiguation: minor clarifications

dfac03e

(Joint work with Jacques Garrigue and Thomas Refis)

disambiguation: implementation comments

dc6a222

(Joint work with Jacques Garrigue and Florian Angeletti)

disambiguation: clarify the API of disambiguate_label_by_ids

3a6cb99

disambiguation: factor warning code

8893a03

Changes entry

8703865

disambiguation: specification comment for NameChoice.disambiguate

8ba9f9f

disambiguation: lambda-lift warning code

92fe209

disambiguation: move the warning documentation to the warning functions

44149ba

gasche added 2 commits December 19, 2019 13:28

disambiguation: split the disambiguate comment in spec. and impl.

a6df85f

disambiguate: document all the warnings

2eee3ca

(request from Jacques Garrigue)

Octachron reviewed Jan 30, 2020

View reviewed changes

Octachron merged commit e03e29f into ocaml:trunk Feb 18, 2020

Octachron mentioned this pull request Mar 26, 2020

Spellchecker hints and type-directed disambiguation for extensible variants: bis #1706

Merged

NameChoice (disambiguation): more refactoring #9182

NameChoice (disambiguation): more refactoring #9182

Conversation

gasche commented Dec 12, 2019

gasche commented Dec 16, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gasche commented Dec 16, 2019

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gasche Dec 18, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron Dec 18, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

garrigue Dec 17, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gasche commented Dec 17, 2019

Choose a reason for hiding this comment

gasche commented Dec 18, 2019

trefis commented Dec 18, 2019

trefis commented Dec 18, 2019

garrigue left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Octachron commented Feb 18, 2020

gasche Dec 18, 2019 •

edited

Octachron Dec 18, 2019 •

edited

garrigue Dec 17, 2019 •

edited