Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NameChoice (disambiguation): more refactoring #9182

Merged
merged 10 commits into from
Feb 18, 2020

Conversation

gasche
Copy link
Member

@gasche gasche commented Dec 12, 2019

This PR sits on top of of #9178. It contains 2 commits:

@gasche gasche force-pushed the clarify-disambiguation-2 branch 3 times, most recently from 22ef457 to 8a83bf7 Compare December 16, 2019 10:24
@gasche
Copy link
Member Author

gasche commented Dec 16, 2019

Now that #9178 is merged, this PR is rebased on top of trunk and contains only the PR-specific changes.

Comment on lines 724 to 760
| exception Not_found ->
match lookup_from_type env tpath usage lid with
| lbl ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I summon @Drup to fight this battle for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taratata !

"Justification-strengths should match change-importance." The strength of the justification required for a code-change decision is proportional to the importance of the change. A change with a large impact should be very strongly justified in discussions with reviewers, but for a nitpick-level change the author needs few justifications (possibly just personal preference) to apply or ignore a change suggestion.

This PR is joint work with @Octachron and @garrigue and both saw the code style in real time. OCtachron didn't say anything, and @garrigue approved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please.

This does not mean that whitespace or comment changes can be made completely on whims, as different whitespace/comment changes have different impact and importance.

You're trying to sneak in a new way to indent things, and:

  • everyone can already predict that you'll try to leverage it in other places later ("well if it was fine for that PR on typecore, why wouldn't it be nice for this other thing here...")
  • it has an impact on readability.

You seem to have contradictory desires @gasche: on the one hand you want your refactor PRs to be merged quickly, on the other hand you insist on adding friction in every one of them.

:)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an .ocp-indent file in the project isn't that supposed to be applied ? At least I did in the past even if that was not to my liking.

typing/typecore.ml Outdated Show resolved Hide resolved
typing/typecore.ml Outdated Show resolved Hide resolved
@gasche
Copy link
Member Author

gasche commented Dec 16, 2019

I pushed a change suggested by @trefis and two further cleanups.

typing/typecore.ml Outdated Show resolved Hide resolved
Copy link
Member Author

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions on @garrigue's proposal for a documentation comment of disambiguate.

@@ -670,92 +674,109 @@ end) = struct
in
List.find check_type lbls

(** [disambiguate] selects a concrete description for [lid] using some
contextual information: an optional [expected_type], and a list of candidates [lbls].
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@garrigue: in fact lbls is not a list but in some result type. Can you explain why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, when the list is empty, we may have an error message, which we may want to use later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the code correctly, the error comes from some label-filtering logic that was done by the caller, and we could get rid of this complication by passing the filter to the function instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it would just increase the complication, as the caller would have to understand how this function is going to be called. But do we agree than any change in the API would have to be another PR anyway?

Copy link
Member Author

@gasche gasche Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked about how to proceed with further changes below. I agree with the idea of sending separate PRs, but I wished we would converge more quickly on each PR, because otherwise this could drag on indefinitely.

Concretely, would you be willing to approve the PR in its current state, even though we agree it's imperfect, so that follow-up PRs can start from trunk again? (I think we can iterate more on the documentation comment and I would like to experiment with the filtering logic, but those can go into separate PRs; the problem is that today they would conflict with the changes already in the PR.)

contextual information: an optional [expected_type], and a list of candidates [lbls].
If [expected_type] is [None], it just returns the head of [lbls].
Otherwise, it extracts the description directly from the corresponding type definition,
except for extension types, where it has to choose it inside [lbls].
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any special logic to handle extension types here, where does that come from? (Also, can we explain why?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the code first tries to look up the answer among lbls, so no specific code path is needed for that. The point I was making here is that, for all other types, the answer will be the one found through the type definition, and the change in order is only for warning purposes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand what happens for extension types and why that needs to be precised.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because for extension types there is no information in the type definitions, so the only possible approach is to collect the candidates in scope.

except for extension types, for which the information is not available,
so that one has to select it among [lbls] instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when an extension type is passed as expected type to disambiguate? The only control path I can see that matches your explanation is that it takes the Some(tpath, tpath0, principal) branch, for some reason principal is false, so we match on lbls, we take the first label suggested, it is in fact at the right path, it is not ambiguous, so no warning is shown.

Is that hypothesis correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And also: if this reasoning holds, why would extension types satisfy not principal in the first place?)

Copy link
Member

@Octachron Octachron Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the location of the comment about extension types a bit misleading: the disambiguate function has no special logic for them itself. It is the Constructor module that does not return any candidates for extension in its lookup_all_from_type implementations, and similarly it is Env.lookup_all_constructors that filters duplicate extension constructions from its return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improve on this in #9196 with a better comment and better placement.

except for extension types, where it has to choose it inside [lbls].
This code is careful to warn in a number of situations:
* if a non principal type was used for disambiguation [Not_principal]
* if [lbls] contained several valid candidates [Ambiguous_name]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would that matter if type-based selection is used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matters because, when the type is not known principally, resolution without the type could have given a different result.

* if [lbls] contained several valid candidates [Ambiguous_name]
* if the name was chosen out of [scope] [Name_out_of_scope]
Here [scope] contains the labels equal to [lid] in the current environment,
and [lbls] is a (potentially strict) subset of [scope], see [disambiguate_label_by_ids] *)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these two remarks could be helpful to have above the rest of the specification, otherwise it's hard to follow the explanation if we assume that lbls is just an arbitrary list of labels (where are they coming from?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: reading this API suggests that the code could be clearer if instead of taking a scope, lbls pair whose relation requires explanations, the code of disambiguate was parameterized over a filtering function labels -> (labels, labels) result that it would call itself on scope.

Copy link
Contributor

@garrigue garrigue Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scope only appears on the previous line, and I'm not sure this information is needed to understand anything else.
The semantics of disambiguate does not require this inclusion, but if it is not true the fact we first look up among lbls matters (i.e. we could have a solution in lbls that would cause a scoping warning if reached through the type).

We can remove the explanation relating lbls and scope, and rewrite the explanation above to follow strictly the code:

Otherwise we try to find a description in `lbls` with a matching type,
or, if this fails, extract a description from the type definition itself.

Then we don't have to refer to extension types at all.

By the way, I forgot the last warning:

* if the selected name was not the same as before disambiguation was introduced [Disambiguated_name]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the explanation relating lbls and scope, and rewrite the explanation above to follow strictly the code:

Otherwise we try to find a description in `lbls` with a matching type,
or, if this fails, extract a description from the type definition itself.

Then we don't have to refer to extension types at all.

That would be a mistake: if you do that, you're not explaining anything, you're just paraphrasing the code. In this instance I think that the behavior was already obvious from the code (but others might disagree, in which case we can try and clarify the code), but I didn't know why that behavior was chosen.
A good explanation (mentioning extensible types) would be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @trefis' point that the comment is closer to a description of the implementation (than to a specification). For me the parts that really helps reading the code is the explanation of what lbls and scope are (and maybe we could avoid need with clearer names, eg. candidates_in_scope and candidates_selection).

The part on the warnings is somewhat repetitive of the warnings themselves, I will get rid of it.

I'm interested in feedback on the other parts. (Personally I think that having some description is good, it could be shorter, but there is also some confusion, for example calling lbls a list when their type is not a list.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result types were introduced by @lpw25 in c19e8b2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it allows to keep all lookup logic in Env, previously it was spread all over the place. You call lookup_all_foo and get a (foo list, lookup_error) result. Either you have some foos (ideally it would use a non-empty list type, but we don't have a convenient one of those lying around), or you have an error describing why we couldn't find any.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here, I think, is that the caller passes a ('a, error) result instead of forcing the 'a itself, because we don't know yet whether the value will in fact be needed (it is only needed if type-based disambiguation is not possible or is not principal), and we want to error only when necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think that removing the part on warnings is a good idea: a function name is not a specification. You might rather argue that my description was not precise enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we want to error only when necessary.

That's right yes. Some of this complication is because the code mixes up the disambiguation code, which can work on long identifiers like M.Foo, with the code for type-based lookup, which only works on short identifiers like Foo. The long case potentially has interesting errors -- maybe it is M that doesn't exist rather than Foo -- but should always print the error if it exists. The short case has only a single error -- i.e. there is no Foo in scope -- but can ignore the error if it can find the constructor via the type.

Comment on lines 691 to 721
let warn_if_ambiguous lbl rest =
Printtyp.Conflicts.reset ();
let paths = ambiguous_types env lbl rest in
let expansion =
Format.asprintf "%t" Printtyp.Conflicts.print_explanations in
(* warn if there were multiple candidates in scope *)
if paths <> [] then
warn lid.loc
(Warnings.Ambiguous_name ([Longident.last lid.txt],
paths, false, expansion))
in
let warn_non_principal () =
let name = Datatype_kind.label_name kind in
warn lid.loc
(Warnings.Not_principal
("this type-based " ^ name ^ " disambiguation"))
in
let warn_out_of_scope tpath =
let path_s =
Printtyp.wrap_printing_env ~error:true env
(fun () -> Printtyp.string_of_path tpath) in
warn lid.loc
(Warnings.Name_out_of_scope (path_s, [Longident.last lid.txt], false))
in
let warn_disambiguated_name lbl scope =
match scope with
| Ok ((lab1,_) :: _) when lab1 == lbl -> ()
| _ ->
Location.prerr_warning lid.loc
(Warnings.Disambiguated_name (get_name lbl))
in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR: I refrained from saying so earlier, but I'm curious to see what the code would look like if these helpers were lifted out. Right now they are introducing 30 lines of noise between the code and its explanation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit that does it. It's not very nice because of the "warn" parameter -- it feels like code that was lambda-lifted by an automated tool rather than written by a human. I'm not convinced this is the best move, but let's wait to see how we like it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For once, I agree with @gasche: this code is about the warnings, which are only meaningful in that context, so moving them out does not improve things. Maybe we would be better off with a where syntax, but it was dropped in ocaml.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that @trefis' idea was also that sometimes, conciseness brings clarity, and the version with the warning code lifted out could result in a more brain-sized function that is easier to read and improve further.) I did think about using let rec ... and ... but I think it could be confusing.

@gasche
Copy link
Member Author

gasche commented Dec 17, 2019

The discussion keeps suggesting small refactorings to do, but I am worried that we will end up with a PR with too many commits that result in a large diff that people don't want to review from scratch.

Could we either:

  • have people approve current commits, so that they can focus on reviewing only the new commits if we add more, or
  • work on converging on the current PR, and delay further transformations to another PR?

(Warnings.Name_out_of_scope (path_s, [Longident.last lid.txt], false))

(* warn if the selected name is not the last introduced in scope
-- in thes cases the resolution is different from pre-disambiguation OCaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these

@gasche
Copy link
Member Author

gasche commented Dec 18, 2019

I pushed a new PR, #9196, which implements my idea of turning the lbls/scope pair into a scope / filter_candidates pair. I do believe that it makes the code clearer.

Note: merging the current PR would make it sensibly easier to work on #9196 and also for you to review it. My personal assessment is that the remaining questions in the current discussion are small (or a matter of personal preference, should the warning functions be in or out?), compared to the improvement in the PR itself, so that we could and should stop here for this PR and iterate in further PRs. If you agree with this assessment, please approve the PR.

@trefis
Copy link
Contributor

trefis commented Dec 18, 2019

I'm willing to approve provided you fix the typo @garrigue just spotted. (And that you do not use the indentation style that is generating pushback? :>)

@trefis
Copy link
Contributor

trefis commented Dec 18, 2019

To make my last message clearer: I agree with your assessment that this PR is already a noticeable improvement.

Copy link
Contributor

@garrigue garrigue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the current state, including the indentation.
But please get a consensus or revert to standard indentation before merging.

match List.filter check_closed labels with
| [] -> Error labels
| labels ->
Ok labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the two checks not fused together into List.filter (fun x -> check_ids x && check_closed x)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not equivalent because of the error discipline, where we return the last non-empty subset. If the first check is non-empty but the second is empty, your proposal would return the whole list instead of the first filter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I missed the shadowing of labels, this is a point against your indentation style here: the binding of labels should not be more indented that the body of the cases.

@Octachron Octachron merged commit e03e29f into ocaml:trunk Feb 18, 2020
@Octachron
Copy link
Member

Merged. This PR has been blocked for too long and there are worse flight of fancies than this ephemeral indentation style.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants