-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NameChoice (disambiguation): more refactoring #9182
Conversation
22ef457
to
8a83bf7
Compare
Now that #9178 is merged, this PR is rebased on top of trunk and contains only the PR-specific changes. |
typing/typecore.ml
Outdated
| exception Not_found -> | ||
match lookup_from_type env tpath usage lid with | ||
| lbl -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I summon @Drup to fight this battle for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taratata !
"Justification-strengths should match change-importance." The strength of the justification required for a code-change decision is proportional to the importance of the change. A change with a large impact should be very strongly justified in discussions with reviewers, but for a nitpick-level change the author needs few justifications (possibly just personal preference) to apply or ignore a change suggestion.
This PR is joint work with @Octachron and @garrigue and both saw the code style in real time. OCtachron didn't say anything, and @garrigue approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please.
This does not mean that whitespace or comment changes can be made completely on whims, as different whitespace/comment changes have different impact and importance.
You're trying to sneak in a new way to indent things, and:
- everyone can already predict that you'll try to leverage it in other places later ("well if it was fine for that PR on typecore, why wouldn't it be nice for this other thing here...")
- it has an impact on readability.
You seem to have contradictory desires @gasche: on the one hand you want your refactor PRs to be merged quickly, on the other hand you insist on adding friction in every one of them.
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an .ocp-indent
file in the project isn't that supposed to be applied ? At least I did in the past even if that was not to my liking.
I pushed a change suggested by @trefis and two further cleanups. |
242c364
to
a464b79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions on @garrigue's proposal for a documentation comment of disambiguate
.
typing/typecore.ml
Outdated
@@ -670,92 +674,109 @@ end) = struct | |||
in | |||
List.find check_type lbls | |||
|
|||
(** [disambiguate] selects a concrete description for [lid] using some | |||
contextual information: an optional [expected_type], and a list of candidates [lbls]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@garrigue: in fact lbls
is not a list but in some result
type. Can you explain why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because, when the list is empty, we may have an error message, which we may want to use later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the code correctly, the error comes from some label-filtering logic that was done by the caller, and we could get rid of this complication by passing the filter to the function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid it would just increase the complication, as the caller would have to understand how this function is going to be called. But do we agree than any change in the API would have to be another PR anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked about how to proceed with further changes below. I agree with the idea of sending separate PRs, but I wished we would converge more quickly on each PR, because otherwise this could drag on indefinitely.
Concretely, would you be willing to approve the PR in its current state, even though we agree it's imperfect, so that follow-up PRs can start from trunk
again? (I think we can iterate more on the documentation comment and I would like to experiment with the filtering logic, but those can go into separate PRs; the problem is that today they would conflict with the changes already in the PR.)
typing/typecore.ml
Outdated
contextual information: an optional [expected_type], and a list of candidates [lbls]. | ||
If [expected_type] is [None], it just returns the head of [lbls]. | ||
Otherwise, it extracts the description directly from the corresponding type definition, | ||
except for extension types, where it has to choose it inside [lbls]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any special logic to handle extension types here, where does that come from? (Also, can we explain why?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the code first tries to look up the answer among lbls, so no specific code path is needed for that. The point I was making here is that, for all other types, the answer will be the one found through the type definition, and the change in order is only for warning purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand what happens for extension types and why that needs to be precised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because for extension types there is no information in the type definitions, so the only possible approach is to collect the candidates in scope.
except for extension types, for which the information is not available,
so that one has to select it among [lbls] instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when an extension type is passed as expected type to disambiguate
? The only control path I can see that matches your explanation is that it takes the Some(tpath, tpath0, principal)
branch, for some reason principal
is false
, so we match on lbls
, we take the first label suggested, it is in fact at the right path, it is not ambiguous, so no warning is shown.
Is that hypothesis correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And also: if this reasoning holds, why would extension types satisfy not principal
in the first place?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the location of the comment about extension types a bit misleading: the disambiguate
function has no special logic for them itself. It is the Constructor
module that does not return any candidates for extension in its lookup_all_from_type
implementations, and similarly it is Env.lookup_all_constructors
that filters duplicate extension constructions from its return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I improve on this in #9196 with a better comment and better placement.
typing/typecore.ml
Outdated
except for extension types, where it has to choose it inside [lbls]. | ||
This code is careful to warn in a number of situations: | ||
* if a non principal type was used for disambiguation [Not_principal] | ||
* if [lbls] contained several valid candidates [Ambiguous_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would that matter if type-based selection is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matters because, when the type is not known principally, resolution without the type could have given a different result.
typing/typecore.ml
Outdated
* if [lbls] contained several valid candidates [Ambiguous_name] | ||
* if the name was chosen out of [scope] [Name_out_of_scope] | ||
Here [scope] contains the labels equal to [lid] in the current environment, | ||
and [lbls] is a (potentially strict) subset of [scope], see [disambiguate_label_by_ids] *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that these two remarks could be helpful to have above the rest of the specification, otherwise it's hard to follow the explanation if we assume that lbls
is just an arbitrary list of labels (where are they coming from?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: reading this API suggests that the code could be clearer if instead of taking a scope, lbls
pair whose relation requires explanations, the code of disambiguate
was parameterized over a filtering function labels -> (labels, labels) result
that it would call itself on scope
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scope
only appears on the previous line, and I'm not sure this information is needed to understand anything else.
The semantics of disambiguate
does not require this inclusion, but if it is not true the fact we first look up among lbls
matters (i.e. we could have a solution in lbls
that would cause a scoping warning if reached through the type).
We can remove the explanation relating lbls
and scope
, and rewrite the explanation above to follow strictly the code:
Otherwise we try to find a description in `lbls` with a matching type,
or, if this fails, extract a description from the type definition itself.
Then we don't have to refer to extension types at all.
By the way, I forgot the last warning:
* if the selected name was not the same as before disambiguation was introduced [Disambiguated_name]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the explanation relating lbls and scope, and rewrite the explanation above to follow strictly the code:
Otherwise we try to find a description in `lbls` with a matching type, or, if this fails, extract a description from the type definition itself.
Then we don't have to refer to extension types at all.
That would be a mistake: if you do that, you're not explaining anything, you're just paraphrasing the code. In this instance I think that the behavior was already obvious from the code (but others might disagree, in which case we can try and clarify the code), but I didn't know why that behavior was chosen.
A good explanation (mentioning extensible types) would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @trefis' point that the comment is closer to a description of the implementation (than to a specification). For me the parts that really helps reading the code is the explanation of what lbls
and scope
are (and maybe we could avoid need with clearer names, eg. candidates_in_scope
and candidates_selection
).
The part on the warnings is somewhat repetitive of the warnings themselves, I will get rid of it.
I'm interested in feedback on the other parts. (Personally I think that having some description is good, it could be shorter, but there is also some confusion, for example calling lbls
a list when their type is not a list.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it allows to keep all lookup logic in Env
, previously it was spread all over the place. You call lookup_all_foo
and get a (foo list, lookup_error) result
. Either you have some foo
s (ideally it would use a non-empty list type, but we don't have a convenient one of those lying around), or you have an error describing why we couldn't find any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point here, I think, is that the caller passes a ('a, error) result
instead of forcing the 'a
itself, because we don't know yet whether the value will in fact be needed (it is only needed if type-based disambiguation is not possible or is not principal), and we want to error only when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think that removing the part on warnings is a good idea: a function name is not a specification. You might rather argue that my description was not precise enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we want to error only when necessary.
That's right yes. Some of this complication is because the code mixes up the disambiguation code, which can work on long identifiers like M.Foo
, with the code for type-based lookup, which only works on short identifiers like Foo
. The long case potentially has interesting errors -- maybe it is M
that doesn't exist rather than Foo
-- but should always print the error if it exists. The short case has only a single error -- i.e. there is no Foo
in scope -- but can ignore the error if it can find the constructor via the type.
typing/typecore.ml
Outdated
let warn_if_ambiguous lbl rest = | ||
Printtyp.Conflicts.reset (); | ||
let paths = ambiguous_types env lbl rest in | ||
let expansion = | ||
Format.asprintf "%t" Printtyp.Conflicts.print_explanations in | ||
(* warn if there were multiple candidates in scope *) | ||
if paths <> [] then | ||
warn lid.loc | ||
(Warnings.Ambiguous_name ([Longident.last lid.txt], | ||
paths, false, expansion)) | ||
in | ||
let warn_non_principal () = | ||
let name = Datatype_kind.label_name kind in | ||
warn lid.loc | ||
(Warnings.Not_principal | ||
("this type-based " ^ name ^ " disambiguation")) | ||
in | ||
let warn_out_of_scope tpath = | ||
let path_s = | ||
Printtyp.wrap_printing_env ~error:true env | ||
(fun () -> Printtyp.string_of_path tpath) in | ||
warn lid.loc | ||
(Warnings.Name_out_of_scope (path_s, [Longident.last lid.txt], false)) | ||
in | ||
let warn_disambiguated_name lbl scope = | ||
match scope with | ||
| Ok ((lab1,_) :: _) when lab1 == lbl -> () | ||
| _ -> | ||
Location.prerr_warning lid.loc | ||
(Warnings.Disambiguated_name (get_name lbl)) | ||
in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR: I refrained from saying so earlier, but I'm curious to see what the code would look like if these helpers were lifted out. Right now they are introducing 30 lines of noise between the code and its explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a commit that does it. It's not very nice because of the "warn" parameter -- it feels like code that was lambda-lifted by an automated tool rather than written by a human. I'm not convinced this is the best move, but let's wait to see how we like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For once, I agree with @gasche: this code is about the warnings, which are only meaningful in that context, so moving them out does not improve things. Maybe we would be better off with a where
syntax, but it was dropped in ocaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think that @trefis' idea was also that sometimes, conciseness brings clarity, and the version with the warning code lifted out could result in a more brain-sized function that is easier to read and improve further.) I did think about using let rec ... and ...
but I think it could be confusing.
The discussion keeps suggesting small refactorings to do, but I am worried that we will end up with a PR with too many commits that result in a large diff that people don't want to review from scratch. Could we either:
|
typing/typecore.ml
Outdated
(Warnings.Name_out_of_scope (path_s, [Longident.last lid.txt], false)) | ||
|
||
(* warn if the selected name is not the last introduced in scope | ||
-- in thes cases the resolution is different from pre-disambiguation OCaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these
I pushed a new PR, #9196, which implements my idea of turning the Note: merging the current PR would make it sensibly easier to work on #9196 and also for you to review it. My personal assessment is that the remaining questions in the current discussion are small (or a matter of personal preference, should the warning functions be in or out?), compared to the improvement in the PR itself, so that we could and should stop here for this PR and iterate in further PRs. If you agree with this assessment, please approve the PR. |
I'm willing to approve provided you fix the typo @garrigue just spotted. (And that you do not use the indentation style that is generating pushback? :>) |
To make my last message clearer: I agree with your assessment that this PR is already a noticeable improvement. |
231bdb9
to
7d98133
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the current state, including the indentation.
But please get a consensus or revert to standard indentation before merging.
7d98133
to
2eee3ca
Compare
(Joint work with Jacques Garrigue and Thomas Refis)
(Joint work with Jacques Garrigue and Florian Angeletti)
(request from Jacques Garrigue)
match List.filter check_closed labels with | ||
| [] -> Error labels | ||
| labels -> | ||
Ok labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the two checks not fused together into List.filter (fun x -> check_ids x && check_closed x)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not equivalent because of the error discipline, where we return the last non-empty subset. If the first check is non-empty but the second is empty, your proposal would return the whole list instead of the first filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I missed the shadowing of labels, this is a point against your indentation style here: the binding of labels should not be more indented that the body of the cases.
Merged. This PR has been blocked for too long and there are worse flight of fancies than this ephemeral indentation style. |
This PR sits on top of of #9178. It contains 2 commits:
@Octachron, would you be willing to review it?
@trefis, would you be willing to review it?