Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor lookup functions #2127

Merged
merged 4 commits into from Aug 21, 2019

Conversation

@lpw25
Copy link
Contributor

commented Oct 30, 2018

This PR refactors the lookup functions used for searching the environment for a Longident.t:

  • It moves all the code from the Typetexp.find_* functions into the coressponding Env.lookup_* functions. This puts all the handling of lookup errors, deprecation warnings and usage warnings into a single place.
  • The lookup_* functions now always take a location
  • The lookup_* functions now take an optional use parameter indicating whether this should be considered a use of the item from the perspective of usage or deprecation warnings.
  • For cases that should not cause an error, I've added Env.find_*_by_name functions that raise Not_found rather than an error if the item can't be found.
  • Other minor cleanups around looking things up from the environemnt...

These changes are not a complete no-op. They change the locations for constructor usage warnings, improve the semantics of constructor usage warnings for private constructors, and they improve a couple of error messages related to instance variables. I've added some tests to make sure that all these changes are demonstrated.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Oct 30, 2018

The suggested changes look great, and after a first quick review pass, I don't have much comments on the commits. Two remarks:

  • What about returning options instead of raising Not_found for *_by_name functions?

  • To help understand the impact on error messages, it would be good to add the new expect tests in a commit before the actual changes (of course, the reviewers could just copy/paste the tests in a toplevel...).

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Oct 30, 2018

What about returning options instead of raising Not_found for *_by_name functions?

I'd prefer to keep the find_*_by_name functions consistent with the find_* functions. So either both should return options or both should use Not_found.

To help understand the impact on error messages, it would be good to add the new expect tests in a commit before the actual changes (of course, the reviewers could just copy/paste the tests in a toplevel...).

I did.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Oct 30, 2018

I did

Possibly Github's interface doesn't make this clear because the test commits have a later commit date than the main commit, but they are in fact before the main commit in the history.

@@ -117,11 +117,6 @@ end = struct
end
;;
[%%expect {|
Line 5, characters 2-20:

This comment has been minimized.

Copy link
@alainfrisch

alainfrisch Oct 31, 2018

Contributor

Isn't this one a regression? It seems the warning was right that the constructor is never used to build values.

This comment has been minimized.

Copy link
@lpw25

lpw25 Oct 31, 2018

Author Contributor

This is a deliberate change. The warning doesn't make sense for a constructor that was defined as private. Obviously values of that type are never constructed: it was defined as an empty type.

Occasionally people use such types to make empty types for GADT indices that are known to be incompatible. Currently if they do this locally to a module they get a warning that cannot be silenced without using attributes, this allows the warning to be reasonably silenced.

Note that this was already the case if the type was exported as a private type. So code like:

module M = sig type t = private T end = struct type t = private T end

specifically doesn't raise the warning about the declaration in the struct. However this case is completely isomorphic to the other one -- we still know that no-one is going to create a value with this constructor.

This comment has been minimized.

Copy link
@alainfrisch

alainfrisch Oct 31, 2018

Contributor

Ok, it makes sense, but the rationale is not straightforward and writing it down (perhaps just as a comment somewhere in the code) would be useful for future reference. (Sorry if this is already documented.)

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2018

Right, sorry. Indeed I was confused by the UI.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2018

check-typo failures:

./testsuite/tests/typing-objects/Tests.ml:950.1: [white-at-eof] empty line(s) at EOF
./typing/env.ml:78.81: [long-line] line is over 80 columns
./typing/typecore.ml:750.81: [long-line] line is over 80 columns
./typing/typecore.ml:901.81: [long-line] line is over 80 columns
./typing/typecore.ml:2833.81: [long-line] line is over 80 columns
./typing/typecore.ml:2936.81: [long-line] line is over 80 columns

@lpw25 lpw25 force-pushed the lpw25:lookup-errors branch from 2858419 to 434584c Nov 3, 2018

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Nov 3, 2018

Fixed the check-typo failures.

@trefis
Copy link
Contributor

left a comment

As the code began to rot, I also gave this a first round of review.
I left a few notes here and there, but overall I agree with Alain: this looks good.

If you manage to rebase in the next few days, I'll be happy to finish reviewing and approve this.

PS: check_typo is still unhappy.

with
| Not_found -> aux (i+1)
if Env.bound_value name env then aux (i+1)
else name

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

The behavior is clearly changing here, but this looks like a bugfix.

This comment has been minimized.

Copy link
@alainfrisch

alainfrisch Mar 20, 2019

Contributor

What about adding a test that illustrates the fix?

@@ -28,8 +28,7 @@ exception Already_bound

type error =
Unbound_type_variable of string
| Unbound_type_constructor of Longident.t
| Unbound_type_constructor_2 of Path.t
| Undefined_type_constructor of Path.t

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

❤️

else begin
match using with
| None -> nothing
| Some f ->

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

I'm not a fan of this new version. I think I'd rather change find_all to pass None when mark = false


let rec find_all name tbl =
let rec find_all ~mark name tbl =

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

Btw, I don't think this function is ever called with ~mark:false.
The API might be better / more consistent this way, but I'm not sure I'm convinced.

bytecomp/matching.ml Outdated Show resolved Hide resolved

(* Narrowing unbound identifier errors. *)

let rec narrow_unbound_lid_error : 'a. _ -> _ -> _ -> _ -> 'a =

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

Very glad to see this go. The new way you propagate errors is much better than the replaying of various operations that was going on here.

@@ -885,7 +934,8 @@ let rec find_module_descr path env =
| Papply(p1, p2) ->
begin match get_components (find_module_descr p1 env) with
Functor_comps f ->
!components_of_functor_appl' f env p1 p2
let loc = Location.(in_file !input_name) in

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

I was about to question that Location.in_file, asking if there isn't a better location we could propagate down to here. As it turns out, it has simply been pulled out of components_of_functor_appl, and the other calls do pass more precise locations. So I guess it's fine.

| Path.Pdot (p1, s, _) ->
Longident.Ldot (lid_of_path p1, hash ^ s)
| Path.Papply (p1, p2) ->
Longident.Lapply (lid_of_path ~hash p1, lid_of_path p2)

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

Passing ~hash when recursing on p1 was just nonsense, wasn't it?

val bound_type: string -> t -> bool
val bound_modtype: string -> t -> bool
val bound_class: string -> t -> bool
val bound_cltype: string -> t -> bool

This comment has been minimized.

Copy link
@trefis

trefis Mar 20, 2019

Contributor

FTR: it looks like only bound_value is ever used in the codebase. But I'd keep the functions here: giving compiler-libs a decent API seems good.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jul 24, 2019

I've rebased this and addressed review comments.

Some parts of the rebase pushed parts of the code past my personal complexity threshold. So I decided to make some other changes to Env's internals to try to make the code simpler. I replaced some tuples with records, and made some parts of the environment use the same types instead of gratuitously different ones. I think I've managed to sufficiently decrease the amount of duplicated complex code through these changes.

@trefis
Copy link
Contributor

left a comment

I looked at this once more, apart from the minor comments I left here and there, there are two things that bother me:

  • I believe there might be semantic changes that are not described anywhere.
    This refers to the uses of find_module_by_name, lookup_module and lookup_module_path (and the various values of the load parameter), which I don't think line up with the previous version of the code.
    Can you comment?
  • I'm not sure when to use the new find_xxx_by_name functions, compared to lookup_xxx: there's nothing in the API/types helping me make that choice, and nothing in the doc either.
    In this PR you say "use them when there shouldn't be an error", and I could almost buy it: "OK, let's just use this in printtyp, and when I know for sure the type is there" (e.g. selfpat-*).
    But if there won't be an error, why should I care anyway?
    Also, I can see that approx_type uses find_type_by_name, but that approx_modtype uses lookup_modtype ~use:false; I know there was already a similar difference in the previous version of the code, I'm just pointing out that the principle you stated might not be the right one.
Makefile.dev Outdated Show resolved Hide resolved
?use:bool -> loc:Location.t -> string -> t ->
Path.t * Asttypes.mutable_flag * string * type_expr

(* Lookup by long identifiers. Raise [Not_found] if there would be an error. *)

This comment has been minimized.

Copy link
@trefis

trefis Aug 7, 2019

Contributor

It's not clear from the mli how the find_XXX_by_name functions are different from lookup_XXX.
This is linked to the fact that we don't know what lookup_XXX raises, but also that don't know whether find_... is going to count as an use or not.

This comment has been minimized.

Copy link
@lpw25

lpw25 Aug 15, 2019

Author Contributor

I've added a comment to make things clearer

typing/env.ml Show resolved Hide resolved
Env.add_module ~arg:true id Mp_present dummy env
(* cf #5965 *)
Env.enter_unbound_module (Ident.name id)
Mod_unbound_illegal_recursion env

This comment has been minimized.

Copy link
@trefis

trefis Aug 7, 2019

Contributor

❤️


and value_unbound_reason =
| Val_unbound_instance_variable
| Val_unbound_ghost_recursive

This comment has been minimized.

Copy link
@trefis

trefis Aug 7, 2019

Contributor

❤️

typing/env.mli Show resolved Hide resolved
let find_name_module ~mark name tbl =
match IdTbl.find_name wrap_module ~mark name tbl with
| x -> x
| exception Not_found when not (Current_unit_name.is name) ->

This comment has been minimized.

Copy link
@trefis

trefis Aug 7, 2019

Contributor

Shouldn't this be when not (Current_unit_name.is name) && !Clflags.transparent_modules?

This comment has been minimized.

Copy link
@lpw25

lpw25 Aug 15, 2019

Author Contributor

I don't think so. That was a weird discrepancy between lookup_module and lookup_module_descr in the old code. It meant that in theory M.t could sometimes be resolved when M could not, although in practice I don't think this would have happened except in very strange corner cases involving parallel builds. The new code is more obviously consistent.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

I didn't do a proper review, but I had a quick look at the new API, and I like it quite a lot. It's a clear improvement over the previous one.

@lpw25 lpw25 force-pushed the lpw25:lookup-errors branch from c47dfc3 to 81e8d8f Aug 15, 2019

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Aug 15, 2019

I've rebased again and updated based on the review comments.

I believe there might be semantic changes that are not described anywhere.
This refers to the uses of find_module_by_name, lookup_module and lookup_module_path (and the various values of the load parameter), which I don't think line up with the previous version of the code.
Can you comment?

Mostly, Typetexp.find_foo has been replaced by Env.lookup_foo and direct uses of Env.lookup_foo have been replaced by Env.find_foo_by_name (or Env.bound_foo if the result was ignored). The exceptions are:

  • The Typetexp.find_foos in Typemod.approx_modtype have been replaced by lookup_foo ~use:false. In principle, this changes the warnings behaviour, although the existing use of Warnings.without_warnings probably makes it a no-op. The change is correct because such uses are going to be looked up again when we typecheck the module types properly later.

  • The Env.lookup_type in Typecore.approx_type is replaced by Env.lookup_type ~use:false. This changes the error behaviour of this lookup. I believe that it is correct since the right behaviour when the type is not bound is to emit an error.

  • The Env.lookup_value in Typecore.type_expression is replaced by Env.lookup ~use:false. In principle, this changes the error behaviour, although in practice we never get this far in the case of an error. It seemed more correct to use lookup_value than find_value_by_name because if somehow we couldn't find the value then the right behaviour would be to emit a proper error.

In addition, the uses of lookup_module_descr ~mark:true in Env.find_all and Env.find_all_simple_list are replaced by lookup_module_components ~errors:false ~use:false. This changes the marking behaviour of these functions. The change is correct since these functions are only used for spellchecking hints and should not be marking things as used.

@trefis

trefis approved these changes Aug 19, 2019

Copy link
Contributor

left a comment

I believe I have reviewed everything, and it all makes sense to me.

Please add a changelog entry and then this should be good to merge.

@trefis

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

This is going to be a pain to rebase, and I don't want to review it again in 6 months. Apparently I can't push to @lpw25's repository, so I'll just merge this now and add the changelog entry after the fact.

@trefis trefis merged commit f2db8ca into ocaml:trunk Aug 21, 2019

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@trefis

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

Changelog entry added in f69e08e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.