Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add hidden directory (proposed in RFC 31) #11989

Closed
wants to merge 2 commits into from

Conversation

bobot
Copy link
Contributor

@bobot bobot commented Feb 3, 2023

It is a POC for RFC 31:

  • The coded criteria verify that the hidden module name "doesn't appear" in the source:
    - In the initial environment the hidden modules are added as Mod_persistent { hidden = true } instead of Mod_persistent.
    - In the lookup_ident_module, those modules raise an error. I think this function is used only for the user provided identifiers, since modules that are not found raises an error which is not the case if it is a name that comes from a signature.
  • The -I and -H are gathered in the same list of path in order to handle the priorities in a consistent way. That change the type of Clflags.include_dirs . The first commit propages this change.
  • -H does not work yet with -no-alias-deps because in this case initial modules are not always added to the environment. So I will change that separately with different fix for Fix reproducibility for -no-alias-deps #9991.
  • Tests are added that checks different cases (missing, present, hidden module and direct or indirect use of a module)

@gasche
Copy link
Member

gasche commented Feb 3, 2023

I worked a bit with @bobot on this prototype implementation of the RFC (the -Ihidden option, shortened in this PR to -H), but I find his description way too short to my taste. Some more details.

There is a testsuite by François that checks that the various "obvious" cases work as intended. This is very reassuring.

The general idea of the implementation is as follows:

  • when creating the load path from -I and -H options, remember whether a directory is added "hidden" or "visible"
  • when populating the initial environment by scanning the directories of the load path, record compilation units corresponding to .cmis with Mod_persistent of { hidden : bool } instead of just Mod_persistent as before
  • in the Env.lookup_* functions which convert from a parsetree Longident.t to a typedtree Path.t, check if the path starts with a compilation unit name that is "hidden" and error in that case

Subtlety: which lookups to check?

I thought at first that we should instrument all lookup_* functions, but the PR instruments only the lookup functions for module identifiers. I am not fully sure that it is correct. For example, if the user writes the variant constructor Toto in the source, which gets disambiguated to M.Toto by the type-checker, and M was hidden by the user, should we error in this case? The current code would not error at this point. I think that the following should fail, but @bobot was not convinced:

(* A.ml : imported hidden *)
type t = Toto

(* B.mli: imported visible *)
val f : A.t -> unit

(* C.ml: being type-checked *)
let () = B.f Toto

It depends on the specification that we want for hidden compilation units:

  • my proposal: it is forbidden to mention all names defined by a hidden compilation unit
  • @bobot's proposal: it is forbidden to mention the name of a hidden compilation unit

Subtlety: use:bool

The Env.lookup_* functions take a propagate a use:bool parameter that indicates whetheer the lookup should be counted as a "use" of the name by the user, in the context of "unused ..." warnings. I thought that we should use this use parameter in this PR, by only failing on hidden-module lookups with use:true and not use:false. But currently the PR does perform this extra check, and it seems to work well. I have been unable to create an example where there is a use:false lookup for a module, but no use:true lookup somewhere else in the type-checker. (But I suspect that those exist, otherwise this parameter would be completely useless, and surely we don't have that in the type-checker codebase, right?)

Subtlety: phantom compilation units

We track the "hidden-or-visible" information in the typing environment, when it is populated from the load path. But the OCaml type-checker accepts references to compilation units that are not part of the current typing environment, and treats them basically as those that are part of the envrionment:

ocaml/typing/env.ml

Lines 828 to 833 in 79fffac

let find_name_module ~mark name tbl =
match IdTbl.find_name wrap_module ~mark name tbl with
| x -> x
| exception Not_found when not (Current_unit_name.is name) ->
let path = Pident(Ident.create_persistent name) in
path, Mod_persistent

For those compilation units, we don't know whether they should be treated as visible or hidden (there is no information in the environment).

One might naively think that this logic only fires in the case of "missing .cmi files", compilation units that are missing and are treated as black boxes. (Black boxes instead of failing because: dune currently implements its own poor man's -Ihidden by ensuring that certain compilation units are not in the load path.) But in fact this is wrong, some compilation units that are in the load path are intentionally not inserted in the environment:

ocaml/typing/env.ml

Lines 853 to 861 in 79fffac

let modules =
(* With [-no-alias-deps], non-material additions should not
affect the environment at all. We should only observe the
existence of a cmi when accessing components of the module.
(See #9991). *)
if material || not !Clflags.transparent_modules then
IdTbl.add id Mod_persistent env.modules
else
env.modules

@bobot and myself believe that the PR is currently incorrect in this case of "phantom" compilation units, but we are not sure how to fix it. There are two approaches:

  • One would be to try another approach to issue Fix reproducibility for -no-alias-deps #9991, as mentioned by @bobot, to provide the guarantee that the only case of phantom compilation units are those that correspond to missing .cmi files, never those that have a .cmi in the load path. (Our idea is, because Fix reproducibility for -no-alias-deps #9991 was about observed differences of sharing in certain cases, to add a memo-table to Ident.create_persistent to hashcons all instances of each compilation-unit name in the environment summary.) I am not sure that this work, but if it does it would provide a nice conceptual simplification compared to the current hard-to-understand logic for Fix reproducibility for -no-alias-deps #9991: I like the guarantee that only missing cmis create phantom compilation units.
  • Another would be to store the "hidden or visible?" information for compilation units separately in the typing environment (not in the Mod_persistent metadatata). This side-steps the problem, and does not simplify the existing code.

@gasche
Copy link
Member

gasche commented Feb 3, 2023

(cc @lpw25, who worked on #9991, and could possibly tell us quickly whether our idea of fixing #9991 by adding more sharing is broken or might work.)

@lpw25
Copy link
Contributor

lpw25 commented Feb 3, 2023

I don't love the idea of fix #9991 a different way. The advantage of the current approach is that it makes sure that the existence or not of unused immaterial cmi files can't possibly affect the result of compilation. Fixing one particular case with hash-consing might not have that property if there turn out to be other ways it can affect things.

I think there is a much easier solution here:

  1. Don't mix the two lists of imports. Have -I always take precedence over -H.
  2. Don't put the -H files in the initial environment. That is only done in the first place to allow shadowing of -opened modules, which can't possibly apply to the hidden imports. This would also be a noticeable performance improvement -- building the table of files shows up clearly on performance profiles of the compiler and merlin when there are many dependencies.
  3. When looking up using a lookup_foo function you only use modules in the environment.
  4. When looking up using a find_foo function you can also look on the file system.

Then there is no need to record which modules are hidden or not. It is just that lookup uses one environment and find another.

@lpw25
Copy link
Contributor

lpw25 commented Feb 3, 2023

  1. When looking up using a lookup_foo function you only use modules in the environment.
  2. When looking up using a find_foo function you can also look on the file system.

I forgot that we only put the material ones in the environment now, so that's not quite right. It should be:

  1. When looking up using a lookup_foo function you only use modules in the environment and the -I directories.
  2. When looking up using a find_foo function you can also look in the -H directories.

@gasche
Copy link
Member

gasche commented Feb 3, 2023

Yes, we tried your earlier suggestion of "When looking up using a lookup_foo function you only use modules in the environment.', and it did not work.

We should try your revised suggestion. If I understand correctly, we need to send the "are we accessing from find_* or lookup_*?" information over to the Persistent_env functions that actually lookup compilation units (Persistent_env.{check,find}), and those in turn should (transitively) call Load_path.find_uncap with this information, so that we lookup in just the visible environment or in both the visible and hidden environments.

@lpw25
Copy link
Contributor

lpw25 commented Feb 3, 2023

Yeah, that sounds right.

@v-gb
Copy link
Contributor

v-gb commented Feb 5, 2023

For conciseness, I'll use I as the basename of modules findable through -I, X for -X, and define things in other modules with qualified paths like val Foo.x : ....

@gasche was saying it's debatable whether this should be a type error or not:

type X.t = C
val I.f : X.t -> unit
let () = I.f C

I think this should be accepted, for two reasons.

First, even if X.C is defined out of scope, it could have been aliased in I, and it's difficult to argue that rejecting I.f C but accepting I.f I.C makes sense. The typer could try finding any alias in any I module, I suppose, but that doesn't seem like the way to go.

I'll also mention that the client code can manipulate C without ever mentioning X, so it's not like the typer is literally providing the X. that the client code can't write. Maybe that happens internally in the typer, but if so, that seems like a property of the compiler, not the language:

module X = struct type t = C end
module I = struct let f (_ : X.t) = () end

(* no mention of X from here *)
module type T = sig type t end
let type_of_arg (type a) (_ : a -> unit) : (module T with type t = a) =
  (module struct type t = a end)
module Type_of_f_arg = (val type_of_arg I.f)
let _ : Type_of_f_arg.t = C

Second, it's probably useful to think about the purpose of this change: we want to force the client code to state it uses X if it does, rather than get access to X merely because I depends on it, so that I can evolve without breaking its clients. But if the client code is using references to X in I's interface, that's fine because if those change it's a breaking change anyway. The difference between C and X.C is probably going to feel a bit arbitrary, but at least the distinction is easily explained (it's syntactic).

@bobot
Copy link
Contributor Author

bobot commented May 16, 2023

Thank you @ccasin for taking the ball I dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants