New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the user to shadow sub-modules of Pervasives #1513

Merged
merged 1 commit into from Feb 2, 2018

Conversation

Projects
None yet
6 participants
@diml
Member

diml commented Dec 4, 2017

Before this PR:

$ echo "let x = 42" > largeFile.ml
$ ocamlc -c largeFile.ml 
$ echo "let x = LargeFile.x" > foo.ml
$ ocamlc -c -I . foo.ml 
File "foo.ml", line 1, characters 8-19:
Error: Unbound value LargeFile.x

After this PR:

$ echo "let x = 42" > largeFile.ml
$ ocamlc -c largeFile.ml 
$ echo "let x = LargeFile.x" > foo.ml
$ ocamlc -c -I . foo.ml 

Sub-modules of Pervasives always shadow modules coming from the include directories, so users cannot redefine them. This PR changes that by filtering the modules that are added to the environment when opening Pervasives.

This is especially important for #1010, since the main goal of #1010 is to allow to add new modules to the standard library.

@diml diml referenced this pull request Dec 4, 2017

Merged

Prefix the stdlib #1010

@nojb

nojb requested changes Dec 5, 2017 edited

I read the code and I think the patch is safe and correct. Two small points (the second one was raised in #1010):

  • A similar change to initial_env needs to be done in ocamldoc/odoc_analyse.ml.
  • This patch introduces a small inconsistency in the handling of case-sensitivity of .cmi filenames. In case-insensitive filesystems (Windows, Mac) one can find module Foo in FOO.CMI for example. However, if we make a file LARGEFILE.CMI visible using an -I flag the module Pervasives.LargeFile will not be shadowed.
@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Dec 5, 2017

Member

Ok. I tried the other approach of filtering the modules to add to the environment using Misc.find_in_path_uncap. This approach doesn't introduce the inconsistency.

Regarding performances, I tried passing 100 include directories and observed a 0.001s slowdown compared to master, so it's negligible.

Member

diml commented Dec 5, 2017

Ok. I tried the other approach of filtering the modules to add to the environment using Misc.find_in_path_uncap. This approach doesn't introduce the inconsistency.

Regarding performances, I tried passing 100 include directories and observed a 0.001s slowdown compared to master, so it's negligible.

@nojb

nojb approved these changes Dec 6, 2017

@nojb

This comment has been minimized.

Show comment
Hide comment
@nojb

nojb Dec 6, 2017

Contributor

This looks good to me. Unless someone objects we could merge this and move onwards to #1010! @gasche? @alainfrisch?

Contributor

nojb commented Dec 6, 2017

This looks good to me. Unless someone objects we could merge this and move onwards to #1010! @gasche? @alainfrisch?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 6, 2017

Member

I am nervous about this change, but also I haven't had the time to review it to understand what it does, and I am not sure that I will have this time today or in the next few days. I would rather wait before inclusion.

(I am actually also a bit nervous about #1010, but so far I have been happy to let other people take decisions for this. Honestly, for this kind of delicate changes with long-term impact, I would rather see a somewhat more formal process than an undirected GPR discussion.)

Member

gasche commented Dec 6, 2017

I am nervous about this change, but also I haven't had the time to review it to understand what it does, and I am not sure that I will have this time today or in the next few days. I would rather wait before inclusion.

(I am actually also a bit nervous about #1010, but so far I have been happy to let other people take decisions for this. Honestly, for this kind of delicate changes with long-term impact, I would rather see a somewhat more formal process than an undirected GPR discussion.)

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 6, 2017

Member

I would like to see a specification of what this change does in slightly higher-level terms than its implementation.

My mental model of the current module lookup semantics in OCaml is that include paths are passed, in some order, to the compiler, and that the initial module-name environment of a program refers, for each possible module name, to the "first" compilation unit with a compatible name in include paths, for a notion of "first" defined by a fixed priority order. The standard library and the current working directory are implicitly added as include directories in some defined position.
Is this correct? What is the priority order? What are the priorities of stdlib and the current working directory?

My understanding of the current semantics of Pervasives and -open flags is that they correspond to open statements implictly added within the initial module environment defined by include directories as defined above. Is this correct?

Assuming that this mental model of the current semantics is more or less correct, what is the intended semantics, in these terms, after this patch is applied? Is it documented somewhere that the user can read about?

Member

gasche commented Dec 6, 2017

I would like to see a specification of what this change does in slightly higher-level terms than its implementation.

My mental model of the current module lookup semantics in OCaml is that include paths are passed, in some order, to the compiler, and that the initial module-name environment of a program refers, for each possible module name, to the "first" compilation unit with a compatible name in include paths, for a notion of "first" defined by a fixed priority order. The standard library and the current working directory are implicitly added as include directories in some defined position.
Is this correct? What is the priority order? What are the priorities of stdlib and the current working directory?

My understanding of the current semantics of Pervasives and -open flags is that they correspond to open statements implictly added within the initial module environment defined by include directories as defined above. Is this correct?

Assuming that this mental model of the current semantics is more or less correct, what is the intended semantics, in these terms, after this patch is applied? Is it documented somewhere that the user can read about?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 6, 2017

Member

(As a meta comment: I am surprised that a patch in this design space would make the behavior more correct by adding more code. If we are looking for consistency and predictability, adding more logic is a bit dangerous, and I would rather expect a patch to make things right by removing code, or at most by moving existing code around.)

Member

gasche commented Dec 6, 2017

(As a meta comment: I am surprised that a patch in this design space would make the behavior more correct by adding more code. If we are looking for consistency and predictability, adding more logic is a bit dangerous, and I would rather expect a patch to make things right by removing code, or at most by moving existing code around.)

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Dec 7, 2017

Member

I agree this shouldn't be discussed just on github. #1010 was discussed and accepted during a developer meeting, but we can discuss it more if doubts remain.

Regarding the current semantic, this is basically what you say except that the current directory is not implicitly added. The stdlib directory is added first and so is always considered after all the user supplied include directories when looking up a module.

My understanding of the current semantics of Pervasives and -open flags is that they correspond to open statements implictly added within the initial module environment defined by include directories as defined above. Is this correct?

Yes

Assuming that this mental model of the current semantics is more or less correct, what is the intended semantics, in these terms, after this patch is applied? Is it documented somewhere that the user can read about?

After the patch is applied, user supplied include directories have precedence over sub-modules of Pervasives.

Here is one way to formalize this a bit more:

  • you have an empty environment: empty
  • you can add the contents of a directory to an environment, which means that for any x.cmi file in this directory, a new toplevel module X is added to the environment and the definition of X is read from x.cmi: add_dir(env, dir)
  • you can open a module that is already in the environment, and all its component are added as toplevel component in the environment: open(env, module)

Assuming the user has passed -I d1 ... -I dn -open M1 ... -open Mm to the compiler, currently the initial environment is built as follow:

env <- empty

env <- add_dir(env, stdlib_dir)
env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, Pervasives)
env <- open(env, M1)
...
env <- open(env, Mm)

After this patch, the initial environment is built as follow:

env <- empty

env <- add_dir(env, stdlib_dir)
env <- open(env, Pervasives)

env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, M1)
...
env <- open(env, Mm)

Note that the order in which the -I options and the -open options are passed doesn't matter, opens are always interpreted after include directories.

So what this patch provides is the ability to shadow sub-modules of Pervasives. -open ... options will still shadow everything, but it's not as bad as they are under the control of the user.

(As a meta comment: I am surprised that a patch in this design space would make the behavior more correct by adding more code. If we are looking for consistency and predictability, adding more logic is a bit dangerous, and I would rather expect a patch to make things right by removing code, or at most by moving existing code around.)

I agree this is a bit surprising. A much more straightforward approach would be to implement add_dir as a function that eagerly fills the environment with pointers to the files. However, due to implementation details, this method doesn't necessarily yields the environment. This is because Sys.file_exists (Filename.concat dirname basename) = Array.mem basename (Sys.readdir dirname) is not true on all file-systems.

It's possible that this change is worth doing though, and it's likely that namespaces will require it, but it'd require more discussion.

Member

diml commented Dec 7, 2017

I agree this shouldn't be discussed just on github. #1010 was discussed and accepted during a developer meeting, but we can discuss it more if doubts remain.

Regarding the current semantic, this is basically what you say except that the current directory is not implicitly added. The stdlib directory is added first and so is always considered after all the user supplied include directories when looking up a module.

My understanding of the current semantics of Pervasives and -open flags is that they correspond to open statements implictly added within the initial module environment defined by include directories as defined above. Is this correct?

Yes

Assuming that this mental model of the current semantics is more or less correct, what is the intended semantics, in these terms, after this patch is applied? Is it documented somewhere that the user can read about?

After the patch is applied, user supplied include directories have precedence over sub-modules of Pervasives.

Here is one way to formalize this a bit more:

  • you have an empty environment: empty
  • you can add the contents of a directory to an environment, which means that for any x.cmi file in this directory, a new toplevel module X is added to the environment and the definition of X is read from x.cmi: add_dir(env, dir)
  • you can open a module that is already in the environment, and all its component are added as toplevel component in the environment: open(env, module)

Assuming the user has passed -I d1 ... -I dn -open M1 ... -open Mm to the compiler, currently the initial environment is built as follow:

env <- empty

env <- add_dir(env, stdlib_dir)
env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, Pervasives)
env <- open(env, M1)
...
env <- open(env, Mm)

After this patch, the initial environment is built as follow:

env <- empty

env <- add_dir(env, stdlib_dir)
env <- open(env, Pervasives)

env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, M1)
...
env <- open(env, Mm)

Note that the order in which the -I options and the -open options are passed doesn't matter, opens are always interpreted after include directories.

So what this patch provides is the ability to shadow sub-modules of Pervasives. -open ... options will still shadow everything, but it's not as bad as they are under the control of the user.

(As a meta comment: I am surprised that a patch in this design space would make the behavior more correct by adding more code. If we are looking for consistency and predictability, adding more logic is a bit dangerous, and I would rather expect a patch to make things right by removing code, or at most by moving existing code around.)

I agree this is a bit surprising. A much more straightforward approach would be to implement add_dir as a function that eagerly fills the environment with pointers to the files. However, due to implementation details, this method doesn't necessarily yields the environment. This is because Sys.file_exists (Filename.concat dirname basename) = Array.mem basename (Sys.readdir dirname) is not true on all file-systems.

It's possible that this change is worth doing though, and it's likely that namespaces will require it, but it'd require more discussion.

@Drup

This comment has been minimized.

Show comment
Hide comment
@Drup

Drup Dec 7, 2017

Contributor

I just want to say that this explanation is very good and should definitely be added to the documentation. :)

Contributor

Drup commented Dec 7, 2017

I just want to say that this explanation is very good and should definitely be added to the documentation. :)

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Dec 7, 2017

Contributor

Another way to see it would be a variant of open, let's say open* which only adds to the environment components which are not already defined in the current environment, i.e. an open that doesn't shadow anything. The proposed change is to use open* instead of open for Pervasives.

Would open* be useful in other contexts? In some cases, I think it could make the code more robust. Imagine that a library exposes some functions and operators one wants to use in an unqualified way:

let log x = print_endline x; x
....
open TheLib
log (foo ++ bar) ** baz

Then the next version of the library adds a toplevel log function, which shadows our own and could either break type-checking or change the semantics silently. Luckily we now have a warning that detects such shadowing caused by open, but the client code is still broken after a change to the library. It would be more robust against addition to MyLib had it used open*.

I'm not arguing in favor of adding open* in the language at this point, but considering its possible general usefulness is a good criterion to see if the solution proposed here is not completely crazy / ad hoc.

Contributor

alainfrisch commented Dec 7, 2017

Another way to see it would be a variant of open, let's say open* which only adds to the environment components which are not already defined in the current environment, i.e. an open that doesn't shadow anything. The proposed change is to use open* instead of open for Pervasives.

Would open* be useful in other contexts? In some cases, I think it could make the code more robust. Imagine that a library exposes some functions and operators one wants to use in an unqualified way:

let log x = print_endline x; x
....
open TheLib
log (foo ++ bar) ** baz

Then the next version of the library adds a toplevel log function, which shadows our own and could either break type-checking or change the semantics silently. Luckily we now have a warning that detects such shadowing caused by open, but the client code is still broken after a change to the library. It would be more robust against addition to MyLib had it used open*.

I'm not arguing in favor of adding open* in the language at this point, but considering its possible general usefulness is a good criterion to see if the solution proposed here is not completely crazy / ad hoc.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 7, 2017

Member

@diml: hanks for the explanation! (I didn't mean to delay any decision taken for #1010, and I know that it's a lot of painful grunt work for you -- I'll stick to discussing the current PR.)

Your explanation raises a question: when -open was introduced, I thought that the fact that it allowed third-party libraries to act "just like Pervasives" was an argument in favor of the feature. (For reference, this is briefly discussed in a thread of my mailbox named "Allowing to use module aliases in place of -pack".)

Member

gasche commented Dec 7, 2017

@diml: hanks for the explanation! (I didn't mean to delay any decision taken for #1010, and I know that it's a lot of painful grunt work for you -- I'll stick to discussing the current PR.)

Your explanation raises a question: when -open was introduced, I thought that the fact that it allowed third-party libraries to act "just like Pervasives" was an argument in favor of the feature. (For reference, this is briefly discussed in a thread of my mailbox named "Allowing to use module aliases in place of -pack".)

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Dec 7, 2017

Member

@Drup I'm happy to add this explanation to the doc, do you have an idea where it should go?

@alainfrisch I guess open* would work and could be useful in other contexts, but to be fair I'd rather leave that as a future improvement. Adding a new language feature will take time and for the case of Pervasives it's the same result

@gasche It's well possible that this shadowing problem was overlooked in the initial discussion about -open. Currently -open has a few use cases:

  • it works well to implement compilation schemes such as the one in #1010
  • it works well to factorize an open X across all your files

but it's not perfect to replace the stdlib because of this shadowing problem. So even with #1010 and this PR, more work needs to be done to be able to really replace the stdlib by something else. But at least we can add new modules to the stdlib.

Member

diml commented Dec 7, 2017

@Drup I'm happy to add this explanation to the doc, do you have an idea where it should go?

@alainfrisch I guess open* would work and could be useful in other contexts, but to be fair I'd rather leave that as a future improvement. Adding a new language feature will take time and for the case of Pervasives it's the same result

@gasche It's well possible that this shadowing problem was overlooked in the initial discussion about -open. Currently -open has a few use cases:

  • it works well to implement compilation schemes such as the one in #1010
  • it works well to factorize an open X across all your files

but it's not perfect to replace the stdlib because of this shadowing problem. So even with #1010 and this PR, more work needs to be done to be able to really replace the stdlib by something else. But at least we can add new modules to the stdlib.

@bluddy

This comment has been minimized.

Show comment
Hide comment
@bluddy

bluddy Dec 7, 2017

@alainfrisch I don't think there's much value in adding open*. Its utility disappears as soon as we modify your example slightly:

open* TheLib
open* Logger (* contains logger)
log (foo ++ bar) ** baz

If TheLib adds log, we have the same problem. The only real solution is to have an easy way to specify exactly what you're importing. There's no easy way to do that in OCaml -- you have to create another module:

module MyLib = struct let x = MyLib.x ... end
module Log = struct let log = Logger.log end
open MyLib
open Log
log (foo ++ bar) ** baz

It would be nice if we had more convenient syntax for this:

open MyLib[foo; bar]
open Logger[log]
log (foo ++ bar) ** baz

For convenience of reuse, it would also be nice if we could have syntax to create modules easily:

module Lib = MyLib[foo; bar]
module Log = Logger[log]
open Log
open Lib

As a final thought, modules are OCaml's 'currency of trade'. The easier we make it create and manipulate them, the better the language becomes.

bluddy commented Dec 7, 2017

@alainfrisch I don't think there's much value in adding open*. Its utility disappears as soon as we modify your example slightly:

open* TheLib
open* Logger (* contains logger)
log (foo ++ bar) ** baz

If TheLib adds log, we have the same problem. The only real solution is to have an easy way to specify exactly what you're importing. There's no easy way to do that in OCaml -- you have to create another module:

module MyLib = struct let x = MyLib.x ... end
module Log = struct let log = Logger.log end
open MyLib
open Log
log (foo ++ bar) ** baz

It would be nice if we had more convenient syntax for this:

open MyLib[foo; bar]
open Logger[log]
log (foo ++ bar) ** baz

For convenience of reuse, it would also be nice if we could have syntax to create modules easily:

module Lib = MyLib[foo; bar]
module Log = Logger[log]
open Log
open Lib

As a final thought, modules are OCaml's 'currency of trade'. The easier we make it create and manipulate them, the better the language becomes.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Dec 7, 2017

Contributor

I think the value for open* would be mostly for "very local opens" (perhaps using the M.(...) syntax). As long as you don't nest such opens, it can definitely be useful. Typically, a function that combines many functions/operators/constructors from a single library.

I'd rather leave that as a future improvement

I wrote explicitly that I'm not suggesting to add open* at this point. I personally find the explanation of the change in terms of this would-be feature equally simple to grasp than the one based on changing the ordering of operations. Knowing that we might be able to add this feature and then recast the explanation for Pervasives in terms of open* makes the proposed semantics look less ad hoc.

To be clear: I'm in favor of this semantics (but I did not review the code).

Contributor

alainfrisch commented Dec 7, 2017

I think the value for open* would be mostly for "very local opens" (perhaps using the M.(...) syntax). As long as you don't nest such opens, it can definitely be useful. Typically, a function that combines many functions/operators/constructors from a single library.

I'd rather leave that as a future improvement

I wrote explicitly that I'm not suggesting to add open* at this point. I personally find the explanation of the change in terms of this would-be feature equally simple to grasp than the one based on changing the ordering of operations. Knowing that we might be able to add this feature and then recast the explanation for Pervasives in terms of open* makes the proposed semantics look less ad hoc.

To be clear: I'm in favor of this semantics (but I did not review the code).

Show outdated Hide outdated typing/envaux.ml Outdated
Show outdated Hide outdated typing/env.mli Outdated
Show outdated Hide outdated typing/env.ml Outdated
@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Dec 7, 2017

Contributor

Regarding performances, I tried passing 100 include directories and observed a 0.001s slowdown compared to master, so it's negligible.

The current Pervasives defines one only module. The proposal in #1010 is that Pervasives (or Stdlib) would export as many modules as they are currently in the stdlib (about 50). In the OPAM world, it's not uncommon to compile with 30 -I directories. So that would be 15x worst than your benchmark. It's still fine if your timing is representative of the reality, but it might be worth checking under Windows where syscall might be much slower.

Contributor

alainfrisch commented Dec 7, 2017

Regarding performances, I tried passing 100 include directories and observed a 0.001s slowdown compared to master, so it's negligible.

The current Pervasives defines one only module. The proposal in #1010 is that Pervasives (or Stdlib) would export as many modules as they are currently in the stdlib (about 50). In the OPAM world, it's not uncommon to compile with 30 -I directories. So that would be 15x worst than your benchmark. It's still fine if your timing is representative of the reality, but it might be worth checking under Windows where syscall might be much slower.

@nojb

This comment has been minimized.

Show comment
Hide comment
@nojb

nojb Dec 7, 2017

Contributor

I checked under Windows and the timing difference seemed to be of a similar order.

Contributor

nojb commented Dec 7, 2017

I checked under Windows and the timing difference seemed to be of a similar order.

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Dec 13, 2017

Member

@nojb thanks, how many include directories did you pass? Maybe we can slightly change the code to do the lookup 50 times and that will give us a good idea of the impact of this PR + #1010. Could you try that?

Member

diml commented Dec 13, 2017

@nojb thanks, how many include directories did you pass? Maybe we can slightly change the code to do the lookup 50 times and that will give us a good idea of the impact of this PR + #1010. Could you try that?

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Dec 13, 2017

Member

Hmm, actually I just tried on Linux and the slow-down is noticeable. I'm going to try to rebase #1010 on top of this PR and try on a real build, because in practice we do some of this lookup anyway for stdlib modules. Here we just do it eagerly for all stdlib modules.

Member

diml commented Dec 13, 2017

Hmm, actually I just tried on Linux and the slow-down is noticeable. I'm going to try to rebase #1010 on top of this PR and try on a real build, because in practice we do some of this lookup anyway for stdlib modules. Here we just do it eagerly for all stdlib modules.

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Dec 18, 2017

Member

So I did the following experiment: I created a big jbuilder workspace including jenga and all its transitive dependencies. Then I built the jenga binary with -j1. When you include the transitive dependencies, jenga depends on 45 libraries, and many of these libraries have a lot of transitive dependencies as well, so I think this is a good test for this PR. Looking at the log, there were 919 calls to ocamlc and 990 calls to ocamlopt.

I did the build with both trunk and #1010 rebased on top of #1513. I did the build on Linux. These are the results I got:

  • trunk:

    real	4m58.552s
    user	3m34.780s
    sys	1m13.414s
    
  • #1513+#1010:

    real	5m3.245s
    user	3m42.124s
    sys	1m20.450s
    

So the build is about 1% slower. Implementing what I describe in #1010 (comment) would solve the performance problem, but it is a bigger change and it is a breaking change on case-sensitive file systems, so it is likely to take time to develop and release.

Personally, I'm in favor of accepting the small slowdown for now and then looking at another implementation later. Does that sound good?

BTW, the build of Base is broken by #1010. We do complicated things in Base to shadow the stdlib (we inspect stdlib.cma and pervasives.cmi) so it's not surprising.

Member

diml commented Dec 18, 2017

So I did the following experiment: I created a big jbuilder workspace including jenga and all its transitive dependencies. Then I built the jenga binary with -j1. When you include the transitive dependencies, jenga depends on 45 libraries, and many of these libraries have a lot of transitive dependencies as well, so I think this is a good test for this PR. Looking at the log, there were 919 calls to ocamlc and 990 calls to ocamlopt.

I did the build with both trunk and #1010 rebased on top of #1513. I did the build on Linux. These are the results I got:

  • trunk:

    real	4m58.552s
    user	3m34.780s
    sys	1m13.414s
    
  • #1513+#1010:

    real	5m3.245s
    user	3m42.124s
    sys	1m20.450s
    

So the build is about 1% slower. Implementing what I describe in #1010 (comment) would solve the performance problem, but it is a bigger change and it is a breaking change on case-sensitive file systems, so it is likely to take time to develop and release.

Personally, I'm in favor of accepting the small slowdown for now and then looking at another implementation later. Does that sound good?

BTW, the build of Base is broken by #1010. We do complicated things in Base to shadow the stdlib (we inspect stdlib.cma and pervasives.cmi) so it's not surprising.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Dec 19, 2017

Member

(For the record, I'm satisfied with Jérémie's explanation. I haven't reviewed the code, so I don't approve the PR, but in general I trust his patches so no objection here -- although I think that having some documentation of the whole thing would be nice.)

@nojb and @alainfrisch, I'll let you drive this PR; any remaining concerns on performance or design, or merge decisions, are yours to express :-)

Member

gasche commented Dec 19, 2017

(For the record, I'm satisfied with Jérémie's explanation. I haven't reviewed the code, so I don't approve the PR, but in general I trust his patches so no objection here -- although I think that having some documentation of the whole thing would be nice.)

@nojb and @alainfrisch, I'll let you drive this PR; any remaining concerns on performance or design, or merge decisions, are yours to express :-)

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Dec 19, 2017

Member

Thanks. BTW, I won't be able to provide support over the chrismas period, so in any case I won't merge this myself before next year.

Member

diml commented Dec 19, 2017

Thanks. BTW, I won't be able to provide support over the chrismas period, so in any case I won't merge this myself before next year.

Show outdated Hide outdated driver/compmisc.ml Outdated
Show outdated Hide outdated Changes Outdated
@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Jan 19, 2018

Contributor
  • I did not test it, but I believe the current code does not support shadowing a Pervasives sub-modules with a module in the current directory (unless "-I ." is explicitly passed to the compiler). I believe it should be the case.

  • Apart from that, the code looks good to me, and the timings from @diml suggest that it is not worth doing something more complicated (at least for now).

  • See my note about the entry in Changes.

  • What about adding a test (shadowing LargeFile)?

Contributor

alainfrisch commented Jan 19, 2018

  • I did not test it, but I believe the current code does not support shadowing a Pervasives sub-modules with a module in the current directory (unless "-I ." is explicitly passed to the compiler). I believe it should be the case.

  • Apart from that, the code looks good to me, and the timings from @diml suggest that it is not worth doing something more complicated (at least for now).

  • See my note about the entry in Changes.

  • What about adding a test (shadowing LargeFile)?

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Jan 22, 2018

Member

I'll add a test.

BTW, I realized there is something untrue about the new semantic I wrote before; because we look for pervasives.cmi after having set the search path, the real new semantic is:

env <- empty

env <- add_dir(env, stdlib_dir)

env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, Pervasives)

env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, M1)
...
env <- open(env, Mm)

i.e. the include directories are added twice: once before opening Pervasives and once again after. The two semantics only yield different results when an include directory contains a pervasives.cmi file, so which one is enforced is unlikely to affect anybody.

Member

diml commented Jan 22, 2018

I'll add a test.

BTW, I realized there is something untrue about the new semantic I wrote before; because we look for pervasives.cmi after having set the search path, the real new semantic is:

env <- empty

env <- add_dir(env, stdlib_dir)

env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, Pervasives)

env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, M1)
...
env <- open(env, Mm)

i.e. the include directories are added twice: once before opening Pervasives and once again after. The two semantics only yield different results when an include directory contains a pervasives.cmi file, so which one is enforced is unlikely to affect anybody.

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Jan 25, 2018

Member

I added a test and addressed the review comments.

Member

diml commented Jan 25, 2018

I added a test and addressed the review comments.

@Drup

This comment has been minimized.

Show comment
Hide comment
@Drup

Drup Jan 28, 2018

Contributor

Ok, the implementation, while quite short, is quite a bit more subtle than I expected. It implements the semantics outlined by @diml (at least, as far as I can tell) but in a fairly different way.

Basically, it modifies the open operation to provide a set of modules that should not be added to the local scope when opening. This set is only non-empty when opening Pervasives, and uses the list of .cmi that are in path. I'm quite convinced that it works, but the implementation is a bit roundabout. Also, I don't think this would be the right way to implement open* that @alainfrisch suggested.

One small thing that I do not like is that the only filtering list which matters (i.e., the one used to open "Pervasives": all the others are empty) is defined in compmisc, which should really not contained such code. Driver is already a mess, I'm strongly against adding anything semantically meaningful to it.

So, I would propose to add two functions, Env.filter_modules_in_path and Env.filter_modules_default, next to the definition of open that implement both mode of filtering. It also provide a natural place to add the documentation you wrote above. :)

Additionally, given how thoroughly you take care of propagating ?filter_modules, it might be better to make it a mandatory argument. I think you caught all the instances, but then why is it optional ?
The discrepancy between the function argument (a predicate) and the constructor argument (a set) is a bit unsatisfying, but I don't see a way around it.

Apart from that, it looks ok to me.

Contributor

Drup commented Jan 28, 2018

Ok, the implementation, while quite short, is quite a bit more subtle than I expected. It implements the semantics outlined by @diml (at least, as far as I can tell) but in a fairly different way.

Basically, it modifies the open operation to provide a set of modules that should not be added to the local scope when opening. This set is only non-empty when opening Pervasives, and uses the list of .cmi that are in path. I'm quite convinced that it works, but the implementation is a bit roundabout. Also, I don't think this would be the right way to implement open* that @alainfrisch suggested.

One small thing that I do not like is that the only filtering list which matters (i.e., the one used to open "Pervasives": all the others are empty) is defined in compmisc, which should really not contained such code. Driver is already a mess, I'm strongly against adding anything semantically meaningful to it.

So, I would propose to add two functions, Env.filter_modules_in_path and Env.filter_modules_default, next to the definition of open that implement both mode of filtering. It also provide a natural place to add the documentation you wrote above. :)

Additionally, given how thoroughly you take care of propagating ?filter_modules, it might be better to make it a mandatory argument. I think you caught all the instances, but then why is it optional ?
The discrepancy between the function argument (a predicate) and the constructor argument (a set) is a bit unsatisfying, but I don't see a way around it.

Apart from that, it looks ok to me.

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Jan 29, 2018

Member

Basically, it modifies the open operation to provide a set of modules that should not be added to the local scope when opening. This set is only non-empty when opening Pervasives, and uses the list of .cmi that are in path. I'm quite convinced that it works, but the implementation is a bit roundabout. Also, I don't think this would be the right way to implement open* that @Frisch suggested.

Yh, it's the only I can think of to avoid a breaking change on Windows due to the file system.

So, I would propose to add two functions, Env.filter_modules_in_path and Env.filter_modules_default, next to the definition of open that implement both mode of filtering. It also provide a natural place to add the documentation you wrote above. :)

Seems fine. Looking at this again, we can make things a bit more explicit by replacing ?filter_modules:(fun string -> bool) by ?where_to_open:where_to_open where:

type where_to_open = Before_load_path | After_load_path

And for the Envaux.env_from_summary, which is only place where we need to filter modules using a fixed set of strings, we can add:

val open_signature_from_env_summary:
  Path.t -> t -> hidden_submodules:Misc.StringSet.t -> t option

Additionally, given how thoroughly you take care of propagating ?filter_modules, it might be better to make it a mandatory argument. I think you caught all the instances, but then why is it optional ?

Making it required seems fine. It's a breaking change, but I don't think we care too much here (?)

Member

diml commented Jan 29, 2018

Basically, it modifies the open operation to provide a set of modules that should not be added to the local scope when opening. This set is only non-empty when opening Pervasives, and uses the list of .cmi that are in path. I'm quite convinced that it works, but the implementation is a bit roundabout. Also, I don't think this would be the right way to implement open* that @Frisch suggested.

Yh, it's the only I can think of to avoid a breaking change on Windows due to the file system.

So, I would propose to add two functions, Env.filter_modules_in_path and Env.filter_modules_default, next to the definition of open that implement both mode of filtering. It also provide a natural place to add the documentation you wrote above. :)

Seems fine. Looking at this again, we can make things a bit more explicit by replacing ?filter_modules:(fun string -> bool) by ?where_to_open:where_to_open where:

type where_to_open = Before_load_path | After_load_path

And for the Envaux.env_from_summary, which is only place where we need to filter modules using a fixed set of strings, we can add:

val open_signature_from_env_summary:
  Path.t -> t -> hidden_submodules:Misc.StringSet.t -> t option

Additionally, given how thoroughly you take care of propagating ?filter_modules, it might be better to make it a mandatory argument. I think you caught all the instances, but then why is it optional ?

Making it required seems fine. It's a breaking change, but I don't think we care too much here (?)

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Jan 30, 2018

Member

I refactored the code a bit. In the end I didn't expose ?filter_modules and added explicit functions for opening the initial module and reconstructing the environment from the summary. I also realized that the behavior is not clear when we hide a module that is already in the environment, so I added a assertion to check that we don't.

I also added Typemod.initial_env to factorize the code between the various places. I tried to put it in Envaux but then this caused a lot of new modules to be needed to build the debugger, in particular Lexer which clashed with the Lexer module of the debugger...

Member

diml commented Jan 30, 2018

I refactored the code a bit. In the end I didn't expose ?filter_modules and added explicit functions for opening the initial module and reconstructing the environment from the summary. I also realized that the behavior is not clear when we hide a module that is already in the environment, so I added a assertion to check that we don't.

I also added Typemod.initial_env to factorize the code between the various places. I tried to put it in Envaux but then this caused a lot of new modules to be needed to build the debugger, in particular Lexer which clashed with the Lexer module of the debugger...

@Drup

Drup approved these changes Jan 30, 2018 edited

Ok, I like the new organization much better.

I'm not sure about the Envaux issue (although I'm not terribly surprised either). Putting the initial typing environment in Typemod is fairly coherent too, so that's fine.

Looks good to me. :)

@diml

This comment has been minimized.

Show comment
Hide comment
@diml

diml Feb 1, 2018

Member

Great, thanks all for your reviews and comments. I will rebase and merge this today and then i'll restart the work on #1010

Member

diml commented Feb 1, 2018

Great, thanks all for your reviews and comments. I will rebase and merge this today and then i'll restart the work on #1010

@diml diml merged commit 49f6dd5 into ocaml:trunk Feb 2, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment