Refactor the construction of the initial environment (fixes #7841) #2041

ghost · 2018-09-11T18:08:09Z

This PR is a work in progress, I'm opening it for early feedback and to make sure I'm approaching the problem the right way, as it is taking longer than I expected.

Overview

This PR refactors the way include directories and handled. In particular, it allows modules implemented by external cmi files to shadow modules in the current environment. The main motivation is so that modules coming from include directories specified by the user have precedence over the implicit open Stdlib performed by the compiler.

This is already the case, however the way it is implemented is not very elegant and doesn't work when incrementally adding include directories, such as in the toplevel.

More details

In the current world, the environment is split in two:

a local in-memory environment
a mapping from module names to external cmi files, specified via a list of include directories

1 always have precedence over 2 which is only used as a fallback for modules that are not found in 1. In particular if we consider the following toplevel session:

# module X = struct end;;
# #directory "foo";;

It is impossible to refer to the compilation unit X stored in foo/x.cmi because the local X has precedence over external modules.

This PRs removes the fallback to 2 and replaces it by a new primitive that allows to add a persistent module to the environment. This means that adding an include directory becomes an operation that eagerly add names to the environment, shadowing existing ones. However, the cmi files are still loaded lazily.

If we reuse the DSL from #1513, this means the initial environment is now effectively constructed as follow:

env <- empty

env <- add_dir(env, stdlib_dir)
env <- open(env, Stdlib)

env <- add_dir(env, d1)
...
env <- add_dir(env, dn)

env <- open(env, M1)
...
env <- open(env, Mm)

Where dX are the include directories specified by the user via the -I option and MX are the implicitly opened modules specified via -open. In the toplevel, the #directory <dir> directive is immediate interpreted as: env <- add_dir(env, dir). So if we consider the toplevel session above, after #directory "foo", X refers to foo/x.cmi rather than the local module X. This seems like a better behavior to me.

toplevel/topdirs.ml

typing/env.ml

alainfrisch · 2018-09-12T08:00:11Z

typing/env.ml

        acc
    | Some l ->
-      let p, desc = lookup_module_descr ~mark:true l env in
+        let p, desc = lookup_module_descr ~mark:true l env in


Please reindent the code below as well.

alainfrisch · 2018-09-12T08:09:24Z

typing/typemod.ml

+    | Some m ->
+        (* Locate the directory that contains [m], adds the units it
+           contains to the environment and open [m] in the resulting
+           environment. *)


Can you explain the logic here? Why don't we add all units from all directories first and then open the initially_opened_module in the resulting env?

We can do that indeed. However, we would need to re-add the units after opening this module, so that they can shadow the sub-modules of Stdlib. That doesn't seem optimal. Given that we know that the Stdlib module has no dependencies outside of the stdlib directory, it seems more natural to do it this way.

In fact, after this PR we almost want to give meaning to the order in which the various -I and -open flags are passed. That would make sense when using an alternative stdlib for instance as currently it's not uncommon to write code like this:

module X_in_this_directory = X open Core module X = X_in_this_directory

which is kind of ugly.

If we ever make the initially_opened_module customizable (currently, it is always "Stdlib" or nothing), the logic would seem a bit ad hoc: modules in the same directory than the initially_opened_module wouldn't be allowed to override sub-modules of that one, but modules in any other -I directory would. Perhaps not a big deal in practice, though.

In fact, after this PR we almost want to give meaning to the order in which the various -I and -open flags are passed.

Yes, it would seem quite natural then. People could explicitly pass the same -I directory several times if needed.

utils/load_path.ml

alainfrisch

I did a first review pass and things look globally good. Some notes:

This will slightly change the behavior on case-insensitive file-systems. With the PR, the casing of filenames (except for the first character) needs to match the one of the module name. Previously, it was ok to rename Foo.cmi into FOO.CMI. I suppose this is ok.
Another change is that, in the toplevel, units which were not yet available when the directory was added won't be usable. Previously, it was possible to start a toplevel with some -I flags while a build system was still running in the background, and still access resulting units once they are build. Or, more realistically, to have some ocaml script explicitly calling the compiler and then referring to the resulting units. This could be addressed by a toplevel directive that explicitly refreshes the view on external units. Not sure it is worth it.
Yet another change in the toplevel is that with the current "delayed" resolution mechanism, load paths are interpreted relative to the current directory when the lookup is done. Users could start the toplevel with some -I directives (with relative directories), and explicitly change the current directory at the beginning of the script (e.g. to choose between different implementations/versions). Again , the change is probably ok.

utils/load_path.mli

alainfrisch · 2018-09-12T08:35:18Z

typing/env.ml

@@ -450,14 +450,20 @@ type type_descriptions =

 let in_signature_flag = 0x01

+type 'a value_or_persistent =
+  | Value of 'a
+  | Persistent


Persistent is just a placeholder for remembering that the module is an external unit. Wouldn't it make sense to keep the actual file name, as discovered during the initial scan?

I wondered about this. However, given that environments are serialized to disk, wouldn't that make cmi files non-relocatable?

Only summaries are serialized, so this would be fine I guess, but it requires a bit more refactoring.

Looking at it, there is t in module_components, however it is always empty when saving a cmi file. I guess we could make it an option, then we would have to bootstrap less often.

alainfrisch · 2018-09-12T08:40:03Z

Another small change: if a file is removed during the compilation or during a toplevel session, one could end up, I believe, with read_cmi being passed a non-existing filename, resulting in uncaught exceptions. Perhaps worth protecting with an explicit error message (or deal gracefully with the situation, as if the file had never been there).

dbuenzli · 2018-09-12T08:45:23Z

@diml only a suggestion but since you are in the area would you maybe consider how this could fit with point 2. of MPR7589

ghost · 2018-09-12T16:40:38Z

Thanks for the review! For the toplevel, yh I agree that a few things will change. Personally, I find the new behavior more intuitive. BTW, you can already rescan a given directory by issuing a new #directory primitive. The cost of looking up files in the load path no longer depends on the number of load paths, so it's not really a problem to have duplicates.

Another small change: if a file is removed during the compilation or during a toplevel session, one could end up, I believe, with read_cmi being passed a non-existing filename, resulting in uncaught exceptions. Perhaps worth protecting with an explicit error message (or deal gracefully with the situation, as if the file had never been there).

Indeed. I guess we can do a Sys.file_exists in Load_path.find. I suggest to do it only when Sys.interactive is true, as otherwise it's one more system call that will most of the time not be necessary as the build system should make sure this doesn't happen.

@dbuenzli I believe the information is already available in Symtable, so I guess it's just a matter of exposing it in Toploop.

alainfrisch · 2018-09-12T17:13:24Z

Indeed. I guess we can do a Sys.file_exists in Load_path.find.

Or just catch the exception?

ghost · 2018-09-12T17:14:35Z

This would work but requires a bit more refactoring. Basically we need to catch it wherever we call Load_path.find. I changed all the Misc.find_in_path given that they were all of the form Misc.find_in_path x !Config.load_path.

ghost · 2018-09-12T17:18:16Z

BTW, currently I'm still debugging a build error. The following doesn't type, saying that the value is unbound:

let x = Stdlib.Seq.empty

However, the following does type, which is a bit confusing:

module X = Stdlib.Seq
let x = X.empty

ghost · 2018-09-13T18:20:44Z

We had a look with @lpw25, the problem was that module aliases coming from cmi files where resolved in an empty environment. I've restored the fallback to 2, which solves the issue.

alainfrisch · 2018-09-14T13:54:39Z

We had a look with @lpw25, the problem was that module aliases coming from cmi files where resolved in an empty environment. I've restored the fallback to 2, which solves the issue.

I did not try to understand where this comes from, but my intuitive reaction is that this does not look right. Can you give some hints on why aliases are resolved in an empty environment, and why the fix should not be to resolve them instead in another environment?

lpw25 · 2018-09-14T14:15:21Z

I can't comment on the fix because I haven't read it, but the problem itself is fairly simple. The components value created for a loaded cmi file requires an environment. This environment needs to contain all free identifiers in the cmi file so that they can be looked up. Before this PR an empty environment was sufficient for this since it implicitly contained all persistent identifiers available from the include directories -- and any free identifier in a cmi file must be a persistent identifier[1]. With this PR an empty environment is genuinely empty and so is not sufficient for this purpose.

[1]: Actually, it could also be an identifier from Predef, but I guess we never happen to look one of these up in the environment for the components value. So it would probably be better to use the initial environment rather than an empty environment anyway.

alainfrisch · 2018-09-14T14:17:22Z

Before this PR an empty environment was sufficient for this since it implicitly contained all persistent identifiers available from the include directories

But then why not simply use the initial environment instead of the empty one?

ghost · 2018-09-17T13:52:19Z

Looking at this again, building an environment that only contain the names in cmi_crcs should be enough

ghost · 2018-09-17T17:27:57Z

I did this and went a bit further. I also had to fix Matching.get_mod_field. It feels like this function could be implemented in a more straightforward way, i.e. without constructing a proper environment.

ghost · 2018-09-17T17:28:36Z

BTW, for some reason make bootstrap tries to build the debugger and a bunch of other stuff. This makes bootstrapping more painful than it should...

alainfrisch · 2018-09-17T17:40:05Z

Yes, make boostrap does the actual bootstrapping (make coreboot) followed by a make all to make sure everything's ok.

ghost · 2018-09-17T17:48:38Z

Ah, I see. Thanks

ghost · 2018-09-18T14:00:19Z

I implemented #remove_directory and fixed a few other issues. The testsuite now passes. I had to add another hook in Toploop to implement the self-contained-toplevel test.

This PR is now ready for review.

toplevel/toploop.mli

ghost · 2018-09-20T21:02:18Z

It was to fix the selt-contained-toplevel test: the names now needs to be added to Toploop.toplevel_env as well, and this needs to be done after the setup steps as they reset Toploop.toplevel_env. I suppose another way would be to add a dedicated API for that in toploop, i.e.:

val register_module : name:string -> Env.Persistent_signature.t Lazy.t -> unit

which would take care of all this.

damiendoligez · 2018-11-08T16:13:59Z

I guess we can do a Sys.file_exists in Load_path.find.

Another process might remove the file right after you called Sys.file_exists. The only way to be correct is to catch the Sys_error when you call open.

ghost · 2018-11-12T13:37:32Z

Indeed, however Sys_error exceptions are hard to match on. All we can do is catch them all, which is not great. For instance, in this case we might want to distinguish ENOENT errors from other ones.

ghost · 2019-02-18T11:53:31Z

Indeed, the fix should be simple. I'll have a look

ghost · 2019-02-18T16:25:03Z

Fixed in #2256

shindere · 2019-02-22T10:41:28Z

Currently the 'test_unix_cmdline.byte.output' is broken on Cygwin.

Bisecting seems to show that it is this GPR and in particular commit
7e0862a which introduced the failure.

Both bytecode and native tests fail when the compiled program is run for the
first time. The native version times out while the bytecode version produces
the following output:

Fatal error: no bytecode file specified
/path/to//test_unix_cmdline.byte: line 2: syntax error near unexpected token `$'\001\001C''
/path/to/test_unix_cmdline.byte: line 2: `Tß�W�����%.7@IR[gt}���c(�C'

@diml any hint about what could have gone wrong?

Anything I may do to help debugging this?

ghost · 2019-02-25T07:55:31Z

@shindere seeing the compilation/execution log could help

shindere · 2019-02-25T07:58:02Z

Jérémie Dimino (2019/02/24 23:55 -0800):

@shindere seeing the compilation/execution log could help

Well no, not really actually :( the logs are totally similar until the run whose output has already been shown here.

ghost · 2019-02-25T08:04:58Z

What program is printing Fatal error: no bytecode file specified?

shindere · 2019-02-25T10:07:55Z

Jérémie Dimino (2019/02/25 00:05 -0800):

What program is printing `Fatal error: no bytecode file specified`?

As far as I can tell it's `test_unix_cmdline.byte`.

ghost · 2019-02-25T10:31:36Z

I read the code of the test and I can't see anything special.

My only wild guess is that this PR somehow causes the wrong file to be linked in (is it even possible to happen silently?)

shindere · 2019-02-25T10:42:37Z

I odn't know really. I'm just re-testing that commit and the one just before. What's odd is that the produced executable starts by a reference to ocamlrun, with no #! in front, whereas in the commit before, the executable is a Windows executable with MZ at the beginning. @diml do you want me to send you the files produced by the two invocations of ocamltest, before and after the commit?

ghost · 2019-02-25T10:54:23Z

The header for non-custom bytecode files is read from the camlheader that is looked up in the load path. One difference between the previous commit and this one is that it is now looked up in a case sensitive way. Maybe that's the reason somehow?

@diml do you want me to send you the files produced by the two
invocations of ocamltest, before and after the commit?

Sure

shindere · 2019-02-25T15:32:53Z

OK so the difference between the behaviours before and after
7e0862a is in bytecomp/bytelink.ml.

Before the commit, the camlheader file is found and copied to the output.
After the commit, the "try" block raises an exception which is then silently
ignored.

ghost · 2019-02-25T16:13:32Z

Where is the camlheader file during the execution of tests? Is it possible that it is in a directory d such as Sys.readdir d fails, but Sys.file_exists (Filename.concat d "camlheader") succeeds?

That would explain the difference of behaviours.

dra27 · 2019-02-26T09:26:39Z

I'm just looking at this - I have a strong hunch that it's because of Cygwin's .exe shim handling. Before, I think it would have attempted to open camlheader and Cygwin would spot that camlheader.exe exists and open it. Now, we use Sys.readdir which will I think be returning camlheader.exe

ghost · 2019-02-26T09:32:14Z

Oh, are we actually calling this file camlheader.exe on Windows?

dra27 · 2019-02-26T09:36:32Z

Specifically Cygwin (the native Windows builds don't add .exe for these) - and possibly specific to the build system. One of the joys of Cygwin is not its speed... I'm still waiting for the build to finish :)

ghost · 2019-02-26T09:48:36Z

I see. So the fix is to lookup camlheader.exe rather than camlheader on Cygwin?

shindere · 2019-02-26T10:22:33Z

By the way: one problem that still needs to be understood is that the native version of the same test also fails, but in a different way: the test times hangs and Jenkins finally marks it as aborted after 20 minutes. As far as I can tell, the problem appears at the same commit than for the bytecode test.

dra27 · 2019-02-26T10:47:14Z

The fix is that the file should never have the .exe - there's a parallelism-related bug which I'd noticed there before, so I'm working on a fix for both.

@shindere - I only glanced at the test, but the native version invokes the incorrectly-built bytecode program which seems to be the cause of the hang. When I renamed stdlib/camlheader_ur.exe to just camlheader_ur those tests all start working again.

The precheck log from this build hasn't been saved (I didn't realise they got purged) - I'm guessing that I ignored the Cygwin failure as being transient, so sorry for not checking that more thoroughly.

shindere · 2019-02-26T10:52:40Z

All this sounds great @dra27! If you submit a PR I'll review it promptly and then I think CI will work again (after the rebase has been disabled on Cygwin 64), which will be great and help us to catch the real errors.

See ocaml/ocaml#2041 for the fix in OCaml 4.08.0

…odule (library) shadows Stdlib.Bigarray in OCaml < 4.08 This is a problem for libraries that depend on bigarray-compat but where packages that uses those libraries are using Bigarray themselves. See ocaml/ocaml#2041