Refactor the construction of the initial environment (fixes #7841) #2041
typing/env.ml
Outdated
acc
| Some l ->
let p, desc = lookup_module_descr ~mark:true l env in
let p, desc = lookup_module_descr ~mark:true l env in
Please reindent the code below as well.
| Some m ->
(* Locate the directory that contains [m], adds the units it
contains to the environment and open [m] in the resulting
environment. *)
Can you explain the logic here? Why don't we add all units from all directories first and then open the initially_opened_module in the resulting env?
We can do that indeed. However, we would need to re-add the units after opening this module, so that they can shadow the sub-modules of Stdlib. That doesn't seem optimal. Given that we know that the Stdlib module has no dependencies outside of the stdlib directory, it seems more natural to do it this way.
In fact, after this PR we almost want to give meaning to the order in which the various -I and -open flags are passed. That would make sense when using an alternative stdlib for instance, as currently it's not uncommon to write code like this:
module X_in_this_directory = X
open Core
module X = X_in_this_directory
which is kind of ugly.
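The workaround above can be reproduced in a single file. This is a minimal sketch: Core_like is a hypothetical stand-in for an alternative standard library such as Core, the outer X stands in for a unit in the current directory, and Client is a hypothetical module that plays the role of the user's source file.

```ocaml
(* Stand-in for a compilation unit [X] in the current directory. *)
module X = struct
  let origin = "local"
end

(* Hypothetical stand-in for a library that also exports a module [X]. *)
module Core_like = struct
  module X = struct
    let origin = "library"
  end

  module List_extra = struct
    let singleton x = [ x ]
  end
end

(* Plays the role of the user's source file. *)
module Client = struct
  (* Save the local [X] before the open hides it. *)
  module X_in_this_directory = X

  open Core_like

  (* Restore the local [X] on top of the one brought in by the open. *)
  module X = X_in_this_directory

  let x_origin = X.origin
  let one = List_extra.singleton 1
end

let () =
  assert (Client.x_origin = "local");
  assert (Client.one = [ 1 ])
```

If the order of -I and -open flags were given a meaning, the two alias lines inside Client would no longer be needed.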
If we ever make the initially_opened_module customizable (currently, it is always "Stdlib" or nothing), the logic would seem a bit ad hoc: modules in the same directory as the initially_opened_module wouldn't be allowed to override sub-modules of that one, but modules in any other -I directory would. Perhaps not a big deal in practice, though.
In fact, after this PR we almost want to give meaning to the order in which the various -I and -open flags are passed.
Yes, it would seem quite natural then. People could explicitly pass the same -I directory several times if needed.
I did a first review pass and things look globally good. Some notes:
- This will slightly change the behavior on case-insensitive file-systems. With the PR, the casing of filenames (except for the first character) needs to match that of the module name. Previously, it was ok to rename Foo.cmi into FOO.CMI. I suppose this is ok.
- Another change is that, in the toplevel, units which were not yet available when the directory was added won't be usable. Previously, it was possible to start a toplevel with some -I flags while a build system was still running in the background, and still access the resulting units once they were built. Or, more realistically, to have some ocaml script explicitly calling the compiler and then referring to the resulting units. This could be addressed by a toplevel directive that explicitly refreshes the view on external units. Not sure it is worth it.
- Yet another change in the toplevel is that with the current "delayed" resolution mechanism, load paths are interpreted relative to the current directory when the lookup is done. Users could start the toplevel with some -I directives (with relative directories), and explicitly change the current directory at the beginning of the script (e.g. to choose between different implementations/versions). Again, the change is probably ok.
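The casing rule in the first note can be illustrated with a sketch of how a directory scan might turn file names into unit names. This is an assumption about the scheme, not the code in the PR; has_suffix and unit_name_of_file are hypothetical helpers.

```ocaml
(* Case-sensitive suffix test, independent of the host platform. *)
let has_suffix ~suffix s =
  let ls = String.length s and lx = String.length suffix in
  ls >= lx && String.sub s (ls - lx) lx = suffix

(* Hypothetical helper: derive a unit name from a file name found while
   scanning an include directory. Only the first character is normalised,
   so every other character has to match the module name exactly. *)
let unit_name_of_file (file : string) : string option =
  if has_suffix ~suffix:".cmi" file then
    let base = String.sub file 0 (String.length file - 4) in
    Some (String.capitalize_ascii base)
  else None

let () =
  assert (unit_name_of_file "foo.cmi" = Some "Foo");
  assert (unit_name_of_file "Foo.cmi" = Some "Foo");
  (* An upper-cased suffix is not recognised under this scheme. *)
  assert (unit_name_of_file "FOO.CMI" = None);
  (* Casing after the first character is preserved, so this is not [Foo]. *)
  assert (unit_name_of_file "fOO.cmi" = Some "FOO")
```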
@@ -450,14 +450,20 @@ type type_descriptions =

let in_signature_flag = 0x01

type 'a value_or_persistent =
  | Value of 'a
  | Persistent
Persistent is just a placeholder for remembering that the module is an external unit. Wouldn't it make sense to keep the actual file name, as discovered during the initial scan?
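To make the suggestion concrete, here is a hedged sketch that puts the type from the diff next to a hypothetical variant carrying the file name. Neither value_or_file nor describe exists in the PR; they only illustrate the shape of the idea.

```ocaml
(* The type as it appears in the diff: [Persistent] carries no data. *)
type 'a value_or_persistent =
  | Value of 'a
  | Persistent

(* Hypothetical alternative: remember where the unit was found. *)
type 'a value_or_file =
  | In_env of 'a
  | On_disk of string (* path of the cmi file seen during the scan *)

(* Forget the path, recovering the representation used in the diff. *)
let forget_path : 'a value_or_file -> 'a value_or_persistent = function
  | In_env v -> Value v
  | On_disk _ -> Persistent

let describe : 'a value_or_file -> string = function
  | In_env _ -> "module defined in the environment"
  | On_disk path -> "external unit from " ^ path

let () =
  assert (describe (On_disk "foo/x.cmi") = "external unit from foo/x.cmi");
  assert (forget_path (On_disk "foo/x.cmi") = (Persistent : unit value_or_persistent))
```

Storing an absolute path in a value that ends up on disk is what raises the relocatability question discussed in the replies.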
I wondered about this. However, given that environments are serialized to disk, wouldn't that make cmi files non-relocatable?
Only summaries are serialized, so this would be fine I guess, but it requires a bit more refactoring.
Looking at it, there is t in module_components, however it is always empty when saving a cmi file. I guess we could make it an option, then we would have to bootstrap less often.
Another small change: if a file is removed during the compilation or during a toplevel session, one could end up, I believe, with read_cmi being passed a non-existing filename, resulting in uncaught exceptions. Perhaps worth protecting with an explicit error message (or deal gracefully with the situation, as if the file had never been there). |
Thanks for the review! For the toplevel, yes, I agree that a few things will change. Personally, I find the new behavior more intuitive. BTW, you can already rescan a given directory by issuing a new
Indeed. I guess we can do a @dbuenzli I believe the information is already available in |
Or just catch the exception? |
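One way to "just catch the exception" is to attempt the read and treat failure as absence, instead of testing for existence first. A minimal sketch using only the standard library (Fun.protect needs OCaml 4.08 or later); read_file_opt is a hypothetical helper and is not the compiler's read_cmi.

```ocaml
(* Read a whole file, returning [None] if it cannot be opened or read.
   Opening first and handling the failure avoids the race in which another
   process removes the file between an existence check and the read. *)
let read_file_opt (path : string) : string option =
  match open_in_bin path with
  | exception Sys_error _ -> None
  | ic ->
    Fun.protect
      ~finally:(fun () -> close_in_noerr ic)
      (fun () ->
        match really_input_string ic (in_channel_length ic) with
        | contents -> Some contents
        | exception (Sys_error _ | End_of_file) -> None)

let () =
  (* A file that is not there is reported as absent, not as a crash. *)
  assert (read_file_opt "no/such/dir/x.cmi" = None)
```

A caller can then choose between a proper error message and behaving as if the file had never been there.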
This would work but requires a bit more refactoring. Basically we need to catch it wherever we call |
BTW, currently I'm still debugging a build error. The following doesn't type, saying that the value is unbound:
let x = Stdlib.Seq.empty
However, the following does type, which is a bit confusing:
module X = Stdlib.Seq
let x = X.empty |
We had a look with @lpw25, the problem was that module aliases coming from cmi files were resolved in an empty environment. I've restored the fallback to 2, which solves the issue. |
I did not try to understand where this comes from, but my intuitive reaction is that this does not look right. Can you give some hints on why aliases are resolved in an empty environment, and why the fix should not be to resolve them instead in another environment? |
I can't comment on the fix because I haven't read it, but the problem itself is fairly simple. The components value created for a loaded cmi file requires an environment. This environment needs to contain all free identifiers in the cmi file so that they can be looked up.
Before this PR an empty environment was sufficient for this since it implicitly contained all persistent identifiers available from the include directories -- and any free identifier in a cmi file must be a persistent identifier[1]. With this PR an empty environment is genuinely empty and so is not sufficient for this purpose.
[1]: Actually, it could also be an identifier from Predef, but I guess we never happen to look one of these up in the environment for the components value. So it would probably be better to use the initial environment rather than an empty environment anyway. |
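The difference can be shown with a toy model. This is a sketch under loud assumptions: environments are reduced to a string map, persistent_units is a made-up table standing in for the include directories, and none of these functions exist in typing/env.ml.

```ocaml
module StringMap = Map.Make (String)

(* Made-up table of units reachable from the include directories. *)
let persistent_units = [ ("Stdlib__seq", "stdlib/stdlib__seq.cmi") ]

(* Before the PR (toy version): a miss falls back to the load path, so even
   the empty environment can resolve persistent identifiers. *)
let lookup_with_fallback name env =
  match StringMap.find_opt name env with
  | Some _ as found -> found
  | None -> List.assoc_opt name persistent_units

(* After the PR (toy version): only names added explicitly are visible. *)
let lookup_strict name env = StringMap.find_opt name env

(* An initial environment with the persistent names added eagerly. *)
let initial_env =
  List.fold_left
    (fun env (name, file) -> StringMap.add name file env)
    StringMap.empty persistent_units

let () =
  let file = Some "stdlib/stdlib__seq.cmi" in
  assert (lookup_with_fallback "Stdlib__seq" StringMap.empty = file);
  (* The alias target is unbound in a genuinely empty environment. *)
  assert (lookup_strict "Stdlib__seq" StringMap.empty = None);
  (* Resolving in the initial environment finds it again. *)
  assert (lookup_strict "Stdlib__seq" initial_env = file)
```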
But then why not simply use the initial environment instead of the empty one? |
Looking at this again, building an environment that only contains the names in |
I did this and went a bit further. I also had to fix |
BTW, for some reason |
Yes, |
Ah, I see. Thanks |
I implemented This PR is now ready for review. |
It was to fix the self-contained-toplevel test: the names now need to be added to val register_module : name:string -> Env.Persistent_signature.t Lazy.t -> unit which would take care of all this. |
Another process might remove the file right after you called |
Indeed, however |
Indeed, the fix should be simple. I'll have a look |
Fixed in #2256 |
Currently the 'test_unix_cmdline.byte.output' is broken on Cygwin. Bisecting seems to show that it is this GPR and in particular commit Both bytecode and native tests fail when the compiled program is run for the
@diml any hint about what could have gone wrong? Anything I may do to help debugging this? |
@shindere seeing the compilation/execution log could help |
Jérémie Dimino (2019/02/24 23:55 -0800):
@shindere seeing the compilation/execution log could help
Well no, not really actually :( the logs are totally similar until the
run whose output has already been shown here.
|
What program is printing |
Jérémie Dimino (2019/02/25 00:05 -0800):
What program is printing `Fatal error: no bytecode file specified`?
As far as I can tell it's `test_unix_cmdline.byte`.
|
I read the code of the test and I can't see anything special. My only wild guess is that this PR somehow causes the wrong file to be linked in (is it even possible to happen silently?) |
I don't know really. I'm just re-testing that commit and the one just
before. What's odd is that the produced executable starts with a reference
to ocamlrun, with no #! in front, whereas in the commit before, the
executable is a Windows executable with MZ at the beginning.
@diml do you want me to send you the files produced by the two
invocations of ocamltest, before and after the commit?
|
The header for non-custom bytecode files is read from the
Sure |
OK so the difference between the behaviours before and after Before the commit, the camlheader file is found and copied to the output. |
Where is the camlheader file during the execution of tests? Is it possible that it is in a directory That would explain the difference of behaviours. |
I'm just looking at this - I have a strong hunch that it's because of Cygwin's |
Oh, are we actually calling this file camlheader.exe on Windows? |
Specifically Cygwin (the native Windows builds don't add |
I see. So the fix is to lookup |
By the way: one problem that still needs to be understood is that the
native version of the same test also fails, but in a different way: the
test hangs and Jenkins finally marks it as aborted after 20
minutes. As far as I can tell, the problem appears at the same commit
as for the bytecode test.
|
The fix is that the file should never have the @shindere - I only glanced at the test, but the native version invokes the incorrectly-built bytecode program which seems to be the cause of the hang. When I renamed The precheck log from this build hasn't been saved (I didn't realise they got purged) - I'm guessing that I ignored the Cygwin failure as being transient, so sorry for not checking that more thoroughly. |
All this sounds great @dra27! If you submit a PR I'll review it promptly
and then I think CI will work again (after the rebase has been disabled
on Cygwin 64), which will be great and help us to catch the real errors.
|
See ocaml/ocaml#2041 for the fix in OCaml 4.08.0
…odule (library) shadows Stdlib.Bigarray in OCaml < 4.08 This is a problem for libraries that depend on bigarray-compat but where packages that uses those libraries are using Bigarray themselves. See ocaml/ocaml#2041
This PR is a work in progress, I'm opening it for early feedback and to make sure I'm approaching the problem the right way, as it is taking longer than I expected.
Overview
This PR refactors the way include directories are handled. In particular, it allows modules implemented by external cmi files to shadow modules in the current environment. The main motivation is so that modules coming from include directories specified by the user have precedence over the implicit open Stdlib performed by the compiler.
This is already the case, however the way it is implemented is not very elegant and doesn't work when incrementally adding include directories, such as in the toplevel.
More details
In the current world, the environment is split in two:
1 always has precedence over 2, which is only used as a fallback for modules that are not found in 1. In particular, if we consider the following toplevel session:
It is impossible to refer to the compilation unit X stored in foo/x.cmi because the local X has precedence over external modules.
This PR removes the fallback to 2 and replaces it by a new primitive that allows adding a persistent module to the environment. This means that adding an include directory becomes an operation that eagerly adds names to the environment, shadowing existing ones. However, the cmi files are still loaded lazily.
If we reuse the DSL from #1513, this means the initial environment is now effectively constructed as follows:
Where dX are the include directories specified by the user via the -I option and MX are the implicitly opened modules specified via -open. In the toplevel, the #directory <dir> directive is immediately interpreted as: env <- add_dir(env, dir). So if we consider the toplevel session above, after #directory "foo", X refers to foo/x.cmi rather than the local module X. This seems like a better behavior to me.
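The combination of eager name registration and lazy loading can be sketched with a toy model. Everything here is an assumption made for illustration: the environment is a plain string map, add_dir takes an already-scanned list of (unit name, file) pairs instead of a directory, and the loads counter only exists to show when a "cmi" is actually read.

```ocaml
module SMap = Map.Make (String)

type entry =
  | Local of string (* a module defined in the session *)
  | Persistent of string Lazy.t (* contents of a cmi, loaded on demand *)

(* Counts how many times a "cmi file" has been loaded. *)
let loads = ref 0

(* Toy [add_dir]: register every unit name right away, shadowing existing
   bindings, but defer the load until the name is first looked up. *)
let add_dir env (units : (string * string) list) =
  List.fold_left
    (fun env (name, file) ->
      let contents = lazy (incr loads; "loaded " ^ file) in
      SMap.add name (Persistent contents) env)
    env units

let lookup name env =
  match SMap.find_opt name env with
  | None -> None
  | Some (Local descr) -> Some descr
  | Some (Persistent contents) -> Some (Lazy.force contents)

let () =
  (* A local [X] is defined first, as in the toplevel session. *)
  let env = SMap.add "X" (Local "local X") SMap.empty in
  assert (lookup "X" env = Some "local X");
  (* Toy equivalent of #directory "foo": the name is shadowed eagerly... *)
  let env = add_dir env [ ("X", "foo/x.cmi") ] in
  assert (!loads = 0);
  (* ...and the file is only loaded on first use, then cached. *)
  assert (lookup "X" env = Some "loaded foo/x.cmi");
  assert (lookup "X" env = Some "loaded foo/x.cmi");
  assert (!loads = 1)
```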