Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1/3] Restructure LIBDIR: Move Dynlink, Str and Unix to sub-directories #11198

Merged
merged 4 commits into from
May 24, 2022

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Apr 16, 2022

This PR is the first and main PR of a brave (in the Apple sense) series of changes which seek to separate the components in OCaml's $LIBDIR. This PR specifically moves the Dynlink, Str and Unix libraries into sub-directories of $LIBDIR. The idea of the change was provisionally agreed at the last developers' meeting.

The motivation for this change is hygiene in build systems: at present, it is very difficult to impossible for Dune (for example) to ensure that a library specifies its dependency on the Unix library.

The change itself is quite simple, and in the first commit. A small tweak to two Makefiles ensures the libraries are installed to their new locations. Any previous artefacts are removed (cf. #1724, #1728 and #10301, noting that the removal of artefacts is "precise" w.r.t. basenames).

That change on its own is very breaking, of course.

The rest of the commits relate to backwards compatibility. The compilers, toplevel and debugger will all automatically add the required include path to the load path, but display an alert for each directory as this happens. For example:

$ ocaml unix.cma
File "_none_", line 1:
Alert ocaml_auto_include: OCaml's lib directory layout changed in 5.0. The unix
subdirectory has been automatically added to the search path, but you should add
-I +unix to the command-line to silence this alert.
OCaml version 5.0.0+dev0-2021-11-05
Enter #help;; for help.

or similarly with:

$ ocaml
OCaml version 5.0.0+dev0-2021-11-05
Enter #help;; for help.

# type t = Unix.file_descr;;
Alert ocaml_auto_include: OCaml's lib directory layout changed in 5.0. The unix
subdirectory has been automatically added to the search path, but you should add
-I +unix to the command-line to silence this alert.
type t = Unix.file_descr

In implementing this alert, I've attempted to avoid penalising code which has specified the appropriate include directories, especially as we know that the load path is a relatively "hot" part of the build for some users. The mechanism I've gone for is to add an extra attempt around Load_path.find and Load_path.find_uncap for the case when there is an actual miss on the load path. In almost all cases, a miss on the load path is fatal. The exceptions are #remove_directory in the toplevel and gaining inlining information from .cmx files when linking in ocamlopt, so at less critical stages. The toplevel, compilers and debugger add "+dynlink", "+str" and "+unix" to a "secondary" load path which is lazily scanned on the first miss on the normal load path. If the file being searched (say unix.cma and unix.cmi) is found, then that directory's entries are promoted to the main load path, and the fact that this took place is recorded. From the user's perspective in, say, the toplevel what happens at #load "unix.cma" is as if the user issued #directory "+unix" and then tried the #load again.

Load_path is very low-down the dependency graph of compiler-libs, so rather than displaying alerts itself, it simply maintains a list of directories which have been automatically added. This list is queried by functions added with the other alert functions in, ahem, Location.

The initialisation code is a duplicated between the debugger, driver and toplevel. It's not huge, but I'm not sure exactly where to put a function to do that - how averse are we to adding modules to ocamlcommon?

Another option might be to install the files in both places, but this isn't a silver bullet either - including the new directory can cause warnings in some tools (ocamlfind doesn't like finding .cmi files in multiple directories, for one) and it seems to be harder to record the need to display an alert without altering main load path data structure, which is a much more intimidating prospect than altering an error path.

Very similar work was previously attempted and abandoned in #1569. This work differs both in implementation and timing in a few ways which hopefully means we can reach a consensus on merging it this time. The key differences:

  • OCaml 5.0 already breaks quite a few older packages, which means that this change is not as sensitive as it was in 2018.
  • We have superior testing systems in place to assess the impact of this change. For OCaml 5, accepting that both ocamlfind and dune require patching, only one other package is broken in the whole of opam-repository by this PR, and that's an old release of that particular package, the current release using Dune and being unaffected.
  • No changes are made to compiler-libs.
  • Rather than implementing new compiler flags, this PR displays an alert

I have tested this change quite extensively on 4.14 - in many cases, the packages affected are already lying to their build systems. Dune and ocamlbuild users, for example, should not use Unix without having unix in their libraries stanza or use_unix in their _tags file. Even packages building directly with ocamlfind should include -package unix in the command-line. These packages can be fixed regardless of whether this PR is merged. Given a flurry of PRs now with this change (these packages can merge these updates regardless of whether this PR is accepted), I imagine that by OCaml 5.2, the number of packages still displaying this alert will be close to zero and the entire mechanism can then be reverted - at which point this change lives entirely in the build system.

There are other packages which contain commands of the form ocaml unix.cma which want to become ocaml -I +unix unix.cma - PRs are opened for these as well.

There are three possible pieces of future work which I haven't done yet, mainly because they are more philosophical (separating the otherlibs out mitigates actual issues):

  1. C stubs for otherlibs are still installed to $LIBDIR (this is already the case for systhreads, so this is partially done to stay consistent). This could be trivially fixed for all these, but it could be viewed as separate. Note that at present we're saying that -L /usr/lib/ocaml is all you need to give to link in C. I can't remember the reference, but there's at least one application in the wild which links manually with our C stub libraries, because I remember dealing with a bug report relating to it...
  2. std_exit.cmo and std_exit.cmx remain, despite the fact these are really part of the compiler, and not the Standard Library. The compilers impose a requirement that a standard library replacement implements a function called do_at_exit, but the actual implementation of std_exit is part of the compiler, not the stdlib. Indeed, in many ways there's no need for a module called Std_exit at all, we could generate a temporary file.
  3. The runtime and stdlib still share the same directory. This is also a more principled matter, than actually causing issue: there's an open question for whether the runtime should go in +runtime or the stdlib in +stdlib. FWIW, I would advocate the latter - lib/ocaml now reflects the files of the OCaml runtime and compiler with the sub-directories all being libraries. But that can readily be for another day and another release...

This was referenced Apr 16, 2022
@dra27
Copy link
Member Author

dra27 commented May 24, 2022

Thank you everyone for the reviewing time and feedback on this - it's a nice change to have got over the line for 5.0!

gretay-js added a commit to gretay-js/flambda-backend that referenced this pull request Jan 31, 2023
gretay-js added a commit to gretay-js/flambda-backend that referenced this pull request Jan 31, 2023
gretay-js added a commit to ocaml-flambda/flambda-backend that referenced this pull request Feb 1, 2023
gretay-js added a commit to ocaml-flambda/flambda-backend that referenced this pull request Feb 1, 2023
ccasin pushed a commit to ccasin/ocaml-jst that referenced this pull request Feb 10, 2023
…b-directories (port part of ocaml/PR11198) (#1094)

Port part of ocaml/ocaml#11198
@gasche
Copy link
Member

gasche commented Feb 23, 2023

I just observed the following interaction in the OCaml toplevel which I think is a bit confusing for non-experts.

$ ocaml
OCaml version 5.0.0
Enter #help;; for help.

# Unix.sleep 1;;
Alert ocaml_deprecated_auto_include: 
OCaml's lib directory layout changed in 5.0. The unix subdirectory has been
automatically added to the search path, but you should add -I +unix to the
command-line to silence this alert (e.g. by adding unix to the list of
libraries in your dune file, or adding use_unix to your _tags file for
ocamlbuild, or using -package unix for ocamlfind).
Error: Reference to undefined global `Unix'

Is there something easy we can do to give better advice here?

Note: I you start with #load "unix.cma";; you get this alert as well, the "right" way to do things in the toplevel now is #directory "+unix" then #load "unix.cma".

@Octachron
Copy link
Member

The easy way in the toplevel is rather #require "unix";;. We could tune the alert message in the toplevel to point in this direction.

@dra27
Copy link
Member Author

dra27 commented Feb 23, 2023

I'm not sure we should have the compiler recommend a command which requires findlib, though? In this particular case, I think what might be wanted is not to display the alert?

@dra27
Copy link
Member Author

dra27 commented Feb 23, 2023

I've not double-checked, but I think we'd be able to defer displaying the alert for long enough in the toplevel that we'd have the undefined global case already. The alert does want to be displayed if you do something like type t = Unix.file_descr, of course.

@Octachron
Copy link
Member

I think that having an alert in itself is fine? The toplevel user that uses low-level (aka not-library-aware) commands ought to first add the unix directory to its path and then load the library cmxa.

@dra27
Copy link
Member Author

dra27 commented Feb 23, 2023

My thinking is that the code is already an error in 4.x, the alert is meant to warn that accidental use of the types might become an error in future?

@dra27
Copy link
Member Author

dra27 commented Feb 23, 2023

i.e. that the alert is displayed only in cases where OCaml 4.x would not have displayed anything, because it silently worked (because unix.cmi was in the stdlib directory)

@Octachron
Copy link
Member

Unix.sleep failing on an undefined global rather than an unbound identifier is in itself an observable effect of the special casing of Unix.

I think that having a strange and unexplained difference between Mylib.f and Unix.f is already worth an explanation rather than trying to hide what is happening because the code is ultimately failing.

@dra27
Copy link
Member Author

dra27 commented Feb 23, 2023

Oops, yes, I follow your point now. It feels bad to me for us to recommend a directive when it isn't present (I'm sure there are still plenty of first-time users who run the ocaml command before discovering either utop or #use "topfind"). However, the toplevel does know the directives which have been added - what you would both think to the alert recommending #require if it detects that a "require" directive has been added? It's mildly heretical, but in practice topfind and utop mean that #require in effect has only one meaning? Otherwise, it should display the #directory and #use runes.

@dbuenzli
Copy link
Contributor

As we see sometimes trying to be smart makes things more confusing. Why not simply let it go without crazy special casing ? There's release notes to document these changes.

@nojb
Copy link
Contributor

nojb commented Feb 23, 2023

As we see sometimes trying to be smart makes things more confusing.

Just my 2c: personally I am also of the opinion that the "automatic" include logic is not worth the hassle.

@dra27
Copy link
Member Author

dra27 commented Feb 23, 2023

That's be the target, yes - but I'm not sure we can get away with completing a deprecation made in 5.0 in 5.1? I was intending - as with all new deprecations - to check when 5.1 is branched how many packages would still fail if this were made a hard error (and do another round of PRs, like the list above).

@gasche
Copy link
Member

gasche commented Feb 23, 2023

My thinking:

  • I agree with @dbuenzli that we don't want to add more complexity here.
  • What is the behavior when using utop or down/topfind? If users there are not affected, maybe doing nothing is an option. (There are other ways raw ocaml is unpleasant.)

@dra27
Copy link
Member Author

dra27 commented Feb 23, 2023

utop has the correct behaviour (Unix is not a special case in any way):

utop # Unix.sleep 1;;
Error: Unbound module Unix
Hint: Did you mean Unit?

It was able to do that for 5.0 because it's not used for executing scripts in the same way as ocaml. We could for 5.1 remove the automatic inclusion completely for ocaml as well (and then do that for the compilers in a further release or so's time)?

@gasche
Copy link
Member

gasche commented Feb 23, 2023

We could for 5.1 remove the automatic inclusion completely for ocaml as well (and then do that for the compilers in a further release or so's time)?

This sounds like a good idea to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants