Towards an `OCAMLPATH` #8898

Open
dbuenzli opened this issue Aug 27, 2019 · 10 comments

@dbuenzli
Contributor

commented Aug 27, 2019

Most programming languages have a notion of include path that lets you specify relative file lookups in an ordered sequence of absolute install directories. For example, there are LIBRARY_PATH and C_INCLUDE_PATH for C, CLASSPATH for Java, the include_path directive of php.ini files, etc.

OCaml has this, but only in a very limited form: the -I +DIR notation, which specifies the directory DIR relative to a single absolute directory (ocamlc -where), which can itself be set via the CAMLLIB or OCAMLLIB environment variable.

I would like to propose extending the semantics of this notation, in a backward-compatible way, to support lookups in a proper OCAMLPATH variable. That is:

  1. If OCAMLPATH is undefined, nothing changes.
  2. If OCAMLPATH=D1:D2..., the notation -I +DIR tells the compiler to look up objects in D1/DIR, then, if not found there, in D2/DIR, etc.
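A minimal POSIX shell sketch of the two rules above, for illustration only. The helper `ocaml_plus_lookup DIR FILE` is hypothetical, and so is the `STDLIB_DIR` variable, which stands in for the output of `ocamlc -where` so that the sketch runs without a compiler; a real implementation would live in the compiler's own load-path code.

```shell
# ocaml_plus_lookup DIR FILE
# Prints the first existing D/DIR/FILE, trying the entries D of OCAMLPATH
# from left to right. If OCAMLPATH is unset, only STDLIB_DIR (hypothetical
# stand-in for $(ocamlc -where)) is tried, i.e. today's "-I +DIR" behaviour.
# The body runs in a subshell so that changing IFS and disabling globbing
# do not leak to the caller.
ocaml_plus_lookup() (
  if [ -z "${OCAMLPATH+set}" ]; then
    search=${STDLIB_DIR-}
  else
    search=$OCAMLPATH
  fi
  set -f                       # treat entries as plain paths, not patterns
  IFS=:
  for d in $search; do
    [ -n "$d" ] || continue    # skip empty entries, e.g. in "A::B"
    if [ -e "$d/$1/$2" ]; then
      printf '%s\n' "$d/$1/$2"
      exit 0
    fi
  done
  exit 1                       # not found in any entry
)
```

For example, with OCAMLPATH=/a:/b, `ocaml_plus_lookup zarith zarith.cmi` prints /a/zarith/zarith.cmi if that file exists and otherwise tries /b/zarith/zarith.cmi.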

This would be a first step towards eventually providing better library lookup in the compiler. For the eco-system, it formalizes a general file lookup procedure that build systems and infrastructure tools can follow to look up compilation objects and attribute short root names to them, in the form of these +DIR relative file paths (which you can see as "package" names, see below).

The odig and omod tools have been exploring the idea of moving towards suppressing the ocamlfind indirection in the eco-system, something upstream seems not to be entirely cold about.

However, at the moment, dealing with opam system compilers and/or mixed opam and system OCaml package installs is messy and done in an entirely ad hoc fashion by these tools (see the first two points here). I think it would be beneficial for the eco-system if upstream helped formalize this. Doing so should have no impact on the ocamlfind system itself, which relies on META files to do its own lookups.

The ideas behind this move are the following:

  1. We want to be able to use the file system to determine root (or "package") names and the files that are associated with them. A simple convention for this is, given an installation directory $LIBDIR, to attribute the files in $LIBDIR/PKG to the package name PKG (and those of $LIBDIR/PKG/SUB to the names PKG/SUB or PKG.SUB, etc.). This translates directly into the existing -I +PKG and -I +PKG/SUB.
  2. We want to support multiple package install sources, that is, lookup in more than one $LIBDIR; hence we need a notion of left-leaning package PATH for lookup: this is OCAMLPATH.
  3. We want this to have as little impact on, and need as few changes to, the existing world as possible. This proposal disturbs nothing. The ocamlfind world continues to exist untouched. Users of the -I +DIR notation are not affected as long as they either don't define OCAMLPATH or define it with $(ocamlc -where) as its first entry. Finally, it works almost entirely in the world as it exists (opam installs, ocamlc -where installs), except for one package: ocaml itself (see below).
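Under these conventions, the set of root names is just a directory listing. A minimal shell sketch, assuming OCAMLPATH is set and using a hypothetical `ocaml_list_roots` helper: it prints each root name with the directory it resolves to, and a name present in several entries is reported only for the leftmost one, which is what "left-leaning" means here. Subpackages (PKG/SUB) are left out for brevity.

```shell
# ocaml_list_roots
# Prints "NAME<TAB>DIR" for every root ("package") name visible through
# OCAMLPATH, i.e. every subdirectory of every entry. A name shadowed by an
# earlier entry is not printed again. Runs in a subshell so IFS stays local.
ocaml_list_roots() (
  seen=' '
  IFS=:
  for d in ${OCAMLPATH-}; do
    [ -n "$d" ] && [ -d "$d" ] || continue
    for sub in "$d"/*/; do
      [ -d "$sub" ] || continue        # no match: the pattern stayed literal
      name=$(basename "$sub")
      case $seen in
        *" $name "*) ;;                # shadowed by an earlier entry
        *) seen="$seen$name "
           printf '%s\t%s\n' "$name" "${sub%/}" ;;
      esac
    done
  done
)
```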

The idea is that in opam switches OCAMLPATH will be set to $(ocamlc -where):$(opam var lib). And if you have other packages installed somewhere else by another entity, you can simply add them at the front or the end of OCAMLPATH as you see fit.

Note that none of this touches the problem of looking up objects/libraries and their dependencies, and it in no way constitutes an ocamlfind replacement: it just lays the groundwork for defining root names and subnames based on directories existing on the file system.

This is mostly similar to the OCAMLNAMESPACES environment variable used by @lpw25's namespace proposal. One difference is that with OCAMLPATH you can't bind toplevel root (or "package") names to directories directly: the root names are defined by the names of the toplevel subdirectories of the directories in OCAMLPATH.

I argue that the proposed scheme is no less powerful than the one @lpw25 proposes: in practice you can always rename by creating a directory with the appropriate symlinks and putting it in front of OCAMLPATH. It is also easier for end users and other consumers of the variable to understand the naming structure: an ls of the OCAMLPATH directories will do, and a stat in each of these directories can be used to test for the existence of a given name.
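Both points can be sketched in a few lines of shell. The helper names `ocaml_has_name` and `ocaml_alias` are hypothetical: the first performs one directory test per OCAMLPATH entry, the second creates the kind of symlink used for renaming. Since `[ -d ]` follows symlinks, an alias is found like any other directory.

```shell
# ocaml_has_name NAME
# Succeeds if NAME (e.g. "zarith" or "zarith/sub") is a directory in at
# least one OCAMLPATH entry: one existence test per entry.
ocaml_has_name() (
  set -f
  IFS=:
  for d in ${OCAMLPATH-}; do
    [ -n "$d" ] && [ -d "$d/$1" ] && exit 0
  done
  exit 1
)

# ocaml_alias ALIAS_DIR NAME TARGET
# Creates the symlink ALIAS_DIR/NAME -> TARGET. ALIAS_DIR must then be put
# in front of OCAMLPATH for NAME to resolve to TARGET.
ocaml_alias() {
  mkdir -p "$1" && ln -s "$3" "$1/$2"
}
```

For instance, `ocaml_alias ~/ocaml-renames gmp "$(opam var lib)/zarith"` followed by `OCAMLPATH="$HOME/ocaml-renames:$OCAMLPATH"` would make the zarith install directory visible under the (made-up) name +gmp as well.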

One issue this proposal needs to address is that, in practice, ocamlc -where is what should be added to OCAMLPATH, since that is, e.g., where Debian installs its packages. However, we also want ocaml to be the root name for the stdlib and compiler-libs "subpackages". Unfortunately, ocaml installs the compilation objects of these directly in ocamlc -where. Fundamentally, we would like them to be installed in $(ocamlc -where)/ocaml, so that with OCAMLPATH set to $(ocamlc -where) the upstream OCaml libraries are seen in the ocaml "package". So maybe a configure option should be added to allow this install structure, which both system packagers and opam should eventually enable.

Note that, except for the "where is the ocaml package" problem, the new -I +dir semantics doesn't even have to be implemented to be useful for the eco-system: the tools I mentioned perform the lookups by themselves. I just think it's good if we can agree on this convention, so that tools that need to look up OCaml installs can do so in a similar and principled manner, using the same file-system-derived names and an environment variable (OCAMLPATH) blessed by upstream.

@gasche

Member

commented Aug 30, 2019

The idea of installing the standard library in $(ocamlc -where)/ocaml strikes me as odd; I would find it more natural to define OCAMLPATH as $(ocamlc -where)/.., but I don't know what the backward-compatibility implications of this approach would be. More explanations below.

The command ocamlc -where is specified as returning the location of the standard library; in practice it is more "where the compiler distribution installs all its library stuff".

Currently -I +foo is equivalent to something like -I $(ocamlc -where)/foo. All uses of this feature that I remember seeing in the past were of the specific form +camlp4/...: they were used in legacy build systems to access the camlp4 .cmo files (before we moved camlp4 into a separate package, after which, I think, this eventually stopped working). There could remain uses in the wild to access other subdirectories of $(ocamlc -where), which currently are ocamldoc, compiler-libs, caml (contains C headers only), stublibs and {vm,}threads; but generally people favor using a "package name" instead, typically through ocamlfind, to be robust to subdirectories moving to a different place.

The core of your proposal is to eventually be able to write -I +foo to access the package/library foo, wherever we have currently set things up to install ocaml libraries. So for example I could do -I +bigarray and -I +zarith and -I +sexplib, and those would be equivalent to ocamlfind query -i-format {bigarray,zarith,sexplib}. That makes sense and it sounds like a reasonable thing to have.

You propose to keep + with its current semantics, that is $(ocamlc -where), for backward-compatibility, but have a configuration option to eventually install compiler-distribution libraries into a new $(ocamlc -where)/ocaml subdirectory. Personally I find this a bit backwards:

  • ocamlc -where was always specified as the stdlib location, so moving the stdlib to a different place is odd
  • more importantly, on opam setups (which are the common case today), $(ocamlc -where) is the same as $(opam var lib)/ocaml, a subdirectory of $(opam var lib) dedicated to compiler-distribution libraries. Moving things to $(opam var lib)/ocaml/ocaml seems fairly strange and somewhat boilerplate-y to me.

In terms of "nice final state", I feel it would be nicer to interpret + as $(opam var lib)/ in the common case (that is, have opam set OCAMLPATH=$(opam var lib)), and fix legacy build systems that still use + to use +ocaml/ instead (or a more robust current solution such as ocamlfind).

@dra27

Contributor

commented Aug 30, 2019

One thing that has come up while working through the problems with CAML_LD_LIBRARY_PATH and searching for the correct runtime is that it is easy to end up calling either the compiler or the runtime with environment variables intended for a different compiler/runtime, and it would be good to consider that for any new environment variable. Two (vague, at this stage) thoughts on that:

  • in the test for +DIR, ignore a candidate directory if the OCaml object files in its DIR subdirectory are for a different compiler (which would limit OCAMLPATH inheriting the problems OCAMLLIB has)
  • allow the use of "dynamic" paths in OCAMLPATH, and also in the "default" value (i.e. allow the default value of OCAMLPATH to be configurable). So rather than evaluating $(opam var lib) in the shell to set OCAMLPATH, have the compiler invoke it (somewhat like opam's probes for things like sys-ocaml-version). At first glance, I like this less, since it conflicts with being able to relocate the compiler: it's tempting then to build a compiler which knows to default OCAMLPATH to opam var lib --switch=where-i-built-it, and that prevents that compiler from being cloned.

@dbuenzli

Contributor Author

commented Aug 30, 2019

@gasche Thanks for your comments.

The idea of installing the standard library in $(ocamlc -where)/ocaml strikes me as odd; I would find it more natural to define OCAMLPATH as $(ocamlc -where)/..;

I agree this is not a good idea and should not be done, because it doesn't bring us to a "nice final state". If we do this, there's an unwanted effect due to include paths that are prefixes of each other: in an opam non-system switch with OCAMLPATH set to $(ocamlc -where):$(opam var lib), the packages that install in ocamlc -where show up under two different names, +PKG and +ocaml/PKG. It's not good if the "nice final state" has this property.
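The double naming can be made concrete with a small shell sketch. The hypothetical helper `ocaml_names_of DIR` prints every +NAME under which an absolute directory is reachable through OCAMLPATH; it ignores shadowing and assumes paths without trailing slashes or symlinks. /sw stands for a hypothetical opam switch prefix, so that $(ocamlc -where) is /sw/lib/ocaml and $(opam var lib) is /sw/lib.

```shell
# ocaml_names_of DIR
# Prints "+NAME" for every OCAMLPATH entry D such that DIR is D/NAME.
# Shadowing between entries is deliberately ignored.
ocaml_names_of() (
  set -f
  IFS=:
  for d in ${OCAMLPATH-}; do
    [ -n "$d" ] || continue
    case $1 in
      "$d"/*) printf '+%s\n' "${1#"$d"/}" ;;   # strip the "D/" prefix
    esac
  done
)
```

With OCAMLPATH=/sw/lib/ocaml:/sw/lib, `ocaml_names_of /sw/lib/ocaml/threads` prints both +threads and +ocaml/threads, whereas a package installed in /sw/lib/zarith only gets the name +zarith.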

However, the problem we have is the following: in the current world, system packagers (e.g. Debian) and some opam packages (IIRC mostly stuff built with ocp-build; these were ocp-* directories) do treat $(ocamlc -where) as their LIBDIR for installing their OCaml compilation objects.

So if we set OCAMLPATH to $(ocamlc -where)/.. in the current world, as you suggest, then these packages get named +ocaml/PKG, whereas we want them to be named +PKG; the idea of an OCAMLPATH is precisely to have uniform names regardless of your install structure. Solutions for the current world are:

  1. On opam system switches set OCAMLPATH to $(ocamlc -where):$(ocamlc -where)/..:$(opam var lib)
  2. On opam non-system switches set OCAMLPATH to $(ocamlc -where):$(opam var lib)
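For concreteness, the two settings above as a hypothetical shell helper. The outputs of ocamlc -where and opam var lib are passed as arguments rather than computed, so the sketch does not need opam; in a switch one might call it as `OCAMLPATH=$(opam_switch_ocamlpath non-system "$(ocamlc -where)" "$(opam var lib)")`.

```shell
# opam_switch_ocamlpath KIND WHERE LIB
# Prints the OCAMLPATH suggested for an opam switch, where WHERE is the
# output of `ocamlc -where`, LIB the output of `opam var lib`, and KIND is
# "system" or "non-system".
opam_switch_ocamlpath() {
  case $1 in
    system)     printf '%s:%s/..:%s\n' "$2" "$2" "$3" ;;
    non-system) printf '%s:%s\n' "$2" "$3" ;;
    *)          echo "unknown switch kind: $1" >&2; return 1 ;;
  esac
}
```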

Both settings have the problem that some packages show up under different names. This is not an issue if you are specifying name constraints or lookups, but it is one if you are trying to list things (list all installed packages and subpackages) or attribute files to package names (e.g. for generating documentation).

So, to sum up, maybe the best is simply to:

  1. Implement the suggested OCAMLPATH lookup semantics for +.
  2. Ask the remaining opam packages that do so not to treat ocamlc -where as their LIBDIR; they should install to $(opam var lib)/PKG.
  3. Ask system packagers to no longer treat ocamlc -where as a LIBDIR when they start supporting compilers with OCAMLPATH implemented. They should use either $(ocamlc -where)/.. or, if that's not possible (most likely for system packagers), another dedicated directory, and set OCAMLPATH so that nothing breaks for their users who use +.

In the Debian case, this would entail, e.g., installing OCaml packages in a /usr/lib/ocaml-pkgs prefix rather than in /usr/lib/ocaml as is currently the case, and setting OCAMLPATH accordingly.

What do people like @glondu or @rwmjones think about all this?

@dbuenzli

Contributor Author

commented Aug 30, 2019

@dra27 I'd rather keep the semantics of OCAMLPATH simple and avoid trying to do "smart" things.

EDIT: First, because "smart" things tend to make things harder to understand when they go wrong. Second, because the lookup procedure should be easy to implement in other tools that are not part of upstream.

@rwmjones

Contributor

commented Aug 30, 2019

In the debian case this would entail e.g. installing ocaml packages in an /usr/lib/ocaml-pkgs prefix rather than in /usr/lib/ocaml as is currently the case and set OCAMLPATH accordingly.

What's the reason to change the directory name?

Or are you proposing that the ocaml compiler continues to use ${libdir}/ocaml while extra packages use ${libdir}/ocaml-pkgs? This is churn which doesn't make any real sense for Debian and Fedora where the packaging system can easily resolve who owns what file.

I would also note, incidentally, that Perl and Python have the concept of vendor and site install directories. I've never found this anything other than confusing, however, and it's also largely irrelevant when you're using a proper package manager.

@dbuenzli

Contributor Author

commented Aug 30, 2019

@rwmjones in short, you are free to do what you want; the only thing we don't want is to have the OCaml libraries in, say, ${libdir}/ocaml and the compilation objects of a package PKG in ${libdir}/ocaml/PKG.

Why? Because we would like to be able to get to a system where OCaml packages and subpackages can be specified via relative paths (e.g. +PKG/SUB) that are made absolute with respect to the directories of an OCAMLPATH variable.

With the current install structures, either we set OCAMLPATH to ${libdir}, but then package names show up as +ocaml/PKG (undesirable), or we set OCAMLPATH to ${libdir}/ocaml, but then we cannot specify the +ocaml package (also undesirable).

@lpw25

Contributor

commented Aug 30, 2019

In my namespacing design, I proposed adding a site-packages directory under $OCAMLLIB, so that would normally be <prefix>/lib/ocaml/site-packages. Then OCaml can add its own libraries to that directory along with users' packages. Of course, I was also proposing that the directory structure in these directories would become significant, so you want a brand new place to put things to indicate that they conform to the expected structure.

@glondu

Contributor

commented Sep 2, 2019

I do not get the overall rationale of this. What is wrong with ocamlfind? It seems to already support an OCAMLPATH variable, and if this is not enough maybe it should be fixed there.

Nowadays, I consider the -I +dir feature an anomaly that should be consistently replaced with calls to ocamlfind.

Speaking with my Debian hat, I am not fond of gratuitously introducing another new directory in /usr/lib. However, I agree it would be cleaner to put stdlib in its own sub-directory of ocamlc -where (which should stay the root of packages IMHO), but that should be transparent to users.

@dbuenzli

Contributor Author

commented Sep 2, 2019

I do not get the overall rationale of this. What is wrong with ocamlfind?

Did you ever try to explain the way the OCaml eco-system is structured to a newcomer?

At the moment we have: opam package names (or system package names), ocamlfind package names, install directory names, library names and module names, and each indirection allows a different name to be chosen for a given piece of software. It would be good if we could eventually streamline the system to suppress at least one indirection and make it as obvious as possible by simply consulting the file system (which is useful for infrastructure tools like odig).

The ocamlfind names are a good candidate for suppression. ocamlfind is a (too) flexible indirection, but it turns out 99% of the packages are not making use of this flexibility and we have a one-to-one map between ocamlfind package names and library names and a one-to-one map between root ocamlfind package names and install directory names.

From that perspective, what ocamlfind really provides is only the recursive dependency lookup of libraries via its META files. But the (possibly wrong...) information you provide in these META files is in fact already written in the OCaml objects themselves (see for example the omod and odig projects for a constructive proof of that), and this lookup is something that should eventually be done by the compiler itself (as it likely should have been from the outset).

If we all try to cooperate here, we can gradually streamline the naming orgy we have at the moment with little end-user disturbance and move towards a system that is easy to explain both to newcomers and to ourselves with as little concepts and names as possible and whose structure is apparent by simply consulting the file system.

This OCAMLPATH proposal is a move towards that.

Speaking with my Debian hat, I am not fond of gratuitously introducing another new directory in /usr/lib.

As I said to @rwmjones, that's for you to choose: you could elect to have the libdir of the OCaml compiler install be /usr/lib/ocaml/ocaml, install a third-party OCaml package PKG in /usr/lib/ocaml/PKG, and set OCAMLPATH to /usr/lib/ocaml.

Again, the only things being asked here are:

  1. Have the OCaml-provided libraries installed somewhere, in a directory named ocaml.
  2. Do not install third-party libraries in the directory chosen at point 1.
  3. Set the OCAMLPATH variable accordingly.
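A minimal sketch of a layout following these three rules, built under a scratch prefix rather than /usr/lib. `make_example_layout` and `ocaml_resolve` are hypothetical helpers and the file names are placeholders; the point is only that with OCAMLPATH set to PREFIX/ocaml, +ocaml, +ocaml/compiler-libs and +zarith all resolve, with no overlap between the compiler's libraries and third-party ones.

```shell
# Hypothetical example layout following the three rules:
#   PREFIX/ocaml/ocaml     libdir of the compiler (stdlib, compiler-libs, ...)
#   PREFIX/ocaml/zarith    one directory per third-party package
# File names are placeholders; only the directory structure matters.
make_example_layout() {
  mkdir -p "$1/ocaml/ocaml/compiler-libs" "$1/ocaml/zarith" &&
  touch "$1/ocaml/ocaml/stdlib.cma" \
        "$1/ocaml/ocaml/compiler-libs/ocamlcommon.cma" \
        "$1/ocaml/zarith/zarith.cma"
}

# ocaml_resolve NAME
# Prints D/NAME for the first OCAMLPATH entry D where it is a directory.
ocaml_resolve() (
  set -f
  IFS=:
  for d in ${OCAMLPATH-}; do
    if [ -n "$d" ] && [ -d "$d/$1" ]; then
      printf '%s\n' "$d/$1"
      exit 0
    fi
  done
  exit 1
)
```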

However, I agree it would be cleaner to put stdlib in its own sub-directory of ocamlc -where (which should stay the root of packages IMHO), but that should be transparent to users.

This I will let @gasche defend more thoroughly if needed, but I agree with him that ocamlc -where should stay the location of the ocaml stdlib install and not the root of ocaml packages. Here are two reasons:

  1. Fundamentally, that's the semantics of ocamlc -where, as can be witnessed by OCAMLLIB=bla ocamlc -where (which prints bla).
  2. You don't want to break -include $(shell ocamlc -where)/Makefile.config.

@gasche

Member

commented Sep 2, 2019

In Fedora today (at least on my machine), OCaml packages live in /usr/lib64/ocaml/PKG, and the standard library is directly in /usr/lib64/ocaml. Rather than moving OCaml packages to a different place, the most natural state for me would be to have the standard library in its own subdirectory: /usr/lib64/ocaml/ocaml. This way, /usr/lib64/ocaml would have one subdirectory per installed OCaml package (including the compiler), instead of a mix of the (fairly noisy) stdlib files and user packages.

Then /usr/lib64/ocaml would be a natural choice of OCAMLPATH value according to Daniel's proposal here, and in any case its structure would mirror the organization of $(opam var lib), which I suspect could make things easier / more uniform.

Re. ocamlfind: personally I am in no hurry to replace ocamlfind, which has served me well over the years and continues to do a superb job, but I also agree with Daniel's point that there are too many different concepts in the OCaml packaging ecosystem, and I am happy that some people continue to experiment with alternative approaches to improve our tooling. If there is a plan that makes some new experiments possible and some things simpler, I think it's worth playing along.

dbuenzli added a commit to dbuenzli/ocaml that referenced this issue Sep 8, 2019
Add Config.ocamlpath, a file search path (ocaml#8898).
If the OCAMLPATH environment variable is set, this is the list of
non-empty, colon-separated paths found therein. If undefined, this is a
singleton list with the value Config.standard_library.
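The commit message above pins down how the search path is computed. Here is a shell transcription of that rule, as a sketch rather than the actual OCaml code of Config.ocamlpath; the helper name `ocamlpath_entries` is hypothetical, and the standard library directory is passed as an argument. Following a literal reading of the message, an OCAMLPATH that is set but empty yields an empty list here rather than the default.

```shell
# ocamlpath_entries STANDARD_LIBRARY
# Prints the search path one entry per line: the non-empty, colon-separated
# entries of OCAMLPATH if the variable is set, otherwise STANDARD_LIBRARY
# alone. Runs in a subshell so IFS and "set -f" do not leak to the caller.
ocamlpath_entries() (
  if [ -z "${OCAMLPATH+set}" ]; then
    printf '%s\n' "$1"
    exit 0
  fi
  set -f                     # treat entries as plain paths, not patterns
  IFS=:
  for d in $OCAMLPATH; do
    [ -n "$d" ] || continue  # drop empty entries
    printf '%s\n' "$d"
  done
)
```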
dbuenzli added a commit to dbuenzli/ocaml that referenced this issue Sep 9, 2019
dbuenzli added a commit to dbuenzli/ocaml that referenced this issue Sep 15, 2019
dbuenzli added a commit to dbuenzli/ocaml that referenced this issue Sep 15, 2019
6 participants