[RFC] project dependency in dune-project files #1498

Closed
ghost opened this issue Oct 24, 2018 · 15 comments
Labels
package management proposal RFC's that are awaiting discussion to be accepted or rejected

Comments

@ghost

ghost commented Oct 24, 2018

This ticket proposes the addition of metadata to describe project dependencies in the dune-project file. Such metadata will be ignored by dune itself and are expected to be used and possibly maintained by third-party tools such as duniverse and esy. The actual format is open to discussion. If we get something that works for both duniverse and esy we should consider that the format is good enough.

The motivations for having this information in the dune-project file are:

  • to have a generic format that can be consumed by several tools, so that a user may write such information only once and allow their project to be developed with different tools
  • for existing dune users, this means fewer configuration files to learn and maintain

Overview

The goal is to allow users to specify a list of external dune projects to consider in order to perform a build from scratch. By build from scratch we mean in an environment where only the OCaml compiler is available but no other OCaml tools are present. External dependencies coming from third party package managers such as debian, homebrew, opam, npm, ... are not considered by this proposal.

Such project dependencies are distinct from usual package dependencies as found in package managers. Each dune project may define several packages and the dependency graph between such packages may be different from the one between projects. When considering a self-contained workspace, dune should have enough information to synthesise package dependencies, taking this burden away from developers.

Metadata

The metadata consist of a list of external dune projects, each specified directly as a source code repository together with a selector. This will typically be a git URL plus a commit hash, a tag name, or a branch name. Using plain URLs is a simpler way to identify a piece of external code. In particular, it avoids the need for a central database to resolve abstract names to actual source code.

This list should only include the direct dependencies of the project, not the transitive ones. Transitive dependencies should be discovered by third-party tools such as duniverse or esy by scanning the project dependencies recursively.

Here is what such a list could typically look like:

(imports
 (https://github.com/ocaml/ocaml 4.08)
 (https://github.com/ocsigen/lwt 4)
 (https://github.com/ocaml/re 1)
)
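As a rough illustration of how a third-party tool might read this stanza, here is a minimal Python sketch. It assumes a simplified s-expression syntax with no quoted strings and no comments, which the real dune-project format does support, and the names tokenize, parse and read_imports are hypothetical helpers, not part of dune or duniverse.

```python
import re


def tokenize(text):
    # Split into parentheses and bare atoms. This simplification assumes
    # atoms (URLs, selectors) contain no spaces, quotes, or parentheses.
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens):
    # Build nested Python lists from the flat token stream.
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0]


def read_imports(text):
    """Return (url, selector) pairs; selector is None when it is omitted."""
    result = []
    for stanza in parse(tokenize(text)):
        if isinstance(stanza, list) and stanza and stanza[0] == "imports":
            for entry in stanza[1:]:
                if isinstance(entry, list) and entry:
                    selector = entry[1] if len(entry) > 1 else None
                    result.append((entry[0], selector))
    return result


EXAMPLE = """
(imports
 (https://github.com/ocaml/ocaml 4.08)
 (https://github.com/ocsigen/lwt 4)
 (https://github.com/ocaml/re 1)
)
"""
imports = read_imports(EXAMPLE)
```

Entries without a selector are kept with None so that a tool could fill the selector in later.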

Recommended usage

When considering the transitive dependencies, it is important that identical URLs have identical selectors. To allow some flexibility, it is expected that the selector matches only the major version number. When this selector points to a branch, the tip of this branch should be considered, and it is expected that developers only push backward-compatible changes to such branches. In particular, while it is allowed, using master as a selector is not recommended, given that the master branch of most projects doesn't provide a backward-compatibility guarantee.
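The recursive scan and the "identical URLs, identical selectors" rule can be sketched as follows. This is only an illustration under stated assumptions: fetch_imports is a hypothetical callback standing in for "clone the project and read its direct imports", and the example.org URL and the FAKE_PROJECTS table are made up for the demonstration.

```python
def resolve(root_imports, fetch_imports):
    """Walk direct imports recursively; fail when a URL gets two selectors.

    fetch_imports(url, selector) is a hypothetical callback returning the
    direct (url, selector) imports declared by that project.
    """
    chosen = {}
    pending = list(root_imports)
    while pending:
        url, selector = pending.pop()
        if url in chosen:
            if chosen[url] != selector:
                raise ValueError(
                    f"{url}: selector {selector!r} conflicts with {chosen[url]!r}"
                )
            continue  # already visited with the same selector
        chosen[url] = selector
        pending.extend(fetch_imports(url, selector))
    return chosen


# In-memory stand-in for cloning each repository and reading its dune-project.
FAKE_PROJECTS = {
    "https://example.org/app-lib": [("https://github.com/ocaml/re", "1")],
    "https://github.com/ocaml/re": [],
}


def fake_fetch(url, selector):
    return FAKE_PROJECTS.get(url, [])


resolved = resolve(
    [("https://example.org/app-lib", "2"), ("https://github.com/ocaml/re", "1")],
    fake_fetch,
)
```

A real tool would presumably report which projects requested the conflicting selectors rather than just raising.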

When starting a project or adding new project dependencies to an existing project, it might not be obvious to the user how to choose an appropriate set of selectors. They should be able to only write the project URL and let the tool (duniverse/esy) choose an appropriate set for them. Once the tool has made a choice, this choice should be stored in the dune-project file by adding selectors to the project URLs without one.

Support

Dune will provide a dune.project library supporting reading and writing the imports stanza in the dune-project file.

@avsm
Member

avsm commented Oct 24, 2018

This:

(imports
 (https://github.com/ocaml/ocaml 4.08)
 (https://github.com/ocsigen/lwt 4)
 (https://github.com/ocaml/re 1))

is sufficient for the duniverse to operate -- I can modify the duniverse tool metadata to use this format very quickly. Some thoughts:

  • should we import into a specific subdirectory? The git url can sometimes be duplicated in the last segment, so explicitly specifying a local subdirectory to import to would also let Dune do 'smart vendoring' in the future and be able to distinguish imported code from locally written code.
(imports
 (vendor/ocaml https://github.com/ocaml/ocaml 4.08)
 (vendor/lwt https://github.com/ocsigen/lwt 4)
 (vendor/re https://github.com/ocaml/re 1))
  • it's good practise to replace the mutable tags with sha256 references for being robust against upstream changes, but this is obviously difficult for a user to use. We could add an optional field for tools to resolve the tag specified above into a sha hash, and promote that into the source tree.
(imports
 (vendor/ocaml https://github.com/ocaml/ocaml 4.08 a24ac531b7a6f40651dcafab64ec28d3ef8e880e)
 (vendor/lwt https://github.com/ocsigen/lwt 4 9914bbde11384313f7b180e2ff466d6a08cb36cf)
 (vendor/re https://github.com/ocaml/re 1 7e8ad7580a65431a0fb4cb8bac46e91ea769dcc5))

A bit ugly, but the user would never have to cut and paste the hash in directly... a tool could generate that when it does the import.

@hcarty
Member

hcarty commented Oct 24, 2018

should we import into a specific subdirectory?

It may be worth specifying only a top-level target directory and using a (potentially simplified) subset of the URL as a subdirectory name:

(imports
 (path vendor/)
 (https://github.com/ocaml/ocaml 4.08)
 (https://github.com/ocsigen/lwt 4)
 (https://github.com/ocaml/re 1))

which could map to vendor/github.com/ocaml/ocaml, vendor/github.com/ocsigen/lwt, vendor/github.com/ocaml/re in the local source tree. This would eliminate some duplication in the project configuration and makes the mapping from import to local source tree location consistent.
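One possible way to derive the local directory from the URL is sketched below in Python. The function name vendor_path and the handling of a trailing ".git" are assumptions made for illustration; the thread does not fix the exact mapping rules.

```python
import posixpath
from urllib.parse import urlparse


def vendor_path(base, url):
    """Map a repository URL to <base>/<host>/<owner>/<repo>."""
    parts = urlparse(url)
    repo = parts.path.strip("/")
    # Assumption: drop a trailing ".git" so both URL spellings map to one path.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return posixpath.join(base, parts.netloc, repo)


paths = [
    vendor_path("vendor/", url)
    for url in (
        "https://github.com/ocaml/ocaml",
        "https://github.com/ocsigen/lwt",
        "https://github.com/ocaml/re",
    )
]
```

Because the host and owner are part of the path, two repositories with the same final segment do not collide.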

@ghost
Author

ghost commented Oct 24, 2018

What's the point of specifying a vendoring directory instead of letting the duniverse tool choose one?

@avsm
Member

avsm commented Oct 24, 2018

Because then dune will unambiguously be able to know which subdirectories are vendored code and which are direct local code. This could, e.g., affect default profiles so that we filter out warnings on vendored code (which is out of the control of the local developer, and annoying to see all the time).

It would be good to be precise about the chosen directory layout, as different tools may have different resolution mechanisms for clashing urls. For example, consider https://github.com/ocaml/ocaml and https://github.com/lib-bindings/ocaml. Both could have different conventions for the subdirectory name chosen (e.g. ocaml and ocaml-2, or ocaml-ocaml and lib-bindings-ocaml). By specifying it in the file, there is no ambiguity, but at the cost of more verbosity.
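To make the clash concrete, here is a small Python sketch comparing two hypothetical naming conventions on the two URLs above. The helper names last_segment, owner_and_repo and find_clashes are illustrative only and do not describe what duniverse or any other tool actually does.

```python
from collections import defaultdict
from urllib.parse import urlparse


def last_segment(url):
    # Convention A: use only the final path segment, e.g. "ocaml".
    return urlparse(url).path.rstrip("/").split("/")[-1]


def owner_and_repo(url):
    # Convention B: join owner and repository, e.g. "lib-bindings-ocaml".
    return "-".join(urlparse(url).path.strip("/").split("/")[-2:])


def find_clashes(urls, name_of):
    """Return the directory names that more than one URL would map to."""
    by_name = defaultdict(list)
    for url in urls:
        by_name[name_of(url)].append(url)
    return {name: group for name, group in by_name.items() if len(group) > 1}


URLS = [
    "https://github.com/ocaml/ocaml",
    "https://github.com/lib-bindings/ocaml",
]
clashes_a = find_clashes(URLS, last_segment)
clashes_b = find_clashes(URLS, owner_and_repo)
```

Under convention A both URLs map to "ocaml", so a tool has to invent a tie-breaker; under convention B they stay distinct. Writing the directory in the file sidesteps the choice entirely.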

@rgrinberg
Member

it's good practise to replace the mutable tags with sha256 references for being robust against upstream changes, but this is obviously difficult for a user to use. We could add an optional field for tools to resolve the tag specified above into a sha hash, and promote that into the source tree.

As I've said in slack, this idea is outside of this feature's scope. Once duniverse creates a lock file for dune, that should be the source of truth for creating a reproducible build. Note that this is only a single lock file; it's possible that a project may have multiple lock files (for different platforms, for example), and it doesn't make sense to treat this lock info as project-global.

Because then dune will unambiguously be able to know which subdirectories are vendored code and which are direct local code. This could, e.g., affect default profiles so that we filter out warnings on vendored code (which is out of the control of the local developer, and annoying to see all the time).

Note that we'd like to add a (vendored) stanza to mark subtrees as vendored. I think this would give a tool the flexibility to put the files anywhere as long as a dune file with that stanza is added.

@avsm
Member

avsm commented Oct 24, 2018

As I've said in slack, this idea is outside of this feature's scope. Once duniverse creates a lock file for dune, that should be the source of truth for creating a reproducible build.

yes, that's fine by me -- I agree with this.

@ghost
Author

ghost commented Oct 24, 2018

@andreypopp or @jordwalke could you have a look at this proposal and comment whether that's something you could use for esy?

@jordwalke
Contributor

jordwalke commented Oct 25, 2018

Such project dependencies are distinct from usual package dependencies as found in package managers. Each dune project may define several packages and the dependency graph between such packages may be different from the one between projects.

I think I see a way that new users could be confused by this proposal. Right now when people use Dune with package managers, they are already confused because there are already two levels of "groupings". First they have to declare "package dependencies", and then inside of their dune configs they create "libraries", and describe the dependencies between these libraries which may or may not span the boundaries of package manager packages. Both of those have utility although it's not obvious to everyone why.

I wonder if adding a new kind of dependency "distinct from usual package dependencies as found in package managers" will confuse people more if they already have to wrap their heads around the other two kinds of dependencies.

I initially thought your proposal was to just place opam package dependencies in a dune.project file and that might have been noisy, but it wouldn't be so bad because it would merely be redundant, not different. Will people cloning / contributing to projects still encounter the three kinds of "grouping" metadata? Have you considered having opam files support these "distributed dependencies" in the opam file? Then if you really like the sexpression format, you can have the opam file generated from the dune file.

If generating an opam file from a dune.project, something also needs to record the constraints somewhere. So I imagine it would end up in the dune.project file and then it just becomes a replacement for an opam file (which is cool, as I like the sexpression format better).

@jordwalke
Contributor

The goal is to allow users to specify a list of external dune projects to consider in order to perform a build from scratch. By build from scratch we mean in an environment where only the OCaml compiler is available but no other OCaml tools are present.

What if those dune projects you depend on did not also include a dune.project, but merely had their dependencies declared in an opam file? I imagine dune.project requires all your dependencies to also have dune.project if you want to be able to download all artifacts without any package manager. But then it just kind of becomes a package manager at that point right?

The motivations for having this information in the dune-project file are:

  • to have a generic format that can be consumed by several tools, so that a user may write such information only once and allow their project to be developed with different tools

The opam file format is one example where we have done this successfully with esy. esy speaks opam file format natively. opam-cli also speaks opam file format natively. I suppose we could make esy also speak dune.project without much trouble. I'm just wondering how much of this proposal is motivated by opam not supporting git dependencies natively in the opam file, and not being s-expression based. What if it did support git dependencies, and then what if you made opam-cli accept s-expression formatted opam files. Isn't that what you want?

@ghost
Author

ghost commented Oct 25, 2018

The format is not very important, although we are starting to have a multitude of formats and it's becoming hard to follow and maintain, so there is some value in regrouping all the relevant configuration in a single file. Opam files don't have enough information to set up the build of OCaml projects in general. So essentially, we do need to write something like dune files. So if we have to pick one between dune/opam/... for the configuration files, we might as well choose dune.

Currently, the grouping in dune works as follows:

  • a library is a shared collection of modules
  • a package is a shared collection of libraries, executables and other data files
  • a project is a shared collection of packages and non-shared private items

With the current proposal, the package grouping becomes less relevant; however, it is still convenient when writing integrated tests, as it is simpler to declare a dependency on a whole package rather than on individual items. That said, we should indeed allow packages to be declared by other means than by the presence of opam files, as this part is often a source of confusion.

For the rationale behind this proposal, the idea is that the easiest way to describe an external piece of code is by a URL. Once you consider a project and all the external code it depends on as given by this list of URLs, then you have everything you need to understand the whole project, in particular how to build every binary from scratch.

@andreypopp
Member

I really like the idea of using dune-project to describe project dependencies and how to acquire them! I'd like to see support for dune-project files in esy.

Comments on the proposal

I have a few comments, though.

The motivations for having this information in the dune-project file are:

  • to have a generic format that can be consumed by several tools, so that a user may write such information only once and allow their project to be developed with different tools
  • for existing dune users, this is less configuration files to learn and maintain

I think the proposal contradicts these points listed as motivational.

  • to have a generic format that can be consumed by several tools, so that a user may write such information only once and allow their project to be developed with different tools

The proposed format doesn't look generic to me. It excludes packages published on opam (and npm). As far as I can see, to use this in my project I'd need to make sure I have a clear mapping from currently used packages published on opam/npm to their corresponding source code repositories, and moreover I'd need a mapping from released versions to source code refs/branches/commits.

I like the simplicity though — one can generate a simple bash script with calls to git commands to install all the dependencies. I don't think though there's a good transitional path from the current way people develop their projects to this.
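A minimal Python sketch of that idea, generating such a shell script from (url, selector) pairs. It assumes every selector is a branch or tag name, since git clone --branch does not accept a bare commit hash, and the vendor/<last-segment> target layout and the clone_script name are choices made here for illustration only.

```python
import shlex


def clone_script(imports, base="vendor"):
    """Return a POSIX shell script that clones each import at its selector."""
    lines = ["#!/bin/sh", "set -e"]
    for url, selector in imports:
        # Assumption: the checkout directory is named after the last URL segment.
        target = f"{base}/{url.rstrip('/').split('/')[-1]}"
        lines.append(
            "git clone --depth 1 --branch "
            f"{shlex.quote(selector)} {shlex.quote(url)} {shlex.quote(target)}"
        )
    return "\n".join(lines) + "\n"


script = clone_script(
    [
        ("https://github.com/ocaml/ocaml", "4.08"),
        ("https://github.com/ocsigen/lwt", "4"),
        ("https://github.com/ocaml/re", "1"),
    ]
)
```

This only covers the direct dependencies; transitive ones would still need the recursive scan described in the proposal.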

  • for existing dune users, this is less configuration files to learn and maintain

Don't existing dune users still want to publish their dune projects on opam? With the proposed dune-project dependencies format they'd still need to maintain *.opam files by hand, as I understand it. I think it would be great if this made it possible to generate *.opam files from dune-project files.

For the rationale behind this proposal, the idea is that the easiest way to describe an external piece of code is by a URL. Once you consider a project and all the external code it depends on as given by this list of URLs, then you have everything you need to understand the whole project, in particular how to build every binary from scratch.

This may be true if you have every package you use built by dune. But this isn't the case even now when dune is seeing much adoption.

Even if the OCaml community adopts dune 100% (I hope this will happen soon!), there will still be software which won't be using dune: software written in C/C++/Rust/... and so on. To use such dependencies with the proposed scheme you'd still need to rely on opam or an OS package manager, am I right?

(In esy we try to bring such dependencies via esy — this is how it is done with pkg-config (https://github.com/esy-packages/pkg-config)).

imports vs. dependencies

nitpick: I think using (dependencies ...) will be less confusing than (imports ...).

Alternative proposal

I'd like to propose an alternative way to specify dependencies in dune-project file. The main idea is to make dependencies declaration open to interpretation.

Example:

(dependencies
  (ocaml github:ocaml/ocaml 4.08)
  (reason npm ^3.3.3)
  (lwt opam >=4.0.0 < 5.0.0)
  )

For each dependency the scheme is:

(<package-name> <package-source> <version-constraint>...)

Where <version-constraint>... format depends on <package-source>.

I think it is also useful to allow to specify a list of source/constraint pairs for any specific dependency. Example:

(dependencies
  (ocaml github:ocaml/ocaml 4.08)
  (reason (npm ^3.3.3) (github:facebook/reason v3.3.3))
  (lwt (opam >=4.0.0 < 5.0.0) (github:ocsigen/lwt v4.3.3))
  )

This will allow for different tools to choose what source to use depending on the internal heuristics and/or level of support for different package sources.

Note that Dune itself won't need to interpret either <package-source> or <version-constraint>.

Aside: it might make sense for dune to understand <package-name> though. For example, to automatically include the corresponding (same-named) ocamlfind packages in (library ...) and (executable ...) stanzas if those don't have (libraries ...) configured. This would make configuration even simpler, especially for beginners.

Example: duniverse

Given the following dune-project:

(dependencies
  (ocaml github:ocaml/ocaml 4.08)
  (reason (npm ^3.3.3) (github:facebook/reason v3.3.3))
  (lwt (opam >=4.0.0 < 5.0.0) (github:ocsigen/lwt v4.3.3))
  )

Duniverse will filter out unsupported package sources for every listed package:

(dependencies
  (ocaml github:ocaml/ocaml 4.08)
  (reason github:facebook/reason v3.3.3)
  (lwt github:ocsigen/lwt v4.3.3)
  )

If some package doesn't have a package source supported by duniverse, then duniverse will fail. I think it makes sense for duniverse to allow locally setting/overriding the package source for any of the dependencies.
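The filtering step can be sketched in Python as below. The dependencies are written directly as Python tuples rather than parsed from a dune-project file, and the rule "a tool supports exactly the github: sources" is an assumption made for the example, as are the names normalize, select and git_only.

```python
def normalize(dep):
    """Return (name, alternatives) for both the short and the list form.

    Short form: (name, source, constraint...)
    List form:  (name, (source, constraint...), (source, constraint...), ...)
    """
    name, rest = dep[0], dep[1:]
    if rest and isinstance(rest[0], tuple):
        return name, [tuple(alt) for alt in rest]
    return name, [tuple(rest)]


def select(deps, supports):
    """Keep the first alternative whose source the tool supports, else fail."""
    chosen = []
    for dep in deps:
        name, alternatives = normalize(dep)
        usable = [alt for alt in alternatives if supports(alt[0])]
        if not usable:
            raise LookupError(f"no supported package source for {name}")
        chosen.append((name, *usable[0]))
    return chosen


DEPS = [
    ("ocaml", "github:ocaml/ocaml", "4.08"),
    ("reason", ("npm", "^3.3.3"), ("github:facebook/reason", "v3.3.3")),
    ("lwt", ("opam", ">=4.0.0", "<", "5.0.0"), ("github:ocsigen/lwt", "v4.3.3")),
]


def git_only(source):
    # Hypothetical capability check for a git-based tool.
    return source.startswith("github:")


selected = select(DEPS, git_only)
```

A local override mechanism could be layered on top by letting the user supply extra alternatives before select runs.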

Example: opam

The mechanics are the same as with duniverse. opam would need to be able to read metadata from dune-project files and extract compatible sources from there.

Example: esy

Same as above.

Discussion: kinds of dependencies

In esy (and in opam) we have different kinds of dependencies - dev, test, build, doc. It might make sense to include such info in dune-project too:

(dependencies
  (lwt (opam >=4.0.0 < 5.0.0) (github:ocsigen/lwt v4.3.3))
  (reason (kind build) (npm ^3.3.3) (github:facebook/reason v3.3.3))
  (odoc (kind doc test) (opam *) (github:ocaml/odoc master))
  )
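A tool could then select dependencies by kind, roughly as in the Python sketch below. The comment does not say what an entry without a (kind ...) field means, so treating it as "needed for every kind" is purely an assumption here, and kinds_of and needed_for are hypothetical names.

```python
def kinds_of(dep):
    """Return the set of kinds declared by a dependency, or None if absent."""
    for field in dep[1:]:
        if isinstance(field, tuple) and field[0] == "kind":
            return set(field[1:])
    return None


def needed_for(deps, kind):
    """Names of the dependencies that apply to the given kind."""
    names = []
    for dep in deps:
        kinds = kinds_of(dep)
        # Assumption: no (kind ...) field means the dependency always applies.
        if kinds is None or kind in kinds:
            names.append(dep[0])
    return names


DEPS = [
    ("lwt", ("opam", ">=4.0.0", "<", "5.0.0"), ("github:ocsigen/lwt", "v4.3.3")),
    ("reason", ("kind", "build"), ("npm", "^3.3.3"), ("github:facebook/reason", "v3.3.3")),
    ("odoc", ("kind", "doc", "test"), ("opam", "*"), ("github:ocaml/odoc", "master")),
]
build_deps = needed_for(DEPS, "build")
doc_deps = needed_for(DEPS, "doc")
```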

@ghost
Author

ghost commented Oct 29, 2018

I was wondering about leaving the contents open to interpretation but I hadn't considered allowing several sources. That seems like a good idea to me, especially if opam can use this information as well. Although it's problematic for projects that define several packages, not sure how to deal with these :/

Regarding 100% adoption of Dune, that's indeed entirely up to individual developers whether they want to use Dune or not. All we can do is make Dune as good as possible. However, using Dune systematically does unlock exciting new features such as large-scale refactoring. For instance, if all OCaml projects were using Dune, then it would become possible for one person to make a breaking change in the stdlib and then upgrade the entire OCaml universe. It's nice if we can provide this feature at least for self-contained subsets of OCaml projects using Dune.

Aside: it might make sense for dune to understand <package-name> though. For example, to automatically include the corresponding (same-named) ocamlfind packages in (library ...) and (executable ...) stanzas if those don't have (libraries ...) configured. This would make configuration even simpler, especially for beginners.

That would be nice as well, but there is some work to do before this can work. It feels like we almost want to write something like this and let the system sort out what it means:

(using github.com/ocsigen/lwt v4.3.3)
(using github.com/facebook/reason v3.3.3)

@orbitz

orbitz commented Nov 27, 2018

@avsm referenced this ticket in a response to me on the OCaml mailing list. I have been doing something very similar to this for a few years with my OCaml build tools and wanted to document my experiences in the hope that they could be useful to you.

The setup I have is two tools:

  • pds - A build tool which takes a TOML file and produces a Makefile (not really important for this, but what is important is that the pds config is designed to be easily read by other languages).
  • hll - A tool that takes a pds configuration and an hll configuration and generates an opam package. The important part is that hll just deals in abstract names and depends on opam to map those to artifacts.

An example can be found here:

pds.conf defines dependencies in terms of ocamlfind packages.

hll.conf provides a way to add other dependencies to the build as well as to remap dependencies to their package name. For instance, in the example I gave, the build dependencies ppx_deriving.show and ppx_deriving.eq are remapped to ppx_deriving.
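The remapping idea can be illustrated with a short Python sketch. This does not reproduce hll's actual configuration format or implementation; the REMAP table and the opam_packages function are hypothetical and only show the mapping from ocamlfind names to opam package names.

```python
def opam_packages(findlib_names, remap):
    """Map ocamlfind package names to opam package names, without duplicates."""
    packages = []
    for name in findlib_names:
        package = remap.get(name, name)  # names without an entry map to themselves
        if package not in packages:
            packages.append(package)
    return packages


# Hypothetical remap table mirroring the example in the comment.
REMAP = {
    "ppx_deriving.show": "ppx_deriving",
    "ppx_deriving.eq": "ppx_deriving",
}
packages = opam_packages(["ppx_deriving.show", "ppx_deriving.eq", "lwt"], REMAP)
```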

I think two takeaways that relate to this proposal from my experience:

  1. I originally wanted pds to specify all its dependencies, but once I realized nobody else was going to use pds, I decided that just made a lot of work for me and punted on the issue while using opam. Eventually I decided opam is the common language across all packages and made hll to create opam packages. That way I could hook into all of the rest of the stuff generated by the OCaml community and I didn't have to try to build my own set of package definitions from scratch.
  2. Keeping the tools and config files separate made building tooling around it easier. The original problem I wanted to solve was just building, and then I got sick of hand-making opam packages. My original thought was to add that information to pds.conf, but then I thought of other types of packages I might want to generate from pds.conf, and realized the config file specification would quickly grow to be very large and largely irrelevant to most users.

My use case is different than the one described here of building from scratch. IMO, using opam as the common package definition and fleshing out ideas like opam-bundler would be higher leverage. But as always, YMMV.

@davidmlw

Version control is the job of git (submodule). I would suggest a scheme for multiple projects to work together with project dependencies.

@rgrinberg rgrinberg added proposal RFC's that are awaiting discussion to be accepted or rejected package management labels Feb 23, 2023
@rgrinberg
Member

We're working on package management now, and the points of this proposal are acknowledged. For now, to change the resolution of packages, one can configure repositories when solving.


7 participants