
[RFC] Dune Package Management #7680

Closed
tmattio opened this issue May 4, 2023 · 39 comments

@tmattio
Collaborator

tmattio commented May 4, 2023

This document serves as a Request for Comments (RFC) on adding package management support to Dune.

At present, the project is in the prototyping stage, and we are exploring different approaches. Consequently, this document is intended to provide a guideline rather than a definitive specification for the implementation of package management in Dune. As we continue to prototype features and make progress, we will update this document accordingly.

The primary aim of this document is to stimulate community feedback and invite discussions about our plans.

We will generally stay high-level and describe the user experience rather than how we intend to implement each part. The specific implementation details can be discussed in separate issues and Pull Requests.

Outlined below are the high-level principles that guide our integration of package management into Dune:

  1. Seamless integration: Package management features should be fully integrated. Users shouldn't have to read the opam manual to learn how to use package management features in Dune.
  2. Compatibility with existing features: The package management features should work seamlessly with existing Dune features, such as watch mode. These features should work in an obvious way with package management.
  3. Windows support: The first stable version of the package management system must include Windows support; it will not be considered stable until Windows is supported.
  4. Cross-compilation: Support for cross-compilation should be a first-class citizen.
  5. Build directory isolation: No state should exist outside the _build directory.
  6. Explicit state modification: Commands that modify the state implicitly, such as pinning, are forbidden.
  7. Performance: Features should be as fast as they can be.

TL;DR: Here are some elements from the RFC we'd like to highlight:

  • We will cache the build of dependencies, including the compiler, eliminating the need to recompile the compiler for every project.
  • Lockfiles will contain dependency sources and build instructions. Since dune build will use the lockfile by default, compilation will be faster, as neither the solver nor the opam repository is needed.
  • We will support cross-compilation for arbitrary opam packages, removing the need to port packages to Dune to leverage cross-compilation.
  • dune-workspace files will be used to configure compiler versions through contexts, allowing seamless testing on different compiler versions.
  • Windows support will be provided as a first-class citizen.
  • We will support local sources for dependencies, enabling concurrent work on multiple projects. Changes in local sources will be detected, and dependent projects will be automatically rebuilt.

Overview

Package management in Dune aims to improve the user experience on the OCaml Platform by addressing the friction caused by using separate opam and dune CLI tools.

From the user's perspective, downloading and building dependencies become an integral part of building a project. Dune becomes the primary tool that users install on their systems to work in OCaml. The workflow to go from a clean environment to a compiled project looks like:

$ apt install dune # or winget install dune
$ git clone git@github.com:xyz/project-a.git
$ cd project-a
$ dune build

Executing dune build will download and build the project's dependencies before proceeding with the project build.

Note: extending the dune build command to install dependencies is still under discussion within the Dune team, and we might introduce a separate command to install dependencies.

To achieve this, several steps are taken under the hood:

  1. Dune retrieves project dependencies from the dune-project file, where they are defined in the existing package stanza (see the sketch after this list).
  2. Dune reads the dune-workspace file to gather additional workspace configurations, such as the opam repositories to use and the compiler version. If no dune-workspace file is present, Dune will use sensible defaults like the ocaml/opam-repository and the latest version of the compiler.
  3. Using the dependencies and configuration, Dune generates a lock file containing specific versions of each dependency, their source, and build commands.
  4. Relying on the generated lockfile, Dune fetches the dependencies, stores them in the _build directory, and builds them.
  5. Once the dependencies are built, the project build proceeds as usual by compiling the project's source tree.
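
To make step 1 concrete, here is a minimal dune-project using the existing package stanza; the project name and the dependency constraints are illustrative only:

(lang dune 3.7)

(package
 (name project_a)
 (depends
  (ocaml (>= 4.14))
  (lwt (>= 5.6.0))))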

An important aspect of our current plan involves leveraging opam as a library, often referred to as opam 3. This approach allows us to re-use existing opam features that we would otherwise need to re-implement in Dune.

In this scenario, the opam client will continue to exist as a separate tool that uses the opam library. This separation ensures that the opam client remains available for non-dune users and enforces compatibility between the package management functionalities of Dune and the opam client.

It is important to note that, following Principle 1, the use of opam as a library is considered an implementation detail and will not have any impact on the user.

Fetching sources

PR: Source fetching

The source fetching in Dune will support every source specified in the opam url field, ensuring compatibility with existing packages on the opam-repository. Additionally, Dune will also support local sources, offering a workflow to work on changes spanning multiple projects.

When fetching sources, Dune will create a dedicated directory within the _build directory. This directory is intended for Dune's private use and will be hidden from the user.

Caching

RFC: Shared Build Sandboxes

Dune's package management will use a global cache for all dependencies. The caching mechanism will also work with the compiler, eliminating the need to recompile it for each project.

The shared build sandboxes will be set up in a way that they produce the same paths regardless of the workspace location. This is achieved by maintaining all sandboxes in a single directory and using a hash based on the action and its dependencies. A file lock system will be used to manage concurrent access to the sandbox during the build process.
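
For intuition, the resulting layout might look something like the following; the paths and hash are illustrative, not the actual scheme:

$XDG_CACHE_HOME/dune/sandboxes/4f9a1c.../        # one sandbox per hash(action, dependencies)
$XDG_CACHE_HOME/dune/sandboxes/4f9a1c.../.lock   # file lock guarding concurrent access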

Sandboxing

All components and state associated with a project remain confined within the project itself, with the exception of the cache. The closest comparison with opam is that we're using a local opam switch for every project.

We plan to support all the sandboxing functionality offered by opam through Bubblewrap. Moreover, we'll forbid package builds from accessing packages that aren't listed as dependencies.

All state information is contained within the _build directory, which is internal and managed exclusively by Dune.

We plan to implement caching for the compiler to prevent the need for recompiling the compiler for every project.

By default, the system compiler (opam's ocaml-system) will not be used.

It's worth noting that opam switches won't be supported by Dune; the sandboxing mechanism is specific to Dune.

State and Configuration

All components and settings are maintained within the project; nothing resides outside the workspace.

Configuration information is stored in the dune-workspace file, which includes details about the opam repository, compiler, and sources.

Dune will support multiple opam repositories within the dune-workspace.

We will also keep the notion of context in the dune-workspace, where users can define multiple contexts with different configurations. This will make testing with different versions of the compiler, or different opam repositories, extremely easy.
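
As an illustration, such a dune-workspace might look like the sketch below. The repository stanza and the ocaml_version field are hypothetical syntax; the RFC does not fix the final format:

(lang dune 3.7)

;; hypothetical syntax -- the final stanza names are not fixed by this RFC
(repository
 (name upstream)
 (url "https://github.com/ocaml/opam-repository.git"))

(context
 (default
  (name ocaml-5.0)
  (ocaml_version 5.0.0)))

(context
 (default
  (name ocaml-4.14)
  (ocaml_version 4.14.1)))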

Building opam Packages

PR: Build opam Packages

Since we must be able to build every package from the opam repository, we will implement support for building arbitrary opam packages by reading build instructions from opam files, executing them, and producing build artifacts in the _build directory.

Support for building opam packages will extend to vendored packages as well. Dune will be capable of vendoring non-dune opam packages.

In the initial version, for opam packages that already use Dune, the build process will remain the same as for other opam packages: Dune will be installed as a dependency, and the installed Dune will be used to build them.

To keep things simple for the initial version, this workflow will be used universally, with the possibility of optimizing and using fully composed rules in future versions.

Cross-compilation

Dune will support cross-compilation for all opam packages, eliminating the need to port packages to Dune to leverage cross-compilation.

Cross-compilation support is possible under the condition that code generators are located in a separate package from their users. Specifically, a package cannot build a code generator and use it simultaneously.

There are still unanswered questions, like how to support packages that already support cross-compilation without Dune, such as the low-level Mirage packages (mirage-*-solo5, mirage-*-xen, etc.), or packages that install cross-compilers such as ocaml-freestanding. We aim to maintain compatibility and work with these existing cross-compilation configurations and we'll explore solutions for these as we prototype building opam packages.

Windows support

Following Principle 3, we will not accept features or changes that would cause issues on Windows. Support for Windows will be first-class from the first stable version onward. We will be uncompromising on this.

Lockfiles

Dune will produce a lockfile when building projects. The lockfile will contain a stanza listing all dependencies, their sources, and the required build actions. When a lockfile is present, dune build will use it by default.

An important design decision for the lockfile is that it contains both sources and build commands, which means that reading from the opam-repository is not required when a lockfile is available. As a result, when a lockfile is available, we're removing two steps that are currently needed when installing dependencies with opam: (1) running the solver and (2) reading from the opam-repository to get the build instructions and source. This will make compilation of projects significantly faster, essentially consisting of downloading the sources of the dependencies and building them.

The generation of the lockfile isn't part of the build; Dune will provide a separate command (e.g. dune lock) to generate the lockfile.
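
For illustration, a lockfile entry combining a pinned version, its source, and its build instructions might look like the sketch below; the field names, URL, and checksum are hypothetical, not a committed format:

;; hypothetical lock entry, for illustration only
(package
 (name lwt)
 (version 5.6.1)
 (source
  (fetch
   (url "https://github.com/ocsigen/lwt/archive/5.6.1.tar.gz")
   (checksum "sha256=...")))
 (deps cppo ocplib-endian)
 (build
  (run dune build -p lwt @install)))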

Watch Mode

Following Principle 2., package management will integrate well with watch mode and will work in the way that users would expect (i.e. in an obvious way).

Dune will monitor the dune-project and dune-workspace files as well as the lock file, and will automatically update the dependencies on changes. From a user's point of view, this means that you won't need to leave the editor when running Dune in watch mode. Every change to the workspace can be performed by updating the source files.

System Dependencies

Dune will not automatically install missing system dependencies (aka depext). Instead, it will provide a user interface to inform users about missing dependencies and offer hints on how to install them. Building on top of opam's depext system, Dune can include build rules that check for the availability of required system packages and provide custom error messages when these rules do not succeed.
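
As a rough sketch of what such a check could look like as a build rule (the alias name, the pkg-config probe, and the hint text are all assumptions, not a committed design):

;; hypothetical generated rule probing for a system package and printing
;; an installation hint on failure
(rule
 (alias check-depexts)
 (action
  (system
   "pkg-config --exists openssl || { echo 'missing system package openssl: try `apt install libssl-dev`'; exit 1; }")))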

To facilitate the installation of system dependencies, we will explore the possibility of introducing an explicit command that handles the installation of system dependencies. This command would leverage the functionality of opam-depext to test the presence or absence of such dependencies and install them if needed.

Vendoring and Pinning

In version 1, there will be no specific vendoring workflow beyond the existing vendoring features already supported by Dune.

However, an alternative workflow will be provided by supporting the local filesystem as a source for dependencies. This can be configured in the dune-workspace. With this approach, Dune will be able to monitor changes in the dependencies' sources and rebuild the project as needed. This effectively supports working on multiple projects at the same time.

This also offers an alternative workflow to opam pins. It's worth noting that opam pins are not supported by the opam repository; therefore, this configuration is specific to the workspace (which is why it belongs to dune-workspace and not dune-project), and it will not be used when releasing packages.

This workflow enables substituting custom sources for any package listed in the lock file, not only the leaves. Sources for packages are typically defined by actions that fetch URLs; however, with this approach, the source of a package can be populated from a directory in the workspace. To implement this feature, the corresponding directory should be excluded from the normal build process by using data_only_dirs. This approach addresses the pain point of having to vendor all the transitive reverse dependencies of any deep dependency that needs modification, allowing for more flexibility in vendoring specific parts of the package graph.
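
Concretely, this could look like the sketch below. data_only_dirs is an existing Dune stanza; the package_source stanza with a local path is hypothetical syntax for the mechanism described above:

;; in the dune file at the workspace root: keep the vendored tree out of
;; the normal build
(data_only_dirs vendor)

;; in dune-workspace: hypothetical syntax substituting a local directory
;; for the fetched source of a locked package
(package_source
 (name dream)
 (path vendor/dream))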

Solver

In the initial version of the integrated package management, Dune will use 0install as its solver.

0install will receive its input from the dune-project package field and the dune-workspace configuration.

Users will have the option to point to an alternative opam-repository, and even to use multiple repositories, in their dune-workspace configuration.

From experience with opam-monorepo, which uses 0install, we expect that one challenge will be the clarity of error messages. While we recognise that producing user-friendly error messages from solvers is generally a hard problem, we will work upstream when appropriate to improve their quality.

@tjdevries

There are a few commands that I've found really make the package management feel very "built in" when using cargo for Rust projects. The main ones are things like:

  • cargo add project[@version] - Add a dependency; by default, adds the current version as the required and exact version. Updates all assorted project and lock files
  • cargo rm project - Remove a dependency; also updates all assorted project and lock files

Not sure if these are planned, but I would love to see dune add X and dune rm Y, which would make it very easy for people (and tutorials!) to configure projects without having to manually edit files, at least at first. I also like not having to edit the files later if it's easy enough to run the command (or have my tooling do it for me 😉 dune add Base is a lot easier than going and editing the file by hand).

One other one that I've found quite useful is cargo tree found here: https://docs.rs/crate/cargo-tree/latest but that's definitely more of a "nice-to-have" kind of utility.

@rgrinberg
Member

Thanks for the interest. I'm not sure how cargo works, but in OCaml we can have more than one package per project. So it's not clear to me how dune add foo[@version] would know which package in the project should take foo as a dependency.

Moreover, even if we add the package as a dependency, it would still not be usable, since you would need to add the appropriate library dependency to your executable or library (due to legacy reasons, we have two totally separate library and package namespaces).

Adding something like cargo tree would be reasonable. In fact, we have something that's mostly machine readable already with dune describe workspace. Perhaps we could work on improving that.

@dangdennis

What if you can perform “dune add” when you’re executing it in a particular package’s directory? If the dune file correlates to a package defined in dune-project, then dune can also add the newly installed dependency there too?

Only prodding the idea, not pushing for it.

@Et7f3
Contributor

Et7f3 commented May 6, 2023

There are a few commands that I've found really make the package management feel very "built in" when using cargo for Rust projects. The main ones are things like:

  • cargo add project[@version] - Add a dependency; by default, adds the current version as the required and exact version. Updates all assorted project and lock files
  • cargo rm project - Remove a dependency; also updates all assorted project and lock files

Not sure if these are planned, but I would love to see dune add X and dune rm Y, which would make it very easy for people (and tutorials!) to configure projects without having to manually edit files, at least at first. I also like not having to edit the files later if it's easy enough to run the command (or have my tooling do it for me 😉 dune add Base is a lot easier than going and editing the file by hand).

One other one that I've found quite useful is cargo tree found here: https://docs.rs/crate/cargo-tree/latest but that's definitely more of a "nice-to-have" kind of utility.

Why require dune add/rm when you can just read it from the written stanza? If I use base.something in the libraries field, I probably want the base package. I think this command should be used when we need a version range, or when we want something specific like pkgs[subfeature] as in pip. When autodetecting a package, the default range can be semver around the actual version: if the latest base is 0.15.3, the default range is >=0.15.3 <0.16.0. We should strive for published package name = public name so we have a unique correspondence. Go decided to use the repo URL as its unique name. The add/rm commands can also be used when we need to override a dependency, to add a patch for instance, or to point to another source.

The lockfile contains the top-level libraries used, so we can read the direct dependencies and track the propagated ones.

Also, please do like alpine/apk and choose del (3 letters) rather than rm (2 letters), so it is the same length as add. Don't do like apt, with install/remove but no uninstall...

@rgrinberg
Member

What if you can perform “dune add” when you’re executing it in a particular package’s directory? If the dune file correlates to a package defined in dune-project, then dune can also add the newly installed dependency there too?

Only prodding the idea, not pushing for it.

I'm lukewarm about it. The additional cd negates half the convenience, and this convention isn't obvious enough; I'm sure it will confuse at least some users.

@psafont

psafont commented May 9, 2023

Thank you so much for starting this project. I'm one of the maintainers of the OCaml daemons for the XenServer Linux distribution. This project has a convoluted set of requirements, which the current dune / opam stack can cater to, even if it has quite a bit of maintenance overhead.

From what I've read it seems that the new dune packaging is going to be much better: developers want a frictionless way to install dependencies, maintainers want to easily lock and manage dependency updates, and the distribution build system needs to be completely offline.

While the current proposal works wonders for the first two, the third one still needs some more consideration:

  • There's support for using the system's ocaml when compiling (in an opt-in way)
  • There's support for using local libraries (currently OCAMLPATH is used to use the xen-compiled ocaml libraries)
  • There's support for avoiding downloading sources and using local sources
  • What's missing from the proposal is the creation of an archive / tarball of the sources so it can later be used to build from local sources. Is there any plan for this? Could you please consider including a command that does this?

@rgrinberg
Member

There's support for using the system's ocaml when compiling (in an opt-in way)

This will be available.

There's support for using local libraries (currently OCAMLPATH is used to use the xen-compiled ocaml libraries)

Are the packages mentioned in the lock file allowed to use these libraries?

There's support for avoiding downloading sources and using local sources

For now, we're heavily relying on opam's capabilities for this. This is unlikely to change for v1.

What's missing from the proposal is the creation of an archive / tarball of the sources so it can later be used to build from local sources. Is there any plan for this? Could you please consider including a command that does this?

It seems like an interesting feature, but we're focusing on the smallest possible set of features that would make package management useful for v1. I don't think the design that we've chosen right now would prevent this feature from being added later, so for now, this feature is postponed until v1 is out.

@avsm
Member

avsm commented May 9, 2023

There's support for avoiding downloading sources and using local sources

For now, we're heavily relying on opam's capabilities for this. This is unlikely to change for v1.

The dune support is independent of the user's opam configuration, isn't it? If so, then just defining a well-known place in the repository structure where sources can be dropped by an external CI system would be sufficient. This must already exist with the build cookies in _build...

@avsm
Member

avsm commented May 9, 2023

One other one that I've found quite useful is cargo tree found here: https://docs.rs/crate/cargo-tree/latest but that's definitely more of a "nice-to-have" kind of utility.

Note that opam 2.2 will have support for this: ocaml/opam#3775, so it should become easier to pull into a future version of the dune integration as well.

@rgrinberg
Member

There's support for avoiding downloading sources and using local sources

For now, we're heavily relying on opam's capabilities for this. This is unlikely to change for v1.

The dune support is independent of the user's opam configuration, isn't it? If so, then just defining a well-known place in the repository structure where sources can be dropped by an external CI system would be sufficient. This must already exist with the build cookies in _build...

By local sources I meant setting up download mirrors, custom repositories, etc. It will definitely be possible to use sources available in the local filesystem (it's already supported in fact).

@rgrinberg added the accepted label (accepted proposals) and removed the proposal label (RFCs that are awaiting discussion to be accepted or rejected) on May 9, 2023
@psafont

psafont commented May 10, 2023

Are the packages mentioned in the lock file allowed to use these libraries?

Currently they don't need to; previously, some shady rsyncing business was in place to allow for it. We're trying to avoid the need in general, although we might be able to build the xen libraries using dune.

I don't think the design that we've chosen right now would prevent this feature from being added later, so for now, this feature is postponed until v1 is out.

That's nice to hear, thanks!

@hannesm
Member

hannesm commented May 16, 2023

Great initiative. Will this then replace opam? If I understand it correctly, the opam-repository parts (i.e. update / solver) are still done by opam, not by dune?

What I wonder about is:

Lockfiles

Dune will produce a lockfile when building projects. The lockfile will contain a stanza listing all dependencies, their sources, and the required build actions. When a lockfile is present, dune build will use it by default.
...
The generation of the lockfile isn't part of the build; Dune will provide a separate command (e.g. dune lock) to generate the lockfile.

So, what is the difference between "when building projects" and "part of the build"?

I would like to point to opam switch export --full --freeze -- an opam subcommand (since 2.1, IIRC) that was designed for reproducible builds (using the definition from https://reproducible-builds.org/ -- collect the input that allows producing bitwise-identical output; see https://github.com/roburio/orb for the full tool, which uses opam's API but not the opam binary to conduct builds in containers / jails). Rephrased, it is part of the build-info (or SBOM, if you put some JSON around it) of an OCaml project (next to host packages and environment variables). That may be of interest, though I don't quite understand what the goal of "dune lockfiles" is (neither do I understand the "opam lock" subcommand) -- but I'm sure you have a clear vision for it.

@rgrinberg
Member

Great initiative. Will this then replace opam? If I understand it correctly, the opam-repository parts (i.e. update / solver) are still done by opam, not by dune?

There's no goal to replace opam. We are focused only on implementing a project-local workflow, and we aren't interested in our own package repository. We also reuse opam's code whenever it's reasonable (such as solving, as you've alluded to).

So, what is the difference between "when building projects" and "part of the Build"?

"Not part of the build" just means it's not expressible as build rules. In particular, lock generation allows you to mutate the sources in your project directory, so we think it's best to make it its own stateful operation.

Opam's lock files share some of our goals - we both try to provide the ability to save a build plan computed at a particular point in time and share it. Our focus is to combine this with a project-local workflow, though. The level of reproducibility of the build is going to be determined by the reproducibility of the build rules themselves - as in the rest of dune.

@code-ghalib

Hello, will this be an opt-in feature, or have an opt-out facility for those that wish to keep package management and build separate?

@rgrinberg
Member

rgrinberg commented May 19, 2023

If you don't have a lock file in your workspace, everything will work exactly the same as before.

If you have a lock file, you will need to opt out explicitly.

@Lupus

Lupus commented May 21, 2023

Great proposal, many thanks for looking into this aspect of the ecosystem!
Some questions/suggestions are below.

Modern git-based package workflow

Are there any plans to improve on the workflows without an opam repository? For example, cargo allows one to easily add a dependency on some other crate by specifying a git repo url and, optionally, some ref:

[dependencies]
regex = { git = "https://github.com/rust-lang/regex.git" }

We currently maintain an internal opam repo at work because it's not feasible to build such a workflow on opam pins. Ideally, we would drop our internal opam repo (and the associated CI maintenance cost for publishing packages etc.) and just use git urls with certain version constraints, nicely specified in the dune-project file. The whole Go package ecosystem is based on this approach, and it seems to be working well.

The "v" version prefix problem

Some packages have a "v" prefix in their versions on opam, some don't, and you need to keep track of this when specifying version constraints; otherwise they won't work, and you'll learn it when your CI job fails. It would be great to address that in the new workflow.

Staged lock files

Modern CI pipelines often use a Dockerfile-based flow to build a container image. Ours, for example, uses buildkit to create production images for our OCaml services. Buildkit does layer caching and skips computing lines in the Dockerfile that it has already built previously. OCaml builds are quite slow with lots of dependencies (Jane Street libraries, for example), and effective use of caching in the Dockerfile is exceptionally important for fast CI turnarounds.

We started with opam lock files, early in the Dockerfile we copied foo.opam.locked and ran corresponding opam command to install the dependencies according to the lock file. It works fine, but as we use a lot of individual packages for various internal libraries, we tend to update those dependencies in the lock file often enough to greatly diminish buildkit layer cache benefits.

To address this problem (and the one described below; IIRC opam does not put test or doc dependencies in lock files), we've created a special locking utility that uses the opam libraries to produce switch state exports. It collects all dependencies (and their transitive dependencies) from the .opam files found in the current dir (just like opam lock does) into a set S, performs a switch export, and filters the resulting state to include only packages in S before writing it to a file called full.switch. Aside from that, it also reads a config file that lists "heavy" dependencies; those heavy dependencies (and their transitive dependencies) are collected into a set S', the same switch export is then filtered to include only packages from the set S', and the result is written into a file called base.switch.

This design works great for our use case. base.switch includes all of the Jane Street libraries and other large dependencies that we rarely update, and the Dockerfile is organized in such a way that the layer reconstructing the base opam switch goes early, while the one importing the rest of the packages from full.switch goes afterwards. An update to some of our libraries does not trigger a rebuild of the large layer with most of the dependencies. A Dockerfile example to illustrate this design:

# copy base.switch and import it into the current switch (actually creating it, as there is no switch in `.` at this point)
# copying full.switch file at this point will invalidate the layer caches, as they are hashed by all their inputs (including files being copied)
COPY --chown=opam:opam base.switch ./
RUN opam update && \
	opam switch import base.switch --switch .

# .......

# copy full.switch and import additional packages into current switch
# as long as base.switch is not changed from previous build, we get down here in no time
COPY --chown=opam:opam full.switch ./
RUN opam update && \
	opam switch import full.switch --switch .

# copy rest of the sources and build the project
COPY --chown=opam:opam . ./
RUN make fmt && make build

I believe this design is generic enough to benefit other users of Dockerfile-based CI flows by improving the layer cache hit ratio. It would be great if it could be considered for inclusion as an optional feature of the proposed dune lock files.

Optional inclusion of :dev, :with-test, :with-doc dependencies to lock file

Reproducible builds in CI include running tests, generating documentation, etc.; it would be awesome to have a way to include all those packages and their specific versions, to reconstruct in CI the full environment that the developer had when they committed their changes.

@nojb
Collaborator

nojb commented May 21, 2023

Modern git-based package workflow

Just chiming in to mention that at LexiFi we also cobbled together a similar workflow. My feeling is that this kind of git-centric workflow is a natural solution in many industrial settings.

@Et7f3
Contributor

Et7f3 commented May 21, 2023

If you are a heavy user of docker, then nix might be able to solve some of your issues with a generic solution: https://grahamc.com/blog/nix-and-layered-docker-images

@nojb @Lupus

You can also mix layers: have an alpine base image plus a binary built by nix in /nix/store. You can also opt out of the base layer and then have a minimal docker image by construction (only the dependencies listed are bundled; some projects overestimate dependencies, but this should be fixed upstream).

@Lupus

Lupus commented May 22, 2023

If you are heavy user of docker then nix might be able to solve some of your issues with a generic solution

My team is already confused at times by the two tools they need to use (opam and dune), and the desire is to reduce this number to only one tool (dune); this is going to be a UX improvement. Having yet another tool in the stack is not going to help, especially when it's not official but third-party, with even more tools required to translate opam packages to nix flakes.

Extracting the layers out of the dependency graph based on a popularity score is certainly nice, but having only two layers is sufficient to get most of the benefits, while the implementation and integration costs are still manageable. I don't want to migrate my whole Dockerfile into dune-project; I just want to separate dependencies into two stages so that they can be built (and thus cached) separately.

Alternatively, dune could provide a distributed caching solution for build artifacts; this way I could stop worrying about cache misses in my Dockerfile, as dune would get the artifacts from its own cache over the network. But this is a very complex problem to solve, requires infrastructure investment, and is not guaranteed to be fast enough to be helpful.

@tmattio
Collaborator Author

tmattio commented May 22, 2023

Hey @Lupus - thanks for the great feedback / questions!

Modern git-based package workflow

Yes, ultimately you will be able to configure the source of a dependency in your dune-workspace.

It's important to note that if your package depends on a source that's not on opam, you won't be able to publish it though (similarly to how a package with pin-depends can't be published today).

The "v" version prefix problem

We won't be making changes to opam-repository. We aim to be compatible with the existing opam infrastructure, and unless we discover limitations that would prevent having package management in Dune, any changes to opam-repository are out of scope.

More generally, package management in Dune doesn't mean merging opam and dune! In particular, the opam infrastructure (including opam-repository and ocaml-ci) needs to remain independent of Dune. As far as opam-repository is concerned, dune is just another opam package.

Not to say that your suggestion shouldn't be considered for opam-repository, only that it wouldn't be as part of this project.

Staged lock files

With dune lock files, you won't need to run the solver in the CI or when constructing docker images, so we expect builds to be a lot faster.

That being said, dune will rely on the Dune cache to optimise build cycles. One equivalent of what you're doing could be to create a dummy lockfile for all of Jane Street's dependencies, build that, then add your project's lock file and build your project.

Docker would be using the dune cache and wouldn't recompile the Jane Street dependencies if your project's lockfile changes.

Optional inclusion of :dev, :with-test, :with-doc dependencies to lock file

I don't think we've settled on a design for optional dependencies yet, but it looks like a problem we'll have to find a solution for, yes. Perhaps @rgrinberg has already thought about this?

@Lupus

Lupus commented May 22, 2023

Modern git-based package workflow

Yes, ultimately you will be able to configure the source of a dependency in your dune-workspace.

Sounds great! Will it automagically fetch available versions out of git repository tags? Will I be able to specify semver style constraints on versions for packages from git source? Will it be feasible to scale this approach to large dependency tree of internal packages?

It's important to note that if your package depends on a source that's not on opam, you won't be able to publish it though (similarly to how a package with pin-depends can't be published today).

Yeah, it's certainly understood and makes perfect sense.

The "v" version prefix problem

We won't be making changes to opam-repository. We aim to be compatible with the existing opam infrastructure, and unless we discover limitations that would prevent having package management in Dune, any changes to opam-repository are out of scope.

To clarify - I wasn't suggesting changing the opam repository; I was suggesting some heuristic in dune to hide this complexity from the end user. I want to just write versions without any "v" prefixes, and dune either gives me an error while generating the lock file, or automatically figures out that certain packages include those "v" prefixes and adds them on my behalf when communicating with the opam layer.

More generally, package management in Dune doesn't mean merging opam and dune! In particular, the opam infrastructure (including opam-repository and ocaml-ci) needs to remain independent of Dune. As far as opam-repository is concerned, dune is just another opam package.

That's understood and is fine. But if users want to opt in, they could avoid having opam installed on their system altogether, right? Just use some insecure curl | sudo sh and you have dune, and then dune build to get things rolling after cloning a project.

Staged lock files

With dune lock files, you won't need to run the solver in the CI or when constructing docker images, so we expect builds to be a lot faster.

It's not the solver that takes most of the time in our CI builds; it's mostly the building of nearly a hundred opam packages that takes a lot of time.

That being said, dune will rely on the Dune cache to optimise build cycles. One equivalent of what you're doing could be to create a dummy lockfile for all of Jane Street's dependencies, build that, then add your project's lock file and build your project.

Not sure how that would work; typically lock files are per repository. Will dune have per-package lock files? Would I need to create a dummy package that depends on the heaviest dependencies used by the other packages in my repo, and then depend on this dummy package in the other packages? The Dockerfile would include building the dummy package as an early layer, and then later I build the rest of my packages?

Docker would be using the dune cache and wouldn't recompile the Jane Street dependencies if your project's lockfile changes.

If you mean that the _build folder will persist between Dockerfile layers, that is indeed so, and as long as I can copy some lock file at an earlier layer and ask dune to build the stuff in that lock file, and that lock file does not change, the Docker cache would skip building those layers once again.

@rgrinberg
Member

Just chiming in to mention that at LexiFi we also cobbled together a similar workflow. My feeling is that this kind of git-centric workflow is a natural solution in many industrial settings.

Yes, we are very conscious of this. Providing git repositories as package sources is going to be first class.

I don't think we've settled on a design for optional dependencies yet, but it looks like a problem we'll have to find a solution for, yes. Perhaps @rgrinberg has already thought about this?

Optional dependencies are going to be eliminated by the solver stage, so there's no issue there. Flags such as with-test are going to be handled by providing options to the solver. For things like dev tools, I have some other ideas, but they're too ambitious for v1.

Sounds great! Will it automagically fetch available versions out of git repository tags? Will I be able to specify semver style constraints on versions for packages from git source? Will it be feasible to scale this approach to large dependency tree of internal packages?

All of this falls under opam-repository management and I'm afraid is wholly out of scope for the near future. I don't think any of this is a bad idea, it's just that I haven't spent any time thinking about it. Even if I had a fleshed out design, I wouldn't necessarily have the resources to implement it.

That's understood and is fine. But if users want to opt in, they could avoid having opam installed on their system altogether, right? Just use some insecure curl | sudo sh and you have dune, and then dune build to get things rolling after cloning a project.

Users will be able to avoid having opam installed in just about every use case. We'll be including the relevant parts of opam in dune to make that possible. However, (as Thibaut mentioned earlier), we will be relaxing the requirement to have opam-repository downloaded to build existing opam files.

It's not the solver that takes most of the time in our CI builds; it's mostly the building of nearly a hundred opam packages that takes a lot of time.

As for your idea of "staged lock files", I don't think we'll have time to provide full 1st class support, but I think you'll be able to recreate your workflow. One might imagine the following steps:

  1. create a dummy package that depends on all your "heavy" dependencies (e.g. core, async) -- sketched after this list
  2. generate a lock file for this single package to determine the concrete versions for all of these heavy dependencies
  3. add a separate docker layer to "build" this package. This step is here to pre-populate the dune cache.
  4. when solving for the build plan for real packages, use the lock file generated in 2) as a set of constraints to give as input to the solver. This will make sure that you're re-using as much of the build plan as possible from your dummy package.
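
A minimal sketch of the dummy package from step 1, reusing the existing dune-project package stanza (the name and version pins below are illustrative):

(package
 (name heavy-deps)
 (depends
  (core (= v0.16.0))
  (async (= v0.16.0))))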

This is all hacky, and as you've mentioned, a distributed cache is the ideal solution. Thankfully there's already something working internally here at janestreet, so it's just a matter of open sourcing it. Let me see if I can heckle them to give us an ETA on this.

@rgrinberg
Member

Reproducible builds in CI include running tests, generating documentation, etc.; it would be awesome to have a way to include all those packages and their specific versions, to reconstruct in CI the full environment that the developer had when they committed their changes.

Let me just say that the system we're implementing will be flexible enough to accommodate all of these use cases. However, we haven't yet thought about conventions and best practices that will make this all standard across the ecosystem. For the initial versions, I imagine users would need to guide the lock file generation to enable/disable various variables and maintain different lock files for development and production that would reflect these differences.

@Lupus

Lupus commented May 22, 2023

Yes, we are very conscious of this. Providing git repositories as package sources is going to be first class.

That sounds perfect!

Optional dependencies are going to be eliminated by the solver stage, so there's no issue there. Flags such as with-test are going to be handled by providing options to the solver. For things like dev tools, I have some other ideas, but they're too ambitious for v1.

If optional dependencies are eliminated by the solver, does that imply that they won't end up in the lock file?

Let me just say that the system we're implementing will be flexible enough to accommodate all of these use cases. However, we haven't yet thought about conventions and best practices that will make this all standard across the ecosystem. For the initial versions, I imagine users would need to guide the lock file generation to enable/disable various variables and maintain different lock files for development and production that would reflect these differences.

This sounds more reassuring than the fact that optional dependencies will be eliminated 😊
We're absolutely fine with getting our hands dirty tweaking how lock files are generated, as long as the available tweaks allow us to capture the package environment required to build, test, and produce the necessary artifacts (OCaml or others, like html for docs). Having to run additional opam install commands for odoc and alcotest-lwt is quite disappointing, and IIRC that's what we had to do with opam lock before we reinvented the wheel with switch exports.

All of this falls under opam-repository management and I'm afraid is wholly out of scope for the near future. I don't think any of this is a bad idea, it's just that I haven't spent any time thinking about it. Even if I had a fleshed out design, I wouldn't necessarily have the resources to implement it.

No problems at all, just raising the pain point that we have. It's not a blocker, it's just very annoying when it happens.
Probably, if dune updates the constraints in dune-project in a semver-compatible way when one requests an additional package to be added as a dependency, that will also solve the problem. I don't care much about which version of core I get in a new project; I just want that version to be added to the constraints. For example, for v0.15.1 I would expect (and (>= v0.15.1) (< v0.16.0)) to be written into dune-project automatically, or (and (>= v0.15.0) (< v0.16.0)) as a wider default which I can narrow down myself. Dune can check which package version is currently used, figure out whether it has that "v" prefix in the version name, and copy that over to the constraints in dune-project; doing that by hand is just quite depressing.

Users will be able to avoid having opam installed in just about every use case. We'll be including the relevant parts of opam in dune to make that possible. However, (as Thibaut mentioned earlier), we will be relaxing the requirement to have opam-repository downloaded to build existing opam files.

That's an absolutely sane requirement, thanks for the clarification!

As for your idea of "staged lock files", I don't think we'll have time to provide full 1st class support, but I think you'll be able to recreate your workflow. One might imagine the following steps:
<.....>
4. when solving for the build plan for real packages, use the lock file generated in 2) as a set of constraints to give as input to the solver. This will make sure that you're re-using as much of the build plan as possible from your dummy package.

Giving input to the solver does not sound like an everyday activity for an ordinary dune user 😄 Exposing such low-level details through the command-line interface would probably complicate the maintenance.

This is all hacky, and as you've mentioned, a distributed cache is the ideal solution. Thankfully there's already something working internally here at janestreet, so it's just a matter of open sourcing it. Let me see if I can heckle them to give us an ETA on this.

Well, we have a solution that works for us right now; it's also hacky and needs to be updated to keep up with the opam libraries (I hope there are no major API changes any time soon 🤞; 2.0 => 2.1 took some time to figure things out). We'll migrate to a distributed cache provided natively by dune if it's available and works reliably and fast. Dune package management is unlikely to be released tomorrow, so it would be nice to have the distributed cache released publicly before dune package management hits prime time 😃

@rgrinberg
Member

If optional dependencies are eliminated by the solver, does that imply that they won't end up in the lock file?

It depends on whether they were selected or not. Perhaps I'm misunderstanding the question, but the dependencies present in a lock file are all required. Anything "optional" is something that must be resolved when generating the lock file.

@gasche
Member

gasche commented Jul 6, 2023

Thanks for the very interesting RFC! I found it clear and pleasant to read.

I have a question on the long-term vision for the opam client, which I asked on the Discuss thread.

While I am posting here: I found it interesting that the RFC includes the requirement for dune to be able to build non-dune packages, but my intuition is that this is non-trivial and would deserve an RFC of its own. In particular: (1) how would one specify the dependencies of those external builds (by adding extra dune-specific files? would dune infer them implicitly, how?), and (2) how would the external build communicate to Dune how to "install" files, is the plan to use .install files for example? (Would Dune know how to map .install files to its own internal expectations in terms of artifact layout?)

@rgrinberg
Member

Thanks for the interest Gasche.

The dependencies will be derived from the opam files. There will be a translation stage from the opam files to dune's own lock file format.

Yes, dune will read .install files. Dune can already generate them from its own internal data structures, so this will be just going the other way.

@rgrinberg
Member

I suppose I didn't answer your suggestion to use an RFC. We could do so, but bear in mind the following:

  1. The implementation is heavily constrained by the requirement to stay compatible with opam. We must be able to build essentially all existing packages unmodified. So there's not much room for creativity here.

  2. Most of the work is already implemented. We're mostly just implementing the long tail of features that are used by only a minority of opam packages.

@gasche
Member

gasche commented Jul 6, 2023

Yeah, I wasn't trying to suggest that you should write an RFC (you do you), but rather that this part was less clear than the rest and could deserve more explanation, which could go in a separate place because it is mostly independent from the rest.

@jonahbeckford
Collaborator

At the moment I don't know how this will work for native Windows, but that may be just indicative that there is a lot of work to be done.

Currently with the DkML distribution I have to:

  1. Inject opam global variables to specify the location of MSYS2 (there are often many MSYS2 installations on a machine). Stepping back, I am using MSYS2 because many, many opam packages require a POSIX filesystem and/or a sh or bash interpreter, and some packages require Unix utilities like sed, awk, pkgconf, etc. It sounds like Dune packages need a cached MSYS2 or Cygwin environment, but I'm curious what the thinking is.
  2. Use wrappers on each opam build to:
  • give a MSVC environment (the PATH to the cl.exe and link.exe MSVC compiler and linker, and INCLUDE and LIB variables for the Windows SDK) to each opam package
  • give a UNIX environment (the PATH to the previously mentioned sh, sed, etc., and also environment variables that MSYS2 needs to operate)
  3. Install a C compiler (Visual Studio Build Tools) and a custom MSYS2 installation (https://gitlab.com/diskuv-ocaml/distributions/msys2-dkml-base), or none of the above works.
  4. Use a custom opam repository for several packages that still don't work with MSVC.

Thoughts?

And does "Windows" mean Cygwin/GCC? (I hope we aren't hardcoding a non-standard compiler for Windows, so maybe %LOCALAPPDATA%/dune/config can be used so people who need real Windows compatibility don't have to abandon MSVC)

(Please be patient if it takes me a very long time to respond. Will do likewise)

@rgrinberg
Member

All of those things you're doing seem like very reasonable things to do on Windows. However, I'm not sure why any of them would need any special support from dune. Shouldn't it be possible to provide the correct environment by adding appropriate opam packages or forking opam-repository?

@jonahbeckford
Collaborator

jonahbeckford commented Aug 21, 2023 via email

@leostera
Contributor

leostera commented Aug 26, 2023

@rgrinberg I'm a little late to the party, but to chime in from the Rust/Cargo side: usually when you work with workspaces (e.g. have multiple crates/packages) you call cargo add {dep}[@version] -p {pkg} – which makes sure that the right subpackage gets the dependency added.

However, one problem this has is that now you have dependency versions all over the place. Cargo introduced a shorthand for workspace-wide dependencies that you can use:

# file: project/my-crate/Cargo.toml
[dependencies]
my_dep.workspace = true
# or, with per-crate overrides:
# my_dep = { workspace = true, ...other-overrides }

# file: project/Cargo.toml
[workspace]
members = [ "my-crate" ]

[workspace.dependencies]
my_dep = { version = "1.5.0", features = ["..."] }

And I think this is what makes the most sense, given that currently package versions are defined at the dune-project level. Having a single list of dependencies with versions there would mean you just have to add the library to the dune file where you actually use it. So the dune-project (or dune-workspace) would look like:

(dependencies
  (ocaml (>= "4.12.0"))
  (dune (>= "3.5"))
  (ppx_deriving (> "5.2.1")))

(package
  (name serde)
  (depends ocaml dune))

(package
  (name serde_derive)
  (depends ocaml dune serde ppx_deriving))

I'm also inclined to split the list of deps, at least between dev-time and core deps, as is done in cargo. And potentially also between dev-time and build-step deps (e.g. deps required for building, like ppxs, but not for anything else). This may be too much :)

(dependencies
  (ppx_deriving (>= "5.2.1")))

(dev-dependencies
  (dune (>= "3.5"))
  (bechamel (>= "4.12.0"))
  (ocamlformat (>= "..."))
  ...)

@rgrinberg
Member

Thanks for the suggestion @leostera. It's general enough that I don't think it needs to wait for package management to arrive. If you'd like to realize it, the next step would be to make a separate issue with the official proposal.

My personal opinion is that changes like this are generally good when they address issues for large groups of users. Otherwise, I prefer to err on the side of fewer ways of doing the same thing. I'd like to see some real-world OCaml projects where the boilerplate for specifying dependencies can get repetitive.

@rgrinberg
Member

Thanks for all the discussion everyone. The issue has been accepted and is being actively worked on. If there are any feature requests, don't hesitate to start new issues to discuss them.

@yawaramin
Contributor

The RFC says that the lockfiles will have version info and build instructions -- I'm not clear, though, on how much detail will be captured. Will the dependencies (including transitives) be hashed, like in, say, npm lockfiles?

With lockfiles, will dependency upgrades always be initiated by the developer? I.e. if a dependency has a range (and (>= 1.2.0) (< 2.0.0)) and the lockfile currently says it's at 1.2.0, will dune always get exactly 1.2.0 unless the developer changes the range to minimum version e.g. 1.2.1?

Is there a plan to tackle the possibility of dependency projects moving from say GitHub to BitBucket? I.e. a lockfile says to download a dependency from GitHub, but that project moved so the GitHub link is a 404.

@rgrinberg
Member

I'm not clear though on how much detail will be captured.

The information captured will be:

  1. A hash of the source of the package
  2. The list of package dependencies
  3. The build instructions as they appear in the opam file

In short, we'll have the guarantee that two builds from the same lock file will have the same build plan. However, it may not be reproducible (because system dependencies are not captured).

With lockfiles, will dependency upgrades always be initiated by the developer? I.e. if a dependency has a range (and (>= 1.2.0) (< 2.0.0)) and the lockfile currently says it's at 1.2.0, will dune always get exactly 1.2.0 unless the developer changes the range to minimum version e.g. 1.2.1?

Yes. To upgrade dependencies, the lock file needs to be regenerated.

Is there a plan to tackle the possibility of dependency projects moving from say GitHub to BitBucket? I.e. a lockfile says to download a dependency from GitHub, but that project moved so the GitHub link is a 404.

There's no plan to facilitate this, but there's also no specific dependence on GitHub. Since we capture the hashes of all sources, such a tool can be developed externally; i.e., the tool can rewrite the lock directory in a way that changes all the URLs but preserves the source signatures.

@yawaramin
Contributor

One more question -- will we be able to specify dependencies to download directly from GitHub, e.g.

(depends
  (github.com/yawaramin/dream-html (>= 2.1.0)))

The current pin-depends mechanism adds a layer of complexity; it would be very good if we could get rid of that, especially because it would open the door to easy decentralization of package publishing and take pressure off the opam repository.

@rgrinberg
Member

We'll have something, but it won't be exactly as per your suggestion. It would be something like this:

(package_source
 (url "git+https://github.com/yawaramin/dream-html#ref") ;; commit, branch, tags are all accepted
 (package
  (version 2.5.0) ;; set to dev if unset
  (name dream-html)))

(package
 ..
 (depends
  (dream-html (>= 2.1.0)))) ;; the constraint is ignored when a source is set

So it is going to be similar to pin-depends in the sense that it "fixes" the version. However, it is recursive.

Your suggestion seems fine as well though. If you'd like to sketch it, you can create another ticket.
