Wishlist for Opam Lock #5857

Open
rbardou opened this issue Feb 27, 2024 · 5 comments
Labels
AREA: DESIGN, AREA: DOCUMENTATION, KIND: FEATURE WISH, KIND: PLUGIN CANDIDATE

Comments

@rbardou

rbardou commented Feb 27, 2024

We are trying to use opam lock files in Octez, an implementation of the Tezos blockchain. We find that lock files do not completely solve our use case. We thought of sharing why here, in case there are options that we did not consider, and in case our use case is of interest to the authors of opam for potential future improvements.

Our Requirements

1) Reproducibility of Building Octez

The opam lock file would ensure that developers, users and our CI build Octez using the exact same dependencies (for a given platform). This means:

  • same set of dependencies for everyone;
  • with the same version numbers;
  • and more precisely the same hash for tarballs.

2) Reproducibility of Generating the Lock File

Developers should be able to run a script that generates the lock file, and the same lock file should be generated for all developers (for a given platform). This means:

  • that their switches should have no impact on the solution chosen by the solver, in particular their local switch (_opam);
  • that one should be able to specify the set of possible dependencies to consider, e.g. by giving the hash of a commit of ocaml/opam-repository.

3) Flexibility for Developers

While we do want to fix the version of Octez dependencies, we also want to let developers install development tools such as utop and merlin, without having those in the lock file since they are not needed by all users nor by our CI.

Ideally

Ideally, given a set of dependencies DEPS with some version constraints, a set of development tools DEV-DEPS, and a given HASH of the public opam repository, we would be able to ask opam to:

Ignore everything I have installed. Find a solution from HASH that contains DEPS and that is compatible with DEV-DEPS, and store the solution in LOCK.

At the same time, or as a follow-up step, we would be able to ask opam to:

Enrich LOCK to make sure that it contains enough information to check that tarball hashes are the ones that I would obtain if I installed now.

Finally, we would be able to ask opam to:

Install LOCK (optionally: and DEV-DEPS), ensuring that tarball hashes are as expected for packages in LOCK.

Our Current Solution

To Generate the Lock File

DEPS is an opam file that depends on our dependencies, with some version constraints. DEV-DEPS is an opam file that depends on some development tools like utop and merlin.

We do the following:

  • clone ocaml/opam-repository, checkout a given HASH;
  • run opam admin filter --remove --has-flag avoid-version;
  • run opam repository with set-url or add to add this clone, and remove the default repository;
  • run opam install --yes --deps-only --criteria=-notuptodate,+removed DEPS DEV-DEPS;
  • run opam remove utop;
  • run opam lock DEPS;
  • restore the default repository.

This solution:

  • is complicated;
  • feels hackish;
  • takes a long time to run;
  • and does not work all the time.

Let's explain the different steps.

  • We clone ocaml/opam-repository so that:
    • we can ensure that a particular commit hash is used (this commit hash will then be used when installing the lock file, to ensure that tarball hashes have not changed since we generated the lock file);
    • we can run opam admin filter on it to remove some unwanted packages (namely those with flag avoid-version, which are otherwise sometimes selected because apparently using --criteria breaks version avoidance).
  • We replace the default repository because we want to ensure that only packages from the commit hash that we cloned are available.
    • This does not actually work 100% because apparently opam still sees packages from the local switch (_opam). It will sometimes select those packages, resulting in unwanted packages (with flag avoid-version) in the lock file.
    • We use --dont-select because we don't want to break other switches. (I think. The documentation of this flag is rather confusing.)
  • We run opam install on both DEPS and DEV-DEPS because even though we only want DEPS in the lock file, we want a solution that is compatible with DEV-DEPS.
    • We do not actually want to install any dependencies (DEPS or DEV-DEPS); we are merely asking the solver to find a solution and write it down. This step can thus take much longer than necessary, as it requires compiling dependencies.
  • We run opam install with -notuptodate because our script is meant to upgrade all packages. It also gives a better chance of producing the same solution for everyone.
  • We run opam install with +removed because we want it to find a minimal solution.
    • This has the inconvenient side-effect of removing other dependencies that the developer may have installed for themselves, in particular if those dependencies are not in DEV-DEPS.
  • We run opam remove utop so that it does not appear in the lock file. utop is in DEV-DEPS but not in DEPS. It is however an optional dependency of pyml, which is in DEPS. If utop is installed when we run opam lock, it thus ends up in the lock file.
    • This solution is very brittle. If one day utop becomes a non-optional dependency (in DEPS), the lock file will be broken. If one day another package suffers from the same issue, we will have to update our script.
    • Also, this makes each invocation of the script install and remove utop, which is wasted time.
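
The generation steps above can be sketched as a small shell function. This is only an illustration of the sequence described in this comment, not the actual Octez script: the `run` and `run_in` helpers, the `DRY_RUN` variable, the function name, and the assumption that the default repository is named `default` and lives at https://opam.ocaml.org are all choices made for this sketch.

```shell
#!/bin/sh
# Sketch of the lock-file generation steps (hypothetical helper names).
# With DRY_RUN set, commands are printed instead of executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same as run, but from inside a given directory.
run_in() {
  dir="$1"
  shift
  if [ -n "${DRY_RUN:-}" ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

# generate_lock COMMIT REPO_DIR DEPS DEV_DEPS
# REPO_DIR should be an absolute path, since it is later used as a repository URL.
generate_lock() {
  commit="$1"
  repo_dir="$2"
  deps="$3"
  dev_deps="$4"

  # Clone the public repository and pin it to the requested commit.
  run git clone https://github.com/ocaml/opam-repository "$repo_dir" || return 1
  run git -C "$repo_dir" checkout "$commit" || return 1
  # opam admin works on the repository in the current directory.
  run_in "$repo_dir" opam admin filter --remove --has-flag avoid-version || return 1
  # Assumption: pointing "default" at the clone stands in for "add clone, remove default".
  run opam repository set-url default "$repo_dir" || return 1
  run opam install --yes --deps-only --criteria=-notuptodate,+removed "$deps" "$dev_deps" || return 1
  # utop is an optional dependency of pyml and must not end up in the lock file.
  run opam remove --yes utop || return 1
  run opam lock "$deps" || return 1
  # Restore the default repository.
  run opam repository set-url default https://opam.ocaml.org
}
```

Calling `DRY_RUN=1; generate_lock <commit> /abs/path/to/clone ./deps.opam ./dev-deps.opam` prints the eight commands in order, which is a cheap way to review the sequence before running it for real.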

To Install the Lock File

To install the lock file, we:

  • add an opam repository with URL ocaml/opam-repository#hash and remove the default one;
  • run opam install --deps-only on the lock file (and, optionally, on DEV-DEPS);
  • restore the default repository.

Let's explain the different steps.

  • We add #hash to the opam repository URL to make sure that tarball hashes are the same as the ones that were chosen when generating the lock file.
  • We still want to restore the default repository after that so that devs can install other libraries and tools that they may want locally.
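
The installation steps above could look roughly like the following. Again this is a sketch under assumptions: the function name, the `run` helper and `DRY_RUN` variable are hypothetical, the repository is assumed to be named `default`, and the `git+https://...#COMMIT` form is one way to write the "ocaml/opam-repository#hash" URL mentioned above.

```shell
#!/bin/sh
# Sketch of installing from the lock file against a pinned repository commit.
# With DRY_RUN set, commands are printed instead of executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_lock COMMIT LOCK_FILE [DEV_DEPS]
install_lock() {
  commit="$1"
  lock="$2"
  dev_deps="${3:-}"

  # Pin the repository to the commit that was used to generate the lock file.
  run opam repository set-url default \
    "git+https://github.com/ocaml/opam-repository#$commit" || return 1

  # Install the locked dependencies, and optionally the development tools.
  if [ -n "$dev_deps" ]; then
    run opam install --yes --deps-only "$lock" "$dev_deps" || return 1
  else
    run opam install --yes --deps-only "$lock" || return 1
  fi

  # Restore the default repository so developers can install other packages.
  run opam repository set-url default https://opam.ocaml.org
}
```

Passing a third argument corresponds to the optional DEV-DEPS installation; CI would call the function with two arguments only.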

Other Solutions we Looked At

Switch Exports

opam switch export --full is closer to what we need from a lock file because it contains hashes, but full exports have other issues:

  • they are much bigger and thus harder to read during reviews, and one cannot easily manually change a version number;
  • importing a switch would probably remove installed packages that are not in the export. I didn't actually test, but it is not specified in opam switch --help so I have to assume the worst if I don't want to rely on unspecified behavior.

Calling the Solver Manually

We are considering the following solution to generate our lock files but did not try it yet:

  • ask opam to generate a CUDF file;
  • filter this CUDF file to remove packages with avoid-version;
  • call a solver directly on this CUDF file to ask it to find a solution that contains DEPS and DEV-DEPS;
  • call the solver again on the resulting CUDF file to ask it to only install DEPS (and not DEV-DEPS);
  • parse the resulting CUDF file to convert it to a lock file.

Basically, this amounts to reimplementing parts of opam.

rjbou added the AREA: DOCUMENTATION and KIND: PLUGIN CANDIDATE labels on Feb 27, 2024
@rjbou
Collaborator

rjbou commented Feb 27, 2024

Thank you for your detailed explanations.

First, I want to explain the origin of opam lock. The idea is, when a developer is in the process of working on their project, there is a given switch state (packages at a version) for which the developer can attest that the project is compiling and working as expected. It is that state that is "exported" via opam lock: the currently installed packages in the switch that are required by the project.
In your case, you want a solver solution for a given repository, ignoring the current switch state. opam lock is therefore not what you are looking for.
The same goes for opam switch export. It exports the full current state of the whole switch, not a subset and not a solution computed independently of the switch, so it is not what you are looking for either.

In the current state of opam, there is no command that directly generates the file you want. Your case is a good candidate for a plugin that would generate that file using opam-lib, so that there is no need to reimplement what opam already does.
The file that needs to be generated is not a simple opam file (a lock file is a simple opam file), as you need to have archive hashes or a repository hash specified. If you want to use a predefined format, you had better use the switch export file format: it can hold all that information and can be imported via opam switch import. But as you said, it is quite dense.

On the "normal" vs "development" dependencies: in opam 2.2 we introduced the dev-setup variable (with-dev-setup) for that purpose. It is then possible to have a single opam file containing both DEPS and DEV-DEPS, and to use that variable to differentiate them in the plugin (use them for solving, but not for generating the file).

@kit-ty-kate
Member

> The idea is, when a developer is in the process of working on their project, there is a given switch state (packages at a version) for which the developer can attest that the project is compiling and working as expected. It is that state that is "exported" via opam lock: the currently installed packages in the switch that are required by the project.

The issue with this definition is that "the installed packages required by the project" doesn't mean much when the state of the repository is unknown. I would personally argue that having the state of the repositories should be required for lock files.

@rbardou
Author

rbardou commented Feb 28, 2024

Thank you @rjbou for your answer. I was not aware of dev-setup, which sounds indeed useful for us.

> The idea is, when a developer is in the process of working on their project, there is a given switch state (packages at a version) for which the developer can attest that the project is compiling and working as expected.

I think this is actually our use case.

But as @kit-ty-kate says, if the lock file does not actually guarantee that, say, cohttp.1.2.3 is the same cohttp.1.2.3 as the one that we used (because its opam file has been updated to refer to a tarball with a different hash), then we cannot actually "attest that the project is compiling and working as expected" with cohttp.1.2.3.

Another important point is: how to extend this workflow to the case where many developers are working on the same project, with strict review processes etc.? If all developers regularly update the lock file, the lock file will go back and forth between various versions of installed packages. For instance, let's assume that Alice has tezt.3.1.1, and Bob has tezt.4.0.0. Alice wants to upgrade some package; she updates the lock file, which now contains tezt.3.1.1. Bob wants to upgrade another package; he updates the lock file, which now contains tezt.4.0.0. Alice upgrades something else, the lock file now contains tezt.3.1.1. Etc. Now imagine that with 250 dependencies. Having a stable way to update the lock file becomes important.

For us, the "developer" that can attest is actually the CI. The goal is indeed to be able to attest that if you compile Octez with its lock file, you are guaranteed to obtain a version that is the one that has been successfully tested by the CI of Octez.


That being said, someone read this post and contacted me privately to suggest that --fake might help. And I think it does! With --fake, I can ask opam to work from a fresh switch, which means that opam will not try to use leftover packages (unless it can use some from other existing switches??), without having to lose half an hour installing all dependencies.
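
A minimal sketch of the --fake approach follows. It is not the script linked below: the switch name, the `run` helper and the `DRY_RUN` variable are hypothetical, and using `--empty` to obtain a fresh switch and `--switch` to target it are assumptions about how one might arrange this.

```shell
#!/bin/sh
# Sketch: let the solver work from a fresh switch, and register its solution
# with --fake so that nothing is actually compiled.
# With DRY_RUN set, commands are printed instead of executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# lock_from_fresh_switch SWITCH_NAME DEPS DEV_DEPS
lock_from_fresh_switch() {
  switch="$1"
  deps="$2"
  dev_deps="$3"

  # A fresh, empty switch has no leftover packages for the solver to reuse.
  run opam switch create "$switch" --empty || return 1
  # --fake records the install actions in the switch without performing them.
  run opam install --switch "$switch" --yes --fake --deps-only "$deps" "$dev_deps" || return 1
  # Lock only DEPS, based on what is now registered as installed.
  run opam lock --switch "$switch" "$deps" || return 1
  # The switch only holds fake installs, so discard it.
  run opam switch remove --yes "$switch"
}
```

Because the switch content is fake, it should never be used for building; removing it at the end avoids that mistake.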

If you're curious, here is the updated version of the script that updates our lock files: https://gitlab.com/tezos/tezos/-/blob/8c0ded8f28cd56e7cf20d9094969dad80a2741e7/scripts/update_opam_lock.sh (edit: since then the script has been simplified; last version can be found in https://gitlab.com/tezos/tezos/-/blob/2dbdd532c15e46c40a22e5867642efb2f70152da/scripts/update_opam_lock.sh).

The script could be made simpler:

  • if using --criteria didn't cause opam to sometimes select packages with avoid-version, or if we could tell opam to ignore all packages with avoid-version (edit: since we now start from a fresh switch, we don't use --criteria, so this is no longer an issue — but we do need to tell opam not to use ocaml-system);
  • if we could ask opam not to select dev dependencies, just be compatible with them, which maybe with-dev-setup could help with (?).

But those two points are probably off topic here.

@rbardou
Author

rbardou commented Feb 28, 2024

Another minor point that came up: opam lock can put ocaml-system in the lock file, and I think that it is a bad idea in general, since the version of the system-wide install of OCaml will not be the same for everyone. In our script, we solved this by removing packages/ocaml-system from the clone of ocaml/opam-repository.
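
The removal itself is a one-liner on the clone. The sketch below wraps it in a function with a small sanity check; the function name and the check are illustrative additions, not taken from our script.

```shell
#!/bin/sh
# Sketch: remove ocaml-system from a local clone of ocaml/opam-repository so
# that the solver cannot select it.

# drop_ocaml_system REPO_DIR
drop_ocaml_system() {
  repo_dir="$1"
  # Refuse to run on something that does not look like an opam repository.
  if [ ! -d "$repo_dir/packages" ]; then
    echo "not an opam repository clone: $repo_dir" >&2
    return 1
  fi
  rm -rf "$repo_dir/packages/ocaml-system"
}
```

This has to run before the clone is registered as a repository, so that the solver never sees the package.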

@kit-ty-kate
Member

Related discussion happening in CycloneDX/cdxgen#793
