local cache of binary packages #629

Closed
UnixJunkie opened this issue May 30, 2013 · 26 comments

Comments

@UnixJunkie
Contributor

Hello,

It would be nice if, when I build a package, uninstall it,
and then reinstall it, the already-built one were reused
instead of everything being recompiled.

Thanks,
F.

@samoht
Member

samoht commented May 30, 2013

Adding the rest of the discussion that happened on the caml-list.

Chet Murthy wrote:

OK. A little more. OPAM is already a tremendous improvement. But to
really make it possible to build -systems- in Ocaml, you have to be
able to distribute collections of programs, config, and libraries,
across multiple (admittedly identical) machines. And distribute
updates to same. OPAM is in some ways like BSD ports -- it works
great for maintaining single machines from source-code.

But what's needed is a way to maintain -many- machines, and to
distribute updates in a granular manner that can be -managed- -- rolled
forward, rolled back, with full knowledge of which versions of which
packages are being installed. And everything with -zero- version
skew. So any nondeterminism happened at build-time -- by
deploy-time, all machines are getting identical files with identical
timestamps.

It's a tall order, b/c OPAM will need to figure out how to capture
enough of the environment (in order to check it on the target machine
where a binary is installed) to verify whether it's safe to install
that binary. But boy would it be nice.

And as a bonus, we could wrap opam in the debian apparatus (I
think) and get a really nice way to turn opam packages into debian
packages.

Malcolm Matalka wrote:

I think it would be wrong for opam to try to solve this problem. There
are already many tools available for deploying (Ansible, Puppet, Chef,
Fabric, Capistrano). Such a layer can be built on top of opam if need be.

Chet Murthy wrote:

I think this is incorrect. Let me explain.

(1) when we look at deploying complex collections of code/libs/data
onto multiple machines, usually we assume that the code has already
been built.

(2) but let's first dispatch the case where the code has -not- been
built. In such a case, I presume you're proposing that the code be
built on each machine, yes?

(a) this drastically increases the CPU required to perform upgrades
and deploys

(b) but far, far, far more importantly, it means that on each
machine, a nontrivially complex script runs that builds the actual
installed binaries. If that script contains -any- nondeterminism or
environmental sensitivity, it could produce different results on
different machines. The technical term is "version skew".

In scale-out systems, this sort of "skew" is absolutely fatal, because
it means that machines/nodes are not a priori interchangeable. And
all of fast-fail fault-tolerance depends on nodes being
interchangeable.

(3) But let's say that what you really mean is that we should use
tools like puppet/chef/capistrano to copy collections of
binaries/libs/data to target machines and install them. These
scripts/recipes are written by some person. You could have equally
well suggested that that person build Debian packages (or RPMs) of
each OPAM package, writing out all the descriptions and manifests.

And manually specifying all dependencies and requirements.

Either way, that person is doing a job that OPAM already does a lot
of, and does quite well. Gosh, wouldn't it be nice if OPAM could
generate those RPMs? Well, it's a little more complicated than that,
but really, not much more. The complexity comes in that you -might-
(I'm not saying I have this part figured out yet) want ways to
-generalize- (say) the camlp5 package so that it could be installed on
many different base OPAM installations.

But setting aside that nice-to-have, imagine that OPAM knew how to
generate RPMs from each package it installed, and from the ocaml+opam
base itself. You combine those, and you can:

(i) install ocaml, opam, and a bunch of packages

(ii) push a button, and out come a pile of RPMs, along with
dependencies amongst them (and hopefully on the relevant
environmental RPMs, e.g., libpcre-dev for pcre-ocaml, etc.) so that
you can just stuff those RPMs into a YUM repo, go to a second box,
and say

"yum install opam ocaml pcre-ocaml"

and get everything slurped down and installed, just as if OPAM had
installed it all, package-by-package.

-P.S. And this doesn't even get into the unsuitability of chef/puppet
for managing software package installation. There's a reason that no
distro uses such schemes to install the large and complex sets of
packages needed to run a modern Linux box. And why there is no Linux
version of Microsoft's "DLL Hell". Linux distros by and large (and
esp Debian and Ubuntu) have worked hard to make package installation
foolproof -- and chef/puppet etc are anything but.

@UnixJunkie
Contributor Author

fpm looks extremely interesting:
https://github.com/jordansissel/fpm

I think OPAM doesn't need to implement everything.
If it uses the right tool already out there to create
packages, then so be it.
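
As a rough illustration of delegating package creation to fpm, the sketch below wraps a single fpm call. It assumes the files of one OPAM package have already been staged into a directory of their own; how to do that staging from OPAM is not shown. The function name `package_with_fpm` and the overridable `FPM` variable are hypothetical, while `-s`, `-t`, `-n`, `-v` and `-C` are regular fpm options.

```shell
#!/bin/sh
# package_with_fpm NAME VERSION STAGING_DIR [TARGET]
# Turn an already-staged directory tree into a native package with fpm.
# TARGET is the output format (rpm by default, deb also works with fpm).
# FPM is a hypothetical override, e.g. FPM=echo prints the command
# instead of running it.
package_with_fpm() {
    name=$1; version=$2; staging=$3; target=${4:-rpm}
    # -s dir: the input is a plain directory; -C: change into it first,
    # so "." means "everything that was staged".
    ${FPM:-fpm} -s dir -t "$target" -n "$name" -v "$version" -C "$staging" .
}
```

Setting `FPM=echo` before calling the function gives a dry run that only prints the fpm command line.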

The thread had distinct parts:

  • I (the impatient unix user) want a cache of binary packages
  • some (the sysadmins) also want to have real packages generated from the OPAM ones: deb, rpm.
    That would also fit my request for a binary cache of packages, in fact
  • some (the lawyers) want to separate cache from config from permanent data and follow a specification

Regards,
F.

@UnixJunkie
Contributor Author

Hello,

Can someone provide hints on how to plug fpm into OPAM?
For example, what OPAM source files to look into, which step of the build of a package to hook to, etc.

Let's say this will be an experimental feature, just try and play with it.

I'd like to give it a try, but I have no idea when I'll have time for this.

Regards,
F.

@UnixJunkie
Contributor Author

Funnily, on a machine with low RAM (1 GB) and no swap, the compiler cannot be compiled.
I guess this may also be true for several packages included in OPAM.

How are things on the front of supporting binary package repositories in OPAM?

@AltGr
Member

AltGr commented Dec 15, 2017

There is a prototype of this using hooks, see https://github.com/ocaml/opam/blob/master/shell/opam-bin-cache.sh
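
For readers who want the general idea without reading the script, here is a minimal sketch of a hook-style binary cache in shell. It is not the contents of opam-bin-cache.sh: the names `cache_store`, `cache_restore` and `CACHE_ROOT`, and the tar-based layout, are assumptions made for illustration. Each build is identified by an id (a hash of its inputs), the installed files are archived under that id after a successful build, and a later install with the same id unpacks the archive instead of rebuilding.

```shell
#!/bin/sh
# Hypothetical cache location; this is not an opam setting.
CACHE_ROOT=${CACHE_ROOT:-$HOME/.cache/bin-cache-sketch}

# cache_store ID PREFIX FILE...
# Archive the given files (paths relative to PREFIX) under the build id.
cache_store() {
    id=$1; prefix=$2; shift 2
    mkdir -p "$CACHE_ROOT" || return 1
    # Write to a temporary name first so a partial archive is never visible.
    tar -C "$prefix" -cf "$CACHE_ROOT/$id.tar.tmp" "$@" &&
        mv "$CACHE_ROOT/$id.tar.tmp" "$CACHE_ROOT/$id.tar"
}

# cache_restore ID PREFIX
# Unpack a cached build into PREFIX. Returns non-zero on a cache miss,
# in which case the caller should fall back to a normal source build.
cache_restore() {
    id=$1; prefix=$2
    [ -f "$CACHE_ROOT/$id.tar" ] || return 1
    mkdir -p "$prefix" &&
        tar -C "$prefix" -xpf "$CACHE_ROOT/$id.tar"
}
```

A sketch like this says nothing about whether the restored files are valid in their new location; paths recorded at build time are the hard part.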

@Khady
Contributor

Khady commented Aug 22, 2018

Is it possible to combine those cache hooks with the sandbox hooks?

@Khady
Contributor

Khady commented Aug 22, 2018

A quick comment to mention esy too. I don't know if this is the right place, and you probably know about it already. They have a cache system too, which seems to be pretty efficient. But rather than copying the directories from the cache to the "switch", they put in the environment the paths to all the packages that are required by the switch. I think it also avoids some problems like relocating ocaml (esy does something in addition to make ocaml relocatable from one computer to another, but it doesn't seem as important as a good local cache system to me). I thought it was worth mentioning.
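
To make the "paths in the environment" idea concrete, here is a small shell sketch. It is not esy's implementation; the store layout (one directory per package, each with its own `lib` and `bin`) and the function name `join_package_paths` are assumptions. `OCAMLPATH` is the environment variable ocamlfind consults for extra library directories.

```shell
#!/bin/sh
# join_package_paths STORE SUBDIR PKG...
# Print a colon-separated list of STORE/PKG/SUBDIR, one entry for each
# package that actually has that directory. The store layout is hypothetical.
join_package_paths() {
    store=$1; subdir=$2; shift 2
    result=
    for pkg in "$@"; do
        dir=$store/$pkg/$subdir
        [ -d "$dir" ] || continue
        # Add a ":" separator only when result is already non-empty.
        result=${result:+$result:}$dir
    done
    printf '%s\n' "$result"
}
```

With such a helper, a "switch" is just a list of package directories: for example `OCAMLPATH=$(join_package_paths "$STORE" lib pkg-a pkg-b)`, with the same call using `bin` to extend `PATH`. Nothing is copied, so one cached build can serve several switches.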

@AltGr
Member

AltGr commented Aug 22, 2018

Is it possible to combine those cache hooks with the sandbox hooks?

yes, of course

rather than to copy the directories from the cache to the "switch", they put in the environment the paths to all the packages that are required by the switch.

this is where there is quite a bit of difference: IIUC, in esy, every package is installed to its own subtree, which is a nice property that allows easy mix & matching of already compiled packages. It requires quite a bit of cooperation from the underlying systems, though, in this case — correct me if I am wrong — significant ocamlfind hacks.

This is a mode that we would definitely be interested in supporting in opam, but it also remains an important part of the project philosophy to be agnostic and pragmatic on what the packages do (hence the simple shell commands for build: instructions, for example).

@Drup
Contributor

Drup commented Aug 22, 2018

@AltGr I would encourage you to publish and promote this caching hook to get feedback. It's a very nice feature and even if it doesn't work perfectly just yet, I'm sure lots of people would be interested. Enabling it would allow you to get feedback quickly.

@Khady
Contributor

Khady commented Aug 22, 2018 via email

@AltGr
Member

AltGr commented Aug 22, 2018

I have tested it for a while, and while it works correctly at first, once you e.g. remove the original switch the cache was made from, in my experience it doesn't behave well...
I tried some workarounds, e.g. forcing some env variables, but that added more problems.

Of course, there might be progress in ocamlfind configuration since then, that makes this more reliable?

You can have more detail at ocaml/opam-repository#10863

@AltGr
Member

AltGr commented Aug 22, 2018

@Khady thanks for sharing your conf; I could publish a small script to easily enable/disable it, for those interested in testing?

@Drup
Contributor

Drup commented Aug 22, 2018

@AltGr please do, this is a really good feature and I think it's worth advertising it, even if it's not completely finished.

@ELLIOTTCABLE

I've given this a stab, and it looks like it needs a small tweak for macOS. It's currently causing the failure of any package that's already been installed elsewhere:

#=== ERROR while installing astring.0.8.3 =====================================#
# context     2.0.0 | macos/x86_64 | ocaml-base-compiler.4.07.0 | https://opam.ocaml.org/2.0#0bce4f9a
# path        ~/.opam/default/.opam-switch/build/astring.0.8.3
# command     ~/.opam/opam-init/hooks/opam-bin-cache.sh restore 021592f72a3781d5db0a804a656335022129f66149419979d89f88e3b4460c83 astring
# exit-code   64
# env-file    ~/.opam/log/astring-41810-f19885.env
# output-file ~/.opam/log/astring-41810-f19885.out
### output ###
# [...]
# + shift
# + '[' -z 021592f72a3781d5db0a804a656335022129f66149419979d89f88e3b4460c83 ']'
# + CACHE_DIR=/Users/ec/.cache/opam-bin-cache/021592f72a3781d5db0a804a656335022129f66149419979d89f88e3b4460c83
# + case $COMMAND in
# + NAME=astring
# + shift
# + '[' -d /Users/ec/.cache/opam-bin-cache/021592f72a3781d5db0a804a656335022129f66149419979d89f88e3b4460c83 ']'
# + rm -f astring.install
# + cp -aT /Users/ec/.cache/opam-bin-cache/021592f72a3781d5db0a804a656335022129f66149419979d89f88e3b4460c83/ /Users/ec/.opam/default/
# cp: illegal option -- T
# usage: cp [-R [-H | -L | -P]] [-fi | -n] [-apvXc] source_file target_file
#        cp [-R [-H | -L | -P]] [-fi | -n] [-apvXc] source_file ... target_directory

Neither the macOS nor the BSD cp has a -T flag (though it looks like the coreutils cp does?)
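
One possible portable replacement for the failing `cp -aT SRC/ DST/` step is sketched below. It is an assumption about what a fix could look like, not the change that went into the script, and the function name `merge_tree` is made up. It uses only the `-p`, `-P` and `-R` options, which POSIX specifies and which both GNU and BSD cp accept.

```shell
#!/bin/sh
# merge_tree SRC DST
# Copy the *contents* of SRC into DST (creating DST if needed), preserving
# permissions, timestamps and symlinks. The trailing "/." makes cp copy what
# is inside SRC rather than nesting SRC itself under DST, which is the effect
# that -T has in the failing command above.
merge_tree() {
    src=$1; dst=$2
    mkdir -p "$dst" &&
        cp -pPR "$src"/. "$dst"/
}
```

Files already present in DST are kept unless the cache contains a file with the same path.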

@Khady
Contributor

Khady commented Aug 27, 2018

I created a 300 USD bounty for this issue. To be clear, I consider that solving it also requires solving ocaml/opam-repository#10863. Actually most of the work is probably on ocaml/opam-repository#10863 as opam-bin-cache.sh already works pretty well. I hope it can help to attract contributions.
https://www.bountysource.com/issues/1250468-local-cache-of-binary-packages

@UnixJunkie
Contributor Author

@Khady I'm curious, is this out of your pocket or is this your company/employer?

@Khady
Contributor

Khady commented Aug 28, 2018 via email

@rjbou removed this from the Next milestone May 20, 2020
@samoht
Member

samoht commented Jun 18, 2020

This has been added directly in dune, see https://dune.readthedocs.io/en/stable/caching.html

@samoht closed this as completed Jun 18, 2020
@dbuenzli
Contributor

dbuenzli commented Jun 18, 2020

Isn't opam supposed to be a general-purpose development package manager? I use it to integrate C packages in projects.

@avsm
Member

avsm commented Jun 18, 2020

From the front page of opam.ocaml.org since time immemorial: "opam is a source-based package manager for OCaml <...>" (emphasis mine). While we provide hooks for binary caching and reproducibility, it is very unlikely that the opam 2.x series will ever provide binary packages directly as a supported feature. That is down to the creator of the packages to architect, as it is deeply tied into the build mechanism of the package.

So, you are welcome to opam package C packages up, as are Coq developers for their tools, and as I do for running various shell scripts, but they all need to sort out their own caching mechanisms using the hooks. Meanwhile, for the common case of compiling OCaml code in the ocaml/opam-repository, dune provides a caching mechanism that works at the level of build rules, and across opam packages. Other OCaml build systems can also provide their own caching; for example distcc-based for Makefiles should work fine.

We can revisit this decision for the opam 3.x series, but not unless some compelling new facts or approaches come to light that address @AltGr's comments above.

@dbuenzli
Contributor

From the front page of opam.ocaml.org since time immemorial: "opam is a source-based package manager for OCaml <...>" (emphasis mine).

Since quite some time now the trend has been to make opam more and more general and agnostic to OCaml. Thanks for clarifying that this no longer seems to be the case.

That is down to the creator of the packages to architect, as it is deeply tied into the build mechanism of the package.

Systems like esy or nix rather prove that this doesn't have to be the case.

@Khady
Contributor

Khady commented Jun 18, 2020

At this stage of opam's life, to me the solution would be to extract what esy/nix does and use it as an opam hook. It might require changing opam slightly so that it stores switches in a path of fixed length (switch foo would be in ~/.opam/__________foo for example). But it doesn't seem to be a big change for opam. And it should help support caching for any language.
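
The fixed-length idea can be sketched in a few lines of shell. The helper name `pad_switch_name` and the default width of 13 are hypothetical choices. The motivation given here is also an assumption: if every switch path has the same length, a path embedded in a cached file can be overwritten with another switch's path without changing the file's size.

```shell
#!/bin/sh
# pad_switch_name NAME [WIDTH]
# Left-pad NAME with underscores to exactly WIDTH characters (13 by default,
# an arbitrary choice for this sketch). Fails if NAME is already longer.
pad_switch_name() {
    name=$1; width=${2:-13}
    if [ "${#name}" -gt "$width" ]; then
        echo "switch name longer than $width characters: $name" >&2
        return 1
    fi
    padded=$name
    while [ "${#padded}" -lt "$width" ]; do
        padded=_$padded
    done
    printf '%s\n' "$padded"
}
```

A plugin could then use `~/.opam/$(pad_switch_name "$switch")` as the switch directory and refuse names that do not fit.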

@avsm
Member

avsm commented Jun 18, 2020

@dbuenzli:

Since quite some time now the trend has been to make opam more and more general and agnostic to OCaml. Thanks for clarifying that this no longer seems to be the case.

Those are your words; not what I said above at all, and an inaccurate characterisation. I specifically said that the opam 2.x series will not provide binary caching of package descriptions.

Systems like esy or nix rather prove that this hasn't to be the case.

I'm aware of that, which is why opam fully supports the interpretation of opam files into being used by these systems, such as via opam2nix and domain specific solvers. The opam 2.x client will not have this support in its core due to its focus on being a source-based and OCaml agnostic engine.

@Khady wrote:

At this stage of opam's life, to me the solution would be to extract what esy/nix does and use it as an opam hook.

@Khady feel free to extract out such logic using the opam hooks, and publish it as an opam plugin. We'd be glad to have that in the opam repository for users to try out as an option. We won't be adding any fixed-length switch hacks to the core client as the issue of relocation has been discussed at the OCaml developers meeting and we'd prefer to use the upstream solution in the opam-repository. However, nothing stops you from imposing a switch-length limit in an opam plugin, in order to activate binary caching for those switches.

@Khady
Contributor

Khady commented Jun 18, 2020

@Khady feel free to extract out such logic using the opam hooks, and publish it as an opam plugin.

That is my plan. Currently on hold because esy authors are rewriting the part which needs to be extracted. If anyone wants to work on that I could create another bounty.

@rjbou
Collaborator

rjbou commented Jun 18, 2020

@dbuenzli opam remains agnostic. Every feature we add, we take care to keep that. All OCaml or dune specific behaviors are not hardcoded but configurable.

@Khady We'd be happy to help you, or the person who wants to implement such a plugin, to go through the opam lib.

@UnixJunkie
Contributor Author

Having dune do the caching is unsatisfactory.
Not all packages use dune and I don't see why it should be the case.
For cluster users, a cache of binary packages would be very useful.
The fact that some people are translating opam packages to nix is telling.


9 participants