Feature request: reuse extracted sources #4056

Open
Armael opened this issue Jan 9, 2020 · 3 comments
Comments

@Armael
Member

Armael commented Jan 9, 2020

If I do "opam install X1...Xn", opam starts by downloading sources for packages X1..Xn, or fetching them from the download cache. Then, each archive (Xi.tar.gz) is extracted into <opamroot>/<switch>/.opam-switch/sources/<pkg.ver>.

Even if all the archives are in the download cache, extracting all the archives takes a lot of time if there are many packages.

If one kills the "opam install" invocation (with ^C), either during the "archive unpacking phase" or during the build, then restarting it will not reuse the already-extracted sources; instead, it starts by unpacking all the archives again from scratch.

To get more incrementality, one (brittle) solution would be to use whatever files there are in a sources directory if it is present, instead of unpacking the archive. However, if opam is killed while unpacking an archive, then the corresponding sources directory will be left in a corrupted state (most likely missing some files), and will corrupt the next run of opam.

A more robust solution would be to memorize a mapping <directory where sources are extracted> -> <hash of the archive they come from>. A new entry would be added to this mapping after an archive has been successfully extracted (and only after it has been fully extracted). Then, after having downloaded an archive, and before extracting it, one would be able to check whether the source directory already corresponds to that archive -- in which case one could simply avoid extracting the archive again.
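To make the idea concrete, here is a minimal sketch in Python (for illustration only; opam itself is written in OCaml and none of this is opam code). The function names, the JSON manifest file, and the use of SHA-256 are all assumptions of the sketch, and where the mapping would actually live is left open.

```python
import hashlib
import json
import os
import shutil
import tarfile
from pathlib import Path


def archive_hash(archive: Path) -> str:
    """Return the SHA-256 hex digest of the archive (hash choice is an assumption)."""
    h = hashlib.sha256()
    with open(archive, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def load_manifest(manifest: Path) -> dict:
    """Load the <source directory> -> <archive hash> mapping, or an empty one."""
    if manifest.exists():
        return json.loads(manifest.read_text())
    return {}


def save_manifest(manifest: Path, mapping: dict) -> None:
    """Write the mapping to a temporary file, then rename it into place."""
    tmp = manifest.with_suffix(".tmp")
    tmp.write_text(json.dumps(mapping))
    os.replace(tmp, manifest)


def extract_if_needed(archive: Path, dest: Path, manifest: Path) -> bool:
    """Extract `archive` into `dest` unless the manifest says it is already there.

    Returns True if an extraction was performed, False if it was skipped.
    """
    mapping = load_manifest(manifest)
    digest = archive_hash(archive)
    if mapping.get(str(dest)) == digest and dest.is_dir():
        return False  # dest was fully extracted from this exact archive

    # Forget any stale entry before touching the directory, so an interrupted
    # run can never leave a manifest entry pointing at partial sources.
    if mapping.pop(str(dest), None) is not None:
        save_manifest(manifest, mapping)
    shutil.rmtree(dest, ignore_errors=True)
    dest.mkdir(parents=True)
    with tarfile.open(archive) as tar:
        tar.extractall(dest)

    # Record the entry only after the archive has been fully extracted.
    mapping[str(dest)] = digest
    save_manifest(manifest, mapping)
    return True
```

Note that this sketch only tracks whether extraction completed; it would not notice a sources directory that was modified afterwards.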

Does that sound like a reasonable idea in principle, and would this "more robust" solution work?
If yes, where would this mapping be stored?

@rjbou
Collaborator

rjbou commented Jan 27, 2020

related #3741

@rjbou rjbou added this to To do in Feature Wish Jun 26, 2020
@dra27
Member

dra27 commented Jul 9, 2021

This feels like a complicated solution to a (relatively) small problem. If I understand correctly, the core idea here is that if opam was aborted during an installation, then some source directories will have been fully extracted but not yet built, and so could be reused by a subsequent install command.

How about this scheme:

  1. Package archive is extracted to .opam-switch/sources/<archive-hash>.tmp
  2. At the end of the extraction, .opam-switch/sources/<archive-hash>.tmp is renamed to .opam-switch/sources/<archive-hash>
  3. Just before building, .opam-switch/sources/<archive-hash> is renamed to .opam-switch/sources/<pkg.ver>, as before

The idea is that step 2 is atomic, which means that before extracting the tarball (whose hash opam knows), opam can check whether .opam-switch/sources/<archive-hash> exists and, if so, skip extraction. .opam-switch/sources/<archive-hash>.tmp and .opam-switch/sources/<pkg.ver> will always be in an unknown state and so would be erased by any subsequent install invocation. In this scheme, opam clean -s would need updating to clean both <archive-hash> and <archive-hash>.tmp directories.
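The three steps could be sketched roughly as follows, in Python for illustration only (opam is written in OCaml, and this is not its code). The function names `prepare_sources` and `claim_for_build` are hypothetical, the hash is passed in because opam already knows it, and the sketch assumes the rename happens within a single filesystem, where a directory rename is atomic on POSIX.

```python
import os
import shutil
import tarfile
from pathlib import Path


def prepare_sources(archive: Path, archive_hash: str, sources: Path) -> Path:
    """Steps 1 and 2: return sources/<archive-hash>, extracting only if needed."""
    done = sources / archive_hash
    tmp = sources / (archive_hash + ".tmp")
    if done.is_dir():
        # A previous run got as far as the rename, so the extraction
        # is known to be complete and can be reused as-is.
        return done

    # Anything under <archive-hash>.tmp is in an unknown state: erase it.
    shutil.rmtree(tmp, ignore_errors=True)
    tmp.mkdir(parents=True)
    with tarfile.open(archive) as tar:
        tar.extractall(tmp)

    # Step 2: the rename either happens entirely or not at all.
    os.rename(tmp, done)
    return done


def claim_for_build(archive_hash: str, pkg_ver: str, sources: Path) -> Path:
    """Step 3: move the pristine sources to sources/<pkg.ver> just before building."""
    build = sources / pkg_ver
    # A leftover <pkg.ver> directory may hold a half-finished build: erase it.
    shutil.rmtree(build, ignore_errors=True)
    os.rename(sources / archive_hash, build)
    return build
```

An interrupted run leaves behind either a `.tmp` directory (discarded next time) or a complete `<archive-hash>` directory (reused next time), so no extra bookkeeping file is needed.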

I think this scheme achieves the same thing but with two benefits: it's simpler (no hashing, no stored configuration), and it's also always used (i.e. it skips steps if there's an existing extracted directory, rather than having to activate extra steps to check whether it can be used).

@Armael
Member Author

Armael commented Jul 9, 2021

This sounds much better than what I was proposing indeed! I would be very happy with that solution.
