Switch export and import: include extra_files in embedded opams #4040

Open
wants to merge 4 commits into
base: master
from

Conversation

@hannesm
Member

hannesm commented Dec 9, 2019

Why am I doing this? I have been working on reproducible builds and related tooling, where it is very convenient to use opam export and import to rebuild the same artifact. Currently an export contains only a subset of packages (so an import may be incomplete or may use package metadata from elsewhere), which does not fit the reproducible-builds concept well.

This PR embeds https://github.com/mirage/ocaml-base64 into opam. I'm happy to remove this and use a dependency instead (I didn't know how to add dependencies to opam), or to use a hex encoding if preferred.

//cc @rjbou

if extra_files is used, embed the files as base64 encoded data in the opam
file (key x-hash, value the base64 encoded data)

on import, lookup x-hash in opam file, base64 decode, and validate the hash

OpamSwitchCommand.export: optionally pass switch
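
A minimal sketch of the embed-and-validate round trip described in the commit messages above. This is not the PR's code: the stdlib `Digest` module (MD5) stands in for `OpamHash`, a small hand-written base64 codec stands in for the base64 library, and `embed`, `extract` and the `x-` field-name shape are hypothetical names chosen for illustration.

```ocaml
(* Sketch only: stdlib Digest (MD5) and a hand-written base64 codec stand in
   for OpamHash and the base64 library used by the PR. *)
let alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

let b64_encode s =
  let n = String.length s in
  let buf = Buffer.create ((n + 2) / 3 * 4) in
  let byte i = if i < n then Char.code s.[i] else 0 in
  let i = ref 0 in
  while !i < n do
    (* Pack up to three input bytes into 24 bits, emit four 6-bit symbols. *)
    let x = (byte !i lsl 16) lor (byte (!i + 1) lsl 8) lor byte (!i + 2) in
    Buffer.add_char buf alphabet.[(x lsr 18) land 63];
    Buffer.add_char buf alphabet.[(x lsr 12) land 63];
    Buffer.add_char buf
      (if !i + 1 < n then alphabet.[(x lsr 6) land 63] else '=');
    Buffer.add_char buf (if !i + 2 < n then alphabet.[x land 63] else '=');
    i := !i + 3
  done;
  Buffer.contents buf

(* Raises Not_found on characters outside the base64 alphabet. *)
let b64_decode s =
  let buf = Buffer.create (String.length s * 3 / 4) in
  let acc = ref 0 and bits = ref 0 in
  String.iter
    (fun c ->
      if c <> '=' then begin
        acc := ((!acc lsl 6) lor String.index alphabet c) land 0xFFFFFF;
        bits := !bits + 6;
        if !bits >= 8 then begin
          bits := !bits - 8;
          Buffer.add_char buf (Char.chr ((!acc lsr !bits) land 0xFF))
        end
      end)
    s;
  Buffer.contents buf

(* Export side: hypothetical field name and base64 value for one extra file. *)
let embed content =
  let hash = Digest.to_hex (Digest.string content) in
  ("x-" ^ hash, b64_encode content)

(* Import side: look the field up, decode it, and accept the content only
   if its hash matches the expected one. *)
let extract ~expected_hash fields =
  match List.assoc_opt ("x-" ^ expected_hash) fields with
  | None -> None
  | Some b64 ->
    let content = b64_decode b64 in
    if Digest.to_hex (Digest.string content) = expected_hash
    then Some content
    else None
```

The hash check on import is what makes the embedded copy trustworthy: a field whose decoded content does not match the hash recorded in the opam file is rejected rather than written to disk.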
@hannesm hannesm mentioned this pull request Dec 9, 2019
3 of 9 tasks complete
@hannesm hannesm force-pushed the hannesm:full-export-import branch from cfaae0d to 4078ad8 Dec 9, 2019
Collaborator

rjbou left a comment

Thanks for the PR!
Indeed, keeping the extra files ensures consistency between the opam file and the rest of the package.

On base64: opam-solver already depends on extlib, so a base64 library is already available. You need to add the dependency for opam-client in the dune and opam files.
To keep the export file concise, I lean towards base64 encoding.

OpamHash.check_file (OpamFilename.to_string file) hash then
let value = OpamFilename.read file in
let value' = B64.encode_string value in
let name = "x-" ^ OpamHash.contents hash in

@rjbou

rjbou Dec 9, 2019

Collaborator

Storing the file in an extension field is a good idea, but using only the hash as the field name makes the file hard for humans to read. Maybe use x-extrafiles as a prefix, or even a single field containing pairs (two-element lists) of (filename, content).

@hannesm

hannesm Dec 9, 2019

Author Member

It is x-extra-file-HASH now. I'd prefer not to use a single field containing pairs, since that means I'd need to write more code. I can't use x-filename since a filename is not necessarily a valid identifier.

@rjbou

Collaborator

rjbou commented Dec 9, 2019

Also, I would change the title to "Switch export and import: include extra_files in embedded opams" or something like that. To include all installed opam files there is the --full option, and in this PR only embedded opam files have their extra files included.

@hannesm hannesm changed the title export and import: include all packages, even those with extra_files. Switch export and import: include extra_files in embedded opams Dec 9, 2019
@hannesm

Member Author

hannesm commented Dec 9, 2019

Thanks for the review and the title suggestion. I managed (let's see what CI thinks) to use the extlib base64, and renamed the fields.

@AltGr

Member

AltGr commented Dec 11, 2019

Thanks... indeed that is something that was missing, but I couldn't find a satisfying way to do it.

I must say that I am not really convinced by the choice of x-* fields with varying names, which goes somewhat against the existing conventions for opam files. Ideally, the feature could also be used to embed extra files in opam files even for normal uses... which it seems this PR would allow, but in an undocumented and inconvenient way.

Here are a few options I could think of (this is to open discussions, I am not sure about the best design for this...):

  • limit the extra-files embedding to export/import:

    • in this case, I think the extra file content may be better placed outside the package "foo" {} section, to preserve the format; but then it seems tedious to recover the file at the right moment for non-pinned packages
    • an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source
  • extend the opam file format:

    • the files could be in an extra section or field, to not disrupt file formatting too much
    • to be more usable, it would be best if embedded text files could be put as raw text (we do already have the functions for safe escaping with """)
    • an idea closer to what we have would be to simply overload the checksum field: besides "md5=...", "sha256=..." etc., we could just add "raw=..." (or more reasonably """raw=...""") with the raw (properly escaped) file contents, and "base64=...". This seems the least disruptive, but I am not sure about the readability of the resulting files.

What do you think?
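
To make the last option concrete, here is a rough sketch of how an overloaded checksum entry might be told apart by its prefix. The `checksum` type and `parse_checksum` function are hypothetical and not part of opam; only the `md5`, `sha256` and `sha512` prefixes correspond to hash kinds opam already accepts, while `raw` and `base64` are the additions floated above.

```ocaml
(* Hypothetical representation of an overloaded checksum entry. *)
type checksum =
  | Hash of string * string  (* hash kind ("md5", "sha256", ...) and digest *)
  | Raw of string            (* file contents embedded as escaped text *)
  | Base64 of string         (* file contents embedded as base64 *)

(* Split on the first '=' only, so the payload may itself contain '='
   (base64 padding, or raw text). *)
let parse_checksum s =
  match String.index_opt s '=' with
  | None -> None
  | Some i ->
    let kind = String.sub s 0 i in
    let value = String.sub s (i + 1) (String.length s - i - 1) in
    (match kind with
     | "raw" -> Some (Raw value)
     | "base64" -> Some (Base64 value)
     | "md5" | "sha256" | "sha512" -> Some (Hash (kind, value))
     | _ -> None)
```

With this shape, a consumer that only understands hashes can ignore `Raw` and `Base64` entries, which is part of why this option looks the least disruptive.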

@hannesm

Member Author

hannesm commented Dec 11, 2019

I'd first aim for a solution that limits the extra-files embedding to import and export. I'm keen to get to a state where opam export produces a file that opam import can consume without any other source of information (downloading hashed tarballs or git commit ids is fine with me). It is the least invasive solution I could come up with that solves my issue.

I initially went for a string (no base64) encoding, but that didn't work (IIRC \" was mistranslated, the outputter already does some escaping), and thus decided to use base64 encoded strings instead.

limit the extra-files embedding to export/import:
in this case, I think the extra file content may be better outside of the package "foo" {} section, to preserve the format ; but then it seems tedious to recover the file at the right moment for non-pinned packages
an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source

That sounds good to me. Unfortunately I don't know the opam codebase well enough to implement this. Would you be up for helping out? :)

@hannesm

Member Author

hannesm commented Dec 17, 2019

To further add to the discussion, I added preliminary functionality which rewrites git URLs to point at specific commits. This is crucial for reproducibility.
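
As an illustration of the idea only (this is not the code added to the PR), pinning a git URL could amount to replacing whatever follows `#` in the opam-style URL with the exact commit id. The function name `pin_git_url` and the example URL in the usage note are made up for this sketch.

```ocaml
(* Replace any "#ref" suffix of a git URL with an exact commit id, so the
   exported URL no longer depends on where a branch happens to point. *)
let pin_git_url url commit =
  let base =
    match String.index_opt url '#' with
    | Some i -> String.sub url 0 i
    | None -> url
  in
  base ^ "#" ^ commit
```

For example, `pin_git_url "git+https://example.org/foo.git#master" "abc123"` drops the `master` reference and appends `abc123` instead; a URL without any `#` simply gets the commit appended.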

It makes me wonder where/how this "import/export" is used at the moment, or rather whether the changes I want for reproducibility should be hidden behind optional arguments or maybe even a different code path.

Another change (this time in orb) cleans up the environment before proceeding further. Opam captures the environment at initialisation time, which is why I needed to do this before the opam library is loaded. It feels a bit clunky. I'm interested in what you think about making a minimized environment the default for opam, i.e. is there a reason not to do that? Only having HOME and PATH may be a bit too radical.
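
For reference, a minimal sketch of what "only HOME and PATH" means in terms of an environment array. This is not orb's implementation; `keep`, `var_name` and `minimal_env` are hypothetical names, and the allow-list is exactly the radical one mentioned above.

```ocaml
(* Variables allowed to survive; everything else is dropped. *)
let keep = [ "HOME"; "PATH" ]

(* Name part of a "NAME=value" binding. *)
let var_name binding =
  match String.index_opt binding '=' with
  | Some i -> String.sub binding 0 i
  | None -> binding

(* Filter an environment (as returned by Unix.environment) down to the
   allow-listed variables, preserving their order. *)
let minimal_env env =
  Array.of_list
    (List.filter (fun b -> List.mem (var_name b) keep) (Array.to_list env))
```

With the unix library linked, the result could be passed as the last argument of `Unix.execve` so that the re-executed process starts from the reduced environment before any opam library code captures it.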

@rjbou

Collaborator

rjbou commented Jan 9, 2020

For the last solution, there are several steps:

  • add an extra-files field to the switch export file
  • add the directory where they will be stored to the path definitions
  • in OpamSwitchCommand.export, fill the new field by scanning the opam files to save
  • in OpamSwitchCommand.import_t, write them (filename = hash; content = content of the file) in the given directory
  • in OpamAction.prepare_source, check the extra-files directory and take the file from there if it exists
  • on opam clean, empty the extra-files directory

You can pick up the first two steps from my repo, branch exportXfiles.
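
The import, lookup and clean steps listed above can be sketched with plain file operations. This is only an outline under assumptions: the function names are hypothetical, the directory is passed in explicitly rather than derived from opam's path definitions, and no hash validation is shown.

```ocaml
(* Import side: store each embedded file under its hash in the directory. *)
let write_extra_files ~dir files =
  List.iter
    (fun (hash, content) ->
      let oc = open_out_bin (Filename.concat dir hash) in
      output_string oc content;
      close_out oc)
    files

(* Source-preparation side: return the stored file for this hash, if any. *)
let find_extra_file ~dir hash =
  let path = Filename.concat dir hash in
  if Sys.file_exists path then Some path else None

(* Clean-up side: remove every stored file, leaving the directory empty. *)
let clean_extra_files ~dir =
  Array.iter (fun f -> Sys.remove (Filename.concat dir f)) (Sys.readdir dir)
```

Keying the files by hash means the lookup during source preparation needs nothing beyond the hash already recorded in the opam file's extra-files entry.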

@rjbou rjbou added the PR: WIP label Jan 24, 2020