Skip to content

Include extra-files in switch export if full is provided, on import create an overlay directory with the file contents#4040

Merged
rjbou merged 18 commits into
ocaml:masterfrom
hannesm:full-export-import
Mar 5, 2020
Merged

Include extra-files in switch export if full is provided, on import create an overlay directory with the file contents#4040
rjbou merged 18 commits into
ocaml:masterfrom
hannesm:full-export-import

Conversation

@hannesm

@hannesm hannesm commented Dec 9, 2019

Copy link
Copy Markdown
Member

Why I'm doing this? I worked on reproducible builds and tooling thereof, where it is very convenient to use opam export and import for rebuilding the same artifact. The current situation that an export contains only a subset of packages (and thus an import may be incomplete or using package metadata from elsewhere) does not fit well into the reproducible builds concept.

This PR embeds https://github.com/mirage/ocaml-base64 into opam, I'm happy to remove this and use a dependency for that (but I didn't know how to add depedencies for opam) -- or use a hex encoding if prefered.

//cc @rjbou

@hannesm hannesm mentioned this pull request Dec 9, 2019
9 tasks
@hannesm hannesm force-pushed the full-export-import branch from cfaae0d to 4078ad8 Compare December 9, 2019 13:30

@rjbou rjbou left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
Indeed, keep extra files files permits to be sure to keep consistency between opam file & rest of the package.

On base64, opam-solver has in its dependencies extlib, so an base64 lib is already available. You need to add the dependency for opam-client in the dune & opam file.
To be more concise in the export file, I tend more to base64 encoding.

Comment thread src/client/opamSwitchCommand.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
@rjbou

rjbou commented Dec 9, 2019

Copy link
Copy Markdown
Collaborator

Also, I would change the title to "Switch export and import: include extra_files in embedded opams" or something like that. To include all installed opam file, there is the --full option, and in the PR, only embedded opam file have their extra file included.

@hannesm hannesm changed the title export and import: include all packages, even those with extra_files. Switch export and import: include extra_files in embedded opams Dec 9, 2019
@hannesm

hannesm commented Dec 9, 2019

Copy link
Copy Markdown
Member Author

thanks for the review, and the suggestion for the title. I managed (let's see what CI thinks) to use the extlib base64, and renamed the fields.

@AltGr

AltGr commented Dec 11, 2019

Copy link
Copy Markdown
Member

Thanks... indeed that is something that was missing, but I couldn't find a satisfying way to do it.

I must say that I am not really convinced by the choice of x-* fields with varying names, which goes somewhat against the existing conventions for opam files. Ideally, the feature could also be used to embed extra files in opam files even for normal uses... which it seems this PR would allow, but in an undocumented and inconvenient way.

Here are a few options I could think of (this is to open discussions, I am not sure about the best design for this...):

  • limit the extra-files embedding to export/import:

    • in this case, I think the extra file content may be better outside of the package "foo" {} section, to preserve the format ; but then it seems tedious to recover the file at the right moment for non-pinned packages
    • an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source
  • extend the opam file format:

    • the files could be in an extra section or field, to not disrupt file formatting too much
    • to be more usable, it would be best if embedded text files could be put as raw text (we do already have the functions for safe escaping with """)
    • an idea closer to what we have would be to simply overload the checksum field: besides "md5=...", "sha256=..." etc., we could just add "raw=..." (or more reasonably """raw=...""") with the raw (properly escaped) file contents, and "base64=...". This seems the least disruptive, but I am not sure about the readability of the resulting files.

What do you think ?

@hannesm

hannesm commented Dec 11, 2019

Copy link
Copy Markdown
Member Author

I'd first strive for a solution to limit the extra-files embedding for import and export. I'm keen to get to a state where an opam export produces a file that can opam import without any other sources of information (fine with me to download hashed tarballs or git commit ids). It is the minimal invasive solution I could come up with, that solves my issue.

I initially went for a string (no base64) encoding, but that didn't work (IIRC \" was mistranslated, the outputter already does some escaping), and thus decided to use base64 encoded strings instead.

limit the extra-files embedding to export/import:
in this case, I think the extra file content may be better outside of the package "foo" {} section, to preserve the format ; but then it seems tedious to recover the file at the right moment for non-pinned packages
an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source

that sounds good to me. Unfortunately I don't know the opam codebase well enough to implement this. would you be up for helping out? :)

@hannesm

hannesm commented Dec 17, 2019

Copy link
Copy Markdown
Member Author

To further add to the discussion, I added preliminary functionality which rewrites git urls by specific commits. This is crucial for reproducibility.

It makes me wonder where/how this "import/export" is used at the moment, or rather whether the changes I want for reproducibility should be hidden behind optional arguments or maybe even a different code path.

Another change (this time in orb) cleans up the environment before proceeding further. Opam already at initialisation time captures the environment, that's why I needed to do it before the opam library is loaded somehow. It feels a bit clunky. I'm interested what you think about making the minimizing the environment for opam a default? I.e. is there a reason to not do that? only having HOME and PATH may be a bit too radical.

@rjbou

rjbou commented Jan 9, 2020

Copy link
Copy Markdown
Collaborator

For the last solution, there are several steps:

  • add an extra-files field to switch export file
  • add the directory where they will be stored in path
  • in OpamSwitchCommand.export, fill the new fields with a scanning of the opam files to save
  • in OpamSwitchCommand.import_t, write them (filename = hash ; content = content of the file) in the given directory
  • in OpamAction.prepare_source, check & take if the file exists in the extra-files directory
  • on opam clean empty extra-files directory

You can pick the two first steps in my repo, branch exportXfiles.

@rjbou rjbou added PR: WIP Not for merge at this stage PR: NEEDS UPDATE labels Jan 24, 2020
@hannesm hannesm force-pushed the full-export-import branch from 0e953f7 to 2d53cc4 Compare January 27, 2020 18:09
@hannesm

hannesm commented Jan 27, 2020

Copy link
Copy Markdown
Member Author

I did the steps asked for, apart from " on opam clean empty extra-files directory" since I don't know where I should cleanup things. My impression is that OpamSwitchCommand.clear_switch already cleans the extra_files directory in OpamFilename.rmdir comp_dir.

@hannesm hannesm changed the title Switch export and import: include extra_files in embedded opams Include extra-files in switch export if freeze is provided, on import create an overlay directory with the file contents Jan 27, 2020
@hannesm hannesm requested a review from rjbou January 27, 2020 18:17

@rjbou rjbou left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this rewriting. Some comments, & I still need to test the PR.

Comment thread src/client/opamCommands.ml Outdated
Comment thread src/client/opamCommands.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
Comment thread src/client/opamAction.ml Outdated
Comment thread src/client/opamAction.ml Outdated
Comment thread src/client/opamAction.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
@rjbou

rjbou commented Jan 27, 2020

Copy link
Copy Markdown
Collaborator

For the opam clean part, it is in the clean, by adding the remove of the OpamPath.Switch.extra_files directory in the switch cleaning. See there

@hannesm hannesm changed the title Include extra-files in switch export if freeze is provided, on import create an overlay directory with the file contents Include extra-files in switch export if full is provided, on import create an overlay directory with the file contents Jan 27, 2020
@hannesm hannesm requested a review from rjbou January 27, 2020 20:51
Comment thread src/client/opamAction.ml Outdated
Comment thread src/client/opamAction.ml Outdated
Comment thread src/client/opamSwitchCommand.ml Outdated
@rjbou rjbou requested a review from AltGr January 30, 2020 09:59
@hannesm

hannesm commented Feb 3, 2020

Copy link
Copy Markdown
Member Author

hi, thanks to @rjbou for refactoring, this works fine for me -- best be squashed and read as a single commit. It implements the top approach suggested by @AltGr in #4040 (comment)

  • limit the extra-files embedding to export/import:
    • in this case, I think the extra file content may be better outside of the package "foo" {} section, to preserve the format ; but then it seems tedious to recover the file at the right moment for non-pinned packages
    • an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source

About using OpamFile.OPAM.get_extra_files: if it is used directly in prepare_package_sources, the import won't work since get_extra_files returns the empty list if the files/ subdirectory is not present. Another approach may be to modify get_extra_sources to use the switch-local hashmap -- but get_extra_sources has more call sites than OpamAction.prepare_package_source, and would need switch arguments.

Previously (before my last commit), each extra_file was first looked up in the switch-local hashmap, and uses the files/yyy as fallback. With a0a2561, the behaviour changes slightly: either use all extra_files from files/, or if metadata_dir is absent, use the switch-local hashmap, over the behaviour of this PR.

Please let me know what you think, and how to progress. I'd be happy to get this merged before 2.1 (/cc @AltGr).

My testing of this PR was done as follows:

@rjbou

rjbou commented Feb 3, 2020

Copy link
Copy Markdown
Collaborator

When I was talking about using the get_extra_files function it was not to only uses it, but more to avoid code duplication between get_extra_files code and the one in your first commits. The first check in the cache hashmap is needed to be kept here, and as you said, having each files/* files as a fallback is the way to go.

Comment thread src/client/opamAction.ml Outdated
@rjbou rjbou added PR: LGTM and removed PR: NEEDS UPDATE PR: WIP Not for merge at this stage labels Feb 18, 2020
Comment thread src/format/opamPath.ml Outdated
Comment thread src/format/opamPath.ml Outdated
Comment thread src/format/opamPath.ml Outdated
Comment thread src/format/opamPath.mli
Comment thread src/client/opamSwitchCommand.ml
hannesm and others added 18 commits March 4, 2020 17:56
if extra_files is used, embed the files as base64 encoded data in the opam
file (key x-hash, value the base64 encoded data)

on import, lookup x-hash in opam file, base64 decode, and validate the hash

OpamSwitchCommand.export: optionally pass switch
Co-Authored-By: R. Boujbel <rjbou@ocamlpro.com>
Co-Authored-By: Louis Gesbert <louis.gesbert@ocamlpro.com>
@rjbou rjbou force-pushed the full-export-import branch from b9db138 to 6ea4494 Compare March 4, 2020 16:57
@rjbou

rjbou commented Mar 4, 2020

Copy link
Copy Markdown
Collaborator

updated & rebased

@rjbou rjbou merged commit ccd1ab6 into ocaml:master Mar 5, 2020
@hannesm hannesm deleted the full-export-import branch March 5, 2020 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants