Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include extra-files in switch export if full is provided, on import create an overlay directory with the file contents #4040

Merged
merged 18 commits into from Mar 5, 2020

Conversation

hannesm
Copy link
Member

@hannesm hannesm commented Dec 9, 2019

Why I'm doing this? I worked on reproducible builds and tooling thereof, where it is very convenient to use opam export and import for rebuilding the same artifact. The current situation that an export contains only a subset of packages (and thus an import may be incomplete or using package metadata from elsewhere) does not fit well into the reproducible builds concept.

This PR embeds https://github.com/mirage/ocaml-base64 into opam, I'm happy to remove this and use a dependency for that (but I didn't know how to add depedencies for opam) -- or use a hex encoding if prefered.

//cc @rjbou

@hannesm hannesm mentioned this pull request Dec 9, 2019
9 tasks
Copy link
Collaborator

@rjbou rjbou left a comment

Thanks for the PR!
Indeed, keep extra files files permits to be sure to keep consistency between opam file & rest of the package.

On base64, opam-solver has in its dependencies extlib, so an base64 lib is already available. You need to add the dependency for opam-client in the dune & opam file.
To be more concise in the export file, I tend more to base64 encoding.

src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
@rjbou
Copy link
Collaborator

@rjbou rjbou commented Dec 9, 2019

Also, I would change the title to "Switch export and import: include extra_files in embedded opams" or something like that. To include all installed opam file, there is the --full option, and in the PR, only embedded opam file have their extra file included.

@hannesm hannesm changed the title export and import: include all packages, even those with extra_files. Switch export and import: include extra_files in embedded opams Dec 9, 2019
@hannesm
Copy link
Member Author

@hannesm hannesm commented Dec 9, 2019

thanks for the review, and the suggestion for the title. I managed (let's see what CI thinks) to use the extlib base64, and renamed the fields.

@AltGr
Copy link
Member

@AltGr AltGr commented Dec 11, 2019

Thanks... indeed that is something that was missing, but I couldn't find a satisfying way to do it.

I must say that I am not really convinced by the choice of x-* fields with varying names, which goes somewhat against the existing conventions for opam files. Ideally, the feature could also be used to embed extra files in opam files even for normal uses... which it seems this PR would allow, but in an undocumented and inconvenient way.

Here are a few options I could think of (this is to open discussions, I am not sure about the best design for this...):

  • limit the extra-files embedding to export/import:

    • in this case, I think the extra file content may be better outside of the package "foo" {} section, to preserve the format ; but then it seems tedious to recover the file at the right moment for non-pinned packages
    • an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source
  • extend the opam file format:

    • the files could be in an extra section or field, to not disrupt file formatting too much
    • to be more usable, it would be best if embedded text files could be put as raw text (we do already have the functions for safe escaping with """)
    • an idea closer to what we have would be to simply overload the checksum field: besides "md5=...", "sha256=..." etc., we could just add "raw=..." (or more reasonably """raw=...""") with the raw (properly escaped) file contents, and "base64=...". This seems the least disruptive, but I am not sure about the readability of the resulting files.

What do you think ?

@hannesm
Copy link
Member Author

@hannesm hannesm commented Dec 11, 2019

I'd first strive for a solution to limit the extra-files embedding for import and export. I'm keen to get to a state where an opam export produces a file that can opam import without any other sources of information (fine with me to download hashed tarballs or git commit ids). It is the minimal invasive solution I could come up with, that solves my issue.

I initially went for a string (no base64) encoding, but that didn't work (IIRC \" was mistranslated, the outputter already does some escaping), and thus decided to use base64 encoded strings instead.

limit the extra-files embedding to export/import:
in this case, I think the extra file content may be better outside of the package "foo" {} section, to preserve the format ; but then it seems tedious to recover the file at the right moment for non-pinned packages
an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source

that sounds good to me. Unfortunately I don't know the opam codebase well enough to implement this. would you be up for helping out? :)

@hannesm
Copy link
Member Author

@hannesm hannesm commented Dec 17, 2019

To further add to the discussion, I added preliminary functionality which rewrites git urls by specific commits. This is crucial for reproducibility.

It makes me wonder where/how this "import/export" is used at the moment, or rather whether the changes I want for reproducibility should be hidden behind optional arguments or maybe even a different code path.

Another change (this time in orb) cleans up the environment before proceeding further. Opam already at initialisation time captures the environment, that's why I needed to do it before the opam library is loaded somehow. It feels a bit clunky. I'm interested what you think about making the minimizing the environment for opam a default? I.e. is there a reason to not do that? only having HOME and PATH may be a bit too radical.

@rjbou
Copy link
Collaborator

@rjbou rjbou commented Jan 9, 2020

For the last solution, there are several steps:

  • add an extra-files field to switch export file
  • add the directory where they will be stored in path
  • in OpamSwitchCommand.export, fill the new fields with a scanning of the opam files to save
  • in OpamSwitchCommand.import_t, write them (filename = hash ; content = content of the file) in the given directory
  • in OpamAction.prepare_source, check & take if the file exists in the extra-files directory
  • on opam clean empty extra-files directory

You can pick the two first steps in my repo, branch exportXfiles.

@rjbou rjbou added PR: WIP PR: NEEDS UPDATE labels Jan 24, 2020
@hannesm
Copy link
Member Author

@hannesm hannesm commented Jan 27, 2020

I did the steps asked for, apart from " on opam clean empty extra-files directory" since I don't know where I should cleanup things. My impression is that OpamSwitchCommand.clear_switch already cleans the extra_files directory in OpamFilename.rmdir comp_dir.

@hannesm hannesm changed the title Switch export and import: include extra_files in embedded opams Include extra-files in switch export if freeze is provided, on import create an overlay directory with the file contents Jan 27, 2020
@hannesm hannesm requested a review from rjbou Jan 27, 2020
Copy link
Collaborator

@rjbou rjbou left a comment

Thanks for this rewriting. Some comments, & I still need to test the PR.

src/client/opamCommands.ml Outdated Show resolved Hide resolved
src/client/opamCommands.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
src/client/opamAction.ml Outdated Show resolved Hide resolved
src/client/opamAction.ml Outdated Show resolved Hide resolved
src/client/opamAction.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
@rjbou
Copy link
Collaborator

@rjbou rjbou commented Jan 27, 2020

For the opam clean part, it is in the clean, by adding the remove of the OpamPath.Switch.extra_files directory in the switch cleaning. See there

@hannesm hannesm changed the title Include extra-files in switch export if freeze is provided, on import create an overlay directory with the file contents Include extra-files in switch export if full is provided, on import create an overlay directory with the file contents Jan 27, 2020
@hannesm hannesm requested a review from rjbou Jan 27, 2020
src/client/opamAction.ml Outdated Show resolved Hide resolved
src/client/opamAction.ml Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Outdated Show resolved Hide resolved
@rjbou rjbou requested a review from AltGr Jan 30, 2020
@hannesm
Copy link
Member Author

@hannesm hannesm commented Feb 3, 2020

hi, thanks to @rjbou for refactoring, this works fine for me -- best be squashed and read as a single commit. It implements the top approach suggested by @AltGr in #4040 (comment)

  • limit the extra-files embedding to export/import:
    • in this case, I think the extra file content may be better outside of the package "foo" {} section, to preserve the format ; but then it seems tedious to recover the file at the right moment for non-pinned packages
    • an idea could be to just have a field with a hash → contents mapping, that gets unpacked to a temporary cache when the import is done. Then, similar to what is done in the proposition (and for extra_sources:), check there to find the file in prepare_package_source

About using OpamFile.OPAM.get_extra_files: if it is used directly in prepare_package_sources, the import won't work since get_extra_files returns the empty list if the files/ subdirectory is not present. Another approach may be to modify get_extra_sources to use the switch-local hashmap -- but get_extra_sources has more call sites than OpamAction.prepare_package_source, and would need switch arguments.

Previously (before my last commit), each extra_file was first looked up in the switch-local hashmap, and uses the files/yyy as fallback. With a0a2561, the behaviour changes slightly: either use all extra_files from files/, or if metadata_dir is absent, use the switch-local hashmap, over the behaviour of this PR.

Please let me know what you think, and how to progress. I'd be happy to get this merged before 2.1 (/cc @AltGr).

My testing of this PR was done as follows:

@rjbou
Copy link
Collaborator

@rjbou rjbou commented Feb 3, 2020

When I was talking about using the get_extra_files function it was not to only uses it, but more to avoid code duplication between get_extra_files code and the one in your first commits. The first check in the cache hashmap is needed to be kept here, and as you said, having each files/* files as a fallback is the way to go.

src/client/opamAction.ml Outdated Show resolved Hide resolved
@rjbou rjbou added PR: LGTM and removed PR: NEEDS UPDATE PR: WIP labels Feb 18, 2020
rjbou
rjbou approved these changes Feb 18, 2020
src/format/opamPath.ml Outdated Show resolved Hide resolved
src/format/opamPath.ml Outdated Show resolved Hide resolved
src/format/opamPath.ml Outdated Show resolved Hide resolved
src/format/opamPath.mli Outdated Show resolved Hide resolved
src/client/opamSwitchCommand.ml Show resolved Hide resolved
@rjbou
Copy link
Collaborator

@rjbou rjbou commented Mar 4, 2020

updated & rebased

@rjbou rjbou merged commit ccd1ab6 into ocaml:master Mar 5, 2020
1 of 2 checks passed
@hannesm hannesm deleted the full-export-import branch Mar 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants