Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide deterministic output by using a Map.Make(String) instead of a hashtable #51

Merged
merged 1 commit into from Dec 14, 2019

Conversation

hannesm
Copy link
Member

@hannesm hannesm commented Dec 7, 2019

this is crucial for reproducible builds of unikernels that use crunch

… hashtable

this is crucial for reproducible builds of unikernels that use crunch
@hannesm
Copy link
Member Author

@hannesm hannesm commented Dec 9, 2019

//cc @mirage/core any objections to this change (including the API change to avoid global mutable state)? It'd be great to get this released soon

@samoht
Copy link
Member

@samoht samoht commented Dec 9, 2019

LGTM

What about the date? Does it still make sense to include it in the generated file?

@hannesm
Copy link
Member Author

@hannesm hannesm commented Dec 9, 2019

the date embedded into a crunched module is either Unix.time () or SOURCE_DATE_EPOCH, which is fine with me (and the reproducible builds people https://reproducible-builds.org/specs/source-date-epoch/).

it is nice to have a last_modified (as required in our mirage-kv interface). another solution would be to use the modification time of individual files and directories (but then, reproducibility is depending on the mtime), thus the current approach is fine with me.

@avsm
Copy link
Member

@avsm avsm commented Dec 9, 2019

Definitely fine by me

@hannesm
Copy link
Member Author

@hannesm hannesm commented Dec 14, 2019

I checked the clients of ocaml-crunch (which are released to opam), and none of them use the API of crunch directly (this PR changes that API), but execute the command-line utility.

In addition, it is noteworthy that if you'd used the API before and wanted to crunch multiple separate artifacts, you'd get the wrong output after the first one, since the hash tables were not cleared between runs. This PR fixes this as well (by making the state explicit).

@hannesm hannesm merged commit 6d486bb into mirage:master Dec 14, 2019
1 check passed
hannesm added a commit to hannesm/opam-repository that referenced this issue Dec 14, 2019
CHANGES:

* Make crunch reproducible: use a Map.Make(String) instead of Hashtbl.
  Iterating over the former is guaranteed to be sorted over the keys.
  (mirage/ocaml-crunch#51 @hannesm)
* The state (Map.t) is passed explicit to `scan_file`, `output_implementation`
  and `walk_directory_tree`. The value `empty` is provided to construct an
  empty `t`. (mirage/ocaml-crunch#51 @hannesm)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants