Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide deterministic output by using a Map.Make(String) instead of a hashtable #51

Merged
merged 1 commit into from Dec 14, 2019

Conversation

@hannesm
Copy link
Member

hannesm commented Dec 7, 2019

this is crucial for reproducible builds of unikernels that use crunch

… hashtable

this is crucial for reproducible builds of unikernels that use crunch
@hannesm hannesm force-pushed the hannesm:repro branch from 2429406 to 9e5be9b Dec 7, 2019
@hannesm

This comment has been minimized.

Copy link
Member Author

hannesm commented Dec 9, 2019

//cc @mirage/core any objections to this change (including the API change to avoid global mutable state)? It'd be great to get this released soon

@samoht

This comment has been minimized.

Copy link
Member

samoht commented Dec 9, 2019

LGTM

What about the date? Does it still make sense to include it in the generated file?

@hannesm

This comment has been minimized.

Copy link
Member Author

hannesm commented Dec 9, 2019

the date embedded into a crunched module is either Unix.time () or SOURCE_DATE_EPOCH, which is fine with me (and the reproducible builds people https://reproducible-builds.org/specs/source-date-epoch/).

it is nice to have a last_modified (as required in our mirage-kv interface). another solution would be to use the modification time of individual files and directories (but then, reproducibility is depending on the mtime), thus the current approach is fine with me.

@avsm

This comment has been minimized.

Copy link
Member

avsm commented Dec 9, 2019

Definitely fine by me

@hannesm

This comment has been minimized.

Copy link
Member Author

hannesm commented Dec 14, 2019

I checked the clients of ocaml-crunch (which are released to opam), and none of them use the API of crunch directly (this PR changes that API), but execute the command-line utility.

In addition, it is noteworthy that if you'd used the API before and wanted to crunch multiple separate artifacts, you'd get the wrong output after the first one, since the hash tables were not cleared between runs. This PR fixes this as well (by making the state explicit).

@hannesm hannesm merged commit 6d486bb into mirage:master Dec 14, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
hannesm added a commit to hannesm/opam-repository that referenced this pull request Dec 14, 2019
CHANGES:

* Make crunch reproducible: use a Map.Make(String) instead of Hashtbl.
  Iterating over the former is guaranteed to be sorted over the keys.
  (mirage/ocaml-crunch#51 @hannesm)
* The state (Map.t) is passed explicit to `scan_file`, `output_implementation`
  and `walk_directory_tree`. The value `empty` is provided to construct an
  empty `t`. (mirage/ocaml-crunch#51 @hannesm)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.