Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SHA computation faster by using ocaml-sha #5042

Merged
merged 5 commits into from
Apr 30, 2022

Conversation

kit-ty-kate
Copy link
Member

In the spirit of #3217 and #4823 here is a PR that removes the runtime dependency on openssl.

In the expected range of use of opam it also makes installation and updates faster than before.
Here is the benchmark i used to show both correctness and speed:

let fmt = Printf.sprintf

let digestif file =
  let ci = open_in_bin file in
  let sha = Digestif.SHA512.init () in
  let buf = Bytes.create 1024 in
  let rec aux sha =
    match input ci buf 0 1024 with
    | 0 | exception End_of_file -> Digestif.SHA512.get sha
    | len -> aux (Digestif.SHA512.feed_bytes sha ~len buf)
  in
  let sha = aux sha in
  close_in ci;
  Digestif.SHA512.to_hex sha

let time name f =
  let timer = OpamConsole.timer () in
  let sha = f () in
  print_endline (fmt "%s calculated %s in %fs" name sha (timer ()))

let () =
  let file = Sys.argv.(1) in
  OpamCoreConfig.init ~debug_level:1 ();
  time "digestif" (fun () -> digestif file);
  time "ocaml-sha" (fun () -> Sha512.to_hex (Sha512.file file));
  time "openssl" (fun () -> OpamHash.to_string (OpamHash.compute ~kind:`SHA512 file));
  time "opam-core" (fun () -> OpamSHA.hash `SHA512 file);
  ()

Using these typical files:

opam-repository/packages/coq/coq.8.13.0/files/coq.install: 172B
opam-repository/packages/ocaml-base-compiler/ocaml-base-compiler.4.02.1/files/fix-gcc10.patch: 756B
opam-repository/packages/ocaml-base-compiler/ocaml-base-compiler.4.02.1/files/gpr1330.patch: 2.0K
irmin-2.10.2.tbz: 285K

and this large atypical file:

FreeBSD-13.0-RELEASE-amd64-bootonly.iso: 347M

Here is the result I get for SHA512:

$ ./a.out opam-repository/packages/coq/coq.8.13.0/files/coq.install
digestif calculated b501737b4dbd22adc1c0377d744448056fb1dc493caf72... in 0.001532s
ocaml-sha calculated b501737b4dbd22adc1c0377d744448056fb1dc493caf72c... in 0.000036s
openssl calculated b501737b4dbd22adc1c0377d744448056fb1dc493caf72c... in 0.003682s
opam-core calculated b501737b4dbd22adc1c0377d744448056fb1dc493caf72c... in 0.000059s

$ ./a.out opam-repository/packages/ocaml-base-compiler/ocaml-base-compiler.4.02.1/files/fix-gcc10.patch
digestif calculated e71423ff0dad6261fb369f6c30a51062b2fe3df7a1b4bf2f7424c756... in 0.001612s
ocaml-sha calculated e71423ff0dad6261fb369f6c30a51062b2fe3df7a1b4bf2f7424c756... in 0.000126s
digestif calculated e71423ff0dad6261fb369f6c30a51062b2fe3df7a1b4bf2f7424c756... in 0.004505s
opam-core calculated e71423ff0dad6261fb369f6c30a51062b2fe3df7a1b4bf2f7424c756... in 0.000088s

$ ./a.out opam-repository/packages/ocaml-base-compiler/ocaml-base-compiler.4.02.1/files/gpr1330.patch
digestif calculated f691fb074f601ebe7accf41d9dcf5ad8793ad8b8cab13... in 0.001678s
ocaml-sha calculated f691fb074f601ebe7accf41d9dcf5ad8793ad8b8cab13fc... in 0.000119s
openssl calculated f691fb074f601ebe7accf41d9dcf5ad8793ad8b8cab13fc... in 0.004745s
opam-core calculated f691fb074f601ebe7accf41d9dcf5ad8793ad8b8cab13fc... in 0.000157s

$ ./a.out irmin-2.10.2.tbz
digestif calculated bbc03417d6eb87d99bee391c489d23a586b0a260e4c797f5e... in 0.002477s
ocaml-sha calculated bbc03417d6eb87d99bee391c489d23a586b0a260e4c797f5e... in 0.002120s
openssl calculated bbc03417d6eb87d99bee391c489d23a586b0a260e4c797f5e... in 0.005695s
opam-core calculated bbc03417d6eb87d99bee391c489d23a586b0a260e4c797f5e... in 0.006051s

$ ./a.out FreeBSD-13.0-RELEASE-amd64-bootonly.iso
digestif calculated 4bb54bbe258a7369cf6255c2324c26eeb302a2c294bbef... in 1.430754s
ocaml-sha calculated 4bb54bbe258a7369cf6255c2324c26eeb302a2c294bbef... in 1.418716s
openssl calculated 4bb54bbe258a7369cf6255c2324c26eeb302a2c294bbef... in 0.843393s
opam-core calculated 4bb54bbe258a7369cf6255c2324c26eeb302a2c294bbef... in 4.579879s

The same order of result can be observed with SHA256. e.g.

$ ./a.out irmin-2.10.2.tbz
digestif calculated 088741b619471c8198e74aaf9f4aeb4a9997abec8b9abcbfb3443fd27bfb433f in 0.001824s
ocaml-sha calculated 088741b619471c8198e74aaf9f4aeb4a9997abec8b9abcbfb3443fd27bfb433f in 0.002390s
openssl calculated 088741b619471c8198e74aaf9f4aeb4a9997abec8b9abcbfb3443fd27bfb433f in 0.003565s
opam-core calculated 088741b619471c8198e74aaf9f4aeb4a9997abec8b9abcbfb3443fd27bfb433f in 0.010110s

Awaiting review to put the rest of the work (vendoring) for this PR to pass CI.
This PR is not urgent at all and rebase should be easy so no rush to put this before opam 2.2.0

Fixes #5037
Would help #4158

@kit-ty-kate kit-ty-kate added the PR: WIP Not for merge at this stage label Feb 4, 2022
@AltGr
Copy link
Member

AltGr commented Feb 7, 2022

Nice, thanks for all this! I am a bit surprised by the poor results of Digestif here; I think that to be fair you should use its bigstrings functions and an mmap, to avoid the extra memory copy of the whole data — like is done in OpamSHA. Then I'd expect it to be as fast as OpamSHA in all cases ?

We'll need to remember to check how the static linking scripts fare with the new C code for next release (I think it should be fine 🤞 but...)

Another precision: there are lots of cases where opam computes hashes from already in-memory strings. The timings would be negligible in this case, except if we need to exec() an external binary to compute the hash like it used to be the case with openssl before we had OpamSHA.

@kit-ty-kate
Copy link
Member Author

kit-ty-kate commented Feb 8, 2022

Note by @AltGr: this PR in its current state would make it possible to remove the dependency towards bigarray entirely (see #5049)

Copy link
Collaborator

@rjbou rjbou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things around to fix, but on the idea, ok!

doc/pages/Manual.md Show resolved Hide resolved
src/client/opamArg.ml Show resolved Hide resolved
@kit-ty-kate kit-ty-kate force-pushed the ocaml-sha branch 2 times, most recently from c50527b to f87bc3a Compare April 5, 2022 15:27
@kit-ty-kate kit-ty-kate removed the PR: WIP Not for merge at this stage label Apr 5, 2022
@kit-ty-kate kit-ty-kate added this to PR in progress in Opam 2.2.0 via automation Apr 5, 2022
@kit-ty-kate kit-ty-kate added this to the 2.2.0~alpha milestone Apr 5, 2022
@rjbou rjbou added the PR: QUEUED Pending pull request, waiting for other work to be merged or closed label Apr 5, 2022
@kit-ty-kate kit-ty-kate removed the PR: QUEUED Pending pull request, waiting for other work to be merged or closed label Apr 6, 2022
@kit-ty-kate kit-ty-kate force-pushed the ocaml-sha branch 2 times, most recently from b01c0c4 to 2df52db Compare April 6, 2022 12:22
@rjbou
Copy link
Collaborator

rjbou commented Apr 6, 2022

As there is usage replacement in OpamSHA, should we in the same time update Digest usage?

@@ -18,17 +18,8 @@ val sha512_file: string -> string
val hash_file: [< `SHA256 | `SHA512 ] -> string -> string


val sha256_bytes: Bytes.t -> string
val sha256_string: string -> string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why renaming these functions from sha256 to sha256_string ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s not a renaming, only removal of functions that are not used.

@rjbou
Copy link
Collaborator

rjbou commented Apr 8, 2022

fixed the error is

ocamlfind install sha _build/install/default/sha/META _build/install/default/sha/libsha_stubs.a _build/install/default/sha/sha.* _build/install/default/stublibs/*.so
ocamlfind: _build/install/default/sha/META: No such file or directory
make[2]: Leaving directory '/home/runner/work/opam/opam/src_ext/sha'
make[2]: *** [../Makefile.packages:157: sha-pkg-build] Error 2
make[1]: *** [Makefile:125: sha.pkgbuild] Error 2
make[1]: *** Waiting for unfinished jobs....

@rjbou
Copy link
Collaborator

rjbou commented Apr 15, 2022

(new) error is

  + make all admin
  dune build --profile=dev --root .  --promote-install-files -- opam-installer.install opam.install
      ocamlopt src/core/.opam_core.objs/native/opamSHA.{cmx,o} (exit 2)
  (cd _build/default && /home/runner/work/opam/opam/bootstrap/ocaml/bin/ocamlopt.opt -w [...] -o src/core/.opam_core.objs/native/opamSHA.cmx -c -impl src/core/opamSHA.ml)
  File "src/core/opamSHA.ml", line 11, characters 23-36:
  11 | let sha256_file file = Sha256.to_hex (Sha256.file file)
                              ^^^^^^^^^^^^^
  Error: Unbound module Sha256
  make: *** [Makefile:143: build-opam] Error 1

@rjbou rjbou moved this from PR in progress to PR finalised (merge with CI) in Opam 2.2.0 Apr 27, 2022
@rjbou rjbou moved this from PR finalised (merge with CI) to PR to review in Opam 2.2.0 Apr 27, 2022
src_ext/Makefile.packages Outdated Show resolved Hide resolved
@rjbou rjbou moved this from PR to review to PR finalised (merge with CI) in Opam 2.2.0 Apr 28, 2022
@kit-ty-kate kit-ty-kate force-pushed the ocaml-sha branch 2 times, most recently from 84a6ff9 to 3ec98b2 Compare April 29, 2022 10:45
@rjbou rjbou merged commit e695af2 into ocaml:master Apr 30, 2022
Opam 2.2.0 automation moved this from PR finalised (merge with CI) to Done Apr 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Opam 2.2.0
  
Done
Development

Successfully merging this pull request may close these issues.

opam update doesn’t need to checksum every extra-files
4 participants