Skip to content

Conversation

@Alizter
Copy link
Collaborator

@Alizter Alizter commented Apr 30, 2025

Dune_digest used md5 but this is an implementation detail we shouldn't rely on for computing md5 checksums in package management. We therefore copy and strip down the important parts, renaming any C stubs.

Copy link
Collaborator

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this PR

  1. OCaml ships with Digest which supports MD5 so I don't think vendoring another implementation of MD5 is reasonable.
  2. #11720 is vendoring BLAKE3, which in 2025 seems significantly better than MD5
  3. OCaml's Digest apparently supports BLAKE2 since OCaml 5.2 (TIL & hooray!), so if we wanted to upgrade to a better hash, maybe BLAKE2 would make more sense?

@rgrinberg
Copy link
Member

OCaml ships with Digest which supports MD5 so I don't think vendoring another implementation of MD5 is reasonable.

We are indeed reusing the md5 implementation that is present in OCaml. We're doing so in C rather than OCaml because we don't want to hold the GC lock while digesting.

#11720 is vendoring BLAKE3, which in 2025 seems significantly better than MD5

Right, but we still need MD5 for package management because we are verifying MD5 digests from the opam repository.

OCaml's Digest apparently supports BLAKE2 since OCaml 5.2 (TIL & hooray!), so if we wanted to upgrade to a better hash, maybe BLAKE2 would make more sense?

BLAKE2 is an improvement over MD5, but not a particularly impressive one.

[Dune_digest] used md5 but this is an implementation detail we shouldn't
rely on for computing md5 checksums in package management. We therefore
copy and strip down the important parts, renaming any C stubs.

Signed-off-by: Ali Caglayan <alizter@gmail.com>
Copy link
Collaborator

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see, so this is about holding the GC lock.

Overall the code looks ok, I have one question and one thing that's more of a discussion point…

Alizter added 2 commits May 2, 2025 15:45
Signed-off-by: Ali Caglayan <alizter@gmail.com>
Signed-off-by: Ali Caglayan <alizter@gmail.com>
Signed-off-by: Ali Caglayan <alizter@gmail.com>
Signed-off-by: Ali Caglayan <alizter@gmail.com>
@Alizter Alizter requested a review from rgrinberg May 4, 2025 14:28
@shym
Copy link
Contributor

shym commented May 5, 2025

BLAKE2 is an improvement over MD5, but not a particularly impressive one.

You’re talking about speed, there, I guess?

@rgrinberg rgrinberg merged commit 11a9822 into ocaml:main May 7, 2025
23 of 24 checks passed
Sudha247 pushed a commit to Sudha247/dune that referenced this pull request Jul 23, 2025
* pkg: don't depend on dune_digest

[Dune_digest] used md5 but this is an implementation detail we shouldn't
rely on for computing md5 checksums in package management. We therefore
copy and strip down the important parts, renaming any C stubs.

* refactor(md5): simplify implementation further

* refactor(md5): rename and document md5 api

* refactor(md5): rename *hex_string -> *hex

* md5: better error message on unix error when opening file for digest

Signed-off-by: Ali Caglayan <alizter@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants