Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename Hashtbl.SeededHashedType.{hash => seeded_hash} #11157

Merged
merged 2 commits into from Apr 6, 2022

Conversation

nojb
Copy link
Contributor

@nojb nojb commented Apr 4, 2022

With this change it is possible to define both seeded and unseeded hash functions in the same module. This breaks code instantianting the Hashtbl.MakeSeeded functor, but an OPAM-wide grep shows that not too many packages would be affected:

  • distributed
  • extlib
  • h2
  • lru
  • memtrace
  • ocaml-in-python
  • stdcompat
  • tezos-lwt-result-stdlib

The list only contain the packages directly affected; other packages which vendor one of the previous ones would be affected by transitivity:

  • coccinelle (stdcompat)
  • dream (h2)
  • piaf (h2)
  • ocaml-gist (h2)
  • tezos (all packages) (tezos-lwt-result-stdlib)

Unblocks #8878 #10259

@xavierleroy
Copy link
Contributor

I forgot about this issue, thanks for resurrecting it! I like the proposed change very much, as it makes it possible to export both kinds of hash function from the same module. In turn this facilitates the use of seeded hash tables.

@nojb
Copy link
Contributor Author

nojb commented Apr 4, 2022

cc @ygrek @thierry-martinez @anmonteiro @pqwy @aantron the maintainers of some/most of the affected packages (not sure who is a contact for the Tezos codebase)

@nojb
Copy link
Contributor Author

nojb commented Apr 5, 2022

Also cc @raphael-proust @yurug for the Tezos packages

@raphael-proust
Copy link
Contributor

AFAICT, it'd be relatively easy to adapt tezos-lwt-result-stdlib.
There might be some other minor things that need to be adapted in the tezos codebase but not much.

I'm surprised ringo is not affected. I'll have a look into this.

My main question is: How can we make a library that's compatible with both the existing interface and the proposed interface? In the general case do we need a preprocessor with an OCaml version check?

@nojb
Copy link
Contributor Author

nojb commented Apr 5, 2022

My main question is: How can we make a library that's compatible with both the existing interface and the proposed interface? In the general case do we need a preprocessor with an OCaml version check?

I don't see an easy way without a preprocessor. Another possibility is to depend on a compatibility shim such as https://github.com/thierry-martinez/stdcompat (which itself implements the preprocessor logic).

@nojb
Copy link
Contributor Author

nojb commented Apr 5, 2022

I'm surprised ringo is not affected.

It does not seem to use seeded hash tables.

@thierry-martinez
Copy link
Contributor

I would be very happy with this change, and I think that compatibility shims (generic existing ones and/or a dedicated one) will be enough to have a convenient transition path.

@nojb
Copy link
Contributor Author

nojb commented Apr 6, 2022

There is buy-in from the maintainers of most of the affected packages, so I think this PR is on the right path. We still need an official review or two of this easy PR. Perhaps @alainfrisch and @gasche could do the honours?

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have approved the PR because I believe that the implementation is correct.

In the long run, I don't know if the strategy of picking a seed at table-creation time is the right way to protect ourselves from collision attacks. Also, I don't know if hash has the right API: as a user having to write hash functions, I have often wished for a "digest"-style API where we take as input a hash state and we feed it new data, with a separate finalization step. This is all fairly orthogonal from the present PR -- even though it suggests that the hash API may want to evolve again in the future.

@xavierleroy
Copy link
Contributor

I don't know if the strategy of picking a seed at table-creation time is the right way to protect ourselves from collision attacks

That's the usual approach, AFAIK. Re-seeding an existing table needs a complete rebuild anyway.

Also, I don't know if hash has the right API: as a user having to write hash functions, I have often wished for a "digest"-style API where we take as input a hash state and we feed it new data, with a separate finalization step.

This would be nice but is completely orthogonal to the issue at hand. A first step would be to design this API as a separate library, implemented using the C functions from <caml/hash.h>, or just an OCaml reimplementation of MurmurHash.

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a second approval, if it can help :-)

@nojb
Copy link
Contributor Author

nojb commented Apr 6, 2022

Here is a second approval, if it can help :-)

Yes, it does, thanks!

Will merge soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants