Evaluate and merge Yves' stats improvements #31
The really valuable things in Yves' fork are:
I can't emphasize 3 enough. There are tons of hash functions which pass your fork, and which you recommend, but which fail spectacularly under the stricter tests in Yves'. The avalanche test in particular is a massive improvement. This is not magic or the test being overzealous: I can usually find and fix the problem in a very predictable way (most commonly a weak final mixing stage). Going the other direction, I can easily take a hash function which just passes Yves', remove a bunch of things (again, usually in the final mixing stage, where operations are by necessity linearized) and have it pass on yours with a significant speed increase.

I don't recall every single failure, but I recently added xxh3 to my private fork of Yves' and it fails avalanche on 0-3 byte keys. Other failures have been even more spectacular.

Yves hasn't been actively working on his fork in some time; you're actively maintaining yours. It would be ideal to have the improvements from Yves' fork in an actively maintained fork. It would also be hugely beneficial if your fork, which is viewed as the canonical one right now, applied the most rigorous standards possible to ensure people use quality hash functions, which is not currently the case.
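For readers unfamiliar with the test being discussed, here is a minimal sketch of what an avalanche test measures (illustrative only; `toy_hash`, `avalanche_ok`, and the constants are made up for the example and are not SMHasher's code): flipping any single input bit should flip each output bit with probability close to 1/2, and for short keys it is largely the final mixing stage that makes this true.

```cpp
// Minimal avalanche-test sketch (illustrative only, not SMHasher's actual code):
// flip each input bit, count how often each output bit flips across many random
// keys, and report the worst deviation from the ideal 50% flip rate.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

using HashFn = uint32_t (*)(const void* key, int len, uint32_t seed);

// Toy 32-bit hash used only to make the sketch self-contained. The last three
// lines are its final mixing stage; weakening them (dropping a multiply or an
// xorshift) is the typical cause of the avalanche failures described above.
static uint32_t toy_hash(const void* key, int len, uint32_t seed) {
    const uint8_t* p = static_cast<const uint8_t*>(key);
    uint32_t h = seed ^ static_cast<uint32_t>(len);
    for (int i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x9E3779B1u;
        h = (h << 13) | (h >> 19);
    }
    h ^= h >> 16; h *= 0x85EBCA6Bu;  // final mix (MurmurHash3 fmix32 constants)
    h ^= h >> 13; h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

static bool avalanche_ok(HashFn hash, int keybits, int trials, double max_bias) {
    const int keybytes = keybits / 8;
    std::mt19937_64 rng(0x1234);
    // flips[i][o]: how often flipping input bit i flipped output bit o
    std::vector<std::vector<int>> flips(keybits, std::vector<int>(32, 0));
    std::vector<uint8_t> key(keybytes);

    for (int t = 0; t < trials; t++) {
        for (auto& b : key) b = static_cast<uint8_t>(rng());
        const uint32_t h0 = hash(key.data(), keybytes, 0);
        for (int i = 0; i < keybits; i++) {
            key[i / 8] ^= static_cast<uint8_t>(1u << (i % 8));  // flip one input bit
            const uint32_t diff = h0 ^ hash(key.data(), keybytes, 0);
            key[i / 8] ^= static_cast<uint8_t>(1u << (i % 8));  // restore it
            for (int o = 0; o < 32; o++) flips[i][o] += (diff >> o) & 1;
        }
    }

    double worst = 0.0;  // bias 0 means every output bit flipped exactly half the time
    for (int i = 0; i < keybits; i++)
        for (int o = 0; o < 32; o++)
            worst = std::max(worst, std::fabs(2.0 * flips[i][o] / trials - 1.0));
    std::printf("worst avalanche bias: %.4f\n", worst);
    return worst <= max_bias;
}

int main() {
    // 3-byte keys: the same short-key range mentioned for the xxh3 failure above.
    return avalanche_ok(toy_hash, /*keybits=*/24, /*trials=*/100000, /*max_bias=*/0.05) ? 0 : 1;
}
```

SMHasher's real tests cover many key lengths, seeds, and output widths; the sketch only shows why weakening the final mix is the predictable way to trade avalanche quality for speed.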
I'd love to accept a PR for this.
Massive amount of work, that, and probably best done by someone familiar with the internals. Much as I'd like to say otherwise, my comment was a response to the 'evaluate' portion of the issue, rather than the 'merge' part ;)
I started working on that now.
I've missed a few bits:
It makes sense conceptually to have the init state generation and the actual run be separate. When people contribute their hashes, they follow the structure that SMHasher requires of them. If SMHasher doesn't appear to require or offer a separate init stage, they'll stick it in the body and be slower. I've seen this with out-of-tree hashes. I appreciate that switching to this involves a large edit distance.

Not that my opinion matters, as the one not doing the work, but I do not at all agree about limiting bit lengths. It breaks at least one promising out-of-tree hash I've seen. Folding n>32 bits into 32 is also essentially a free mixing stage, which would cover up seed generation issues. I don't recall the exact examples of this, but I can almost guarantee that doing this would make things pass on your fork that would fail on Yves'. It was my first experience of his fork, in fact - I ran some stuff I was working on using your fork on his, it failed, and I sat there whimpering 'b..b..but it passes on rurban..'

Fundamentally, your fork is the canonical one and people code to it. Specifically, its speed results are quoted when people come up with 'the fastest hash in the world' (tm). There are lots of issues with that that I won't go into in this ticket, but the one that specifically relates to this discussion is that any function that has a large internal state and wants to initialize it from a non-trivially small initial state has to move that work into the measured body because of the size limitation.

Where Yves' fork makes changes, it is strictly a superset of yours. If we're going to unify them, those changes need to be back-ported whole, imho. Unless and until they are, for the things I do, it'd still make way more sense for me personally to port new hashes from your fork to his and run them there. I realize this reads as heckling from the peanut gallery, and I do most sincerely apologize for that ;)
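A rough sketch of the split being described (illustrative names only; `wide_state`, `toy_seed_init`, and `toy_hash_with_state` are hypothetical and do not match the actual interface of either fork): a hash with wide internal state expands the small seed once, in a stage the harness can time and test separately, whereas a harness that only hands over a small seed forces that expansion into the measured body.

```cpp
// Sketch of an init-state / run split. Stage 1 expands the seed into wide
// internal state; stage 2 is the part that should be timed per key.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct wide_state { uint64_t s[4]; };  // hypothetical 256-bit internal state

// Stage 1: seed -> internal state (splitmix64-style expansion). If the harness
// treats this as a separate stage, its cost is paid once per seed and dedicated
// seed tests can probe it directly.
static void toy_seed_init(uint64_t seed, wide_state* st) {
    uint64_t x = seed;
    for (int i = 0; i < 4; i++) {
        x += 0x9E3779B97F4A7C15ull;
        uint64_t z = x;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
        st->s[i] = z ^ (z >> 31);
    }
}

// Stage 2: the measured body; it only consumes the already-prepared state.
static uint64_t toy_hash_with_state(const wide_state* st, const void* key, size_t len) {
    const uint8_t* p = static_cast<const uint8_t*>(key);
    uint64_t h = st->s[0];
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= st->s[1] | 1;              // odd multiplier drawn from the state
        h = (h << 23) | (h >> 41);
    }
    h ^= st->s[2];
    h ^= h >> 29; h *= 0xFF51AFD7ED558CCDull; h ^= h >> 32;  // final mix
    return h ^ st->s[3];
}

// What happens when the harness only passes a small seed: the state expansion
// is forced into the measured body and counted against the hash's speed.
static uint64_t toy_hash_seed_in_body(uint64_t seed, const void* key, size_t len) {
    wide_state st;
    toy_seed_init(seed, &st);
    return toy_hash_with_state(&st, key, len);
}

int main() {
    wide_state st;
    toy_seed_init(0xC0FFEEu, &st);                     // once per seed
    const char msg[] = "hello";
    std::printf("%016llx %016llx\n",
        (unsigned long long)toy_hash_with_state(&st, msg, sizeof msg - 1),   // per key
        (unsigned long long)toy_hash_seed_in_body(0xC0FFEEu, msg, sizeof msg - 1));
    return 0;
}
```

The folding concern is analogous: reducing a >32-bit result down to 32 bits adds an extra mixing pass after stage 1, so weaknesses in the seed expansion can be masked by the time the tests see the output.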
@rurban In issue tinypeng/pengyhash#1, the Avalanche and Seed tests from demerphq/smhasher fail for pengyhash, while the hash passes all tests here.
See https://github.com/demerphq/smhasher/
That version: