-
Notifications
You must be signed in to change notification settings - Fork 7.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hash: Add MurmurHash3 with streaming support #6059
Conversation
I didn't know this hashing function, so I read Wikipedia's page about MurmurHash and found this:
(but maybe MurmurHash 3x versions are not vulnerable?) SipHash is recommended as a safe alternative. I can't find that algo in |
Yeah, siphash might be a good candidate to add, too. Regarding the vulnerability concerns - any non crypto algorithm is prone to collisions. Thanks. |
@m6w6 perhaps you could check upon this as well, if you've some time. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
Thanks for the quick check! |
I'm not sure if anyone cares, but this adds a hard dependency on C++ to PHP. Previously C++ was only used in intl (an optional extension). |
Hm, looking through it again quickly, it doesn't seem to use any C++ specifics, except a few references as parameters for inline functions, which could easily be replaced by macros. |
Thanks for noticing. In this case, the existing proven implementation is reused, porting it to C seems unnecessary. The C++ itself there is very light C like. One impact I could think of is that libstdc++ will need to be present at the runtime, which will increase the storage capacity requirement. In general, porting to C would decouple this from the origin implementation, which would complicate backporting. Of course we could do the discussion broader whether C++ is suitable for an obligatory extension. In general, after the move to C99, some perks of C++ are provided. On the compat side - if not a particular C++ standard version is enforced, but one that can be compiled by a corny compiler (like fe i see is used on travis), that should be ok. If C++ usage is considered a bloker, probably this needs to be solved first. There might be other useful pieces, like spooky hash canonical implementation, that might be going here, too. In my case, i'd rather endorse the way to profit from things that can be reused, rather than reinventing the wheel. Thanks. |
Thinking more on this, another option could be to make
In this priority order :) I'd really prefer 1., as the implementation already works in node, works on both endiannesses, more implementations are to come, etc. Please let me know, if there are strong opinions, i'd otherwise pursue merging in a week or two. @m6w6 @nikic @dstogov @cmb69 @remicollet Thanks |
|
@m6w6 nice, thanks! I was hesitant on this also because this implementation would miss custom seed. That would probably need to touch every init routine to pass a custom hash table, so should be a separate run. Perhaps the way to get it done is to
Still, for others in C++, see spooky for example - that would be a tedious case to port to C :/ Until someone has more time to implement papers like this from scratch, having a complementary ext on pecl that is allowed to use C++ or anything else it needs might be the approach. Perhaps sohuld first go for pecl with all of this, to have some playground and figure out a better way. I haven't seen C++ as a big deal, so didn't expect a glitch situation like this initially :) This wouldn't get into 8.0 anyway, so have time to decide the best direction. Thanks |
Oh, and with php_hash_register_algo - cool, i've missed that! Thanks. |
80d2cf9
to
fa1b355
Compare
Pulled the C port patch and fixed some follow ups. Please revaluate. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need some adjustments to be ubsan clean, either using proper alignment marking (grep unaligned_uint32_t) or by suppressing warnings (grep no_sanitize).
/home/nikic/php/php-src-fuzz/ext/hash/murmur/endianness.h:63:12: runtime error: load of misaligned address 0x60800000d4bb for type 'const uint32_t' (aka 'const unsigned int'), which requires 4 byte alignment
0x60800000d4bb: note: pointer points here
00 3a 01 00 00 00 00 00 00 01 31 30 35 36 35 39 36 32 36 34 3b 69 3a 31 3b 69 3a 2d 38 37 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/nikic/php/php-src-fuzz/ext/hash/murmur/endianness.h:63:12 in
Big endian support is probably missing, too -- just came to mind recently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If you want to test big-endian support before committing, you can temporarily delete the line
Line 80 in 5b46bde
if: type = cron |
@m6w6 big endian is in. See Thanks. |
Ah, i just saw the comment about s390. I'll do that later today. I previously tested on an old powermac with Gentoo and on QEMU, but it was with the C++ version. This one certainly needs to be retested on big endian, too. Thanks. |
It doesn't look like Travis wants to run the s390 job, timed out 2nd time already. I will verify another way. Thansk. |
It actually came through, putting the link to the run for the reference https://travis-ci.org/github/php/php-src/builds/739657005 , that's sparing time for a manual test :) I'll be finishing this up next days, probably this weekend. Thanks |
The implementation is based on the upstream PMurHash. The following variants are implemented - murmur3a, 32-bit hash - murmur3c, 128-bit hash for x86 - murmur3f, 128-bit hash for x64 The custom seed support is not targeted by this implementation. It will need a major change to the API, so then custom arguments can be passed through `hash_init`. For now, the starting hash is always zero. Signed-off-by: Anatol Belski <ab@php.net>
Don't convert to numeric hash as it might overflow Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
It is just a normal array index usage, no alignment is needed. Signed-off-by: Anatol Belski <ab@php.net>
3b44b4a
to
7cfaafb
Compare
It looks like there's still issues with unaligned accesses: https://dev.azure.com/phpazuredevops/PHP/_build/results?buildId=12715&view=ms.vss-test-web.build-test-results-tab&runId=281440&resultId=105860&paneView=debug |
Indeed. On my end, i used clang to fix the warnings, but the pipeline seems to use AddressSanitizer from GCC, too. I've pushed a fix. Thanks. |
> **MurmurHash3** > > Added support for MurmurHash3 with streaming support. The following variants are implemented: > * murmur3a, 32-bit hash > * murmur3c, 128-bit hash for x86 > * murmur3f, 128-bit hash for x64 > > **xxHash** > > Added support for xxHash. The following variants are implemented: > * xxh32, 32-bit hash > * xxh64, 64-bit hash > * xxh3, 64-bit hash > * xxh128, 128-bit hash Includes unit tests. Refs: * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.murmurhash3 * php/php-src#6059 * php/php-src@72e91e9 * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.xxhash * php/php-src#6524 * php/php-src@23590f7
> **MurmurHash3** > > Added support for MurmurHash3 with streaming support. The following variants are implemented: > * murmur3a, 32-bit hash > * murmur3c, 128-bit hash for x86 > * murmur3f, 128-bit hash for x64 > > **xxHash** > > Added support for xxHash. The following variants are implemented: > * xxh32, 32-bit hash > * xxh64, 64-bit hash > * xxh3, 64-bit hash > * xxh128, 128-bit hash Refs: * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.murmurhash3 * php/php-src#6059 * php/php-src@72e91e9 * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.xxhash * php/php-src#6524 * php/php-src@23590f7
> **MurmurHash3** > > Added support for MurmurHash3 with streaming support. The following variants are implemented: > * murmur3a, 32-bit hash > * murmur3c, 128-bit hash for x86 > * murmur3f, 128-bit hash for x64 > > **xxHash** > > Added support for xxHash. The following variants are implemented: > * xxh32, 32-bit hash > * xxh64, 64-bit hash > * xxh3, 64-bit hash > * xxh128, 128-bit hash Refs: * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.murmurhash3 * php/php-src#6059 * php/php-src@72e91e9 * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.xxhash * php/php-src#6524 * php/php-src@23590f7
> **MurmurHash3** > > Added support for MurmurHash3 with streaming support. The following variants are implemented: > * murmur3a, 32-bit hash > * murmur3c, 128-bit hash for x86 > * murmur3f, 128-bit hash for x64 > > **xxHash** > > Added support for xxHash. The following variants are implemented: > * xxh32, 32-bit hash > * xxh64, 64-bit hash > * xxh3, 64-bit hash > * xxh128, 128-bit hash Refs: * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.murmurhash3 * php/php-src#6059 * php/php-src@72e91e9 * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.xxhash * php/php-src#6524 * php/php-src@23590f7
> **MurmurHash3** > > Added support for MurmurHash3 with streaming support. The following variants are implemented: > * murmur3a, 32-bit hash > * murmur3c, 128-bit hash for x86 > * murmur3f, 128-bit hash for x64 > > **xxHash** > > Added support for xxHash. The following variants are implemented: > * xxh32, 32-bit hash > * xxh64, 64-bit hash > * xxh3, 64-bit hash > * xxh128, 128-bit hash Refs: * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.murmurhash3 * php/php-src#6059 * php/php-src@72e91e9 * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.xxhash * php/php-src#6524 * php/php-src@23590f7 Co-authored-by: jrfnl <jrfnl@users.noreply.github.com> Closes GH-1451.
> **MurmurHash3** > > Added support for MurmurHash3 with streaming support. The following variants are implemented: > * murmur3a, 32-bit hash > * murmur3c, 128-bit hash for x86 > * murmur3f, 128-bit hash for x64 > > **xxHash** > > Added support for xxHash. The following variants are implemented: > * xxh32, 32-bit hash > * xxh64, 64-bit hash > * xxh3, 64-bit hash > * xxh128, 128-bit hash Refs: * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.murmurhash3 * php/php-src#6059 * php/php-src@72e91e9 * https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash.xxhash * php/php-src#6524 * php/php-src@23590f7 Co-authored-by: jrfnl <jrfnl@users.noreply.github.com> Closes phpGH-1451.
The implementation is based on the upstream PMurHash. The modernized implementation with the progressive processing support is thankfully attributed to node-murmurhash-native The following variants are implemented
The custom seed support is not targeted by this implementation. It will need a major change to the API, so then custom arguments can be passed through
hash_init
. For now, the starting hash is always zero.Signed-off-by: Anatol Belski ab@php.net