New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add BLAKE3 hash #13194
base: master
Are you sure you want to change the base?
add BLAKE3 hash #13194
Conversation
BLAKE3 is a very fast cryptograpically secure hash. It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2). Check this speed chart: https://raw.githubusercontent.com/BLAKE3-team/BLAKE3/master/media/speed.svg and this bench.php run where BLAKE3 is much faster than every secure hash, and beats several non-cryptographic hashes: $ sapi/cli/php ext/hash/bench.php crc32b 0.001195 crc32c 0.001202 crc32 0.001202 xxh3 0.001234 xxh128 0.001289 xxh64 0.001475 xxh32 0.002235 murmur3f 0.002459 murmur3c 0.003681 murmur3a 0.004289 adler32 0.007718 blake3 0.007752 md4 0.013109 fnv132 0.015075 fnv164 0.015109 fnv1a64 0.015116 fnv1a32 0.015251 joaat 0.018858 md5 0.019797 sha1 0.020472 tiger160,3 0.021290 tiger192,3 0.021363 tiger128,3 0.021366 tiger128,4 0.027518 tiger160,4 0.027743 tiger192,4 0.027870 ripemd128 0.029190 ripemd256 0.029378 sha3-224 0.029787 sha3-256 0.031518 haval256,3 0.038328 haval224,3 0.038479 haval128,3 0.038483 sha3-384 0.038559 haval192,3 0.038564 haval160,3 0.039068 sha512/256 0.039302 sha512 0.039307 sha512/224 0.039472 sha384 0.039508 ripemd160 0.042287 ripemd320 0.043036 sha3-512 0.054044 haval192,4 0.055699 haval224,4 0.055902 haval160,4 0.055925 haval256,4 0.055948 haval128,4 0.056165 sha256 0.057846 sha224 0.058139 haval128,5 0.070442 haval224,5 0.070503 haval256,5 0.070569 haval192,5 0.070576 haval160,5 0.071109 whirlpool 0.086256 gost 0.200251 gost-crypto 0.200709 snefru256 0.449650 snefru 0.451111 md2 1.237880
That s quite an addition ... worth discussing. |
.. don't know what I'm supposed to do with it, but compiler complains about it missing.
f9f7291
to
bf8a527
Compare
This is indeed a lot of code. I think this should be discussed on the list, and possibly voted on. |
fix seems to be non-trivial, can look into it later.
asked on php-internals mailing list: https://marc.info/?l=php-internals&m=170568974400302&w=2 |
#define PHP_BLAKE3_CTX blake3_hasher | ||
// help: is V correct? | ||
//#define PHP_BLAKE3_SPEC "b8b8qb64bbbbb1760" | ||
#define PHP_BLAKE3_SPEC "L8L8Qa64CCCCL8Ca1760" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can someone double-check this? because i couldn't wrap my head around it, and this is just my best guess based on
php-src/ext/hash/blake3/upstream_blake3/c/blake3.h
Lines 42 to 61 in 882466b
typedef struct { | |
uint32_t cv[8]; | |
uint64_t chunk_counter; | |
uint8_t buf[BLAKE3_BLOCK_LEN]; | |
uint8_t buf_len; | |
uint8_t blocks_compressed; | |
uint8_t flags; | |
} blake3_chunk_state; | |
typedef struct { | |
uint32_t key[8]; | |
blake3_chunk_state chunk; | |
uint8_t cv_stack_len; | |
// The stack size is MAX_DEPTH + 1 because we do lazy merging. For example, | |
// with 7 chunks, we have 3 entries in the stack. Adding an 8th chunk | |
// requires a 4th entry, rather than merging everything down to 1, because we | |
// don't know whether more input is coming. This is different from how the | |
// reference implementation does things. | |
uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN]; | |
} blake3_hasher; |
@@ -0,0 +1,13 @@ | |||
#!/bin/sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a personal opinion, but pcre2 nor opcache's JIT code are cloned. I do not see a particular reason to make an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devnexen this script makes it easy to see exactly how blake3 1.5.0 was cloned, and should make it easy to update to future versions (like 1.6.0) in the future - IMO it's worthwhile to keep this script. idk if it should be part of .github/actions/verify-generated-files/action.yml
tho (there's a discussion on that topic here #13194 (comment) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the appeal nor the necessity. Couple of counter arguments from the top of my head :
- does not make build easier, CI included.
- if this blake3 release contains bugs, the upgrade can be done in the same way the internal pcre2 library does. There is no necessity to upgrade versions if it brings nothing to php or goes against php's needs otherwise.
- if this contains bugs we are able to fix quickly but would take a long time (or worse getting refused) upstream, in your case we would need to patch it on the fly or forking the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@divinity76 The point of verify-generated-files
is to avoid sneaking malicious code into generated files that are collapsed during code reviews that people don't generally look at. This risk doesn't exist here. IMO, if the intent is to avoid people mistakenly editing these files, the same issue already exists for all other copied projects, and should be solved universally (if at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of verify-generated-files is to avoid sneaking malicious code into generated files that are collapsed during code reviews that people don't generally look at. This risk doesn't exist here.
Interesting, thanks for the explanation!
Maybe I snuck some malicious code into one of the upstream_blake3/* files? If I did, this script would actually revert it, and then verify-generated-files
would notice it!
I do not see the appeal
My main appeal is that if I want to update the bundled blake3 in the future, i can just change the 1 line git clone --branch '1.5.0'
with git clone --branch '1.5.1'
and the script should do the rest.
does not make build easier, CI included.
fair
if this blake3 release contains bugs, the upgrade can be done in the same way the internal pcre2 library does.
How does upstream pcre2 fixes get applied to the internal pcre2 library? i don't know
There is no necessity to upgrade versions if it brings nothing to php or goes against php's needs otherwise.
There's good updates sometimes, for example the latest v1.5.0 fixed a arm big-endian compile issue, v1.4.1 increased ARM performance by ~10-30%, 1.3.3 fixed a GCC6.1 AVX512 debug corruption issue.
if this contains bugs we are able to fix quickly but would take a long time (or worse getting refused) upstream, in your case we would need to patch it on the fly or forking the repo.
Yeah that's a good point. If we get to that point, this script is useless. Anecdotal evidence, the only time I made a PR to BLAKE3, it was accepted in 11 hours ( BLAKE3-team/BLAKE3#131 ), but it was a trivial one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's good updates sometimes, for example the latest v1.5.0 fixed a arm big-endian compile issue, v1.4.1 increased ARM performance by ~10-30%, 1.3.3 fixed a GCC6.1 AVX512 debug corruption issue.
Those are valid reasons to upgrade but again upgrades can be handled in same manner at the aforementioned php components. Note that I m not against having blake3 feature, I see it as a "nice to have", just the way of the dependency being handled.
post on the mailing list worth re-posting here: wrote some benchmarks, but the TL;DR is: array(6) {
["O2-portable-march"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(29295)
["mb_per_second"]=>
float(533.3674688513398)
}
["O2-portable"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(13876)
["mb_per_second"]=>
float(1126.0449697319111)
}
["O2-sse2"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(4969)
["mb_per_second"]=>
float(3144.4958744214127)
}
["O2-sse41"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(4688)
["mb_per_second"]=>
float(3332.977815699659)
}
["O2-avx2"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(2384)
["mb_per_second"]=>
float(6554.1107382550335)
}
["O2-avx512"]=>
array(2) {
["microseconds_for_16_kib"]=>
int(1753)
["mb_per_second"]=>
float(8913.291500285226)
}
} Edit: -O2 portable: 596MB/s Again, even with -march=native, the compiler cannot make the portable |
Co-authored-by: Peter Kokot <peterkokot@gmail.com>
Co-authored-by: Peter Kokot <peterkokot@gmail.com>
It's missing license in each file so that will need to be added. |
@bukka added the LICENSE file from https://github.com/BLAKE3-team/BLAKE3/blob/master/LICENSE , |
Regarding the license, can we maybe add it to the README.REDIST.BINS file? Number |
I think the added LICENSE file and a note in README.REDIST.BINS should be enough for upstream files. You should add a header to the files that you added - hash_blake3.c and php_hash_blake3.h |
petk: This detects x86 32-bit processor (can be x86, i386 or similar, or "amd" on FreeBSD. Co-authored-by: Peter Kokot <peterkokot@gmail.com>
petk: This detects the x86_64 processor. x86_64 is detected as word amd64 on FreeBSD. The $host_os is in Autoconf used for the target CPU when cross-compiling or as a host CPU when building on the same machine as the program will be also run. Co-authored-by: Peter Kokot <peterkokot@gmail.com>
patches to specifically address a gcc -Werror=logical-op issue explained in BLAKE3-team/BLAKE3#380 and a gcc -Wunused-function issue explained in BLAKE3-team/BLAKE3#382 and optimized upstream git checkout to only fetch the files we want.
vld1q_u8 and vst1q_u8 has no alignment requirements. This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input, from 13920 nanoseconds down to 13800 nanoseconds (approx) ref BLAKE3-team/BLAKE3#384
Having this would be good |
BLAKE3 is a very fast cryptographically secure hash. It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2).
Check this speed chart:
and this bench.php run where BLAKE3 is much faster than every other secure hash, and beats several non-cryptographic hashes (AMD Ryzen 9 7950x):
Quoting Google Bard: