Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add BLAKE3 hash #13194

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

add BLAKE3 hash #13194

wants to merge 22 commits into from

Conversation

divinity76
Copy link
Contributor

@divinity76 divinity76 commented Jan 19, 2024

BLAKE3 is a very fast cryptographically secure hash. It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2).

Check this speed chart:

and this bench.php run where BLAKE3 is much faster than every other secure hash, and beats several non-cryptographic hashes (AMD Ryzen 9 7950x):

$ sapi/cli/php ext/hash/bench.php
crc32b       0.001195
crc32c       0.001202
crc32        0.001202
xxh3         0.001234
xxh128       0.001289
xxh64        0.001475
xxh32        0.002235
murmur3f     0.002459
murmur3c     0.003681
murmur3a     0.004289
adler32      0.007718
blake3       0.007752
md4          0.013109
fnv132       0.015075
fnv164       0.015109
fnv1a64      0.015116
fnv1a32      0.015251
joaat        0.018858
md5          0.019797
sha1         0.020472
tiger160,3   0.021290
tiger192,3   0.021363
tiger128,3   0.021366
tiger128,4   0.027518
tiger160,4   0.027743
tiger192,4   0.027870
ripemd128    0.029190
ripemd256    0.029378
sha3-224     0.029787
sha3-256     0.031518
haval256,3   0.038328
haval224,3   0.038479
haval128,3   0.038483
sha3-384     0.038559
haval192,3   0.038564
haval160,3   0.039068
sha512/256   0.039302
sha512       0.039307
sha512/224   0.039472
sha384       0.039508
ripemd160    0.042287
ripemd320    0.043036
sha3-512     0.054044
haval192,4   0.055699
haval224,4   0.055902
haval160,4   0.055925
haval256,4   0.055948
haval128,4   0.056165
sha256       0.057846
sha224       0.058139
haval128,5   0.070442
haval224,5   0.070503
haval256,5   0.070569
haval192,5   0.070576
haval160,5   0.071109
whirlpool    0.086256
gost         0.200251
gost-crypto  0.200709
snefru256    0.449650
snefru       0.451111
md2          1.237880

Quoting Google Bard:

In summary, BLAKE3 is a modern and highly efficient cryptographic hash function that offers security, flexibility, and ease of use. Its impressive speed and adaptability make it a promising choice for a wide range of applications in the digital age.

  • on x86_64 BLAKE3 is slightly faster than KangarooTwelve, but on ARM BLAKE3 is significantly faster than KangarooTwelve.
  • AFAIK the PHP project doesn't like git submodules (for example Dmitry Stogov wanted the new JIT engine to be a submodule but the PHP developers refused), so instead I made ext/hash/blake3/fetch_upstream_blake3.sh
  • I have not added SSE2/AVX2/AVX512/etc optimized builds to MSVC because I don't have a MSVC system to test on, so I just added the bare-minimum portable implementation to MSVC builds. (feel free to fix it, I don't want to.)

BLAKE3 is a very fast cryptograpically secure hash.
It is the latest iteration of the BLAKE hash, which was a SHA3 finalist (but lost to Keccak in the final round, for being too similar to SHA2).

Check this speed chart: https://raw.githubusercontent.com/BLAKE3-team/BLAKE3/master/media/speed.svg
and this bench.php run where BLAKE3 is much faster than every secure hash, and beats several non-cryptographic hashes:
$ sapi/cli/php ext/hash/bench.php
crc32b       0.001195
crc32c       0.001202
crc32        0.001202
xxh3         0.001234
xxh128       0.001289
xxh64        0.001475
xxh32        0.002235
murmur3f     0.002459
murmur3c     0.003681
murmur3a     0.004289
adler32      0.007718
blake3       0.007752
md4          0.013109
fnv132       0.015075
fnv164       0.015109
fnv1a64      0.015116
fnv1a32      0.015251
joaat        0.018858
md5          0.019797
sha1         0.020472
tiger160,3   0.021290
tiger192,3   0.021363
tiger128,3   0.021366
tiger128,4   0.027518
tiger160,4   0.027743
tiger192,4   0.027870
ripemd128    0.029190
ripemd256    0.029378
sha3-224     0.029787
sha3-256     0.031518
haval256,3   0.038328
haval224,3   0.038479
haval128,3   0.038483
sha3-384     0.038559
haval192,3   0.038564
haval160,3   0.039068
sha512/256   0.039302
sha512       0.039307
sha512/224   0.039472
sha384       0.039508
ripemd160    0.042287
ripemd320    0.043036
sha3-512     0.054044
haval192,4   0.055699
haval224,4   0.055902
haval160,4   0.055925
haval256,4   0.055948
haval128,4   0.056165
sha256       0.057846
sha224       0.058139
haval128,5   0.070442
haval224,5   0.070503
haval256,5   0.070569
haval192,5   0.070576
haval160,5   0.071109
whirlpool    0.086256
gost         0.200251
gost-crypto  0.200709
snefru256    0.449650
snefru       0.451111
md2          1.237880
@devnexen
Copy link
Member

That s quite an addition ... worth discussing.

.. don't know what I'm supposed to do with it, but compiler complains about it missing.
@iluuu1994
Copy link
Member

This is indeed a lot of code. I think this should be discussed on the list, and possibly voted on.

@divinity76
Copy link
Contributor Author

divinity76 commented Jan 19, 2024

asked on php-internals mailing list: https://marc.info/?l=php-internals&m=170568974400302&w=2
(and fixed MSVC and ARM+Neon builds, the latter tested on Oracle Cloud ARM64, which is notably different from Apple's ARM)

#define PHP_BLAKE3_CTX blake3_hasher
// help: is V correct?
//#define PHP_BLAKE3_SPEC "b8b8qb64bbbbb1760"
#define PHP_BLAKE3_SPEC "L8L8Qa64CCCCL8Ca1760"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can someone double-check this? because i couldn't wrap my head around it, and this is just my best guess based on

typedef struct {
uint32_t cv[8];
uint64_t chunk_counter;
uint8_t buf[BLAKE3_BLOCK_LEN];
uint8_t buf_len;
uint8_t blocks_compressed;
uint8_t flags;
} blake3_chunk_state;
typedef struct {
uint32_t key[8];
blake3_chunk_state chunk;
uint8_t cv_stack_len;
// The stack size is MAX_DEPTH + 1 because we do lazy merging. For example,
// with 7 chunks, we have 3 entries in the stack. Adding an 8th chunk
// requires a 4th entry, rather than merging everything down to 1, because we
// don't know whether more input is coming. This is different from how the
// reference implementation does things.
uint8_t cv_stack[(BLAKE3_MAX_DEPTH + 1) * BLAKE3_OUT_LEN];
} blake3_hasher;

@divinity76 divinity76 mentioned this pull request Jan 20, 2024
@@ -0,0 +1,13 @@
#!/bin/sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a personal opinion, but pcre2 nor opcache's JIT code are cloned. I do not see a particular reason to make an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devnexen this script makes it easy to see exactly how blake3 1.5.0 was cloned, and should make it easy to update to future versions (like 1.6.0) in the future - IMO it's worthwhile to keep this script. idk if it should be part of .github/actions/verify-generated-files/action.yml tho (there's a discussion on that topic here #13194 (comment) )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see the appeal nor the necessity. Couple of counter arguments from the top of my head :

  • does not make build easier, CI included.
  • if this blake3 release contains bugs, the upgrade can be done in the same way the internal pcre2 library does. There is no necessity to upgrade versions if it brings nothing to php or goes against php's needs otherwise.
  • if this contains bugs we are able to fix quickly but would take a long time (or worse getting refused) upstream, in your case we would need to patch it on the fly or forking the repo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@divinity76 The point of verify-generated-files is to avoid sneaking malicious code into generated files that are collapsed during code reviews that people don't generally look at. This risk doesn't exist here. IMO, if the intent is to avoid people mistakenly editing these files, the same issue already exists for all other copied projects, and should be solved universally (if at all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iluuu1994

The point of verify-generated-files is to avoid sneaking malicious code into generated files that are collapsed during code reviews that people don't generally look at. This risk doesn't exist here.

Interesting, thanks for the explanation!

Maybe I snuck some malicious code into one of the upstream_blake3/* files? If I did, this script would actually revert it, and then verify-generated-files would notice it!

@devnexen

I do not see the appeal

My main appeal is that if I want to update the bundled blake3 in the future, i can just change the 1 line git clone --branch '1.5.0' with git clone --branch '1.5.1' and the script should do the rest.

does not make build easier, CI included.

fair

if this blake3 release contains bugs, the upgrade can be done in the same way the internal pcre2 library does.

How does upstream pcre2 fixes get applied to the internal pcre2 library? i don't know

There is no necessity to upgrade versions if it brings nothing to php or goes against php's needs otherwise.

There's good updates sometimes, for example the latest v1.5.0 fixed a arm big-endian compile issue, v1.4.1 increased ARM performance by ~10-30%, 1.3.3 fixed a GCC6.1 AVX512 debug corruption issue.

if this contains bugs we are able to fix quickly but would take a long time (or worse getting refused) upstream, in your case we would need to patch it on the fly or forking the repo.

Yeah that's a good point. If we get to that point, this script is useless. Anecdotal evidence, the only time I made a PR to BLAKE3, it was accepted in 11 hours ( BLAKE3-team/BLAKE3#131 ), but it was a trivial one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's good updates sometimes, for example the latest v1.5.0 fixed a arm big-endian compile issue, v1.4.1 increased ARM performance by ~10-30%, 1.3.3 fixed a GCC6.1 AVX512 debug corruption issue.

Those are valid reasons to upgrade but again upgrades can be handled in same manner at the aforementioned php components. Note that I m not against having blake3 feature, I see it as a "nice to have", just the way of the dependency being handled.

@divinity76
Copy link
Contributor Author

divinity76 commented Jan 25, 2024

post on the mailing list worth re-posting here:
GCC11.4, even with -march=native -mtune=native, which is not commonly used in PHP,
the compiler didn't stand a chance against the hand-optimized assembly versions
image

wrote some benchmarks, but the TL;DR is:
portable -O2 usually used by PHP managed 1126MB/s,
portable -O2 -march=native managed 533MB/s (wtf? gcc obviously got
something wrong here),
hand-written -O2 SSE2 managed 3144MB/s,
hand-written -O2 SSE41 managed 3332MB/s,
hand-written -O2 avx2 managed 6554MB/s,
hand-writen -O2 AVX512 managed 8913MB/s,
on my AMD Ryzen 9 7950x,
benchmarking code:
https://gist.github.com/divinity76/5729472dd5d77e94cd0acb245aac2226
raw output:

array(6) {
  ["O2-portable-march"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(29295)
    ["mb_per_second"]=>
    float(533.3674688513398)
  }
  ["O2-portable"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(13876)
    ["mb_per_second"]=>
    float(1126.0449697319111)
  }
  ["O2-sse2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4969)
    ["mb_per_second"]=>
    float(3144.4958744214127)
  }
  ["O2-sse41"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4688)
    ["mb_per_second"]=>
    float(3332.977815699659)
  }
  ["O2-avx2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(2384)
    ["mb_per_second"]=>
    float(6554.1107382550335)
  }
  ["O2-avx512"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(1753)
    ["mb_per_second"]=>
    float(8913.291500285226)
  }
}

Edit:
tested ARM Neon optimizations on Oracle Cloud's cheapest ARM VPS:
VM.Standard.A1.Flex, Ubuntu 22.04, GCC11.4,
results:
image

-O2 portable: 596MB/s
-O2 -march=native portable: 601MB/s
-O2 ARM Neon optimized implementation: 1138MB/s

Again, even with -march=native, the compiler cannot make the portable
implementation nearly as fast as the hand-optimized cpu-specific
implementation.

ext/hash/config.m4 Outdated Show resolved Hide resolved
ext/hash/config.w32 Outdated Show resolved Hide resolved
divinity76 and others added 2 commits January 29, 2024 20:18
Co-authored-by: Peter Kokot <peterkokot@gmail.com>
Co-authored-by: Peter Kokot <peterkokot@gmail.com>
@bukka
Copy link
Member

bukka commented Feb 2, 2024

It's missing license in each file so that will need to be added.

@divinity76
Copy link
Contributor Author

@bukka added the LICENSE file from https://github.com/BLAKE3-team/BLAKE3/blob/master/LICENSE ,
but adding them to every file sounds excessive, are you sure we actually want that?

@petk
Copy link
Member

petk commented Feb 3, 2024

Regarding the license, can we maybe add it to the README.REDIST.BINS file? Number 21. Blake3 (ext/hash/blake3) and content of LICENSE file copied into it at the end. This file is also distributed when download PHP source code either for Windows or the tarball from downloads.php.net so it's on one place, and code is a bit less cluttered.

@bukka
Copy link
Member

bukka commented Feb 3, 2024

I think the added LICENSE file and a note in README.REDIST.BINS should be enough for upstream files. You should add a header to the files that you added - hash_blake3.c and php_hash_blake3.h

ext/hash/config.m4 Outdated Show resolved Hide resolved
ext/hash/config.m4 Outdated Show resolved Hide resolved
divinity76 and others added 5 commits February 3, 2024 15:43
petk: This detects x86 32-bit processor (can be x86, i386 or similar, or "amd" on FreeBSD.

Co-authored-by: Peter Kokot <peterkokot@gmail.com>
petk: This detects the x86_64 processor. x86_64 is detected as word amd64 on FreeBSD. The $host_os is in Autoconf used for the target CPU when cross-compiling or as a host CPU when building on the same machine as the program will be also run.

Co-authored-by: Peter Kokot <peterkokot@gmail.com>
patches to specifically address a gcc -Werror=logical-op issue explained in BLAKE3-team/BLAKE3#380
and a gcc -Wunused-function issue explained in BLAKE3-team/BLAKE3#382
and optimized upstream git checkout to only fetch the files we want.
divinity76 and others added 3 commits February 8, 2024 12:29
vld1q_u8 and vst1q_u8 has no alignment requirements.

This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input,
 from 13920 nanoseconds down to 13800 nanoseconds (approx)

ref BLAKE3-team/BLAKE3#384
@divinity76
Copy link
Contributor Author

@bukka @petk added LICENSE headers and reference: 1954b37 - I prefer a reference in README.REDIST.BINS rather than in-lining it.

@cypherbits
Copy link

Having this would be good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants