core/vm, crypto/blake2b: add BLAKE2b compression func at 0x09 #19972

pdyraga · 2019-08-16T14:00:09Z

The precompile at 0x09 wraps the BLAKE2b F compression function:
https://tools.ietf.org/html/rfc7693#section-3.2

The precompile requires 6 inputs tightly encoded, taking exactly 213 bytes, as explained below.

rounds - the number of rounds - 32-bit unsigned big-endian word
h - the state vector - 8 unsigned 64-bit little-endian words
m - the message block vector - 16 unsigned 64-bit little-endian words
t_0, t_1 - offset counters - 2 unsigned 64-bit little-endian words
f - the final block indicator flag - 8-bit word

[4 bytes for rounds][64 bytes for h][128 bytes for m][8 bytes for t_0] \
[8 bytes for t_1][1 byte for f]

The boolean f parameter is considered as true if set to 1.
The boolean f parameter is considered as false if set to 0.
All other values yield an invalid encoding of f error.

The precompile should compute the F function as specified in the RFC
(https://tools.ietf.org/html/rfc7693#section-3.2) and return the updated
state vector h with unchanged encoding (little-endian).

See EIP-152 for details.

pdyraga · 2019-08-16T14:42:32Z

To see how it works, you may want to use this simple truffle project https://github.com/pdyraga/f-precompile-call and follow the steps from the README.

core/vm/contracts.go

pdyraga · 2019-08-20T09:38:20Z

eth/tracers/tracer.go

@@ -390,7 +390,7 @@ func New(code string) (*Tracer, error) {
 		return 1
 	})
 	tracer.vm.PushGlobalGoFunction("isPrecompiled", func(ctx *duktape.Context) int {
-		_, ok := vm.PrecompiledContractsByzantium[common.BytesToAddress(popSlice(ctx))]
+		_, ok := vm.PrecompiledContractsIstanbul[common.BytesToAddress(popSlice(ctx))]


@karalabe I decided to switch it since we are adding a new precompile and we are almost-Istanbul but I am not sure about this change. Can you please confirm?

I think that is an ok change. This adds a helper-method to the js-environment, and it is obviously already not quite correct (if executed pre-byzantium). This change does not really make it a lot more incorrect.
A proper fix would be to look at the block number and chain config, but those are not accessible at this point, so it would need a refactor, which is out of scope for this PR, imo.
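
For reference, the fork-aware lookup described as the proper fix might look roughly like the sketch below. Everything in it is a hypothetical stand-in: `address`, `forkConfig`, `precompileRange` and `isPrecompiled` are not go-ethereum types or functions, and the fork block numbers are illustrative only. The address ranges reflect 0x01 to 0x04 before Byzantium, 0x05 to 0x08 added by Byzantium, and 0x09 added by this PR for Istanbul.

```go
package main

import "fmt"

// address is a hypothetical one-byte stand-in for a full account address.
type address byte

// forkConfig is a hypothetical stand-in for a chain configuration.
type forkConfig struct {
	ByzantiumBlock uint64
	IstanbulBlock  uint64
}

// precompileRange builds the set of addresses first..last (inclusive).
func precompileRange(first, last address) map[address]bool {
	set := make(map[address]bool)
	for a := first; a <= last; a++ {
		set[a] = true
	}
	return set
}

var (
	preByzantium = precompileRange(1, 4) // 0x01..0x04
	byzantium    = precompileRange(1, 8) // adds 0x05..0x08
	istanbul     = precompileRange(1, 9) // adds 0x09 (BLAKE2b F)
)

// isPrecompiled picks the precompile set that is active at blockNumber.
func isPrecompiled(cfg forkConfig, blockNumber uint64, addr address) bool {
	switch {
	case blockNumber >= cfg.IstanbulBlock:
		return istanbul[addr]
	case blockNumber >= cfg.ByzantiumBlock:
		return byzantium[addr]
	default:
		return preByzantium[addr]
	}
}

func main() {
	// Illustrative fork heights, not real network values.
	cfg := forkConfig{ByzantiumBlock: 100, IstanbulBlock: 200}
	fmt.Println(isPrecompiled(cfg, 50, 5))  // before Byzantium
	fmt.Println(isPrecompiled(cfg, 150, 9)) // Byzantium, before Istanbul
	fmt.Println(isPrecompiled(cfg, 250, 9)) // Istanbul
}
```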

pdyraga · 2019-08-20T10:05:51Z

This is ready for another chance.

karalabe · 2019-08-20T10:31:09Z

I'm trying to pull in the SSE, AVX and AVX2 code from upstream too. They make quite a difference on the execution:

BenchmarkWrite128Generic-8   	 3000000	       420 ns/op	 304.76 MB/s
BenchmarkWrite1KGeneric-8    	  500000	      3200 ns/op	 319.99 MB/s
BenchmarkWrite128SSE4-8      	10000000	       242 ns/op	 527.58 MB/s
BenchmarkWrite1KSSE4-8       	 1000000	      1746 ns/op	 586.22 MB/s
BenchmarkWrite128AVX-8       	10000000	       232 ns/op	 549.68 MB/s
BenchmarkWrite1KAVX-8        	 1000000	      1615 ns/op	 633.93 MB/s
BenchmarkWrite128AVX2-8      	10000000	       179 ns/op	 711.66 MB/s
BenchmarkWrite1KAVX2-8       	 1000000	      1222 ns/op	 837.58 MB/s
BenchmarkSum128Generic-8     	 3000000	       409 ns/op	 312.52 MB/s
BenchmarkSum1KGeneric-8      	  500000	      3076 ns/op	 332.82 MB/s
BenchmarkSum128SSE4-8        	 5000000	       256 ns/op	 498.75 MB/s
BenchmarkSum1KSSE4-8         	 1000000	      1557 ns/op	 657.28 MB/s
BenchmarkSum128AVX-8         	10000000	       250 ns/op	 511.80 MB/s
BenchmarkSum1KAVX-8          	 1000000	      1561 ns/op	 655.62 MB/s
BenchmarkSum128AVX2-8        	10000000	       187 ns/op	 683.22 MB/s
BenchmarkSum1KAVX2-8         	 1000000	      1165 ns/op	 878.79 MB/s

I'll keep hacking on it today; I need to understand the assembly code first, then modify it. Will push those on top of this PR when I'm done and will ask you to take a peek.

pdyraga · 2019-08-21T05:42:02Z

If we can pull them in, that's great, but if not, or if it's really complex and we can't make it before the Friday deadline, I think it's not such a big deal. According to the EIP, one round costs 1 gas (benchmarks made on the code we have in this PR) and I think it's so far the cheapest precompile. For ZCash interoperability we'll probably need to execute 10 or 12 rounds.

karalabe · 2019-08-21T09:56:01Z

@holiman @pdyraga I've pushed the SSE, AVX and AVX2 code on top. I opted to keep the entire functionality of Blake2b (instead of gutting and just shipping F) so that we also have a massive test suite from the hash functions.

Performance wise the Blake2B hashes with the extracted F methods are:

$ go test --bench=. ./crypto/blake2b

BenchmarkWrite128Generic-8        5000000           351 ns/op     364.07 MB/s
BenchmarkWrite1KGeneric-8          500000          2712 ns/op     377.44 MB/s
BenchmarkSum128Generic-8          5000000           370 ns/op     345.22 MB/s
BenchmarkSum1KGeneric-8            500000          2792 ns/op     366.69 MB/s

BenchmarkWrite128SSE4-8       10000000           226 ns/op     565.81 MB/s
BenchmarkWrite1KSSE4-8         1000000          1780 ns/op     575.16 MB/s
BenchmarkSum128SSE4-8          5000000           275 ns/op     464.94 MB/s
BenchmarkSum1KSSE4-8           1000000          1800 ns/op     568.77 MB/s

BenchmarkWrite128AVX-8       10000000           216 ns/op     591.98 MB/s
BenchmarkWrite1KAVX-8         1000000          1623 ns/op     630.80 MB/s
BenchmarkSum128AVX-8          5000000           238 ns/op     537.12 MB/s
BenchmarkSum1KAVX-8           1000000          1579 ns/op     648.41 MB/s

BenchmarkWrite128AVX2-8       10000000           172 ns/op     741.66 MB/s
BenchmarkWrite1KAVX2-8         1000000          1444 ns/op     708.99 MB/s
BenchmarkSum128AVX2-8         10000000           203 ns/op     628.17 MB/s
BenchmarkSum1KAVX2-8           1000000          1312 ns/op     780.22 MB/s

I've added an 8M gas test case to the precompile to have that as a benchmark too. The runs on my laptop with the different instruction sets are:

AVX2:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       124 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	50000000	       252 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	50000000	       254 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       137 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     200	  85985717 ns/op

AVX:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       130 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	50000000	       319 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	50000000	       317 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       148 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     100	 108937506 ns/op

SSE4:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       125 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	50000000	       330 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	50000000	       326 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       142 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     100	 127352137 ns/op

Generic:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       128 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	30000000	       427 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	30000000	       421 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       151 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     100	 220968425 ns/op

karalabe · 2019-08-21T10:01:35Z

With the AVX2 instructions, we can do about 47.5Mgas/sec with the standard 12 rounds, 8Mgas/sec with 1 round.
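
The throughput figures above can be reproduced approximately from the AVX2 benchmark lines. The sketch below uses a hypothetical helper `mgasPerSecond` and assumes the gas value in each benchmark name is the gas charged for one call; the results land close to the rounded numbers quoted in the comment.

```go
package main

import "fmt"

// mgasPerSecond converts "gas per call" and "nanoseconds per call"
// into millions of gas processed per second.
func mgasPerSecond(gasPerCall, nsPerCall float64) float64 {
	// gas/ns * 1e9 ns/s / 1e6 gas/Mgas == gas/ns * 1e3
	return gasPerCall / nsPerCall * 1e3
}

func main() {
	// Values taken from the AVX2 run: vector_5 (12 gas, 252 ns/op)
	// and vector_7 (1 gas, 137 ns/op).
	fmt.Printf("12 rounds: %.1f Mgas/s\n", mgasPerSecond(12, 252))
	fmt.Printf(" 1 round:  %.1f Mgas/s\n", mgasPerSecond(1, 137))
}
```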

tkstanczak · 2019-08-21T12:09:47Z

what will be the gas cost?

holiman · 2019-08-21T12:16:12Z

> what will be the gas cost?

It's specified in the EIP. 1 gas per round. Let us know if that does not seem reasonable to you

holiman · 2019-08-21T12:17:41Z

This PR looks good to me. @karalabe and I also did some fuzzing, which did not find any discrepancies between the various flavours of assembly.

karalabe · 2019-08-21T12:18:01Z

1/round AFAIK (+the cost of the contract call itself, which will outweigh it anyway). A 12 round call (standard blake) takes about 250ns on our implementation, so filling an 8M gas block would take 166ms.


holiman · 2019-08-21T12:29:16Z

My marks:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-6         	20000000	       124 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-6        	 5000000	       244 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-6        	 5000000	       253 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-6         	10000000	       124 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-6   	      20	  86059659 ns/op

So 86ms for the 8Mgas vector. ~~I don't know what instruction set was used... Looking into the build flags now, I'm not fully sure it's correct, my IDE tells me I'm using the generic variant, but I have an amd64 processor...~~ EDIT: Benchmarks from AVX2

holiman · 2019-08-21T12:35:59Z

@karalabe are you sure you got the gccgo right in the build tags? Seems like if gccgo is enabled, it will use the generic version, and vice versa

karalabe · 2019-08-21T12:42:23Z

I just used whatever is in upstream. Btw, gccgo is an alternative Go compiler based on the gcc toolkit. Originally it was an experiment for faster Go binaries, but it was a one guy project and afaik cannot keep up. My guess is that gccgo doesn't implement the assembly, since Go's ASM is custom that needs compilation, not just embedding.


karalabe · 2019-08-21T12:45:39Z

Re instruction set, that's decided at runtime by the Blake code itself (the F function looks at the CPU capabilities and switches dynamically). It uses the highest SIMD version you have available. From Go's perspective, all 4 variants are plain amd64; the binary is not hard-coded to one or another.
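
The runtime selection described above can be pictured with the following sketch. It is not the code from `crypto/blake2b`: `cpuFeatures`, `implementation`, `noop` and `selectImplementation` are hypothetical names, the feature flags are filled in by hand rather than detected, and the compression routines are empty placeholders. It only shows the idea of picking the most capable variant once the CPU capabilities are known.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for real CPU feature detection.
type cpuFeatures struct {
	SSE4, AVX, AVX2 bool
}

// implementation pairs a label with a placeholder compression routine.
type implementation struct {
	name string
	f    func(h *[8]uint64)
}

// noop is a placeholder; a real variant would run the rounds on h.
func noop(h *[8]uint64) {}

// selectImplementation returns the most capable variant the CPU supports,
// falling back to the portable version when no SIMD extension is present.
func selectImplementation(c cpuFeatures) implementation {
	switch {
	case c.AVX2:
		return implementation{"AVX2", noop}
	case c.AVX:
		return implementation{"AVX", noop}
	case c.SSE4:
		return implementation{"SSE4", noop}
	default:
		return implementation{"generic", noop}
	}
}

func main() {
	machines := []cpuFeatures{
		{SSE4: true, AVX: true, AVX2: true},
		{SSE4: true, AVX: true},
		{SSE4: true},
		{},
	}
	for _, m := range machines {
		impl := selectImplementation(m)
		var h [8]uint64
		impl.f(&h) // all variants share one call signature
		fmt.Println(impl.name)
	}
}
```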


holiman · 2019-08-21T12:56:59Z

Ok, my IDE seems confused. With some self-induced panics, I discovered that my benchmarks come from the AVX2 implementation.

pdyraga · 2019-08-22T09:20:45Z

🎉

Having the precompile already implemented in go-ethereum, shouldn't we now merge EIP-152 as soon as possible?

pdyraga requested review from holiman, karalabe and rjl493456442 as code owners August 16, 2019 14:00

pdyraga force-pushed the istanbul-eip-152-blake2b-f-precompile branch 2 times, most recently from a428296 to cd4f2e5 Compare August 16, 2019 14:15

karalabe reviewed Aug 16, 2019


karalabe added the istanbul label Aug 19, 2019

karalabe mentioned this pull request Aug 19, 2019

Istanbul hard-fork meta #19919

Closed

10 tasks

pdyraga changed the title ~~Added BLAKE2b F compression function precompile at 0x09~~ BLAKE2b F compression function precompile Aug 20, 2019

pdyraga force-pushed the istanbul-eip-152-blake2b-f-precompile branch from 8af3cd2 to e5bdb96 Compare August 20, 2019 09:36


holiman approved these changes Aug 21, 2019


pdyraga and others added 2 commits August 21, 2019 13:09

core/vm, crypto/blake2b: add SSE, AVX and AVX2 code

1bccafe

karalabe force-pushed the istanbul-eip-152-blake2b-f-precompile branch from 7c174b3 to 1bccafe Compare August 21, 2019 10:09

karalabe added this to the 1.9.3 milestone Aug 21, 2019

tkstanczak mentioned this pull request Aug 21, 2019

Istanbul tracker NethermindEth/nethermind#771

Closed

15 tasks

tkstanczak mentioned this pull request Aug 22, 2019

EIP-152 F NethermindEth/nethermind#816

Closed

karalabe changed the title ~~BLAKE2b F compression function precompile~~ core/vm, crypto/blake2b: add BLAKE2b compression func at 0x09 Aug 22, 2019

karalabe merged commit 22fdbee into ethereum:master Aug 22, 2019

pdyraga mentioned this pull request Aug 22, 2019

Blake2 F function support in go-ethereum keep-network/blake2b#2

Closed

yoomee1313 mentioned this pull request May 12, 2021

blockchaim/vm: add blake2b f compression precompiled contract (eip152) klaytn/klaytn#960

Merged

9 tasks
