core/vm, crypto/blake2b: add BLAKE2b compression func at 0x09 #19972

Merged

@pdyraga pdyraga commented Aug 16, 2019

The precompile at 0x09 wraps the BLAKE2b F compression function:
https://tools.ietf.org/html/rfc7693#section-3.2

The precompile requires 6 inputs tightly encoded, taking exactly 213 bytes, as explained below.

  • rounds - the number of rounds - 32-bit unsigned big-endian word
  • h - the state vector - 8 unsigned 64-bit little-endian words
  • m - the message block vector - 16 unsigned 64-bit little-endian words
  • t_0, t_1 - offset counters - 2 unsigned 64-bit little-endian words
  • f - the final block indicator flag - 8-bit word
[4 bytes for rounds][64 bytes for h][128 bytes for m][8 bytes for t_0] \
[8 bytes for t_1][1 byte for f]

The boolean f parameter is considered as true if set to 1.
The boolean f parameter is considered as false if set to 0.
All other values yield an invalid encoding of f error.
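The layout above can be sketched as a small decoder. This is a minimal illustration of the 213-byte encoding and the `f` validation rule, not the code from this PR; the names `blake2FArgs`, `decodeBlake2FInput`, `errInputLength` and `errFinalFlag` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// 4 (rounds) + 64 (h) + 128 (m) + 8 (t_0) + 8 (t_1) + 1 (f) = 213 bytes.
const blake2FInputLength = 213

// blake2FArgs holds the decoded fields; the type is illustrative only.
type blake2FArgs struct {
	rounds uint32
	h      [8]uint64
	m      [16]uint64
	t      [2]uint64
	final  bool
}

var (
	errInputLength = errors.New("invalid input length")
	errFinalFlag   = errors.New("invalid encoding of f")
)

// decodeBlake2FInput splits the tightly packed input into its six fields.
func decodeBlake2FInput(input []byte) (blake2FArgs, error) {
	var args blake2FArgs
	if len(input) != blake2FInputLength {
		return args, errInputLength
	}
	// f is the last byte: 1 means true, 0 means false, anything else is an error.
	switch input[212] {
	case 0:
		args.final = false
	case 1:
		args.final = true
	default:
		return args, errFinalFlag
	}
	// rounds is the only big-endian field.
	args.rounds = binary.BigEndian.Uint32(input[0:4])
	// h occupies bytes 4..67 as eight little-endian words.
	for i := range args.h {
		off := 4 + i*8
		args.h[i] = binary.LittleEndian.Uint64(input[off : off+8])
	}
	// m occupies bytes 68..195 as sixteen little-endian words.
	for i := range args.m {
		off := 68 + i*8
		args.m[i] = binary.LittleEndian.Uint64(input[off : off+8])
	}
	// t_0 and t_1 occupy bytes 196..211.
	args.t[0] = binary.LittleEndian.Uint64(input[196:204])
	args.t[1] = binary.LittleEndian.Uint64(input[204:212])
	return args, nil
}

func main() {
	input := make([]byte, blake2FInputLength)
	binary.BigEndian.PutUint32(input[0:4], 12) // 12 rounds
	input[212] = 1                             // final block
	args, err := decodeBlake2FInput(input)
	fmt.Println(args.rounds, args.final, err) // prints "12 true <nil>"
}
```

Checking the length first means every later slice expression is in bounds, so the decoder needs no further range checks.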

The precompile should compute the F function as specified in the RFC
(https://tools.ietf.org/html/rfc7693#section-3.2) and return the updated
state vector h with unchanged encoding (little-endian).
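Returning `h` "with unchanged encoding" means the eight updated state words are written back as 64 little-endian bytes. A minimal sketch, with `encodeState` as a hypothetical helper name:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeState serializes the eight state words as 64 little-endian bytes,
// mirroring the way h is laid out in the input.
func encodeState(h [8]uint64) []byte {
	out := make([]byte, 64)
	for i, word := range h {
		binary.LittleEndian.PutUint64(out[i*8:(i+1)*8], word)
	}
	return out
}

func main() {
	var h [8]uint64
	h[0] = 0x0102030405060708
	// The least significant byte comes first.
	fmt.Printf("%x\n", encodeState(h)[:8]) // prints "0807060504030201"
}
```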

See EIP-152 for details.

@pdyraga pdyraga force-pushed the istanbul-eip-152-blake2b-f-precompile branch 2 times, most recently from a428296 to cd4f2e5 Compare August 16, 2019 14:15
@pdyraga
Contributor Author

pdyraga commented Aug 16, 2019

To see how it works, you may want to use this simple truffle project https://github.com/pdyraga/f-precompile-call and follow the steps from README

core/vm/contracts.go Outdated
@karalabe karalabe mentioned this pull request Aug 19, 2019
10 tasks
@pdyraga pdyraga changed the title Added BLAKE2b F compression function precompile at 0x09 BLAKE2b F compression function precompile Aug 20, 2019
@pdyraga pdyraga force-pushed the istanbul-eip-152-blake2b-f-precompile branch from 8af3cd2 to e5bdb96 Compare August 20, 2019 09:36
@@ -390,7 +390,7 @@ func New(code string) (*Tracer, error) {
return 1
})
tracer.vm.PushGlobalGoFunction("isPrecompiled", func(ctx *duktape.Context) int {
- _, ok := vm.PrecompiledContractsByzantium[common.BytesToAddress(popSlice(ctx))]
+ _, ok := vm.PrecompiledContractsIstanbul[common.BytesToAddress(popSlice(ctx))]
Contributor Author

@karalabe I decided to switch it since we are adding a new precompile and we are almost-Istanbul but I am not sure about this change. Can you please confirm?

Contributor

I think that is an ok change. This adds a helper-method to the js-environment, and it is obviously already not quite correct (if executed pre-byzantium). This change does not really make it a lot more incorrect.
A proper fix would be to look at the block number and chain config, but those are not accessible at this point, so it would need a refactor which is out of scope for this PR, imo.
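As a rough illustration of what a fork-aware check could look like, here is a self-contained sketch. The `forkConfig` type, both function names and the block numbers in `main` are hypothetical stand-ins, not go-ethereum APIs. It relies only on the fact that precompiles 0x01 to 0x04 predate Byzantium, Byzantium added 0x05 to 0x08, and this PR adds 0x09 for Istanbul.

```go
package main

import "fmt"

// forkConfig is a hypothetical stand-in for a chain config that records
// the activation block of each fork.
type forkConfig struct {
	byzantiumBlock uint64
	istanbulBlock  uint64
}

// highestPrecompile returns the last precompile address active at blockNumber.
func highestPrecompile(cfg forkConfig, blockNumber uint64) byte {
	switch {
	case blockNumber >= cfg.istanbulBlock:
		return 0x09 // BLAKE2b F
	case blockNumber >= cfg.byzantiumBlock:
		return 0x08
	default:
		return 0x04
	}
}

// isPrecompiled reports whether addr is a precompile at the given block.
func isPrecompiled(cfg forkConfig, blockNumber uint64, addr byte) bool {
	return addr >= 0x01 && addr <= highestPrecompile(cfg, blockNumber)
}

func main() {
	// Made-up activation blocks, for illustration only.
	cfg := forkConfig{byzantiumBlock: 100, istanbulBlock: 200}
	fmt.Println(isPrecompiled(cfg, 150, 0x09), isPrecompiled(cfg, 200, 0x09)) // prints "false true"
}
```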

@pdyraga
Contributor Author

pdyraga commented Aug 20, 2019

This is ready for another chance.

@karalabe
Member

I'm trying to pull in the SSE, AVX and AVX2 code from upstream too. They make quite a difference on the execution:

BenchmarkWrite128Generic-8   	 3000000	       420 ns/op	 304.76 MB/s
BenchmarkWrite1KGeneric-8    	  500000	      3200 ns/op	 319.99 MB/s
BenchmarkWrite128SSE4-8      	10000000	       242 ns/op	 527.58 MB/s
BenchmarkWrite1KSSE4-8       	 1000000	      1746 ns/op	 586.22 MB/s
BenchmarkWrite128AVX-8       	10000000	       232 ns/op	 549.68 MB/s
BenchmarkWrite1KAVX-8        	 1000000	      1615 ns/op	 633.93 MB/s
BenchmarkWrite128AVX2-8      	10000000	       179 ns/op	 711.66 MB/s
BenchmarkWrite1KAVX2-8       	 1000000	      1222 ns/op	 837.58 MB/s
BenchmarkSum128Generic-8     	 3000000	       409 ns/op	 312.52 MB/s
BenchmarkSum1KGeneric-8      	  500000	      3076 ns/op	 332.82 MB/s
BenchmarkSum128SSE4-8        	 5000000	       256 ns/op	 498.75 MB/s
BenchmarkSum1KSSE4-8         	 1000000	      1557 ns/op	 657.28 MB/s
BenchmarkSum128AVX-8         	10000000	       250 ns/op	 511.80 MB/s
BenchmarkSum1KAVX-8          	 1000000	      1561 ns/op	 655.62 MB/s
BenchmarkSum128AVX2-8        	10000000	       187 ns/op	 683.22 MB/s
BenchmarkSum1KAVX2-8         	 1000000	      1165 ns/op	 878.79 MB/s

I'll keep hacking on it today; I need to understand the assembly code first, then modify it. I will push those on top of this PR when I'm done and will ask you to take a peek.

@pdyraga
Contributor Author

pdyraga commented Aug 21, 2019

If we can pull them in, that's great, but if not, or if it's really complex and we can't make it before the Friday deadline, I think it's not such a big deal. According to the EIP, one round costs 1 gas (benchmarks made on the code we have in this PR) and I think it's so far the cheapest precompile. For ZCash interoperability we'll probably need to execute 10 or 12 rounds.

@karalabe
Member

@holiman @pdyraga I've pushed the SSE, AVX and AVX2 code on top. I opted to keep the entire functionality of Blake2b (instead of gutting and just shipping F) so that we also have a massive test suite from the hash functions.

Performance-wise, the Blake2b hashes with the extracted F methods are:

$ go test --bench=. ./crypto/blake2b

BenchmarkWrite128Generic-8        5000000           351 ns/op     364.07 MB/s
BenchmarkWrite1KGeneric-8          500000          2712 ns/op     377.44 MB/s
BenchmarkSum128Generic-8          5000000           370 ns/op     345.22 MB/s
BenchmarkSum1KGeneric-8            500000          2792 ns/op     366.69 MB/s

BenchmarkWrite128SSE4-8       10000000           226 ns/op     565.81 MB/s
BenchmarkWrite1KSSE4-8         1000000          1780 ns/op     575.16 MB/s
BenchmarkSum128SSE4-8          5000000           275 ns/op     464.94 MB/s
BenchmarkSum1KSSE4-8           1000000          1800 ns/op     568.77 MB/s

BenchmarkWrite128AVX-8       10000000           216 ns/op     591.98 MB/s
BenchmarkWrite1KAVX-8         1000000          1623 ns/op     630.80 MB/s
BenchmarkSum128AVX-8          5000000           238 ns/op     537.12 MB/s
BenchmarkSum1KAVX-8           1000000          1579 ns/op     648.41 MB/s

BenchmarkWrite128AVX2-8       10000000           172 ns/op     741.66 MB/s
BenchmarkWrite1KAVX2-8         1000000          1444 ns/op     708.99 MB/s
BenchmarkSum128AVX2-8         10000000           203 ns/op     628.17 MB/s
BenchmarkSum1KAVX2-8           1000000          1312 ns/op     780.22 MB/s

I've added an 8M gas test case to the precompile to have that as a benchmark too. The runs on my laptop with the different instruction sets are:

AVX2:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       124 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	50000000	       252 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	50000000	       254 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       137 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     200	  85985717 ns/op

AVX:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       130 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	50000000	       319 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	50000000	       317 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       148 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     100	 108937506 ns/op

SSE4:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       125 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	50000000	       330 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	50000000	       326 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       142 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     100	 127352137 ns/op

Generic:
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-8         	100000000	       128 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-8        	30000000	       427 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-8        	30000000	       421 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-8         	100000000	       151 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-8   	     100	 220968425 ns/op
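
To put the four runs side by side, the 8M gas vector (`vector_8`) timings above can be turned into speedups over the generic implementation. This is plain arithmetic on the quoted ns/op figures; `speedup` is a hypothetical helper name.

```go
package main

import "fmt"

// speedup returns how many times faster a run is than the baseline,
// given both timings in ns/op.
func speedup(baselineNs, ns float64) float64 {
	return baselineNs / ns
}

func main() {
	const generic = 220968425.0 // ns/op, vector_8, generic Go code
	runs := []struct {
		name string
		ns   float64
	}{
		{"SSE4", 127352137},
		{"AVX", 108937506},
		{"AVX2", 85985717},
	}
	for _, r := range runs {
		fmt.Printf("%-4s %.2fx faster than generic\n", r.name, speedup(generic, r.ns))
	}
}
```

On these numbers AVX2 is roughly 2.6x faster than the generic code for the long-running vector, while the short vectors are dominated by fixed call overhead.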

@karalabe
Member

With the AVX2 instructions, we can do about 47.5Mgas/sec with the standard 12 rounds, 8Mgas/sec with 1 round.
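
These throughput figures follow from the AVX2 benchmark lines: gas divided by time per operation. A small sketch of the conversion, with `mgasPerSecond` as a hypothetical helper name:

```go
package main

import "fmt"

// mgasPerSecond converts a benchmark result into millions of gas per second.
// gas/ns * 1e9 ns/s / 1e6 gas/Mgas simplifies to gas/ns * 1e3.
func mgasPerSecond(gas uint64, nsPerOp float64) float64 {
	return float64(gas) / nsPerOp * 1e3
}

func main() {
	// AVX2 figures from the benchmark output above.
	fmt.Printf("12 rounds: %.1f Mgas/s\n", mgasPerSecond(12, 252))
	fmt.Printf("1 round:   %.1f Mgas/s\n", mgasPerSecond(1, 137))
	fmt.Printf("8M rounds: %.1f Mgas/s\n", mgasPerSecond(8000000, 85985717))
}
```

With 12 gas at 252 ns/op this gives about 47.6 Mgas/s, matching the figure quoted for 12 rounds. The single-round vector at 137 ns/op gives about 7.3 Mgas/s, in the same range as the quoted 8 Mgas/s.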

pdyraga and others added 2 commits August 21, 2019 13:09
@karalabe karalabe force-pushed the istanbul-eip-152-blake2b-f-precompile branch from 7c174b3 to 1bccafe Compare August 21, 2019 10:09
@karalabe karalabe added this to the 1.9.3 milestone Aug 21, 2019
@tkstanczak

what will be the gas cost?

@holiman
Contributor

holiman commented Aug 21, 2019

what will be the gas cost?

It's specified in the EIP. 1 gas per round. Let us know if that does not seem reasonable to you
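
The 1 gas per round rule can be sketched as a function of the raw input, since `rounds` is the big-endian word in the first four bytes. This is a hedged illustration: `blake2FGas` is a hypothetical name, and returning 0 for a malformed length is an assumption made here on the grounds that such a call fails during decoding anyway.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blake2FGas charges 1 gas per round, reading rounds from the first
// four bytes of the 213-byte input.
func blake2FGas(input []byte) uint64 {
	if len(input) != 213 {
		return 0 // assumption: malformed input is rejected later, so no gas is priced here
	}
	return uint64(binary.BigEndian.Uint32(input[0:4]))
}

func main() {
	input := make([]byte, 213)
	binary.BigEndian.PutUint32(input[0:4], 12)
	fmt.Println(blake2FGas(input)) // prints "12"
}
```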

@holiman
Contributor

holiman commented Aug 21, 2019

This PR looks good to me. @karalabe and I also did some fuzzing, which did not find any discrepancies between the various flavours of assembly.

@karalabe
Member

karalabe commented Aug 21, 2019 via email

@holiman
Contributor

holiman commented Aug 21, 2019

My marks:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
BenchmarkPrecompiledBlake2F/vector_4-Gas=0-6         	20000000	       124 ns/op
BenchmarkPrecompiledBlake2F/vector_5-Gas=12-6        	 5000000	       244 ns/op
BenchmarkPrecompiledBlake2F/vector_6-Gas=12-6        	 5000000	       253 ns/op
BenchmarkPrecompiledBlake2F/vector_7-Gas=1-6         	10000000	       124 ns/op
BenchmarkPrecompiledBlake2F/vector_8-Gas=8000000-6   	      20	  86059659 ns/op

So 86ms for the 8Mgas vector. I don't know what instruction set was used... Looking into the build flags now, I'm not fully sure it's correct, my IDE tells me I'm using the generic variant, but I have an amd64 processor... EDIT: Benchmarks from AVX2

@holiman
Contributor

holiman commented Aug 21, 2019

@karalabe are you sure you got the gccgo right in the build tags? Seems like if gccgo is enabled, it will use the generic version, and vice versa
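
One way to reason about which variant a build picks is to mirror the build constraint in ordinary code. The sketch below assumes a constraint of the form `amd64,!gccgo` on the assembly files, which is an assumption about the tags rather than a quote from this PR. `runtime.GOARCH` and `runtime.Compiler` are standard library values; `usesAssembly` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime"
)

// usesAssembly mirrors an assumed build constraint "amd64,!gccgo":
// the assembly variants are compiled in only on amd64 with the gc toolchain,
// and every other combination falls back to the generic Go code.
func usesAssembly(goarch, compiler string) bool {
	return goarch == "amd64" && compiler != "gccgo"
}

func main() {
	fmt.Println(runtime.GOARCH, runtime.Compiler,
		usesAssembly(runtime.GOARCH, runtime.Compiler))
}
```

Which of SSE4, AVX or AVX2 runs is then a runtime CPU feature decision, which this check does not cover.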

@karalabe
Member

karalabe commented Aug 21, 2019 via email

@karalabe
Member

karalabe commented Aug 21, 2019 via email

@holiman
Contributor

holiman commented Aug 21, 2019

Ok, my IDE seems confused. With some self-induced panics, I discovered that my benchmarks come from the AVX2 implementation.

@karalabe karalabe changed the title BLAKE2b F compression function precompile core/vm, crypto/blake2b: add BLAKE2b compression func at 0x09 Aug 22, 2019
@karalabe karalabe merged commit 22fdbee into ethereum:master Aug 22, 2019
@pdyraga
Contributor Author

pdyraga commented Aug 22, 2019

🎉

Having the precompile already implemented in go-ethereum, shouldn't we now merge EIP-152 as soon as possible?

5 participants