Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blake2b `F` precompile #2129

Open
wants to merge 28 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@mhluongo
Copy link

commented Jun 20, 2019

Adapt #152 into a proper EIP, including an initial implementation and simple benchmarks.

After discussion with the most recent EIP authors and the Zcash team, we've decided to pursue a direct BLAKE2b F precompile rather than focus on improving EIP-131. While EIP-131 is a bit more friendly to application developers, it isn't a flexible enough approach to bring BLAKE2b to the EVM and should be considered a nice-to-have.

Appreciate the support of @MadeofTin @zookozcash @daira @str4d @prestwich in getting to this approach, and of course @pdyraga for his initial implementation.

mhluongo added some commits Jun 16, 2019

Make the draft EIP consistent with the template
Also added myself as an author
Added notes about the in-progress implementation
Should have a working geth precompile and initial benchmarks shortly
Specify EIP-2046 as a requirement
While 2046's cheaper precompile contract calls aren't a requirement for
this EIP's implementation, shipping this precompile without EIP-2046
would make the F function expensive for some of the motivating usecases.
Don't use ABI encoding for precompile
Replace the existing ABI encoding interface to the BLAKE2b `F`
precompile with a loosely pack struct that's `staticcall`-friendly.

H/t to @pdyraga for putting together the interface!
Remove less relevant EIP rationale
Let's not relitigate precompiles, WASM, etc in thie EIP :)
Use 0x09 as the precompile address
If a conflicting EIP is moving forward the EIP editor can assign a new
address
@mhluongo

This comment has been minimized.

Copy link
Author

commented Jun 20, 2019

This supersedes our internal PR, which has some initial discussion.

@mhluongo mhluongo referenced this pull request Jun 20, 2019

Closed

Blake2b F precompile #2

@pdyraga pdyraga referenced this pull request Jun 21, 2019

Open

Blake2b F precompile #4

@mhluongo

This comment has been minimized.

Copy link
Author

commented Jun 21, 2019

Initial implementation with benchmarks is at keep-network/go-ethereum#4 🎉

Looks like the build is choking on the EIP name, though the docs suggested that EIP drafts start with eip-draft-title-in-english.md until a number is assigned 🤷‍♂

Choosing an EIP number
Contributing docs suggest EIPs be named `eip-draft-with-a-title` until
an editor has been assigned, but discussing this work off-platform
without a number is a problem.

Assigning 152 as the issue number where the `F` precompile was
originally raised (#152)
@MadeofTin

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

I suggest closing this EIP at the same time #2136 is merged

Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md
Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md Outdated
@pdyraga

This comment has been minimized.

Copy link
Contributor

commented Jun 25, 2019

I think we agree to change the format of input parameters so that we no longer left-pad parameters, and instead, pass ABI-encoded parameters which take exactly 213 bytes.

I modified the sample Go implementation PR with this change to prove that it works.

I can update the EIP tomorrow about this change if there are no objections until then (cc @mhluongo )

@mhluongo

This comment has been minimized.

Copy link
Author

commented Jun 26, 2019

Go for it @pdyraga!

pdyraga added some commits Jun 27, 2019

Updated the interface for F precompile
- F precompile accepts now `abi.encodePacked` parameters taking 
exactly 213 bytes. This is safer and does not require left-padding data
- `rounds` parameter is now the first one as the gas cost depends only
on this parameter
@mhluongo

This comment has been minimized.

Copy link
Author

commented Jun 27, 2019

@axic @holiman feedback addressed- this is ready for another look!

@holiman

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

A few comments:

  • The benchmarks are quoted with absolute figures (ns/op), without comparison with e.g. ecRecover. I think it'd be better to include the stats where they are compared against ecRecover as a baseline. You can run the script here: https://github.com/ethereum/benchmarking/blob/master/scripts/postprocess_geth_v2.py to turn go benchmarks into the type of tables that I provided (like these ).
  • Might also move the benchmarks into some sort of appendix section, outside of the main specification.
  • f as a boolean needs to be further specified. See the ambiguation here, where the go-implementation implements f=true iff input==b0000 0001. I propose that you define f as true iff input != 0, which I think is what solidity implements. Getting these checks wrong across clients means instant consensus break.
@pdyraga

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

Sure, I'll work on it.

pdyraga added some commits Jun 28, 2019

Clarification: f parameter is true if it is nonzero
This rule is compatible with Solidity for boolean.
@pdyraga

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

@holiman All comments addressed, thanks for the great feedback.

@mhluongo

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

Ping @axic, I think this is close / perhaps ready?

Show resolved Hide resolved EIPS/eip-152.md Outdated
Show resolved Hide resolved EIPS/eip-152.md Outdated
## Rationale
<!--The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages. The rationale may also provide evidence of consensus within the community, and should discuss important objections or concerns raised during discussion.-->

BLAKE2b is an excellent candidate for precompilation. It exhibits an extremely asymmetric efficiency. BLAKE2b is heavily optimized for modern 64-bit CPUs, specifically utilizing 24 and 63-bit rotations to allow parallelism through SIMD instructions and little-endian arithmetic. These characteristics provide exceptional speed on native CPUs: 3.08 cycles per byte, or 1 gibibyte per second on an Intel i5.

This comment has been minimized.

Copy link
@axic

axic Jul 2, 2019

Member

What does "asymmetric efficiency" mean here?

This comment has been minimized.

Copy link
@mhluongo

mhluongo Jul 3, 2019

Author

This was originally @tjade273, but I read it as "this is a heavily optimized operation, and it a great candidate for special-casing"

This comment has been minimized.

Copy link
@axic

axic Jul 3, 2019

Member

"asymmetric" in CS usually means one operation is way faster than the other, e.g. decompression is faster than compression. Perhaps the term is used with a specific meaning in the context of crypto/hashes.

I'm just not sure it gives the right idea to the audience of this EIP.

[4 bytes for rounds][64 bytes for h][128 bytes for m][8 bytes for t_0][8 bytes for t_1][1 byte for f]
```

`f` parameter is considered as `true` if it is `nonzero`. For example:

This comment has been minimized.

Copy link
@axic

axic Jul 2, 2019

Member

Could just shorten this to:

The boolean f parameter is considered as true if at least 1 bit out of the 8 bits is set.


I know @holiman suggested to relax this to "help Solidity", but Solidity always cleaned booleans in outgoing ABI encoded data (e.g. always setting it to 00000001b) and the new encoder also enforces that incoming booleans satisfy this requirement.

Making this strict would help for two reasons:

  • Replacing the precompile with EVM code may be simpler (perhaps not)
  • Reduces the number of test cases. If taking this spec, we could likely end up needing 256x more test cases to check every combination of this field with other fields – if we want complete coverage.

We can discuss this and agree on outside of this PR though.

This comment has been minimized.

Copy link
@pdyraga

pdyraga Jul 3, 2019

Contributor

Maybe I am missing something here but isn't

The boolean f parameter is considered as true if at least 1 bit out of the 8 bits is set.

the same as

f parameter is considered as true if it is nonzero

?

I know the question was about shortening the sentence and I agree what @axic proposed is easier to reason about but at the same time we are discussing "making this strict" and I think "at least 1 bit out of the 8 bits is set" is not strict 🤔

I see some benefit in expressing it as "at least 1 bit out of the 8 bits is set". For example,
00000001 looks as good as 11111111 when representing true.

This comment has been minimized.

Copy link
@axic

axic Jul 3, 2019

Member

My comment had two parts.

The first part asking for simplifying the description. Not sure how to interpret "nonzero" for a boolean field. I think the "at least 1 bit set" could allow removing the examples. I'm not too keen on this change however.

The second arguing against the way boolean is handled, but we can take this second part out of the scope of merging this PR.

This comment has been minimized.

Copy link
@pdyraga

pdyraga Jul 4, 2019

Contributor

I updated the sentence to say

The boolean f parameter is considered as true if at least 1 bit out of the 8 bits is set.

I agree with @axic this is better phrased.

I did not change the logic about how boolean is handled (the second part of the comment) because it seems like we don't have a consensus here between what @holiman and @axic proposed.

This comment has been minimized.

Copy link
@axic

axic Jul 4, 2019

Member

Moved this discussion to the "discussion url": #152 (comment)

@axic

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

@mhluongo @pdyraga left some hopefully final-round comments.

@holiman

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2019

@axic

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

I don't think I said "help solidity" -- it's more that we should interpret input data the same way solidity does. Another option is to fail if f is neither 0x01 or 0x00. As long as it's well defined and somewhat consistent with practice elsewhere I'm fine with either.

Sorry @holiman I didn't mean to put words into your mouth, I don't think you said a sentence like that, just tried to summarise the intent.

I'd propose we make it strict.

P.S. Too bad responding via email doesn't go to the right comment.

pdyraga added some commits Jul 4, 2019

Avoid referring to abi.encodePacked
The specification should not be Solidity-specific. Instead of
referring to abi.encodePacked we now just say "tightly encoded".
Fixed incorrect link
"specified in the RFC" linked to the geth PR for F precompile
instead of linking to the BLAKE2b RFC.

The boolean `f` parameter is considered as `true` if at least 1 bit out of the 8 bits is set.

The precompile should compute the `F` function as [specified in the RFC](https://tools.ietf.org/html/rfc7693#section-3.2) and return the updated state vector `h`.

This comment has been minimized.

Copy link
@axic

axic Jul 4, 2019

Member

I'm still confused how one derives the result from the state vector. I'm sure this is well described in the RFC, but would be nice including a note here for the reader. (Well, then again the same applies to the inputs.)

I guess it is like truncating it to 32-bytes for a 256-bit hash.

I'll raise this on the discussion URL instead.

@daira
Copy link

left a comment

I reviewed this. There's a significant byte ordering issue that should be resolved.

I did not check:

  • the constants in callF;
  • the Gas costs section.
function F(uint32 rounds, bytes32[2] memory h, bytes32[4] memory m, uint64[2] memory t, bool f) public view returns (bytes32[2] memory) {
bytes32[2] memory output;
bytes memory args = abi.encodePacked(rounds, h[0], h[1], m[0], m[1], m[2], m[3], t[0], t[1], f);

This comment has been minimized.

Copy link
@daira

daira Jul 4, 2019

There appears to be a byte order problem here. BLAKE2b is consistently little-endian. abi.encodePacked encodes each of its arguments in big-endian order. EVM's memory accesses are big-endian for types larger than a byte.

For the bytes32 values the intent is clear; a byte array doesn't have endianness, and so the input should be read in the same order as any BLAKE2b implementation operating on byte arrays would read it. However, technically there's a specification ambiguity because in RFC 7693, the h and m inputs to F are specified as being arrays of unsigned 32-bit words. They obviously should be read as little-endian, as the full BLAKE2b hash requires (the ambiguity is caused by the fact that this conversion is specified as part of the full hash, not F).

For the uint64 values, on the other hand, it's totally unclear what the intent is. Note that RFC 7693 specifies t as a single integer, and unpacks it into words within the definition of F. I would imagine that users of this API would try to just increment t[0] for each block, and then using abi.encodePacked here would be plain wrong (the byte order would be swapped).

As well as clarifying this in the text, I strongly suggest adding test vectors (which should include t values >= 264).

This comment has been minimized.

Copy link
@mhluongo

mhluongo Jul 4, 2019

Author

Thanks for the review @daira! The test vectors in the implementation seem to be working, derived from the golang Blake2b implementation and the RFC... so I suspect this would've been a "works on accident" type bug we found on another client implementation.

We'll clarify the text here to remove the ambiguity and see about a couple test vectors for the EIP on top of the geth implementation's vectors.

Thanks again for catching this!

This comment has been minimized.

Copy link
@daira

daira Jul 4, 2019

I'm confused as to how that implementation passes test vectors from the RFC. First of all there is only one execution trace for BLAKE2b in the RFC (in Appendix A), and the test vectors in blake2FTests do not include it in any byte order. Second, there are byte-reversed inputs in the test vectors — for example, "07060504030201000f0e0d0c0b0a09081716151413121110..." looks very much like it is derived from "000102030405060708090a0b0c0d0e0f1011121314151617..." with the order of bytes reversed in each 64-bit word, exactly as I'd expect if the implementation were using the wrong byte order.

As far as I can see, all of the conversions except for rounds (i.e. from line 404 to 428) should be little-endian. https://github.com/keep-network/go-ethereum/pull/4/files#diff-1d2b8390d9dc5764156c9dbe6358a30bR404

This comment has been minimized.

Copy link
@mhluongo

mhluongo Jul 5, 2019

Author

My mistake, I meant to refer to the underlying Blake2b library fork which AFAIK is using canonical test vectors that cover the F implementation (via the Blake2b tests, not F specific).

This comment has been minimized.

Copy link
@pdyraga

pdyraga Jul 5, 2019

Contributor

Many thanks for the review @daira, I was quietly hoping to see someone from Zcash here. 🙇

There appears to be a byte order problem here. BLAKE2b is consistently little-endian. abi.encodePacked encodes each of its arguments in big-endian order. EVM's memory accesses are big-endian for types larger than a byte.

For the bytes32 values the intent is clear; a byte array doesn't have endianness, and so the input should be read in the same order as any BLAKE2b implementation operating on byte arrays would read it. However, technically there's a specification ambiguity because in RFC 7693, the h and m inputs to F are specified as being arrays of unsigned 32-bit words. They obviously should be read as little-endian, as the full BLAKE2b hash requires (the ambiguity is caused by the fact that this conversion is specified as part of the full hash, not F).

There were two reasons why we decided having m and h accepted as bytes32:

  1. Byte array does not have endianness so we did not want to enforce it on the BLAKE2b code using F precompile,
  2. EVM uses 256-bit words, so bytes32 is memory-efficient.

What we did not take into account is what you pointed out in your comment - h and m inputs to F should be read as little-endian. As you have already seen in our implementation, we read them as big-endian just like EVM. I know see that point 1 does not make any sense.

I think the best option here is to leave h and m as we have them now: bytes32[2] memory h, bytes32[4] memory m but make it clear in the EIP that they are arrays of unsigned 32-bit words of little-endian bit order.

For the uint64 values, on the other hand, it's totally unclear what the intent is. Note that RFC 7693 specifies t as a single integer, and unpacks it into words within the definition of F. I would imagine that users of this API would try to just increment t[0] for each block, and then using abi.encodePacked here would be plain wrong (the byte order would be swapped).

Agreed. I think that's another reason (next to EVM memory model) why we should keep h and m as bytes32.

The RFC says t is a 2w-bit offset counter and I assume that it's 2-word counter of little endian interpretation according to 2.4. Little-Endian Interpretation of Words as Bytes. To avoid confusion, maybe it makes sense to accept t as bytes16 and make it clear in the EIP it's unsigned 16-byte word of little-endian bit order: [ 8 bytes for t0 ][ 8 bytes for t1 ].

As well as clarifying this in the text, I strongly suggest adding test vectors (which should include t values >= 264).

We had a problem with test vectors for F - the RFC and all other official resources give test vectors for BLAKE2b as a whole. The test vectors we used in the F implementation and for benchmarks were generated from input/output pairs of execution of F using official test vectors for BLAKE2b as a whole. I do not have any better idea of how to generate something valuable and not entirely detached from the official documentation. If this method sounds good to you, we can add those test vectors to EIP, making sure they are little-endian and include t >= 264.

This comment has been minimized.

Copy link
@daira

daira Jul 8, 2019

I think the best option here is to leave h and m as we have them now: bytes32[2] memory h, bytes32[4] memory m but make it clear in the EIP that they are arrays of unsigned 32-bit words of little-endian bit order.

Byte order, not bit order. (There's no defined ordering of bits within bytes here.)

To avoid confusion, maybe it makes sense to accept t as bytes16 and make it clear in the EIP it's unsigned 16-byte word of little-endian bit order: [ 8 bytes for t0 ][ 8 bytes for t1 ].

Again you mean byte order, but yes that makes sense. It's a little harder to increment a byte array representation but not that much harder (especially as many applications will be limited to less than 256 blocks).

https://tools.ietf.org/html/rfc7693#appendix-A gives one internal test vector set for the F function of BLAKE2b-512. Each 32-bit word in the internal state is presented as a hexadecimal integer, i.e. with the most significant hexadecimal digit on the left.

This comment has been minimized.

Copy link
@daira

daira Jul 8, 2019

Note that if you keep the number of rounds encoded in big endian, that needs to be really clearly documented, otherwise people might assume it has the same byte order as the other fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.