Explicate non-RLP-encodable structures. #736

Merged
merged 1 commit into from Apr 11, 2019

Member

@acoglio acoglio commented Mar 29, 2019

Besides the changes to the text that can be easily seen in the diff, this commit changes [image: rb-old] to [image: rb-new], and also [image: rl-old] to [image: rl-new].

These new definitions of the RLP encoding functions are consistent with my formalization of RLP encoding in the ACL2 theorem prover, which I have proved to be injective and prefix-unambiguous. The prefix-unambiguity property means that no valid encoding is a strict prefix of another valid encoding; this ensures decodability from a stream of bytes that may not have an end-of-encoding marker.
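To illustrate why prefix-unambiguity matters for stream decoding, here is a minimal Python sketch (not taken from the ACL2 formalization or from any client; the function name `first_item_length` is hypothetical). It assumes the usual RLP first-byte ranges and shows that the first byte, plus the length bytes it announces, fixes exactly where an encoding ends, so no end-of-encoding marker is needed. It does not validate that the encoding is well-formed.

```python
def first_item_length(data: bytes) -> int:
    """Number of bytes occupied by the first RLP encoding in `data`.

    Assumes `data` starts with a valid encoding; no validation is done.
    """
    b0 = data[0]
    if b0 < 128:            # single byte encodes itself
        return 1
    if b0 < 184:            # short byte array: length is b0 - 128
        return 1 + (b0 - 128)
    if b0 < 192:            # long byte array: b0 - 183 length bytes follow
        n = b0 - 183
        return 1 + n + int.from_bytes(data[1:1 + n], "big")
    if b0 < 248:            # short list: payload length is b0 - 192
        return 1 + (b0 - 192)
    n = b0 - 247            # long list: b0 - 247 length bytes follow
    return 1 + n + int.from_bytes(data[1:1 + n], "big")


# Two encodings back to back, with no separator between them.
stream = b"\x83dog" + b"\xc8\x83cat\x83dog"
end = first_item_length(stream)
first, rest = stream[:end], stream[end:]
```

Here `first` is the encoding of the byte array `dog` and `rest` is the encoding of the list, recovered purely from the leading bytes.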

Excessively large structures cannot be encoded in RLP, namely byte arrays that
contain 2^64 or more bytes, and lists whose concatenated serialized items
contain 2^64 or more bytes. These restrictions ensure that the first byte of an
encoding is indeed a byte, and that the first bytes of byte array encodings and
list encodings are disjoint. Also see the encode_length function in the RLP page
of the Ethereum Wiki.

Prior to this commit, the definition of the RLP function in Appendix B did not
explicate these restrictions. Even though these restrictions can be reasonably
inferred from the fact that RLP encodings must be easily decodable, this commit
improves clarity by having the RLP function return an explicit "error" value
when the input structure cannot be encoded. We just need a few more cases in the
equations that define the functions R_b, R_l, and s. The error value is
currently \varnothing, but a different symbol could be used instead.
@nicksavers
Contributor

@acoglio I agree that the limit was not properly specified, and I can agree to this change. I can't, however, check whether all client implementations comply exactly with this formal definition. Could you perhaps get the various client teams to sign off on this to avoid any incompatibilities?

@acoglio
Member Author

acoglio commented Apr 8, 2019

@nicksavers I will contact the teams.

@acoglio
Member Author

acoglio commented Apr 9, 2019

@nicksavers I posted a message to the Go Ethereum Discord general channel, then I saw that you had done that already, and the RLP implementor confirmed that the spec change is okay (message of 2019-04-02 from user fjl on general channel).

Note that the 2^64 limit is inherent to the encoding method:

  • If we wanted to encode strings of 2^64 or more bytes, we would need 9 or more bytes for the length. But adding 9 or more to 183 (in equation (180)) yields a first byte that is 192 or more, which would thus overlap with the encodings of lists, whose first byte is 192 or more. So the decoder would not be readily able to distinguish strings from lists based on the first byte.
  • If we wanted to encode lists whose concatenated component encodings are 2^64 bytes or more, we would need 9 or more bytes for the length. But adding 9 or more to 247 (in equation (183)) yields 256 or more, which does not fit in a byte.

The 2^64 limits, although not explicated by the YP, could be argued to be inferable based on the above observations. But I believe that explicating them makes things clearer in the YP.
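
The arithmetic in the two bullets above can be checked with a few lines of Python. This is only a sketch: the helper name `first_byte` is hypothetical, and it applies to the long-form case (56 or more bytes), where the first byte is the base (183 for byte arrays, 247 for lists) plus the number of bytes in the big-endian length.

```python
def first_byte(base: int, length: int) -> int:
    """Base plus the byte count of the big-endian length.

    The result is returned unchecked, so it may exceed 255.
    """
    length_bytes = (length.bit_length() + 7) // 8
    return base + length_bytes


for base in (183, 247):
    for length in (2**64 - 1, 2**64):
        print(base, length, first_byte(base, length))
# 183 with 2^64 - 1 gives 191 (still a byte array prefix)
# 183 with 2^64     gives 192 (collides with list prefixes)
# 247 with 2^64 - 1 gives 255 (largest byte value)
# 247 with 2^64     gives 256 (not a byte)
```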

@acoglio
Member Author

acoglio commented Apr 9, 2019

Besides extending the equations, my commit includes the following added text that makes the above observations (in more concise form; I can expand them in a new commit if you think it would be useful):

Byte arrays containing $2^{64}$ or more bytes cannot be encoded. This restriction ensures that the first byte of the encoding of a byte array is always below 192, and thus it can be readily distinguished from the encodings of sequences in $\mathbb{L}$.

Sequences whose concatenated serialized items contain $2^{64}$ or more bytes cannot be encoded. This restriction ensures that the first byte of the encoding does not exceed 255 (otherwise it would not be a byte).

@nicksavers nicksavers merged commit 2a79beb into ethereum:master Apr 11, 2019
@acoglio acoglio deleted the rlp-err branch April 11, 2019 23:01