Document dense encoding of invalid pushdata in EOFv0 #98
Conversation
@gballet here's our new proposal which reduces header overhead significantly.
Interesting results. The analysis is missing a gas estimate. But it does a priori have a positive impact on code complexity as well as code size.
From what I can see, using scheme 1 and a 64kb limit, the extra gas cost would be 11200, which is too significant to be hidden in the 21000. But for a code size limit of 24kb, it's acceptable at 6600.
The questions that remain are:
- Where to store that? Increasing the gas limit will pose a problem as there won't be enough space in the header, so a scheme needs to be devised.
- How to make it work during the transition, as there will be some code that will be translated and some code that will still be in legacy mode. My hunch is that it is possible to check if the header is available, and if not, revert to legacy.
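The 11200 and 6600 figures are consistent with charging each leaf of the header at EIP-4762's `WITNESS_CHUNK_COST` of 200 gas. A quick sanity check of that arithmetic (the per-leaf cost model is an assumption on my part, not part of the proposal):

```python
# Assumed cost model: each leaf of the invalid-jumpdest header is
# charged like a witness chunk under EIP-4762 (WITNESS_CHUNK_COST = 200).
WITNESS_CHUNK_COST = 200

def header_access_gas(worst_case_leafs: int) -> int:
    """Gas to load every leaf of the header prior to execution."""
    return worst_case_leafs * WITNESS_CHUNK_COST

print(header_access_gas(56))  # 64 KiB code limit, 56 leafs -> 11200
print(header_access_gas(33))  # 24 KiB code limit, 33 leafs -> 6600
```

Under this model, bounding the worst-case leaf count directly bounds the fixed gas overhead of loading the header.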
spec/eofv0_verkle.md
Outdated
Worst case encoding where each chunk contains an invalid `JUMPDEST`:
```
total_chunk_count = 24576 / 32 = 768
```
It would be interesting to figure out what the numbers would be for a maximum code size of 64k
It scales linearly, so same %.
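A sketch of why it scales linearly: in the worst case every 32-byte chunk contributes one value-mode entry, so the header is a fixed fraction of the code size regardless of the limit. The 7-bit entry width (one mode bit plus the 6-bit offset) is my reading of scheme 1, not something the quoted spec pins down:

```python
def worst_case_header_bytes(code_size: int, bits_per_entry: int = 7) -> int:
    # Worst case: one value-mode entry per 32-byte chunk.
    # 7 bits per entry = 1 mode bit + 6-bit first_instruction_offset
    # (assumed; adjust bits_per_entry for other schemes).
    chunks = code_size // 32
    return (chunks * bits_per_entry + 7) // 8

for limit in (24 * 1024, 32 * 1024, 64 * 1024):
    size = worst_case_header_bytes(limit)
    print(limit, size, f"{size / limit:.3%}")  # same percentage at every limit
```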
Let's create a map of `invalid_jumpdests[chunk_no] = first_instruction_offset`. We can densely encode this
map using techniques similar to *run-length encoding* to skip distances and delta-encode offsets.
This map is always fully loaded prior to execution, and so it is important to ensure the encoded
Note to self: see how much of those costs could be covered by the 21000 gas.
spec/eofv0_verkle.md
Outdated
Encoding size: `7 skips (7 * 11 bits) + 9 values (9 * 11 bits)` = 22-bytes header (0.122%)
Our current hunch is that in average contracts this results in a sub-1% overhead, while the worst case is 4.1%.
Those are good results, although I would like to see a full analysis, including of contracts that are close to the 24kb limit. And, ideally, of contracts with 64kb code size.
Note to myself: we will make a table with worst case values for code size limits of 24k, 32k and 64k.
It is possible to place the above as part of the "EOFv0" header, but given the upper bound of the number of chunks occupied is low (33 vs 21),
it is also possible to make this part of the Verkle account header.

This second option allows for the simplification of the `code_size` value, as it does not need to change.
By "second option", you mean "adding it to the account header", not "Scheme 2", right?
I don't see why there would be a difference with the other case though: in both cases, one needs to use the code size to skip the header.
> By "second option", you mean "adding it to the account header", not "Scheme 2", right?

Yes.

> I don't see why there would be a difference with the other case though: in both cases, one needs to use the code size to skip the header.

No, because I'd imagine the account header (i.e. not code leafs/keys) would be handled separately, so the actual EVM code remains verbatim.
#### Header location

It is possible to place the above as part of the "EOFv0" header, but given the upper bound of the number of chunks occupied is low (33 vs 21),
it is also possible to make this part of the Verkle account header.
Yeah, but if we want to increase the maximum code size to 64k, there won't be enough space left for it in the header.
With scheme 1 it is still 56 verkle leafs for 64k code in the worst case. That should still easily fit into the 128 "special" first header leafs.
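The 56-leaf figure checks out if each worst-case chunk costs one 7-bit value-mode entry and each leaf holds 32 bytes (both assumptions on my part; adjust if the final encoding differs):

```python
def worst_case_header_leafs(code_size: int, bits_per_entry: int = 7) -> int:
    # One entry per 32-byte code chunk in the worst case; 7 bits per
    # entry (1 mode bit + 6-bit offset) and 32-byte leafs are assumed.
    chunks = code_size // 32
    header_bytes = (chunks * bits_per_entry + 7) // 8
    return (header_bytes + 31) // 32  # round up to whole leafs

print(worst_case_header_leafs(64 * 1024))  # 56 leafs for 64k code
print(worst_case_header_leafs(24 * 1024))  # 21 leafs for 24k code
```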
I think we definitely need a variable length for this section because the average case (1–2 chunks) is much different from the worst case (20–30 chunks). I.e. you don't want to reserve ~60 chunks in the tree just to use 2 on average.
spec/eofv0_verkle.md
Outdated
- For skip-mode:
  - 10-bit number of chunks to skip
- For value-mode:
  - 6-bit `first_instruction_offset`
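A sketch of how an `invalid_jumpdests[chunk_no] = first_instruction_offset` map could be serialized into these records. The one-bit mode flag in front of each record, and splitting skips larger than 1023 chunks, are my assumptions; the quoted lines only fix the 10-bit and 6-bit field widths:

```python
SKIP, VALUE = 0, 1  # assumed 1-bit mode flag per record

def encode(invalid_jumpdests: dict[int, int]) -> list[tuple[int, int, int]]:
    """invalid_jumpdests: chunk_no -> first_instruction_offset.
    Returns (mode, field_width_bits, value) records."""
    records = []
    prev = -1
    for chunk_no in sorted(invalid_jumpdests):
        gap = chunk_no - prev - 1
        while gap > 0:  # skip over chunks with no invalid jumpdests
            records.append((SKIP, 10, min(gap, 1023)))
            gap -= 1023
        records.append((VALUE, 6, invalid_jumpdests[chunk_no]))
        prev = chunk_no
    return records

# Invalid jumpdests in chunks 3 and 4, at offsets 17 and 2:
print(encode({3: 17, 4: 2}))  # [(0, 10, 3), (1, 6, 17), (1, 6, 2)]
```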
One question that came up: can there be more than one entry per chunk, if there is more than one PUSHn in the chunk? Why not store just the overflowing one?
I'm not sure what "the overflowing one" means.
In the current version, for a chunk that has any number of invalid jumpdests, we store the first instruction offset, as in the "vanilla" verkle EIP. This requires performing the jumpdest analysis on the chunk (as in the "vanilla" verkle).
There are some alternatives to the first instruction offset, but we currently aim for storing a single number per chunk because this really bounds the worst case.
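For context, `first_instruction_offset` is the same per-chunk value the vanilla verkle chunkification uses: the number of leading bytes of the chunk that are pushdata. An illustrative sketch of computing it with a standard pushdata scan (not the spec's reference code):

```python
PUSH1, PUSH32 = 0x60, 0x7F

def first_instruction_offsets(code: bytes, chunk_size: int = 32) -> list[int]:
    # Single linear pass marking pushdata bytes, as in classic
    # jumpdest analysis.
    is_pushdata = [False] * len(code)
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            for i in range(pc, min(pc + n, len(code))):
                is_pushdata[i] = True
            pc += n
    # Per chunk: offset of the first byte that starts an instruction.
    offsets = []
    for start in range(0, len(code), chunk_size):
        off = 0
        while start + off < len(code) and is_pushdata[start + off]:
            off += 1
        offsets.append(off)
    return offsets

# PUSH32 followed by 32 bytes of 0x5B: the pushdata spills one byte
# into the second chunk, so its first_instruction_offset is 1.
print(first_instruction_offsets(bytes([0x7F] + [0x5B] * 32)))  # [0, 1]
```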
spec/eofv0_verkle.md
Outdated
Arbitrum (2147-bytes long):
```
(chunk offset, chunk number, pushdata offset)
malicious push byte: 85 2 21
```
This analysis is wrong because we have to encode the first instruction offset instead of the first invalid jumpdest offset. I think we should remove this section, or at least mark it as incorrect until I come up with a proper analysis.
Co-authored-by: Paweł Bylica <pawel@ethereum.org>
spec/eofv0_verkle.md
Outdated
Since Solidity contracts have a trailing metadata, which contains a Keccak-256 (32-byte) hash of the |
Don't we want to keep this trivia?
We have data now, so this estimation is pointless. I actually checked, and the probability over the whole dataset is indeed ~12%.
This actually wasn't correct. The contract hash is not in the PUSH32 and therefore doesn't count towards invalid jumpdests.
spec/eofv0_verkle.md
Outdated
- For value-mode:
  - 6-bit `first_instruction_offset`
- 7-bit number combining the number of chunks to skip `s` and `first_instruction_offset`
How?
next line?
Documenting #58 (comment)