Document dense encoding of invalid pushdata in EOFv0 #98

Open
wants to merge 16 commits into
base: main

@axic axic commented Apr 25, 2024

Documenting #58 (comment)

axic commented Apr 25, 2024

@gballet here's our new proposal which reduces header overhead significantly.

@gballet gballet left a comment

Interesting results. The analysis is missing a gas estimate. But it does a priori have a positive impact on code complexity as well as code size.

From what I can see, using scheme 1 and a 64kb limit, the extra gas cost would be 11200, which is too significant to be hidden in the 21000. But for a code size limit of 24kb, it's acceptable at 6600.

The questions that remain are:

  1. Where to store that? Increasing the code size limit will pose a problem as there won't be enough space in the header, so a scheme needs to be devised.
  2. How to make it work during the transition, as there will be some code that will be translated and some code that will still be in legacy mode. My hunch is that it is possible to check if the header is available, and if not, revert to legacy.
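
The two gas figures above are consistent with a flat charge per header chunk. A minimal sketch of that arithmetic, assuming 200 gas per chunk (an inferred value, since 11200 / 56 = 200 and 6600 / 33 = 200) and using chunk counts quoted elsewhere in this thread (56 for 64k code, 33 as the larger upper bound for 24k code). `GAS_PER_HEADER_CHUNK` and `header_gas` are hypothetical names:

```python
# Assumption: each header chunk loaded before execution costs a flat 200 gas.
# The value is inferred from the figures in the comment, not taken from the spec.
GAS_PER_HEADER_CHUNK = 200
BASE_TX_GAS = 21000  # the "21000" mentioned in the comment


def header_gas(header_chunks: int, gas_per_chunk: int = GAS_PER_HEADER_CHUNK) -> int:
    """Gas needed to load `header_chunks` chunks of the invalid-jumpdest map."""
    return header_chunks * gas_per_chunk


for label, chunks in (("64k code, worst case", 56), ("24k code, worst case", 33)):
    gas = header_gas(chunks)
    share = 100 * gas / BASE_TX_GAS
    print(f"{label}: {gas} gas ({share:.1f}% of the base transaction cost)")
```

Under this assumption the 64k worst case is a little over half of the 21000 base cost, which is why it is hard to hide there.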


Worst case encoding where each chunk contains an invalid `JUMPDEST`:
```
total_chunk_count = 24576 / 32 = 768
```

It would be interesting to figure out what the numbers would be for a maximum code size of 64k

Member Author

It scales linearly, so same %.
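
To make the linear scaling concrete, here is a rough sketch that recomputes the worst case for several code size limits. It assumes every 32-byte chunk needs one entry and that a scheme 1 entry takes 7 bits (1 mode bit plus a 6-bit `first_instruction_offset`). The 7-bit width and the helper name `worst_case_header` are assumptions, not taken from the spec text.

```python
import math
from typing import Tuple

CHUNK_SIZE = 32  # bytes per code chunk, as in 24576 / 32 = 768


def worst_case_header(code_size: int, bits_per_entry: int) -> Tuple[int, int, float]:
    """Return (header_bytes, header_chunks, overhead_percent).

    Worst case: every code chunk contributes exactly one map entry.
    """
    total_chunk_count = code_size // CHUNK_SIZE
    header_bytes = math.ceil(total_chunk_count * bits_per_entry / 8)
    header_chunks = math.ceil(header_bytes / CHUNK_SIZE)
    overhead_percent = 100 * header_bytes / code_size
    return header_bytes, header_chunks, overhead_percent


for limit in (24576, 32768, 65536):
    size, chunks, pct = worst_case_header(limit, bits_per_entry=7)
    print(f"{limit} bytes of code: {size} header bytes, {chunks} chunks, {pct:.2f}%")
```

With 7-bit entries this gives 21 header chunks at 24576 bytes and 56 at 65536 bytes, with the same percentage for every limit. Calling it with `bits_per_entry=11` gives 33 chunks at 24576 bytes.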


Let's create a map of `invalid_jumpdests[chunk_no] = first_instruction_offset`. We can densely encode this
map using techniques similar to *run-length encoding* to skip distances and delta-encode offsets.
This map is always fully loaded prior to execution, and so it is important to ensure the encoded

Note to self: see how much of those costs could be covered by the 21000 gas.


Encoding size: `7 skips (7 * 11 bits) + 9 values (9 * 11 bits)` = 22-byte header (0.122%)

Our current hunch is that in average contracts this results in a sub-1% overhead, while the worst case is 4.1%.

Those are good results, although I would like to see a full analysis, including contracts that are close to the 24kb limit and, ideally, contracts with 64kb code size.

Member

Note to myself: we will make a table with worst case values for code size limits of 24k, 32k and 64k.

It is possible to place the above in the "EOFv0" header, but given that the upper bound on the number of chunks occupied is low (33 vs 21),
it is also possible to make this part of the Verkle account header.

This second option allows for the simplification of the `code_size` value, as it does not need to change.

By "second option", you mean "adding it to the account header", not "Scheme 2", right?

I don't see why there would be a difference with the other case though: in both cases, one needs to use the code size to skip the header.

Member Author

> By "second option", you mean "adding it to the account header", not "Scheme 2", right?

Yes.

> I don't see why there would be a difference with the other case though: in both cases, one needs to use the code size to skip the header.

No, because I'd imagine the account header (i.e. not code leaves/keys) would be handled separately, so the actual EVM code remains verbatim.

#### Header location

It is possible to place the above in the "EOFv0" header, but given that the upper bound on the number of chunks occupied is low (33 vs 21),
it is also possible to make this part of the Verkle account header.

Yeah, but if we want to increase the maximum code size to 64k, there won't be enough space left for it in the header.

Member Author

With scheme 1 it is still 56 Verkle leaves for 64k code in the worst case. That should still easily fit into the first 128 "special" header leaves.

Member

I think we definitely need this section to be variable-length because the average case (1–2 chunks) is very different from the worst case (20–30 chunks). I.e. you don't want to reserve ~60 chunks in the tree just to use 2 on average.

- For skip-mode:
  - 10-bit number of chunks to skip
- For value-mode:
  - 6-bit `first_instruction_offset`
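
One way these fields could be serialized is sketched below. It assumes a 1-bit mode flag in front of each field (0 for skip, 1 for value), big-endian bit packing with zero padding, and that a skip counts the chunks between two stored entries. None of these details are confirmed by the spec excerpt, and `encode`, `pack` and `decode` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

SKIP_BITS = 10    # width of the skip count, from the field list above
OFFSET_BITS = 6   # width of first_instruction_offset, from the field list above
MAX_SKIP = (1 << SKIP_BITS) - 1


def encode(invalid_jumpdests: Dict[int, int]) -> List[Tuple[int, int]]:
    """Turn {chunk_no: first_instruction_offset} into (value, bit_width) fields."""
    fields: List[Tuple[int, int]] = []
    next_chunk = 0
    for chunk_no in sorted(invalid_jumpdests):
        offset = invalid_jumpdests[chunk_no]
        assert 0 <= offset < (1 << OFFSET_BITS)
        gap = chunk_no - next_chunk
        while gap > 0:  # long gaps need several skip entries
            step = min(gap, MAX_SKIP)
            fields += [(0, 1), (step, SKIP_BITS)]  # assumed mode bit 0 = skip
            gap -= step
        fields += [(1, 1), (offset, OFFSET_BITS)]  # assumed mode bit 1 = value
        next_chunk = chunk_no + 1
    return fields


def pack(fields: List[Tuple[int, int]]) -> bytes:
    """Concatenate the fields bit by bit and zero-pad to a whole byte."""
    acc, nbits = 0, 0
    for value, width in fields:
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def decode(data: bytes) -> Dict[int, int]:
    """Inverse of pack(encode(...)); trailing zero padding is ignored."""
    bits = int.from_bytes(data, "big")
    remaining = len(data) * 8
    result: Dict[int, int] = {}
    chunk = 0

    def take(width: int) -> int:
        nonlocal remaining
        remaining -= width
        return (bits >> remaining) & ((1 << width) - 1)

    while remaining > 0:
        mode = (bits >> (remaining - 1)) & 1  # peek at the mode bit
        if remaining < 1 + (OFFSET_BITS if mode else SKIP_BITS):
            break  # only padding is left
        take(1)
        if mode:
            result[chunk] = take(OFFSET_BITS)
            chunk += 1
        else:
            chunk += take(SKIP_BITS)
    return result
```

For example, `decode(pack(encode({2: 21, 3: 5, 700: 32})))` returns the original map, and adjacent chunks such as 2 and 3 need no skip entry between them.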

One question that came up: can there be more than one entry per chunk if there is more than one PUSHn in the chunk? Why not store just the overflowing one?

Member

@chfast chfast May 6, 2024

I'm not sure what "the overflowing one" means.
In the current version, for a chunk that has any number of invalid jumpdests we store the first instruction offset, as in the "vanilla" Verkle EIP. This requires performing the jumpdest analysis on the chunk (as in "vanilla" Verkle).

There are some alternatives to the first instruction offset, but we currently aim to store a single number per chunk because this really bounds the worst case.

@axic axic assigned axic and chfast May 6, 2024

Arbitrum (2147 bytes long):
```
(chunk offset, chunk number, pushdata offset)
malicious push byte: 85 2 21
```

Member

This analysis is wrong because we have to encode the first instruction offset instead of the first invalid jumpdest offset. I think we should remove this section, or at least mark it as incorrect until I come up with a proper analysis.



Since Solidity contracts have a trailing metadata, which contains a Keccak-256 (32-byte) hash of the
Member Author

Don't we want to keep this trivia?

Member

We have data now, so this estimation is pointless. I actually checked, and the probability across the whole dataset is indeed ~12%.

Member

This actually wasn't correct. The contract hash is not in a PUSH32 and therefore doesn't count towards invalid jumpdests.

- For value-mode:
  - 6-bit `first_instruction_offset`
  - 7-bit number combining number of chunks to skip `s` and `first_instruction_offset`
Member Author

How?

Member

next line?
