EIP-3540: EVM Object Format (EOF) v1 #3540

Merged
merged 5 commits into from
May 14, 2021

Conversation

axic
Member

@axic axic commented Apr 28, 2021

No description provided.

@alita-moore
Contributor

Hi! I'm a bot, and I wanted to automerge your PR, but couldn't because of the following issue(s):

 - File with name EIPS/eip-3540.md is new and new files must be reviewed

@chriseth
Contributor

Obsoletes EIP-2327 - not sure if we have a standardized field for that.

EIPS/eip-3540.md Outdated

The format described in this EIP introduces a simple and extensible container with a minimal set of changes required to both clients and languages, and introduces validation.

The first tangible feature it provides is separation of code and data. This separation is especially beneficial for on-chain code validators (like those utilised by layer-2 scaling tools, such as Optimism), because they can more efficiently validate contracts (for example by skipping the data section). This can result in significant gas savings. Additionally, various (static) analysis tools can also benefit, though off-chain tools can already deal with existing code, so the impact is smaller.
Contributor

I would say the main benefit is not the efficiency of the validation but the fact that such systems do not need to manually deactivate the "store data in code" feature of a compiler and especially that they can use contracts that have constructors with parameters (constructor arguments have to be stored in the init code). If contracts contain data and the validation routine cannot distinguish code and data, there is always the risk of data failing the validation.

Member Author

Addressed, please check.

EIPS/eip-3540.md Outdated (resolved)
@MicahZoltu
Contributor

Obsoletes EIP-2327 - not sure if we have a standardized field for that.

2327 is a draft, not a final EIP, so it should NOT be referenced in a new EIP.


## Motivation

On-chain deployed EVM bytecode contains no pre-defined structure today. Code is typically validated in clients to the extent of `JUMPDEST` analysis at runtime, every single time prior to execution. This poses not only an overhead, but also a challenge for introducing new or deprecating old features. [This initial proposal](https://notes.ethereum.org/@axic/evm-object-format) explains some of these challenges.
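As an illustration of the `JUMPDEST` analysis mentioned in the paragraph above, here is a minimal Python sketch. It is not taken from any client, and the function name `jumpdest_analysis` is hypothetical. It scans legacy bytecode once, skips `PUSH` immediates, and records the offsets of `JUMPDEST` (`0x5b`) instructions. Because legacy code has no structure, any data appended to the code is scanned as if it were instructions.

```python
from typing import Set

JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def jumpdest_analysis(code: bytes) -> Set[int]:
    """Return the offsets of valid JUMPDEST instructions (hypothetical helper)."""
    valid = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            valid.add(pc)
        elif PUSH1 <= op <= PUSH32:
            # PUSHn carries n immediate bytes that must not be read as opcodes.
            pc += op - PUSH1 + 1
        pc += 1
    return valid


# PUSH1 0x5b; JUMPDEST; STOP. The 0x5b at offset 1 is push data, not a JUMPDEST.
print(jumpdest_analysis(bytes([0x60, 0x5B, 0x5B, 0x00])))  # {2}
```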
Contributor

External links are strongly discouraged. Please inline the relevant content here rather than linking externally to it.

Alternatively, consider including it as an asset in ../assets/eip-3540/evm-object-format.pdf or something.

A non-exhaustive list of future improvements which could benefit from this format:
- Including a `JUMPDEST`-table (to avoid analysis at execution time) or removing `JUMPDEST`s entirely.
- Introducing static jumps (with relative addresses) and jump tables, and disallowing dynamic jumps at the same time.
- Requiring code section(s) to be terminated by `STOP`. (Assumptions like this can provide significant speed improvements in interpreters, such as a speed up of ~7% seen in [evmone](https://github.com/ethereum/evmone/pull/295).)
Contributor

@MicahZoltu MicahZoltu Apr 30, 2021

Remove external link. Recommend just removing the entire second sentence as a specific example like this isn't necessary for an EIP and is better suited for external documentation/blog posts/discussions.

Suggested change
- Requiring code section(s) to be terminated by `STOP`. (Assumptions like this can provide significant speed improvements in interpreters, such as a speed up of ~7% seen in [evmone](https://github.com/ethereum/evmone/pull/295).)
- Requiring code section(s) to be terminated by `STOP`.

- Requiring code section(s) to be terminated by `STOP`. (Assumptions like this can provide significant speed improvements in interpreters, such as a speed up of ~7% seen in [evmone](https://github.com/ethereum/evmone/pull/295).)
- Multi-byte opcodes without any workarounds.
- Representing functions as individual code sections instead of subroutines.
- Introducing a specific section for the [EIP-2938 Account Abstraction](./eip-2938.md) "top-level AA execution frame", simplifying the proposal.
Contributor

Recommend removing the link to 2938 as it will block this EIP from becoming final until 2938 is final. Since this is just the motivation section, no link is necessary and you can just say:

Suggested change
- Introducing a specific section for the [EIP-2938 Account Abstraction](./eip-2938.md) "top-level AA execution frame", simplifying the proposal.
- Introducing a specific section for Account Abstraction "top-level AA execution frame".


*We use [RFC2119](https://tools.ietf.org/html/rfc2119) keywords in this section.*

In order to guarantee that every EOF-formatted contract in the state is valid, we need to prevent already deployed (and not validated) contracts from being recognized as such format. This will be achieved by choosing a byte sequence for the *magic* that doesn't exist in any of the already deployed contracts. To prevent the growth of the search space and to limit the analysis to the contracts existing before these changes, we split changes into two hard forks, and disallow the starting byte of the format (the first byte of the magic) in the first hard fork.
Contributor

This is motivational, and should be moved to that section of the EIP rather than the specification section. That being said, this feels a bit redundant to what is already in the motivation section so consider just dropping it.

Contributor

Actually, this looks like it is saying that this EIP has a hard dependency on EIP-3541. Recommend removing the backlink from 3541 to this EIP, and making this EIP depend on 3541. Then you can just delete this whole paragraph.

Comment on lines +40 to +57
### First hard fork

After `block.number == HF1_BLOCK` new contract creation (via create transaction, `CREATE` or `CREATE2` instructions) results in an exceptional abort if the _code_'s first byte is `0xEF`.

#### Remarks

*For purely reference purposes we call the `0xEF` byte the `FORMAT`.*

The *initcode* is the code executed in the context of the *create* transaction, `CREATE`, or `CREATE2` instructions. The *initcode* returns *code* (via the `RETURN` instruction), which is inserted into the account. See section 7 ("Contract Creation") in the Yellow Paper for more information.

The opcode `0xEF` is currently an unimplemented instruction, therefore: *It pops no stack items and pushes no stack items, and it causes an exceptional abort when executed.* This means *initcode* or already deployed *code* starting with this instruction will continue to abort execution.

### Between the two hard forks

Next step to be conducted manually: Once the first hard fork went live, all existing contracts at `block.number == HF1_BLOCK` having their first byte as the `FORMAT` are examined off-chain. The goal is to find the shortest sequence after the `FORMAT` not matching any existing contract. We expect a 1- or 2-byte sequence would suffice. We call this byte sequence the *magic*, which will be used in the second hard fork.
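The off-chain search described above could be sketched as follows. This is only an illustration under stated assumptions: `find_shortest_magic` and its `max_len` parameter are hypothetical names, and `codes` stands for the deployed contract codes collected at `HF1_BLOCK`. The sketch tries candidate lengths in increasing order and returns the first byte sequence that no existing contract has directly after the `FORMAT` byte.

```python
from itertools import product

FORMAT = 0xEF


def find_shortest_magic(codes, max_len=2):
    """Return the shortest magic such that no code starts with FORMAT + magic.

    Hypothetical helper: returns None if nothing up to max_len bytes is free.
    """
    candidates = [c for c in codes if c[:1] == bytes([FORMAT])]
    for n in range(max_len + 1):
        # All n-byte sequences that already follow FORMAT in some contract.
        used = {c[1:1 + n] for c in candidates if len(c) >= 1 + n}
        for combo in product(range(256), repeat=n):
            magic = bytes(combo)
            if magic not in used:
                return magic
    return None


existing = [bytes.fromhex("ef00aa"), bytes.fromhex("ef01"), bytes.fromhex("6000")]
print(find_shortest_magic(existing).hex())  # 02
```

With no deployed contract starting with `0xEF`, the function returns an empty magic, which corresponds to the best case of a zero-length magic.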

### Second hard fork

Contributor

@MicahZoltu MicahZoltu Apr 30, 2021

EIPs do not concern themselves with the hardfork coordination process. This EIP should just have a dependency on 3541 and otherwise not bring up the "first hardfork" at all. You may want to put a note in the backward compatibility section mentioning that there is value in having a delay between the introduction of 3541 and this EIP, but extended discussion about the migration process and timing should be left out of the EIP and put in the discussions-to link or some other external source.

Suggested change
### First hard fork
After `block.number == HF1_BLOCK` new contract creation (via create transaction, `CREATE` or `CREATE2` instructions) results in an exceptional abort if the _code_'s first byte is `0xEF`.
#### Remarks
*For purely reference purposes we call the `0xEF` byte the `FORMAT`.*
The *initcode* is the code executed in the context of the *create* transaction, `CREATE`, or `CREATE2` instructions. The *initcode* returns *code* (via the `RETURN` instruction), which is inserted into the account. See section 7 ("Contract Creation") in the Yellow Paper for more information.
The opcode `0xEF` is currently an unimplemented instruction, therefore: *It pops no stack items and pushes no stack items, and it causes an exceptional abort when executed.* This means *initcode* or already deployed *code* starting with this instruction will continue to abort execution.
### Between the two hard forks
Next step to be conducted manually: Once the first hard fork went live, all existing contracts at `block.number == HF1_BLOCK` having their first byte as the `FORMAT` are examined off-chain. The goal is to find the shortest sequence after the `FORMAT` not matching any existing contract. We expect a 1- or 2-byte sequence would suffice. We call this byte sequence the *magic*, which will be used in the second hard fork.
### Second hard fork


### Second hard fork

In this fork we introduce _code validation_ for new contract creation. To achieve this, we define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.
Contributor

Suggested change
In this fork we introduce _code validation_ for new contract creation. To achieve this, we define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.
We define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.


In this fork we introduce _code validation_ for new contract creation. To achieve this, we define a format called EVM Object Format (EOF), containing a version indicator, and a ruleset of validity tied to a given version.

At `block.number == HF2_BLOCK` new contract creation is modified:
Contributor

Suggested change
At `block.number == HF2_BLOCK` new contract creation is modified:
As of FORK_BLOCK_NUMBER, new contract creation behaves as follows:

Contributor

@MicahZoltu MicahZoltu left a comment

I left a bunch of comments related to the HF1/HF2 situation. I'm a huge fan of breaking this change up into multiple EIPs, and you may even want to consider breaking it up further into having a generalized contract versioning EIP and another for this first new contract version. However, this EIP should just reference its dependency but otherwise should ignore the hard fork coordination process in this EIP outside of maybe a one-line mention in backward compatibility section.


At `block.number == HF2_BLOCK` new contract creation is modified:
- if *initcode* or *code* starts with the *EOF prefix* (*TBD*), it is considered to be EOF formatted and will undergo validation specified in the following sections,
- if *code* starts with `0xEF`, creation continues to result in an exceptional abort (the rule introduced in HF1),
Contributor

Suggested change
- if *code* starts with `0xEF`, creation continues to result in an exceptional abort (the rule introduced in HF1),
- if *code* starts with `0xEF`, creation continues to result in an exceptional abort,

- if *code* starts with `0xEF`, creation continues to result in an exceptional abort (the rule introduced in HF1),
- otherwise code is considered *legacy code* and the following rules do not apply to it.
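The three creation rules quoted above can be sketched as a small classifier. This is a sketch under assumptions, not client code: the draft leaves the magic undetermined, so the `EOF_PREFIX` value below is a hypothetical placeholder, and `CodeKind` and `classify` are illustrative names. Note that the draft applies the first rule to both *initcode* and *code*, while the second applies to *code* only; the sketch classifies a single byte string and leaves that distinction to the caller.

```python
from enum import Enum

FORMAT = 0xEF
# Hypothetical placeholder: the draft has not fixed the magic yet.
EOF_PREFIX = bytes([FORMAT]) + b"\x00"


class CodeKind(Enum):
    EOF = "eof"            # must pass EOF validation
    REJECTED = "rejected"  # creation results in an exceptional abort
    LEGACY = "legacy"      # EOF rules do not apply


def classify(code: bytes, eof_prefix: bytes = EOF_PREFIX) -> CodeKind:
    """Classify code at creation time according to the three rules."""
    if code.startswith(eof_prefix):
        return CodeKind.EOF
    if code[:1] == bytes([FORMAT]):
        return CodeKind.REJECTED
    return CodeKind.LEGACY


for sample in ("ef0001", "ef01", "6000"):
    print(sample, classify(bytes.fromhex(sample)).value)
```

The order of the checks matters: the EOF prefix itself starts with `0xEF`, so it has to be tested before the generic `0xEF` rejection.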

Note that the *EOF prefix* here means the to be determined combination of `FORMAT` and *magic*.
Contributor

EIPs shouldn't have TBD in them in prose text. Recommend writing this as though everything is final, and you can always change it up until it actually becomes final.

Suggested change
Note that the *EOF prefix* here means the to be determined combination of `FORMAT` and *magic*.
EOF_PREFIX: 0xEFCAFEBEEF

(just pick something and include it, you can change it later after discussion happens on it)

| description | length | value | notes |
|-------------|-----------|-------|-------|
| format | 1-byte | 0xEF | |
| magic | n-byte(s) | TBD | n >= 0 (zero in the best case) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, just pick something and run with it rather than using placeholders like TBD. You can always change it later. One exception to this is FORK_BLOCK_NUMBER, which doesn't ever get filled in (since that is part of the hardfork coordination process, rather than the standardization process).


To summarise, the bytecode has the following format:
```
format, magic, version, (section_kind, section_size)+, 0, <section contents>
```
Contributor

This would be a great summary to include in the ## Abstract.
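To make the summarised layout concrete, here is a minimal parsing sketch. Several details are assumptions made for illustration and are not confirmed by the quoted text: a 1-byte `version`, a 1-byte `section_kind`, a big-endian 16-bit `section_size`, rejection of trailing bytes, and the `prefix` argument standing in for `format` plus the undetermined magic. The name `parse_container` is hypothetical.

```python
from typing import List, Tuple


def parse_container(container: bytes, prefix: bytes) -> Tuple[int, List[Tuple[int, bytes]]]:
    """Split a container into (version, [(section_kind, section_body), ...])."""
    if not container.startswith(prefix):
        raise ValueError("missing EOF prefix")
    pos = len(prefix)
    if pos >= len(container):
        raise ValueError("truncated before version")
    version = container[pos]  # assumed to be a single byte
    pos += 1

    headers = []
    while True:
        if pos >= len(container):
            raise ValueError("missing header terminator")
        kind = container[pos]
        pos += 1
        if kind == 0:  # the terminator byte ends the header list
            break
        if pos + 2 > len(container):
            raise ValueError("truncated section size")
        size = int.from_bytes(container[pos:pos + 2], "big")  # assumed byte order
        pos += 2
        headers.append((kind, size))
    if not headers:
        raise ValueError("at least one section is required")

    sections = []
    for kind, size in headers:
        body = container[pos:pos + size]
        if len(body) != size:
            raise ValueError("truncated section body")
        sections.append((kind, body))
        pos += size
    if pos != len(container):
        raise ValueError("trailing bytes after last section")
    return version, sections


# Hypothetical example: prefix ef00, version 1, a 3-byte section of kind 1
# and a 2-byte section of kind 2.
example = bytes.fromhex("ef00" "01" "010003" "020002" "00" "600000" "aabb")
print(parse_container(example, prefix=bytes.fromhex("ef00")))
```

Keeping all headers ahead of the contents means a validator can learn every section boundary before touching any section body.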

Comment on lines +122 to +123
4. `PC` returns the current position within the *container*.
5. `JUMP`/`JUMPI` uses an absolute offset within the *container*.
Contributor

Why are `PC` and `JUMP` container-absolute rather than code-relative? Neither can point outside the code, so this seems to add complexity: you now have to know the offset of the code within the container, both to construct the jumps correctly and during execution.

Contributor

This is indeed a debatable point. The argument for PC/JUMP/JUMPI being relative to container start is that this keeps it consistent with the other opcodes (codecopy etc.). The downside is that an assembler needs to determine the header size before filling in the final jump targets. Another argument for keeping it as it is here is that it might be a smaller change to interpreter implementations.

Member Author

The reason it is kept this way is to reduce the impact on client implementations. The only change really is what section to run jumpdest analysis on, what is the starting PC and what is the "end of code" (i.e. the first 3 rules).

Arguably it is easier to correctly change 2-3 assemblers, than changing all the clients correctly.

I personally was in favour of already changing it to use the code section only, but that means we would need to introduce DATACOPY in this version, which adds a lot more complexity for little benefit.
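The assembler-side cost discussed in this thread can be illustrated with a short sketch: with container-absolute jumps, the header size has to be known before jump targets are finalised. The sketch assumes a 1-byte version, 3 bytes per section header (1-byte kind plus 16-bit size), a 1-byte terminator, and a code section that immediately follows the header. None of these details are fixed by the discussion above, and both function names are hypothetical.

```python
def header_size(prefix_len: int, num_sections: int) -> int:
    """Bytes occupied by prefix, version, section headers and terminator."""
    return prefix_len + 1 + 3 * num_sections + 1


def to_container_offset(code_relative: int, prefix_len: int, num_sections: int) -> int:
    """Translate a code-relative jump target into a container-absolute one.

    Assumes the code section is the first section after the header.
    """
    return header_size(prefix_len, num_sections) + code_relative


# A JUMPDEST at code offset 4, in a container with a 2-byte prefix and two sections.
print(to_container_offset(4, prefix_len=2, num_sections=2))  # 14
```

Because the header size depends on the number of sections, an assembler has to settle the section list before it can emit the final `PUSH` operands for its jumps.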

- In the future, this allows serializing the `JUMPDEST` map in the EOF container, eliminating the implicit `JUMPDEST` analysis required before execution.
- Or completely remove the need for `JUMPDEST` instructions.
- This helps with deprecating EVM instructions and/or features.
- The biggest disadvantage is that deploy-time validation of EOF code must be enabled in two hard-forks.
Contributor

Remove the multi-hard fork mentions, as they are out of scope for this specification and in scope for hard fork coordination discussion.

Suggested change
- The biggest disadvantage is that deploy-time validation of EOF code must be enabled in two hard-forks.

Member

Hard fork coordination complexities can affect the design decisions of the spec, can't they?

Contributor

I suppose since this is the rationale section this line is probably fine to leave. I still stand by my view that the HF coordination material should be removed from the specification and limited in the rationale. The key thing to remember is that EIPs are technical specifications designed to be read by future developers. They shouldn't focus too much on the discussions of today, but on making the specification understandable to developers of the future.


The alternative is to have execution time validation for EOF. This is performed every single time a contract is executed, however clients may be able to cache validation results. This approach has the following properties:
- Because the validation is a consensus-level execution step, execution always requires the entire code. This makes code merkleization impractical.
- The main advantage is it can be enabled via a single hard-fork.
Contributor

Suggested change
- The main advantage is it can be enabled via a single hard-fork.


### Initcode vs code

After HF1 and before HF2 the `FORMAT` first byte check only applies to _code_. Applying the rule to _initcode_ as well would have the same effect, because it is not distinguishable from executing _initcode_: if it starts with `FORMAT`, execution would exceptionally abort. Yet, we decided against introducing an explicit check for _initcode_ for the simplicity of the HF1 spec.
Contributor

Again, this EIP should be written standalone and without regard to the hardfork coordination process.


### The FORMAT byte

The `0xEF` byte was chosen because it resembles **E**xecutable **F**ormat. It has a long history of being proposed for this use case, starting with [this](https://github.com/ethereum/EIPs/issues/154).
Contributor

Suggested change
The `0xEF` byte was chosen because it resembles **E**xecutable **F**ormat. It has a long history of being proposed for this use case, starting with [this](https://github.com/ethereum/EIPs/issues/154).
The `0xEF` byte was chosen because it resembles **E**xecutable **F**ormat.

Don't include external links. Also, having a long history of being proposed for a case doesn't really mean anything so I recommend just leaving that bit out entirely.

We have considered different questions for the sections:
- Streaming headers (i.e. `section_header, section_data, section_header, section_data, ...`) are used in some other formats (such as WebAssembly). They are handy for formats which are subject to editing (adding/removing sections). That is not a useful feature for EVM. One minor benefit applicable to our case is they do not require a specific "header terminator".
- Whether to have a header terminator or to encode `number_of_sections` or `total_size_of_headers`. Both of the latter raise the question how large of a value these fields should be able to hold. While today there will be only two sections, in case each "EVM function" would become their section, a fixed 8-bit field may not be big enough. A terminator byte seems to avoid these problems.
- Whether to encode `section_size` as a fixed 16-bit value or some kind of variable length field (e.g. [LEB128](https://en.wikipedia.org/wiki/LEB128)). We have opted for fixed size, because it simplifies client implementations, and 16-bit seems enough, because of the currently exposed code size limit of 24576 bytes (see [EIP-170](./eip-170.md) and [EIP-2677](./eip-2677.md)). Should this be limiting in the future, a new EOF version could change the format.
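The size trade-off in the last bullet can be checked with a short sketch comparing a fixed 16-bit field with unsigned LEB128 for the 24576-byte code size limit. The big-endian byte order and the function names are assumptions made for illustration; the quoted text does not specify them.

```python
MAX_CODE_SIZE = 24576  # code size limit cited in the bullet above


def encode_u16(value: int) -> bytes:
    """Fixed-width encoding: always 2 bytes (big-endian assumed)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return value.to_bytes(2, "big")


def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit marks continuation."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


print(encode_u16(MAX_CODE_SIZE).hex(), encode_uleb128(MAX_CODE_SIZE).hex())  # 6000 80c001
```

At the size limit the variable-length form is one byte longer than the fixed field, and it only saves a byte for sections shorter than 128 bytes, which supports the point that fixed size is simpler with little lost.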
Contributor

Suggested change
- Whether to encode `section_size` as a fixed 16-bit value or some kind of variable length field (e.g. [LEB128](https://en.wikipedia.org/wiki/LEB128)). We have opted for fixed size, because it simplifies client implementations, and 16-bit seems enough, because of the currently exposed code size limit of 24576 bytes (see [EIP-170](./eip-170.md) and [EIP-2677](./eip-2677.md)). Should this be limiting in the future, a new EOF version could change the format.
- Whether to encode `section_size` as a fixed 16-bit value or some kind of variable length field: We have opted for fixed size, because it simplifies client implementations, and 16-bit seems enough, because of the currently exposed code size limit of 24576 bytes (see [EIP-170](./eip-170.md) and [EIP-2677](./eip-2677.md)). Should this be limiting in the future, a new EOF version could change the format.

Also note: This creates a dependency on EIP-2677. This may be fine, but consider removing the reference just to be safe.

Member Author

I'm unsure why rationale/motivation would create any dependency when linking to other EIPs. The only dependency on other EIPs happens when referral is done in the specification.

This has been the case for many EIPs so far and I don't see a good reason why gutting everything is a good approach. I appreciate that goal of slimming specifications, but I don't think in practice it has worked out to find all relevant sources of information if external links are not permitted. Kind of catch 22.

@axic
Member Author

axic commented Apr 30, 2021

@MicahZoltu thank you for the comprehensive review. We have debated this even before publishing this draft and our conclusion is that we would very much like to merge it in its current form with the long explainer and two hard forks. We expect to substantially simplify this EIP once 3541 is agreed on, but for now we feel this explainer is very much needed to explain the motivation and process.

We did consider that the better approach would have been writing a long explainer and two shorter EIPs, but due to the time pressure on London, I think this is a good solution for the moment. After all 3540 is not proposed for London, so there's plenty of time to simplify this.

@MicahZoltu
Contributor

We expect to substantially simplify this EIP once 3541 is agreed on, but for now we feel this explainer is very much needed to explain the motivation and process.

@axic It sounds like what you want is a HackMD where you can have more free form iteration on this idea and a place to present future plans while pushing through EIP-3541. Account Abstraction did this, and I feel it worked out quite well as they were able to lay out their roadmap while only the first step in the series was an actual EIP.

@axic
Member Author

axic commented May 10, 2021

I'm not sure I agree with that sentiment, this EIP describes the proposal well and I think it is good enough for draft status. We do not actually expect any material changes to this, with the exception of potentially improving the readability.

I do not think having a "random hackmd URL" is beneficial here. The whole point of the draft status in EIPs is to have an easy-to-access URL for tracking discussion of a proposal. Getting from review to final is a much stricter process, but becoming a draft did not have such strict requirements in the past.

@axic axic marked this pull request as ready for review May 10, 2021 13:39
@axic
Member Author

axic commented May 10, 2021

The whole point of the draft status in EIPs is to have an easy to access URL for tracking discussion of a proposal.

A more detailed answer to this is here: #3541 (comment)

@axic axic changed the title from "Add first draft of EOF1" to "EVM Object Format (EOF) v1" May 10, 2021
@axic axic changed the title from "EVM Object Format (EOF) v1" to "EIP-3540: EVM Object Format (EOF) v1" May 10, 2021
@MicahZoltu
Contributor

I still stand by all of my comments and consider this EIP as currently written to not be of appropriate quality to merge as a draft, mostly due to all of the external links/references. However, there is disagreement as to whether the EIP process should be a change control process or a standardization process, and it seems a number of core developers want it to be a change control process. Given that, I'm going to start merging EIPs with far less editorial review so they can be used more generally as a central document to record things related to proposed changes, rather than as a place to define standards.

@MicahZoltu MicahZoltu merged commit 620a985 into ethereum:master May 14, 2021
@axic
Member Author

axic commented May 14, 2021

I really appreciate your review @MicahZoltu and hopefully you will engage when the time comes (i.e. getting closer to the Review state). However I am still super puzzled what you mean when you say this is not of the appropriate quality or that it is violating rules. I have thoroughly reviewed https://eips.ethereum.org/EIPS/eip-1#what-belongs-in-a-successful-eip again and it seems every section matches the description of that.

Can you please show me what rules you are referring to? Where are they described and what are they?

@axic axic deleted the eof1 branch May 14, 2021 12:02
@axic axic restored the eof1 branch May 14, 2021 12:03
@maurelian
Contributor

@axic, afaict this is the main issue @MicahZoltu has:

mostly due to all of the external links/references

(Only jumping in here because I would love to see this go ahead).

@axic axic deleted the eof1 branch July 16, 2021 22:15
@axic axic mentioned this pull request Aug 20, 2021
PhABC pushed a commit to PhABC/EIPs that referenced this pull request Jan 25, 2022
We introduce an extensible and versioned container format for the EVM with a once-off validation at deploy time. The version described here brings the tangible benefit of code and data separation, and allows for easy introduction of a variety of changes in the future. It is introduced via two hard forks to avoid breaking any existing executable contracts.
6 participants