add anvil_dumpState and anvil_loadState #2256

Merged
merged 21 commits on Jul 12, 2022

Conversation

dbeal-eth
Contributor

@dbeal-eth dbeal-eth commented Jul 10, 2022

Motivation

Cannon is a packaging and deployment system for EVM protocols. One of the core tenets of its operation is saving snapshots of the state of a chain (similar to how a Docker image saves snapshots of fs layers) and using them to transmit and recreate that state in a new EVM dev instance. It is not convenient, reliable, or performant to get the chain state through individual RPC calls. Therefore, a dedicated command is necessary.

Solution

Create new commands anvil_dumpState and anvil_loadState.

  • anvil_dumpState takes no arguments. Upon invocation, it returns a large hex string that can be loaded into a separate or restarted anvil instance to restore the entire chain state. Only supported on standalone (i.e. non-fork) networks.
  • anvil_loadState takes one argument, a hex string previously dumped with anvil_dumpState. Upon invocation, merges the specified state into the currently active network. The merge functionality is very important. Cannon will regularly clear the state using evm_revert and load a sequence of states of separate contracts. States should also be merged within the same account. It is the responsibility of the RPC client to verify there are no storage collisions.
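
The merge behavior described above can be illustrated with a minimal sketch. It is not Anvil's implementation: the `State` type, `storage_collisions`, and `merge_state` are hypothetical names, state is reduced to per-account storage slots, and the rule that incoming values overwrite existing ones is an assumption. It shows how a client might check for storage collisions before merging two states that touch the same account.

```rust
use std::collections::HashMap;

/// Hypothetical simplified chain state: address -> (storage slot -> value).
/// All keys and values are hex strings; this is not Anvil's real type.
type State = HashMap<String, HashMap<String, String>>;

/// Lists every (address, slot) pair that exists in both states with
/// different values. Identical values are not treated as a collision
/// (an assumption made for this sketch).
fn storage_collisions(current: &State, incoming: &State) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for (addr, slots) in incoming {
        if let Some(existing) = current.get(addr) {
            for (slot, value) in slots {
                if let Some(old) = existing.get(slot) {
                    if old != value {
                        out.push((addr.clone(), slot.clone()));
                    }
                }
            }
        }
    }
    out.sort();
    out
}

/// Merges `incoming` into `current`, account by account.
/// Assumption: on a colliding slot the incoming value wins.
fn merge_state(current: &mut State, incoming: State) {
    for (addr, slots) in incoming {
        current.entry(addr).or_default().extend(slots);
    }
}

fn main() {
    let mut current = State::new();
    current
        .entry("0xaaaa".to_string())
        .or_default()
        .insert("0x01".to_string(), "0x01".to_string());

    let mut incoming = State::new();
    incoming
        .entry("0xaaaa".to_string())
        .or_default()
        .insert("0x02".to_string(), "0x02".to_string());

    // The client-side check: refuse to merge if any slot would be clobbered.
    let collisions = storage_collisions(&current, &incoming);
    assert!(collisions.is_empty());

    merge_state(&mut current, incoming);
    println!("slots after merge: {}", current["0xaaaa"].len());
}
```

Running the collision check before the merge keeps the responsibility on the client side, which matches the division of labor stated in the description.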

Limitations

I originally wanted to have the returned data compressed, but no compression library is included in any of foundry's packages. If you are OK with a compression package being added to the cargo dependencies, please LMK and the data will be deflated before being returned or something.

I have not done any testing to determine the maximum size of the RPC response. However, I have run tests with blobs that ended up being ~5MB and there were no problems. If you happen to know the answer to this question (for example if log calls) please share.

Additional Info

There is a companion PR open for the same feature on Hardhat network NomicFoundation/hardhat#2425 . @alcuadrado has confirmed this will be supported after they are done with some refactoring on their end. The hardhat PR will be updated by me to match the standard set by this PR.

Am I supposed to add tests for this? If so, please LMK where the recommended place to add the tests is.

@onbjerg onbjerg added the T-feature Type: feature label Jul 10, 2022
Member

@onbjerg onbjerg left a comment

Code-wise looks good to me, I don't really have an opinion on the feature itself so I will defer to @gakonst and @mattsse on that. Going by the code here, it seems that Hardhat has this functionality? If so, seems OK to include it in Anvil, too.

less than the maximum can lead to evm storage collisions
@tynes
Contributor

tynes commented Jul 11, 2022

It would be cool if this was compatible with geth's debug_dumpBlock - https://github.com/ethereum/go-ethereum/blob/2697e44d819377e39a781e5ab9f1814426b4b0f0/eth/api.go#L258

It looks like it's not 100% the same, but pretty close. This is what is returned by geth - https://github.com/ethereum/go-ethereum/blob/2697e44d819377e39a781e5ab9f1814426b4b0f0/core/state/dump.go#L64

@dbeal-eth
Contributor Author

dbeal-eth commented Jul 11, 2022

That's pretty cool that something related to this exists on geth already. What happens if only the code_hash is provided but the code is missing? Same for root (which I assume corresponds to the merkle root of the storage). If these are fine, I can make the few changes to match it up 👍

Does geth have a corresponding load function as well, or is it just the dump for debugging?

Member

@mattsse mattsse left a comment

thanks for this,

overall this looks great. only a couple of smol style nits.

not familiar with debug_dumpBlock so unsure whether it's possible to unify?

anvil/src/eth/backend/db.rs (outdated, resolved)
anvil/src/eth/backend/db.rs (outdated, resolved)
anvil/src/eth/backend/mem/in_memory_db.rs (outdated, resolved)
anvil/src/eth/backend/mem/mod.rs (outdated, resolved)
anvil/src/eth/backend/mem/mod.rs (outdated, resolved)
}

/// Deserialize and add all chain data to the backend storage
pub fn load_state(&self, buf: Bytes) -> Result<bool, BlockchainError> {
Member

question, why does this accept Bytes and not SerializableState directly?

Contributor Author

@dbeal-eth dbeal-eth Jul 11, 2022

The intent is that the data that is received by the client through the backend node is an opaque snapshot and shouldn't be manipulated directly. The original thought was that at this level, the JSON text would be compressed somehow (using deflate or something). However, it turned out that there was no compression library in anvil, so it doesn't try to compress anything.

If you think it would be better to just supply the SerializableState directly to the client (and vice versa), please lmk. But assuming that is not the case, the backend is the most logical location to do the conversion from a serializable state to opaque bytes.
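
As a rough illustration of the "opaque bytes" idea, the sketch below round-trips a JSON text through a hex string, which is the shape of blob the PR description says anvil_dumpState returns. The helper names `to_hex` and `from_hex`, the `0x` prefix, and the placeholder JSON payload are assumptions made for this example, not code from the PR.

```rust
/// Encodes raw bytes as a 0x-prefixed lowercase hex string.
/// (The 0x prefix is an assumption for this sketch.)
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(2 + bytes.len() * 2);
    out.push_str("0x");
    for b in bytes {
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Decodes a hex string (with or without a 0x prefix) back into bytes.
fn from_hex(s: &str) -> Result<Vec<u8>, String> {
    let s = s.strip_prefix("0x").unwrap_or(s);
    // Reject anything that is not a plain hex digit before slicing.
    if !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("non-hex character in input".to_string());
    }
    if s.len() % 2 != 0 {
        return Err("odd number of hex digits".to_string());
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    // Placeholder payload standing in for the serialized state JSON.
    let json = r#"{"accounts":{}}"#;

    // Backend side: serialized text becomes an opaque hex blob.
    let blob = to_hex(json.as_bytes());

    // Loading side: the blob is decoded back to the same bytes.
    let restored = from_hex(&blob).expect("valid hex");
    assert_eq!(restored, json.as_bytes());
    println!("{}", blob);
}
```

Because the client only ever sees the hex string, a compression step could later be added between serialization and encoding without changing the RPC signature.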

anvil/src/eth/backend/mem/mod.rs (outdated, resolved)
@tynes
Contributor

tynes commented Jul 11, 2022

It returns the empty values for the corresponding hashes, this is an example:

    "0xdd2fd4581271e230360230f9337d5c0430bf44c0": {
      "balance": "904625697166532776746648320380374280103671755200316906558262375061821325312",
      "codeHash": "0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470",
      "key": "0x978cc91d914c8ab8b2703515a2b31a631baf8f97ec7fada3a16966332fe9e35f",
      "nonce": 0,
      "root": "0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"
    },

You can see the constant values here: https://github.com/ethereumjs/ethereumjs-util/blob/master/src/constants.ts

There is no load state functionality in geth besides on startup with geth init which takes a serialized genesis struct - https://github.com/ethereum/go-ethereum/blob/2697e44d819377e39a781e5ab9f1814426b4b0f0/core/genesis.go#L49

Do you need to be able to load state dynamically or once at start up?

edit: after re-reading above, it sounds like you load state and then revert and load state again to do another test

dbeal-eth and others added 3 commits July 11, 2022 11:21
Co-authored-by: Matthias Seitz <matthias.seitz@outlook.de>
* move serializablestate down, derive default
* split dump_state into 2 steps in inmemorydb
* better serialize/deserialize function helpers
@dbeal-eth
Contributor Author

dbeal-eth commented Jul 11, 2022

So the data structure definitely isn't the same for genesis init, and in their definition they take in code rather than code_hash: https://github.com/ethereum/go-ethereum/blob/2697e44d819377e39a781e5ab9f1814426b4b0f0/core/genesis.go#L159

So I guess my concern is that the dump is kind of useless if it only has the code_hash instead of the actual code: unless the node happens to have the exact code in its db already, it will fail to load the contract and who knows what else. That may be the blocker to supporting the geth format, if it dumps code_hash without the code (as seems to be the case).

@tynes
Contributor

tynes commented Jul 11, 2022

The example I showed above was the result of debug_dumpBlock for an account with no code, so the code key isn't serialized in the JSON. When an account does have code, it will have a code key; see here: https://github.com/ethereum/go-ethereum/blob/2697e44d819377e39a781e5ab9f1814426b4b0f0/core/state/dump.go#L56
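
The rule described here can be sketched as a small check. This is an illustrative assumption, not geth's or Anvil's code: `DumpAccount` and `code_is_recoverable` are hypothetical names, and the only value taken from the thread is the empty code hash from the sample account. The sketch does not verify that a present `code` actually hashes to `code_hash`, since that would need a keccak256 implementation.

```rust
/// The `codeHash` shown in the sample account above (hash of empty code).
const EMPTY_CODE_HASH: &str =
    "0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470";

/// Hypothetical, trimmed-down view of one dumped account entry.
struct DumpAccount {
    code_hash: String,
    /// `None` models a dump entry where the `code` key is absent.
    code: Option<String>,
}

/// Returns true when the entry carries enough information to recreate the
/// account's code: either the code is present, or the hash says there is none.
fn code_is_recoverable(acct: &DumpAccount) -> bool {
    match &acct.code {
        Some(_) => true,
        None => acct.code_hash.eq_ignore_ascii_case(EMPTY_CODE_HASH),
    }
}

fn main() {
    // Account without code: no `code` key, empty code hash.
    let eoa = DumpAccount { code_hash: EMPTY_CODE_HASH.to_string(), code: None };
    // Contract account: `code` key present (placeholder values).
    let contract = DumpAccount {
        code_hash: "0x1234".to_string(),
        code: Some("0x6000".to_string()),
    };
    // The problematic case raised earlier: non-empty hash but no code.
    let broken = DumpAccount { code_hash: "0x1234".to_string(), code: None };

    assert!(code_is_recoverable(&eoa));
    assert!(code_is_recoverable(&contract));
    assert!(!code_is_recoverable(&broken));
}
```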

@dbeal-eth
Contributor Author

I see now that the code hash there is the hash of the empty buffer, which makes sense. So the code of a contract is guaranteed to be set if it has code, and I assume the same goes for storage.

Do you happen to have a small sample dump handy? If not, I will try and get a dump from my local eth node tomorrow or something

Member

@mattsse mattsse left a comment

lgtm

good to merge @dbeal-eth ?

it'd be great if you could add them to the anvil reference in https://github.com/foundry-rs/book as well

@mattsse mattsse added the C-anvil Command: anvil label Jul 12, 2022
db.set_storage_at(test_addr, "0x1234567".into(), "0x1".into());
db.set_storage_at(test_addr, "0x1234568".into(), "0x2".into());

let mut new_state = SerializableState::new();
Member

Suggested change
- let mut new_state = SerializableState::new();
+ let mut new_state = SerializableState::default();

@tynes
Contributor

tynes commented Jul 12, 2022

Here is a link to a dump that came from a local devnet of optimism bedrock - https://gist.github.com/tynes/0cae7ac93e266cd92ad5aa4c37ec95db

@dbeal-eth
Contributor Author

Yes, let's go ahead and merge!

compile issue fixed

book PR foundry-rs/book#447

I will test running the import with this dump later, and if any change is needed I will probably submit a follow-up PR.

@gakonst
Member

gakonst commented Jul 12, 2022

Integration test failure unrelated, fixed in #2286

@gakonst gakonst merged commit b02dcd2 into foundry-rs:master Jul 12, 2022
@ngotchac
Contributor

Quick question: why leave forked nodes out of this? IMO it would be great to have this for forked nodes, so that we can easily save, share and restore a forked network with modified state. Maybe I'm missing something though 🤔

@dbeal-eth
Contributor Author

While it is technically possible to save a forked state, I thought I would leave this change to a separate PR because it requires some special handling (e.g. for the forked state to be guaranteed consistent, it must be forked from exactly the same network and the same block, except when new contracts are being merged into the network, where that is not strictly necessary). The primary use case stated above currently only requires local node dump/reload.
