Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(anvil): add load/dump state options #3730

Merged

Conversation

mattsse
Copy link
Member

@mattsse mattsse commented Nov 21, 2022

Motivation

Closes #3665

this integrates the loadState dumpState RPC calls into the config and adds --load-state <Path>, --dump-state <Path> cli arguments.

@kahuang @fubhy wdyt?

Solution

@mattsse mattsse added T-feature Type: feature C-anvil Command: anvil labels Nov 21, 2022
@kahuang
Copy link

kahuang commented Nov 21, 2022

Looks good and works for our use-case (being able to gracefully stop/start anvil instances)

Not in scope, but calling out here: if anvil were to be shut down with (e.g. kill -9) we would have no state since the shutdown handler doesn't run. A real "db" with a wal etc. would be great for resilience in a cloud environment. But, re-iterating that this is probably not in scope for anvil

@fubhy
Copy link
Contributor

fubhy commented Nov 21, 2022

Good stuff! Looks like this would solve our use-case too. Thanks @mattsse!

@fubhy
Copy link
Contributor

fubhy commented Nov 21, 2022

A couple observations after giving this a quick test spin:

  1. I am not sure if two separate options (--dump-state vs. --load-state) are required. Did you choose to do that because it would allow "snapshot" use-cases that always start from the same (previously recorded) state but without overriding it? It might be useful but I have to admit I don't see where :-)

  2. It would be good to allow --load-state to silently fail if there is no stored state yet (directory is empty). Otherwise, we'd have to write a custom entrypoint script that sets that flag conditionally. So in addition to the point that @kahuang already raised (kill -9), the state would also not be preserved properly if the wrapping shell script, for whatever reason, doesn't properly handle and forward the shutdown signals.

  3. I think it woudl be better if state was written to disk continuously instead of just on shutdown. Agree with @kahuang that a proper db handler would be great for resilience. I'd personally say though that this would be in-scope for anvil (imho), but maybe not as part of this PR :)

@fubhy
Copy link
Contributor

fubhy commented Nov 21, 2022

  1. Instead of just retaining the latest state, we would also like to get full archive data. Do you see a chance for that to happen? Probably a different request than this but would be useful nonetheless.

@mattsse
Copy link
Member Author

mattsse commented Nov 22, 2022

thanks for this feedback, will update accordingly.

perhaps we can flush this every couple minutes or so?

@fubhy
Copy link
Contributor

fubhy commented Nov 22, 2022

Here's some background / context around my use-case:

I'm trying to put together a tightly integrated local development environment for front-to-back smart contract, data pipeline & dapp development needs.

Ideally, I'd like to end up with an AIO (all-in-one) service emulator running locally (similar to the likes of minikube, etc.) that lets you easily spin up and control all infrastructure components required for building a blockchain application end-to-end in a fault tolerant, fully observable, plug&play manner... With minimal configuration requrements & wiring of individual pieces. I'm dreaming of a fully replicable local development environment for all stakeholders across the stack (smart contract developers, data engineers, app developers, etc.) ...

Let's assume you are building a project from the grounds up (or are still iterating on all of these layers). In such a scenario, you want to ...

  1. Write your smart contract code
  2. Test it
  3. Generate bindings
  4. Write your substream / subgraph mappings
  5. Spin up a local RPC node and deploy your contracts
  6. Spin up your data services (e.g. firehose, substreams, subgraphs)
  7. Run a couple of scripts to seed some state
  8. Integrate the user interface on the frontend on top of the underlying contracts and the gathered data
  9. Repeat ...

Currently, setting up such an environment requires developers to think about a lot of infrastructure components and keep a lot of things in sync. And operate them locally! Map ports, connect things, know what to do if any of them fails, etc.

It still feels that all these layers aren't as tightly knitted together within this space as I think they could be. The developer experience at the intersection between smart contracts, data and frontend is brutal. Each team out there is still doing all of that custom wiring by themselves with varying degrees of success.


I want a tool stack that allows developers on all of these layers to seamlessly operate all these service dependencies locally with the ability to easily stop & restart them at will. All whilst preserving local state between restarts. Without them then having to deal with corrupted / out-of-sync state etc.

The problem really is the introduction of stateful components to the stack.

For instance, in case of Firehose, Substreams and Subgraphs it wouldn't be enough to periodically flush data from Anvil to disk to tick these boxes.

If we kill and restart the entire stack, and upon restart, there's data in the Firehose instrumentation that's now missing in Anvil, then we've got a broken state.

Furthermore, in order to allow re-indexing data that depends on eth calls on archive state, we'd need to preserve that too.


I know that I'm asking for a lot here and that this is probably out of scope for this first iteration, but I wanted to provide some context regardless :-).

@kahuang
Copy link

kahuang commented Nov 22, 2022

thanks for this feedback, will update accordingly.

perhaps we can flush this every couple minutes or so?

Configurable flush period feels like a nice middle ground!

@mattsse
Copy link
Member Author

mattsse commented Nov 23, 2022

@fubhy thanks so much for the additonal context.

this makes a ton of sense.

Could you please open another topic for 4) as this is beyond the scope of this PR?

@mattsse
Copy link
Member Author

mattsse commented Nov 23, 2022

after reading this again I still have a question re:

I am not sure if two separate options (--dump-state vs. --load-state) are required.

the reason why I used two separate flags is so you can load from one location and set a separate for persistent otherwise you'd end up overwriting this on dump.

But maybe it is easier to use only one argument for load+dump and always. That would make it easier to load and update the saved state, but harder to load and save it to another location.
So unsure what the best behavior here would be.

kill -9 is SIGKILL iirc, which you can't intercept. But perhaps you can handle this with a combination of trap and kill -- -$$ SIGINT SIGTERM EXIT, but idk trap or set -e in detail

@fubhy
Copy link
Contributor

fubhy commented Nov 23, 2022

after reading this again I still have a question re:

I am not sure if two separate options (--dump-state vs. --load-state) are required.

the reason why I used two separate flags is so you can load from one location and set a separate for persistent otherwise you'd end up overwriting this on dump.

Ganache has a --database option for that (https://trufflesuite.com/docs/ganache/reference/cli-options/#database). That was sufficient for me in the past and I've personally never had a use-case where I'd have wanted to specify read & write locations seperately. But I guess we could have both? An additional --state option that's simply an alias for both --dump-state and --load-state and, if there's no state store yet at that location, it would create it?

Actually I could see these two options being useful if you want to roll back to a previously stored state snapshot between restarts... ? So you initialize the db from a backup location and afterwards dump it elsewhere or choose to not dump it at all. Could be useful.

Like... A shared developer instance for your team to test stuff with... Hosted somewhere. Resets every 24 hours or so?

@mattsse
Copy link
Member Author

mattsse commented Nov 23, 2022

An additional --state option that's simply an alias for both --dump-state and --load-state and, if there's no state store yet at that location, it would create it?

I like this, let's do that,

@mslipper
Copy link

This would be extremely helpful for us at Optimism. For context, we're in the process of prepping for our Bedrock migration, and we'd like to run a long-lived, public fork of Goerli so forks can test things out before doing the migration for real. Since this fork would be long-lived and will see a bunch of transaction volume, it's important that data be persisted to disk rather stored in memory.

We need the configurability here so that we can provision Kubernetes PVCs for state to be stored on.

@fubhy
Copy link
Contributor

fubhy commented Nov 30, 2022

This would be extremely helpful for us at Optimism. For context, we're in the process of prepping for our Bedrock migration, and we'd like to run a long-lived, public fork of Goerli so forks can test things out before doing the migration for real. Since this fork would be long-lived and will see a bunch of transaction volume, it's important that data be persisted to disk rather stored in memory.

We need the configurability here so that we can provision Kubernetes PVCs for state to be stored on.

That sounds like you'd also need full persistence though (all data, not just latest state).

@mattsse
Copy link
Member Author

mattsse commented Nov 30, 2022

@fubhy --state should work now,

I've set the interval to 60s but perhaps we make this configurable or decrease?

Update: added --state-interval option

@mattsse mattsse force-pushed the matt/add-options-for-load-dump-state branch from 9f26b7c to 693dec6 Compare November 30, 2022 14:38
Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it and we can iterate. Foundryup will have this tonight, or you can foundryup -b master. I'm triggering a docker build rn which will be avail in ghr

@gakonst gakonst merged commit a43313a into foundry-rs:master Nov 30, 2022
@fubhy
Copy link
Contributor

fubhy commented Nov 30, 2022

Sweet. Thanks @mattsse!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-anvil Command: anvil T-feature Type: feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Be able to read data from disk between restarts
5 participants