Stateless mode (not working yet) #1311

Closed
wants to merge 10 commits into from

Conversation

AdamSpitz (Contributor):

NOT WORKING YET: New "stateless" mode.

The basic idea is that just before we're about to need some account, slot, or block header, we prefetch it asynchronously. (For example, accounts and slots are fetched via eth_getProof, and the fetched nodes are then put into the database.)
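
For illustration, here is a minimal, self-contained sketch (not the PR's actual data-source code, which lives in its own module) of what a single eth_getProof prefetch looks like over plain JSON-RPC, using std/httpclient and std/json; the fetchProof name and its parameters are made up for this example:

    import std/[httpclient, json]

    proc fetchProof(url, address: string; slots: seq[string]; blockId: string): JsonNode =
      # Ask the data source (e.g. the --stateless-data-source-url endpoint) for an
      # EIP-1186 proof of the account and the given storage slots at blockId.
      let body = %*{
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getProof",
        "params": [address, slots, blockId]
      }
      let client = newHttpClient()
      defer: client.close()
      client.headers = newHttpHeaders({"Content-Type": "application/json"})
      let resp = client.postContent(url, body = $body)
      # "accountProof" and "storageProof" in the result hold the RLP-encoded trie
      # nodes that stateless mode then inserts into its local database.
      parseJson(resp)["result"]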

You can start in this mode by running:

nimbus --sync-mode=stateless --stateless-data-source-url=https://mainnet.infura.io/whatever

(At which point nimbus-eth1 will wait for an eth2 client to feed it blocks to run via newPayload in the Engine API.)

Alternatively, you can run the block with a particular hash by running:

nimbus statelesslyRun --stateless-data-source-url=https://mainnet.infura.io/whatever --stateless-block-hash=1234ABCetc.

(This will simply fetch the block header corresponding to that hash from the data source, then run it, without bothering to wait for an eth2 client.)

I say "not working yet" because the state roots are still coming out wrong.

Also, block processing times are still much too slow to be practical: on the order of five minutes per block. We'll need to do precalculated witnesses or something like that in order to make this viable.

One more note: I've removed the concurrentAssemblers test that I implemented when I did the basic async EVM stuff; it was useful as a temporary way to exercise the async code, but the test was based on a very hacky idea and it was more trouble than it was worth.

AdamSpitz and others added 9 commits October 13, 2022 15:39
This is a whole new Git branch, not the same one as last time (#1250); there wasn't much worth salvaging. Main differences:

I didn't do the "each opcode has to specify an async handler" junk
that I put in last time. Instead, in oph_memory.nim you can see
sloadOp calling asyncChainTo and passing in an async operation.
That async operation is then run by the execCallOrCreate (or
asyncExecCallOrCreate) code in interpreter_dispatch.nim.
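
(To make that concrete, here is a rough, self-contained sketch of the asyncChainTo pattern with toy types; the real Vm2Ctx/Computation and the hypothetical ifNecessaryGetSlot fetcher mentioned below are only stand-ins, not the PR's exact code:)

    import std/asyncdispatch

    type
      Computation = ref object
        pendingAsyncOperation: Future[void]   # set by asyncChainTo
        continuation: proc ()                 # run after the Future completes

    template asyncChainTo(c: Computation, fut: Future[void], andThen: untyped) =
      c.pendingAsyncOperation = fut
      c.continuation = proc () =
        andThen

    # An op handler such as sloadOp would then do roughly:
    #   cpt.asyncChainTo(ifNecessaryGetSlot(cpt, slot)):    # hypothetical fetcher
    #     cpt.stack.push(cpt.getStorage(slot))
    # and the dispatch loop awaits pendingAsyncOperation before calling continuation.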

In the test code, the (previously existing) macro called "assembler"
now allows you to add a section called "initialStorage", specifying
fake data to be used by the EVM computation run by that test. (In
the long run we'll obviously want to write tests that for-real use
the JSON-RPC API to asynchronously fetch data; for now, this was
just an expedient way to write a basic unit test that exercises the
async-EVM code pathway.)

There's also a new macro called "concurrentAssemblers" that allows
you to write a test that runs multiple assemblers concurrently (and
then waits for them all to finish). There's one example test using
this, in test_op_memory_lazy.nim, though you can't actually see it
doing so unless you uncomment some echo statements in
async_operations.nim (in which case you can see the two concurrently
running EVM computations each printing out what they're doing, and
you'll see that they interleave).
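
(If it helps, the interleaving itself can be seen with a completely standalone toy using std/asyncdispatch, which has nothing to do with the real assembler macro:)

    import std/asyncdispatch

    proc fakeEvmRun(name: string, steps: int) {.async.} =
      for i in 1 .. steps:
        echo name, " step ", i
        await sleepAsync(10)   # yield, so the two runs interleave

    # Prints the A and B steps interleaved, analogous to the two concurrently
    # running EVM computations described above.
    waitFor all(fakeEvmRun("A", 3), fakeEvmRun("B", 3))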

A question: is it possible to make EVMC work asynchronously? (For
now, this code compiles and "make test" passes even if ENABLE_EVMC
is turned on, but it doesn't actually work asynchronously, it just
falls back on doing the usual synchronous EVMC thing. See
FIXME-asyncAndEvmc.)
Also ditched the plain-data Vm2AsyncOperation type; it wasn't
really serving much purpose. Instead, the pendingAsyncOperation
field directly contains the Future.
It's not the right solution to the "how do we know whether we
still need to fetch the storage value or not?" problem. I
haven't implemented the right solution yet, but at least
we're better off not putting in a wrong one.
(Based on feedback on the PR.)
There was some back-and-forth in the PR regarding whether nested
waitFor calls are acceptable:

#1260 (comment)

The eventual decision was to just change the waitFor to a doAssert
(since we probably won't want this extra functionality when running
synchronously anyway) to make sure that the Future is already
finished.
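
In other words, something like this minimal sketch (using std/asyncdispatch here rather than the project's own helpers):

    import std/asyncdispatch

    proc readAlreadyCompleted[T](fut: Future[T]): T =
      # Instead of a nested waitFor, insist that the future has already
      # finished when running synchronously, and just read its value.
      doAssert fut.finished, "future must already be completed in synchronous mode"
      fut.read
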
AdamSpitz marked this pull request as draft on November 22, 2022, 10:54.
  discard result.beginSavepoint

proc init*(x: typedesc[AccountsCache], db: TrieDatabaseRef, pruneTrie: bool = true): AccountsCache =
  init(x, db, emptyRlpHash, pruneTrie)

proc statelessInit*(x: typedesc[AccountsCache], db: TrieDatabaseRef,
Member:

What's the difference between init and statelessInit here?

    fork: Fork): Result[GasInt,void]
    # wildcard exception, wrapped below
    {.gcsafe, raises: [Exception].} =
  return waitFor(asyncProcessTransactionImpl(vmState, tx, sender, header, fork))
Member:

That's OK as long as this is still work in progress, but you'll have to restore the proper sync version before merging this PR.

@@ -0,0 +1,30 @@
import
Member:

This should probably be called data_sources/json_rpc or data_sources/web3. Fluffy will add another on-demand data source which will probably be called portal.

  let h = chainDB.getBlockHash(blockNumber)
  doAssert(h == header.blockHash, "stored the block header for block " & $(blockNumber))

template raiseExceptionIfError[E](whatAreWeVerifying: untyped, r: Result[void, E]) =
Member:

This should probably be a generic proc to avoid the multiple evaluation of the r parameter.
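
For instance, a hedged sketch of that suggestion (the results import and the choice of CatchableError are assumptions, not necessarily what the repo uses):

    import results   # assumed: the nim-results package available in the repo

    proc raiseExceptionIfError[E](whatAreWeVerifying: string, r: Result[void, E]) =
      # As a proc, r is evaluated exactly once at the call site; a template would
      # re-substitute the argument expression at every place r appears.
      if r.isErr:
        raise newException(CatchableError,
          "error while " & whatAreWeVerifying & ": " & $r.error)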

# from ../../lc_proxy/validate_proof import getAccountFromProof


var durationSpentDoingFetches*: times.Duration
Member:

Consider using the metrics library for this. This would allow visualizing the data in Grafana once we deploy long-running instances of Nimbus-eth1 on our "fleet" servers. As an example, here are some metrics from our Nimbus eth2 fleet:

https://metrics.status.im/d/pgeNfj2Wz23/nimbus-fleet-testnets?orgId=1&refresh=5m

Take a look at the "Block & Attestation Delay" panels at the bottom of the screen which are currently using histograms:

# at the module top-level scope
declareHistogram data_fetching_duration,
  "Data fetch duration", buckets = [0.25, 0.5, 1, 2, 4, 8, Inf]

# inside functions
data_fetching_duration.observe(durationInSecondsAsFloat) # Usually based on `Moment.now()`


proc fetchBlockHeaderWithHash*(rpcClient: RpcClient, h: Hash256): Future[BlockHeader] {.async.} =
  let t0 = now()
  let r = request("eth_getBlockByHash", %[%h.prefixHex, %false], some(rpcClient))
Member:

request uses waitFor internally, which is not quite appropriate. Furthermore, all HTTP requests may hang forever unless they are guarded with a timeout. Study the usages of awaitWithTimeout from the nimbus-eth2 codebase.
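
For example, a self-contained sketch of the timeout idea using std/asyncdispatch's withTimeout (awaitWithTimeout itself lives in nimbus-eth2 and looks a bit different; the fetchWithTimeout helper here is made up for illustration):

    import std/[asyncdispatch, httpclient]

    proc fetchWithTimeout(url: string, timeoutMs: int): Future[string] {.async.} =
      # Guard the HTTP request so it cannot hang forever.
      let client = newAsyncHttpClient()
      try:
        let fut = client.getContent(url)
        if not await withTimeout(fut, timeoutMs):
          raise newException(IOError, "request to " & url & " timed out")
        return fut.read
      finally:
        client.close()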

  let (blockNumber) = k.cpt.stack.popInt(1)
  k.cpt.stack.push:
    k.cpt.getBlockHash(blockNumber)
  let cpt = k.cpt # so it can safely be captured by the asyncChainTo closure below
Member:

Isn't this quite expensive to copy?
Can you explain the problem being solved in more detail? Perhaps we need to get back to the drawing board.

AdamSpitz (Contributor, author):

I'm a bit confused about what you're asking.

If you're asking why I'm saying let cpt = k.cpt, the problem I'm solving is that if I instead refer directly to k.cpt inside the asyncChainTo closure, I get error messages like this:

Error: 'k' is of type <var Vm2Ctx> which cannot be captured as it would violate memory safety, declared here: /home/adam/Projects/nimbus-eth1/nimbus/vm2/interpreter/op_handlers/oph_blockdata.nim(32, 32)

By doing what I did, I avoid the need for the closure to capture k. I don't actually need k (the Vm2Ctx) inside the asyncChainTo closure; I only need cpt (the Computation). (And this turns out to be true in all of the places where I've used this pattern.)

I expected this to be cheap; the Computation type is defined as a ref object, so I assumed that all that was being copied into the closure was a single reference. (But I didn't actually look at the C code to verify that.) Am I wrong?
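
Here's a stripped-down illustration of the two situations, with toy types standing in for the real Vm2Ctx and Computation:

    type
      Computation = ref object
        name: string
      Vm2Ctx = object
        cpt: Computation

    proc makeContinuation(k: var Vm2Ctx): proc () =
      # result = proc () = echo k.cpt.name   # error: 'k' is of type <var Vm2Ctx>
      #                                      # which cannot be captured
      let cpt = k.cpt                        # copies a single ref (one pointer)
      result = proc () = echo cpt.name       # capturing the ref is fine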
