Stateless mode (not working yet) #1311
Conversation
This is a whole new Git branch, not the same one as last time (#1250) - there wasn't much worth salvaging.

Main differences:

- I didn't do the "each opcode has to specify an async handler" junk that I put in last time. Instead, in oph_memory.nim you can see `sloadOp` calling `asyncChainTo` and passing in an async operation. That async operation is then run by the `execCallOrCreate` (or `asyncExecCallOrCreate`) code in interpreter_dispatch.nim.
- In the test code, the (previously existing) macro called "assembler" now allows you to add a section called "initialStorage", specifying fake data to be used by the EVM computation run by that test. (In the long run we'll obviously want to write tests that for-real use the JSON-RPC API to asynchronously fetch data; for now, this was just an expedient way to write a basic unit test that exercises the async-EVM code pathway.)
- There's also a new macro called "concurrentAssemblers" that allows you to write a test that runs multiple assemblers concurrently (and then waits for them all to finish). There's one example test using this, in test_op_memory_lazy.nim, though you can't actually see it doing so unless you uncomment some echo statements in async_operations.nim (in which case you can see the two concurrently running EVM computations each printing out what they're doing, and you'll see that they interleave).

A question: is it possible to make EVMC work asynchronously? (For now, this code compiles and "make test" passes even if ENABLE_EVMC is turned on, but it doesn't actually work asynchronously; it just falls back on doing the usual synchronous EVMC thing. See FIXME-asyncAndEvmc.)
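To make the control flow concrete, here is a minimal Python sketch (illustrative only; the real code is Nim, and names like `async_chain_to`, `sload_op`, and `exec_loop` just mirror `asyncChainTo`, `sloadOp`, and `asyncExecCallOrCreate`): an opcode handler does not block on a slow fetch inline, it records a pending async operation, and the dispatch loop runs that operation before resuming execution.

```python
import asyncio

class Computation:
    def __init__(self):
        self.pending_async_operation = None  # the Future stashed by an opcode
        self.storage = {}

    def async_chain_to(self, coro):
        # analogous to asyncChainTo: record the async op for the dispatcher
        self.pending_async_operation = coro

async def fetch_storage(cpt, slot):
    await asyncio.sleep(0)   # stand-in for a JSON-RPC round trip
    cpt.storage[slot] = 42   # fake fetched value

def sload_op(cpt, slot):
    # like sloadOp: request the data asynchronously instead of blocking
    cpt.async_chain_to(fetch_storage(cpt, slot))

async def exec_loop(cpt):
    # like asyncExecCallOrCreate: run any pending operation, then continue
    sload_op(cpt, slot=1)
    if cpt.pending_async_operation is not None:
        await cpt.pending_async_operation
        cpt.pending_async_operation = None
    return cpt.storage[1]

result = asyncio.run(exec_loop(Computation()))
```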
Also ditched the plain-data Vm2AsyncOperation type; it wasn't really serving much purpose. Instead, the pendingAsyncOperation field directly contains the Future.
It's not the right solution to the "how do we know whether we still need to fetch the storage value or not?" problem. I haven't implemented the right solution yet, but at least we're better off not putting in a wrong one.
(Based on feedback on the PR.)
There was some back-and-forth in the PR regarding whether nested waitFor calls are acceptable (#1260 (comment)). The eventual decision was to just change the `waitFor` to a `doAssert` (since we probably won't want this extra functionality when running synchronously anyway) to make sure that the Future is already finished.
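A sketch of that decision in Python terms (illustrative only, not the Nim code): in the synchronous path, instead of nesting a blocking wait, just assert that the future was already completed by earlier prefetching.

```python
import asyncio

async def main():
    fut = asyncio.get_running_loop().create_future()
    fut.set_result("prefetched value")   # the prefetch already finished

    # the synchronous consumer: analogous to doAssert(fut.finished)
    assert fut.done(), "data should have been prefetched by now"
    return fut.result()

value = asyncio.run(main())
```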
The basic idea is that just *before* we're about to need some account or slot or block header, we prefetch it asynchronously. (e.g. Accounts and slots are fetched by using eth_getProof, then putting the fetched nodes into the database.)

You can start in this mode by running:

```
nimbus --sync-mode=stateless --stateless-data-source-url=https://mainnet.infura.io/whatever
```

(At which point nimbus-eth1 will wait for an eth2 client to feed it blocks to run via newPayload in the Engine API.)

Alternatively, you can run the block with a particular hash by running:

```
nimbus statelesslyRun --stateless-data-source-url=https://mainnet.infura.io/whatever --stateless-block-hash=1234ABCetc.
```

(This will simply fetch the block header corresponding to that hash from the data source, then run it, without bothering to wait for an eth2 client.)

I say "not working yet" because the state roots are still coming out wrong. Also, block processing times are still much too slow to be practical: on the order of five minutes per block. We'll need to do precalculated witnesses or something like that in order to make this viable.

One more note: I've removed the concurrentAssemblers test that I implemented when I did the basic async EVM stuff; it was useful as a temporary way to exercise the async code, but the test was based on a very hacky idea and it was more trouble than it was worth.
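The prefetch-then-execute idea can be sketched in Python (a hypothetical analogue, not the nimbus-eth1 code; `fetch_proof` stands in for an eth_getProof JSON-RPC call): fetch everything the block will touch concurrently, store it locally, then execute against the local copy.

```python
import asyncio

async def fetch_proof(address):
    # stand-in for an eth_getProof JSON-RPC round trip
    await asyncio.sleep(0)
    return {address: f"state-for-{address}"}

async def prefetch(addresses):
    db = {}
    # fetch all needed accounts concurrently, before execution starts
    for proof in await asyncio.gather(*(fetch_proof(a) for a in addresses)):
        db.update(proof)  # put the fetched nodes into the local database
    return db

def execute_block(db, addresses):
    # synchronous execution: every state read now hits the local database
    return [db[a] for a in addresses]

addrs = ["0xaaa", "0xbbb"]
db = asyncio.run(prefetch(addrs))
out = execute_block(db, addrs)
```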
```nim
discard result.beginSavepoint

proc init*(x: typedesc[AccountsCache], db: TrieDatabaseRef, pruneTrie: bool = true): AccountsCache =
  init(x, db, emptyRlpHash, pruneTrie)

proc statelessInit*(x: typedesc[AccountsCache], db: TrieDatabaseRef,
```
What's the difference between `init` and `statelessInit` here?
```nim
    fork: Fork): Result[GasInt,void]
    # wildcard exception, wrapped below
    {.gcsafe, raises: [Exception].} =
  return waitFor(asyncProcessTransactionImpl(vmState, tx, sender, header, fork))
```
That's OK as long as this is still work-in-progress, but you'll have to restore the proper sync version before merging this PR.
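The pattern in the snippet above, in Python terms (a hedged sketch with hypothetical names, not the Nim code): the synchronous entry point temporarily just drives the async implementation to completion, the way the `waitFor` call does.

```python
import asyncio

async def async_process_transaction_impl(tx):
    await asyncio.sleep(0)    # placeholder for async prefetching + execution
    return sum(tx.values())   # fake "gas used" result

def process_transaction(tx):
    # synchronous wrapper: block until the async implementation finishes,
    # analogous to `waitFor(asyncProcessTransactionImpl(...))`
    return asyncio.run(async_process_transaction_impl(tx))

gas = process_transaction({"base": 21000, "data": 400})
```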
```nim
import
```
This should probably be called `data_sources/json_rpc` or `data_sources/web3`. Fluffy will add another on-demand data source which will probably be called `portal`.
```nim
let h = chainDB.getBlockHash(blockNumber)
doAssert(h == header.blockHash, "stored the block header for block " & $(blockNumber))

template raiseExceptionIfError[E](whatAreWeVerifying: untyped, r: Result[void, E]) =
```
This should probably be a generic proc to avoid the multiple evaluation of the `r` parameter.
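The hazard the reviewer points out can be sketched in Python (illustrative only; Nim templates substitute their arguments textually, which a callable simulates here): a template evaluates its argument expression once per mention in the body, whereas a proc evaluates it once and passes the value.

```python
evaluations = []

def failing_result():
    """Stand-in for the `r` argument: an expression with a side effect."""
    evaluations.append(1)
    return "some error"   # truthy: pretend this is an error result

def template_like(r_expr):
    # r_expr is a callable standing in for the textually substituted
    # expression; it runs once per mention of `r` in the template body
    if r_expr():                        # first mention, e.g. r.isErr
        raise RuntimeError(r_expr())    # second mention, e.g. r.error

def proc_like(r):
    # a generic proc receives the already-evaluated value exactly once
    if r:
        raise RuntimeError(r)

try:
    template_like(failing_result)
except RuntimeError:
    pass
template_evals = len(evaluations)       # the expression ran twice

evaluations.clear()
try:
    proc_like(failing_result())         # evaluated once, at the call site
except RuntimeError:
    pass
proc_evals = len(evaluations)
```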
```nim
# from ../../lc_proxy/validate_proof import getAccountFromProof

var durationSpentDoingFetches*: times.Duration
```
Consider using the `metrics` library for this. This would allow visualizing the data in Grafana once we deploy long-running instances of Nimbus-eth1 on our "fleet" servers. As an example, here are some metrics from our Nimbus eth2 fleet:
https://metrics.status.im/d/pgeNfj2Wz23/nimbus-fleet-testnets?orgId=1&refresh=5m
Take a look at the "Block & Attestation Delay" panels at the bottom of the screen, which are currently using histograms:
```nim
# at the module top-level scope
declareHistogram data_fetching_duration,
  "Data fetch duration", buckets = [0.25, 0.5, 1, 2, 4, 8, Inf]

# inside functions
data_fetching_duration.observe(durationInSecondsAsFloat) # Usually based on `Moment.now()`
```
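For intuition, here is what a histogram `observe` does, sketched in plain Python (the real suggestion is the Nim `metrics` library; the bucket bounds mirror the example above): each observation increments the counter of the first bucket whose upper bound contains it.

```python
import bisect
import time

buckets = [0.25, 0.5, 1, 2, 4, 8, float("inf")]  # upper bounds, in seconds
counts = [0] * len(buckets)

def observe(duration_seconds):
    # find the first bucket whose upper bound fits the observation
    counts[bisect.bisect_left(buckets, duration_seconds)] += 1

t0 = time.monotonic()
# ... do a fetch ...
observe(time.monotonic() - t0)   # near-zero duration -> first bucket
observe(3.0)                     # falls into the (2, 4] bucket
```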
```nim
proc fetchBlockHeaderWithHash*(rpcClient: RpcClient, h: Hash256): Future[BlockHeader] {.async.} =
  let t0 = now()
  let r = request("eth_getBlockByHash", %[%h.prefixHex, %false], some(rpcClient))
```
`request` uses `waitFor` internally, which is not quite appropriate. Furthermore, all HTTP requests may hang forever unless they are guarded with a timeout. Study the usages of `awaitWithTimeout` from the nimbus-eth2 codebase.
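The timeout-guard idea, sketched with Python's asyncio (`awaitWithTimeout` belongs to the Nim chronos/nimbus-eth2 world; `asyncio.wait_for` plays the same role here): wrap every potentially hanging request so it fails fast instead of blocking forever.

```python
import asyncio

async def slow_request():
    await asyncio.sleep(10)   # a request that would hang far too long
    return "response"

async def main():
    try:
        # guard the request with a timeout instead of awaiting it bare
        return await asyncio.wait_for(slow_request(), timeout=0.01)
    except asyncio.TimeoutError:
        return "timed out"    # fail fast; caller can retry or report

outcome = asyncio.run(main())
```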
```nim
let (blockNumber) = k.cpt.stack.popInt(1)
k.cpt.stack.push:
  k.cpt.getBlockHash(blockNumber)

let cpt = k.cpt # so it can safely be captured by the asyncChainTo closure below
```
Isn't this quite expensive to copy?
Can you explain the problem being solved in more detail? Perhaps we need to get back to the drawing board.
I'm a bit confused about what you're asking.

If you're asking why I'm saying `let cpt = k.cpt`, the problem I'm solving is that if I instead refer directly to `k.cpt` inside the asyncChainTo closure, I get error messages like this:

```
Error: 'k' is of type <var Vm2Ctx> which cannot be captured as it would violate memory safety, declared here: /home/adam/Projects/nimbus-eth1/nimbus/vm2/interpreter/op_handlers/oph_blockdata.nim(32, 32)
```

By doing what I did, I avoid the need for the closure to capture `k`. I don't actually need `k` (the Vm2Ctx) inside the asyncChainTo closure; I only need `cpt` (the Computation). (And this turns out to be true in all of the places where I've used this pattern.)

I expected this to be cheap; the Computation type is defined as a `ref object`, so I assumed that all that was being copied into the closure was a single reference. (But I didn't actually look at the C code to verify that.) Am I wrong?
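The "cheap copy" point can be demonstrated with Python's reference semantics (an analogy: Python class instances behave like Nim's `ref object` here; `Computation` and `Vm2Ctx` below are stand-ins, not the real types): binding `cpt = k.cpt` copies only a reference, and the closure then captures that small reference rather than `k` itself.

```python
class Computation:        # stand-in for the Nim Computation ref object
    def __init__(self):
        self.log = []

class Vm2Ctx:             # stand-in for the Vm2Ctx that must not be captured
    def __init__(self, cpt):
        self.cpt = cpt

k = Vm2Ctx(Computation())
cpt = k.cpt               # the "copy" is just a second name for the same object
same_object = cpt is k.cpt

def pending_op():
    # the closure captures only `cpt`, never touching `k`
    cpt.log.append("fetched")

pending_op()
```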