[stateless_validation] Missing ChunkExtra on load memtrie on startup #11135
Comments
So there is a dependency on ChunkExtra which for some reason is not available there. But we only need it to get |
Hmm, so the startup loading logic is a bit different from the catchup loading logic: during catchup we have just created flat storage from a downloaded trie, whereas during startup the flat storage may be in any arbitrary state. The flat head is somewhere, and on top of the flat head there is any set of deltas that represent different forks we may still be choosing among in the future. When loading memtrie, we start with the flat head and then for each delta, we also construct a new memtrie root to represent the difference.

So, we cannot just take a different state root for the flat head, as that may not be consistent with the state that the flat state represents, and we cannot just take some other state root that does not correspond to the flat head, because then we may be missing the state root for some fork that we end up building on.

For example, suppose we have blocks A, B, C, D where B.parent == A, C.parent == A, D.parent == C. The flat head may be at A. We would need to load four state roots corresponding to the post state roots of A, B, C, and D, because technically we may continue building from any of these blocks. The memtrie would contain four roots, and if we apply a chunk on top of B, for example, we would query the memtrie root corresponding to B.

ChunkExtra is the place where we obtain the state root, because the flat state encodes the state corresponding to the post state root of the flat head, which is stored in the ChunkExtra for the flat head. Similarly, each flat state delta describes the state transition whose result corresponds to the post state root of the block that the delta is intended for.

As for why ChunkExtra is missing, that's still to be investigated. |
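To make the A, B, C, D example concrete, here is a minimal sketch of deriving one root per block from the flat head plus its deltas. Everything in it is an assumption for illustration: `Delta`, `apply`, and `load_roots` are hypothetical names rather than nearcore APIs, and the "state root" is a toy integer instead of a trie hash.

```rust
use std::collections::HashMap;

// Illustrative stand-ins; none of these are nearcore types.
type BlockId = &'static str;
type StateRoot = u64;

/// A flat-state delta: the transition that `block` applies on top of `parent`.
struct Delta {
    block: BlockId,
    parent: BlockId,
    /// Toy "change": a number mixed into the parent's root.
    change: u64,
}

/// Toy state transition deriving a post state root from the parent's root.
fn apply(parent_root: StateRoot, change: u64) -> StateRoot {
    parent_root.wrapping_mul(31).wrapping_add(change)
}

/// Start from the flat head's post state root, then derive one root per delta.
/// Deltas are assumed to be ordered so that a parent precedes its children.
fn load_roots(
    flat_head: BlockId,
    flat_head_root: StateRoot,
    deltas: &[Delta],
) -> Result<HashMap<BlockId, StateRoot>, String> {
    let mut roots = HashMap::new();
    roots.insert(flat_head, flat_head_root);
    for d in deltas {
        // Without the parent's root there is nothing to build this fork on.
        let parent_root = *roots
            .get(d.parent)
            .ok_or_else(|| format!("no root loaded for parent {}", d.parent))?;
        roots.insert(d.block, apply(parent_root, d.change));
    }
    Ok(roots)
}
```

With the flat head at A and deltas for B, C, and D, the map ends up with four roots, so a chunk applied on top of B can look up B's root directly. If the flat head's root cannot be obtained, none of the fork roots can be derived either.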
Ah ok, I think the bug is here. It's a problem that I deferred during the implementation of memtries and honestly just forgot about: nearcore/chain/chain/src/chain.rs, line 516 in 16e2321

When we load the memtries, we need to determine which shards' memtries we should actually load. At the time, I simply took the tip of the blockchain, because we tracked all shards anyway, so that would only change upon resharding. But now it also changes from epoch to epoch due to single shard tracking.

So the bug was triggered as follows. The node is on a tip at height (presumably) 114912712, which is the first block of a new epoch, where it is a chunk producer for shard 2. When loading memtries, it needed to start from the flat head, which is 114912710, but that is in the previous epoch, where it was not tracking shard 2. So, when querying for the ChunkExtra for shard 2 at height 114912710, it didn't exist.

So then, I have some questions:
|
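The failure mode described above can be reproduced in a toy model. This is a sketch under stated assumptions, not nearcore code: `Node`, its fields, and `root_for_startup` are hypothetical, and a plain integer stands in for the state root held in ChunkExtra.

```rust
use std::collections::{HashMap, HashSet};

type Height = u64;
type ShardId = u64;
type EpochId = u32;
type StateRoot = u64;

/// Toy node state; every field and name here is hypothetical.
struct Node {
    /// Shards the node tracks in each epoch.
    tracked: HashMap<EpochId, HashSet<ShardId>>,
    /// Epoch of each known block height.
    epoch_of: HashMap<Height, EpochId>,
    /// Stand-in for ChunkExtra: written only for shards tracked at that height.
    chunk_extra: HashMap<(Height, ShardId), StateRoot>,
}

impl Node {
    /// Mirrors the problematic selection: pick shards by the tip's epoch,
    /// then read the entry stored at the flat head.
    fn root_for_startup(
        &self,
        tip: Height,
        flat_head: Height,
        shard: ShardId,
    ) -> Result<StateRoot, String> {
        let tip_epoch = self.epoch_of[&tip];
        if !self.tracked[&tip_epoch].contains(&shard) {
            return Err(format!("shard {} is not tracked at the tip", shard));
        }
        self.chunk_extra
            .get(&(flat_head, shard))
            .copied()
            .ok_or_else(|| {
                format!("missing ChunkExtra for shard {} at height {}", shard, flat_head)
            })
    }
}
```

If the tip is in an epoch that tracks shard 2 but the flat head is in an epoch that did not, the shard passes the tracking check and then the lookup at the flat head returns the "missing ChunkExtra" error.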
The issue happened in the middle of the epoch, so memtrie had already been loaded (in the previous epoch, on catchup), and state sync worked fine in the previous epoch. Loading memtrie on catchup does not require ChunkExtra, so it might not have been available in the previous epoch, yet things worked. |
We need flat state to construct memtrie and we want memtrie for each flat state root.
We would be good as long as flat state is correct. If we need state sync again because of memtrie - would it mean that some flat state is missing, and need to be state synced too? |
@robin-near Do you think tracing would be useful in identifying why ChunkExtra was missing? I am currently not using it because it had many merge conflicts with current master: #10843 |
Ah forgot to update on this; only robin-near@b85ed54 is needed now for tracing. |
Does catchup not write ChunkExtra? Maybe that's where my confusion is. |
Ah, yes. After loading memtrie, we write ChunkExtras in the loop here: nearcore/chain/chain/src/chain_update.rs, line 863 in f95087b
|
@Longarithm fixed #11135 by forcing flat storage head to move after state sync. The pytest `single_shard_tracking` exposes this issue.

Instructions to run the test:

```
cargo build -p neard --features test_features,statelessnet_protocol
python3 pytest/tests/sanity/single_shard_tracking.py
```

---------

Co-authored-by: Longarithm <the.aleksandr.logunov@gmail.com>
It's reproducible again: near/stakewars-iv#139 |
Error message after restarting a stateless validation node. After restart it attempts to load memtrie on startup (shard shuffling enabled):
I think the solution would be to modify `load_memtries_on_startup()` so that it can take `state_root` as `load_mem_trie_on_catchup()` does: nearcore/core/store/src/trie/shard_tries.rs, line 430 in 16e2321
Example usage: https://github.com/near/nearcore/pull/10820/files#diff-ef9c6aaa80a330e446c5365f42be9bff37ba4f898cf519dadd7e17545783c77cR2787
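A rough sketch of the shape of that change, under the assumption that the startup path would accept an optional root and only fall back to the stored value when none is given. `resolve_state_root` is a hypothetical helper, not an existing nearcore function, and the closure stands in for the ChunkExtra lookup.

```rust
type StateRoot = [u8; 32];

/// Hypothetical helper, not a nearcore function: prefer a caller-supplied root
/// and consult the stored lookup (standing in for ChunkExtra) only if none is given.
fn resolve_state_root(
    explicit: Option<StateRoot>,
    lookup_stored: impl FnOnce() -> Option<StateRoot>,
) -> Result<StateRoot, String> {
    explicit
        .or_else(lookup_stored)
        .ok_or_else(|| "no state root supplied and none stored".to_string())
}
```

The lookup is passed as a closure so that it is never run when the caller already knows the root, which is the case this proposal is meant to cover.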