Separate contract code out of state witness & reactive contract request by chunk validator #11099

Open
Tracked by #46
walnut-the-cat opened this issue Apr 17, 2024 · 1 comment
Comments

@walnut-the-cat
Contributor

walnut-the-cat commented Apr 17, 2024

Relevant discussion

Link

Issue

During the stateless validation forknet test, we observed a node crash with the following error:

2024-04-16T20:21:23.545144Z DEBUG chunk_tracing{chunk_hash=HnFSQEoLMEnMXK2pxnnnbv7GkwFobanyrd7JJbNS2Rrj}:new_chunk{shard_id=3}:apply_chunk{shard_id=3}:process_state_update:apply{protocol_version=84 num_transactions=19}:process_receipt{receipt_id=GHhLncT5GM2ksuwVzUqPMkzCp132V7xToQZPfUbKeRgP predecessor=operator.meta-pool.near receiver=lockup-meta-pool.near id=GHhLncT5GM2ksuwVzUqPMkzCp132V7xToQZPfUbKeRgP}:run{code.hash=EXekfV3kpFHHsTi4JUDh2MVLCKS3hpKdPbXMuRirxrvY vm_kind=NearVm}: vm: close time.busy=49.3µs time.idle=3.42µs
thread '<unnamed>' panicked at core/store/src/trie/trie_storage.rs:317:16:
!!!CRASH!!!: MissingTrieValue(TrieMemoryPartialStorage, 5FWvfWAJxH1mbCHuzLGwBfL9EYjH8YWVin6Pmp3H8gdM)
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: <near_store::trie::trie_storage::TrieMemoryPartialStorage as near_store::trie::trie_storage::TrieStorage>::retrieve_raw_bytes
   4: near_store::trie::Trie::internal_retrieve_trie_node
   5: near_store::trie::Trie::retrieve_raw_node
   6: near_store::trie::Trie::lookup_from_state_column
   7: near_store::trie::Trie::get_optimized_ref
   8: near_store::trie::Trie::get
   9: near_store::trie::update::TrieUpdate::get
  10: near_store::get_code
  11: node_runtime::actions::execute_function_call
  12: node_runtime::Runtime::apply_action
  13: node_runtime::Runtime::apply_action_receipt
  14: node_runtime::Runtime::apply::{{closure}}
  15: node_runtime::Runtime::apply
  16: <near_chain::runtime::NightshadeRuntime as near_chain::types::RuntimeAdapter>::apply_chunk
  17: near_chain::update_shard::apply_new_chunk
  18: core::ops::function::FnOnce::call_once{{vtable.shim}}
  19: <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute
  20: rayon_core::registry::WorkerThread::wait_until_cold

@Longarithm mentioned that

  10: near_store::get_code

is due to contract code missing from the state witness.

From the debug log, @staffik confirmed that this was likely the case and that the crash was happening with different contracts, including lockup-meta-pool.near and pack.promotional.basketball.playible.near.

@Longarithm's understanding of how this can cause a node crash is as follows:

  • The chunk producer reads the code from its cache and doesn't go to the trie for it;
  • so the trie nodes required for reading the contract code are never read and recorded;
  • so the chunk validator has nowhere to get the code from.
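
The three steps above can be reproduced in a toy model. This is only a sketch: `Producer`, `start_chunk`, `witness` and `validate` are hypothetical names invented for illustration, and plain hash maps stand in for the trie and the compiled-contract cache. None of this is nearcore's actual API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a chunk producer's state access (not nearcore code).
struct Producer {
    trie: HashMap<String, Vec<u8>>,       // account -> contract code stored "in the trie"
    code_cache: HashMap<String, Vec<u8>>, // account -> cached contract code
    recorded: HashSet<String>,            // trie keys read while applying the current chunk
}

impl Producer {
    fn new(contracts: Vec<(&str, Vec<u8>)>) -> Self {
        Producer {
            trie: contracts.into_iter().map(|(k, v)| (k.to_string(), v)).collect(),
            code_cache: HashMap::new(),
            recorded: HashSet::new(),
        }
    }

    /// Recording starts fresh for every chunk; the cache survives across chunks.
    fn start_chunk(&mut self) {
        self.recorded.clear();
    }

    fn get_code(&mut self, account: &str) -> Option<Vec<u8>> {
        if let Some(code) = self.code_cache.get(account) {
            // Cache hit: the trie is never touched, so nothing is recorded.
            return Some(code.clone());
        }
        let code = self.trie.get(account)?.clone();
        self.recorded.insert(account.to_string());
        self.code_cache.insert(account.to_string(), code.clone());
        Some(code)
    }

    /// The state witness contains only what was recorded for this chunk.
    fn witness(&self) -> HashMap<String, Vec<u8>> {
        self.recorded
            .iter()
            .filter_map(|k| self.trie.get(k).map(|v| (k.clone(), v.clone())))
            .collect()
    }
}

/// A stateless validator can only read from the witness it was sent.
fn validate(witness: &HashMap<String, Vec<u8>>, account: &str) -> Result<Vec<u8>, String> {
    witness
        .get(account)
        .cloned()
        .ok_or_else(|| format!("MissingTrieValue({})", account))
}
```

In this model the first chunk that calls a contract produces a complete witness, while any later chunk that hits the cache produces a witness without the code, and `validate` fails in the same place the backtrace shows (`get_code`).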

Timeline

April 17

@Longarithm is preparing a quick patch to bypass the issue in Forknet for now, but we need a proper solution in place before the MainNet launch.

April 18

The team discussed a proper solution and decided to separate contract code out of the state witness. When a chunk validator realizes that it does not have the contract code needed to validate an incoming state witness, it will reactively request the missing code from its peers. As a result, a chunk miss may happen, but the chunk validator should then have the compiled contract code ready for future validation.

The project includes, but is not limited to, the following work:

  • Introduce a new network message to request contract code
    • Saketh's tip on how to do so: link
  • Remove contract code from state witness
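
A minimal sketch of the reactive request flow described above, to make the proposal concrete. Everything here is an assumption for illustration: the `Message` variants, `Outcome`, and `ChunkValidator` are hypothetical names, not the network message that was eventually designed, and a real implementation would also need to check that the returned code matches the requested hash.

```rust
use std::collections::HashMap;

type CodeHash = String;

/// Hypothetical network messages for fetching contract code from peers.
#[derive(Debug, PartialEq)]
enum Message {
    ContractCodeRequest { code_hash: CodeHash },
    ContractCodeResponse { code_hash: CodeHash, code: Vec<u8> },
}

/// Result of handling one state witness.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// All required code was available locally.
    Validated,
    /// Validation was skipped (a chunk miss); the message should be sent to peers.
    Skipped(Message),
}

/// Hypothetical chunk validator that keeps contract code outside the witness.
struct ChunkValidator {
    code_by_hash: HashMap<CodeHash, Vec<u8>>,
}

impl ChunkValidator {
    fn new() -> Self {
        ChunkValidator { code_by_hash: HashMap::new() }
    }

    /// Called for a witness whose receipts need the contract with `required_code`.
    fn on_state_witness(&self, required_code: &str) -> Outcome {
        if self.code_by_hash.contains_key(required_code) {
            Outcome::Validated
        } else {
            Outcome::Skipped(Message::ContractCodeRequest {
                code_hash: required_code.to_string(),
            })
        }
    }

    /// Stores code received from a peer so that later witnesses can be validated.
    /// Assumption: hash verification of `code` is omitted in this sketch.
    fn on_message(&mut self, msg: Message) {
        if let Message::ContractCodeResponse { code_hash, code } = msg {
            self.code_by_hash.insert(code_hash, code);
        }
    }
}
```

The first witness that needs an unknown contract is skipped and triggers a request; once the response arrives, subsequent witnesses that use the same contract validate without another round trip.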
@walnut-the-cat walnut-the-cat added the A-stateless-validation Area: stateless validation label Apr 17, 2024
@walnut-the-cat walnut-the-cat changed the title from "Incomplete state witness when contract code is cached." to "Separate contract code out of state witness" Apr 18, 2024
@walnut-the-cat walnut-the-cat changed the title from "Separate contract code out of state witness" to "Separate contract code out of state witness & reactive contract request by chunk validator" Apr 18, 2024
@tayfunelmas tayfunelmas self-assigned this Apr 18, 2024
@walnut-the-cat
Contributor Author

For now, @tayfunelmas will continue building the network message, while @Longarithm will pause and focus on #11124 until we have clear evidence that including contract code in the state witness does not work for the MVP launch. The relevant discussion can be found here: link
