
Node crashing after a simple restart of the service #139

Closed
abahmanem opened this issue Jun 17, 2024 · 5 comments


abahmanem commented Jun 17, 2024

Bug Report

Overview

Please share high level description of the issue/bug you are reporting.

I set up a stateless node a few days ago and it was running fine.

This morning, I simply ran sudo systemctl restart neard and I'm getting this:

Opened a new RocksDB instance. num_instances=1
thread 'main' panicked at chain/client/src/client_actor.rs:168:6:
called `Result::unwrap()` on an `Err` value: Chain(StorageError(StorageInconsistentState("No ChunkExtra for block 8WX1DQnSttuk4WTyHPD5oJnrYBAL95hbCDaF2nbX2pgj in shard s1.v3")))
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: nearcore::start_with_config_and_synchronization
   4: neard::cli::RunCmd::run::{{closure}}
   5: tokio::task::local::LocalSet::run_until::{{closure}}
   6: neard::cli::NeardCmd::parse_and_run
   7: neard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted (core dumped)

I downloaded the latest snapshot data, but I hit the same error.
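The panic note suggests re-running with RUST_BACKTRACE=full for a verbose backtrace. A minimal sketch for doing that on a systemd-managed node; the unit name neard, the binary path, and the ~/.near home directory are assumptions:

# Assumptions: the unit is called neard, data lives in ~/.near, binary is ./neard.
sudo systemctl stop neard
RUST_BACKTRACE=full ./neard --home ~/.near run

# Or add the variable to the unit itself and restart:
#   [Service]
#   Environment=RUST_BACKTRACE=full
sudo systemctl edit neard
sudo systemctl restart neard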

Affected parties

Who is affected? Validators? Contract developers? Or regular users?

Stateless node
Pool: abahmane.pool.statelessnet.

Impact

What’s the worst outcome of the issue?

Reproduction steps

Please share step by step guideline on how to reproduce the issue.

Simply run: sudo systemctl restart neard
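A minimal way to reproduce and capture the failure, assuming the service runs under systemd as the unit neard:

# Restart the service and pull the recent journal entries; the panic shows up right away.
sudo systemctl restart neard
sudo journalctl -u neard -n 200 --no-pager | grep -A 15 'panicked at'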

[Optional] Code reference

Please locate the issue in the codebase.

[Optional] Root cause analysis

This section is optional but should be filed to claim additional reward.
Please share your analysis on the root cause of the issue.

[Optional] Suggested fix

This section is optional but should be filed to claim additional reward.
Please share a recommended long-term/short-term fix for the issue.

@telezhnaya
Contributor

Thank you for reporting! We'll investigate this.

@telezhnaya added the bug label Jun 17, 2024

GO2Pro commented Jun 17, 2024

Confirming the error:

Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: thread 'main' panicked at chain/client/src/client_actor.rs:168:6:
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: called `Result::unwrap()` on an `Err` value: Chain(StorageError(StorageInconsistentState("No ChunkExtra for block 8WX1DQnSttuk4WTyHPD5oJnrYBAL95hbCDaF2nbX2pgj in shard s1.v3")))
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: stack backtrace:
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    0: rust_begin_unwind
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    1: core::panicking::panic_fmt
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    2: core::result::unwrap_failed
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    3: nearcore::start_with_config_and_synchronization
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    4: neard::cli::RunCmd::run::{{closure}}
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    5: tokio::task::local::LocalSet::run_until::{{closure}}
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    6: neard::cli::NeardCmd::parse_and_run
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    7: neard::main
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Jun 17 09:55:15 stakewars-iv-h15a systemd[1]: neard.service: Main process exited, code=dumped, status=6/ABRT
Jun 17 09:55:15 stakewars-iv-h15a systemd[1]: neard.service: Failed with result 'core-dump'.

The problem appeared right after the neard stop/start; nothing else was changed.

Stateless pool: go2pro.pool.statelessnet

Here are the steps I took to try to fix this error:

  1. Checked out the latest version: git checkout statelessnet_latest
  2. Compiled the latest version: build 1.36.1-730-g4b39f0226
  3. Copied the DB Snapshot
  4. Initialized the working directory after deleting the files from the .near directory.
  5. Downloaded the config file
  6. Replaced validator_key.json and node_key.json with mine.

When the keys are moved to another server, the error is fully reproducible.

The error is still present.
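For reference, a rough command-level sketch of the steps above. The snapshot source, config URL, home directory, and make target are placeholders or assumptions, not the exact commands used:

# Placeholders/assumptions: <SNAPSHOT_SOURCE>, <CONFIG_URL>, ~/.near, make neard.
cd nearcore
git checkout statelessnet_latest
make neard                                    # assumed build target; yields ./target/release/neard

rm -rf ~/.near/*                              # clear the old working directory
./target/release/neard --home ~/.near init    # re-initialize the home directory
curl -o ~/.near/config.json <CONFIG_URL>      # download the network config
# ... restore the DB snapshot from <SNAPSHOT_SOURCE> into ~/.near/data ...
cp /path/to/backup/validator_key.json /path/to/backup/node_key.json ~/.near/
sudo systemctl restart neard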

@abahmanem
Author

Managed to start the node with this snapshot: 2024-06-17T11:42:04Z


GO2Pro commented Jun 17, 2024

> Managed to start the node with this snapshot: 2024-06-17T11:42:04Z

Thanks, it works, but I'm stuck at block 119407740.

@telezhnaya
Contributor

The network has been reset; this problem should not appear anymore.
