
Node crashing after a simple restart of the service #139

Closed
abahmanem opened this issue Jun 17, 2024 · 5 comments


abahmanem commented Jun 17, 2024

Bug Report

Overview

Please share high level description of the issue/bug you are reporting.

I set up a stateless node a few days ago and it was running fine.

This morning, I simply ran sudo systemctl restart neard and I'm getting this:

Opened a new RocksDB instance. num_instances=1
thread 'main' panicked at chain/client/src/client_actor.rs:168:6:
called `Result::unwrap()` on an `Err` value: Chain(StorageError(StorageInconsistentState("No ChunkExtra for block 8WX1DQnSttuk4WTyHPD5oJnrYBAL95hbCDaF2nbX2pgj in shard s1.v3")))
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: nearcore::start_with_config_and_synchronization
   4: neard::cli::RunCmd::run::{{closure}}
   5: tokio::task::local::LocalSet::run_until::{{closure}}
   6: neard::cli::NeardCmd::parse_and_run
   7: neard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted (core dumped)

I downloaded the latest snapshot data, but I hit the same error.
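The panic note suggests re-running with RUST_BACKTRACE=full for a verbose backtrace. A minimal sketch for doing that on a systemd-managed node; the unit name neard, the binary path, and the ~/.near home directory are assumptions:

# Assumptions: the unit is called neard, data lives in ~/.near, binary is ./neard.
sudo systemctl stop neard
RUST_BACKTRACE=full ./neard --home ~/.near run

# Or add the variable to the unit itself and restart:
#   [Service]
#   Environment=RUST_BACKTRACE=full
sudo systemctl edit neard
sudo systemctl restart neard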

Affected parties

Who is affected? Validators? Contract developers? Or regular users?

Stateless node
Pool: abahmane.pool.statelessnet.

Impact

What’s the worst outcome of the issue?

Reproduction steps

Please share step by step guideline on how to reproduce the issue.

Simply run: sudo systemctl restart neard
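A minimal way to reproduce and capture the failure, assuming the service runs under systemd as the unit neard:

# Restart the service and pull the recent journal entries; the panic shows up right away.
sudo systemctl restart neard
sudo journalctl -u neard -n 200 --no-pager | grep -A 15 'panicked at'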

[Optional] Code reference

Please locate the issue in the codebase.

[Optional] Root cause analysis

This section is optional but should be filed to claim additional reward.
Please share your analysis on the root cause of the issue.

[Optional] Suggested fix

This section is optional but should be filed to claim additional reward.
Please share a recommended long-term/short-term fix for the issue.

@telezhnaya
Contributor

Thank you for reporting! We'll investigate this.

@telezhnaya added the bug label Jun 17, 2024

GO2Pro commented Jun 17, 2024

Confirming the error:

Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: thread 'main' panicked at chain/client/src/client_actor.rs:168:6:
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: called `Result::unwrap()` on an `Err` value: Chain(StorageError(StorageInconsistentState("No ChunkExtra for block 8WX1DQnSttuk4WTyHPD5oJnrYBAL95hbCDaF2nbX2pgj in shard s1.v3")))
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: stack backtrace:
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    0: rust_begin_unwind
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    1: core::panicking::panic_fmt
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    2: core::result::unwrap_failed
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    3: nearcore::start_with_config_and_synchronization
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    4: neard::cli::RunCmd::run::{{closure}}
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    5: tokio::task::local::LocalSet::run_until::{{closure}}
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    6: neard::cli::NeardCmd::parse_and_run
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]:    7: neard::main
Jun 17 09:55:15 stakewars-iv-h15a neard[143422]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Jun 17 09:55:15 stakewars-iv-h15a systemd[1]: neard.service: Main process exited, code=dumped, status=6/ABRT
Jun 17 09:55:15 stakewars-iv-h15a systemd[1]: neard.service: Failed with result 'core-dump'.

The problem appeared right after the neard stop/start; nothing else was changed.

Stateless pool: go2pro.pool.statelessnet

Here are the steps I took to try to fix this error:

  1. Checked out the latest version: git checkout statelessnet_latest
  2. Compiled the latest version: build 1.36.1-730-g4b39f0226
  3. Copied the DB Snapshot
  4. Initialized the working directory after deleting the files from the .near directory.
  5. Downloaded the config file
  6. Replaced validator_key.json and node_key.json with mine.

When the keys are moved to another server, the error is fully reproducible.

The error is still present.
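For reference, a rough command-level sketch of the steps above. The snapshot source, config URL, home directory, and make target are placeholders or assumptions, not the exact commands used:

# Placeholders/assumptions: <SNAPSHOT_SOURCE>, <CONFIG_URL>, ~/.near, make neard.
cd nearcore
git checkout statelessnet_latest
make neard                                    # assumed build target; yields ./target/release/neard

rm -rf ~/.near/*                              # clear the old working directory
./target/release/neard --home ~/.near init    # re-initialize the home directory
curl -o ~/.near/config.json <CONFIG_URL>      # download the network config
# ... restore the DB snapshot from <SNAPSHOT_SOURCE> into ~/.near/data ...
cp /path/to/backup/validator_key.json /path/to/backup/node_key.json ~/.near/
sudo systemctl restart neard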

@abahmanem
Author

Managed to start the node with this snapshot: 2024-06-17T11:42:04Z


GO2Pro commented Jun 17, 2024

> Managed to start the node with this snapshot: 2024-06-17T11:42:04Z

Thanks, it works, but I'm stuck at block 119407740.

@telezhnaya
Contributor

The network has been reset; this problem should not appear anymore.
