thread 'solana-window' panicked at 'assertion failed: `(left == right)`, core/src/blocktree.rs:1668:17 #5570

Closed
mvines opened this issue Aug 20, 2019 · 6 comments


@mvines
Member

commented Aug 20, 2019

tds-solana-com-bootstrap-leader panicked partway through TdS Stage 0 Dry Run 3 with:

thread 'solana-window' panicked at 'assertion failed: `(left == right)`
  left: `28393`,
 right: `8934`', core/src/blocktree.rs:1668:17
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:197
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:211
   4: solana_metrics::metrics::set_panic_hook::{{closure}}::{{closure}}
   5: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:478
   6: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:381
   7: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:336
   8: solana::blocktree::handle_recovery
   9: solana::blocktree::Blocktree::write_shared_blobs
  10: solana::window_service::recv_window
thread 'main' panicked at 'validator exit: Any', src/libcore/result.rs:999:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:197
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:211
   4: solana_metrics::metrics::set_panic_hook::{{closure}}::{{closure}}
   5: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:478
   6: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:381
   7: rust_begin_unwind
             at src/libstd/panicking.rs:308
   8: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
   9: core::result::unwrap_failed
  10: solana_validator::main
  11: std::rt::lang_start::{{closure}}
  12: std::panicking::try::do_call
             at src/libstd/rt.rs:49
             at src/libstd/panicking.rs:293
  13: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:85
  14: std::rt::lang_start_internal
             at src/libstd/panicking.rs:272
             at src/libstd/panic.rs:394
             at src/libstd/rt.rs:48
  15: main
  16: __libc_start_main
  17: _start

Full log: tds-solana-com-bootstrap-leader.log

Failing assertion (core/src/blocktree.rs:1668):

assert_eq!(blob_slot, slot);
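A minimal Rust sketch of how this one assertion takes the whole node down. When a spawned thread panics, `JoinHandle::join` returns `Err` carrying the panic payload (a `Box<dyn Any + Send>`), so a caller that calls `.expect(...)` on that result panics in turn. That is most likely where the second backtrace, `thread 'main' panicked at 'validator exit: Any'`, comes from: the Rust version in use printed the payload's `Debug` form as just `Any`. The helper names `spawn_window_thread` and `join_and_report` are hypothetical; only the thread name and the `assert_eq!(blob_slot, slot)` check come from the log above.

```rust
use std::thread;

/// Hypothetical stand-in for the `solana-window` thread: it performs the same
/// check as the failing line, `assert_eq!(blob_slot, slot)`, and panics on mismatch.
fn spawn_window_thread(blob_slot: u64, slot: u64) -> thread::JoinHandle<()> {
    thread::Builder::new()
        .name("solana-window".to_string())
        .spawn(move || {
            assert_eq!(blob_slot, slot);
        })
        .expect("failed to spawn thread")
}

/// Joins the thread; if it panicked, returns the panic message instead of
/// re-panicking the caller.
fn join_and_report(handle: thread::JoinHandle<()>) -> Option<String> {
    match handle.join() {
        Ok(()) => None,
        Err(payload) => {
            // Formatted panics (assert_eq! included) carry a String payload;
            // `panic!("literal")` may carry a &'static str instead.
            let msg = payload
                .downcast_ref::<String>()
                .cloned()
                .or_else(|| payload.downcast_ref::<&str>().map(|s| s.to_string()))
                .unwrap_or_else(|| "<non-string panic payload>".to_string());
            Some(msg)
        }
    }
}

fn main() {
    // Values taken from the bootstrap leader's panic above.
    let handle = spawn_window_thread(28393, 8934);
    // Calling `handle.join().expect("validator exit")` here instead would
    // panic `main` too, reporting only the opaque payload.
    match join_and_report(handle) {
        Some(msg) => eprintln!("window thread panicked: {}", msg),
        None => println!("window thread exited cleanly"),
    }
}
```

Downcasting the payload to `String` (or `&str`) recovers the original assertion text, which is far more useful in a top-level error than `Any`.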

mvines added this to the Mavericks v0.18.0 milestone Aug 20, 2019

mvines added this to Non-blocking in TdS Stage 0 via automation Aug 20, 2019

@mvines

Member Author

commented Aug 20, 2019

All 4 staked Solana-run nodes failed due to this panic, at different times:

tds-solana-com-bootstrap-leader.log
  237762:thread 'solana-window' panicked at 'assertion failed: `(left == right)`
  237763-  left: `28393`,
  237764- right: `8934`', core/src/blocktree.rs:1668:17

tds-solana-com-us-central1-a-fullnode.log
  263181:thread 'solana-window' panicked at 'assertion failed: `(left == right)`
  263182-  left: `9936`,
  263183- right: `10086`', core/src/blocktree.rs:1668:17

tds-solana-com-us-west1-a-fullnode.log
  257649:thread 'solana-window' panicked at 'assertion failed: `(left == right)`
  257650-  left: `17667`,
  257651- right: `8934`', core/src/blocktree.rs:1668:17

tds-solana-com-europe-west4-a-fullnode.log
  263026:thread 'solana-window' panicked at 'assertion failed: `(left == right)`
  263027-  left: `48066`,
  263028- right: `8918`', core/src/blocktree.rs:1668:17

The Solana blockstreamer node, however, did not hit this panic.

@rob-solana

Contributor

commented Aug 20, 2019

Those assert!()s need to become errors (failures), and should probably be promoted to slashing offenses.
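A minimal sketch of that suggestion, assuming a recovered blob exposes its slot and index: the check returns an error instead of calling `assert_eq!`, and the caller discards mismatching blobs, much as `handle_recovery` already logs and discards blobs that fail verification (see the WARN line quoted further down). `RecoveredBlob`, `RecoveryError`, `check_recovered` and `partition_recovered` are hypothetical names, not the blocktree API, and the slashing part is not modeled.

```rust
use std::fmt;

/// Hypothetical subset of a recovered blob's header.
#[derive(Debug, Clone, PartialEq)]
struct RecoveredBlob {
    slot: u64,
    index: u64,
}

/// Hypothetical error replacing the `assert_eq!(blob_slot, slot)` panic.
#[derive(Debug, PartialEq)]
enum RecoveryError {
    SlotMismatch { expected: u64, found: u64, index: u64 },
}

impl fmt::Display for RecoveryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RecoveryError::SlotMismatch { expected, found, index } => write!(
                f,
                "recovered blob index {} claims slot {}, expected slot {}",
                index, found, expected
            ),
        }
    }
}

impl std::error::Error for RecoveryError {}

/// Returns an error instead of panicking when a recovered blob's slot is wrong.
fn check_recovered(slot: u64, blob: &RecoveredBlob) -> Result<(), RecoveryError> {
    if blob.slot == slot {
        Ok(())
    } else {
        Err(RecoveryError::SlotMismatch {
            expected: slot,
            found: blob.slot,
            index: blob.index,
        })
    }
}

/// Splits recovered blobs into those safe to insert and errors for the rest.
fn partition_recovered(
    slot: u64,
    blobs: Vec<RecoveredBlob>,
) -> (Vec<RecoveredBlob>, Vec<RecoveryError>) {
    let mut kept = Vec::new();
    let mut errors = Vec::new();
    for blob in blobs {
        match check_recovered(slot, &blob) {
            Ok(()) => kept.push(blob),
            Err(e) => errors.push(e),
        }
    }
    (kept, errors)
}

fn main() {
    // Illustrative input: one good blob and one whose header claims another slot.
    let blobs = vec![
        RecoveredBlob { slot: 8918, index: 0 },
        RecoveredBlob { slot: 48066, index: 1 },
    ];
    let (kept, errors) = partition_recovered(8918, blobs);
    println!("kept {} blob(s)", kept.len());
    for e in &errors {
        eprintln!("discarding: {}", e);
    }
}
```

Returning the `expected`/`found`/`index` triple keeps enough detail for a later component to attribute the bad data, while the window thread keeps running.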

@rob-solana

Contributor

commented Aug 20, 2019

Was the slot being broadcast by the HA validator?

@mvines

Member Author

commented Aug 21, 2019

Observed on another validator during DR3 (looks the same as tds-solana-com-europe-west4-a-fullnode.log):

[2019-08-20T20:11:56.452501380Z WARN  solana::blocktree] [handle_recovery] failed verification at slot=8918, index=23, discarding
thread 'solana-window' panicked at 'assertion failed: `(left == right)`
  left: `48066`,
 right: `8918`', core/src/blocktree.rs:1668:17

mvines moved this from Non-blocking to Blocking Dry Run 4 in TdS Stage 0 Aug 21, 2019

@carllin

Contributor

commented Aug 22, 2019

From the combination of blocktree errors of this type:

  Received last blob with index 109 >= slot.last_index 30

and what looks like the erasure corruption posted above, it seems a node has been retransmitting blobs for the same slot after a restart.
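To see why mixing blobs from two broadcasts of the same slot can corrupt recovery, here is a minimal sketch using single XOR parity, a deliberately simplified stand-in for the erasure code the node actually uses (an assumption; this thread does not describe the coding scheme). Parity is computed over one version of the set; if a surviving data blob comes from a second version, the "recovered" blob equals the true blob XORed with the difference between the two versions, so it is silently wrong wherever they differ, including in any header field stored in those bytes. The function name `xor_all` and all byte values are illustrative.

```rust
/// Byte-wise XOR of equally sized blobs: both the encoder and the decoder of a
/// single-parity erasure code. `blobs` must be non-empty.
fn xor_all(blobs: &[&[u8]]) -> Vec<u8> {
    let len = blobs[0].len();
    let mut out = vec![0u8; len];
    for blob in blobs {
        assert_eq!(blob.len(), len, "blobs in an erasure set must be the same size");
        for (o, b) in out.iter_mut().zip(blob.iter()) {
            *o ^= *b;
        }
    }
    out
}

fn main() {
    // Version A of a three-blob erasure set, as first broadcast.
    let a0 = [1u8, 2, 3, 4];
    let a1 = [10u8, 20, 30, 40];
    let a2 = [7u8, 7, 7, 7];
    let parity_a = xor_all(&[&a0[..], &a1[..], &a2[..]]);

    // Blob 1 is lost; recovering from version-A inputs only is exact.
    let recovered = xor_all(&[&parity_a[..], &a0[..], &a2[..]]);
    assert_eq!(recovered, a1.to_vec());

    // After a restart, blob 0 is re-sent with different contents (version B).
    let b0 = [1u8, 2, 99, 4];
    let corrupted = xor_all(&[&parity_a[..], &b0[..], &a2[..]]);
    // corrupted == a1 ^ a0 ^ b0: it differs from a1 exactly where b0 differs
    // from a0 (byte 2 here), and nothing in the decoder flags it as wrong.
    println!("recovered {:?}, expected {:?}", corrupted, a1);
}
```

Real erasure decoders generally cannot tell which input was stale either, which is why a check after recovery (or making sure every input belongs to the same erasure set before decoding) matters.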

@pgarg66 Shreds will remove the assert and make erasure more robust.

mvines moved this from Blocking Dry Run 4 to Non-blocking in TdS Stage 0 Aug 23, 2019

@mvines

Member Author

commented Aug 26, 2019

Sounds like this issue is now obsolete with shreds.

mvines closed this Aug 26, 2019

TdS Stage 0 automation moved this from Non-blocking to Done Aug 26, 2019

4 participants