existence of block with number new_canonical implies existence of blocks with all numbers before it #541
Cc @arkpar
@bkchr As far as I understand, this is a collator syncing the relay chain. Does it not keep the full relay chain block history?
It should. We don't change anything for the relay chain client when it comes to syncing a normal full node.
Could you reproduce this with
@arkpar Unfortunately, I've already deleted that container and launched a new one (the node synchronized and this bug didn't occur again).
Also reported in #563 and #544. A lot of people are hitting this. @albertov19 Do you know anything about how often this happens, and whether it affects collators only or all nodes? Does a node restart work around it?
@JoshOrndorff Reboot doesn't help.
This issue has not been reported by any of our collators. I'll keep a lookout in case anyone reports it.
@bkchr Looks like when syncing, the parachain head gets moved to a much earlier block. This leads to the number-to-head mappings for the retracted blocks being removed from the database. The sync then tries to import a block on top of a retracted block, and this panic happens. Changing the head by a large number of blocks is not really supported by the database layer. This also breaks the sync in a number of ways: in the log above, the best block is reset more than 70,000 blocks behind, and since sync always starts from the best block, a lot of existing blocks end up being re-downloaded.
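To make the failure mode concrete, here is a minimal, hypothetical Rust sketch (simplified types, not Substrate's actual `sc_client_db` code): the best block is reset far behind, the number-to-hash entries for the retracted range are pruned, and a later import that needs one of those numbers hits the `expect(...)` that produces the panic quoted in this issue.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the block-number -> canonical-hash index kept by
// the database backend. This is an illustration only.
type Hash = [u8; 32];

struct NumberToHash {
    map: HashMap<u64, Hash>,
}

impl NumberToHash {
    /// Moving the best head back to `new_best` prunes the mappings for the
    /// retracted range from the index.
    fn retract_to(&mut self, new_best: u64) {
        self.map.retain(|number, _| *number <= new_best);
    }

    /// A later lookup for a retracted number fails the `expect(...)` guard,
    /// mirroring the panic message quoted in this issue.
    fn canonical_hash(&self, number: u64) -> Hash {
        *self.map.get(&number).expect(
            "existence of block with number new_canonical implies existence of \
             blocks with all numbers before it; qed",
        )
    }
}

fn main() {
    let mut index = NumberToHash {
        map: (0..=100u64).map(|n| (n, [0u8; 32])).collect(),
    };
    // Best block reset far behind (e.g. tens of thousands of blocks, as in the report above).
    index.retract_to(30);
    // Importing on top of a retracted block then panics.
    let _ = index.canonical_hash(75);
}
```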
This happened to me when all I did was start the docker container as per the manual. Is there going to be a fix soon, or should I just build "keep trying till it works" into my automation?
Either way, the assertion failure within the database is not the correct way of handling an API usage issue. We should address the underlying cause (collations built by out-of-sync collators?) as well as change the database to handle this case more gracefully. |
The assertion is that once block number N exists, it may be mapped to a different hash, but never removed. That was true at the time. It needs to be updated, I agree.
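As a hedged illustration of "handle this case more gracefully" (not the actual Substrate change, nor the workaround in paritytech/cumulus#544), the same lookup from the sketch above could surface a recoverable error instead of asserting:

```rust
use std::collections::HashMap;

// Hypothetical error type for this sketch; the real backend has its own error enum.
#[derive(Debug)]
enum BackendError {
    MissingCanonicalMapping(u64),
}

// Same lookup as in the earlier sketch, but returning an error that the import
// pipeline could propagate instead of aborting the whole node.
fn canonical_hash_checked(
    index: &HashMap<u64, [u8; 32]>,
    number: u64,
) -> Result<[u8; 32], BackendError> {
    index
        .get(&number)
        .copied()
        .ok_or(BackendError::MissingCanonicalMapping(number))
}
```

The caller could then treat a missing mapping as an import failure for that block rather than a node-wide crash.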
@arkpar is your fix enough to allow nodes to recover, or will it only prevent them from panicking while leaving them stalled in that case?
@crystalin The latter. There's a more fundamental issue that needs to be fixed by someone from the cumulus team. |
@arkpar, this seems to be a critical issue (it is also impacting other parachains); is someone from Parity currently working on it?
For posterity - paritytech/cumulus#544 is a patch working around the issue. |
Thank you @rphmeier, we are still testing the fix but haven't observed any errors internally. @Comr8-Nga and others impacted by this issue, please upgrade to Moonbeam client v0.9.6 (https://github.com/PureStake/moonbeam/releases/tag/v0.9.6).
The issue doesn't seem to appear anymore, closing it.
I'm re-opening this, as it also happens in recent versions. The error reports a different code line; here are some of the logs that were reported (I'll try to gather more recent ones):
@crystalin this panic doesn't exist anymore. Please update your nodes. This was removed entirely. |
@bkchr sorry, I might have mixed up versions; I was expecting it to be fixed in our previous version. I'll keep this one open until I've tested the new version, to avoid confusion.
2021-06-26 11:50:02 Moonbeam Parachain Collator
2021-06-26 11:50:02 ✌️ version 0.8.0-unknown-x86_64-linux-gnu
2021-06-26 11:50:02 ❤️ by PureStake, 2019-2021
2021-06-26 11:50:02 📋 Chain specification: Moonbase Alpha
2021-06-26 11:50:02 🏷 Node name: Comr8_Nga
2021-06-26 11:50:02 👤 Role: AUTHORITY
2021-06-26 11:50:02 💾 Database: RocksDb at /data/chains/moonbase_alpha/db
2021-06-26 11:50:02 ⛓ Native runtime: moonbase-40 (moonbase-1.tx2.au3)
2021-06-26 11:50:02 Parachain id: Id(1000)
2021-06-26 11:50:02 Parachain Account: 5Ec4AhPZk8STuex8Wsi9TwDtJQxKqzPJRCH7348Xtcs9vZLJ
2021-06-26 11:50:02 Parachain genesis state: 0x000000000000000000000000000000000000000000000000000000000000000000b505bc9a20d69f14620b2417b6d777c398ceb3e32119b9a53507111d1880927c03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
2021-06-26 11:50:02 [Relaychain] 🏷 Local node identity is: 12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:02 [Relaychain] 📦 Highest known block at #436285
2021-06-26 11:50:02 [Relaychain] 〽️ Prometheus server started at 127.0.0.1:9616
2021-06-26 11:50:02 [Relaychain] Listening for new connections on 127.0.0.1:9945.
2021-06-26 11:50:02 Using default protocol ID "sup" because none is configured in the chain specs
2021-06-26 11:50:02 🏷 Local node identity is: 12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:02 📦 Highest known block at #74522
2021-06-26 11:50:02 〽️ Prometheus server started at 127.0.0.1:9615
2021-06-26 11:50:02 Listening for new connections on 127.0.0.1:9944.
2021-06-26 11:50:03 🔍 Discovered new external address for our node: /ip4/65.21.185.66/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:03 [Relaychain] 🔍 Discovered new external address for our node: /ip4/65.21.185.66/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:03 [Relaychain] 🔍 Discovered new external address for our node: /ip4/10.2.112.215/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:03 🔍 Discovered new external address for our node: /ip4/10.2.112.215/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:03 🔍 Discovered new external address for our node: /ip4/10.2.96.244/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:03 [Relaychain] 🔍 Discovered new external address for our node: /ip4/172.18.0.3/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 🔍 Discovered new external address for our node: /ip6/2a01:4f9:c010:9e1f::1/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:04 🔍 Discovered new external address for our node: /ip4/10.2.96.243/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:04 [Relaychain] 🔍 Discovered new external address for our node: /ip4/127.0.0.1/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 [Relaychain] 🔍 Discovered new external address for our node: /ip4/192.168.170.128/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 [Relaychain] 🔍 Discovered new external address for our node: /ip4/192.168.211.64/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 [Relaychain] Candidate included without being backed? candidate_hash=0x8c80d74867da6e3fb6af49a785ec1db2d6ea02dc64289644de37f907a8923141
2021-06-26 11:50:05 🔍 Discovered new external address for our node: /ip4/10.2.112.239/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:05 [Relaychain] 🔍 Discovered new external address for our node: /ip6/2a01:4f9:c010:9e1f::1/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:05 [Relaychain] 🔍 Discovered new external address for our node: /ip4/10.132.0.60/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
Version: 0.8.0-unknown-x86_64-linux-gnu
0: sp_panic_handler::set::{{closure}}
1: std::panicking::rust_panic_with_hook
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/panicking.rs:595:17
2: std::panicking::begin_panic_handler::{{closure}}
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/panicking.rs:497:13
3: std::sys_common::backtrace::__rust_end_short_backtrace
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/sys_common/backtrace.rs:141:18
4: rust_begin_unwind
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/panicking.rs:493:5
5: core::panicking::panic_fmt
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/core/src/panicking.rs:92:14
6: core::option::expect_failed
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/core/src/option.rs:1258:5
7: sc_client_db::Backend::try_commit_operation
8: <sc_client_db::Backend as sc_client_api::backend::Backend>::commit_operation
9: <sc_service::client::client::Client<B,E,Block,RA> as sc_client_api::backend::LockImportRun<Block,B>>::lock_import_and_run::{{closure}}
10: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
11: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
12: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
13: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
14: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
15: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
16: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
17: <sc_service::task_manager::prometheus_future::PrometheusFuture as core::future::future::Future>::poll
18: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
19: <tracing_futures::Instrumented as core::future::future::Future>::poll
20: futures_executor::local_pool::block_on
21: <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
22: tokio::runtime::task::harness::Harness<T,S>::poll
23: tokio::runtime::blocking::pool::Inner::run
24: tokio::runtime::context::enter
25: std::sys_common::backtrace::__rust_begin_short_backtrace
26: core::ops::function::FnOnce::call_once{{vtable.shim}}
27: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/alloc/src/boxed.rs:1480:9
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/alloc/src/boxed.rs:1480:9
std::sys::unix::thread::Thread::new::thread_start
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/sys/unix/thread.rs:71:17
28: start_thread
29: clone
Thread 'tokio-runtime-worker' panicked at 'existence of block with number new_canonical implies existence of blocks with all numbers before it; qed', /home/actions/.cargo/git/checkouts/substrate-7e08433d4c370a21/1d7f6e1/client/db/src/lib.rs:1187
This is a bug. Please report it at: