existence of block with number new_canonical implies existence of blocks with all numbers before it #541

Closed
Comr8-Nga opened this issue Jun 26, 2021 · 23 comments

Comments


Comr8-Nga commented Jun 26, 2021

2021-06-26 11:50:02 Moonbeam Parachain Collator
2021-06-26 11:50:02 ✌️ version 0.8.0-unknown-x86_64-linux-gnu
2021-06-26 11:50:02 ❤️ by PureStake, 2019-2021
2021-06-26 11:50:02 📋 Chain specification: Moonbase Alpha
2021-06-26 11:50:02 🏷 Node name: Comr8_Nga
2021-06-26 11:50:02 👤 Role: AUTHORITY
2021-06-26 11:50:02 💾 Database: RocksDb at /data/chains/moonbase_alpha/db
2021-06-26 11:50:02 ⛓ Native runtime: moonbase-40 (moonbase-1.tx2.au3)
2021-06-26 11:50:02 Parachain id: Id(1000)
2021-06-26 11:50:02 Parachain Account: 5Ec4AhPZk8STuex8Wsi9TwDtJQxKqzPJRCH7348Xtcs9vZLJ
2021-06-26 11:50:02 Parachain genesis state: 0x000000000000000000000000000000000000000000000000000000000000000000b505bc9a20d69f14620b2417b6d777c398ceb3e32119b9a53507111d1880927c03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
2021-06-26 11:50:02 [Relaychain] 🏷 Local node identity is: 12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:02 [Relaychain] 📦 Highest known block at #436285
2021-06-26 11:50:02 [Relaychain] 〽️ Prometheus server started at 127.0.0.1:9616
2021-06-26 11:50:02 [Relaychain] Listening for new connections on 127.0.0.1:9945.
2021-06-26 11:50:02 Using default protocol ID "sup" because none is configured in the chain specs
2021-06-26 11:50:02 🏷 Local node identity is: 12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:02 📦 Highest known block at #74522
2021-06-26 11:50:02 〽️ Prometheus server started at 127.0.0.1:9615
2021-06-26 11:50:02 Listening for new connections on 127.0.0.1:9944.
2021-06-26 11:50:03 🔍 Discovered new external address for our node: /ip4/65.21.185.66/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:03 [Relaychain] 🔍 Discovered new external address for our node: /ip4/65.21.185.66/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:03 [Relaychain] 🔍 Discovered new external address for our node: /ip4/10.2.112.215/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:03 🔍 Discovered new external address for our node: /ip4/10.2.112.215/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:03 🔍 Discovered new external address for our node: /ip4/10.2.96.244/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:03 [Relaychain] 🔍 Discovered new external address for our node: /ip4/172.18.0.3/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 🔍 Discovered new external address for our node: /ip6/2a01:4f9:c010:9e1f::1/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:04 🔍 Discovered new external address for our node: /ip4/10.2.96.243/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:04 [Relaychain] 🔍 Discovered new external address for our node: /ip4/127.0.0.1/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 [Relaychain] 🔍 Discovered new external address for our node: /ip4/192.168.170.128/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 [Relaychain] 🔍 Discovered new external address for our node: /ip4/192.168.211.64/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:04 [Relaychain] Candidate included without being backed? candidate_hash=0x8c80d74867da6e3fb6af49a785ec1db2d6ea02dc64289644de37f907a8923141
2021-06-26 11:50:05 🔍 Discovered new external address for our node: /ip4/10.2.112.239/tcp/30333/p2p/12D3KooWEKi2pXZ4TH34C8YUQ9iravpSdBAxptSZDHKcUJkrgtST
2021-06-26 11:50:05 [Relaychain] 🔍 Discovered new external address for our node: /ip6/2a01:4f9:c010:9e1f::1/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv
2021-06-26 11:50:05 [Relaychain] 🔍 Discovered new external address for our node: /ip4/10.132.0.60/tcp/30334/ws/p2p/12D3KooWLkVaWfT5BxW1LnxYeEy78Rb9LKXCuoEX4iDYsYownQjv

Version: 0.8.0-unknown-x86_64-linux-gnu

0: sp_panic_handler::set::{{closure}}
1: std::panicking::rust_panic_with_hook
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/panicking.rs:595:17
2: std::panicking::begin_panic_handler::{{closure}}
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/panicking.rs:497:13
3: std::sys_common::backtrace::__rust_end_short_backtrace
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/sys_common/backtrace.rs:141:18
4: rust_begin_unwind
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/panicking.rs:493:5
5: core::panicking::panic_fmt
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/core/src/panicking.rs:92:14
6: core::option::expect_failed
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/core/src/option.rs:1258:5
7: sc_client_db::Backend::try_commit_operation
8: <sc_client_db::Backend as sc_client_api::backend::Backend>::commit_operation
9: <sc_service::client::client::Client<B,E,Block,RA> as sc_client_api::backend::LockImportRun<Block,B>>::lock_import_and_run::{{closure}}
10: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
11: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
12: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
13: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
14: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
15: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
16: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
17: <sc_service::task_manager::prometheus_future::PrometheusFuture as core::future::future::Future>::poll
18: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
19: <tracing_futures::Instrumented as core::future::future::Future>::poll
20: futures_executor::local_pool::block_on
21: <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
22: tokio::runtime::task::harness::Harness<T,S>::poll
23: tokio::runtime::blocking::pool::Inner::run
24: tokio::runtime::context::enter
25: std::sys_common::backtrace::__rust_begin_short_backtrace
26: core::ops::function::FnOnce::call_once{{vtable.shim}}
27: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/alloc/src/boxed.rs:1480:9
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/alloc/src/boxed.rs:1480:9
std::sys::unix::thread::Thread::new::thread_start
at rustc/7a9b552cb1621c9c57898d147228aab32b65a7c3/library/std/src/sys/unix/thread.rs:71:17
28: start_thread
29: clone

Thread 'tokio-runtime-worker' panicked at 'existence of block with number new_canonical implies existence of blocks with all numbers before it; qed', /home/actions/.cargo/git/checkouts/substrate-7e08433d4c370a21/1d7f6e1/client/db/src/lib.rs:1187

This is a bug. Please report it at:

    https://github.com/PureStake/moonbeam/issues/new

bkchr commented Jun 28, 2021

Cc @arkpar


arkpar commented Jun 28, 2021

@bkchr As far as I understand, this is the collator syncing the relay chain. Does it not keep the full relay chain block history?


bkchr commented Jun 28, 2021

It should. We don't change anything for the relay chain client when it comes to syncing a normal full node.


arkpar commented Jun 28, 2021

@Comr8-Nga

Could you reproduce this with -l state-db=trace,sync=trace?

@Comr8-Nga (Author)

@arkpar Unfortunately, I've already deleted that container and launched a new one (the node was synchronized and this bug didn't occur again)

JoshOrndorff changed the title from "a bug running a collator node" to "existence of block with number new_canonical implies existence of blocks with all numbers before it" on Jul 2, 2021
JoshOrndorff (Contributor) commented Jul 2, 2021

Also reported in #563 and #544. A lot of people are hitting this. @albertov19 Do you know anything about how often this happens, and whether it affects collators only or all nodes? Does a node restart work around it?


razumv commented Jul 2, 2021

@JoshOrndorff A reboot doesn't help.
I've even tried clearing the database and starting over, with no result. Several people in my community have the same error.

@albertov19 (Contributor)

This issue has not been reported by any of our collators. I'll keep a lookout in case anyone reports it.

crystalin (Collaborator) commented Jul 13, 2021

@arkpar we retrieved trace logs for a failing node here: #595
Let me know if you need more information.


arkpar commented Jul 13, 2021

@bkchr Looks like when syncing, the parachain head gets moved to a much earlier block. This leads to the number-to-head mappings for retracted blocks being removed from the database. The sync then tries to import a block on top of a retracted block, and this panic happens.

Changing the head by a large number of blocks is not really supported by the database layer. This also breaks the sync in a number of ways: in the log above, the best block is reset more than 70000 blocks behind, and since the sync always starts from the best block, a lot of existing blocks end up being re-downloaded.
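To make the failure mode concrete, here is a minimal, self-contained Rust sketch of a block-number-to-hash index with the same contiguity assumption. HeadIndex, retract_head_to and canonicalize are hypothetical names used only for illustration; this is not the actual sc_client_db code, just the shape of the invariant: once the head is reset far back and the mappings for retracted numbers are dropped, canonicalizing a later block hits a missing ancestor mapping, which is the case the real backend turns into the expect panic quoted above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the backend's block-number -> block-hash index.
/// Illustration only; Substrate's actual data layout differs.
struct HeadIndex {
    number_to_hash: HashMap<u64, [u8; 32]>,
    best_number: u64,
}

impl HeadIndex {
    /// Move the best block far behind and drop the mappings of retracted
    /// numbers, roughly what the reported head reset does to the index.
    fn retract_head_to(&mut self, new_best: u64) {
        for n in (new_best + 1)..=self.best_number {
            self.number_to_hash.remove(&n);
        }
        self.best_number = new_best;
    }

    /// Canonicalize `new_canonical`: the check assumes every number below it
    /// is still mapped, mirroring the invariant behind the panic message.
    fn canonicalize(&self, new_canonical: u64) -> [u8; 32] {
        for n in 0..new_canonical {
            self.number_to_hash.get(&n).expect(
                "existence of block with number new_canonical implies existence of blocks with all numbers before it; qed",
            );
        }
        self.number_to_hash[&new_canonical]
    }
}

fn main() {
    let mut index = HeadIndex {
        number_to_hash: (0..=5u64).map(|n| (n, [n as u8; 32])).collect(),
        best_number: 5,
    };
    index.retract_head_to(2); // mappings for 3..=5 are gone
    index.canonicalize(5);    // panics at n = 3: an ancestor mapping was removed
}
```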

@gdzien-co

> @bkchr Looks like when syncing, the parachain head gets moved to a much earlier block

This happened to me when all I did was start the docker container as per the manual. Is there going to be any fix soon, or should I just build "keep trying till it works" into my automation?

@rphmeier

> @bkchr Looks like when syncing, the parachain head gets moved to a much earlier block. This leads to the number-to-head mappings for retracted blocks being removed from the database. The sync then tries to import a block on top of a retracted block, and this panic happens.
>
> Changing the head by a large number of blocks is not really supported by the database layer. This also breaks the sync in a number of ways: in the log above, the best block is reset more than 70000 blocks behind, and since the sync always starts from the best block, a lot of existing blocks end up being re-downloaded.

Either way, the assertion failure within the database is not the correct way of handling an API usage issue. We should address the underlying cause (collations built by out-of-sync collators?) as well as change the database to handle this case more gracefully.
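As a hedged sketch of the kind of "more graceful" handling being asked for here (hypothetical names, not the actual Substrate patch): the same contiguity check could surface a structured error that the commit path can reject cleanly, instead of expect-ing and bringing the node down.

```rust
use std::collections::HashMap;

/// Hypothetical error type for illustration; the real backend defines its own errors.
#[derive(Debug)]
enum CommitError {
    /// A block number below the new canonical head is no longer mapped to a hash.
    MissingNumberMapping(u64),
}

/// Same contiguity check as in the sketch above, but returning a recoverable
/// error instead of panicking, so the caller (e.g. block import) can abort the
/// commit gracefully.
fn check_canonical_contiguity(
    number_to_hash: &HashMap<u64, [u8; 32]>,
    new_canonical: u64,
) -> Result<(), CommitError> {
    for n in 0..new_canonical {
        if !number_to_hash.contains_key(&n) {
            return Err(CommitError::MissingNumberMapping(n));
        }
    }
    Ok(())
}
```

A caller could then map such an error to a failed import rather than aborting the whole process.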


arkpar commented Jul 16, 2021

The assertion is that once block number N exists, it may be remapped to a different hash, but never removed, which was true at the time. It needs to be updated, I agree.

crystalin (Collaborator) commented Jul 16, 2021

@arkpar is your fix enough to allow nodes to recover, or will it prevent them from panicking but keep them stalled in that case?


arkpar commented Jul 16, 2021

@crystalin The latter. There's a more fundamental issue that needs to be fixed by someone from the Cumulus team.

@crystalin (Collaborator)

@arkpar, this seems to be a critical issue (also impacting other parachains); is someone from Parity currently working on it?
If yes, is the progress posted somewhere?


arkpar commented Jul 21, 2021

> @arkpar, this seems to be a critical issue (also impacting other parachains); is someone from Parity currently working on it?
> If yes, is the progress posted somewhere?

I'm redirecting this question to @bkchr and @rphmeier

@rphmeier

For posterity: paritytech/cumulus#544 is a patch working around the issue.

@crystalin (Collaborator)

Thank you @rphmeier. We are still testing the fix but haven't observed any errors internally.

@Comr8-Nga and others impacted by this issue, please upgrade to Moonbeam client v0.9.6 (https://github.com/PureStake/moonbeam/releases/tag/v0.9.6)

@crystalin (Collaborator)

The issue doesn't seem to appear anymore, closing it.

@crystalin (Collaborator)

I'm re-opening this, as it also happens in recent versions.
@rphmeier, it seems the fix in Cumulus wasn't enough, as we are still observing it.

The error reports a different code line; here are some of the logs that were reported (I'll try to gather more recent ones):

Version: 0.10.0-6fab892-x86_64-linux-gnu

0: sp_panic_handler::set::{{closure}}
1: std::panicking::rust_panic_with_hook
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/panicking.rs:627:17
2: std::panicking::begin_panic_handler::{{closure}}
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/panicking.rs:520:13
3: std::sys_common::backtrace::__rust_end_short_backtrace
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/sys_common/backtrace.rs:141:18
4: rust_begin_unwind
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/panicking.rs:516:5
5: core::panicking::panic_fmt
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/core/src/panicking.rs:93:14
6: core::option::expect_failed
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/core/src/option.rs:1578:5
7: sc_client_db::Backend::try_commit_operation
8: <sc_client_db::Backend as sc_client_api::backend::Backend>::commit_operation
9: <sc_service::client::client::Client<B,E,Block,RA> as sc_client_api::backend::LockImportRun<Block,B>>::lock_import_and_run::{{closure}}
10: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
11: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
12: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
13: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
14: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
15: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
16: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
17: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
18: <sc_service::task_manager::prometheus_future::PrometheusFuture as core::future::future::Future>::poll
19: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
20: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
21: <tracing_futures::Instrumented as core::future::future::Future>::poll
22: std::thread::local::LocalKey::with
23: futures_executor::local_pool::block_on
24: tokio::runtime::task::core::Core<T,S>::poll
25: <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
26: tokio::runtime::task::harness::Harness<T,S>::poll
27: tokio::runtime::blocking::pool::Inner::run
28: tokio::runtime::context::enter
29: std::sys_common::backtrace::__rust_begin_short_backtrace
30: core::ops::function::FnOnce::call_once{{vtable.shim}}
31: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/alloc/src/boxed.rs:1572:9
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/alloc/src/boxed.rs:1572:9
std::sys::unix::thread::Thread::new::thread_start
at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/sys/unix/thread.rs:91:17
32: start_thread
33: clone

Thread 'tokio-runtime-worker' panicked at 'existence of block with number new_canonical implies existence of blocks with all numbers before it; qed', /home/gh-actions/.cargo/git/checkouts/substrate-7e08433d4c370a21/74101dc/client/db/src/lib.rs:1309

crystalin reopened this Sep 5, 2021

bkchr commented Sep 6, 2021

@crystalin this panic doesn't exist anymore. Please update your nodes. This was removed entirely.

@crystalin (Collaborator)

@bkchr sorry, I might have mixed up versions; I was expecting it to be fixed in our previous version. I'll keep this one open until I get the new version tested, to avoid confusion.
(For future reference, Moonbeam v0.12 should be the version including the fix.)
