This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Do not set best block when there already exists a best block with a higher number #544

Merged
merged 2 commits into from
Jul 21, 2021

Conversation

bkchr
Member

@bkchr bkchr commented Jul 21, 2021

Closes: #534

client/consensus/common/src/parachain_consensus.rs Outdated
client/consensus/common/src/tests.rs Outdated
Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
@bkchr bkchr merged commit 3462a4b into master Jul 21, 2021
@bkchr bkchr deleted the bkchr-do-not-set-best-block-to-lower-number branch July 21, 2021 14:21
bkchr added a commit that referenced this pull request Jul 21, 2021
…higher number (#544)

* Do not set best block when there already exists a best block with an higher number

* Apply suggestions from code review

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
@@ -346,6 +346,18 @@ where
P: UsageProvider<Block> + Send + Sync + BlockBackend<Block>,
for<'a> &'a P: BlockImport<Block>,
{
let best_number = parachain.usage_info().chain.best_number;
if *header.number() < best_number {
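The truncated hunk above can be sketched as follows. This is a hedged, self-contained illustration, not the actual Substrate code: `Header` and `Client` here are hypothetical stand-ins for the real `Block::Header` and `UsageProvider` types, but the guard mirrors the `if *header.number() < best_number` check the PR adds.

```rust
// Hypothetical stand-in for the parachain block header.
#[derive(Debug, Clone, Copy)]
struct Header {
    number: u64,
}

// Hypothetical stand-in for the client tracking the best block.
struct Client {
    best_number: u64,
}

impl Client {
    /// Returns true if `header` was accepted as the new best block.
    /// Mirrors the PR's guard: headers with a number lower than the
    /// current best are skipped instead of being set as best.
    fn set_best_block(&mut self, header: Header) -> bool {
        if header.number < self.best_number {
            return false;
        }
        self.best_number = header.number;
        true
    }
}

fn main() {
    let mut client = Client { best_number: 10 };
    // A header below the current best is rejected...
    assert!(!client.set_best_block(Header { number: 7 }));
    assert_eq!(client.best_number, 10);
    // ...while a higher one is accepted as usual.
    assert!(client.set_best_block(Header { number: 11 }));
    assert_eq!(client.best_number, 11);
    println!("ok");
}
```

As the discussion below notes, this means a fork with a lower block number is not adopted as best immediately; it only becomes best once it grows past the current best number.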


@bkchr Could there be a case where a re-org on the relay chain would actually cause a re-org on the parachain, leading to a lower block number becoming best?
I'm thinking of the case where a re-org happens over 2 blocks on the relay chain, while the parachain network has only produced 1 block on the fork branch.

Member Author

Then it will take one more block to set a new best block, and you will see the reorg later.


I know this is probably a rare case, but what if the node is the selected author for the new block after this re-org?
The node would not import the new fork block as best, and so would not produce a new one?

(This can/should be solved by other mechanisms, like selecting authors based on other input than just the previous block, but it still remains a possible issue.)

Contributor

This is certainly not a perfect solution and I think you are correct that it may lead to missed authorship opportunities during reorgs.

Contributor

We will probably need changes in the Substrate client to avoid this entire issue, but it is without a doubt possible to improve this, and future client versions will address it.

Member Author

Which block would not be imported?

This has nothing to do with authorship. The best block is also not relevant for block authorship.

Member Author

This change here is entirely about which block will be set as new best block. And the bug we had seen will only happen on major sync, but never when we are syncing at the tip of the chain.

@JoshOrndorff
Contributor

JoshOrndorff commented Aug 5, 2021

Moonbeam nodes still occasionally panic with this existence of block ... implies existence... error (although seemingly less frequently than before this PR). One user reported it in moonbeam-foundation/moonbeam#646

And also @purestakeoskar captured some logs of it happening on our own nodes. Can you please share those logs?

@arkpar
Member

arkpar commented Aug 5, 2021

That panic is no longer in the substrate code. Are they using the latest version of cumulus?

@bkchr
Member Author

bkchr commented Aug 5, 2021

We have not yet had a release that contains the Substrate version that removes the panic.

@bkchr
Member Author

bkchr commented Aug 5, 2021

> Moonbeam nodes still occasionally panic with this existence of block ... implies existence... error (although it seems less frequently than before this PR). One user reported it in PureStake/moonbeam#646
>
> And also @purestakeoskar captured some logs of it happening on our own nodes. Can you please share those logs?

Is Moonbeam 0.9.6 using the polkadot-v0.9.8 branch?

@crystalin

@bkchr yes it is

@bkchr
Member Author

bkchr commented Aug 5, 2021

@JoshOrndorff @crystalin the last log moonbeam-foundation/moonbeam#646 (comment) doesn't include the mentioned panic. So, this is not related.

@JoshOrndorff
Contributor

JoshOrndorff commented Aug 5, 2021

@bkchr You're right about the user-reported issue. He now has a different problem. I failed to notice that. I'm sorry about that.

However, @purestakeoskar (please chime in) does have this same panic. He is running Moonbeam 0.9.6 (polkadot-v0.9.8 dependency branches).

It happens on this block (https://moonbase.subscan.io/block/74522, https://moonbase-blockscout.testnet.moonbeam.network/blocks/74522/transactions) which is pretty simple (3 inherents, one eth transfer).

@bkchr
Member Author

bkchr commented Aug 5, 2021

This bug cannot really occur on a specific block. It would be good to get some logs.

@gilmouta

@bkchr There is a Moonbeam node runner having this problem consistently (even after deleting the parachain db). He's running Moonbeam v0.11.2, which is based on Polkadot v0.9.8. Interestingly, he is also getting stuck syncing on the block reported above (74522) before the error occurs.

2021-08-24 06:37:00 [Relaychain] ✨ Imported #1289678 (0x1917…02d3)    
2021-08-24 06:37:01 [🌗] ⚙️  Syncing  0.0 bps, target=#608868 (31 peers), best: #74522 (0x0b3b…c1a0), finalized #34966 (0xced9…350c), ⬇ 163.8kiB/s ⬆ 1.3kiB/s    
2021-08-24 06:37:01 [Relaychain] 💤 Idle (33 peers), best: #1289678 (0x1917…02d3), finalized #1289675 (0x211c…2fe8), ⬇ 9.5kiB/s ⬆ 9.7kiB/s    

====================

Version: 0.11.2-ae60d2a-x86_64-linux-gnu

   0: sp_panic_handler::set::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/panicking.rs:627:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/panicking.rs:520:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/sys_common/backtrace.rs:141:18
   4: rust_begin_unwind
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/panicking.rs:516:5
   5: core::panicking::panic_fmt
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/core/src/panicking.rs:93:14
   6: core::option::expect_failed
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/core/src/option.rs:1578:5
   7: sc_client_db::Backend<Block>::try_commit_operation
   8: <sc_client_db::Backend<Block> as sc_client_api::backend::Backend<Block>>::commit_operation
   9: <sc_service::client::client::Client<B,E,Block,RA> as sc_client_api::backend::LockImportRun<Block,B>>::lock_import_and_run::{{closure}}
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  13: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  14: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  15: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  17: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  18: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
  19: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  20: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  21: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  22: std::thread::local::LocalKey<T>::with
  23: futures_executor::local_pool::block_on
  24: tokio::runtime::task::core::Core<T,S>::poll
  25: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  26: tokio::runtime::task::harness::Harness<T,S>::poll
  27: tokio::runtime::blocking::pool::Inner::run
  28: tokio::runtime::context::enter
  29: std::sys_common::backtrace::__rust_begin_short_backtrace
  30: core::ops::function::FnOnce::call_once{{vtable.shim}}
  31: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/alloc/src/boxed.rs:1572:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/alloc/src/boxed.rs:1572:9
      std::sys::unix::thread::Thread::new::thread_start
             at rustc/4e282795d7d1d28a4c6c1c6521045ae2b59f3519/library/std/src/sys/unix/thread.rs:91:17
  32: start_thread
  33: clone


Thread 'tokio-runtime-worker' panicked at 'existence of block with number `new_canonical` implies existence of blocks with all numbers before it; qed', /home/gh-actions/.cargo/git/checkouts/substrate-7e08433d4c370a21/74101dc/client/db/src/lib.rs:1309
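The invariant behind this panic message can be sketched as follows. This is a hedged, simplified illustration, not the actual `sc-client-db` code: `canonicalize` and `hashes_by_number` are hypothetical names. During canonicalization the database assumes a hash exists for every block number up to `new_canonical`; the old code enforced this with an `.expect("existence of block ... implies ...; qed")`, which panics when a number is missing, whereas returning a `Result` (as bkchr notes newer versions do) surfaces the same condition as a recoverable error.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: collect the canonical chain of hashes for all
/// numbers 0..=new_canonical. A missing number yields an error instead
/// of panicking, as an `.expect(...)` in the old code would have.
fn canonicalize(
    hashes_by_number: &HashMap<u64, [u8; 32]>,
    new_canonical: u64,
) -> Result<Vec<[u8; 32]>, String> {
    (0..=new_canonical)
        .map(|n| {
            hashes_by_number.get(&n).copied().ok_or_else(|| {
                // The invariant from the panic message, violated:
                format!("missing block hash for number {}", n)
            })
        })
        .collect()
}

fn main() {
    let mut db = HashMap::new();
    for n in 0..=2u64 {
        db.insert(n, [0u8; 32]);
    }
    // All numbers up to 2 exist, so canonicalization succeeds.
    assert!(canonicalize(&db, 2).is_ok());
    // Number 3 is missing: the invariant fails, but as an Err, not a panic.
    assert!(canonicalize(&db, 3).is_err());
    println!("ok");
}
```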

@bkchr
Member Author

bkchr commented Aug 24, 2021

Not sure how this can still happen. However, with the latest polkadot branch this is just an error and no longer a panic. So, updating should fix it.

slumber pushed a commit that referenced this pull request Sep 1, 2021
…higher number (#544)

* Do not set best block when there already exists a best block with an higher number

* Apply suggestions from code review

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>

Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
Successfully merging this pull request may close these issues.

existence of block with number new_canonical implies existence of blocks with all numbers before it