Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Storage root mismatch #413

Closed
dirmansebastian opened this issue Sep 1, 2019 · 32 comments · Fixed by paritytech/substrate#3989
Closed

Storage root mismatch #413

dirmansebastian opened this issue Sep 1, 2019 · 32 comments · Fixed by paritytech/substrate#3989

Comments

@dirmansebastian
Copy link

kusama, v0.5.1 ubuntu 18.04, after restart it worked again
kusama

@fgimenez
Copy link
Contributor

fgimenez commented Sep 2, 2019

I'm getting similar errors (also solved by a restart), in my case the nodes were still partially working, sending telemetry data and responding to health RPC calls.

@dirmansebastian
Copy link
Author

in my case blocks/finalized blocks weren't synchronized with telemetry, so it stuck

@fgimenez
Copy link
Contributor

fgimenez commented Sep 2, 2019

Yes, for me it was the same, new blocks were not being received, but telemetry data was still being sent.

@dirmansebastian
Copy link
Author

dirmansebastian commented Sep 2, 2019

for me it was the same then, do you remember the block when that happened? i think mine was 131904

@dirmansebastian
Copy link
Author

dirmansebastian commented Sep 2, 2019

it doesn't matter the block i guess , clau has another hash, so another block

image

image

@bkchr
Copy link
Member

bkchr commented Sep 2, 2019

How much memory is polkadot using? It looks like it uses a lot of your available main memory?

@fgimenez
Copy link
Contributor

fgimenez commented Sep 2, 2019

do you remember the block when that happened

Don't know exactly, for me it started at 2019-08-28 19:33:56, so block ~77400

@dirmansebastian
Copy link
Author

How much memory is polkadot using? It looks like it uses a lot of your available main memory?

atm 40% of cpu and just 1 gm ram, i guess it happened when cpu was 100%

@fgimenez
Copy link
Contributor

fgimenez commented Sep 2, 2019

How much memory is polkadot using? It looks like it uses a lot of your available main memory?

in my case both memory and CPU usage were under limits when the issue happened.
Screenshot_2019-09-02_11-16-59

@marcio-diaz
Copy link
Contributor

marcio-diaz commented Sep 2, 2019

Taking a look into this.

@bkchr
Copy link
Member

bkchr commented Sep 2, 2019

@marcio-diaz can you reproduce this?

@marcio-diaz
Copy link
Contributor

marcio-diaz commented Sep 2, 2019

@bkchr Yes, getting same log on master.

@dirmansebastian
Copy link
Author

in my case cpu was close to 100%
image

@andresilva
Copy link
Member

Similar report from @drewstone, using substrate with Aura. https://gist.github.com/drewstone/314b70f2ae25a084fd520aac7023b712

@marcio-diaz
Copy link
Contributor

Closed by #3989

@crsCR
Copy link

crsCR commented Nov 8, 2019

I'm getting the error with 0.6.10 but not 'gav' folder and I think all other reported were - if that makes a difference...
polkadot[12218]: Thread 'import-queue-worker-0' panicked at 'Storage root must match that calculated.', /home/chris/.cargo/git/checkouts/substrate-7e08433d4c370a21/d8df977/srml/executive/src/lib.rs:277

Screen Shot 2019-11-09 at 02 37 42

@marcio-diaz marcio-diaz reopened this Nov 8, 2019
@arkpar
Copy link
Member

arkpar commented Nov 8, 2019

@crsCR How long has your node has been running for when this happened?

UPDATE: Answered in Riot: Over 24 hours.

@crsCR
Copy link

crsCR commented Nov 13, 2019

Update - still getting with 0.6.16...

@arkpar
Copy link
Member

arkpar commented Nov 13, 2019

So far I could not reproduce locally or on the test node. Would be great to catch this with additional logging:

-l sync=trace,consensus=trace,client=trace,state-db=trace,db=trace,forks=trace,state_db=trace,substrate_client_db::storage_cache=trace,state-trace=trace

@gavofyork gavofyork changed the title kusama - Thread 'import-queue-worker-0' panicked at 'Storage root must match that calculated.', /Users/gav/.cargo/git/checkouts/substrate-7e08433d4c370a21/3e975b3/srml/executive/src/lib.rs:319:3 2019-09-01 17:30:12 Block import error: Execution(Wasmi(Trap(Trap { kind: Unreachable }))) Storage root mismatch Nov 13, 2019
@arkpar
Copy link
Member

arkpar commented Nov 13, 2019

I've finally caught this with a state trace against normal execution. Looks like this is not caching issue after all, but rather a consensus issue within wasm executor.
https://www.diffchecker.com/DUjdeTbz
cc @pepyakin

@bkchr
Copy link
Member

bkchr commented Nov 16, 2019

So, I looked into this and traced every call by hand, here are the last calls before the mismatch:

	CurrentSlot Get 137a2a48eeb4491643f107bcb12f1f81=Some(4ef8a10f00000000)		Get 137a2a48eeb4491643f107bcb12f1f81=Some(4ef8a10f00000000)
	EpochIndex Get 513a0658f849101c35b0ba116a2bbe4a=Some(2001000000000000)		Get 513a0658f849101c35b0ba116a2bbe4a=Some(2001000000000000)
	GenesisSlot Get 3cfadb81d3e4dab0f1c65f1bb012e22e=Some(0f6c970f00000000)		Get 3cfadb81d3e4dab0f1c65f1bb012e22e=Some(0f6c970f00000000)
	Grandpa authorities exists Get eecb67c20ca6cc8ba4d4434687f61309=None		Get eecb67c20ca6cc8ba4d4434687f61309=None
	Democracy NextTally Get 9651d20f401bfac47731a01d6eba33b4=Some(00000000)		Get 9651d20f401bfac47731a01d6eba33b4=Some(00000000)
	ReferendumCount Get a902f1f0ef97177b8df9f9fd413768e7=Some(00000000)		Get a902f1f0ef97177b8df9f9fd413768e7=Some(00000000)
	DispatchQeue::take Get f3185e4a66504e7ce03c18101b17ff046b163c064079e1be2c72b82834750404=None		Get f3185e4a66504e7ce03c18101b17ff046b163c064079e1be2c72b82834750404=None
	ElectionPhragmen::DidMigrate Get d0c2cab88f9b38c886cbcef484a13f98=Some(01)		Get d0c2cab88f9b38c886cbcef484a13f98=Some(01)
	Parachains DidUpdate kill Put 858943152ff634eb6c8c7bb4d2f83cba=None		Put 858943152ff634eb6c8c7bb4d2f83cba=None
	Slot AuctionInfo Get 4dcde43f827e7fec107f382b8f621956=None		Get 4dcde43f827e7fec107f382b8f621956=None
	Registrar SelectThreads Get f5d9efa37cdca1329050bbbe55e1899c=Some(080000)		Get f5d9efa37cdca1329050bbbe55e1899c=Some(080000)
	Registrar SelectThreads Put f5d9efa37cdca1329050bbbe55e1899c=Some(080000)		Put f5d9efa37cdca1329050bbbe55e1899c=Some(080000)
	Registrar RetryQueue Get c7ee2003a1d9ac142fb0a4bbeb0b715f=Some(00)		Get c7ee2003a1d9ac142fb0a4bbeb0b715f=Some(00)
	Registrar RetryQueue Put c7ee2003a1d9ac142fb0a4bbeb0b715f=Some(00)		Put c7ee2003a1d9ac142fb0a4bbeb0b715f=Some(00)
	Registrar Parachains Get bdaca66dddbb648fa1faa46a2d1243a4=None		Get bdaca66dddbb648fa1faa46a2d1243a4=None
	Registrar Active Put 8f86a494d1225289df335136f4e870be=Some(00)		Put 8f86a494d1225289df335136f4e870be=Some(00)
--- mismatch starts here ---
	System AllExtrinsicsWeight		Get 6301b8a945c24ad6ef0fd5d063c34f19=None
	System AllExtrinsicsWeight		Put 6301b8a945c24ad6ef0fd5d063c34f19=Some(00000000)
	System AllExtrinsicsWeight		Get 6301b8a945c24ad6ef0fd5d063c34f19=Some(00000000)
	System AllExtrinsicsWeight		Put 6301b8a945c24ad6ef0fd5d063c34f19=Some(00000000)

The code that generates the mismatch was added here: paritytech/substrate@b0bc705#diff-9be02fd5e4de6c77f22d6327fdaa09afR166

I think, the trace you collected is from a node that has a native runtime that is not equal to the wasm blob. This is possible as we have forgotten to update the spec version in the latest release. So, the collected trace probably does not helps us with the other storage root mismatches.

@arkpar
Copy link
Member

arkpar commented Nov 16, 2019

The two traces I diffed are from the same node. The block failed to import with storage root mismatch, and was re-downloaded and retried a second later, this time successfully. It is not clear then why the executed runtime was different then.

@bkchr
Copy link
Member

bkchr commented Nov 16, 2019

What is the left and right side of the diff? Is the left the failed and the right the successful run?

@arkpar
Copy link
Member

arkpar commented Nov 17, 2019

@bkchr Yes. The last line indicates result

@dirmansebastian
Copy link
Author

dirmansebastian commented Nov 17, 2019

again it crashed yday, same error at block 694226 , check the lines around NotInFinalizedChain
`2019-11-16 16:51:06 Imported #694225 (0xd09a…2f9a)
2019-11-16 16:51:06 Starting parachain attestation session on top of parent 0x534f45dda5b237bcf79fa067a89e9c7617827414e65fd167a59d0cb3625f767c. Local parachain duty is None
2019-11-16 16:51:06 Reorg from #694225,0xd09a…2f9a to #694225,0x534f…767c, common ancestor #694224,0x2aab…1b73
2019-11-16 16:51:06 Imported #694225 (0x534f…767c)
2019-11-16 16:51:06 Imported #694225 (0x354b…a062)
2019-11-16 16:51:09 Idle (50 peers), best: #694225 (0x534f…767c), finalized #694224 (0x2aab…1b73), ⬇ 648.5kiB/s ⬆ 714.8kiB/s
2019-11-16 16:51:12 Starting parachain attestation session on top of parent 0x56fb9cc2b47eb5640c1ea3c82b6dcdb0737ddb92021c911d7a05bf1b593638fa. Local parachain duty is None
2019-11-16 16:51:12 Imported #694226 (0x56fb…38fa)
2019-11-16 16:51:13 Starting parachain attestation session on top of parent 0x220ce64d6fcb9eda8aa48f4e6ed040e0f9992541ee24e594c30e850d54bb4521. Local parachain duty is None
2019-11-16 16:51:13 Reorg from #694226,0x56fb…38fa to #694226,0x220c…4521, common ancestor #694224,0x2aab…1b73
2019-11-16 16:51:13 Imported #694226 (0x220c…4521)
2019-11-16 16:51:14 Idle (50 peers), best: #694226 (0x220c…4521), finalized #694224 (0x2aab…1b73), ⬇ 561.9kiB/s ⬆ 591.4kiB/s
2019-11-16 16:51:19 Idle (50 peers), best: #694225 (0x534f…767c), finalized #694225 (0x534f…767c), ⬇ 657.5kiB/s ⬆ 667.1kiB/s

2019-11-16 16:51:21 Potential safety failure: reverting finalized block (694225, 0x534f45dda5b237bcf79fa067a89e9c7617827414e65fd167a59d0cb3625f767c)
2019-11-16 16:51:21 Block import error:
NotInFinalizedChain

2019-11-16 16:51:24 Idle (50 peers), best: #694225 (0x534f…767c), finalized #694225 (0x534f…767c), ⬇ 596.9kiB/s ⬆ 629.6kiB/s
2019-11-16 16:51:29 Idle (50 peers), best: #694226 (0x56fb…38fa), finalized #694226 (0x56fb…38fa), ⬇ 631.0kiB/s ⬆ 693.3kiB/s
2019-11-16 16:51:34 Idle (50 peers), best: #694226 (0x56fb…38fa), finalized #694226 (0x56fb…38fa), ⬇ 329.8kiB/s ⬆ 447.6kiB/s
2019-11-16 16:51:40 Idle (50 peers), best: #694226 (0x56fb…38fa), finalized #694226 (0x56fb…38fa), ⬇ 194.9kiB/s ⬆ 285.3kiB/s
2019-11-16 16:51:45 Idle (50 peers), best: #694226 (0x56fb…38fa), finalized #694226 (0x56fb…38fa), ⬇ 73.7kiB/s ⬆ 21.2kiB/s
2019-11-16 16:51:50 Idle (50 peers), best: #694226 (0x56fb…38fa), finalized #694226 (0x56fb…38fa), ⬇ 86.4kiB/s ⬆ 20.8kiB/s`

then same error:
`Version: 0.6.6-ccd1c6eb-x86_64-linux-gnu

stack backtrace:
0: substrate_panic_handler::set::{{closure}}
1: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:481
2: std::panicking::begin_panic
3: srml_executive::Executive<System,Block,Context,UnsignedValidator,AllModules>::execute_block
4: polkadot_runtime::api::dispatch
5: std::panicking::try::do_call
6: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:80
7: substrate_executor::native_executor::safe_call
8: std::thread::local::LocalKey::with
9: <polkadot_executor::Executor as substrate_executor::native_executor::NativeExecutionDispatch>::dispatch
10: std::thread::local::LocalKey::with
11: substrate_state_machine::StateMachine<B,H,N,T,Exec>::execute_aux
12: substrate_state_machine::StateMachine<B,H,N,T,Exec>::execute_call_with_native_else_wasm_strategy
13: substrate_state_machine::StateMachine<B,H,N,T,Exec>::execute_using_consensus_failure_handler
14: <substrate_client::call_executor::LocalCallExecutor<B,E> as substrate_client::call_executor::CallExecutor<Block,substrate_primitives::hasher::blake2::Blake2Hasher>>::call_at_state
15: <&substrate_client::client::Client<B,E,Block,RA> as substrate_consensus_common::block_import::BlockImport>::import_block
16: <substrate_finality_grandpa::import::GrandpaBlockImport<B,E,Block,RA,SC> as substrate_consensus_common::block_import::BlockImport>::import_block
17: <substrate_consensus_babe::BabeBlockImport<B,E,Block,I,RA,PRA> as substrate_consensus_common::block_import::BlockImport>::import_block
18: substrate_consensus_common::import_queue::import_single_block
19: <futures_util::future::poll_fn::PollFn as core::future::future::Future>::poll
20: <futures_util::future::then::Then<Fut1,Fut2,F> as core::future::future::Future>::poll
21: <futures_util::future::poll_fn::PollFn as core::future::future::Future>::poll
22: futures_executor::thread_pool::PoolState::work
23: std::sys_common::backtrace::__rust_begin_short_backtrace
24: std::panicking::try::do_call
25: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:80
26: core::ops::function::FnOnce::call_once{{vtable.shim}}
27: <alloc::boxed::Box as core::ops::function::FnOnce>::call_once
at rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
28: <alloc::boxed::Box as core::ops::function::FnOnce>::call_once
at rustc/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/liballoc/boxed.rs:787
std::sys_common::thread::start_thread
at src/libstd/sys_common/thread.rs:13
std::sys::unix::thread::Thread::new::thread_start
at src/libstd/sys/unix/thread.rs:79
29: start_thread
30: __clone

Thread 'import-queue-worker-0' panicked at 'Storage root must match that calculated.', /home/sebytza05/.cargo/git/checkouts/substrate-7e08433d4c370a21/595f18e/srml/executive/src/lib.rs:271

This is a bug. Please report it at:

https://github.com/paritytech/polkadot/issues/new

Hash not equal
f053a589eeb983c931b60bdd3daf2342ed8716e23ba2020263bdabfc6f740648
910ba2f37e30ccd0336b4dd2d269a7bffe66c9709bff772917b4d34144bbca9d
panicked at 'Storage root must match that calculated.', /Users/gav/.cargo/git/checkouts/substrate-7e08433d4c370a21/595f18e/srml/executive/src/lib.rs:271:3
2019-11-06 14:42:21 Block import error:
Execution(Wasmi(Trap(Trap { kind: Unreachable })))
Hash: given=f053a589eeb983c931b60bdd3daf2342ed8716e23ba2020263bdabfc6f740648, expected=910ba2f37e30ccd0336b4dd2d269a7bffe66c9709bff772917b4d34144bbca9d`

after restart it worked without problems

@bkchr
Copy link
Member

bkchr commented Nov 17, 2019

@bkchr Yes. The last line indicates result

@arkpar Why was the first run native and the second not? Weird.

@culmore2015
Copy link

Nov 21 04:35:27 kusama kusama[17890]: Version: 0.6.17-0929fe2-x86_64-linux-gnu
Nov 21 04:35:27 kusama kusama[17890]: stack backtrace:
Nov 21 04:35:27 kusama kusama[17890]: 0: substrate_panic_handler::set::{{closure}}
Nov 21 04:35:27 kusama kusama[17890]: 1: std::panicking::rust_panic_with_hook
Nov 21 04:35:27 kusama kusama[17890]: at src/libstd/panicking.rs:477
Nov 21 04:35:27 kusama kusama[17890]: 2: std::panicking::begin_panic
Nov 21 04:35:27 kusama kusama[17890]: 3: srml_executive::Executive<System,Block,Context,UnsignedValidator,AllModules>::execute_block
Nov 21 04:35:27 kusama kusama[17890]: 4: polkadot_runtime::api::dispatch
Nov 21 04:35:27 kusama kusama[17890]: 5: std::panicking::try::do_call
Nov 21 04:35:27 kusama kusama[17890]: 6: __rust_maybe_catch_panic
Nov 21 04:35:27 kusama kusama[17890]: at src/libpanic_unwind/lib.rs:80
Nov 21 04:35:27 kusama kusama[17890]: 7: substrate_executor::native_executor::safe_call
Nov 21 04:35:27 kusama kusama[17890]: 8: std::thread::local::LocalKey::with
Nov 21 04:35:27 kusama kusama[17890]: 9: <polkadot_executor::Executor as substrate_executor::native_executor::NativeExecutionDispatch>::dispatch
Nov 21 04:35:27 kusama kusama[17890]: 10: std::thread::local::LocalKey::with
Nov 21 04:35:27 kusama kusama[17890]: 11: substrate_state_machine::StateMachine<B,H,N,T,Exec>::execute_aux
Nov 21 04:35:27 kusama kusama[17890]: 12: substrate_state_machine::StateMachine<B,H,N,T,Exec>::execute_call_with_native_else_wasm_strategy
Nov 21 04:35:27 kusama kusama[17890]: 13: substrate_state_machine::StateMachine<B,H,N,T,Exec>::execute_using_consensus_failure_handler
Nov 21 04:35:27 kusama kusama[17890]: 14: <substrate_client::call_executor::LocalCallExecutor<B,E> as substrate_client::call_executor::CallExecutor<Block,substrate_primitives::hasher::blake2::Blake2Hasher>>::call_at_state
Nov 21 04:35:27 kusama kusama[17890]: 15: <&substrate_client::client::Client<B,E,Block,RA> as substrate_consensus_common::block_import::BlockImport>::import_block
Nov 21 04:35:27 kusama kusama[17890]: 16: <substrate_finality_grandpa::import::GrandpaBlockImport<B,E,Block,RA,SC> as substrate_consensus_common::block_import::BlockImport>::import_block
Nov 21 04:35:27 kusama kusama[17890]: 17: <substrate_consensus_babe::BabeBlockImport<B,E,Block,I,RA,PRA> as substrate_consensus_common::block_import::BlockImport>::import_block
Nov 21 04:35:27 kusama kusama[17890]: 18: substrate_consensus_common::import_queue::import_single_block
Nov 21 04:35:27 kusama kusama[17890]: 19: <futures_util::future::poll_fn::PollFn as core::future::future::Future>::poll
Nov 21 04:35:27 kusama kusama[17890]: 20: <futures_util::future::then::Then<Fut1,Fut2,F> as core::future::future::Future>::poll
Nov 21 04:35:27 kusama kusama[17890]: 21: <futures_util::future::poll_fn::PollFn as core::future::future::Future>::poll
Nov 21 04:35:27 kusama kusama[17890]: 22: futures_executor::thread_pool::PoolState::work
Nov 21 04:35:27 kusama kusama[17890]: 23: std::sys_common::backtrace::__rust_begin_short_backtrace
Nov 21 04:35:27 kusama kusama[17890]: 24: std::panicking::try::do_call
Nov 21 04:35:27 kusama kusama[17890]: 25: __rust_maybe_catch_panic
Nov 21 04:35:27 kusama kusama[17890]: at src/libpanic_unwind/lib.rs:80
Nov 21 04:35:27 kusama kusama[17890]: 26: core::ops::function::FnOnce::call_once{{vtable.shim}}
Nov 21 04:35:27 kusama kusama[17890]: 27: <alloc::boxed::Box as core::ops::function::FnOnce>::call_once
Nov 21 04:35:27 kusama kusama[17890]: at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/boxed.rs:922
Nov 21 04:35:27 kusama kusama[17890]: 28: <alloc::boxed::Box as core::ops::function::FnOnce>::call_once
Nov 21 04:35:27 kusama kusama[17890]: at /rustc/4560ea788cb760f0a34127156c78e2552949f734/src/liballoc/boxed.rs:922
Nov 21 04:35:27 kusama kusama[17890]: std::sys_common::thread::start_thread
Nov 21 04:35:27 kusama kusama[17890]: at src/libstd/sys_common/thread.rs:13
Nov 21 04:35:27 kusama kusama[17890]: std::sys::unix::thread::Thread::new::thread_start
Nov 21 04:35:27 kusama kusama[17890]: at src/libstd/sys/unix/thread.rs:79
Nov 21 04:35:27 kusama kusama[17890]: 29: start_thread
Nov 21 04:35:27 kusama kusama[17890]: 30: clone
Nov 21 04:35:27 kusama kusama[17890]: Thread 'import-queue-worker-0' panicked at 'Storage root must match that calculated.', /root/.cargo/git/checkouts/substrate-7e08433d4c370a21/aa937d9/srml/executive/src/lib.rs:292

@arkpar
Copy link
Member

arkpar commented Dec 4, 2019

Should be fixed in kusama 7.x releases. Please reopen if it happens again.

@jordy25519
Copy link

Should be fixed in kusama 7.x releases. Please reopen if it happens again.

@arkpar should this issue be closed?

@arkpar
Copy link
Member

arkpar commented Dec 6, 2019

right

@badkk
Copy link

badkk commented Feb 20, 2020

Should be fixed in kusama 7.x releases. Please reopen if it happens again.

@arkpar Is there any fixed pr links?

@arkpar
Copy link
Member

arkpar commented Feb 20, 2020

paritytech/substrate#4194

imstar15 pushed a commit to imstar15/polkadot that referenced this issue Aug 25, 2021
Instead of `ensure` with dedicated errors use `panic` or `assert`. See
for details paritytech#410

Closes paritytech#410

Co-authored-by: Gavin Wood <gavin@parity.io>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.