This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

[runtime 9122] panicked at 'Externalities not allowed to fail within runtime: DefaultError' #4232

Closed
GopherJ opened this issue Nov 5, 2021 · 30 comments

Comments

@GopherJ
Contributor

GopherJ commented Nov 5, 2021

I encountered this error on the relay chain side while doing a runtime upgrade for a parachain:

panicked at 'Externalities not allowed to fail within runtime: DefaultError', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/primitives/state-machine/src/ext.rs:325:48
@GopherJ GopherJ changed the title panicked at 'Externalities not allowed to fail within runtime: DefaultError' [runtime 9122] panicked at 'Externalities not allowed to fail within runtime: DefaultError' Nov 5, 2021
@bkchr
Member

bkchr commented Nov 5, 2021

Means that the storage proof was probably missing some data.

@GopherJ
Contributor Author

GopherJ commented Nov 5, 2021

@bkchr What can I do to fix it? I've been working on runtime upgrade tests for our DOT auction today, but the upgrade always fails.

I get this error, and the parachain can no longer produce blocks after the runtime upgrade.

@weichweich
Contributor

We ran into the same issue. The runtime upgrade didn't include any significant changes. We were already upgraded to 0.9.12. The logs show:

panicked at 'Externalities not allowed to fail within runtime: DefaultError', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/primitives/state-machine/src/ext.rs:201:58

Relay chain: westend runtime 9122 and client 0.9.12
Parachain: client and runtime were also using the 0.9.12 branches (before and after the upgrade)

@bkchr
Member

bkchr commented Nov 8, 2021

@weichweich can you give me the 0.9.11 based branch? Then I can try to run your parachain and upgrade to the given 0.9.12 based release.

@weichweich
Contributor

@bkchr Our runtime with 0.9.11 was version 1.1.0 (https://github.com/KILTprotocol/mashnet-node/tree/1.1.0, 5209c7a19150f5033477f148a8329f711441a543)
We did the upgrade to 0.9.12 in version 1.1.1 (https://github.com/KILTprotocol/mashnet-node/tree/1.1.1, ba08a359936846cbfee4b7ebce3af1b0f884f321)

Now we are testing version 1.2.0 which fails. https://github.com/KILTprotocol/mashnet-node/tree/master

@bkchr
Member

bkchr commented Nov 8, 2021

Ahh, so you are already on the 0.9.12 branch? And update 1.1.1 worked? But another upgrade, that uses the 0.9.12 branch is failing? Is that correct?

@weichweich
Contributor

yes! The upgrade to 1.1.1 (with version 0.9.12) worked on all testnets (we will upgrade to it today on our live net).
The next upgrade after that is failing.

@weichweich
Contributor

I also just noticed that my error happens in a different line. 🤔

@bkchr
Member

bkchr commented Nov 8, 2021

@weichweich where is it failing for you?

@weichweich
Contributor

> @weichweich where is it failing for you?

/cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/primitives/state-machine/src/ext.rs:201:58

@rphmeier
Contributor

rphmeier commented Nov 8, 2021

The best way to check whether the relay chain and the parachain synchronized the code upgrade correctly is to check the state of the paras module in the relay chain and compare it to the value of :code in the parachain. If it's done right, there should be no future code upgrade for the para either.

If they're desynced, there's either a bug in the paras pallet or parachain-system in Cumulus, depending how you look at it. And that'd definitely cause a stall.

I wonder if it could also be that the PVF executor doesn't provide all host functions to the PVF.
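
A minimal sketch of the parachain half of this check, in Python. It only builds the JSON-RPC request body for `state_getStorageHash` against the well-known `:code` key. Sending it to a node and reading the paras module's code hash on the relay chain are left out, and the helper names are hypothetical.

```python
import json


def well_known_key_hex(key: str) -> str:
    """Hex-encode a well-known storage key such as ':code' (raw ASCII, not hashed)."""
    return "0x" + key.encode("ascii").hex()


def storage_hash_request(key: str, request_id: int = 1) -> str:
    """Build the JSON-RPC body asking a node for the hash of one storage value."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "state_getStorageHash",
            "params": [well_known_key_hex(key)],
        }
    )


if __name__ == "__main__":
    # Print the request body; POST it to the parachain node's RPC endpoint.
    print(storage_hash_request(":code"))
```

The returned hash can then be compared with the code hash the relay chain holds for the para.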

@weichweich
Contributor

@rphmeier
Our endpoints are public if you want to check. I tried to check and the paras > futureCodeUpgrades is empty. But the code looks different (:code on the parachain and paras.codeByHash on the relay chain).
parachain
relaychain

@pepyakin
Contributor

pepyakin commented Nov 8, 2021

can confirm,

Parachain's state_getStorageHash(":code") is 0x2c45edfdbb4c2038b50b86c1f3e8dd0a6e6f61cf65aeb272d16be1ccc1a01a4a

whereas on relay-chain paras.currentCodeHash is 0x41b47379b238d00700719fe93871321011494d83e1c60d8da7a4f5a9dde8ef5b
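
As a small illustration, the comparison of the two values quoted above could be scripted like this. The function names are hypothetical; the hashes are copied from this comment.

```python
def normalize_hash(value: str) -> str:
    """Lower-case a 32-byte hex hash and strip an optional 0x prefix."""
    cleaned = value.strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    if len(cleaned) != 64 or any(c not in "0123456789abcdef" for c in cleaned):
        raise ValueError(f"not a 32-byte hex hash: {value!r}")
    return cleaned


def code_in_sync(parachain_code_hash: str, relay_code_hash: str) -> bool:
    """True when the parachain's :code hash matches the relay chain's record."""
    return normalize_hash(parachain_code_hash) == normalize_hash(relay_code_hash)


# Values reported in the comment above.
PARACHAIN_CODE_HASH = "0x2c45edfdbb4c2038b50b86c1f3e8dd0a6e6f61cf65aeb272d16be1ccc1a01a4a"
RELAY_CODE_HASH = "0x41b47379b238d00700719fe93871321011494d83e1c60d8da7a4f5a9dde8ef5b"

if __name__ == "__main__":
    print("in sync:", code_in_sync(PARACHAIN_CODE_HASH, RELAY_CODE_HASH))
```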

@pepyakin
Contributor

pepyakin commented Nov 8, 2021

OK, here is a reconstruction of what happened:

On the parachain the last block is 329,958. The relay-parent of that block is 1,338,953. The upgrade was scheduled by parablock 329,943 (relay-parent=1,338,923).

From what I see, 329,958 had to enact the upgrade but did not. I draw this conclusion from the sequence of events on the relay chain, which seems to be working correctly:

(from most recent to least recent)

  • 1,338,955 0x861878a13466c2320b1b1c91efac055320c5e6359f418e79c94edfc4179168e7
    • candidate for parablock 329,958 was included
    • go ahead signal is removed
  • 1,338,954 0x8cb07c91f08581fe9269dbffd13676d46eab1c704b9c89c0b14a1832da005c49
    • in the post-state of this block we still have futureCodeHash, meaning the next block enacted the upgrade.
    • go ahead signal is still present
    • in this block 329,958 was backed.
  • 1,338,953 0x4fd452fc03f3f637057aa7bc4014cefbae2cfc7d44f03380f130474d8d41bb75
    • go ahead signal is now set.
    • this relay-chain block was used as a parent for the parachain block that did not enact the upgrade.
  • 1,338,952 0xf208cbe8dd1016350ee23b4bc4e994bd69784241b7223de9add5b94d33d44bff
  • ...
  • 1,338,923 0xfb0676d9133b6677b1cdb54dd157559ef408a8b114d4692a5270cfbc4bbbedf9
    • this is when the code upgrade from parablock 329,943 was scheduled
  • 1,338,922 0x1bf205e024c4152aa37c16b21ab1715c8e9db07cdd65dfc7b1a49c5b56486828
    • this is when the upgrading parablock (329,943) was backed with the new code.
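
The reasoning in this timeline can be sketched as a tiny Python check: a parablock is expected to enact the upgrade when its relay parent still carries the go-ahead signal. The block numbers are taken from the list above; the type and function names are hypothetical and this is not how the node implements it.

```python
from typing import NamedTuple


class Parablock(NamedTuple):
    number: int
    relay_parent: int


# Relay-chain block numbers from the reconstruction above.
GO_AHEAD_SET_AT = 1_338_953      # go-ahead signal first present
SIGNAL_REMOVED_AT = 1_338_955    # signal gone once parablock 329,958 was included


def sees_go_ahead(block: Parablock) -> bool:
    """True if the block's relay parent state contains the go-ahead signal."""
    return GO_AHEAD_SET_AT <= block.relay_parent < SIGNAL_REMOVED_AT


SCHEDULING_BLOCK = Parablock(number=329_943, relay_parent=1_338_923)
LAST_BLOCK = Parablock(number=329_958, relay_parent=1_338_953)

if __name__ == "__main__":
    for block in (SCHEDULING_BLOCK, LAST_BLOCK):
        print(block.number, "should enact upgrade:", sees_go_ahead(block))
```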

@rphmeier
Contributor

rphmeier commented Nov 8, 2021

@weichweich

image

Your parachain is registered on the relay chain with ID 2001 but believes itself to be para 2000. This breaks the upgrade go-ahead signal code because it reads the signal for para 2000 instead of the signal for para 2001. The relay chain believes that the parachain has read the signal and will upgrade, whereas the parachain does not act on it and then bricks.
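
A toy Python model of the failure mode described above, under the assumption that the signal lookup can be pictured as a map keyed by para ID. It is not the actual paras pallet or parachain-system code, and all names are hypothetical.

```python
from typing import Dict, Optional

GO_AHEAD = "GoAhead"


def read_go_ahead(relay_signals: Dict[int, str], believed_id: int) -> Optional[str]:
    """The parachain looks up the signal under the ID it believes it has."""
    return relay_signals.get(believed_id)


def simulate_upgrade(registered_id: int, believed_id: int) -> dict:
    """Run one upgrade handshake and report the code each side ends up with."""
    # The relay chain sets the signal under the ID the para is registered with.
    relay_signals = {registered_id: GO_AHEAD}
    relay_code, para_code = "old", "old"

    # The parachain enacts the upgrade only if it sees the signal.
    if read_go_ahead(relay_signals, believed_id) == GO_AHEAD:
        para_code = "new"

    # The relay chain assumes the signal was read: it removes it and
    # switches to the new code once the parablock is included.
    relay_signals.pop(registered_id)
    relay_code = "new"

    return {
        "relay_code": relay_code,
        "para_code": para_code,
        "in_sync": relay_code == para_code,
    }


if __name__ == "__main__":
    print(simulate_upgrade(registered_id=2001, believed_id=2000))
    print(simulate_upgrade(registered_id=2001, believed_id=2001))
```

With mismatched IDs the relay chain moves to the new code while the parachain keeps the old one, which is the stall seen in this issue.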

@GopherJ
Contributor Author

GopherJ commented Nov 9, 2021

@rphmeier OK, I just found that our submitted genesis uses the wrong parachain ID. I didn't catch it during our tests. I'm very, very sad now :(

image

This is due to a mistake on my side. I'm wondering what we can do to fix this... it's already live in the Polkadot auction.

So there are two para IDs:

  • para id in genesis (we are wrong on this)
  • para id in extensions (we are correct on this)

will paras -> forceSetCurrentHead(para, newHead) work for us?
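
The two para IDs listed above could be cross-checked before submitting a genesis with a small script like the following Python sketch. The JSON field names (`para_id` as the extension and `genesis.runtime.parachainInfo.parachainId` in the genesis) are assumptions about how a chain spec is laid out and will differ between runtimes; the IDs in the sample are placeholders.

```python
from typing import Any, Optional, Sequence


def dig(doc: Any, path: Sequence[str]) -> Optional[Any]:
    """Follow a path of keys through nested dicts; None if any key is missing."""
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def para_ids_agree(
    spec: dict,
    genesis_path: Sequence[str],
    extension_path: Sequence[str] = ("para_id",),
) -> bool:
    """Compare the para ID in the genesis config with the one in the extensions."""
    genesis_id = dig(spec, genesis_path)
    extension_id = dig(spec, extension_path)
    if genesis_id is None or extension_id is None:
        raise KeyError("para ID not found at the given path(s)")
    return genesis_id == extension_id


# Assumed layout, for illustration only.
GENESIS_PATH = ("genesis", "runtime", "parachainInfo", "parachainId")
SAMPLE_SPEC = {
    "para_id": 2001,
    "genesis": {"runtime": {"parachainInfo": {"parachainId": 2000}}},
}

if __name__ == "__main__":
    print("para IDs agree:", para_ids_agree(SAMPLE_SPEC, GENESIS_PATH))
```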

@GopherJ
Contributor Author

GopherJ commented Nov 9, 2021

It seems that the build-spec subcommand does not accept a --parachain-id argument, which is what ultimately caused this. It always uses the parachain ID in the code (the old one):

./target/debug/parallel build-spec --parachain-id 2012 --chain parallel

error: Found argument '--parachain-id' which wasn't expected, or isn't valid in this context

USAGE:
    parallel build-spec [FLAGS] [OPTIONS]

For more information try --help

@rphmeier
Contributor

rphmeier commented Nov 9, 2021

@GopherJ Governance should be able to update the storage in the ParachainsInfo pallet to update it to the correct value.

@rphmeier
Contributor

rphmeier commented Nov 9, 2021

@bkchr do we need a separate issue for the build-spec cli flag in Cumulus?

@rphmeier rphmeier closed this as completed Nov 9, 2021
@GopherJ
Contributor Author

GopherJ commented Nov 9, 2021

> @GopherJ Governance should be able to update the storage in the ParachainsInfo pallet to update it to the correct value.

Which governance call? On the parachain side? I didn't see an extrinsic for changing this.

@rphmeier
Contributor

rphmeier commented Nov 9, 2021

You can use system::setStorage directly (you can do almost anything with this call if you know what you're doing)
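
A set-storage call takes raw key/value pairs, so the new para ID has to be supplied already encoded. The Python sketch below shows only the value half, assuming the para ID is stored as a plain u32, which SCALE encodes as four little-endian bytes. Computing the storage key is deliberately left out because it needs the pallet's hashed prefix, and the function name is hypothetical.

```python
import struct


def scale_encode_u32(value: int) -> str:
    """SCALE-encode a u32 as 0x-prefixed hex (fixed-width, little-endian)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{value} does not fit in a u32")
    return "0x" + struct.pack("<I", value).hex()


if __name__ == "__main__":
    # Example: the value bytes for para ID 2001.
    print(scale_encode_u32(2001))
```

Double-check both key and value on a test network first; a wrong raw write cannot be validated by the runtime.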

@GopherJ
Contributor Author

GopherJ commented Nov 9, 2021

> You can use system::setStorage directly (you can do almost anything with this call if you know what you're doing)

Will try it, thanks.

@weichweich
Contributor

Yes! Thanks for all the help! We reset the parachain code on the relaychain which let the parachain produce blocks again, fixed the parachain ID, and force-scheduled the runtime upgrade from the relay chain.
This time the upgrade ran through and worked like a charm! Thanks again for all the help!

@GopherJ
Contributor Author

GopherJ commented Nov 10, 2021

@rphmeier Hi, I still have some issues. I tried to do a runtime upgrade on a runtime with the correct parachain ID (2085), but then found that currentCodeHash has been updated on the relay chain while :code on the parachain stays the same (the old one).

What could this be? Do you have any idea?

@pepyakin
Contributor

Maybe those:

?

@GopherJ
Contributor Author

GopherJ commented Nov 11, 2021

@pepyakin yes just found them yesterday

@stakeworld
Contributor

@rphmeier, I got this same message today in an active paravalidating session on kusama, is it relevant?

polkadot[115589]: 2022-07-07 02:26:52 panicked at 'Externalities not allowed to fail within runtime: DefaultError', /root/.cargo/git/checkouts/substrate-56aa5b68bc705a80/0a29072/primitives/state-machine/src/ext.rs:189:58

There were two bursts in one parachain session last night. Everything seems to work normally afterwards and the session was a grade A, so I'm not too worried.

@pepyakin
Contributor

pepyakin commented Jul 7, 2022

Interestingly, this does not seem to be an error coming from the polkadot node. DefaultError should be a string, but when it is compiled to no_std it is a unit-struct that is displayed as DefaultError which we see here.

From here it is not clear what emits this log message. Could it be emitted by a parachain?

@stakeworld
Contributor

I'm running the default apt binary polkadot 0.9.25-5174e9ae75b on a default Ubuntu 20.04 5.4.0-113-generic and got two bursts of errors inside a paravalidating session. I'm not sure how I can see its origin; I will report back if it happens more often.

Jul 07 02:26:43 node01.stakeworld.nl polkadot[115590]: 2022-07-07 02:26:43 panicked at 'Externalities not allowed to fail within runtime: DefaultError', /root/.cargo/git/checkouts/substrate-56aa5b68bc705a80/0a29072/primitives/state-machine/src/ext.rs:189:58
Jul 07 02:26:43 node01.stakeworld.nl polkadot[115590]: 2022-07-07 02:26:43 panicked at 'Externalities not allowed to fail within runtime: DefaultError', /root/.cargo/git/checkouts/substrate-56aa5b68bc705a80/0a29072/primitives/state-machine/src/ext.rs:189:58
Jul 07 02:26:51 node01.stakeworld.nl polkadot[115590]: 2022-07-07 02:26:51 panicked at 'Externalities not allowed to fail within runtime: DefaultError', /root/.cargo/git/checkouts/substrate-56aa5b68bc705a80/0a29072/primitives/state-machine/src/ext.rs:189:58

@bkchr
Member

bkchr commented Jul 16, 2022

> Could it be emitted by a parachain?

Yes it is coming from a Parachain. For the relay chain, we don't use this code from inside wasm.

@stakeworld you can ignore this.


7 participants
@pepyakin @bkchr @rphmeier @weichweich @GopherJ @stakeworld and others