[FRAME] Static analysis for HDH/ME code #165

ggwpez · 2023-06-12T11:18:18Z

@gavofyork brought up the idea of doing static code analysis to ensure the runtime's integrity in the presence of MBMs.
(Trying my best to recapitulate this)

All code that needs to execute by a specific deadline (aka. HDH/ME) must never read or write storage that is written by
a pallet that can do MBMs. Or rather: by MBMs themselves. The fact that MBMs normally operate on pallets is, I think, irrelevant.
Currently this is not checked and could lead to cases where on_initialize calls into a pallet or reads storage that is under the effect of a MBM and thereby results in undefined behaviour.

Entrypoints for this analysis would be on_initialize, on_finalize and inherents. The analysis would probably need to run on the runtime definition itself to be able to consider all callbacks and hooks that get configured in the pallets.
It needs to check reachability to the storage API from all HDH/ME hooks and analyze their arguments to be possibly overlapping with MBM code. The output should present all traces to possible UB and error if anything is found.

For the implementation: no idea - just experiment.
Maybe in a build script, or as external tool that we can then call in a build script. Could also be some macro magic on the construct_runtime itself 🤷

Taxonomy

HDH (Hard Deadline Hook): Something that must execute by a specific deadline (ie. every block or within the next 10 blocks).
ME (Mandatory Extrinsic): Extrinsic that must execute.
MBM (Multi Block Migration): A storage migration that spans over multiple blocks. The storage that is being migrated must be shielded from external access and modification.
UB (Undefined behaviour): Anything can happen.

The text was updated successfully, but these errors were encountered:

xlc · 2023-06-12T21:55:07Z

the more I read about all the considerations required for MBM, the less convinced for me that’s a good idea. can’t we just provide better lazy migration helpers? we already use it in pallet balances and it is much less likely to introduce unexpected side effects

bkchr · 2023-06-13T07:49:40Z

The "problem" with lazy migrations is that you need to keep them around "forever". You will also double the number of reads.

IMO if there is a multi block migration running we should disable as much as possible of all functionality. I think the best would be some mode where only important things like System::on_initialize() and some required consensus stuff is running. These things should have a very small "state surface" and if we need to migrate there something it needs to happen at the first block with the new runtime.

xlc · 2023-06-13T09:05:12Z

We can easily perform eager migration during on_idle hook, or just a permisionless call to touch & migrate something and in that way we can ensure the migration will be completed at some point. Again, this is implemented in pallet-balances. Double read isn't a real issue.

Another issue is about block explorer / indexer. I am sure they will have some quality time when figuring out how to deal with data format change in migration blocks.

bkchr · 2023-06-13T10:13:22Z

Another issue is about block explorer / indexer. I am sure they will have some quality time when figuring out how to deal with data format change in migration blocks.

The same argument also applies to lazy migrations. Having a dedicated window for migrations is probably better for indexers.

xlc · 2023-06-13T10:40:50Z

Not really. The state for lazy migration are always consistent (if done right).

For example, lets say we have a map (key: u32) => (value: u32) and we need a simple migration to increase value by 1. Using the MBM approach, it will modify the value in place. It is very very hard for indexer/dapp/client to determine if a value for a given key is upgraded or not and if it should use the old logic to handle it or the new logic.

With lazy migration, there are many ways to do it, and one way could be just add a new map and moving values from old map to new map. indexer/dapp/client will need some extra logic to access both map to get the value but the implementation is simple.

You will notice the lazy migration method can also be used with MBM. They can be compatible. It is just that the lazy migration approach ensures proper handling of the values during migration. Both the old and new data must be consistent by design. External modification / user transactions must be supported and not resulting corrupted data. Compare to the single pass migration approach (or MBM), it is kind of expected that the migration code will break consistency during migration and there cannot be external modification or everything will break apart. IF MBM code can ensure the data consistence during migration period, then we won't have any this kind of issue and no need to pause chain.

Having a dedicated window for migrations is probably better for indexers.

Well, they need to handle the migrating state during this window, having it longer or shorter doesn't reduce the amount of code required to handle such state.

ggwpez · 2023-06-13T10:58:34Z

I think some migrations are much easier to write as MBM than as lazy. so far the lazy migrations were all on the "easier" IIRC.
But it does incur a lot of complexity to ensure storage consistency. Once upstream pallets starts using MBMs they will be more or less mandatory for parachains.

IF MBM code can ensure the data consistence during migration period, then we won't have any this kind of issue and no need to pause chain.

Hm did not consider this… but yes. In theory we would only need to pause transactions that could call into MBM state. Without static analysis this would be very difficult to ensure though. So pausing everything is an intermediate step.

Is your main concern about pausing the chain or that some storage invariant gets violated?

xlc · 2023-06-13T22:03:36Z

Pausing the chain for like 5 mins is fine to me but any chance of storage invariant gets violated is big no.

Yeah it takes more work to ensure the storages are always consistent but so what? If it is doable, not a crazy amount of work, and makes code safer, why not? I am sure doing this static analysis will take a crazy amount of work and may still not get you 100% confidence.

Take another example, say if we want to burn 5% of the token for every account for whatever silly reason, lazy migration will reduce the balance for a single account, and update total issuance. Not efficient but safe. Single pass migration will update the balance for all the accounts, and update the total issuance. We can totally make the MBM version to update balances for X accounts, and update the total issuance at end of this step. So that the data is always consistent to observers. Instead of, update the total issuance at very beginning or very end that will lead to data inconsistency during migrations.

So I guess what I am really is asking for is to make MBM be consistent at the end of each individual step.

kianenigma · 2023-06-14T07:54:02Z

For the matter of lazy migrations vs MBM, I don't see them clashing with each other as much as being discussed here.

Lazy migrations are and will remain a viable approach for circumstances where the code overhead of ensuring consistency is acceptable, and MBMs are more akin to a scenario where one is willing to "almost pause" their chain and get it done with faster.

If I were an indexer/explorer, I would listen for the existence of MBMs and also discourage user interactions while MBMs are ongoing.

xlc · 2023-06-14T09:32:43Z

The main issue I have is that the decision from FRAME developer will force into parachain developers and therefore FRAME pallets should default to the safer approach unless there are some good reasons.

ggwpez added the S0-design label Jun 12, 2023

ggwpez changed the title ~~[FRAME] Static code analysis for HDH/ME~~ [FRAME] Static analysis for HDH/ME code Jun 12, 2023

kianenigma mentioned this issue Jun 15, 2023

Guest Lectures on Static Analysis for Rust: tracking issue Polkadot-Blockchain-Academy/pba-content#609

Closed

ggwpez added the U4-some_day_maybe label Jun 16, 2023

ggwpez mentioned this issue Aug 24, 2023

Buffer Downward Messages for Parachains during MBMs #480

Open

juangirini added the T1-runtime label Aug 24, 2023

juangirini transferred this issue from paritytech/substrate Aug 24, 2023

the-right-joyce added T1-FRAME This PR/Issue is related to core FRAME, the framework. and removed S0-design labels Aug 25, 2023

kianenigma mentioned this issue Sep 26, 2023

Basic example pallet refactor #1707

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FRAME] Static analysis for HDH/ME code #165

[FRAME] Static analysis for HDH/ME code #165

ggwpez commented Jun 12, 2023

xlc commented Jun 12, 2023

bkchr commented Jun 13, 2023

xlc commented Jun 13, 2023 •

edited

bkchr commented Jun 13, 2023

xlc commented Jun 13, 2023 •

edited

ggwpez commented Jun 13, 2023 •

edited

xlc commented Jun 13, 2023 •

edited

kianenigma commented Jun 14, 2023

xlc commented Jun 14, 2023

[FRAME] Static analysis for HDH/ME code #165

[FRAME] Static analysis for HDH/ME code #165

Comments

ggwpez commented Jun 12, 2023

Taxonomy

xlc commented Jun 12, 2023

bkchr commented Jun 13, 2023

xlc commented Jun 13, 2023 • edited

bkchr commented Jun 13, 2023

xlc commented Jun 13, 2023 • edited

ggwpez commented Jun 13, 2023 • edited

xlc commented Jun 13, 2023 • edited

kianenigma commented Jun 14, 2023

xlc commented Jun 14, 2023

xlc commented Jun 13, 2023 •

edited

xlc commented Jun 13, 2023 •

edited

ggwpez commented Jun 13, 2023 •

edited

xlc commented Jun 13, 2023 •

edited