Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FRAME] Static analysis for HDH/ME code #165

Open
ggwpez opened this issue Jun 12, 2023 · 9 comments
Open

[FRAME] Static analysis for HDH/ME code #165

ggwpez opened this issue Jun 12, 2023 · 9 comments
Labels
T1-FRAME This PR/Issue is related to core FRAME, the framework.

Comments

@ggwpez
Copy link
Member

ggwpez commented Jun 12, 2023

@gavofyork brought up the idea of doing static code analysis to ensure the runtime's integrity in the presence of MBMs.
(Trying my best to recapitulate this)

All code that needs to execute by a specific deadline (aka. HDH/ME) must never read or write storage that is written by
a pallet that can do MBMs. Or rather: by MBMs themselves. The fact that MBMs normally operate on pallets is, I think, irrelevant.
Currently this is not checked and could lead to cases where on_initialize calls into a pallet or reads storage that is under the effect of a MBM and thereby results in undefined behaviour.

Entrypoints for this analysis would be on_initialize, on_finalize and inherents. The analysis would probably need to run on the runtime definition itself to be able to consider all callbacks and hooks that get configured in the pallets.
It needs to check reachability to the storage API from all HDH/ME hooks and analyze their arguments to be possibly overlapping with MBM code. The output should present all traces to possible UB and error if anything is found.

For the implementation: no idea - just experiment.
Maybe in a build script, or as external tool that we can then call in a build script. Could also be some macro magic on the construct_runtime itself 🤷

Taxonomy

  • HDH (Hard Deadline Hook): Something that must execute by a specific deadline (ie. every block or within the next 10 blocks).
  • ME (Mandatory Extrinsic): Extrinsic that must execute.
  • MBM (Multi Block Migration): A storage migration that spans over multiple blocks. The storage that is being migrated must be shielded from external access and modification.
  • UB (Undefined behaviour): Anything can happen.
@ggwpez ggwpez changed the title [FRAME] Static code analysis for HDH/ME [FRAME] Static analysis for HDH/ME code Jun 12, 2023
@xlc
Copy link
Contributor

xlc commented Jun 12, 2023

the more I read about all the considerations required for MBM, the less convinced for me that’s a good idea. can’t we just provide better lazy migration helpers? we already use it in pallet balances and it is much less likely to introduce unexpected side effects

@bkchr
Copy link
Member

bkchr commented Jun 13, 2023

The "problem" with lazy migrations is that you need to keep them around "forever". You will also double the number of reads.

IMO if there is a multi block migration running we should disable as much as possible of all functionality. I think the best would be some mode where only important things like System::on_initialize() and some required consensus stuff is running. These things should have a very small "state surface" and if we need to migrate there something it needs to happen at the first block with the new runtime.

@xlc
Copy link
Contributor

xlc commented Jun 13, 2023

We can easily perform eager migration during on_idle hook, or just a permisionless call to touch & migrate something and in that way we can ensure the migration will be completed at some point. Again, this is implemented in pallet-balances. Double read isn't a real issue.

Another issue is about block explorer / indexer. I am sure they will have some quality time when figuring out how to deal with data format change in migration blocks.

@bkchr
Copy link
Member

bkchr commented Jun 13, 2023

Another issue is about block explorer / indexer. I am sure they will have some quality time when figuring out how to deal with data format change in migration blocks.

The same argument also applies to lazy migrations. Having a dedicated window for migrations is probably better for indexers.

@xlc
Copy link
Contributor

xlc commented Jun 13, 2023

Not really. The state for lazy migration are always consistent (if done right).

For example, lets say we have a map (key: u32) => (value: u32) and we need a simple migration to increase value by 1. Using the MBM approach, it will modify the value in place. It is very very hard for indexer/dapp/client to determine if a value for a given key is upgraded or not and if it should use the old logic to handle it or the new logic.

With lazy migration, there are many ways to do it, and one way could be just add a new map and moving values from old map to new map. indexer/dapp/client will need some extra logic to access both map to get the value but the implementation is simple.

You will notice the lazy migration method can also be used with MBM. They can be compatible. It is just that the lazy migration approach ensures proper handling of the values during migration. Both the old and new data must be consistent by design. External modification / user transactions must be supported and not resulting corrupted data. Compare to the single pass migration approach (or MBM), it is kind of expected that the migration code will break consistency during migration and there cannot be external modification or everything will break apart. IF MBM code can ensure the data consistence during migration period, then we won't have any this kind of issue and no need to pause chain.

Having a dedicated window for migrations is probably better for indexers.

Well, they need to handle the migrating state during this window, having it longer or shorter doesn't reduce the amount of code required to handle such state.

@ggwpez
Copy link
Member Author

ggwpez commented Jun 13, 2023

I think some migrations are much easier to write as MBM than as lazy. so far the lazy migrations were all on the "easier" IIRC.
But it does incur a lot of complexity to ensure storage consistency. Once upstream pallets starts using MBMs they will be more or less mandatory for parachains.

IF MBM code can ensure the data consistence during migration period, then we won't have any this kind of issue and no need to pause chain.

Hm did not consider this… but yes. In theory we would only need to pause transactions that could call into MBM state. Without static analysis this would be very difficult to ensure though. So pausing everything is an intermediate step.

Is your main concern about pausing the chain or that some storage invariant gets violated?

@xlc
Copy link
Contributor

xlc commented Jun 13, 2023

Pausing the chain for like 5 mins is fine to me but any chance of storage invariant gets violated is big no.

Yeah it takes more work to ensure the storages are always consistent but so what? If it is doable, not a crazy amount of work, and makes code safer, why not? I am sure doing this static analysis will take a crazy amount of work and may still not get you 100% confidence.

Take another example, say if we want to burn 5% of the token for every account for whatever silly reason, lazy migration will reduce the balance for a single account, and update total issuance. Not efficient but safe. Single pass migration will update the balance for all the accounts, and update the total issuance. We can totally make the MBM version to update balances for X accounts, and update the total issuance at end of this step. So that the data is always consistent to observers. Instead of, update the total issuance at very beginning or very end that will lead to data inconsistency during migrations.

So I guess what I am really is asking for is to make MBM be consistent at the end of each individual step.

@kianenigma
Copy link
Contributor

For the matter of lazy migrations vs MBM, I don't see them clashing with each other as much as being discussed here.

Lazy migrations are and will remain a viable approach for circumstances where the code overhead of ensuring consistency is acceptable, and MBMs are more akin to a scenario where one is willing to "almost pause" their chain and get it done with faster.

If I were an indexer/explorer, I would listen for the existence of MBMs and also discourage user interactions while MBMs are ongoing.

@xlc
Copy link
Contributor

xlc commented Jun 14, 2023

The main issue I have is that the decision from FRAME developer will force into parachain developers and therefore FRAME pallets should default to the safer approach unless there are some good reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T1-FRAME This PR/Issue is related to core FRAME, the framework.
Projects
Status: Shaping up
Development

No branches or pull requests

6 participants