Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elastic scaling: runtime dependency tracking and enactment #3479

Merged
merged 53 commits into from Mar 21, 2024

Conversation

alindima
Copy link
Contributor

@alindima alindima commented Feb 26, 2024

Changes needed to implement the runtime part of elastic scaling: #3131, #3132, #3202

Also fixes #3675

TODOs:

Relies on the changes made in #3233 in terms of the inclusion policy and the candidate ordering

@alindima alindima marked this pull request as draft February 26, 2024 10:44
@alindima alindima added T8-polkadot This PR/Issue is related to/affects the Polkadot network. I5-enhancement An additional feature request. labels Feb 26, 2024
Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good at first pass, but readability can certainly be improved.

I think we should also see if we can adjust apply_weight_limit candidate selection to account for elastic scaling.

polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
@sandreim
Copy link
Contributor

sandreim commented Mar 5, 2024

candidate_pending_availability runtime API is needed by collators. At first glance we might need to return all of them. or maybe just the last one.

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good, left a couple of questions. Happy to approve once this is tested/burned-in.

Comment on lines -645 to -650
// In `Enter` context (invoked during execution) there should be no backing votes from
// disabled validators because they should have been filtered out during inherent data
// preparation (`ProvideInherent` context). Abort in such cases.
if context == ProcessInherentDataContext::Enter {
ensure!(!votes_from_disabled_were_dropped, Error::<T>::BackedByDisabled);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this error removed? is it because it was merged into a generic CandidatesFilteredDuringExecution error? i liked the specificity of the previous errors more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the original intention was to trade the specific errors for simplicity by using a catch all approach. I will look and see if we can keep it simple and have these errors specific or maybe logging these errors instead of returning them might achieve same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right. I generally added debug logs to the filtering functions called in sanitize_backed_candidates whenever a candidate is filtered and the reason why it was dropped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, checking that filtering filtered nothing at the outer most level is the most robust way to check.

polkadot/runtime/parachains/src/inclusion/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/paras_inherent/mod.rs Outdated Show resolved Hide resolved
polkadot/runtime/parachains/src/inclusion/mod.rs Outdated Show resolved Hide resolved
@alindima
Copy link
Contributor Author

bot bench polkadot-pallet --runtime=rococo --pallet=runtime_parachains::paras_inherent -v UPSTREAM_MERGE=n

@command-bot
Copy link

command-bot bot commented Mar 18, 2024

@alindima https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5559128 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=rococo --target_dir=polkadot --pallet=runtime_parachains::paras_inherent. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 3-24f8b8dd-8273-4919-8221-ad79448fa599 to cancel this command or bot cancel to cancel all commands in this pull request.

@alindima
Copy link
Contributor Author

While testing the runtime upgrade locally with zombienet, I found out that after the runtime upgrade, due to #64, availability_cores runtime API panics (the new runtime API is called with unmigrated storage).

availability_cores being such a critical API in all many parts of consensus, it seems that candidate backing and inclusion is stalled until the next session (when it recovers). I'm not sure which subsystem causes this, but I'm thinking of modifying the runtime API such that it will know how to work with both versions of the pallet storage format (v0 and v1).

Other than this, the PR should be ready

…=rococo --target_dir=polkadot --pallet=runtime_parachains::paras_inherent
@command-bot
Copy link

command-bot bot commented Mar 18, 2024

@alindima Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=rococo --target_dir=polkadot --pallet=runtime_parachains::paras_inherent has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5559128 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5559128/artifacts/download.

@alindima
Copy link
Contributor Author

Fixed the runtime API panic caused by #64 and reran benchmarks for westend and rococo

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rococo weights seem off. Westend look good.

@alindima
Copy link
Contributor Author

yeah, the weights for rococo are way off when comparing to the previous values. I'm betting that's because they were last updated in 2021. The difference for westend shows that there isn't a significant change

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @alindima ! I couldn't help it and still had a few nits, but it is good to go!

let freed = freed_concluded
.into_iter()
.map(|(c, _hash)| (c, FreedReason::Concluded))
.chain(freed_disputed.into_iter().map(|core| (core, FreedReason::Concluded)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced here, but a third enum variant Disputed would have done no harm 😶‍🌫️ (and also no need to fix it here)

// Cores 1, 2 and 3 are being made available in this block. Propose 6 more candidates (one
// for each core) and check that the right ones are successfully backed and the old ones
// enacted.
let config = default_config();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we are not even sharing initialization, why is this not a separate test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid long and dubious test names :D that's arguably a bad reason but I didn't think too much about it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair.

polkadot/runtime/parachains/src/paras_inherent/tests.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @alindima

@eskimor eskimor enabled auto-merge March 21, 2024 09:47
@eskimor eskimor added this pull request to the merge queue Mar 21, 2024
Merged via the queue into master with commit 4842faf Mar 21, 2024
127 of 132 checks passed
@eskimor eskimor deleted the alindima/elastic-scaling-runtime branch March 21, 2024 10:44
dharjeezy pushed a commit to dharjeezy/polkadot-sdk that referenced this pull request Mar 24, 2024
…h#3479)

Changes needed to implement the runtime part of elastic scaling:
paritytech#3131,
paritytech#3132,
paritytech#3202

Also fixes paritytech#3675

TODOs:

- [x] storage migration
- [x] optimise process_candidates from O(N^2)
- [x] drop backable candidates which form cycles
- [x] fix unit tests
- [x] add more unit tests
- [x] check the runtime APIs which use the pending availability storage.
We need to expose all of them, see
paritytech#3576
- [x] optimise the candidate selection. we're currently picking randomly
until we satisfy the weight limit. we need to be smart about not
breaking candidate chains while being fair to all paras -
paritytech#3573

Relies on the changes made in
paritytech#3233 in terms of the
inclusion policy and the candidate ordering

---------

Signed-off-by: alindima <alin@parity.io>
Co-authored-by: command-bot <>
Co-authored-by: eskimor <eskimor@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I5-enhancement An additional feature request. T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
Status: Scheduled
4 participants