Adds `stake-tracker` pallet and integrates with the staking pallet #1933

gpestana · 2023-10-18T18:29:34Z

This PR adds and integrates the stake-tracker pallet in the staking system.

Goals:

To keep a TargetList list of validators strictly and always sorted by their approval votes. Approvals consist of a validator's self-vote plus the sum of all the corresponding nominations across the system.
The TargetList sorting must always be kept up to date, even in the event of new nomination updates, nominator/validator slashes and rewards. The stake-tracker pallet must ensure that the scores of the targets are always up to date and that the targets are sorted by score at all times.
To keep a VoterList list of voters that may be either 1) strictly and always sorted by their score (i.e. bonded stake of an individual voter) or 2) loosely sorted list. Choosing between mode 1) and 2) can be done through stake-tracker configurations.

TL;DR 2nd order changes

An idle or unbonded target account may have a node in T::TargetList, insofar as there are still nominators nominating it;
New nominations must contain either a validator or idle staker. Otherwise, calling Staking::nominate will fail;
Staking::nominate will remove all the duplicate nominations implicitly, if any.
- Note: the migration will remove all the duplicate nominations for all the nominators in the system and all the dangling nominations.
If a nominator's bond drops to 0 after a slash, the nominator will be chilled.

Why?

Currently, we select up to N registered validators to be part of the snapshot for the next election. When the number of registered validators in the system exceeds that number, we need an efficient way to select the validators with the highest approvals stake to construct the snapshot.

Thus, we need to keep a list of validators sorted by their approvals stake at all times. This means that any update to nominations and their stake (due to slashing, bonding extra, rewards, etc.) needs to be reflected in the targets nominated by that stake. Enter pallet-stake-tracker: this pallet keeps track of relevant staking events (by implementing the OnStakingUpdate trait) and updates the target bags list with each target's approvals stake accordingly.

In order to achieve this, the target list must keep track of all target stashes that have at least one nomination in the system (i.e. their approval_stake > 0), regardless of their state. This means that it needs to keep track of the stake of active validators, idle validators, and even targets that are no longer validators but still have nominations lingering.
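To make the notion of approvals concrete, here is a minimal sketch of how a target's score could be derived and how the top N targets could be picked. The `AccountId`/`Balance` aliases and the `approvals`/`top_targets` helpers are assumptions made for illustration; they are not the pallet's storage or API, and the real list is a semi-sorted bags list rather than a fully sorted vector.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified account id and balance types.
type AccountId = u32;
type Balance = u128;

/// Computes each target's approvals: its self-stake plus the stake of every
/// nominator backing it. A target with nominations but no self-stake (e.g. an
/// unbonded ex-validator) still gets an entry.
fn approvals(
    self_stake: &BTreeMap<AccountId, Balance>,
    nominations: &[(Balance, Vec<AccountId>)],
) -> BTreeMap<AccountId, Balance> {
    let mut scores = self_stake.clone();
    for (stake, targets) in nominations {
        for target in targets {
            let score = scores.entry(*target).or_insert(0);
            *score = score.saturating_add(*stake);
        }
    }
    scores
}

/// Returns the `n` targets with the highest approvals, highest first.
fn top_targets(scores: &BTreeMap<AccountId, Balance>, n: usize) -> Vec<AccountId> {
    let mut sorted: Vec<(AccountId, Balance)> =
        scores.iter().map(|(who, score)| (*who, *score)).collect();
    // Sort by score descending; ties are broken by account id for determinism.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    sorted.into_iter().take(n).map(|(who, _)| who).collect()
}
```

Recomputing everything like this on each election is exactly the cost the stake-tracker avoids by updating scores incrementally at every staking event.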

How?

The stake-tracker pallet implements the OnStakingUpdate trait to listen to staking events and multiplexes those events to one or multiple types (e.g. pallets). The stake-tracker pallet is used as a degree of indirection to keep the target and voter semi-sorted lists (implemented by the bags list pallet) up to date.

The main goal is that all the updates to the targets and voters lists are performed at each relevant staking event through the stake-tracker pallet. However, the voter and target list reads are performed directly through the SortedListProvider set in the staking's config.
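The multiplexing idea can be sketched outside of FRAME as follows. This is an illustration only: `StakingListener`, `TargetTracker` and `EventCounter` are hypothetical names, the trait is reduced to two events, and the tuple implementation is one possible way to fan events out, not necessarily how the pallet does it.

```rust
/// Simplified account id, assumed for this sketch.
type AccountId = u32;

/// Stand-in for a staking event listener, reduced to two events.
trait StakingListener {
    fn on_validator_add(&mut self, _who: AccountId) {}
    fn on_validator_idle(&mut self, _who: AccountId) {}
}

/// Multiplexing: a pair of listeners is itself a listener that forwards
/// every event to both members, in order.
impl<A: StakingListener, B: StakingListener> StakingListener for (A, B) {
    fn on_validator_add(&mut self, who: AccountId) {
        self.0.on_validator_add(who);
        self.1.on_validator_add(who);
    }
    fn on_validator_idle(&mut self, who: AccountId) {
        self.0.on_validator_idle(who);
        self.1.on_validator_idle(who);
    }
}

/// Hypothetical listener that records which targets were added or went idle.
#[derive(Default)]
struct TargetTracker {
    added: Vec<AccountId>,
    idle: Vec<AccountId>,
}

impl StakingListener for TargetTracker {
    fn on_validator_add(&mut self, who: AccountId) {
        self.added.push(who);
    }
    fn on_validator_idle(&mut self, who: AccountId) {
        self.idle.push(who);
    }
}

/// Hypothetical listener that only counts events.
#[derive(Default)]
struct EventCounter {
    events: u32,
}

impl StakingListener for EventCounter {
    fn on_validator_add(&mut self, _who: AccountId) {
        self.events += 1;
    }
    fn on_validator_idle(&mut self, _who: AccountId) {
        self.events += 1;
    }
}
```

With this shape, the emitter calls a single listener value and every member of the pair sees the same events, while reads can still go straight to whichever list provider is configured.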

Changes to assumptions in chilled and removed stakers

This PR changes some assumptions behind chilled stakers: the chilled/idle validators will be kept in the Target lists, and only removed from the target list when:

Its ledger is unbonded and
Its approval voting score is zero (i.e. no other stakers are nominating it).

This allows the stake-tracker to keep track of a chilled validator and its score after the validator is chilled and completely unbonds. This way, when a validator sets the intention to re-validate, the target's score is brought up with the correct sum of approvals in the system (i.e. self stake + all current nominations, which were set before the re-validation).
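The removal rule above boils down to a conjunction of two conditions. A minimal sketch, assuming a hypothetical `TargetState` view of a target (the struct and its fields are invented for illustration and are not the pallet's types):

```rust
/// Simplified balance type, assumed for this sketch.
type Balance = u128;

/// Hypothetical, simplified view of a target's state.
struct TargetState {
    /// Whether the stash still has a bonded ledger.
    bonded: bool,
    /// Self-stake plus all nominations currently pointing at the target.
    approvals: Balance,
}

/// A target node is dropped only when its ledger is unbonded *and* its
/// approvals are zero. A chilled validator with lingering nominations stays.
fn should_remove_from_target_list(state: &TargetState) -> bool {
    !state.bonded && state.approvals == 0
}
```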

Changes to `Call::nominate`

New nominations can only be performed on active or chilling validators. "Moot" nominations still exist (i.e. nominations that point at an inactive/nonexistent validator), but only when a validator stops validating or is chilled (in which case it may remain in the target list if the approvals are higher than 0).

In addition, the runtime ensures that a nominator does not nominate the same target more than once at a time. This is achieved by deduplicating the nominations in the extrinsic Staking::nominate.
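A rough sketch of the two checks described in this section: implicit deduplication, followed by rejecting targets that are neither active nor idle validators. The function names and the error strings are assumptions for illustration; the actual extrinsic works on bounded vectors and pallet errors.

```rust
use std::collections::BTreeSet;

/// Simplified account id, assumed for this sketch.
type AccountId = u32;

/// Removes duplicate targets, keeping the first occurrence of each.
fn dedup_nominations(targets: Vec<AccountId>) -> Vec<AccountId> {
    let mut seen = BTreeSet::new();
    // `insert` returns false for an element that was already present.
    targets.into_iter().filter(|t| seen.insert(*t)).collect()
}

/// Deduplicates the targets and rejects the whole set if any target is not a
/// known (active or idle) validator. Error values are hypothetical.
fn validate_nominations(
    targets: Vec<AccountId>,
    known_targets: &BTreeSet<AccountId>,
) -> Result<Vec<AccountId>, &'static str> {
    let targets = dedup_nominations(targets);
    if targets.is_empty() {
        return Err("EmptyTargets");
    }
    if targets.iter().any(|t| !known_targets.contains(t)) {
        return Err("BadTarget");
    }
    Ok(targets)
}
```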

Changes to `OnStakingUpdate`

1. New methods

Added a couple more methods to the OnStakingUpdate trait in order to differentiate removed stakers from idle (chilling) stakers. For the rationale on why this is needed, see this discussion #1933 (comment).

pub trait OnStakingUpdate<AccountId, Balance> {
  // snip

  /// Fired when an existing nominator becomes idle.
  ///
  /// An idle nominator stops nominating but its stake state should not be removed.
  fn on_nominator_idle(_who: &AccountId, _prev_nominations: Vec<AccountId>) {}
  
  // snip

  /// Fired when an existing validator becomes idle.
  ///
  /// An idle validator stops validating but its stake state should not be removed.
  fn on_validator_idle(_who: &AccountId) {}
}

2. Refactor existing methods for safety

With this refactor, the event emitter needs to explicitly call the events with more information about the event. The advantage is that this new interface design prevents potential issues with data sync, e.g. the event emitter does not necessarily need to update the staking state before emitting the events and the OnStakingUpdate implementor does not need to rely as much on the staking interface, making the interface less error prone.

Changes to `SortedListProvider`

Added a new method to the trait, gated by try-runtime, which returns whether a given node is in the correct position in the list or not given its current score. This method will help with the try state checks for the target list.

pub trait SortedListProvider<AccountId> {
  // snip

  /// Returns whether the `id` is in the correct bag, given its score.
  ///
  /// Returns a boolean and it is only available in the context of `try-runtime` checks.
  #[cfg(feature = "try-runtime")]
  fn in_position(id: &AccountId) -> Result<bool, Self::Error>;
}
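As an illustration of what such a check could compute, the sketch below derives the bag a score belongs to from a sorted list of upper thresholds and compares it with the bag the node currently sits in. The helper names and the free-function shape are assumptions; the trait method above takes only the `id` and returns a `Result`.

```rust
/// Simplified score type, assumed for this sketch.
type Score = u128;

/// Returns the upper threshold of the bag that `score` belongs to.
/// `thresholds` must be sorted ascending; scores above the last threshold
/// fall into a final bag bounded by `Score::MAX`.
fn notional_bag_for(thresholds: &[Score], score: Score) -> Score {
    let idx = thresholds.partition_point(|&upper| score > upper);
    thresholds.get(idx).copied().unwrap_or(Score::MAX)
}

/// A node is in position if the bag it currently sits in is the bag that its
/// current score maps to.
fn in_position(thresholds: &[Score], current_bag_upper: Score, score: Score) -> bool {
    notional_bag_for(thresholds, score) == current_bag_upper
}
```

For example, with thresholds 10, 20 and 30, a node stored in the bag with upper threshold 20 is in position while its score is 15, and out of position once its score grows to 25.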

Migrations

The migration code has been validated against Polkadot using the externalities tests in polkadot/runtime/westend/src/lib.rs. Upon running the migrations, we ensure that:

All validators have been added to the target list with their correct approvals score (as per the try-state checks).
All nominations are "cleaned" (see the definition of "clean" below).
Try-state checks related to stake-tracker and approvals pass.

Check #4673 for more info on migrations and related tests.

Note that the migrations will "clean" the current nominations in the system namely:

The migration removes duplicate nominations from all nominators, if they exist (done by calling fn do_add_nominator with deduplicated nominations)
The migration removes all nominations of non-active validators to avoid adding dangling nominations (done by calling fn do_add_nominator if necessary)
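The two cleaning rules can be sketched for a single nominator as follows. The `Cleaned` enum and `clean_nominations` helper are hypothetical; the `Chill` outcome mirrors the migration code quoted in the review further down, which chills a nominator when no nominations are left.

```rust
use std::collections::BTreeSet;

/// Simplified account id, assumed for this sketch.
type AccountId = u32;

/// Outcome of cleaning one nominator's targets (hypothetical type).
#[derive(Debug, PartialEq)]
enum Cleaned {
    /// The nominator keeps nominating this cleaned set of targets.
    Keep(Vec<AccountId>),
    /// No valid target is left, so the nominator should be chilled.
    Chill,
}

/// Drops targets that are not validators and removes duplicates, keeping the
/// first occurrence of each remaining target.
fn clean_nominations(targets: &[AccountId], validators: &BTreeSet<AccountId>) -> Cleaned {
    let mut seen = BTreeSet::new();
    let kept: Vec<AccountId> = targets
        .iter()
        .copied()
        .filter(|t| validators.contains(t) && seen.insert(*t))
        .collect();
    if kept.is_empty() {
        Cleaned::Chill
    } else {
        Cleaned::Keep(kept)
    }
}
```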

Weight complexity

Keeping both target and voter list sorted based on their scores requires their scores to be up to date after single operations (add nominator/validator, update stake, etc) and composite staking events (validator slashing and payouts). See https://hackmd.io/FH8Uhi2aQ5mD0aMm-BbqMQ for more details and back of the envelope calculations.

This PR #4686 shows how the target list affects the staking's MaxExposurePageSize based on benchmarks with different modes. In sum:

 == Strict VoterList sorting mode
 - Max. page_size: 1984
 - Weight { ref_time: 1470154955707, proof_size: 7505385 }

 == Lazy VoterList sorting mode
 - Max page_size: 2496
 - Weight { ref_time: 1469080809430, proof_size: 9437673 }

 == No stake-tracker
 - Max page_size: 3008
 - Weight { ref_time: 1474536186486, proof_size: 11362971 }

To do later

A. Remove legacy `CurrencyToVote`

github tracking issue: Remove legacy CurrencyToVote in staking #4725

Remove the need for the CurrencyToVote converter type in pallet-staking. This type converts from BalanceOf<T> to a u64 "vote" type, and from a safe u128 (i.e. ExtendedBalance) back to BalanceOf<T>. In both conversion directions, the total issuance of the system must be provided.

The main reason for this conversion is that the current phragmen implementation does not correctly support u128 as the main type. Thus the conversion between balance (u128) and the supported u64 "vote" type.
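To illustrate why the total issuance is needed in both directions, here is a sketch of an issuance-scaled conversion. It is an assumption about how such a converter can work (divide by a factor that makes the whole issuance fit into a u64, multiply back on the way out), not a copy of the converter used by the runtime, and the function names are made up.

```rust
/// Scaling factor chosen so that the whole issuance fits into a `u64`.
/// It is 1 whenever the issuance already fits.
fn factor(issuance: u128) -> u128 {
    (issuance / u64::MAX as u128).max(1)
}

/// Converts a balance into a `u64` vote weight (lossy when `factor > 1`).
fn to_vote(value: u128, issuance: u128) -> u64 {
    let scaled = value / factor(issuance);
    // Clamp defensively before narrowing to u64.
    scaled.min(u64::MAX as u128) as u64
}

/// Converts an extended (u128) vote weight back into a balance.
fn to_currency(vote: u128, issuance: u128) -> u128 {
    vote.saturating_mul(factor(issuance))
}
```

The round trip loses precision as soon as the factor exceeds 1, and both calls depend on a live issuance value, which is the coupling the follow-up issue wants to remove.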

Relying on the current issuance will be a problem with the staking parachain (let's assume that the staking runtime is not deployed in AH). In addition, removing the need for this conversion will simplify the stake-tracker and the associated list updates and make them cheaper to run.

To finish

thorough testing in the stake-tracker pallet and integration with the Staking pallet
bring back nominations from slashed and chilled validator after re-validate
thorough try-runtime checks in stake-tracker which also run after the staking pallet tests
figure out a way to switch between target list providers (to allow for a phased rollout of this feature)
benchmarks for Call::drop_dangling_nomination
migrations (requires MBMs)
test MBM migrations (see Stake tracker improvements (migration and try-state checks OK in Polkadot) #4673)
migrate + follow-chain in Polkadot

Closes #442

substrate/frame/stake-tracker/src/lib.rs

substrate/frame/stake-tracker/src/tests.rs

substrate/frame/stake-tracker/src/mock.rs

substrate/frame/staking/src/pallet/mod.rs

substrate/frame/staking/src/lib.rs

substrate/frame/staking/src/pallet/impls.rs

substrate/frame/staking/src/tests.rs

command-bot · 2024-06-03T13:40:10Z

@gpestana https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6383364 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=westend --target_dir=polkadot --pallet=pallet_staking. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 3-a21c1cbc-499b-49d0-857d-5ac845653e0d to cancel this command or bot cancel to cancel all commands in this pull request.

command-bot · 2024-06-03T13:42:50Z

@gpestana Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=westend --target_dir=polkadot --pallet=pallet_staking has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6383364 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6383364/artifacts/download.

gpestana · 2024-06-03T16:20:29Z

bot bench polkadot-pallet --runtime=westend --pallet=pallet_staking

command-bot · 2024-06-03T16:20:35Z

@gpestana https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6387304 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=westend --target_dir=polkadot --pallet=pallet_staking. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 6-6ed494ea-75a8-4828-a8ae-cf3490eada80 to cancel this command or bot cancel to cancel all commands in this pull request.

…=westend --target_dir=polkadot --pallet=pallet_staking

command-bot · 2024-06-03T18:12:21Z

@gpestana Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=westend --target_dir=polkadot --pallet=pallet_staking has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6387304 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6387304/artifacts/download.

gpestana · 2024-06-03T18:53:15Z

bot bench polkadot-pallet --pallet=pallet_staking

command-bot · 2024-06-03T18:53:20Z

@gpestana https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6388231 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=rococo --target_dir=polkadot --pallet=pallet_staking. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 8-00657f8c-1b16-4ad3-a9ec-b4a06016b5e2 to cancel this command or bot cancel to cancel all commands in this pull request.

command-bot · 2024-06-03T19:32:11Z

@gpestana Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=rococo --target_dir=polkadot --pallet=pallet_staking has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6388231 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6388231/artifacts/download.

kianenigma

I have reviewed most of this, and it looks mainly good to me.

The only major issue I see is that the MBM step is way too small, and I'd prefer not to pause the chain for a long, long time.

polkadot/runtime/westend/src/weights/pallet_staking.rs

kianenigma · 2024-06-10T00:28:08Z

polkadot/runtime/westend/src/bag_thresholds.rs

+];
+
+/// Upper thresholds delimiting the targets bag list.
+pub const TARGET_THRESHOLDS: [u128; 200] = [


Is the distribution of initial validators in these bags good? We could go for a different bag distribution for targets, given that approval stakes are generally much higher. Just an option.
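One way to answer this question empirically is to count how many targets would land in each bag for a given threshold set. The sketch below is an assumption about how such a check could look; `bag_histogram` is a hypothetical helper, not something that exists in the runtime.

```rust
/// Counts how many scores land in each bag. `thresholds` must be sorted
/// ascending; the last slot counts scores above the highest threshold.
fn bag_histogram(thresholds: &[u128], scores: &[u128]) -> Vec<usize> {
    let mut counts = vec![0usize; thresholds.len() + 1];
    for &score in scores {
        // Index of the first bag whose upper threshold is >= score.
        let bag = thresholds.partition_point(|&upper| score > upper);
        counts[bag] += 1;
    }
    counts
}
```

A histogram with most targets crowded into a handful of bags would suggest that a threshold set tuned for approval stakes is worth considering.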

substrate/frame/election-provider-support/src/lib.rs

kianenigma · 2024-06-10T02:19:27Z

substrate/primitives/staking/src/lib.rs


-	/// Fired when a portion of a staker's balance has been withdrawn.
-	fn on_withdraw(_stash: &AccountId, _amount: Balance) {}
+/// Representation of the `OnStakingUpdate` events.


This gives me a possible refactoring idea which I want to share but I don't necessarily want you to do, unless if you see clear benefits to it:

trait OnStakingUpdate {
  fn update(event: OnStakingUpdateEvent) {}
}

pub enum OnStakingUpdateEvent { .. }

And now we just have the events listed once :D
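A slightly expanded sketch of this idea, to show how a listener would consume a single entry point. The variants are illustrative guesses based on methods mentioned in this PR, not the actual event list, and `update` takes `&mut self` here only so the example can run outside of a pallet.

```rust
/// Simplified types, assumed for this sketch.
type AccountId = u32;
type Balance = u128;

/// Hypothetical event enum; the variants are illustrative, not the PR's list.
#[derive(Debug, Clone, PartialEq)]
enum OnStakingUpdateEvent {
    ValidatorIdle { who: AccountId },
    NominatorIdle { who: AccountId, prev_nominations: Vec<AccountId> },
    Withdraw { stash: AccountId, amount: Balance },
}

/// Single entry point: every staking event goes through `update`.
trait OnStakingUpdate {
    fn update(&mut self, _event: OnStakingUpdateEvent) {}
}

/// Hypothetical listener that stores every event it receives.
#[derive(Default)]
struct Recorder {
    seen: Vec<OnStakingUpdateEvent>,
}

impl OnStakingUpdate for Recorder {
    fn update(&mut self, event: OnStakingUpdateEvent) {
        self.seen.push(event);
    }
}
```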

substrate/frame/staking/stake-tracker/src/lib.rs

kianenigma · 2024-06-10T02:58:37Z

substrate/frame/staking/stake-tracker/src/mock.rs

+}
+
+impl pallet_balances::Config for Test {
+	type Balance = Balance;


Let's use pallet_balances::TestDefaultConfig please :)

I am generally going to shamelessly be more demanding on requesting people to use our latest features internally :D

kianenigma · 2024-06-10T02:59:49Z

substrate/frame/staking/src/migrations/v13_stake_tracker/mod.rs

+
+/// V13 Multi-block migration to introduce the stake-tracker pallet.
+///
+/// A step of the migration consists of processing one nominator in the [`Nominators`] list or one


This means the migration would take thousands of blocks, right?

Not great, as all other extrinsics are blocked while this happens.

For some reason, from reading the docs the first time, I got the impression that the migrations logic would fit as many steps in a block as possible. After revisiting the code and docs, I see that's not the case and that a step will be executed once per block, as you mentioned.

I will refactor this code accordingly and try to fit as many nominator migrations per block as possible, thanks!
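A rough sketch of what "as many nominators per step as the weight allows" could look like. The `Meter` type, the one-dimensional weights and the cursor-as-index are simplifications assumed for illustration; the real migration uses the framework's weight meter and storage cursors.

```rust
/// Hypothetical one-dimensional weight meter.
struct Meter {
    limit: u64,
    consumed: u64,
}

impl Meter {
    fn remaining(&self) -> u64 {
        self.limit.saturating_sub(self.consumed)
    }

    /// Consumes `weight` if it fits, returning whether it did.
    fn try_consume(&mut self, weight: u64) -> bool {
        if self.remaining() >= weight {
            self.consumed += weight;
            true
        } else {
            false
        }
    }
}

/// Migrates as many pending nominators as fit in the meter, starting at
/// `cursor`. Returns the next cursor, or `None` once everything is migrated.
/// Fails with the required weight if not even one item fits.
fn step(
    pending: &[u32],
    cursor: usize,
    meter: &mut Meter,
    weight_per_item: u64,
    migrated: &mut Vec<u32>,
) -> Result<Option<usize>, u64> {
    if meter.remaining() < weight_per_item {
        return Err(weight_per_item);
    }
    let mut next = cursor;
    // Short-circuit: no weight is consumed once the list is exhausted.
    while next < pending.len() && meter.try_consume(weight_per_item) {
        migrated.push(pending[next]);
        next += 1;
    }
    Ok(if next < pending.len() { Some(next) } else { None })
}
```

With five pending nominators, a per-item weight of 10 and a block budget of 25, this processes two nominators per block and finishes in three blocks instead of five.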

kianenigma · 2024-06-10T03:03:41Z

substrate/frame/staking/src/migrations/v13_stake_tracker/mod.rs

+		if meter.remaining().any_lt(required) {
+			return Err(SteppedMigrationError::InsufficientWeight { required });
+		}
+


What is the actual outcome when we return this? Will the chain ditch this migration and move on, or ditch just this step and retry it?

substrate/frame/staking/src/migrations/v13_stake_tracker/mod.rs

kianenigma · 2024-06-10T03:08:37Z

substrate/frame/staking/src/migrations/v13_stake_tracker/mod.rs

+			// if no nominations are left, chill the nominator.
+			let _ = <Pallet<T> as StakingInterface>::chill(&who)
+				.map_err(|e| {
+					log!(error, "error when chilling {:?}", who);


How many of these do we have in Polkadot now?

paritytech-cicd-pr · 2024-06-13T13:06:21Z

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6459705

Adds stake-tracker pallet and integrates with the staking pallet

5803d69

gpestana added the T1-FRAME label Oct 18, 2023

gpestana self-assigned this Oct 18, 2023

gpestana requested review from a team October 18, 2023 18:29

gpestana marked this pull request as draft October 18, 2023 18:29

gpestana mentioned this pull request Oct 18, 2023

Adds stake-tracker pallet #1654

Closed

1 task

gpestana added 3 commits October 18, 2023 23:19

Comments and other nit fixes

c7ad6bf

Removes the untracked stake

cb393b9

Merge branch 'master' into gpestana/stake-tracker_integration

84a3b71