Force new era only if 1/3 validators is disabled. #3533
We're still missing an implementation of the finality fallback from GRANDPA to BABE. We therefore cannot actually finalize anything if 1/3 get disabled, and thus cannot switch, so this should probably happen at a lower threshold like 1/5 or 1/6. cc @AlistairStewart
Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>
Demi-Marie
left a comment
Pinging @rphmeier: does this make us vulnerable to attacks? If not, this should be fine.
kianenigma
left a comment
I am not the biggest expert here (consensus stuff), but afaik the "we are screwed only if 1/3 are disabled" rule seems to be GRANDPA-specific. I think it might be worthwhile to make this configurable in session:
/// Force a new era if we are left with less than the given ratio of validators as _active_.
ForceEraThreshold: Get<Perbill>
or something like this.
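A minimal, self-contained sketch of what such a configurable check could look like. `Ratio` is a hypothetical stand-in for `Perbill` (parts per billion), and `should_force_new_era` and the strict `>` comparison are assumptions made for illustration; none of this is the actual session or staking module code.

```rust
/// Hypothetical stand-in for `Perbill`: a ratio stored as parts per billion.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Ratio(u32);

impl Ratio {
    const BILLION: u64 = 1_000_000_000;

    /// Build a ratio from a whole percentage, clamped to 100%.
    pub fn from_percent(p: u32) -> Self {
        Ratio(p.min(100) * 10_000_000)
    }

    /// Multiply the ratio by `n`, rounding down.
    pub fn mul_floor(self, n: u32) -> u32 {
        ((self.0 as u64 * n as u64) / Self::BILLION) as u32
    }
}

/// Returns true once strictly more than `threshold * total` validators
/// are disabled. (Assumption: the comparison is strict.)
pub fn should_force_new_era(disabled: u32, total: u32, threshold: Ratio) -> bool {
    disabled > threshold.mul_floor(total)
}
```

Under these assumptions, with 100 validators and a 33% threshold, 33 disabled validators would not force a new era, while the 34th would.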
See the FLP impossibility result - 1/3 offline is the boundary for asynchronous liveness. But yes, could be made configurable. Many protocols dealing with a common coin to achieve async liveness can only tolerate 1/5 or 1/7 Byzantine.
Ready for review. Added parametrisation and set to
}

parameter_types! {
	pub const DisabledValidatorsThreshold: Perbill = Perbill::from_percent(33);
It's only a test; I don't think it matters at all.
I'm not convinced this is necessary and it's strictly less safe than before. It makes us more open to liveness attacks where the attacker cannot control the exact timing of each individual node that they take down or where they can't take down all nodes at once. Basically, by further delaying the replacement of offline or even outright Byzantine validators by up to 24 hours, we reduce our liveness tolerance. In the simplest case with 4 validators, as soon as one validator goes offline (accidentally or otherwise), a Byzantine party has up to 24 hours in which to mount an attack to stall the network. In reality, the sooner we replace known faulty nodes, the safer.
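The 4-validator case can be checked against the standard Byzantine fault tolerance bound n >= 3f + 1. The sketch below is illustrative arithmetic only; the function names are hypothetical and do not come from the codebase.

```rust
/// Largest number of faulty validators `f` satisfying n >= 3f + 1.
pub fn max_tolerated_faults(n: u32) -> u32 {
    n.saturating_sub(1) / 3
}

/// Additional faults the network can still absorb while `offline`
/// validators are already down and have not been replaced.
pub fn remaining_fault_budget(n: u32, offline: u32) -> u32 {
    max_tolerated_faults(n).saturating_sub(offline)
}
```

With n = 4 the bound gives f = 1, so a single offline validator that is not replaced leaves a remaining budget of 0: any further fault within that window is enough to stall.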
On ice until I have proof from the research team that this is not strictly less safe.
@gavofyork afaict, there is nothing currently in the code preventing the same validator from being re-elected in the next era, so forcing it does not guarantee that the offline/Byzantine validators are replaced at all.
@gavofyork If we rerun elections due to slashing, then small slashes can influence the randomness, maybe influencing parachain assignments, elections, smart contract results, etc., so this improves security for those functions. We cannot do 1/3rd here like I said above, so the issue is balancing safer randomness with any power attackers gain by luck or imprecise attacks.
I gather: We currently oversupply validator candidates in Phragmen, which helps ensure all elected validators actually start when an era starts somehow? We cannot however reuse those extras when one gets kicked because we cannot expect a validator who fails to be elected to remain online? We must therefore rerun Phragmen to begin a new era, right? We could allow extras to make blocks and do secondary validity checks, so that they'd stay online for the profits, and so that inserting them cannot influence the randomness. We'll discuss the options some this week.
@tomusdrw I discussed that issue in the new Slashing with NPoS (md) document, thanks to @rphmeier noticing our slashing issues. Anytime a nominator N gets slashed due to actions by a validator V then we should:
As a result, V should stay out until nominators manually reinstate them.
It'd remain possible for nominators to reinstate V in the next era with manual effort though. We could permit reinstating in the next session even without an era change, but not without manual approval since this approval switches the slashing computation from a max to a sum.
incorrect. you might be thinking of our usage of phragmen for council elections.
we don't rerun them immediately; only at the end of the session. it has no effect on the rotation of babe authorities - the next epoch's validator set will be the same. all it means is that we rotate them off a bit faster than we otherwise would. for polkadot, assuming a validator wanted to grief the network in this way, they would essentially rotate themselves off/on every epoch, thereby causing 6x more elections than normal (there are only 6 epochs/sessions in an era). this isn't such a big deal.
this currently has no effect. the only question in my mind is: is a 6x increase in elections (that can be caused by only 1 bad validator) sufficiently annoying to bring in this logic that would essentially increase the requirement from only 1 validator to some minimum number of validators?
Anytime we rerun elections we should wait at least one full session so that nobody can bias their voting based on the expected randomness. I gather this is already baked into the code pretty deep, so actually not a problem there. I doubt the factor of 6 matters here. I'll push Al to weigh in.
Updated to 1/6 of validators as per @AlistairStewart's suggestion.
* Force new era only when over third validators is disabled.
* Update srml/session/src/lib.rs
  Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>
* Update srml/staking/src/lib.rs
  Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>
* Parametrize the threshold.
* Bump runtime version.
* Update node/runtime/src/lib.rs
* Fix build.
* Fix im-online test.
Currently, validators that are slashed are disabled until the end of the era (i.e. we ignore anything they do). We also trigger a new era right after the end of the current session, which seems completely unnecessary.
This PR forces a new era only if we disable at least one third of the validators to prevent stalling block production.
Marking as WiP as I'm unsure if this is actually needed either or if we should rather never force a new era.