Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Force new era only if 1/3 validators is disabled.#3533

Merged
gavofyork merged 11 commits intomasterfrom
td-new-era
Sep 19, 2019
Merged

Force new era only if 1/3 validators is disabled.#3533
gavofyork merged 11 commits intomasterfrom
td-new-era

Conversation

@tomusdrw
Copy link
Copy Markdown
Contributor

@tomusdrw tomusdrw commented Sep 2, 2019

Currently validators that are slashed are disabled till the end of the era (i.e. we ignore anything they do). We also trigger a new era right after the end of current session, which seems completely unnecessary.

This PR forces a new era only if we disable at least one third of the validators to prevent stalling block production.

Marking as WiP as I'm unsure if this is actually needed either or if we should rather never force a new era.

@tomusdrw tomusdrw added A0-please_review Pull request needs code review. M4-core labels Sep 2, 2019
@tomusdrw tomusdrw requested a review from kianenigma as a code owner September 2, 2019 15:55
@burdges
Copy link
Copy Markdown

burdges commented Sep 2, 2019

We're still missing an implementation for the finality fall back from GRANDPA to BABE. We therefore cannot actually finalize anything if 1/3 get disabled and thus cannot switch, so this should probably happen lower like 1/5 or 1/6. cc @AlistairStewart

tomusdrw and others added 2 commits September 3, 2019 10:58
Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>
Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@Demi-Marie Demi-Marie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @rphmeier ― does this make us vulnerable to attacks? If not, this should be fine.

Copy link
Copy Markdown
Contributor

@kianenigma kianenigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not the biggest expert here (consensus stuff) but afaik this we are screwed only if 1/3 are disabled seem to be Grandpa-specific. I think it might be worthwhile to make this configurable in session:

/// Force a new era if we are left with less than the given ratio of validators as _active_.
ForceEraThreshold: Get<Perbill>

or sth like this.

@rphmeier
Copy link
Copy Markdown
Contributor

rphmeier commented Sep 5, 2019

we are screwed only if 1/3 are disabled seem to be Grandpa-specific.

See the FLP impossibility result - 1/3 offline is the boundary for asynchronous liveness. But yes, could be made configurable. Many protocols dealing with a common coin to achieve async liveness can only tolerate 1/5 or 1/7 byzantine.

@tomusdrw tomusdrw changed the title WiP: Force new era only if 1/3 validators is disabled. Force new era only if 1/3 validators is disabled. Sep 6, 2019
@tomusdrw
Copy link
Copy Markdown
Contributor Author

tomusdrw commented Sep 6, 2019

Ready for review. Added parametrisation and set to 1/5 for node-runtime.

Copy link
Copy Markdown
Contributor

@kianenigma kianenigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@kianenigma kianenigma added A8-looksgood and removed A0-please_review Pull request needs code review. labels Sep 6, 2019
}

parameter_types! {
pub const DisabledValidatorsThreshold: Perbill = Perbill::from_percent(33);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 33 here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'ts only a test, I don't think it matters at all.

@gavofyork
Copy link
Copy Markdown
Member

I'm not convinced this is necessary and it's strictly less safe than before.

It makes us more open to liveness attacks where the attacker cannot control the exact timing of each individual node that they take down or where they can't take down all nodes at once.

Basically, by further delaying the replacement of offline or even outright Byzantine validators by upto 24 hours, we reduce our liveness tolerance. In the simplest case with 4 validators, then as soon as as one validator goes offline (accidentally or otherwise), then a Byzantine party has up to 24 hours in which to mount an attack to stall the network. In reality, the sooner we replace known faulty nodes the safer.

CC @AlistairStewart

@gavofyork
Copy link
Copy Markdown
Member

on ice until I have proof from the research team that this is not strictly less safe.

@tomusdrw
Copy link
Copy Markdown
Contributor Author

tomusdrw commented Sep 8, 2019

@gavofyork afaict, there is nothing currently in code preventing the same validator to be reelected in the next era, so forcing it does not guarantee that the offline/byzantine validators are replaced at all.
Actually before we do elections they are at least disabled, if they are elected again they are considered back online afaict.

@burdges
Copy link
Copy Markdown

burdges commented Sep 8, 2019

@gavofyork If we rerun elections due to slashing, then small shashes can influence the randomness, maybe influencing parachain assignments, elections, smart contract results, etc., so this improves security for those functions. We cannot do 1/3rd here like I said above, so the issue is balancing safer randomness with any power attackers gain by luck or imprecise attacks.

I gather: We currently oversupply validator candidates in Phragmen, which helps ensure all elected validators actually start when an era starts somehow? We cannot however reuse those extras when one gets kicked because we cannot expect a validator who fails to be elected to remain online? We must therefore rerun Phragmen to begin a new era, right?

We could allow extras to make blocks and do secondary validity checks, so that they'd stay online for the profits, and so that inserting them cannot influence the randomness.

We'll discuss the options some this week.

@tomusdrw I discussed that issues in the new Slashing with NPoS (md) document, thanks to @rphmeier noticing our slashing issues. Anytime a nominator N gets slashed due to actions by a validator V then we should:

First, we post a slashing transaction to the chain, which drops V from the active validator list by invalidating their session keys, which makes everyone ignore V for the remainder of the era, and also invalidates any future blocks that do not ignore V. We also remove all nomination approval votes by any nominator for V, even those who currently allocate V zero stake.

As a result, V should stay out until nominators manually reinstate them.

Second, we remove all N's nomination approval votes for future eras. We do not remove N's current nominations for the current era or reduce the stake currently backing other validators. Also we permit N to add new nomination approval votes for future eras during the current era. We also notify N that V cause them to be slashed.

It'd remain possible for nominators to reinstate V in the next era with manual effort though. We could permit reinstating in the next session even without an era change, but not without manual approval since this approval switches the slashing computation from a max to a sum.

@gavofyork
Copy link
Copy Markdown
Member

We currently oversupply validator candidates in Phragmen

incorrect. you might be thinking of our usage of phragmen for council elections.

If we rerun elections due to slashing

we don't rerun them immediately; only at the end of the session. it has no effect on the rotation of babe authorities - the next epoch's validator set will be the same. all it means is that we rotate them off a bit faster than we otherwise would. for polkadot, assuming a validator wanted to grief the network in this way, they would essentially rotate themselves off/on every epoch, thereby causing 6x more elections than normal (there are only 6 epochs/sessions in an era). this isn't such a big deal.

Actually before we do elections they are at least disabled

this currently has no effect.

--

the only question in my mind is: is a 6x increase in elections (that can be caused by only 1 bad validator) sufficiently annoying to bring in this logic that would essentially increase the requirement from only 1 validator to some minimum number of validators.

@AlistairStewart @burdges

@burdges
Copy link
Copy Markdown

burdges commented Sep 11, 2019

Anytime we rerun elections we should wait at least one full sessions so that nobody can bias their voting based on the expected randomness. I gather this is already baked into the code pretty deep, so actually not problem there.

I doubt a the factor of 6 matters here. I'll push Al to weigh in.

@gavofyork
Copy link
Copy Markdown
Member

Updated to 1/6 of validators as per @AlistairStewart 's suggestion.

@gavofyork gavofyork merged commit d320249 into master Sep 19, 2019
@gavofyork gavofyork deleted the td-new-era branch September 19, 2019 10:04
en pushed a commit to en/substrate that referenced this pull request Sep 24, 2019
* Force new era only when over third validators is disabled.

* Update srml/session/src/lib.rs

Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>

* Update srml/staking/src/lib.rs

Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>

* Parametrize the threshold.

* Bump runtime version.

* Update node/runtime/src/lib.rs

* Fix build.

* Fix im-online test.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants