This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Recovery of PoVs #388

Closed
akru opened this issue Apr 4, 2021 · 13 comments · Fixed by #445

Comments

@akru
Contributor

akru commented Apr 4, 2021

This is not just a theoretical idea; it already happens on the test network.

Description

Let parachain A have four collators: Alice, Bob, Charlie, and Dave, and let Dave be a malicious collator. Currently, a single malicious collator is enough to stop block production on a parachain. Consider the following steps:

  1. Alice, Bob, Charlie, and Dave propose blocks to the relay chain validators.
  2. All collators are synced up and can propose a block.
  3. If the block proposal (or block sentence) from Dave is accepted, Dave stops sharing blocks with the other collators.
  4. As a result, only Dave can propose the next block, because only he has the full chain.
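
The steps above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: blocks are plain strings, each collator is a set of block hashes it knows, and a collator can propose only if it knows the current best block. The names `collators_able_to_propose`, `b1`, `b2`, and `b3` are hypothetical and not taken from any real codebase.

```python
def collators_able_to_propose(collators, best_block):
    """Return the names of collators that know the best block and can build on it."""
    return sorted(name for name, known in collators.items() if best_block in known)


# Steps 1 and 2: everyone is synced up to block "b2".
collators = {name: {"b1", "b2"} for name in ("Alice", "Bob", "Charlie", "Dave")}
print(collators_able_to_propose(collators, "b2"))

# Step 3: Dave's block "b3" is accepted by the relay chain, but he never shares it.
collators["Dave"].add("b3")

# Step 4: the best block is now "b3", and only Dave can build on top of it.
print(collators_able_to_propose(collators, "b3"))
```

In this model the parachain stalls as soon as Dave decides to stop producing blocks, since nobody else holds the parent they would need.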

Possible solutions

A collator should probably be able to recover the block from the relay chain when it cannot get the best block through the sync process.

@bkchr
Member

bkchr commented Apr 4, 2021

Yeah, we are aware of it.

And somewhere I have a todo to implement recovery from the relay chain.

@akru
Contributor Author

akru commented Apr 4, 2021

@bkchr thank you, looking forward to updates.

@burdges

burdges commented Apr 4, 2021

I've explained this in chat and paritytech/polkadot#2203, but might as well spell it out here..

Safer Option: We've at least 2f+1 out of 3f+1 validators who signed that they have availability pieces of Dave's block, so a collator could contact any f+1 of those, obtain their piece, and reconstruct Dave's block from the erasure coding. This works fine, but requires opening connections to 256 < f+1 < 512 validators. It's fine if only the next upcoming collator does so, but not good if many do so.
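
To make the "any f+1 pieces" property concrete, here is a toy erasure code in Python. It is a sketch only: it uses polynomial interpolation over the small prime field of size 257, which is not the encoding the relay chain actually uses, and the names `recovery_threshold`, `encode`, and `reconstruct` are hypothetical helpers chosen for illustration. The data is split into f+1 symbols, one piece is produced per validator, and any f+1 pieces recover the data.

```python
P = 257  # toy prime field; large enough to hold one byte per symbol


def recovery_threshold(n_validators):
    """Number of pieces needed: f + 1, where n_validators = 3f + 1 (rounded down)."""
    f = (n_validators - 1) // 3
    return f + 1


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n_validators):
    """Produce one (index, value) piece per validator; the first f+1 pieces are the data."""
    if n_validators >= P:
        raise ValueError("toy field is too small for this many validators")
    if len(data) != recovery_threshold(n_validators):
        raise ValueError("data must contain exactly f+1 symbols")
    base = list(enumerate(data))
    return [(x, lagrange_eval(base, x)) for x in range(n_validators)]


def reconstruct(pieces, n_validators):
    """Rebuild the original symbols from any f+1 distinct pieces."""
    k = recovery_threshold(n_validators)
    if len(pieces) < k:
        raise ValueError("not enough pieces to reconstruct")
    subset = pieces[:k]
    return [lagrange_eval(subset, x) for x in range(k)]


data = [80, 111, 86, 33]           # four symbols, so f = 3 and n = 10
pieces = encode(data, 10)
some_pieces = [pieces[9], pieces[4], pieces[6], pieces[7]]
print(reconstruct(some_pieces, 10) == data)
```

With 1024 validators, `recovery_threshold(1024)` gives 342, which falls in the 256 < f+1 < 512 range mentioned above and shows why this path needs many connections.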

Faster Option: There are three node classifications who know the full parachain block, the collator who created it (Dave), the two-ish validators who acted as backing checkers for it (maybe Dave's buddies), and the approval checkers who checked the block. Approval checkers can only be known by listening to the relay chain gossip, but Alice, Bob, Charlie already do so anyways. Approval checkers do not currently retain Dave's block, but they could do so temporarily, which enables downloading the whole block by talking to only one validator.

Approval checkers cannot be known by Dave in advance, meaning he cannot plan upon this working. Yet, if Dave controls proportions q < 1/3 of the collators and p<1/3 of the validators then he waits an expected 6 p^{-16}/q seconds between attempts, roughly 1 year with q=p=1/3. Acceptable risk since stalls are not soundness violations.

There is nothing wrong with either option, so we'll implement whichever looks simplest, almost surely the safer slower option. If everyone behaves then no problem. :) We could later add an opt-in form of the faster option, if we're worried both about this attack and about approval checkers holding blocks too long.


We've the same issues for XCMP messages because parachains must process incoming ones. We create XCMP messages as outputs of the sending chain's PVF, so again they're known by anyone who checks the sending block, i.e. sending block's collator, sending block's backing checkers, or sending block's approval checkers.

In this case however, we've cross parachain logic so there is no simple honest path via which the message arrives usually. Also, receivers only want the message itself, not to rerun the sender's block, so the safer option really sucks.

We'll therefore make receiving collators ask for their messages from sender's backing and approval checkers, so the faster approach. Again stalls remain an acceptable risk, but if abuse happens then we'll code up the safer slower approach as a fall back.

It's kinda interesting that XCMP favors the opposite method, but not problematic.

@bkchr
Member

bkchr commented Apr 4, 2021

> It's fine if only the next upcoming collator does so, but not good if many do so.

That works as long as we have a consensus that knows who the next collator is. Currently, with the relay chain based consensus, we cannot know this.

@burdges

burdges commented Apr 4, 2021

We could provide detached proofs of being the next collator with any of aura, babe, and sassafras, but..

It's also fine if you think the fast option of asking backing and approval checkers might be less code or might carry enough additional value in terms of simplifying the XCMP implementation. We'll discuss retaining the candidate longer with Rob since that's our only sticky point.
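
As a small illustration of why a slot-based consensus can identify the next collator: in Aura, the author of a slot follows from round-robin over the authority set. The sketch below shows only that selection rule; it does not produce a detached proof (that would additionally need a signature from the author), and the function name `aura_slot_author` and the example authority list are hypothetical.

```python
def aura_slot_author(slot, authorities):
    """Round-robin author selection: the slot number modulo the authority count."""
    if not authorities:
        raise ValueError("authority set must not be empty")
    return authorities[slot % len(authorities)]


authorities = ["Alice", "Bob", "Charlie", "Dave"]
print(aura_slot_author(6, authorities))
```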

@bkchr
Member

bkchr commented Apr 4, 2021

> We could provide detached proofs of being the next collator with any of aura, babe, and sassafras, but..

Yeah I know, that will be easy :)

@bkchr bkchr added this to the Beyond the End of the Century milestone May 3, 2021
@bkchr bkchr changed the title Multi-collator parachain DoS attack vector Recovery of PoVs May 3, 2021
@bkchr
Member

bkchr commented May 3, 2021

I'm hijacking this issue now ;)

As described above, we need to implement support for PoV recovery through the relay chain availability recovery. The PoV recovery should only be done by collators to not put that much pressure on the relay chain. The rough idea on how it should work:

After we have imported a relay chain block that contains an unknown parachain block, we need to start the recovery:

  • The parachain consensus should be extended for this. It already checks all parachain blocks and is aware if some block is known/unknown.
  • Every node should wait a random amount of time before starting the PoV recovery (we don't want to DDoS the relay chain).
  • Abort the recovery if we have imported the block in the meantime.

After having recovered the PoV, we can decode it and import the inner block and announce this block to the network.
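
The flow described above could be sketched roughly as follows. This is not the actual implementation: time is passed in explicitly, blocks are identified by plain strings, and the names `plan_recoveries`, `due_recoveries`, and `max_delay_secs` are hypothetical. The sketch covers the three points from the list: detect unknown included blocks, wait a random delay, and abort if the block was imported in the meantime.

```python
import random


def plan_recoveries(included_hashes, known_blocks, now, max_delay_secs, rng=random):
    """Schedule a recovery, after a random delay, for each included block we do not know."""
    return {
        block_hash: now + rng.uniform(0.0, max_delay_secs)
        for block_hash in included_hashes
        if block_hash not in known_blocks
    }


def due_recoveries(pending, known_blocks, now):
    """Abort recoveries for blocks imported meanwhile; return the ones whose delay elapsed."""
    for block_hash in list(pending):
        if block_hash in known_blocks:
            del pending[block_hash]  # block arrived via normal sync, nothing to recover
    return sorted(h for h, start_at in pending.items() if start_at <= now)


rng = random.Random(0)
known = {"a"}
# The relay chain block includes parachain blocks "a" (known) and "b" (unknown).
pending = plan_recoveries(["a", "b"], known, now=100.0, max_delay_secs=6.0, rng=rng)
print(sorted(pending))                        # only "b" is scheduled
print(due_recoveries(pending, known, 106.0))  # delay elapsed, recovery would start
known.add("b")                                # "b" gets imported in the meantime
print(due_recoveries(pending, known, 107.0))  # recovery for "b" is aborted
```

Once a recovery actually runs and returns the PoV, the remaining steps are the ones named above: decode it, import the inner block, and announce it to the network.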

As we are currently running with 12s block time, we should have enough time to recover the block before the next round starts and a new block needs to be produced.

@burdges

burdges commented May 3, 2021

Are parachain nodes meant to first seek the block from the parachain's own network? Or are we already too late for that by the time it appears on the relay chain?

As an aside, after we have upcoming block authorizations or PrePVF or whatever we call them, then I think upcoming block producers could ask random approval checkers for the whole block, under the theory that outing yourself to a random validator causes little censorship risk. We should not be distracted by this sort of thing right now though I suppose.

@bkchr
Member

bkchr commented May 3, 2021

We don't have any support for searching the parachain network for a given block :(

> As an aside, after we have upcoming block authorizations or PrePVF or whatever we call them, then I think upcoming block producers could ask random approval checkers for the whole block, under the theory that outing yourself to a random validator causes little censorship risk. We should not be distracted by this sort of thing right now though I suppose.

I would still like to see that backers give all connected collators of a given parachain the chance to download a seconded PoV. If we got support for this, availability recovery should probably only be required if the backers cannot give us the data.
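
A rough sketch of that ordering, assuming each backer and the availability recovery are exposed as simple callables. The names `fetch_pov`, `backers`, and `recover_from_availability` are hypothetical stand-ins, not real APIs: backers are asked first, and the more expensive recovery is used only when none of them can supply the data.

```python
def fetch_pov(block_hash, backers, recover_from_availability):
    """Try each backer in turn; fall back to availability recovery if none has the PoV."""
    for ask_backer in backers:
        pov = ask_backer(block_hash)
        if pov is not None:
            return pov, "backer"
    return recover_from_availability(block_hash), "availability-recovery"


def offline_backer(block_hash):
    return None  # this backer cannot give us the data


def helpful_backer(block_hash):
    return b"pov-for-" + block_hash.encode()


def slow_recovery(block_hash):
    return b"recovered-" + block_hash.encode()


print(fetch_pov("b3", [offline_backer, helpful_backer], slow_recovery))
print(fetch_pov("b3", [offline_backer], slow_recovery))
```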

@burdges

burdges commented May 3, 2021

I meant whether parachains should attempt to gossip blocks internally.

We cleaned up the pre-collation ideas somewhat in paritytech/polkadot#2888 (comment). In this, there are some advantages to the majority of the parachain seeing the collation as originating from the backing validators, even if it later gets gossiped around the parachain.

@bkchr
Member

bkchr commented May 3, 2021

> I meant whether parachains should attempt to gossip blocks internally.

The whole point of using the recovery is to recover a PoV that wasn't gossiped in the parachain network, either because the collator was malicious or because the collator crashed before being able to gossip the block to other parachain nodes.

@wischli
Contributor

wischli commented May 14, 2021

I was wondering whether this is planned to be fixed before auctions go live on Kusama.

@bkchr
Member

bkchr commented May 14, 2021

I have it working in a local branch. I'm currently cleaning it up. I want to have this out ASAP.


4 participants