Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/optimise consensus sigcheck #4467

Merged
merged 164 commits into from Dec 7, 2022

Conversation

AdoAdoAdo
Copy link
Contributor

@AdoAdoAdo AdoAdoAdo commented Sep 15, 2022

Description of the reasoning behind the pull request

This PR brings several changes on the way block signatures are handler during consensus in order to optimize the CPU usage.
Each signature share used in the aggregation of signatures for the block consensus proof, was previously individually verified.
Assuming all validators are honest, the verification takes ~3ms per signature. In metachain, each block accumulates ~400 signatures which is individually checked by all participants in consensus, taking a cumulative ~1.2s CPU time for each block that passes consensus.

Proposed Changes

This 1.2 seconds for metachain and ~200 ms for shards each round (for the synchronized nodes) can be optimized out, if instead the aggregation is done assuming all signature shares are valid, and verification is done on the aggregation instead.
The cost of the verification of the aggregated signature was optimized by the previous PR #4314 down to ~3ms as well.
This however can only be saved if all aggregated signatures are valid, otherwise, if the aggregated signature does not verify we will have an extra cost of 3ms, as the leader will aggregate first the signatures and then verify the aggregated signature. If the aggregated signature does not verify, the leader will have to check the signatures individually.
In order to avoid the increase and instead always have this optimization, a penalty for providing invalid signatures will be implemented which we call a pseudo slashing, that should provide enough disincentive.

The pseudo slashing is implementing through the addition of an extra message with the proof for received invalid signatures, sent on the consensus. This message is not mandatory to be sent during a consensus round, but if it is sent, the proof will be verified and used by nodes listening to the consensus messages to temporarily blacklist the misbehaving nodes.
While blacklisted, the messages from the nodes in the blacklist will be dropped and this will cause rating drop and the eventual jailing, that penalize the node through the unjailing fee, missing block rewards/fees and queue time for becoming eligible for consensus again.

Testing procedure

  • Standard system test

  • run with log trace for consensus

  • Custom system test with invalid signers

    • there is a separate branch from this feat with invalid signer node optimistic-consensus-sigcheck-malicious-node
    • start a separate node in consensus with this branch
  • TDB

ssd04 and others added 30 commits May 27, 2022 15:42
- do not pass multisigner as parameter since it is not needed anymore
- no need to verify agg sig when created only with valid sigs
…n-on-consensus

Optimistic signature aggregation by leader
@AdoAdoAdo
Copy link
Contributor Author

AdoAdoAdo commented Nov 14, 2022

@ssd04 @AdoAdoAdo
How many nodes can be dropped because of blacklisting at any moment? Could the blacklisting mechanism be somehow exploited to trigger consecutive node drops? (I'm asking without really understanding how the consensus works)

the mechanism shouldn't be susceptible to exploits, of this nature (to trigger node drops) when the nodes behave-follow the protocol, as the blacklisting relies on 2 signatures verification, where one of the signatures (p2p - ecdsa w/ secp256k1 verifies) but the BLS sig is invalid, which is clearly malicious behavior, and cannot happen when nodes follow the protocol.

As for how many nodes can be dropped (jailed due to the blacklisting) - there is no limit, as many as there are. Even if all nodes would be jailed, there is one protection where part of the jailed nodes are still able to participate in consensus in order to maintain/satisfy the 400 nodes/shard constraint

@ssd04 ssd04 changed the base branch from rc/v1.4.0 to rc/v1.5.0 December 5, 2022 15:22
@@ -22,4 +22,5 @@ message Message {
bytes AggregateSignature = 11;
bytes LeaderSignature = 12;
bytes OriginatorPid = 13;
bytes InvalidSigners = 14;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invalid indentation here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

singleSigner := sr.SingleSigner()
err = singleSigner.Verify(pubKey, cnsMsg.BlockHeaderHash, cnsMsg.SignatureShare)
if err != nil {
log.Trace("verifyInvalidSigner: confirmed that node provided invalid signature",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe turn this to debug level? This should not happen normally, and when it happens we should see the print only once/node/round

case <-ctx.Done():
log.Debug("peerBlacklist's go routine is stopping...")
return
case <-time.After(durationSweepP2PBlacklist):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a timer here

}

// BlacklistPeer will blacklist a peer for a certain amount of time
func (pb *peerBlacklist) BlacklistPeer(peer core.PeerID, duration time.Duration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a lot to remember where I'd seen this. Found it in the initP2PAntiFloodComponents function and in the peerTimeCache and p2pAntiflood structs. We might reuse this code in those antiflooding components one day.

…se-consensus-sigcheck

fixes after review: fix indentation + use timer in blacklist
gabi-vuls
gabi-vuls previously approved these changes Dec 6, 2022
@ssd04 ssd04 marked this pull request as ready for review December 6, 2022 14:18
iulianpascalau
iulianpascalau previously approved these changes Dec 6, 2022
ssd04
ssd04 previously approved these changes Dec 6, 2022
@iulianpascalau iulianpascalau dismissed stale reviews from ssd04, gabi-vuls, and themself via 2809726 December 7, 2022 07:54
@iulianpascalau iulianpascalau merged commit 44690a8 into rc/v1.5.0 Dec 7, 2022
@iulianpascalau iulianpascalau deleted the feat/optimise-consensus-sigcheck branch December 7, 2022 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants