clique: multiple out-of-turn blocks at same time despite random wiggleTime #22799

Open
thorstenhirsch opened this issue May 3, 2021 · 1 comment

System information

Geth version:

Version: 1.9.25-stable
Git Commit: e7872729012a4871397307b12cc3f4772ffcbec6
Git Commit Date: 20201211
Architecture: amd64
Protocol Versions: [65 64 63]
Go Version: go1.15.6
Operating System: linux
GOPATH=
GOROOT=go

OS & Version: RHEL 7.9

Expected behaviour

When a sealer node is down or unreachable for a longer period, out-of-turn signing keeps the network from stalling while every other sealer waits for the missing one: after a "wiggle" delay, the remaining sealers start sealing the block themselves. To keep all sealers from signing out of turn at the same moment, that delay is randomized:

wiggle := time.Duration(len(snap.Signers)/2+1) * wiggleTime
delay += time.Duration(rand.Int63n(int64(wiggle)))

Actual behaviour

Unfortunately, the randomization does not reliably spread the delays across the nodes. When out-of-turn signing happens often, the out-of-turn blocks also "collide" often, because different nodes end up with nearly the same delay.

Steps to reproduce the behaviour

Assume a network of 4 sealers with a block time of 5s in which 1 sealer is unreachable. An out-of-turn signing request then occurs every 20s:

TRACE[05-03|10:40:46.039] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:41:06.541] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:41:26.034] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:41:46.019] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:42:06.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:42:26.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:42:46.034] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:43:06.039] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:43:26.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:43:46.040] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:44:06.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:44:26.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:44:46.035] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:45:06.525] Out-of-turn signing requested            wiggle=1.5s

With wiggle=1.5s, I'd expect the delays on the 3 remaining nodes to differ by less than 0.1s in roughly every 5th attempt (I'm not sure my mental calculation is correct, but I'm fairly sure it happens often).

Suggestion

Maybe we can use the position of each sealer in the snap.Signers array to get a reliably different delay on each node. However, we also have to make sure that the 1st sealer in the array doesn't always get the minimum delay, because then the same node would always seal the out-of-turn blocks. Maybe also factor in blockNumber % (number-of-sealers + 1) or something similar.

thorstenhirsch commented May 6, 2021

In the meantime we've voted out one sealer, so three are left. All sealers now have direct connections to each other, but the out-of-turn signing requests still occur after every round (which is now every 15s):

TRACE[05-06|09:55:44.024] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:55:44.526] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:55:59.518] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:56:14.519] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:56:29.531] Out-of-turn signing requested            wiggle=1s

P.S.: I don't know why there are two Out-of-turn signing requests within 500ms. Might be another issue.
