clique: multiple out-of-turn blocks at same time despite random wiggleTime #22799

thorstenhirsch · 2021-05-03T09:54:21Z

System information

Geth version:

Version: 1.9.25-stable
Git Commit: e7872729012a4871397307b12cc3f4772ffcbec6
Git Commit Date: 20201211
Architecture: amd64
Protocol Versions: [65 64 63]
Go Version: go1.15.6
Operating System: linux
GOPATH=
GOROOT=go

OS & Version: RHEL 7.9

Expected behaviour

In case of a persistent network problem or downtime of a sealer node, out-of-turn signing keeps the network from stalling with every sealer waiting for the missing one: after a "wiggle" delay, the other sealers start sealing a new block themselves. To keep all sealers from signing out-of-turn at the same time, the delay is randomized:

wiggle := time.Duration(len(snap.Signers)/2+1) * wiggleTime
delay += time.Duration(rand.Int63n(int64(wiggle)))

Actual behaviour

Unfortunately, the randomization doesn't reliably spread the delays across the nodes. When out-of-turn signing is requested often, there are also many "collisions" between out-of-turn blocks, because different nodes happen to calculate nearly the same delay.

Steps to reproduce the behaviour

Let's assume our network has 4 sealers, a block time of 5s, and one unreachable sealer. Then out-of-turn signing is requested every 20s, i.e. once per round of 4 blocks:

TRACE[05-03|10:40:46.039] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:41:06.541] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:41:26.034] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:41:46.019] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:42:06.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:42:26.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:42:46.034] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:43:06.039] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:43:26.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:43:46.040] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:44:06.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:44:26.024] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:44:46.035] Out-of-turn signing requested            wiggle=1.5s
TRACE[05-03|10:45:06.525] Out-of-turn signing requested            wiggle=1.5s

We can expect that with wiggle=1.5s, the delays of two of the 3 nodes differ by less than 0.1s in roughly every 5th attempt (I'm not sure my mental calculation is correct, but I'm fairly sure it happens often).

Suggestion

Maybe we can use the position of each sealer in the snap.Signers list to get a reliably different delay for each node. However, we also have to make sure that the first sealer in the list doesn't always get the minimum delay; otherwise the same node would always seal the out-of-turn blocks. Maybe also apply blockNumber % (number-of-sealers + 1) or something similar.


thorstenhirsch · 2021-05-06T08:00:48Z

In the meantime we've voted out one sealer, so three are left. All sealers now have direct connections to each other, but out-of-turn signing is still requested after every round (now every 15s):

TRACE[05-06|09:55:44.024] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:55:44.526] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:55:59.518] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:56:14.519] Out-of-turn signing requested            wiggle=1s
TRACE[05-06|09:56:29.531] Out-of-turn signing requested            wiggle=1s

P.S.: I don't know why there are two out-of-turn signing requests within about 500ms. That might be a separate issue.

thorstenhirsch added the type:bug label May 3, 2021

sjehan mentioned this issue May 23, 2022

Better hardware work requirements for carbon footprint and for security ethereum-pocr/ethereum-pocr.github.io#11

Open
