New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add random delay to periodic peer update thread #1277
Conversation
We are observing synchronized goroutine spikes causing consensus finalization issues on large committees on benchet, with the same period as this peer update process. This adds a random delay before spawning the thread which requests updates, so that updates are not synchronised.
Codecov Report
@@ Coverage Diff @@
## master #1277 +/- ##
==========================================
- Coverage 54.59% 54.57% -0.03%
==========================================
Files 498 498
Lines 31533 31537 +4
==========================================
- Hits 17217 17212 -5
- Misses 11968 11977 +9
Partials 2348 2348
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
delay := time.Duration(mrand.Int63n(pm.peerUpdateInterval.Nanoseconds())) | ||
pm.unit.LaunchPeriodically(pm.RequestPeerUpdate, pm.peerUpdateInterval, delay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This delay becomes a static configuration parameter of the update loop of the node when inserted here.
Would it be possible to instead choose the delay as part of each run through the loop? That way a node isn't "stuck" with a long delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delay only happens once (here), before the static-period ticker starts. The time between each peer update is still 10m, it's just the time before the first peer update that can have a longer delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😲 got it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TYVM!
delay := time.Duration(mrand.Int63n(pm.peerUpdateInterval.Nanoseconds())) | ||
pm.unit.LaunchPeriodically(pm.RequestPeerUpdate, pm.peerUpdateInterval, delay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😲 got it!
bors merge |
We are observing synchronized goroutine spikes causing consensus finalization issues on large committees on benchnet, with the same period as this peer update process.
This PR:
engine.Unit
to exit goroutines launched withLaunchPeriodically
immediately, even if they are waiting for the initial delay period when the unit is torn downRun on Benchnet without this fix:
With this fix: