Add jitter to DKG start and upon processing phase 1 broadcast messages #1303
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1303      +/-   ##
==========================================
- Coverage   55.39%   55.39%   -0.01%
==========================================
  Files         510      510
  Lines       31857    31892      +35
==========================================
+ Hits        17648    17667      +19
- Misses      11834    11851      +17
+ Partials     2375     2374       -1
Nice 👌🏼
I have a comment and a question:
- I know 500µs is a default duration and won't be used in the real networks. Shouldn't we also set the mainnet and testnet config somewhere as part of this PR?
- I suggest making the phase duration dependent on the new config, or at least having a sanity check that the phase duration is at least the max delay plus some constant buffer. Can that be added to this PR?
module/dkg/controller.go
// We would like to target spreading this cost over a 20 minute period in the
// average case.
//
// 500µs results in per-broadcast delays of max=11.25s, ave=5.625s.
What is the 500µs? Or are you talking about the 500ms that is used as `DefaultBaseHandleBroadcastDelay`?
- // 500µs results in per-broadcast delays of max=11.25s, ave=5.625s.
+ // 500ms results in per-broadcast delays of max=11.25s, ave=5.625s.
`DefaultBaseHandleBroadcastDelay` is 500 microseconds, not 500 milliseconds (code)
dkgSize = 0
}

// m=b*n^2
Don't quite understand this math. Could you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maximum_delay = base_delay * number_of_dkg_nodes ^ 2
Then we select the actual delay from [0, maximum_delay]
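For concreteness, here is a rough sketch of that computation in Go (the function name and structure are illustrative only, not the actual controller code):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// preHandleDelay is an illustrative helper: maximum_delay = base_delay * n^2,
// and the returned delay is sampled uniformly from [0, maximum_delay].
func preHandleDelay(baseDelay time.Duration, dkgSize int) time.Duration {
	if dkgSize <= 0 || baseDelay <= 0 {
		return 0
	}
	maxDelay := time.Duration(dkgSize*dkgSize) * baseDelay // m = b * n^2
	return time.Duration(rand.Int63n(int64(maxDelay) + 1)) // uniform in [0, m]
}

func main() {
	// b = 500µs, n = 150  =>  max = 0.0005s * 150^2 = 11.25s, average ≈ 5.625s
	fmt.Println(preHandleDelay(500*time.Microsecond, 150))
}
```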
So the reason for `* number_of_dkg_nodes ^ 2` is because DKG takes O(N^2) messages? So an average per-message delay `t` would result in a total delay of `t * n * n`?
In your example, a delay of 500 microseconds would result in `0.0005 * 150 * 150` = 11.25 seconds total delay?
I don't get the `n^2` part in the math. To me, there are n^2 messages to exchange, because each node is sending `n-1` messages to other nodes. However, I think each node is sending their messages concurrently, so a delay `d` added to send each message would only result in a total delay of `n * d` rather than `n^2 * d`. No?
What this PR is trying to do is spread out the CPU-intensive operations each node must perform in the DKG, not to spread out the actual broadcast messages. The broadcast messages all get sent at more or less the same time. Each node will receive `n-1` broadcast messages, then will stagger the processing of them to minimize the impact on the rest of the system.
The reason we use `n^2` is because the cost for a node to process an individual broadcast message scales quadratically with the number of nodes, and we want the aggregate amount of jitter introduced to be proportional to the aggregate amount of time spent processing broadcast messages.
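As a rough illustration of that staggering (hypothetical names and types, not the controller's actual code), each node sleeps for a freshly sampled delay before processing each received broadcast:

```go
package main

import (
	"math/rand"
	"time"
)

// dkgBroadcast stands in for a phase 1 broadcast message (hypothetical type).
type dkgBroadcast struct{ origin int }

// processBroadcasts sketches the staggering described above: broadcasts all
// arrive at roughly the same time, but the node sleeps for a fresh random
// delay before processing each one, so the CPU-heavy processing is spread
// out over the phase rather than happening in one burst.
func processBroadcasts(msgs []dkgBroadcast, maxDelay time.Duration, process func(dkgBroadcast)) {
	for _, msg := range msgs {
		delay := time.Duration(rand.Int63n(int64(maxDelay) + 1))
		time.Sleep(delay) // jitter before the expensive processing step
		process(msg)
	}
}

func main() {
	msgs := make([]dkgBroadcast, 5)
	processBroadcasts(msgs, 50*time.Millisecond, func(dkgBroadcast) { /* expensive work */ })
}
```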
The duration is specified in the smart contract config, so it is tricky to add reliable sanity checks. We could for instance add a sanity check to the […].
What I would like to do instead, though I feel it's out of scope for this PR and for the upcoming spork, is to adjust the delays introduced based on the expected number of messages, the number of processed messages, and the number of views until the end of the phase. In other words, make the DKG participant software respond to the DKG parameters it is given as best it can under the relevant timing constraints, rather than trying to enforce that the DKG parameters are consistent with hard-coded delay values.
My intention was actually to use this value across all networks. This is the coefficient of the delay computation, not the actual delay value (i.e. 500µs is `b` in `m = b*n^2`).
500µs gives a max delay of 11.25s, which means processing the broadcasts would be spread out over 11.25s. To get a max delay of 20min, we would need a base […].
We are introducing this delay before each message. We don't have one delay that is on average 20min -- we have many delays which in aggregate average to a total delay of ~20min. From the comments: "We would like to target spreading this cost over a 20 minute period in the average case."
Now I understand, I was thinking the delays are concurrent and are sampled in [0, 20min]. The issue with the sequential small delays is that we lose the uniformity of delays: consider for instance incoming message number 2. Each node would process it after the sum of two random delays, and the sum of uniform delays is not uniform.
My suggestion would be to remove the dependency between the sleeps by running them concurrently (one thread for each sleep). In that case, the delay should be sampled in [0, max_delay] where max_delay is something like 20min. What do you think?
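A sketch of that concurrent variant (hypothetical names; this illustrates the suggestion above, not what the PR implements): each message gets an independently sampled delay and sleeps in its own goroutine, so the delays don't accumulate across messages.

```go
package main

import (
	"math/rand"
	"sync"
	"time"
)

// dkgBroadcast stands in for a phase 1 broadcast message (hypothetical type).
type dkgBroadcast struct{ origin int }

// processConcurrently sketches the suggested alternative: each message gets an
// independent delay sampled uniformly from [0, maxDelay] (e.g. ~20min) and
// sleeps in its own goroutine before being processed.
func processConcurrently(msgs []dkgBroadcast, maxDelay time.Duration, process func(dkgBroadcast)) {
	var wg sync.WaitGroup
	for _, msg := range msgs {
		wg.Add(1)
		go func(m dkgBroadcast) {
			defer wg.Done()
			time.Sleep(time.Duration(rand.Int63n(int64(maxDelay) + 1)))
			process(m)
		}(msg)
	}
	wg.Wait()
}

func main() {
	msgs := make([]dkgBroadcast, 5)
	processConcurrently(msgs, 100*time.Millisecond, func(dkgBroadcast) { /* expensive work */ })
}
```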
The reason I'm cautious about that approach is that it makes a stronger assumption about the timing of broadcast messages. Currently, if a few broadcast messages are sent later than expected in phase 1, we will wait at most a few extra seconds before processing those messages. The added delay is unlikely to cause us to not process them before the end of phase 1. If we select a delay in [0, ~20min] for a message that arrives late in the phase, we might not process it before the end of phase 1.
It is true that the likelihood of processing a specific message is not uniformly distributed over the 20min period. But I'm not sure that's the right thing to measure. We don't care whether node A and node B are processing a particular message M at the same time. We care about whether node A and node B are processing any message at the same time, or how many nodes are processing any message at the same time. Are we optimally spreading the number of nodes processing a message at a given time over the time available during phase 1 using this per-message delay? That I don't know (though probably not!), but it is close enough to mitigate the problem in the benchmarking.
Approving this PR to unblock the spork; the code has been tested on benchnet and the Grafana logs have been checked.
Two improvement items remain and could be implemented before a next spork:
- The current PR implements a delay that is quadratic in the number of nodes. This is not needed with the current jitter heuristic (additive random delays before processing each broadcast); a linear jitter is enough. If we switch to a linear delay, the constant values would need to be adjusted.
- The current delay heuristic leads to multiple nodes being busy at the same time (IMO). This happens only for successive broadcast messages, and it happens because the sum of uniform delays is not uniform. Two other possible heuristics have been discussed:
  - use a blocking delay for each message, sampled uniformly in [0, 30min] (or a larger window), as an async sleep (separate thread) before processing a broadcast.
  - use a random delay before the very first broadcast message, sampled randomly in [0, d] where d is relatively larger than a message processing time (2.5s), then use a constant delay before processing subsequent messages to free the CPU to work on HotStuff. Constant delays would keep the nodes' processing asynchronous, though this only works for broadcasts arriving early (messages arriving late would be processed at the same time by all nodes).
* introduce a uniform random delay once, before processing the first dkg message, to avoid the clustering of processing seen with the previous approach (sum of uniform random delays)
* introduce constant delays between subsequent dkg messages to avoid long continuous stretches of heavy resource usage by the dkg
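A sketch of that adopted scheme (hypothetical names and constants, not the actual controller code): one uniformly random delay before the first message, then a constant delay before each subsequent one.

```go
package main

import (
	"math/rand"
	"time"
)

// dkgBroadcast stands in for a phase 1 broadcast message (hypothetical type).
type dkgBroadcast struct{ origin int }

// handleWithHybridJitter sketches the scheme described above: sleep for a
// single uniformly random delay before processing the first broadcast, then a
// constant delay before each subsequent one, keeping nodes de-synchronised
// without long continuous stretches of heavy work.
func handleWithHybridJitter(msgs []dkgBroadcast, initialMax, subsequent time.Duration, process func(dkgBroadcast)) {
	for i, msg := range msgs {
		if i == 0 {
			time.Sleep(time.Duration(rand.Int63n(int64(initialMax) + 1)))
		} else {
			time.Sleep(subsequent)
		}
		process(msg)
	}
}

func main() {
	msgs := make([]dkgBroadcast, 3)
	handleWithHybridJitter(msgs, 10*time.Second, 2*time.Second, func(dkgBroadcast) { /* expensive processing */ })
}
```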
add test case for default delay base values
Looks good
Co-authored-by: Leo Zhang <zhangchiqing@gmail.com>
bors merge
Introduce random delays prior to expensive operations in the DKG:
- `Start` - ~700ms
- `HandleBroadcastMessage` - ~2.5s / message

Results: Link