
Add jitter to DKG start and upon processing phase 1 broadcast messages #1303

Merged: 26 commits merged into master on Sep 29, 2021

Conversation

@jordanschalm (Member) commented Sep 16, 2021:

Introduce random delays prior to expensive operations in the DKG:

  • Start - ~700ms
  • HandleBroadcastMessage - ~2.5s / message

Results Link

@jordanschalm jordanschalm marked this pull request as ready for review September 21, 2021 16:22
@codecov-commenter commented Sep 21, 2021:

Codecov Report

Merging #1303 (7edb23f) into master (7925ced) will decrease coverage by 0.00%.
The diff coverage is 89.18%.


@@            Coverage Diff             @@
##           master    #1303      +/-   ##
==========================================
- Coverage   55.39%   55.39%   -0.01%     
==========================================
  Files         510      510              
  Lines       31857    31892      +35     
==========================================
+ Hits        17648    17667      +19     
- Misses      11834    11851      +17     
+ Partials     2375     2374       -1     
Flag Coverage Δ
unittests 55.39% <89.18%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
module/dkg/controller_factory.go 0.00% <0.00%> (ø)
module/dkg/controller.go 74.71% <97.05%> (+5.91%) ⬆️
...sus/approvals/assignment_collector_statemachine.go 42.30% <0.00%> (-9.62%) ⬇️
...ngine/common/synchronization/finalized_snapshot.go 68.75% <0.00%> (-4.17%) ⬇️
admin/command_runner.go 78.51% <0.00%> (-1.49%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7925ced...7edb23f.

@tarakby (Contributor) left a comment:

Nice 👌🏼
I have a comment and a question:

  • I know 500µs is a default duration and won't be used in the real networks. Shouldn't we also set the mainnet and testnet config somewhere as part of this PR?
  • I suggest making the phase duration dependent on the new config, or at least adding a sanity check that the phase duration is at least the max delay plus some constant buffer. Can that be added to this PR?

Review threads: module/dkg/controller_test.go, module/dkg/controller.go, cmd/consensus/main.go (outdated)
// We would like to target spreading this cost over a 20 minute period in the
// average case.
//
// 500µs results in per-broadcast delays of max=11.25s, ave=5.625s.
Review comment (Member):

What is the 500µs? Or are you talking about the 500ms that is used as the DefaultBaseHandleBroadcastDelay?

Suggested change
// 500µs results in per-broadcast delays of max=11.25s, ave=5.625s.
// 500ms results in per-broadcast delays of max=11.25s, ave=5.625s.

@jordanschalm (Member, Author) replied:

DefaultBaseHandleBroadcastDelay is 500 microseconds, not 500 milliseconds (code)

dkgSize = 0
}

// m=b*n^2
Review comment (Member):

Don't quite understand this math. Could you explain ?

@jordanschalm (Member, Author) replied:

maximum_delay = base_delay * number_of_dkg_nodes ^ 2

Then we select the actual delay from [0, maximum_delay]

Review comment (Member):

So the reason for * number_of_dkg_nodes ^ 2 is because the DKG takes O(n^2) messages?
So an average delay of t per message would result in a total delay of t * n * n?

In your example, a delay of 500 microseconds would result in 0.0005 * 150 * 150 = 11.25 seconds total delay?

I don't get the n ^ 2 part in the math. To me, there are n^2 messages to exchange, because each node is sending n-1 messages to other nodes. However, I think each node is sending their messages concurrently. So a delay d added to sending each message would only result in a total delay of n * d rather than n^2 * d, no?

@jordanschalm (Member, Author) replied:

What this PR is trying to do is spread out the CPU-intensive operations required in the DKG for each node, not to spread out the actual broadcast messages. The broadcast messages all get sent at more or less the same time. Each node will receive n-1 broadcast messages, then will stagger the processing of them to minimize the impact on the rest of the system.

The reason we use n^2 is because the cost for a node to process an individual broadcast message scales quadratically with the number of nodes, and we want the aggregate amount of jitter introduced to be proportional to the aggregate amount of time spent processing broadcast messages.

@jordanschalm (Member, Author) commented Sep 21, 2021:

at least have a sanity check that the phase duration is at least the max delay plus some constant buffer

The duration is specified in the smart contract config, so it is tricky to add reliable sanity checks. We could for instance add a sanity check to the finalize step of bootstrapping, but this applies only to the first epoch following the spork -- after that the DKG phase length values come from the contract, so it provides a bit of a false sense of security.

What I would like to do instead, though I feel it's out of scope for this PR and for the upcoming spork, is to adjust the delays introduced based on the expected number of messages, the number of processed messages, and the number of views until the end of the phase. In other words, make the DKG participant software respond to the DKG parameters it is given as best as it can under the relevant timing constraints, rather than trying to enforce that the DKG parameters are consistent with hard-coded delay values.

500 micro sec is a default duration and won't be used in the real networks

My intention was actually to use this value across all networks. This is the coefficient of the delay computation, not the actual delay value (i.e. 500µs is b in b*n^2). There is reasoning for where this value comes from in the comments.

@tarakby (Contributor) commented Sep 21, 2021:

@jordanschalm

My intention was actually to use this value across all networks.

500 microseconds gives a max delay of 11.25s, which means processing the broadcasts would be spread out over 11.25s. To get a max delay of 20m, we would need a base b of 53ms. Did I misunderstand something?

@jordanschalm (Member, Author) commented Sep 21, 2021:

@tarakby

To get a max delay of 20mn, we would need a base b of 53ms. Did I misunderstand something?

We are introducing this delay before each message. We don't have one delay that is on average 20m -- we have many delays which in aggregate average to a total delay of ~20m.

From the comments:

// 500µs results in per-broadcast delays of max=11.25s, ave=5.625s.
// This results in total delays of max=~28m, ave=~14m, and total time to
// process all phase 1 broadcast messages of max=~34m, ave=~20m.

@tarakby (Contributor) commented Sep 21, 2021:

@jordanschalm

Now I understand; I was thinking the delays were concurrent and sampled in [0, 20m].

The issue with the sequential small delays is that we lose the uniformity of delays: consider for instance incoming message number 2. Each node would process it after delay_0 + processing_time + delay_1 + processing_time. The delay to process message 2 is effectively randomized by delay_0 + delay_1. Although delay_0 and delay_1 are uniformly sampled, delay_0 + delay_1 is far from uniform (think of the sum of two uniform dice throws and notice the result is not uniform). The bias gets worse with each message from 2 through 149. The bias means we lose the impact of the mitigation and we get more nodes processing messages at the same time.

My suggestion would be to remove the dependency between the sleeps by running them concurrently (one thread for each sleep). In that case, the delay should be sampled in [0, max_delay] where max_delay is something like 20m. What do you think?

@jordanschalm (Member, Author) commented Sep 22, 2021:

the delay should be sampled in [0, max_delay] where max_delay is something like 20mn. What do you think?

The reason I'm cautious about that approach is that it makes a stronger assumption about the timing of broadcast messages. Currently, if a few broadcast messages are sent later than expected in phase 1, we will wait at most a few extra seconds before processing those messages. The added delay is unlikely to cause us to not process them before the end of phase 1. If we select a delay in [0, 20m] for every message, late-arriving messages may "get unlucky" with a large delay and will not be processed before the end of phase 1.

The delay to process message 2 is basically randomized by delay_0 + delay_1. Although delay_0 and delay_1 are uniformly sampled, delay_0 + delay_1 is far from uniform (you can for instance think of the sum of 2 uniform dice throws and notice the result in not uniform). The bias gets worse after each message from 2 till 149. The bias means we loose the impact of the mitigation and we get more nodes processing messages at the same time.

It is true that the likelihood of processing a specific message is not uniformly distributed over the 20m period. But I'm not sure that's the right thing to measure. We don't care whether node A and node B are processing a particular message M at the same time. We care about whether node A and node B are processing any message at the same time, or how many nodes are processing any message at the same time. Are we optimally spreading the number of nodes processing a message at a given time, over the time available during phase 1 using this per-message delay? That I don't know (though probably not!), but it is close enough to mitigate the problem in the benchmarking.

@tarakby (Contributor) left a comment:

Approving this PR to unblock the spork; the code has been tested on benchnet and the Grafana logs have been checked.

Two items for improvement remain and could be implemented before a next spork:

  • The current PR implements a delay quadratic in the number of nodes. This is not needed with the current jitter heuristic (additive random delays before processing each broadcast); a linear jitter is enough. If we switch to a linear delay, the constant values would need to be adjusted.
  • The current delay heuristic leads to multiple nodes being busy at the same time (IMO). This happens only for successive broadcast messages, because a sum of uniform delays is not uniform. Two other possible heuristics have been discussed:
    • use a delay for each message, sampled uniformly in [0, 30m] (or a larger window), with an async sleep (separate thread) before processing each broadcast.
    • use a random delay before the very first broadcast message, sampled uniformly in [0, d] where d is relatively larger than a message processing time (2.5s). Use a constant delay before processing subsequent messages to free the CPU to work on HotStuff. Constant delays would keep nodes processing asynchronously, though this only works for broadcasts arriving early (messages arriving late would be processed at the same time by all nodes).

Review thread: module/dkg/controller.go (outdated)
Commits:

  • introduce a uniform random delay once, before processing the first DKG message, to avoid the clustering of processing seen with the previous approach (sum of uniform random delays)
  • introduce constant delays between subsequent DKG messages to avoid long continuous stretches of heavy resource usage by the DKG
@jordanschalm jordanschalm mentioned this pull request Sep 29, 2021
add test case for default delay base values
@zhangchiqing (Member) left a comment:

Looks good

Review thread: module/dkg/controller.go (outdated)
@jordanschalm (Member, Author) commented:
bors merge

@bors bot (Contributor) commented Sep 29, 2021:

@bors bors bot merged commit 13ef07a into master Sep 29, 2021
@bors bors bot deleted the jordan/5840-dkg-initial-delays branch September 29, 2021 21:56