-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BFT-464: Twins generator #116
Conversation
|
||
/// The number of signatures required for a high-vote, ie. `2 * f + 1` | ||
pub fn subqourum_size(&self) -> usize { | ||
self.num_faulty() * 2 + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be self.num_fault() * 2 / 5 + 1
for it not to degenerate into 1 if num_replicas <= 5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a degenerate case. If you want to change it, you need to prove correctness of the algorithm for all possible values of n.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we start with n = 5f+1
and therefore f = (n-1)/5
. @brunoffranca says the correct formula would be sq = n-3*f
, so let's see:
sq = n-3*f
sq = n - 3 * (n-1)/5
sq = n - 3/5n + 3/5
sq = 2/5n + 3/5
Now let's start from the current sq = 2f+1
:
sq = 2f+1
sq = 2*(n-1)/5 + 1
sq = 2/5*n -2/5 + 1
sq = 2/5n + 3/5
I believe these are equivalent in mathematical terms, however due to the integer arithmetic, for n<6 we have f=0, but that's just an artifact of how computers work.
Similarly how the spec says the commit quorum size is 4f+1, when n=5 it gives 1, which is clearly wrong, for the same reason I think it's wrong for 2f+1, except we use it for the more difficult to understand subquorum, where nobody who sees it first is really sure if it's an optimisation or a required for safety.
Here are the values for the different values of n
:
>>> for t in [(n, f, 2*f+1, 2*n//5+1, n-3*f) for (n, f) in [(n, (n-1)//5) for n in range(1, 15)]]: print("n={} f={} sq_2f1={} sq_25n1={} sq_n3f={}".format(*t))
...
n=1 f=0 sq_2f1=1 sq_25n1=1 sq_n3f=1
n=2 f=0 sq_2f1=1 sq_25n1=1 sq_n3f=2
n=3 f=0 sq_2f1=1 sq_25n1=2 sq_n3f=3
n=4 f=0 sq_2f1=1 sq_25n1=2 sq_n3f=4
n=5 f=0 sq_2f1=1 sq_25n1=3 sq_n3f=5
n=6 f=1 sq_2f1=3 sq_25n1=3 sq_n3f=3
n=7 f=1 sq_2f1=3 sq_25n1=3 sq_n3f=4
n=8 f=1 sq_2f1=3 sq_25n1=4 sq_n3f=5
n=9 f=1 sq_2f1=3 sq_25n1=4 sq_n3f=6
n=10 f=1 sq_2f1=3 sq_25n1=5 sq_n3f=7
n=11 f=2 sq_2f1=5 sq_25n1=5 sq_n3f=5
n=12 f=2 sq_2f1=5 sq_25n1=5 sq_n3f=6
n=13 f=2 sq_2f1=5 sq_25n1=6 sq_n3f=7
n=14 f=2 sq_2f1=5 sq_25n1=6 sq_n3f=8
I think if n-3f
is correct and 2f+1
is correct then the middle ground must be correct too. You can see they don't give the exact same results all across the board, but of course the starkest difference is below n=6.
If it's a safety feature, ie. if there is exactly one highest subquorum then you must repropose it, then I'm a bit worried that if the sq=1, it's more likely that we don't repropose because we see two of them, and lose this security. I don't have an example for it from the top of my head, but it seems more intuitive for the other values of 2f+1 that if there are two of them then there surely wasn't any quorum. For example if n=5 and a leader go isolated and voted for their own proposal, but missed the fact that there was a quorum for a value already in the previous round.
After working through some examples I can't find any counter examples which would leave to a safety violation: if there was enough votes for a QC in a round, at least 2f+1 of them will show up in the next round (via the timeout QC) and there won't be any other SQC; if there are multiple SQCs present then there wasn't any QC. So the only difference is whether we broadcast a fresh payload, or try to repropose the highest view that the SQC has seen.
// Prune partitionings so that all of them have at least one where a quorum is possible. | ||
// Alternatively we could keep all and make sure every scenario has eventually one with a quorum, for liveness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to consider here would be how the network is implemented in the tests. My idea was that I'd look at the view
of the message, and then look at the partitioning in that round, and deliver to all nodes who are in the same partition as the sender.
The ChonkyBFT spec repeats timeouts periodically, however if every node would be in a partition with a size less than 4f+1, then they would never have a TimeoutQuorum, and would never move on to the next view.
Whereas if at least one partition has 4f+1 views, then they can time out, and in the next view they might be broadcasting to their stranded brethren, who at that point can move on to the future view on which they now observe a quorum in the justification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively the network would have to contain some logic to eventually un-partition views, to behave in line with the assumption that all messages are delivered eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The currently implemented network semantics is that all (consensus) messages are delivered eventually, unless overridden by a newer message (see MessagePool for exact semantics).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best approach IMO is to unpartition every scenario at the end. Partitions are a temporary network condition anyway, with these tests we want to see if the consensus can recover from a partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Approved!
Co-authored-by: Bruno França <bruno@franca.xyz>
Co-authored-by: Bruno França <bruno@franca.xyz>
What ❔
Implements utilities to support generating Twins scenarios
Liveness
I haven't yet added the the logic for creating rounds where there is no chance for a quorum in any of the partitions. It would be easy to do, but I'd like to ask your opinion on how many to mix in, or whether we should always end with a potential quroum, etc. Perhaps just a probability of a no-quorum round?
Why ❔
So that we can generate the combinations and permutations that we're going to execute with the mock network.