Commit certificate size #215

sergefdrv · 2021-11-30T14:48:23Z

Hi, I would like to hear your opinion about the following.

There are two important kinds of quorums that ensure safety and liveness properties in MinBFT. Those are the acceptance quorum and the new-view certificate. The acceptance quorum is a collection of Prepare/Commit messages that is sufficient for request execution. The new-view certificate is a collection of ViewChange messages in a NewView message that is sufficient to justify a transition into a new view.

To ensure safety, any acceptance quorum must intersect with any new-view certificate in at least one replica. The intersection guarantees that any potentially executed request will propagate to the new view, even if the new primary is Byzantine. Moreover, to ensure liveness, the quorum size cannot exceed n-f, since up to f replicas may be faulty and may not generate any messages. On the other hand, for the sake of safety, the quorum size cannot be less than f+1, since it must include at least one correct replica.

In summary, the following conditions must be satisfied:

1. The acceptance quorum size must be in the range of (f, n-f]
2. The new-view certificate size must be in the range of (f, n-f]
3. The sum of the acceptance quorum and new-view certificate sizes must be greater than n
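
As a minimal sketch (not code from the MinBFT repository), the three conditions can be checked mechanically. The function names are hypothetical and chosen only for illustration.

```python
def in_range(size, n, f):
    """True if size lies in the half-open range (f, n-f]."""
    return f < size <= n - f


def sizes_are_valid(n, f, commit_size, new_view_size):
    """Check the three conditions for one pair of sizes."""
    return (
        in_range(commit_size, n, f)            # acceptance quorum in (f, n-f]
        and in_range(new_view_size, n, f)      # new-view certificate in (f, n-f]
        and commit_size + new_view_size > n    # the two must intersect
    )


# With n = 2f+1 the only admissible pair is (f+1, f+1).
n, f = 5, 2
valid = [(a, b) for a in range(1, n + 1) for b in range(1, n + 1)
         if sizes_are_valid(n, f, a, b)]
print(valid)  # [(3, 3)]
```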

The paper only discusses the case when n=2f+1. In that case, the only possible choice in the range of (f, n-f] is f+1, which also satisfies condition 3. However, in the generic case when n>2f, one could consider the following options:

1. The acceptance quorum size of f+1 and the new-view certificate size of n-f
2. The acceptance quorum size of n-f and the new-view certificate size of f+1
3. The acceptance quorum size of floor(n/2)+1 and the new-view certificate size of floor(n/2)+1
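
To make the three options concrete, here is a small sketch that computes the sizes for a given n and f. The helper name `quorum_options` and the example values n=10, f=2 are assumptions made for illustration.

```python
def quorum_options(n, f):
    """Return (acceptance quorum size, new-view certificate size) per option."""
    if n <= 2 * f:
        raise ValueError("need n > 2f")
    majority = n // 2 + 1
    return {
        "option 1": (f + 1, n - f),
        "option 2": (n - f, f + 1),
        "option 3": (majority, majority),
    }


n, f = 10, 2  # hypothetical configuration
for name, (commit, new_view) in quorum_options(n, f).items():
    # The last column confirms that the sum exceeds n.
    print(name, commit, new_view, commit + new_view > n)
```

For n=2f+1 all three options collapse to the same pair (f+1, f+1).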

There is no difference in terms of communication complexity in a stable view since all backup replicas act in the same way, i.e. the protocol still follows an all-to-all communication pattern. However, the choice may affect view-change overhead, fault tolerance in a stable view, and how well slow replicas can keep up.

Choice 1 would be optimal for graceful execution since it can proceed with the minimal number of received messages in a stable view. Thus it provides minimal latency and even allows the system to withstand up to n-f-1 faulty replicas in a stable view. However, it may incur significantly higher overhead in view change due to the bigger new-view certificate size, since each ViewChange message in the new-view certificate carries the protocol history of the corresponding replica.

Choice 2 would be optimal for view change, due to the minimal new-view certificate size, but the latency in a stable view would be limited by the (f+1)-th slowest replica. On the other hand, it would give slow replicas the best chance to keep up with the protocol execution.

Choice 3 would represent a simple form of compromise. It may give moderately better latency and fault tolerance in a stable view than choice 2, while moderately reducing view-change overhead compared with choice 1. On the other hand, it would still give slow replicas less chance to keep up with the protocol execution than choice 2.
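
The latency effect of the three choices can be illustrated with a toy model, assuming a request is accepted as soon as messages from a quorum of replicas have arrived. The arrival times below are made-up numbers, and `commit_latency` is a hypothetical helper, not part of the implementation.

```python
def commit_latency(delays, quorum):
    """Time at which the quorum-th fastest message has arrived."""
    return sorted(delays)[quorum - 1]


# Hypothetical arrival times for n = 7 replicas; two replicas are slow.
delays = [0.0, 1.0, 1.2, 1.5, 2.0, 9.0, 12.0]
n, f = len(delays), 2

print(commit_latency(delays, f + 1))       # choice 1: 1.2
print(commit_latency(delays, n - f))       # choice 2: 2.0 (the (f+1)-th slowest)
print(commit_latency(delays, n // 2 + 1))  # choice 3: 1.5
```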

Would you have a preferred choice or any additional concern?

nhoriguchi · 2021-12-01T02:36:41Z

Although the guarantees on safety and liveness are defined by the number of faulty replicas, MinBFT seems to perform best only under the stricter condition that the network is stable enough. My point is that if the network is slow or unstable, the MinBFT cluster does not perform well whether view change happens frequently or not, so optimizing view-change overhead alone might not help us solve the situation.

So keeping the acceptance quorum minimal (option 1) makes the most sense to me (for now).

However it may incur significantly higher overhead in view change due to the bigger new-view certificate size, since each ViewChange message in the new-view certificate carries the protocol history of the corresponding replica.

Maybe we can reduce the size of the ViewChange message by using checkpoints to keep the message log small, although checkpointing is not available now. So will this be solved eventually?

5 replies

sergefdrv Dec 1, 2021
Author

Maybe we can reduce the size of ViewChange message by controlling checkpoint ...

This is a good point. However, I suspect that the checkpointing feature may be quite challenging to implement in MinBFT. My biggest concern is justifying missing certified messages discarded by backup replicas on stable checkpoints, given that UI counter values assigned by different replicas may diverge after view change. It is not quite clear to me how a checkpoint certificate could justify the discarded messages for all replicas when Commit messages from different backup replicas corresponding to the same primary's proposal, and therefore the same checkpoint, happen to get assigned distinct UI counter values.

The Hybster paper also raises similar concerns:

For that purpose, MinBFT uses the counter values of its trusted subsystem USIG to force even faulty replicas to provide a complete history of all outgoing messages. That way, all potentially executed consensus instances are revealed during a view-change. With a mere equivocation detection and a two-phase ordering, this would result in an arbitrarily complex view-change protocol. If a faulty leader sent different PREPARE messages for one consensus instance, this could only be detected by comparing the histories provided by different replicas. Therefore, MinBFT also uses a form of equivocation prevention for PREPAREs: It does not use dedicated order numbers but relies on the value of the counter certificate to determine the total order. Nevertheless, for all other protocol messages, the described detection mechanism is used, which may lead to complex situations during the view-change. Also the checkpointing protocol employed to discard obsolete messages becomes more complex since replicas have to ensure that they are always able to provide a complete history of outgoing messages. This, however, makes it difficult to guarantee that these histories remain bounded in size when a view-change takes multiple rounds to find a new leader.

nhoriguchi Dec 2, 2021

Thank you for the elaboration.

My biggest concern is justifying missing certified messages discarded by backup replicas on stable checkpoints, given that UI counter values assigned by different replicas may diverge after view change. It is not quite clear to me how a checkpoint certificate could justify the discarded messages for all replicas when Commit messages from different backup replica corresponding to the same primary's proposal, and therefore the same checkpoint, happen to get assigned distinct UI counter values.

Maybe I am missing something, but I thought that a view-change message includes the message log only since the last checkpoint, so any "missing certified messages discarded by backup replicas on stable checkpoints" would not affect view change. Your concern above seems to refer to the situation (sorry if not) where a checkpoint happens after a view change that caused the "diverged UI counter" situation? If that's true, I understand that implementing checkpointing is difficult.

Anyway, I'm not sure whether the diverged UI counter issue is a potential or a real one. The affected request shouldn't reach consensus, and the whole consensus process needs to be retried by the primary of the new view (with the new UI value), so the situation seems to be handled within the protocol. Please correct me if I misunderstand.

sergefdrv Dec 2, 2021
Author

It is correct that, according to the paper, ViewChange messages should only include certified messages since the latest stable checkpoint. The ViewChange message should also include the checkpoint certificate to justify the missing older messages.

Until the first view change, the primary replica only generates Prepare messages, whereas backup replicas only generate corresponding Commit messages referring to (embedding) those Prepare messages. Backup replicas process Prepare messages sequentially in order of the UI values assigned by the primary. Therefore they also assign sequential UI values to the corresponding generated Commit messages in the same order. Thus UI values of all Prepare and corresponding Commit messages match one-to-one in the initial view.

Given that view change is triggered asynchronously and independently in each replica (upon reception of a quorum of ReqViewChange messages), different replicas may generate ViewChange messages with distinct UI values, i.e. after having generated a different number of Commit messages. In that case, the subsequently generated corresponding NewView/Prepare/Commit messages will diverge in UI values. So there is no guarantee for one-to-one match in the UI values assigned by different replicas to NewView/Prepare/Commit messages after a view change.
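
The divergence can be illustrated with a toy model in which each replica's trusted counter simply increments for every certified message. This is only a sketch under that assumption; the `Usig` class and `run_backup` function are hypothetical stand-ins, not the actual USIG interface.

```python
class Usig:
    """Toy stand-in for the trusted counter: each call yields the next UI value."""

    def __init__(self):
        self.counter = 0

    def create_ui(self):
        self.counter += 1
        return self.counter


def run_backup(commits_before_view_change, commits_after):
    """Return the log of (message type, view, UI value) for one backup replica."""
    usig = Usig()
    log = []
    for _ in range(commits_before_view_change):
        log.append(("Commit", 0, usig.create_ui()))
    log.append(("ViewChange", 1, usig.create_ui()))
    for _ in range(commits_after):
        log.append(("Commit", 1, usig.create_ui()))
    return log


# Replica a leaves view 0 after 3 Commits, replica b after 5 Commits.
a = run_backup(3, 2)
b = run_backup(5, 2)
print(a[:3] == b[:3])  # True: UI values match one-to-one in the initial view
print(a[3], b[5])      # ('ViewChange', 1, 4) ('ViewChange', 1, 6)
print(a[4], b[6])      # ('Commit', 1, 5) ('Commit', 1, 7): same proposal, distinct UIs
```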

This does not create any fundamental problem for the normal execution or view change, and, in fact, is already taken into account in the current code. However, it is not yet quite clear to me how a stable checkpoint certificate could possibly justify the missing discarded messages in every replica's message log in a way that Byzantine replicas are unable to successfully perform message equivocation. The essence of the problem is that, on one hand, UI values may diverge, whereas on the other hand, to guarantee liveness, correct replicas cannot wait until having received and confirmed UI values from all other replicas - they must proceed even if up to f other replicas are faulty. Solving this problem would require more thought.

nhoriguchi Dec 3, 2021

... However, it is not yet quite clear to me how a stable checkpoint certificate could possibly justify the missing discarded messages in every replica's message log in a way that Byzantine replicas are unable to successfully perform message equivocation. The essence of the problem is that, on one hand, UI values may diverge, whereas on the other hand, to guarantee liveness, correct replicas cannot wait until having received and confirmed UI values from all other replicas - they must proceed even if up to f other replicas are faulty. Solving this problem would require more thought.

Thanks for the details; I updated my understanding of the problem (sorry if it is still irrelevant). According to the MinBFT paper, a Checkpoint message conveys UI_latest (the latest executed request) and UI_j (the UI of the Checkpoint message from the backup replica). The checkpoint operation is executed once f+1 Checkpoint messages are collected, so only the message logs of replicas whose Checkpoint messages are included in the checkpoint certificate are justified by UI_j. The message logs of the remaining n-f-1 replicas are not justified.

How to handle the remaining (late-arriving) Checkpoint messages does not seem to be documented in the paper, but adding them to the checkpoint certificate afterward and/or updating internal state about other replicas' latest UI values might be a possible direction?

sergefdrv Dec 3, 2021
Author

Since this thread departed from the original topic, I opened a separate discussion to continue: #218.

ynamiki · 2021-12-01T11:30:25Z

I completely agree with Naoya's comment. It is my understanding that we can expect replicas to work normally in most cases (i.e. a view change does not occur frequently), so it seems reasonable to me to choose option 1, which is optimal for such a situation.

sergefdrv · 2021-12-01T15:49:57Z

@nhoriguchi @ynamiki Thanks for your feedback!

In fact, the testing code in #214 assumes option 1 with the new-view certificate size equal to n-f.

On second thought, it is worth mentioning that there might not be any notable difference among the options in a stable system when no timeout occurs and at most f replicas are faulty or significantly slower. This is because all replicas follow the all-to-all communication pattern anyway. Option 1 would only improve performance if more than f replicas are significantly slower. Nevertheless, it would still be able to mask up to n-f-1 faulty replicas in a stable view (which may be more than the f strictly required by the protocol guarantees).

Moreover, with option 1, if more than f replicas are significantly slower and become more and more delayed, they might eventually time out and trigger view change, thus unnecessarily interrupting the stable view. This would be less likely to happen with option 2. We could try to overcome this by increasing the size of the view-change certificate (a collection of ReqViewChange messages) to match the new-view certificate size.

2 replies

sergefdrv Dec 1, 2021
Author

From another perspective, if we consider significantly slower replicas as temporarily faulty, then the choices seem to represent different trade-offs: additional fault tolerance in a stable view at the price of higher overhead in view change.

sergefdrv Dec 3, 2021
Author

Hmm, we may also want to distinguish between replicas delayed because of high communication latency and those delayed because of low communication throughput. In the former case, the delay would not increase over time, but may contribute to the overall latency.

sergefdrv · 2021-12-10T13:55:47Z

Apparently, the paper refers to the "acceptance quorum" as "commit certificate", so I adjusted the discussion's subject.

sergefdrv · 2021-12-10T14:36:02Z

According to my current understanding, if all correct replicas are equally fast then, in any case, there should be no notable performance difference in a stable view. However, a smaller commit certificate size may allow the system to temporarily mask more than f faulty backup replicas, though this cannot be guaranteed and users should not rely on this. On the other hand, it determines how many replicas may lag behind: up to n-|Q| replicas can be permanently delayed, where |Q| is the commit certificate size.

Considering a potential user's intuition about the semantics of the protocol parameters, should we allow more than f correct replicas to permanently lag behind?
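
To put numbers on the n-|Q| bound, here is a quick sketch with a hypothetical configuration of n=10 and f=2. The helper name is made up for illustration.

```python
def max_permanently_lagging(n, commit_size):
    """Upper bound n - |Q| on replicas that a commit certificate never needs."""
    return n - commit_size


n, f = 10, 2  # hypothetical configuration with n > 2f+1
for label, q in [("f+1", f + 1), ("floor(n/2)+1", n // 2 + 1), ("n-f", n - f)]:
    lagging = max_permanently_lagging(n, q)
    print(f"|Q| = {label} = {q}: up to {lagging} lagging, exceeds f: {lagging > f}")
```

In this example only |Q| = n-f keeps the number of possibly lagging replicas within f.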

luthlee · 2021-12-16T07:50:05Z

On the other hand, for the sake of safety, the minimum quorum size cannot be less than f+1, since it must include at least one correct replica.
In summary, the following conditions must be satisfied:
The acceptance quorum size must be in the range of (f, n-f]
The new-view certificate size must be in the range of (f, n-f]
The sum of the acceptance quorum and new-view certificate sizes must be greater than n

The quorum size must be at least ceil((n+1)/2) to ensure that at least 1 node, whether Byzantine or benign, is in the intersection of any 2 quorums. f+1 is the case when n=2f+1.

The paper only discusses the case when n=2f+1. In that case, the only possible choice in the range of (f, n-f] is f+1, which also satisfies the condition 3. However, in the generic case when n>2f, one could consider the following options:

May I ask what the intention is of having more nodes than necessary? That is, choosing n nodes where 2f+1 < n < 2(f+1)+1? The consensus protocol will definitely be slower compared to n=2f+1, and it does not give you more security by tolerating more faulty nodes than f.

There is no difference in terms of communication complexity in a stable view since all backup replicas act in the same way, i.e. the protocol still follows all-to-all communication pattern. However, the choice may affect view-change overhead, fault tolerance in a stable view, and slow replicas.

The quorum size does not influence the message sending process in a stable view, as you said, but it will influence the latency of reaching consensus, as a node returns earlier with a smaller quorum of COMMIT messages. I think in a network with different connection latencies, a smaller quorum keeps the closely connected nodes moving forward faster than the rest, while a bigger quorum tries to keep more nodes synchronised; this also means view change is less likely to be triggered. Depending on the network connectivity, if all nodes are more or less well synced, then I would go for a smaller quorum.

1 reply

sergefdrv Dec 16, 2021
Author

The quorum size must be at least ceil((n+1)/2) to ensure at least 1 node, whether byzantine or benign, is in the intersection of any 2 quorums

We always have the primary in the intersection of any two commit certificates from the same view because they necessarily include the primary's Prepare message.
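
This can be checked by brute force for a small configuration. The sketch below assumes that replica 0 is the primary and models a certificate simply as the set of replicas that contributed to it; `commit_certificates` is a hypothetical helper.

```python
from itertools import combinations


def commit_certificates(n, size, primary=0):
    """Yield every set of `size` replicas that includes the primary."""
    backups = [r for r in range(n) if r != primary]
    for others in combinations(backups, size - 1):
        yield frozenset((primary, *others))


n, f = 6, 2
certs = list(commit_certificates(n, f + 1))
# Every pair of commit certificates from the same view shares the primary.
all_intersect = all(a & b for a, b in combinations(certs, 2))

# Without the primary requirement, two sets of size f+1 can be disjoint.
plain = [frozenset(c) for c in combinations(range(n), f + 1)]
some_disjoint = any(not (a & b) for a, b in combinations(plain, 2))

print(all_intersect, some_disjoint)  # True True
```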

the intention of having more nodes than necessary

Suppose there is an even number of stakeholders (e.g. 4) who want to run MinBFT with an equal number of replicas controlled by each stakeholder. If we set f to the highest possible value then we end up with n=2f+2. I think option 3, with both quorum sizes floor(n/2)+1, would work best in this case since it is independent of f.

If we set n in the range of [2f+1, 2f+2] then the difference between all the possible quorum sizes is at most one replica, which is perhaps not significant. However, if our trust assumptions are more optimistic, i.e. we set f well below (n-1)/2, then we may consider choosing between different possible quorum sizes.
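
A short sketch shows how the set of admissible sizes in (f, n-f] grows with n for a fixed f. The function name and the sample values are hypothetical.

```python
def admissible_sizes(n, f):
    """All quorum sizes in the half-open range (f, n-f]."""
    return list(range(f + 1, n - f + 1))


f = 2
print(admissible_sizes(2 * f + 1, f))  # [3]
print(admissible_sizes(2 * f + 2, f))  # [3, 4]
print(admissible_sizes(10, f))         # [3, 4, 5, 6, 7, 8]
```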

a smaller quorum keeps the closely connected nodes moving forward faster than the rest, while a bigger-sized quorum tries to keep more nodes synchronized

I think you summarized the trade-off very well here.

Depending on the network connectivity ...

We might want to make the commit certificate size a configurable parameter in the future. If so, the discussion here is about a good default value for that parameter.

sergefdrv · 2021-12-16T11:32:19Z

An important outcome from this discussion for me was that not only any new-view certificate must intersect with any commit certificate, but also that any view-change certificate (a collection of ReqViewChange messages) must intersect with any commit certificate. Otherwise, slower replicas could keep triggering view change. The changes in #222 already take this into account.
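
As a rough illustration of the intersection requirement (not necessarily how #222 implements it): two sets of replicas out of n are guaranteed to share a member exactly when their sizes sum to more than n. The helper below is hypothetical and derives the smallest certificate size that must intersect a given commit certificate size.

```python
def min_intersecting_size(n, commit_size):
    """Smallest size whose sum with commit_size exceeds n."""
    return n - commit_size + 1


n, f = 10, 2  # hypothetical configuration
print(min_intersecting_size(n, f + 1))       # 8, i.e. n-f
print(min_intersecting_size(n, n - f))       # 3, i.e. f+1
print(min_intersecting_size(n, n // 2 + 1))  # 5
```

The same bound applies to both the new-view certificate and the view-change certificate.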
