INDY-1290: Design for implementing PBFT view change in indy plenum #673
…blems Signed-off-by: Sergey Khoroshavin <sergey.khoroshavin@dsr-corporation.com>
design/view-change.md
> in _P_ and _Q_ are for view _v_ or less. If this is correct replica sends
> message _VIEW-CHANGE-ACK(j, i, v+1, vd)_ to new primary.
>
> - New primary collects _VIEW-CHANGE_ + corresponding _2f-1 VIEW-CHANGE-ACK_
Should be n-f-1, not 2f-1, since the number of replicas in our pools may not be a multiple of 3f+1.
Actually n-f-2 in this case. Going to fix this.
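The quorum arithmetic being discussed can be sketched as follows. This is an illustrative helper, not plenum code; the function names are made up for this sketch:

```python
# Illustrative quorum helpers for a pool of n nodes tolerating f faults.
# Names are hypothetical; plenum's actual quorum logic lives elsewhere.

def max_faulty(n: int) -> int:
    # Largest f such that n >= 3f + 1.
    return (n - 1) // 3

def view_change_ack_quorum(n: int) -> int:
    # VIEW-CHANGE-ACKs the new primary needs for a VIEW-CHANGE from
    # replica j: everyone except f faulty nodes, j itself and the primary.
    return n - max_faulty(n) - 2

def view_change_cert_quorum(n: int) -> int:
    # Replicas that must report the same VIEW-CHANGE (n-f, not 2f+1).
    return n - max_faulty(n)
```

For n = 4 (f = 1) this gives an ACK quorum of 1 and a certificate quorum of 3; for n = 5, still f = 1, the certificate quorum is 4, while 2f+1 would give only 3, which illustrates why n-f is the safe choice when n is not exactly 3f+1.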
design/view-change.md
> - New primary collects _VIEW-CHANGE_ + corresponding _2f-1 VIEW-CHANGE-ACK_
>   messages for each replica _i_, which (when considering _VIEW-CHANGE-ACK_
>   which new primary could send to itself) form a view-change certificate, so
>   it can be sure that _2f+1_ replicas have same _VIEW-CHANGE_ message for
n-f must be used instead of 2f+1 everywhere.
design/view-change.md
> message each replica sets stable checkpoint to _(cn, cd)_ from _X_ and
> records all requests in _X_ as pre-prepared. Backup replicas also broadcast
> _PREPARE_ messages for these requests. After that processing goes on in
> normal state. There are some caveats however.
So, does it mean that the View Change is assumed to be finished now, and COMMITs will be sent and processed in the new View as in the normal case?
Yes, it does. Moreover, not only COMMITs but also PREPAREs will be processed as in the normal case.
design/view-change.md
> resetting ppSeqNo when entering new view, otherwise it will take probably
> unacceptable amount of time to prove that modified algorithm is correct,
> or go on without such proof, which will seriously undermine confidence in
> it.
More items to be considered against our implementation:
- How should we deal with optimistically applied 3PC batches when receiving PRE-PREPAREs? Is it OK if we don't reset any applied (uncommitted) batches before the View Change and just apply more if we get unseen PRE-PREPAREs during the View Change?
- What about View Change on backup instances? Should we do exactly the same procedure?

BTW it looks like one of the main reasons why we use (viewNo, ppSeqNo) tuples and start ppSeqNo from 0 on each view is that currently we are trying to come to the same state for the Master instance only. On backup instances we just set it to (viewNo+1, 0), and this works since backup instances don't have state to be kept in sync except last_ordered_3PC.
- We can keep uncommitted batches as long as new batches selected into the new view have the same digests (which is not always true); otherwise we'll still need to reapply them. Given that we perform dynamic validation during pre-prepare this could become a problem, since we'll need to rerun it and it might fail, so further analysis is needed of what should be done in this case. I see several options here:
  - (best one for now) prove that if we don't have more than f malicious nodes then dynamic validation cannot fail. In this case we can still run it, but if it fails we should just stop processing, probably try to start a view change, and log this event at error or critical level
  - if we prove instead that this can happen even if no more than f nodes are malicious, then we should analyse whether we can drop such a transaction (and all later ones, since they surely weren't ordered on correct nodes)
  - authors of PBFT papers don't rely on dynamic validation during pre-prepare, as they solve the abstract problem of making sure that all replicas of a state machine receive the same events in the same order. If we also abstract state from the BFT protocol we can just move dynamic validation to the execution step and treat failures as yet another transition (which doesn't actually change the ledger). This approach could also potentially increase throughput, because we'll no longer need to wait for all previous requests during the pre-prepare phase before continuing ordering.
- I think we could just keep syncing only the master instance during view change as we do now; the only change is that backup instances should be set to (viewNo+1, ppSeqNo) instead of (viewNo+1, 0).
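The last point can be sketched minimally, assuming each replica tracks its 3PC position as a (viewNo, ppSeqNo) tuple; the function and flag names are illustrative, not plenum APIs:

```python
# Hypothetical helper showing a replica's 3PC position right after a view
# change. Current plenum behaviour resets ppSeqNo on backups; the proposal
# above keeps it, i.e. reset_backup_pp_seq_no=False.

def state_after_view_change(view_no, pp_seq_no, is_master,
                            reset_backup_pp_seq_no=True):
    if is_master or not reset_backup_pp_seq_no:
        return (view_no + 1, pp_seq_no)
    return (view_no + 1, 0)
```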
7391c75 to 0ed2e1f
> - _v_ - viewNo
> - _k_ - ppSeqNo of batch
> - _d_ - digest of batch
> - _h_ - ppSeqNo of stable checkpoint
Checkpoints are redundant and should be removed; the last ordered request can serve as a checkpoint.
PBFT requires checkpoints as a mechanism that is triggered periodically to make sure every node has processed (and persisted) the same requests (across batches) in the same order.
We do not require checkpoints since a Pre-Prepare contains the expected merkle roots (txn and state merkle tr(i/e)e roots after applying the requests of the Pre-Prepare), and a Pre-Prepare can only be processed if the previous Pre-Prepare was received and processed successfully (not necessarily committed).
Thus it cannot happen that a node skips processing any Pre-Prepare.
To handle the case where a node gets Pre-Prepares and maybe Prepares too but is not getting sufficient Commits, a node can use MESSAGE_REQUEST to request missing 3-phase messages, or start catchup once it has seen n Pre-Prepares (and maybe some Prepares too) but not been able to order them, or it has >2f COMMITs for a pp_seq_no that is greater than its last ordered sequence number by n.
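The trigger described above could look roughly like this sketch; f is the fault tolerance and gap is an assumed threshold parameter, not a plenum constant:

```python
# Hypothetical sketch of the catchup trigger described above.
# commit_counts maps pp_seq_no -> number of COMMITs seen for it.

def should_start_catchup(last_ordered, unordered_pre_prepares,
                         commit_counts, f, gap):
    # Seen many Pre-Prepares (and maybe Prepares) without ordering them...
    if len(unordered_pre_prepares) >= gap:
        return True
    # ...or hold >2f COMMITs for a pp_seq_no far beyond the last ordered one.
    return any(commits > 2 * f and pp_seq_no > last_ordered + gap
               for pp_seq_no, commits in commit_counts.items())
```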
I'm not sure that this is more efficient and easier than Checkpoints, especially when the network is not stable and we have quite a lot of missing requests.
I believe that the main purpose of checkpoints is to enable garbage collection of the message log, which in turn is used for out-of-order request processing and for message resending, so I don't see how checkpoints can be dropped.
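Checkpoint-driven garbage collection of the message log, as described, can be sketched like this; the log is modelled as a plain dict keyed by ppSeqNo:

```python
# Sketch of checkpoint-driven garbage collection: once a checkpoint at
# ppSeqNo h becomes stable, logged 3-phase messages at or below h can be
# discarded, since they are never needed again for resending.

def collect_garbage(message_log, stable_checkpoint):
    return {pp_seq_no: msgs for pp_seq_no, msgs in message_log.items()
            if pp_seq_no > stable_checkpoint}
```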
> Side note: authors of PBFT papers don't rely on dynamic validation during
> pre-prepare as they solve abstract problem of making sure that all replicas
> of state machine receive same events in same order. If we abstract state
> from BFT protocol we can just move dynamic validation to execution step and
What happens if a request fails dynamic validation during the execution step? Does the request still end up in the ledger? If no, then other nodes need to be made aware of which requests are being rejected, which requires one more round of consensus (a node with outdated code or other differences can end up with a wrong result on the ledger). If yes, then we end up with invalid requests in the ledger with some marker indicating invalid requests. The latter is/was done in Fabric and some paper like this one.
The objective of doing dynamic validation and request application during consensus is that consensus not only guarantees the order of requests but also the final state after the proposal (Pre-Prepare) is processed.
@lovesh no, the request will not end up in the ledger, its execution won't modify any state, and the reply will indicate failure. Nodes with outdated code that process requests differently should be considered malicious, but I see your point that checking this during the pre-prepare phase will catch errors earlier. Anyway, this side note was just for evaluation.
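The "validation at execution" idea from the side note could be sketched as follows; validate and apply here are assumed callables for illustration, not plenum APIs:

```python
# Sketch of moving dynamic validation to the execution step: a request that
# fails validation is still "executed", but as a transition that leaves the
# state untouched, with the reply reporting the failure.

def execute(request, state, validate, apply):
    if not validate(request, state):
        return state, {'request': request, 'result': 'REJECTED'}
    return apply(request, state), {'request': request, 'result': 'OK'}
```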
design/view-change.md
> so if node gets commit certificate for some request that it already executed
> it just skips it. However indy plenum implementation currently resets
> ppSeqNo counter upon entering new view, so it cannot be relied upon.
> One possible workaround is to check if request was already ordered using
We have a mapping to check if a request has been written to the ledger or not; it is stored in seqNoDB of Node. It stores the map of sha256(identifier||req_id) -> txn seq_no.
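A toy model of such a mapping, with a plain dict standing in for the real seqNoDB storage; class and method names are illustrative:

```python
import hashlib

# Toy model of the seqNoDB mapping described above:
# sha256(identifier||req_id) -> txn seq_no.

class SeqNoDB:
    def __init__(self):
        self._db = {}

    @staticmethod
    def _key(identifier: str, req_id: int) -> str:
        return hashlib.sha256((identifier + str(req_id)).encode()).hexdigest()

    def record(self, identifier, req_id, seq_no):
        self._db[self._key(identifier, req_id)] = seq_no

    def get_seq_no(self, identifier, req_id):
        # None means the request has not been written to the ledger.
        return self._db.get(self._key(identifier, req_id))
```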
design/view-change.md
> and all related code, which should simplify next changes.
>
> 2. Fix current code so that it doesn't rely on resetting ppSeqNo after
>    view change. After that we can go back to common route C.
What is route C?
@ashcherbakov That was part of a draft that slipped through, thanks for noticing; I'll fix that.
> request processing during view change, dropped requests that come after
> gaps and so on. Check that this implementation works. If it does,
> and performs better than current view change, we can go ahead and take
> the faster route A, otherwise move to route B.
Please mention usage of State Machine and Actor model approach for View Change
@ashcherbakov these are implementation details, and I'm going to describe them in the next section.
> request was ordered and disabling all request processing during view
> change.
>
> 2. Indy plenum performs dynamic validation of requests during pre-prepare
Please provide more details on how we are going to deal with applying Pre-Prepares and dynamic validation in the new View Change.
@ashcherbakov done here and here
> ## Implementation plan (minimal)
>
> - enable full ordering of batches that were already ordered, make their
Should we do it always, or only when ordering from previous view?
@ashcherbakov I'm afraid this feature will be used in a context where it's impossible to tell whether a batch is new or from the previous view unless some field is added to messages, so I believe we should do it always. When we stop resetting ppSeqNo in the new view this won't be needed, so it's a temporary solution. The only reason I propose to do it now is that I feel it will be safer to do on the current codebase than the other variant.
Unfortunately this is a permanent solution required by PBFT. If it becomes a problem we could try to add some logic to somehow tag messages generated by requests which we try to save from the previous view.
Actually I mixed this up with another problem/solution: in order to correctly reapply requests from the previous view, PBFT uses a ppSeqNo that is unique across views, but the current code resets it. A temporary workaround is to use request idr/digest for detecting reapply.
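The digest-based reapply workaround might look like this sketch; function and argument names are illustrative, not plenum code:

```python
# Hypothetical sketch of the digest-based workaround: fully order a batch,
# but skip re-execution if its digest was already seen, i.e. it was
# already ordered in a previous view.

def order_batch(batch_digest, requests, seen_digests, execute):
    if batch_digest in seen_digests:
        return False  # ordering completes, execution is skipped
    for req in requests:
        execute(req)
    seen_digests.add(batch_digest)
    return True
```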
> view change simultaneously and each instance should elect new primary
> independently. Since indy plenum implementation has simple round robin
> election it should be ok to just run view change on master with each backup
> just waiting for view change completion on master.
So, do we have some special messages indicating that View Change is finished (like ViewChangeDone in the current protocol), and replicas on backup instances do not order anything new until they receive a quorum of ViewChangeDone from master?
Or do we just have some local state across all instances indicating that view change is finished on master and backup instances can start ordering?
@ashcherbakov PBFT has a NEW-VIEW message which is sent only by the primary, but it is acknowledged by a quorum of previously received VIEW-CHANGE messages, and if the primary sends different NEW-VIEW messages to different replicas it's guaranteed that they'll spot the difference.
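A sketch of how a replica might check a NEW-VIEW against the VIEW-CHANGE messages it saw itself, so an equivocating primary is detected; this is illustrative, not the plenum implementation, and messages are modelled as plain values keyed by sender:

```python
# Illustrative check of a NEW-VIEW: it must carry a quorum of VIEW-CHANGE
# messages, and none of them may contradict a VIEW-CHANGE the replica
# received directly from its sender.

def validate_new_view(new_view_vcs, own_vcs, quorum):
    if len(new_view_vcs) < quorum:
        return False
    return all(own_vcs.get(sender, vc) == vc
               for sender, vc in new_view_vcs.items())
```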