INDY-1290: Design for implementing PBFT view change in indy plenum #673
…blems Signed-off-by: Sergey Khoroshavin <sergey.khoroshavin@dsr-corporation.com>
design/view-change.md
> in _P_ and _Q_ are for view _v_ or less. If this is correct replica sends
> message _VIEW-CHANGE-ACK(j, i, v+1, vd)_ to new primary.
>
> - New primary collects _VIEW-CHANGE_ + corresponding _2f-1 VIEW-CHANGE-ACK_
Should be n-f-1, not 2f-1, since the number of replicas in our pools may not be a multiple of 3f+1.
Actually n-f-2 in this case. Going to fix this.
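The quorum arithmetic being discussed can be sketched as follows. This is an illustrative helper, not plenum code; the function names are made up for this sketch:

```python
# Illustrative quorum helpers for a pool of n nodes tolerating f faults.
# Names are hypothetical; plenum's actual quorum logic lives elsewhere.

def max_faulty(n: int) -> int:
    # Largest f such that n >= 3f + 1.
    return (n - 1) // 3

def view_change_ack_quorum(n: int) -> int:
    # VIEW-CHANGE-ACKs the new primary needs for a VIEW-CHANGE from
    # replica j: everyone except f faulty nodes, j itself and the primary.
    return n - max_faulty(n) - 2

def view_change_cert_quorum(n: int) -> int:
    # Replicas that must report the same VIEW-CHANGE (n-f, not 2f+1).
    return n - max_faulty(n)
```

For n = 4 (f = 1) this gives an ACK quorum of 1 and a certificate quorum of 3; for n = 5, still f = 1, the certificate quorum is 4, while 2f+1 would give only 3, which illustrates why n-f is the safe choice when n is not exactly 3f+1.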
design/view-change.md
> - New primary collects _VIEW-CHANGE_ + corresponding _2f-1 VIEW-CHANGE-ACK_
>   messages for each replica _i_, which (when considering _VIEW-CHANGE-ACK_
>   which new primary could send to itself) form a view-change certificate, so
>   it can be sure that _2f+1_ replicas have same _VIEW-CHANGE_ message for
n-f must be used instead of 2f+1 everywhere.
design/view-change.md
> message each replica sets stable checkpoint to _(cn, cd)_ from _X_ and
> records all requests in _X_ as pre-prepared. Backup replicas also broadcast
> _PREPARE_ messages for these requests. After that processing goes on in
> normal state. There are some caveats however.
So, does it mean that the View Change is assumed to be finished now, and COMMITs will be sent and processed in the new View as in the normal case?
Yes, it does. Moreover, not only COMMITs but also PREPAREs will be processed as in the normal case.
design/view-change.md
> resetting ppSeqNo when entering new view, otherwise it will take probably
> unacceptable amount of time to prove that modified algorithm is correct,
> or go on without such proof, which will seriously undermine confidence in
> it.
More items to be considered against our implementation:
- How should we deal with optimistically applied 3PC batches when receiving PRE-PREPAREs? Is it OK if we don't reset any applied (uncommitted) batches before the View Change and just apply more if we get unseen PRE-PREPAREs during the View Change?
- What about View Change on backup instances? Should we do exactly the same procedure?

BTW it looks like one of the main reasons why we use (viewNo, ppSeqNo) tuples and start ppSeqNo from 0 on each view is that currently we are trying to come to the same state for the Master instance only. On backup instances we just set it to (viewNo+1, 0), and this works since backup instances don't have state to be kept in sync except last_ordered_3PC.
- We can keep uncommitted batches as long as new batches selected into the new view have the same digests (which is not always true); otherwise we'll still need to reapply them. Given that we perform dynamic validation during pre-prepare this could become a problem, since we'll need to rerun it and it might fail, so further analysis is needed of what should be done in this case. I see several options here:
  - (best one for now) prove that if we don't have more than f malicious nodes then dynamic validation cannot fail. In this case we can still run it, but if it fails we should just stop processing, probably try to start a view change, and log this event at error or critical level
  - if we prove instead that this can happen even if no more than f nodes are malicious, then we should analyse whether we can drop such a transaction (and all later ones, since they surely weren't ordered on correct nodes)
  - authors of PBFT papers don't rely on dynamic validation during pre-prepare, as they solve the abstract problem of making sure that all replicas of a state machine receive the same events in the same order. If we also abstract state from the BFT protocol we can just move dynamic validation to the execution step and treat failures as yet another transition (which doesn't actually change the ledger). This approach could also potentially increase throughput, because we'll no longer need to wait for all previous requests during the pre-prepare phase before continuing ordering.
- I think we could just keep syncing only the master instance during view change as we do now; the only change is that backup instances should be set to (viewNo+1, ppSeqNo) instead of (viewNo+1, 0).
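The last point can be sketched minimally, assuming each replica tracks its 3PC position as a (viewNo, ppSeqNo) tuple; the function and flag names are illustrative, not plenum APIs:

```python
# Hypothetical helper showing a replica's 3PC position right after a view
# change. Current plenum behaviour resets ppSeqNo on backups; the proposal
# above keeps it, i.e. reset_backup_pp_seq_no=False.

def state_after_view_change(view_no, pp_seq_no, is_master,
                            reset_backup_pp_seq_no=True):
    if is_master or not reset_backup_pp_seq_no:
        return (view_no + 1, pp_seq_no)
    return (view_no + 1, 0)
```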
7391c75 to 0ed2e1f
> - _v_ - viewNo
> - _k_ - ppSeqNo of batch
> - _d_ - digest of batch
> - _h_ - ppSeqNo of stable checkpoint
Checkpoints are redundant and should be removed; the last ordered request can serve as a checkpoint.
PBFT requires checkpoints as a mechanism that is triggered periodically to make sure every node has processed (and persisted) the same requests (across batches) in the same order.
We do not require checkpoints since a Pre-Prepare contains the expected merkle roots (txn and state merkle tr(i/e)e roots after applying the requests of the Pre-Prepare), and a Pre-Prepare can only be processed if the previous Pre-Prepare was received and processed successfully (not necessarily committed).
Thus it cannot happen that a node skips processing any Pre-Prepare.
To handle the case where a node gets Pre-Prepares and maybe Prepares too but is not getting sufficient Commits, a node can use MESSAGE_REQUEST to request missing 3-phase messages, or start catchup once it has seen n Pre-Prepares (and maybe some Prepares too) but not been able to order them, or it has >2f COMMITs for a pp_seq_no that is greater than its last ordered sequence number by n.
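The trigger described above could look roughly like this sketch; f is the fault tolerance and gap is an assumed threshold parameter, not a plenum constant:

```python
# Hypothetical sketch of the catchup trigger described above.
# commit_counts maps pp_seq_no -> number of COMMITs seen for it.

def should_start_catchup(last_ordered, unordered_pre_prepares,
                         commit_counts, f, gap):
    # Seen many Pre-Prepares (and maybe Prepares) without ordering them...
    if len(unordered_pre_prepares) >= gap:
        return True
    # ...or hold >2f COMMITs for a pp_seq_no far beyond the last ordered one.
    return any(commits > 2 * f and pp_seq_no > last_ordered + gap
               for pp_seq_no, commits in commit_counts.items())
```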
I'm not sure that this is more efficient and easier than Checkpoints, especially when the network is not stable and we have quite a lot of missing requests.
I believe that the main purpose of checkpoints is to enable garbage collection of the message log, which in turn is used for out-of-order request processing and for message resending, so I don't see how checkpoints can be dropped.
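Checkpoint-driven garbage collection of the message log, as described, can be sketched like this; the log is modelled as a plain dict keyed by ppSeqNo:

```python
# Sketch of checkpoint-driven garbage collection: once a checkpoint at
# ppSeqNo h becomes stable, logged 3-phase messages at or below h can be
# discarded, since they are never needed again for resending.

def collect_garbage(message_log, stable_checkpoint):
    return {pp_seq_no: msgs for pp_seq_no, msgs in message_log.items()
            if pp_seq_no > stable_checkpoint}
```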
> Side note: authors of PBFT papers don't rely on dynamic validation during
> pre-prepare as they solve abstract problem of making sure that all replicas
> of state machine receive same events in same order. If we abstract state
> from BFT protocol we can just move dynamic validation to execution step and
What happens if a request fails dynamic validation during the execution step? Does the request still end up in the ledger? If no, then other nodes need to be made aware of which requests are being rejected, which requires one more round of consensus (a node with outdated code or other differences can end up with a wrong result on the ledger). If yes, then we end up with invalid requests in the ledger with some marker indicating invalid requests. The latter is/was done in Fabric and some paper like this one.
The objective of doing dynamic validation and request application during consensus is that consensus not only guarantees the order of requests but also the final state after the proposal (Pre-Prepare) is processed.
@lovesh no, the request will not end up in the ledger, its execution won't modify any state, and the reply will indicate failure. Nodes with outdated code that process requests differently should be considered malicious, but I see your point that checking this during the pre-prepare phase will catch errors earlier. Anyway, this side note was just for evaluation.
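The "validation at execution" idea from the side note could be sketched as follows; validate and apply here are assumed callables for illustration, not plenum APIs:

```python
# Sketch of moving dynamic validation to the execution step: a request that
# fails validation is still "executed", but as a transition that leaves the
# state untouched, with the reply reporting the failure.

def execute(request, state, validate, apply):
    if not validate(request, state):
        return state, {'request': request, 'result': 'REJECTED'}
    return apply(request, state), {'request': request, 'result': 'OK'}
```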
design/view-change.md
> so if node gets commit certificate for some request that it already executed
> it just skips it. However indy plenum implementation currently resets
> ppSeqNo counter upon entering new view, so it cannot be relied upon.
> One possible workaround is to check if request was already ordered using
We have a mapping to check if a request has been written to the ledger or not; it is stored in seqNoDB of Node. It stores the map of sha256(identifier||req_id) -> txn seq_no.
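A toy model of such a mapping, with a plain dict standing in for the real seqNoDB storage; class and method names are illustrative:

```python
import hashlib

# Toy model of the seqNoDB mapping described above:
# sha256(identifier||req_id) -> txn seq_no.

class SeqNoDB:
    def __init__(self):
        self._db = {}

    @staticmethod
    def _key(identifier: str, req_id: int) -> str:
        return hashlib.sha256((identifier + str(req_id)).encode()).hexdigest()

    def record(self, identifier, req_id, seq_no):
        self._db[self._key(identifier, req_id)] = seq_no

    def get_seq_no(self, identifier, req_id):
        # None means the request has not been written to the ledger.
        return self._db.get(self._key(identifier, req_id))
```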
design/view-change.md
> and all related code, which should simplify next changes.
>
> 2. Fix current code so that it doesn't rely on resetting ppSeqNo after
>    view change. After that we can go back to common route C.
What is route C?
@ashcherbakov That was part of a draft that slipped through, thanks for noticing; I'll fix that.
> request processing during view change, dropped requests that come after
> gaps and so on. Check that this implementation works. If it does,
> and performs better than current view change, we can go ahead and take
> the faster route A, otherwise move to route B.
Please mention usage of State Machine and Actor model approach for View Change
@ashcherbakov these are implementation details, and I'm going to describe them in the next section.
> request was ordered and disabling all request processing during view
> change.
>
> 2. Indy plenum performs dynamic validation of requests during pre-prepare
Please provide more details on how we are going to deal with applying Pre-Prepares and dynamic validation in the new View Change.
@ashcherbakov done here and here
> ## Implementation plan (minimal)
>
> - enable full ordering of batches that were already ordered, make their
Should we do it always, or only when ordering from previous view?
@ashcherbakov I'm afraid this feature will be used in a context where it's impossible to tell whether a batch is new or from the previous view unless some field is added to messages, so I believe we should do it always. When we stop resetting ppSeqNo in the new view this won't be needed, so it's a temporary solution. The only reason I propose to do it now is that I feel it will be safer to do on the current codebase than the other variant.
Unfortunately this is a permanent solution required by PBFT. If it becomes a problem we could try to add some logic to somehow tag messages generated by requests which we try to save from the previous view.
Actually I mixed this up with another problem/solution: in order to correctly reapply requests from the previous view, PBFT uses a ppSeqNo that is unique across views, but the current code resets it. A temporary workaround is to use request idr/digest for detecting reapply.
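The digest-based reapply workaround might look like this sketch; function and argument names are illustrative, not plenum code:

```python
# Hypothetical sketch of the digest-based workaround: fully order a batch,
# but skip re-execution if its digest was already seen, i.e. it was
# already ordered in a previous view.

def order_batch(batch_digest, requests, seen_digests, execute):
    if batch_digest in seen_digests:
        return False  # ordering completes, execution is skipped
    for req in requests:
        execute(req)
    seen_digests.add(batch_digest)
    return True
```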
> view change simultaneously and each instance should elect new primary
> independently. Since indy plenum implementation has simple round robin
> election it should be ok to just run view change on master with each backup
> just waiting for view change completion on master.
So, do we have some special messages indicating that View Change is finished (like ViewChangeDone in the current protocol), and replicas on backup instances do not order anything new until they receive a quorum of ViewChangeDone from master?
Or do we just have some local state across all instances indicating that view change is finished on master and backup instances can start ordering?
@ashcherbakov PBFT has a NEW-VIEW message which is sent only by the primary, but it is acknowledged by a quorum of previously received VIEW-CHANGE messages, and if the primary sends different NEW-VIEW messages to different replicas it's guaranteed that they'll spot the difference.
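A sketch of how a replica might check a NEW-VIEW against the VIEW-CHANGE messages it saw itself, so an equivocating primary is detected; this is illustrative, not the plenum implementation, and messages are modelled as plain values keyed by sender:

```python
# Illustrative check of a NEW-VIEW: it must carry a quorum of VIEW-CHANGE
# messages, and none of them may contradict a VIEW-CHANGE the replica
# received directly from its sender.

def validate_new_view(new_view_vcs, own_vcs, quorum):
    if len(new_view_vcs) < quorum:
        return False
    return all(own_vcs.get(sender, vc) == vc
               for sender, vc in new_view_vcs.items())
```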