
INDY-1290: Design for implementing PBFT view change in indy plenum #673

Merged
17 commits merged into hyperledger:master on May 17, 2018

Conversation

skhoroshavin
Member

No description provided.

…blems

Signed-off-by: Sergey Khoroshavin <sergey.khoroshavin@dsr-corporation.com>
in _P_ and _Q_ are for view _v_ or less. If this is correct, the replica sends
message _VIEW-CHANGE-ACK(j, i, v+1, vd)_ to the new primary.

- New primary collects _VIEW-CHANGE_ + corresponding _2f-1 VIEW-CHANGE-ACK_
Contributor

Should be n-f-1, not 2f-1, since the number of replicas in our pools may not be a multiple of 3f+1

Member Author

Actually n-f-2 in this case. Going to fix this.
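
For reference, a small worked example of the quorum arithmetic being discussed, assuming the usual relation f = (n - 1) // 3 when n is not exactly 3f + 1 (the function name is illustrative):

```python
# Quorum sizes for the view-change sub-protocol when n is not necessarily 3f + 1.
def view_change_quorums(n: int) -> dict:
    f = (n - 1) // 3
    return {
        "f": f,
        # strong quorum: n - f instead of 2f + 1
        "strong_quorum": n - f,
        # VIEW-CHANGE-ACKs the new primary needs from other replicas,
        # besides the VIEW-CHANGE itself and its own implicit ack
        "view_change_acks": n - f - 2,
    }

for n in (4, 5, 6, 7):
    print(n, view_change_quorums(n))
# 4 {'f': 1, 'strong_quorum': 3, 'view_change_acks': 1}
# 5 {'f': 1, 'strong_quorum': 4, 'view_change_acks': 2}
# 6 {'f': 1, 'strong_quorum': 5, 'view_change_acks': 3}
# 7 {'f': 2, 'strong_quorum': 5, 'view_change_acks': 3}
```

For n = 5 or n = 6 the strong quorum n - f differs from 2f + 1, which is exactly the point raised here.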

- New primary collects _VIEW-CHANGE_ + corresponding _2f-1 VIEW-CHANGE-ACK_
messages for each replica _i_, which (when considering _VIEW-CHANGE-ACK_
which new primary could send to itself) form a view-change certificate, so
it can be sure that _2f+1_ replicas have same _VIEW-CHANGE_ message for
Contributor

n-f must be used instead of 2f+1 everywhere

message each replica sets stable checkpoint to _(cn, cd)_ from _X_ and
records all requests in _X_ as pre-prepared. Backup replicas also broadcast
_PREPARE_ messages for these requests. After that processing goes on in
normal state. There are some caveats however.
Contributor

So, does it mean that the View Change is assumed to be finished at this point, and COMMITs will be sent and processed in the new View as in the normal case?

Member Author

Yes, it does. Moreover, not only COMMITs but also PREPAREs will be processed as in the normal case.

resetting ppSeqNo when entering new view, otherwise it will take probably
unacceptable amount of time to prove that modified algorithm is correct,
or go on without such proof, which will seriously undermine confidence in
it.
Contributor

More items to be considered against our implementation:

  1. How should we deal with optimistically applied 3PC batches when receiving PRE-PREPAREs?
    Is it OK if we don't reset any applied (uncommitted) batches before the View Change and just apply more if we receive previously unseen PRE-PREPAREs during the View Change?
  2. What about View Change on backup instances?
    Should we do exactly the same procedure?
    BTW it looks like one of the main reasons why we use (viewNo, ppSeqNo) tuples and start ppSeqNo from 0 in each view is that currently we are trying to come to the same state for the Master instance only. On backup instances we just set it to (viewNo+1, 0), and this works since backup instances don't have any state to be kept in sync except last_ordered_3PC.

Member Author

@ashcherbakov

  1. We can keep uncommitted batches as long as the batches selected into the new view have the same digests (which is not always true); otherwise we'll still need to reapply them. Given that we perform dynamic validation during pre-prepare this could become a problem, since we'll need to rerun it and it might fail, so further analysis is needed of what should be done in this case. I see several options here:
  • (best one for now) prove that if we don't have more than f malicious nodes then dynamic validation cannot fail. In this case we can still run it, but if it fails we should just stop processing, probably try to start a view change, and log this event at error or critical level
  • if we prove instead that this can happen even when no more than f nodes are malicious, then we should analyse whether we can drop such a transaction (and all later ones, since it and all later ones surely weren't ordered on normal nodes)
  • authors of the PBFT papers don't rely on dynamic validation during pre-prepare, as they solve the abstract problem of making sure that all replicas of a state machine receive the same events in the same order. If we also abstract state from the BFT protocol, we can just move dynamic validation to the execution step and treat failures as yet another transition (which doesn't actually change the ledger). This approach can also potentially increase throughput, because we'll no longer need to wait for all previous requests during the pre-prepare phase before continuing ordering.
  2. I think we could just keep syncing only the master instance during view change as we do now; the only change is that backup instances should be set to (viewNo+1, ppSeqNo) instead of (viewNo+1, 0), as sketched below.
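
A minimal sketch of the change proposed in item 2, with illustrative names rather than the actual plenum API:

```python
# On view change completion, a backup instance keeps its ppSeqNo instead of
# resetting it to 0, so (viewNo, ppSeqNo) stays monotonic across views.
def backup_enter_new_view(view_no: int, last_ordered_pp_seq_no: int) -> tuple:
    # current behaviour would return (view_no + 1, 0)
    return view_no + 1, last_ordered_pp_seq_no
```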

Sergey Khoroshavin added 4 commits May 11, 2018 14:02
Signed-off-by: Sergey Khoroshavin <sergey.khoroshavin@dsr-corporation.com>
- _v_ - viewNo
- _k_ - ppSeqNo of batch
- _d_ - digest of batch
- _h_ - ppSeqNo of stable checkpoint
Contributor

Checkpoints are redundant and should be removed; the last ordered request can serve as a checkpoint.
PBFT requires checkpoints as a mechanism that is triggered periodically to make sure every node has processed (and persisted) the same requests (across batches) in the same order.
We do not require checkpoints since a Pre-Prepare contains the expected merkle roots (txn and state merkle trie/tree roots after applying the requests of the Pre-Prepare), and a Pre-Prepare can only be processed if the previous Pre-Prepare was received and processed successfully (not necessarily committed).
Thus it cannot happen that a node skips processing any Pre-Prepare.
To handle the case where a node gets Pre-Prepare(s) and maybe Prepares too but is not getting sufficient Commits, the node can use MESSAGE_REQUEST to request the missing 3-phase messages, or start catchup once it has seen n Pre-Prepares (and maybe some Prepares too) but has not been able to order them, or has >2f COMMIT(s) for a pp_seq_no that is greater than its last ordered sequence number by n.
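
A hedged sketch of the stall-handling heuristic described here; the names, thresholds and return values are illustrative, not existing plenum APIs:

```python
# Decide how to react when Pre-Prepares (and maybe Prepares) are seen but
# ordering does not progress.
def on_ordering_stalled(unordered_pre_prepares: int,
                        commit_certified_lag: int,
                        threshold: int) -> str:
    # commit_certified_lag: how far the highest pp_seq_no with >2f COMMITs
    # is ahead of this node's last ordered pp_seq_no
    if unordered_pre_prepares >= threshold or commit_certified_lag > threshold:
        # fallen too far behind: resync via catchup
        return "start_catchup"
    # otherwise just ask peers for the specific missing 3-phase messages
    return "send_message_request"
```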

Contributor

I'm not sure that this is more efficient and easier than Checkpoints, especially when the network is not stable and we have quite a lot of missing requests.

Member Author

I believe that the main purpose of checkpoints is to enable garbage collection of the message log, which in turn is used for out-of-order request processing and for message resending, so I don't see how checkpoints can be dropped.
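
For illustration, a minimal sketch of that garbage-collection role (the log structure here is illustrative): once a checkpoint becomes stable, 3PC messages at or below its ppSeqNo can be discarded.

```python
# message_log maps (view_no, pp_seq_no) -> stored PRE-PREPARE/PREPARE/COMMIT messages
def collect_garbage(message_log: dict, stable_pp_seq_no: int) -> dict:
    return {key: msgs for key, msgs in message_log.items()
            if key[1] > stable_pp_seq_no}
```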

Side note: authors of PBFT papers don't rely on dynamic validation during
pre-prepare as they solve abstract problem of making sure that all replicas
of state machine receive same events in same order. If we abstract state
from BFT protocol we can just move dynamic validation to execution step and
Contributor

What happens if a request fails dynamic validation during the execution step? Does the request still end up in the ledger? If no, then other nodes need to be made aware of which requests are being rejected, and one more round of consensus is needed (a node with outdated code or other differences can end up with a wrong result on the ledger). If yes, then we end up with invalid requests in the ledger with some marker indicating invalid requests. The latter is/was done in Fabric and some papers like this one.
The objective of doing dynamic validation and request application during consensus is that consensus not only guarantees the order of requests but also the final state after the proposal (Pre-Prepare) is processed.

Member Author

@lovesh no, the request will not end up in the ledger, its execution won't modify any state, and the reply will indicate failure. Nodes with outdated code that process requests differently should be considered malicious, but I see your point that checking this during the pre-prepare phase will catch errors earlier. Anyway, this side note was just for evaluation.
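
A self-contained sketch of the "validate at execution" idea from the side note; the request format and the validation rule are made-up stand-ins, not the plenum implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Reply:
    success: bool
    reason: str = ""

@dataclass
class Ledger:
    txns: list = field(default_factory=list)

def dynamic_validate(request: dict, state: dict):
    # placeholder rule: reject overwriting an existing key
    if request["key"] in state:
        return False, "key already exists"
    return True, ""

def execute(request: dict, state: dict, ledger: Ledger) -> Reply:
    ok, reason = dynamic_validate(request, state)
    if not ok:
        # a failed request is still ordered, but its execution is a no-op
        # transition: neither state nor ledger changes, only the reply differs
        return Reply(success=False, reason=reason)
    state[request["key"]] = request["value"]
    ledger.txns.append(request)
    return Reply(success=True)
```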

so if node gets commit certificate for some request that it already executed
it just skips it. However indy plenum implementation currently resets
ppSeqNo counter upon entering new view, so it cannot be relied upon.
One possible workaround is to check if request was already ordered using
Contributor

We have a mapping to check whether a request has been written to the ledger or not; it is stored in the seqNoDB of Node. It stores the map of sha256(identifier||req_id) -> txn seq_no
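
A small sketch of that lookup, with a plain dict standing in for seqNoDB and the key encoding shown only as an assumption:

```python
import hashlib

def already_written(seq_no_db: dict, identifier: str, req_id: str) -> bool:
    # key is sha256(identifier || req_id); the stored value would be the txn seq_no
    key = hashlib.sha256((identifier + req_id).encode()).hexdigest()
    return key in seq_no_db
```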

and all related code, which should simplify next changes.

2. Fix current code so that it doesn't rely on resetting ppSeqNo after
view change. After that we can go back to common route C.
Contributor

What is route C?

Member Author

@ashcherbakov That was part of a draft that slipped through, thanks for noticing. I'll fix that.

request processing during view change, dropped requests that come after
gaps and so on. Check that this implementation works. If it does,
and performs better than the current view change, we can take the
faster route A; otherwise move to route B.
Contributor

Please mention usage of the State Machine and Actor model approaches for View Change

Member Author

@ashcherbakov these are implementation details, and I am going to describe them in the next section

request was ordered and disabling all request processing during view
change.

2. Indy plenum performs dynamic validation of requests during pre-prepare
Contributor

Please provide more details on how we are going to deal with applying PrePrepare and dynamic validation in the new View Change.

Sergey Khoroshavin added 10 commits May 14, 2018 13:24
Signed-off-by: Sergey Khoroshavin <sergey.khoroshavin@dsr-corporation.com>

## Implementation plan (minimal)

- enable full ordering of batches that were already ordered, make their
Contributor

Should we do it always, or only when ordering from the previous view?

Member Author

@ashcherbakov I'm afraid this feature will be used in a context where it's impossible to tell whether a batch is new or from the previous view unless some field is added to messages, so I believe we should do it always. When we stop resetting ppSeqNo in the new view this won't be needed, so it's a temporary solution. The only reason I propose to do it now is that I feel it will be safer to do on the current codebase than the other variant.

Unfortunately this is a permanent solution, as required by PBFT. If it becomes a problem we could try to add some logic to somehow tag messages generated by requests which we try to save from the previous view.

Actually I mixed this up with another problem/solution: in order to correctly reapply requests from the previous view, PBFT uses ppSeqNo, which should be unique across views but is reset by the current code. A temporary workaround is to use the request idr/digest for detecting reapplication.

Signed-off-by: Sergey Khoroshavin <sergey.khoroshavin@dsr-corporation.com>
view change simultaneously and each instance should elect new primary
independently. Since indy plenum implementation has simple round robin
election it should be ok to just run view change on master with each backup
just waiting for view change completion on master.
Contributor

So, do we have some special messages indicating that View Change is finished (like ViewChangeDone in the current protocol), and replicas on backup instances do not order anything new until they receive a quorum of ViewChangeDone from the master?
Or do we just have some local state across all instances indicating that view change has finished on the master and backup instances can start ordering?

Member Author

@ashcherbakov PBFT has a NEW-VIEW message which is sent only by the new primary, but it is backed by a quorum of previously received VIEW-CHANGE messages, and if the primary sends different NEW-VIEW messages to different replicas it's guaranteed that they'll spot the difference.
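
A hedged sketch of the check implied here; the message layout is illustrative. A replica accepts a NEW-VIEW only if it carries a quorum of VIEW-CHANGE messages for the same target view from distinct replicas, so a NEW-VIEW not backed by such a quorum is rejected.

```python
def accept_new_view(new_view: dict, n: int) -> bool:
    f = (n - 1) // 3
    # VIEW-CHANGE messages embedded in (or referenced by) the NEW-VIEW
    matching = {vc["sender"] for vc in new_view["view_changes"]
                if vc["view_no"] == new_view["view_no"]}
    # require n - f distinct senders for the same target view
    return len(matching) >= n - f
```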

@ashcherbakov ashcherbakov merged commit 608e861 into hyperledger:master May 17, 2018
@skhoroshavin skhoroshavin deleted the view-change-design branch May 18, 2018 09:57