ashcherbakov edited this page Nov 23, 2018 · 33 revisions

Plenum Byzantine Fault Tolerant Protocol

Byzantine Fault Tolerance

Byzantine fault tolerance is a sub-field of fault tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem.

Byzantine Generals' Problem

Byzantine Generals' Problem is an agreement problem, first proposed by Marshall Pease, Robert Shostak, and Leslie Lamport in 1980. This problem says that, a byzantine army is preparing to attack a fortified city. Each army troop, led by a general, is camped around an enemy city. Communicating only by messenger, the generals must agree upon a common battle plan. However, one or more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the loyal generals will reach agreement. It is shown that, using only oral messages, this problem is solvable if and only if more than two-thirds of the generals are loyal; so a single traitor can confound two loyal generals. With unforgeable written messages, the problem is solvable for any number of generals and possible traitors. Applications of the solutions of Byzantine Generals' Problem can be found in the reliable computer systems.

Byzantine Fault Tolerance (BFT) can be achieved if the loyal (non-faulty) generals have a unanimous agreement on their strategy. The objective of the Byzantine Fault Tolerance is to be able to defend against Byzantine failures.

Byzantine Failures

A Byzantine Fault is an arbitrary fault that occurs during the execution of an algorithm by a distributed system. It encompasses these faults which are commonly referred to as "crash failures" and "send and omission failures". Byzantine failures may be loosely categorized as follows:

  • a failure to take another step in the algorithm, also known as a crash failure
  • a failure may be due to arbitrary faults such as accidental (hardware failure), or malicious (compromised network mode)
  • arbitrary execution of a step other than the one indicated by the algorithm

The Plenum protocol

Plenum is an implementation of RBFT, or Redundant Byzantine Fault Tolerance, a consensus algorithm proposed by Pierre-Louis Aublin, Sonia Ben Mokhtar, and Vivien Quéma. As described in their paper, existing BFT protocols use a special replica, called the "primary", which indicates to other replicas the order in which requests should be processed. This primary can be smartly malicious and degrade the performance of the system without being detected by correct replicas. Our evaluation shows that RBFT achieves similar performance as the most robust protocols when there is no failure and that, under faults, its maximum performance degradation is about 3%, whereas it is, at least, equal to 78% for existing protocols."

RBFT implements a new approach whereby multiple instances of the protocol run simultaneously, a Master instance, and one or more Backup instances. All the instances order the requests, but only the requests ordered by the Master instance are actually executed. All nodes monitor the Master and compare its performance with that of the Backup instances. If the Master does not perform acceptably, it is considered malicious and replaced.

Contributors

This is a collaborative effort from the Evernym team. Have something to add? See a typo? Fork and submit a pull request. We've had fun creating it and would welcome your contribution.

Future improvements

We are continuing to add to this protocol. Upcoming improvements include:

  • Pluggable blacklisting strategies
  • Bootstrapping and authentication schemes (clients and nodes)
  • Persistence adapters (ledger/DHT/database)
  • Enhance tests with more Byzantine faults and categorized malicious behaviors
  • Refactoring for readability, separation of concerns, etc.
  • Take advantage of multiple cores (RAET's LaneStack for inter-process communication)
  • Hardening, load testing, and randomized simulations
  • TBD

References

  1. Pierre-Louis Aublin, Sonia Ben Mokhtar, Vivien Quéma: RBFT: Redundant Byzantine Fault Tolerance In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), Philadelphia, USA, July 2013
  2. RAET: Reliable Asynchronous Event Transport Protocol
  3. NaCl: Networking and Cryptography library
Clone this wiki locally
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.