Prepare to Nuke Develop (celestiaorg#47)

* state -> step * vote -> v * New version of the algorithm and the proof * New version of the algorithm and the proofs * Added algorithm description * Add algorithm description * Add introduction * Add conclusion * Add conclusion file * fix warnings (caption was defined twice) - only the latter is used anyways (centers captions) - this makes it possible to autom. building the paper * Update grammar * s/state_p/step_p * Address Ismail's comments * intro: language fixes * definitions: language fixes * consensus: various fixes * proof: some fixes * try to improve reviewability * \eq -> = * textwrap to 79 * various minor fixes * proof: fix itemization * proof: more minor fixes * proof: timeouts are functions * proof: fixes to lemma6 * Intro changes and improve title page * Add Marko and Ming to acks * add readme * Format algorithm correctly Clarify condition semantic and timeouts Improve descriptions * patform -> platform * Ensure that rules are mutually exclusive - various clarifications and small improvements * Release v0.6 * small nits for smoother readability
sunriselayer · Sep 3, 2019 · 87abbf7 · 87abbf7
1 parent b362894
commit 87abbf7
Show file tree

Hide file tree

Showing 21 changed files with 1,097 additions and 353 deletions.
diff --git a/README.md b/README.md
@@ -1,25 +1,7 @@
-# Tendermint-spec
+# Tendermint Spec
 
-The repository contains the specification (and the proofs) of the Tendermint
-consensus protocol.
-
-## How to install Latex on Mac OS
-
-MacTex is Latex distribution for Mac OS. You can download it [here](http://www.tug.org/mactex/mactex-download.html).
-
-Popular IDE for Latex-based projects is TexStudio. It can be downloaded 
-[here](https://www.texstudio.org/).
-
-## How to build project 
-
-In order to compile the latex files (and write bibliography), execute 
-
-`$ pdflatex paper` <br/>
-`$ bibtex paper` <br/>
-`$ pdflatex paper` <br/>
-`$ pdflatex paper` <br/>
-
-The generated file is paper.pdf. You can open it with
-
-`$ open paper.pdf`
+This repository contains the paper describing the Tendermint consensus
+algorithm, including formal proofs of its safety and liveness.
 
+For the pdf, see the [latest
+release](https://github.com/tendermint/spec/releases).
diff --git a/consensus.tex b/consensus.tex
diff --git a/IEEEtran.bst → consensus/IEEEtran.bst b/IEEEtran.bst → consensus/IEEEtran.bst
diff --git a/IEEEtran.cls → consensus/IEEEtran.cls b/IEEEtran.cls → consensus/IEEEtran.cls
diff --git a/consensus/README.md b/consensus/README.md
@@ -0,0 +1,25 @@
+# Tendermint-spec
+
+The repository contains the specification (and the proofs) of the Tendermint
+consensus protocol.
+
+## How to install Latex on Mac OS
+
+MacTex is Latex distribution for Mac OS. You can download it [here](http://www.tug.org/mactex/mactex-download.html).
+
+Popular IDE for Latex-based projects is TexStudio. It can be downloaded 
+[here](https://www.texstudio.org/).
+
+## How to build project 
+
+In order to compile the latex files (and write bibliography), execute 
+
+`$ pdflatex paper` <br/>
+`$ bibtex paper` <br/>
+`$ pdflatex paper` <br/>
+`$ pdflatex paper` <br/>
+
+The generated file is paper.pdf. You can open it with
+
+`$ open paper.pdf`
+
diff --git a/algorithmicplus.sty → consensus/algorithmicplus.sty b/algorithmicplus.sty → consensus/algorithmicplus.sty
diff --git a/consensus/conclusion.tex b/consensus/conclusion.tex
@@ -0,0 +1,16 @@
+\section{Conclusion} \label{sec:conclusion}
+
+We have proposed a new Byzantine-fault tolerant consensus algorithm that is the
+core of the Tendermint BFT SMR platform. The algorithm is designed for the wide
+area network with high number of mutually distrusted nodes that communicate
+over gossip based peer-to-peer network. It has only a single mode of execution
+and the communication pattern is very similar to the "normal" case of the
+state-of-the art PBFT algorithm. The algorithm ensures termination with a novel
+mechanism that takes advantage of the gossip based communication between nodes.
+The proposed algorithm and the proofs are simple and elegant, and we believe
+that this makes it easier to understand and implement correctly.   
+
+\section*{Acknowledgment}
+
+We would like to thank Anton Kaliaev, Ismail Khoffi and Dahlia Malkhi for comments on an earlier version of the paper. We also want to thank Marko Vukolic, Ming Chuan Lin, Maria Potop-Butucaru, Sara Tucci, Antonella Del Pozzo and Yackolley Amoussou-Guenou for pointing out the liveness issues
+in the previous version of the algorithm. Finally, we want to thank the Tendermint team members and all project contributors for making Tendermint such a great platform.  
diff --git a/consensus/consensus.tex b/consensus/consensus.tex
diff --git a/consensus/definitions.tex b/consensus/definitions.tex
@@ -0,0 +1,117 @@
+\section{Definitions} \label{sec:definitions}
+
+\subsection{Model}
+
+We consider a system of processes that communicate by exchanging messages.
+Processes can be correct or faulty, where a faulty process can behave in an
+arbitrary way, i.e., Byzantine faults. We assume that each process
+has some amount of voting power (voting power of a process can be $0$).
+Processes in our model are not part of a single administrative domain;
+therefore we cannot enforce a direct network connectivity between all
+processes. Instead, we assume that each process is connected to a subset of
+processes called peers, such that there is an indirect communication channel
+between all correct processes. Communication between processes is established
+using a gossip protocol \cite{Dem1987:gossip}.
+
+Formally, we model the network communication using the \emph{partially
+synchronous system model}~\cite{DLS88:jacm}: in all executions of the system
+there is a bound $\Delta$ and an instant GST (Global Stabilization Time) such
+that all communication among correct processes after GST is reliable and
+$\Delta$-timely, i.e., if a correct process $p$ sends message $m$ at time $t
+\ge GST$ to correct process $q$, then $q$ will receive $m$ before $t +
+\Delta$\footnote{Note that as we do not assume direct communication channels
+    among all correct processes, this implies that before the message $m$
+    reaches $q$, it might pass through a number of correct processes that will
+forward the message $m$ using gossip protocol towards $q$.}. Messages among
+correct processes can be delayed, dropped or duplicated before GST.
+Spoofing/impersonation attacks are assumed to be impossible at all times due to
+the use of public-key cryptography. The bound $\Delta$ and GST are system
+parameters whose values are not required to be known for the safety of our
+algorithm. Termination of the algorithm is guaranteed within a bounded duration
+after GST.  In practice, the algorithm will work correctly in the slightly
+weaker variant of the model where the system alternates between (long enough)
+good periods (corresponds to the \emph{after} GST period where system is
+reliable and $\Delta$-timely) and bad periods (corresponds to the period
+\emph{before} GST during which the system is asynchronous and messages can be
+lost), but consideration of the GST model simplifies the discussion.  
+
+We assume that process steps (which might include sending and receiving
+messages) take zero time.  Processes are equipped with clocks so they can
+measure local timeouts.  All protocol messages are signed, i.e., when a correct
+process $q$ receives a signed message $m$ from its peer, the process $q$ can
+verify who was the original sender of the message $m$.
+
+The details of the Tendermint gossip protocol will be discussed in a separate
+technical report. For the sake of this paper it is sufficient to assume that
+messages are being gossiped between processes and the following property holds
+(in addition to the partial synchrony network assumptions):
+
+\begin{itemize} \item \emph{Gossip communication:} If a correct process $p$
+    receives some message $m$ at time $t$, all correct processes will receive
+    $m$ before $max\{t, GST\} + \Delta$.    \end{itemize}
+
+
+%Messages that are being gossiped are created by the consensus layer. We can
+    %think about consensus protocol as a content creator, which %defines what
+    %messages should be disseminated using the gossip protocol. A correct
+    %process creates the message for dissemination either i) %explicitly, by
+    %invoking \emph{send} function as part of the consensus protocol or ii)
+    %implicitly, by receiving a message from some other %process. Note that in
+    %the case ii) gossiping of messages is implicit, i.e., it happens without
+    %explicit send clause in the consensus algorithm %whenever a correct
+    %process receives some messages in the consensus algorithm\footnote{If a
+    %message is received by a correct process at %the consensus level then it
+    %is considered valid from the protocol point of view, i.e., it has a
+    %correct signature, a proper message structure %and a valid height and
+    %round number.}. 
+
+%\item Processes keep resending messages (in case of failures or message loss)
+    %until all its peers get them. This ensures that every message %sent or
+    %received by a correct process is eventually received by all correct
+    %processes. 
+
+\subsection{State Machine Replication}
+
+State machine replication (SMR) is a general approach for replicating services
+modeled as a deterministic state machine~\cite{Lam78:cacm,Sch90:survey}.  The
+key idea of this approach is to guarantee that all replicas start in the same
+state and then apply requests from clients in the same order, thereby
+guaranteeing that the replicas' states will not diverge.  Following
+Schneider~\cite{Sch90:survey}, we note that the following is key for
+implementing a replicated state machine tolerant to (Byzantine) faults:
+
+\begin{itemize} \item \emph{Replica Coordination.} All [non-faulty] replicas
+    receive and process the same sequence of requests.  \end{itemize}
+
+Moreover, as Schneider also notes, this property can be decomposed into two
+parts, \emph{Agreement} and \emph{Order}: Agreement requires all (non-faulty)
+replicas to receive all requests, and Order requires that the order of received
+requests is the same at all replicas.
+
+There is an additional requirement that needs to be ensured by Byzantine
+tolerant state machine replication: only requests (called transactions in the
+Tendermint terminology) proposed by clients are executed. In Tendermint,
+transaction verification is the responsibility of the service that is being
+replicated; upon receiving a transaction from the client, the Tendermint
+process will ask the service if the request is valid, and only valid requests
+will be processed. 
+
+ \subsection{Consensus} \label{sec:consensus}
+
+Tendermint solves state machine replication by sequentially executing consensus
+instances to agree on each block of transactions that are
+then executed by the service being replicated. We consider a variant of the
+Byzantine consensus problem called Validity Predicate-based Byzantine consensus
+that is motivated by blockchain systems~\cite{GLR17:red-belly-bc}. The problem
+is defined by an agreement, a termination, and a validity property.
+
+ \begin{itemize} \item \emph{Agreement:} No two correct processes decide on
+         different values.  \item \emph{Termination:} All correct processes
+         eventually decide on a value.  \item \emph{Validity:} A decided value
+             is valid, i.e., it satisfies the predefined predicate denoted
+             \emph{valid()}.  \end{itemize}
+
+ This variant of the Byzantine consensus problem has an application-specific
+ \emph{valid()} predicate to indicate whether a value is valid. In the context
+ of blockchain systems, for example, a value is not valid if it does not
+ contain an appropriate hash of the last value (block) added to the blockchain.
diff --git a/homodel.sty → consensus/homodel.sty b/homodel.sty → consensus/homodel.sty
@@ -3,17 +3,30 @@
 \newcommand{\AS}{\mbox{\it AS}}
 \newcommand{\SK}{\mbox{\it SK}}
 \newcommand{\SHO}{\mbox{\it SHO}}
+\newcommand{\AHO}{\mbox{\it AHO}}
+\newcommand{\CONS}{\mbox{\it CONS}}
+\newcommand{\K}{\mbox{\it K}}
+
 \newcommand{\Alg}{\mathcal{A}}
 \newcommand{\Pred}{\mathcal{P}}
 \newcommand{\Spr}{S_p^r}
 \newcommand{\Tpr}{T_p^r}
 \newcommand{\mupr}{\vec{\mu}_p^{\,r}}
 
+\newcommand{\MSpr}{S_p^{\rho}}
+\newcommand{\MTpr}{T_p^{\rho}}
+
+
+
 \newconstruct{\SEND}{$\Spr$:}{}{\ENDSEND}{}
 \newconstruct{\TRAN}{$\Tpr$:}{}{\ENDTRAN}{}
 \newconstruct{\ROUND}{\textbf{Round}}{\!\textbf{:}}{\ENDROUND}{}
+\newconstruct{\VARIABLES}{\textbf{Variables:}}{}{\ENDVARIABLES}{}
+\newconstruct{\INIT}{\textbf{Initialization:}}{}{\ENDINIT}{}
+
+\newconstruct{\MSEND}{$\MSpr$:}{}{\ENDMSEND}{}
+\newconstruct{\MTRAN}{$\MTpr$:}{}{\ENDMTRAN}{}
+
 \newconstruct{\SROUND}{\textbf{Selection Round}}{\!\textbf{:}}{\ENDSROUND}{}
 \newconstruct{\VROUND}{\textbf{Validation Round}}{\!\textbf{:}}{\ENDVROUND}{}
 \newconstruct{\DROUND}{\textbf{Decision Round}}{\!\textbf{:}}{\ENDDROUND}{}
-\newconstruct{\VARIABLES}{\textbf{Variables:}}{}{\ENDVARIABLES}{}
-\newconstruct{\INIT}{\textbf{Initialization:}}{}{\ENDINIT}{}
diff --git a/consensus/intro.tex b/consensus/intro.tex
@@ -0,0 +1,138 @@
+\section{Introduction} \label{sec:tendermint}
+
+Consensus is a fundamental problem in distributed computing. It
+is important because of it's role in State Machine Replication (SMR), a generic
+approach for replicating services that can be modeled as a deterministic state
+machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that
+service replicas start in the same initial state, and then execute requests
+(also called transactions) in the same order; thereby guaranteeing that
+replicas stay in sync with each other. The role of consensus in the SMR
+approach is ensuring that all replicas receive transactions in the same order.
+Traditionally, deployments of SMR based systems are in data-center settings
+(local area network), have a small number of replicas (three to seven) and are
+typically part of a single administration domain (e.g., Chubby
+\cite{Bur:osdi06}); therefore they handle benign (crash) failures only, as more
+general forms of failure (in particular, malicious or Byzantine faults) are
+considered to occur with only negligible probability.  
+
+The success of cryptocurrencies and blockchain systems in recent years (e.g.,
+\cite{Nak2012:bitcoin, But2014:ethereum}) pose a whole new set of challenges on
+the design and deployment of SMR based systems: reaching agreement over wide
+area network, among large number of nodes (hundreds or thousands) that are not
+part of the same administrative domain, and where a subset of nodes can behave
+maliciously (Byzantine faults). Furthermore, contrary to the previous
+data-center deployments where nodes are fully connected to each other, in
+blockchain systems, a node is only connected to a subset of other nodes, so
+communication is achieved by gossip-based peer-to-peer protocols. 
+The new requirements demand designs and algorithms that are not necessarily
+present in the classical academic literature on Byzantine fault tolerant
+consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}) as the primary 
+focus was different setup. 
+
+In this paper we describe a novel Byzantine-fault tolerant consensus algorithm
+that is the core of the BFT SMR platform called Tendermint\footnote{The
+	Tendermint platform is available open source at
+	https://github.com/tendermint/tendermint.}. The Tendermint platform consists of
+a high-performance BFT SMR implementation written in Go, a flexible interface
+for
+building arbitrary deterministic applications above the consensus, and a suite
+of tools for deployment and management.  
+
+The Tendermint consensus algorithm is inspired by the PBFT SMR
+algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults (the
+Algorithm 2 from \cite{DLS88:jacm}). Similar to DLS algorithm, Tendermint
+proceeds in
+rounds\footnote{Tendermint is not presented in the basic round model of
+	\cite{DLS88:jacm}. Furthermore, we use the term round differently than in
+	\cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication
+	steps instead of a single communication step in \cite{DLS88:jacm}.}, where each
+round has a dedicated proposer (also called coordinator or
+leader) and a process proceeds to a new round as part of normal
+processing (not only in case the proposer is faulty or suspected as being faulty
+by enough processes as in PBFT).  
+The communication pattern of each round is very similar to the "normal" case
+of PBFT. Therefore, in preferable conditions (correct proposer, timely and
+reliable communication between correct processes), Tendermint decides in three
+communication steps (the same as PBFT). 
+
+The major novelty and contribution of the Tendermint consensus algorithm is a
+new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the
+existing BFT consensus (and SMR) algorithms for the partially synchronous
+system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm},
+\cite{MA06:tdsc}) typically relies on the communication pattern illustrated in
+Figure~\ref{ch3:fig:coordinator-change} for termination. The
+Figure~\ref{ch3:fig:coordinator-change} illustrates messages exchanged during
+the proposer change when processes start a new round\footnote{There is no
+	consistent terminology in the distributed computing terminology on naming
+	sequence of communication steps that corresponds to a logical unit. It is
+	sometimes called a round, phase or a view.}. It guarantees that eventually (ie.
+after some Global Stabilization Time, GST), there exists a round with a correct
+proposer that will bring the system into a univalent configuration.
+Intuitively, in a round in which the proposed value is accepted
+by all correct processes, and communication between correct processes is
+timely and reliable, all correct processes decide.   
+
+
+\begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering
+	\begin{rounddiag}{4}{2} \round{1}{~} \rdmessage{1}{1}{$v_1$}
+		\rdmessage{2}{1}{$v_2$} \rdmessage{3}{1}{$v_3$} \rdmessage{4}{1}{$v_4$}
+		\round{2}{~} \rdmessage{1}{1}{$x, [v_{1..4}]$}
+		\rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$} \rdmessage{1}{3}{$~~~~~~~~x,
+			[v_{1..4}]$} \rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$} \end{rounddiag}
+	\vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the
+		new proposer.} \label{ch3:fig:coordinator-change} \end{figure}  
+
+To ensure that a proposed value is accepted by all correct
+processes\footnote{The proposed value is not blindly accepted by correct
+	processes in BFT algorithms. A correct process always verifies if the proposed
+	value is safe to be accepted so that safety properties of consensus are not
+	violated.}
+a proposer will 1) build the global state by receiving messages from other
+processes, 2) select the safe value to propose and 3) send the selected value
+together with the signed messages
+received in the first step to support it. The
+value $v_i$ that a correct process sends to the next proposer normally
+corresponds to a value the process considers as acceptable for a decision: 
+
+\begin{itemize} \item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is
+	not the value itself but a set of $2f+1$ signed messages with the same
+	value id, \item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value
+	itself is being sent.  \end{itemize}
+
+In both cases, using this mechanism in our system model (ie. high
+number of nodes over gossip based network) would have high communication
+complexity that increases with the number of processes: in the first case as
+the message sent depends on the total number of processes, and in the second
+case as the value (block of transactions) is sent by each process. The set of
+messages received in the first step are normally piggybacked on the proposal
+message (in the Figure~\ref{ch3:fig:coordinator-change} denoted with
+$[v_{1..4}]$) to justify the choice of the selected value $x$. Note that
+sending this message also does not scale with the number of processes in the
+system.   
+
+We designed a novel termination mechanism for Tendermint that better suits the
+system model we consider. It does not require additional communication (neither
+sending new messages nor piggybacking information on the existing messages) and
+it is fully based on the communication pattern that is very similar to the
+normal case in PBFT \cite{CL99:osdi}. Therefore, there is only a single mode of
+execution in Tendermint, i.e., there is no separation between the normal and
+the recovery mode, which is the case in other PBFT-like protocols (e.g.,
+\cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe
+this makes Tendermint simpler to understand and implement correctly. 
+
+Note that the orthogonal approach for reducing message complexity in order to
+improve
+scalability and decentralization (number of processes) of BFT consensus
+algorithms is using advanced cryptography (for example Boneh-Lynn-Shacham (BLS)
+signatures \cite{BLS2001:crypto}) as done for example in SBFT
+\cite{Gue2018:sbft}.  
+
+The remainder of the paper is as follows: Section~\ref{sec:definitions} defines
+the system model and gives the problem definitions. Tendermint
+consensus algorithm is presented in Section~\ref{sec:tendermint} and the
+proofs are given in Section~\ref{sec:proof}. We conclude in
+Section~\ref{sec:conclusion}.  
+
+
+
+
diff --git a/latex8.bst → consensus/latex8.bst b/latex8.bst → consensus/latex8.bst
diff --git a/latex8.sty → consensus/latex8.sty b/latex8.sty → consensus/latex8.sty