Skip to content

Commit

Permalink
Prepare to Nuke Develop (celestiaorg#47)
Browse files Browse the repository at this point in the history
* state -> step

* vote -> v

* New version of the algorithm and the proof

* New version of the algorithm and the proofs

* Added algorithm description

* Add algorithm description

* Add introduction

* Add conclusion

* Add conclusion file

* fix warnings (caption was defined twice)

- only the latter is used anyways (centers captions)
- this makes it possible to autom. building the paper

* Update grammar

* s/state_p/step_p

* Address Ismail's comments

* intro: language fixes

* definitions: language fixes

* consensus: various fixes

* proof: some fixes

* try to improve reviewability

* \eq -> =

* textwrap to 79

* various minor fixes

* proof: fix itemization

* proof: more minor fixes

* proof: timeouts are functions

* proof: fixes to lemma6

* Intro changes and improve title page

* Add Marko and Ming to acks

* add readme

* Format algorithm correctly

Clarify condition semantic and timeouts

Improve descriptions

* patform -> platform

* Ensure that rules are mutually exclusive

- various clarifications and small improvements

* Release v0.6

* small nits for smoother readability
  • Loading branch information
tac0turtle committed Sep 3, 2019
1 parent b362894 commit 87abbf7
Show file tree
Hide file tree
Showing 21 changed files with 1,097 additions and 353 deletions.
28 changes: 5 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,25 +1,7 @@
# Tendermint-spec
# Tendermint Spec

The repository contains the specification (and the proofs) of the Tendermint
consensus protocol.

## How to install Latex on Mac OS

MacTex is Latex distribution for Mac OS. You can download it [here](http://www.tug.org/mactex/mactex-download.html).

Popular IDE for Latex-based projects is TexStudio. It can be downloaded
[here](https://www.texstudio.org/).

## How to build project

In order to compile the latex files (and write bibliography), execute

`$ pdflatex paper` <br/>
`$ bibtex paper` <br/>
`$ pdflatex paper` <br/>
`$ pdflatex paper` <br/>

The generated file is paper.pdf. You can open it with

`$ open paper.pdf`
This repository contains the paper describing the Tendermint consensus
algorithm, including formal proofs of its safety and liveness.

For the pdf, see the [latest
release](https://github.com/tendermint/spec/releases).
249 changes: 0 additions & 249 deletions consensus.tex

This file was deleted.

File renamed without changes.
File renamed without changes.
25 changes: 25 additions & 0 deletions consensus/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# Tendermint-spec

The repository contains the specification (and the proofs) of the Tendermint
consensus protocol.

## How to install Latex on Mac OS

MacTex is Latex distribution for Mac OS. You can download it [here](http://www.tug.org/mactex/mactex-download.html).

Popular IDE for Latex-based projects is TexStudio. It can be downloaded
[here](https://www.texstudio.org/).

## How to build project

In order to compile the latex files (and write bibliography), execute

`$ pdflatex paper` <br/>
`$ bibtex paper` <br/>
`$ pdflatex paper` <br/>
`$ pdflatex paper` <br/>

The generated file is paper.pdf. You can open it with

`$ open paper.pdf`

File renamed without changes.
16 changes: 16 additions & 0 deletions consensus/conclusion.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
\section{Conclusion} \label{sec:conclusion}

We have proposed a new Byzantine-fault tolerant consensus algorithm that is the
core of the Tendermint BFT SMR platform. The algorithm is designed for the wide
area network with high number of mutually distrusted nodes that communicate
over gossip based peer-to-peer network. It has only a single mode of execution
and the communication pattern is very similar to the "normal" case of the
state-of-the art PBFT algorithm. The algorithm ensures termination with a novel
mechanism that takes advantage of the gossip based communication between nodes.
The proposed algorithm and the proofs are simple and elegant, and we believe
that this makes it easier to understand and implement correctly.

\section*{Acknowledgment}

We would like to thank Anton Kaliaev, Ismail Khoffi and Dahlia Malkhi for comments on an earlier version of the paper. We also want to thank Marko Vukolic, Ming Chuan Lin, Maria Potop-Butucaru, Sara Tucci, Antonella Del Pozzo and Yackolley Amoussou-Guenou for pointing out the liveness issues
in the previous version of the algorithm. Finally, we want to thank the Tendermint team members and all project contributors for making Tendermint such a great platform.
397 changes: 397 additions & 0 deletions consensus/consensus.tex

Large diffs are not rendered by default.

117 changes: 117 additions & 0 deletions consensus/definitions.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
\section{Definitions} \label{sec:definitions}

\subsection{Model}

We consider a system of processes that communicate by exchanging messages.
Processes can be correct or faulty, where a faulty process can behave in an
arbitrary way, i.e., Byzantine faults. We assume that each process
has some amount of voting power (voting power of a process can be $0$).
Processes in our model are not part of a single administrative domain;
therefore we cannot enforce a direct network connectivity between all
processes. Instead, we assume that each process is connected to a subset of
processes called peers, such that there is an indirect communication channel
between all correct processes. Communication between processes is established
using a gossip protocol \cite{Dem1987:gossip}.

Formally, we model the network communication using the \emph{partially
synchronous system model}~\cite{DLS88:jacm}: in all executions of the system
there is a bound $\Delta$ and an instant GST (Global Stabilization Time) such
that all communication among correct processes after GST is reliable and
$\Delta$-timely, i.e., if a correct process $p$ sends message $m$ at time $t
\ge GST$ to correct process $q$, then $q$ will receive $m$ before $t +
\Delta$\footnote{Note that as we do not assume direct communication channels
among all correct processes, this implies that before the message $m$
reaches $q$, it might pass through a number of correct processes that will
forward the message $m$ using gossip protocol towards $q$.}. Messages among
correct processes can be delayed, dropped or duplicated before GST.
Spoofing/impersonation attacks are assumed to be impossible at all times due to
the use of public-key cryptography. The bound $\Delta$ and GST are system
parameters whose values are not required to be known for the safety of our
algorithm. Termination of the algorithm is guaranteed within a bounded duration
after GST. In practice, the algorithm will work correctly in the slightly
weaker variant of the model where the system alternates between (long enough)
good periods (corresponds to the \emph{after} GST period where system is
reliable and $\Delta$-timely) and bad periods (corresponds to the period
\emph{before} GST during which the system is asynchronous and messages can be
lost), but consideration of the GST model simplifies the discussion.

We assume that process steps (which might include sending and receiving
messages) take zero time. Processes are equipped with clocks so they can
measure local timeouts. All protocol messages are signed, i.e., when a correct
process $q$ receives a signed message $m$ from its peer, the process $q$ can
verify who was the original sender of the message $m$.

The details of the Tendermint gossip protocol will be discussed in a separate
technical report. For the sake of this paper it is sufficient to assume that
messages are being gossiped between processes and the following property holds
(in addition to the partial synchrony network assumptions):

\begin{itemize} \item \emph{Gossip communication:} If a correct process $p$
receives some message $m$ at time $t$, all correct processes will receive
$m$ before $max\{t, GST\} + \Delta$. \end{itemize}


%Messages that are being gossiped are created by the consensus layer. We can
%think about consensus protocol as a content creator, which %defines what
%messages should be disseminated using the gossip protocol. A correct
%process creates the message for dissemination either i) %explicitly, by
%invoking \emph{send} function as part of the consensus protocol or ii)
%implicitly, by receiving a message from some other %process. Note that in
%the case ii) gossiping of messages is implicit, i.e., it happens without
%explicit send clause in the consensus algorithm %whenever a correct
%process receives some messages in the consensus algorithm\footnote{If a
%message is received by a correct process at %the consensus level then it
%is considered valid from the protocol point of view, i.e., it has a
%correct signature, a proper message structure %and a valid height and
%round number.}.

%\item Processes keep resending messages (in case of failures or message loss)
%until all its peers get them. This ensures that every message %sent or
%received by a correct process is eventually received by all correct
%processes.

\subsection{State Machine Replication}

State machine replication (SMR) is a general approach for replicating services
modeled as a deterministic state machine~\cite{Lam78:cacm,Sch90:survey}. The
key idea of this approach is to guarantee that all replicas start in the same
state and then apply requests from clients in the same order, thereby
guaranteeing that the replicas' states will not diverge. Following
Schneider~\cite{Sch90:survey}, we note that the following is key for
implementing a replicated state machine tolerant to (Byzantine) faults:

\begin{itemize} \item \emph{Replica Coordination.} All [non-faulty] replicas
receive and process the same sequence of requests. \end{itemize}

Moreover, as Schneider also notes, this property can be decomposed into two
parts, \emph{Agreement} and \emph{Order}: Agreement requires all (non-faulty)
replicas to receive all requests, and Order requires that the order of received
requests is the same at all replicas.

There is an additional requirement that needs to be ensured by Byzantine
tolerant state machine replication: only requests (called transactions in the
Tendermint terminology) proposed by clients are executed. In Tendermint,
transaction verification is the responsibility of the service that is being
replicated; upon receiving a transaction from the client, the Tendermint
process will ask the service if the request is valid, and only valid requests
will be processed.

\subsection{Consensus} \label{sec:consensus}

Tendermint solves state machine replication by sequentially executing consensus
instances to agree on each block of transactions that are
then executed by the service being replicated. We consider a variant of the
Byzantine consensus problem called Validity Predicate-based Byzantine consensus
that is motivated by blockchain systems~\cite{GLR17:red-belly-bc}. The problem
is defined by an agreement, a termination, and a validity property.

\begin{itemize} \item \emph{Agreement:} No two correct processes decide on
different values. \item \emph{Termination:} All correct processes
eventually decide on a value. \item \emph{Validity:} A decided value
is valid, i.e., it satisfies the predefined predicate denoted
\emph{valid()}. \end{itemize}

This variant of the Byzantine consensus problem has an application-specific
\emph{valid()} predicate to indicate whether a value is valid. In the context
of blockchain systems, for example, a value is not valid if it does not
contain an appropriate hash of the last value (block) added to the blockchain.
17 changes: 15 additions & 2 deletions homodel.sty → consensus/homodel.sty
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,30 @@
\newcommand{\AS}{\mbox{\it AS}}
\newcommand{\SK}{\mbox{\it SK}}
\newcommand{\SHO}{\mbox{\it SHO}}
\newcommand{\AHO}{\mbox{\it AHO}}
\newcommand{\CONS}{\mbox{\it CONS}}
\newcommand{\K}{\mbox{\it K}}

\newcommand{\Alg}{\mathcal{A}}
\newcommand{\Pred}{\mathcal{P}}
\newcommand{\Spr}{S_p^r}
\newcommand{\Tpr}{T_p^r}
\newcommand{\mupr}{\vec{\mu}_p^{\,r}}

\newcommand{\MSpr}{S_p^{\rho}}
\newcommand{\MTpr}{T_p^{\rho}}



\newconstruct{\SEND}{$\Spr$:}{}{\ENDSEND}{}
\newconstruct{\TRAN}{$\Tpr$:}{}{\ENDTRAN}{}
\newconstruct{\ROUND}{\textbf{Round}}{\!\textbf{:}}{\ENDROUND}{}
\newconstruct{\VARIABLES}{\textbf{Variables:}}{}{\ENDVARIABLES}{}
\newconstruct{\INIT}{\textbf{Initialization:}}{}{\ENDINIT}{}

\newconstruct{\MSEND}{$\MSpr$:}{}{\ENDMSEND}{}
\newconstruct{\MTRAN}{$\MTpr$:}{}{\ENDMTRAN}{}

\newconstruct{\SROUND}{\textbf{Selection Round}}{\!\textbf{:}}{\ENDSROUND}{}
\newconstruct{\VROUND}{\textbf{Validation Round}}{\!\textbf{:}}{\ENDVROUND}{}
\newconstruct{\DROUND}{\textbf{Decision Round}}{\!\textbf{:}}{\ENDDROUND}{}
\newconstruct{\VARIABLES}{\textbf{Variables:}}{}{\ENDVARIABLES}{}
\newconstruct{\INIT}{\textbf{Initialization:}}{}{\ENDINIT}{}
138 changes: 138 additions & 0 deletions consensus/intro.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
\section{Introduction} \label{sec:tendermint}

Consensus is a fundamental problem in distributed computing. It
is important because of it's role in State Machine Replication (SMR), a generic
approach for replicating services that can be modeled as a deterministic state
machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that
service replicas start in the same initial state, and then execute requests
(also called transactions) in the same order; thereby guaranteeing that
replicas stay in sync with each other. The role of consensus in the SMR
approach is ensuring that all replicas receive transactions in the same order.
Traditionally, deployments of SMR based systems are in data-center settings
(local area network), have a small number of replicas (three to seven) and are
typically part of a single administration domain (e.g., Chubby
\cite{Bur:osdi06}); therefore they handle benign (crash) failures only, as more
general forms of failure (in particular, malicious or Byzantine faults) are
considered to occur with only negligible probability.

The success of cryptocurrencies and blockchain systems in recent years (e.g.,
\cite{Nak2012:bitcoin, But2014:ethereum}) pose a whole new set of challenges on
the design and deployment of SMR based systems: reaching agreement over wide
area network, among large number of nodes (hundreds or thousands) that are not
part of the same administrative domain, and where a subset of nodes can behave
maliciously (Byzantine faults). Furthermore, contrary to the previous
data-center deployments where nodes are fully connected to each other, in
blockchain systems, a node is only connected to a subset of other nodes, so
communication is achieved by gossip-based peer-to-peer protocols.
The new requirements demand designs and algorithms that are not necessarily
present in the classical academic literature on Byzantine fault tolerant
consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}) as the primary
focus was different setup.

In this paper we describe a novel Byzantine-fault tolerant consensus algorithm
that is the core of the BFT SMR platform called Tendermint\footnote{The
Tendermint platform is available open source at
https://github.com/tendermint/tendermint.}. The Tendermint platform consists of
a high-performance BFT SMR implementation written in Go, a flexible interface
for
building arbitrary deterministic applications above the consensus, and a suite
of tools for deployment and management.

The Tendermint consensus algorithm is inspired by the PBFT SMR
algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults (the
Algorithm 2 from \cite{DLS88:jacm}). Similar to DLS algorithm, Tendermint
proceeds in
rounds\footnote{Tendermint is not presented in the basic round model of
\cite{DLS88:jacm}. Furthermore, we use the term round differently than in
\cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication
steps instead of a single communication step in \cite{DLS88:jacm}.}, where each
round has a dedicated proposer (also called coordinator or
leader) and a process proceeds to a new round as part of normal
processing (not only in case the proposer is faulty or suspected as being faulty
by enough processes as in PBFT).
The communication pattern of each round is very similar to the "normal" case
of PBFT. Therefore, in preferable conditions (correct proposer, timely and
reliable communication between correct processes), Tendermint decides in three
communication steps (the same as PBFT).

The major novelty and contribution of the Tendermint consensus algorithm is a
new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the
existing BFT consensus (and SMR) algorithms for the partially synchronous
system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm},
\cite{MA06:tdsc}) typically relies on the communication pattern illustrated in
Figure~\ref{ch3:fig:coordinator-change} for termination. The
Figure~\ref{ch3:fig:coordinator-change} illustrates messages exchanged during
the proposer change when processes start a new round\footnote{There is no
consistent terminology in the distributed computing terminology on naming
sequence of communication steps that corresponds to a logical unit. It is
sometimes called a round, phase or a view.}. It guarantees that eventually (ie.
after some Global Stabilization Time, GST), there exists a round with a correct
proposer that will bring the system into a univalent configuration.
Intuitively, in a round in which the proposed value is accepted
by all correct processes, and communication between correct processes is
timely and reliable, all correct processes decide.


\begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering
\begin{rounddiag}{4}{2} \round{1}{~} \rdmessage{1}{1}{$v_1$}
\rdmessage{2}{1}{$v_2$} \rdmessage{3}{1}{$v_3$} \rdmessage{4}{1}{$v_4$}
\round{2}{~} \rdmessage{1}{1}{$x, [v_{1..4}]$}
\rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$} \rdmessage{1}{3}{$~~~~~~~~x,
[v_{1..4}]$} \rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$} \end{rounddiag}
\vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the
new proposer.} \label{ch3:fig:coordinator-change} \end{figure}

To ensure that a proposed value is accepted by all correct
processes\footnote{The proposed value is not blindly accepted by correct
processes in BFT algorithms. A correct process always verifies if the proposed
value is safe to be accepted so that safety properties of consensus are not
violated.}
a proposer will 1) build the global state by receiving messages from other
processes, 2) select the safe value to propose and 3) send the selected value
together with the signed messages
received in the first step to support it. The
value $v_i$ that a correct process sends to the next proposer normally
corresponds to a value the process considers as acceptable for a decision:

\begin{itemize} \item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is
not the value itself but a set of $2f+1$ signed messages with the same
value id, \item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value
itself is being sent. \end{itemize}

In both cases, using this mechanism in our system model (ie. high
number of nodes over gossip based network) would have high communication
complexity that increases with the number of processes: in the first case as
the message sent depends on the total number of processes, and in the second
case as the value (block of transactions) is sent by each process. The set of
messages received in the first step are normally piggybacked on the proposal
message (in the Figure~\ref{ch3:fig:coordinator-change} denoted with
$[v_{1..4}]$) to justify the choice of the selected value $x$. Note that
sending this message also does not scale with the number of processes in the
system.

We designed a novel termination mechanism for Tendermint that better suits the
system model we consider. It does not require additional communication (neither
sending new messages nor piggybacking information on the existing messages) and
it is fully based on the communication pattern that is very similar to the
normal case in PBFT \cite{CL99:osdi}. Therefore, there is only a single mode of
execution in Tendermint, i.e., there is no separation between the normal and
the recovery mode, which is the case in other PBFT-like protocols (e.g.,
\cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe
this makes Tendermint simpler to understand and implement correctly.

Note that the orthogonal approach for reducing message complexity in order to
improve
scalability and decentralization (number of processes) of BFT consensus
algorithms is using advanced cryptography (for example Boneh-Lynn-Shacham (BLS)
signatures \cite{BLS2001:crypto}) as done for example in SBFT
\cite{Gue2018:sbft}.

The remainder of the paper is as follows: Section~\ref{sec:definitions} defines
the system model and gives the problem definitions. Tendermint
consensus algorithm is presented in Section~\ref{sec:tendermint} and the
proofs are given in Section~\ref{sec:proof}. We conclude in
Section~\ref{sec:conclusion}.




File renamed without changes.
File renamed without changes.

0 comments on commit 87abbf7

Please sign in to comment.