From b6653c3d7bcd5476a5eec35282e1fd348e1a074a Mon Sep 17 00:00:00 2001
From: Victor Grishchenko
Date: Sat, 9 Jul 2016 10:34:26 +0300
Subject: [PATCH] orders

---
 SUMMARY.md | 14 +++++++++-----
 matrix.md  |  6 +++---
 order.md   | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 8 deletions(-)
 create mode 100644 order.md

diff --git a/SUMMARY.md b/SUMMARY.md
index 242c095..8b28fa7 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -1,11 +1,15 @@
 # Table of Contents
 
 * [Table of Contents](SUMMARY.md) - this document
-* [Base64x64 numbers](64x64.md) - our sacred serialization format
-* [Stamps](stamp.md) - event/object ids for a distributed system
-* [Specifiers](spec.md) - compound event... descriptors
-* [Operations](op.md) - immutable ops are Swarm's blood cells
-* [Replicas](replica.md) - database replicas, full and partial
+* [Introduction](README.md) - what is the Swarm protocol
+* Data replication model
+    * [Replicas](replica.md) - database replicas, full and partial
+    * [Order](order.md) - op order (partial, local linear)
+* Protocol primitives
+    * [Base64x64 numbers](64x64.md) - our sacred serialization format
+    * [Stamps](stamp.md) - event/object ids for a distributed system
+    * [Specifiers](spec.md) - compound event... descriptors
+    * [Operations](op.md) - immutable ops are Swarm's blood cells
 * [Handshakes](handshake.md) - how sync sessions start and end
 * [Peer-to-peer handshakes](peer_handshake.md) - for full database replicas
 * [Client handshakes](client_handshake.md) - for clients, to connect to a database
diff --git a/matrix.md b/matrix.md
index 4b70b71..659c524 100644
--- a/matrix.md
+++ b/matrix.md
@@ -7,7 +7,7 @@ There is no way to alter or censor the op stream in transit.
 
 Still, there is another group of attacks, most notably the famous double-spending attack, that depend on the attacker's ability to broadcast different versions of reality to different peers, i.e. to *lie*.
 Once the attacker sends out contradictory ops, that creates a swarm split-brain as on the picture `(I)`.
-If the swarm is physically permanently separated, the attacker (`A`) can lie to both parts of the network (`O`, `P` peers) regarding its own actions.
+If the swarm is physically permanently separated, the attacker `A` can lie to both parts of the network (`O`, `P` peers) regarding its own actions.
 Note that the attacker can not misrepresent or censor other peer's actions, as those are signed and entangled.
 Once `P` peers entangle `A`'s lies into their op streams, `A` can no longer relay `P`'s ops to the `O` side, because they are entangled to his own `P`-side lies.
 Similarly, it can no longer relay `O` ops to the `P` side as they get entangled with `O`-side lies.
@@ -23,9 +23,9 @@ Both `O` and `P` peers see the other side going offline.
 Suppose, the attacker does not control the bottleneck link, like on picture `(II)`.
 Then, the split-brain becomes transitory.
 The lie will be detected as soon as both versions of `A`'s actions are known to all peers.
-In the general case, that should happen at the [RTT timescale][rtt]).
+In the general case, that should happen at the [RTT timescale][rtt].
 For example, `R` peers will get the `R`-side lie first, `Q`-side lie second.
-The lie will be seen as a *fork* of the `A`'s [*home* op log](crypto.md): a certain op will be followed by two distinct versions of the consequent op.
+The lie will be seen as a *fork* of the `A`'s [*home op log*](crypto.md): a certain op will be followed by two distinct versions of the consequent op.
 
 So, the options for the attacking peer are quite limited.
 Still, there is a window of opportunity for the duration of the split-brain.
diff --git a/order.md b/order.md
new file mode 100644
index 0000000..681f158
--- /dev/null
+++ b/order.md
@@ -0,0 +1,53 @@
+## Storage and relay orders
+
+At its base, Swarm is a log replication protocol.
+It relays immutable ops while preserving the causal order.
+Formally, the protocol's guarantees are:
+
+* every op is delivered to every peer replica,
+* it is delivered exactly once and
+* in accordance with the [happened-before order][morelamport].
+
+The general op relay rule is that all new ops are stored first, relayed second, in the same order as they were received.
+In case two replicas were temporarily disconnected, on reconnection the ops must be replayed in the same order (replay order is the relay order).
+Only concurrent ops can go in different orders at different replicas.
+
+Swarm guarantees are not that much different from TCP guarantees: exactly-once in-order delivery, in a certain scope.
+Of course, Swarm's scope is higher in the stack than TCP's.
+From the practical standpoint, 80% of the protocol's effort goes into gluing together segments of continuous TCP-like transmissions and continuous log-structured append-only storage writes.
+The objective is to give it all an appearance of a single continuous session. That way, Swarm implements the abstraction of a distributed partially-ordered op log.
+
+Swarm is partially-ordered, so there is no single total op order.
+For a given database, op orders vary at different replicas.
+Still, there are some useful and important linear orders too.
+
+*Replica order* is the order of [ops](op.md) generated by a single *origin* [replica](replica.md). A replica is considered a sequential process, its op ids are monotonic (later ops have higher ids). This order is global (same-origin ops go in exactly the same order at every replica).
+
+*Home peer order* is the order of [client](replica.md) ops as they arrive on their [home peer](replica.md). Essentially, a home peer is used to create a de-facto linear order for all the ops its clients generate. The peer's own ops belong to that order too. This order is consistent at every replica, except client replicas themselves (well, clients don't get the full op log anyway).
+
+*Arrival order* is the de-facto order of ops as they arrive on a certain replica. When peers sync, they talk in terms of each other's arrival orders. This is the most variable order of all: it is replica-specific. Another term for this is [*delivery order*][crdt]. It was also addressed earlier as relay and replay order -- they are all the same. A single Swarm replica can be seen as an arrival-order op log.
+
+A Swarm network consists of peers connected by an arbitrary graph of peering connections.
+Clients only connect to their respective home peers.
+Peer connections should form a connected graph, at least most of the time.
+That graph is not necessarily a tree.
+Hence, there is some redundancy in op relay.
+Normally, a peer gets every new op from each of its connected peers.
+For example, the next picture shows three peers connected in a ring (`a-b-c-a`) and an op propagation diagram for a new op created by `a`.
+Every peer receives the op twice, relays once.
+
+    a---b      a   b   c   a      abc  peers
+     \ /    |  _                  _    op stored
+      c     |    \_               \ /  op relayed
+            |    /   \_
+       time V   /       \
+
+Even a peer's own ops are echoed back by its connected peers; that serves as an acknowledgement.
+Hence, there is an incentive not to make the graph too dense.
+That redundancy does not affect the client side: as a client is only connected to its home peer, it gets every operation once.
+
+Op delivery guarantees can be further hardened by [crypto](crypto.md).
+[Cryptographic entanglement](matrix.md) ensures that no op was corrupted, omitted or injected in transit; it further allows all peers to cross-sign all the data and to ensure that every peer sees exactly the same data.
+
+[morelamport]: http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
+[crdt]: http://hal.upmc.fr/inria-00555588/document
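
Reviewer note on the ring example in `order.md`: the claim that every peer receives the op twice and relays it once can be checked with a small simulation. This is a minimal sketch under stated assumptions, not Swarm code. The function `simulate_relay`, its parameters and the queue-based flooding loop are hypothetical names chosen for illustration. It assumes that a peer stores a new op first, then relays it once to all of its connected peers (including the one it came from, which is the echo the text treats as an acknowledgement), and that a duplicate is neither stored nor relayed again.

```python
from collections import deque


def simulate_relay(peers, links, origin, op):
    """Flood one op through a peer graph (hypothetical model, not Swarm code).

    Returns per-peer arrival-order logs plus counters of how many copies
    each peer received over the wire and how many times it relayed the op.
    """
    neighbours = {p: set() for p in peers}
    for x, y in links:
        neighbours[x].add(y)
        neighbours[y].add(x)

    log = {p: [] for p in peers}       # arrival-order op log of each peer
    received = {p: 0 for p in peers}   # copies that arrived from other peers
    relayed = {p: 0 for p in peers}    # times the peer relayed the op
    in_flight = deque()                # (sender, receiver, op) messages

    def store_then_relay(peer):
        # Stored first, relayed second; the relay goes to every neighbour.
        log[peer].append(op)
        relayed[peer] += 1
        for n in sorted(neighbours[peer]):
            in_flight.append((peer, n, op))

    store_then_relay(origin)           # the origin creates the op
    while in_flight:
        _sender, receiver, msg = in_flight.popleft()
        received[receiver] += 1
        if msg in log[receiver]:
            continue                   # duplicate or echo: works as an ack only
        store_then_relay(receiver)
    return log, received, relayed


# Ring a-b-c-a, new op created by `a`.
ring = [("a", "b"), ("b", "c"), ("c", "a")]
log, received, relayed = simulate_relay(["a", "b", "c"], ring, "a", "op1")
print(received, relayed)
```

For the three-peer ring this reports two received copies and one relay per peer, while every log holds the op exactly once. Adding more links raises the received counts but not the relay counts, which is the cost behind the advice not to make the graph too dense.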