Draft of CAP-0034 (change closetime semantics)

First draft of CAP-0034, intended to address stellar-protocol issue stellar#622, by changing the semantics of the combination of candidates by the SCP ballot protocol from producing an unsigned form of union of the candidates (taking, for example, the maximum closetime of all candidates as the closetime to externalize) to selecting one particular signed, nominated StellarValue (comprising transaction set, closetime, and upgrades) as the one to externalize.
rokopt · Jul 7, 2020 · d415114 · d415114
1 parent 284dd16
commit d415114
Show file tree

Hide file tree

Showing 2 changed files with 296 additions and 0 deletions.
diff --git a/core/README.md b/core/README.md
@@ -63,6 +63,7 @@
 | [CAP-0029](cap-0029.md) | AllowTrust when not AUTH_REQUIRED | Tomer Weller | Draft |
 | [CAP-0032](cap-0032.md) | Trustline Preauthorization | Jonathan Jove | Draft |
 | [CAP-0033](cap-0033.md) | Sponsored Reserve with EphemeralSponsorshipEntry | Jonathan Jove | FCP |
+| [CAP-0034](cap-0034.md) | Single-Candidate Externalized Values | Terence Rokop | Draft |
 
 ### Rejected Proposals
 | Number | Title | Author | Status |

diff --git a/core/cap-0034.md b/core/cap-0034.md
@@ -0,0 +1,295 @@
+## Preamble
+
+```
+CAP: To Be Assigned
+Title: Single-Candidate Externalized Values
+Recommender: Nicolas Barry, Terence Rokop
+Owner: Terence Rokop
+Consulted: Nicolas Barry, Jonathan Jove, David Mazières, Eric Saunders, Leigh McCulloch
+Status: Draft
+Created: 2020-07-06
+Discussion: TBD
+Protocol version: TBD
+Protocol Request: https://github.com/stellar/stellar-protocol/issues/622
+```
+
+## Simple Summary
+To date, the ballot protocol has generated a value to externalize by combining
+all candidates in a deterministic way:  the transaction set comes from a
+"best-transaction-set" heuristic (generally, bigger transaction sets are
+"better"), the closetime is the maximum of all of the candidates' closetimes,
+and the set of ledger upgrades is one ledger upgrade type per type represented
+among all candidates, where multiple upgrades of a given type are combined into
+a single upgrade of that type by a type-specific algorithm (in general --
+though, as it happens, for current upgrade types, all of the algorithms are
+"maximum value", like the closetime combination algorithm).
+
+This CAP proposes to change the externalized-value generation to remove the
+combining and simply select one candidate value from the set of candidates.
+It proposes to keep the selection heuristic for "best transaction set", which
+would result in the closetime and upgrade set being precisely the ones proposed
+in the candidate which contained the "best transaction set".
+
+Because the entire externalized value would therefore be precisely one which
+had been nominated by some particular node, that would also allow us to
+preserve the signature from the nomination, and therefore use a signed
+externalized value (currently we externalize an unsigned, or "basic", value,
+because, as a combination of nominated values, it may not be equal to any one
+nominated value for which we have a signature).  This CAP therefore proposes
+to do that -- it does not propose any specific use of that signature yet, but
+preserving the signature as part of the same protocol change that allows it
+to be preserved will allow us to use it in the future if we discover a way to
+do so.
+
+## Motivation
+With the current protocol, a transaction with an expiration time might make it
+into a candidate `StellarValue`'s transaction set, with its expiration being no
+later than the `closeTime` in that `StellarValue`, but then fail as "too late"
+during ledger close, as the transaction applies during ledger close happen
+after the "current" ledger header, including the `closeTime`, has been updated
+to reflect the externalized `StellarValue`.
+
+When that happens, a transaction is charged a fee, and consumes a sequence
+number, despite having been delayed long enough to exceed its expiration time.
+The fee and sequence number aspects are potentially both problems:
+
+- Transactions submitted by SDKs generally set expiration times so that they
+know when it's safe to re-submit them (with new sequence numbers and perhaps
+higher fees) if they don't seem to be making it into transaction sets with their
+current offered fees (or because of transient network congestion or memory
+constraints on a validator).  It would make more sense if they were only charged
+the fee that they were willing to pay that actually got them accepted into a
+transaction set.
+
+- Probably worse, there is a design pattern in some smart contracts which
+breaks, allowing inconsistent transactions to be committed, if it encounters
+this race:  a smart contract submits two transactions, `A` and `B`, which it
+intends to be mutually exclusive, and which it tries to ensure are mutually
+exclusive by marking `A` as being valid only up to some time `t` (it will fail
+with `txTOO_LATE` if it has not been committed by time `t`) and `B` as being
+valid only after time `t` (it will fail with `txTOO_EARLY` if it is accepted
+into a transaction set before `t`).  If the race that this CAP is intended to
+prevent occurs, as it can in the current protocol, then `A` might consume a
+sequence number but then fail.  But its consumption of the sequence number will
+allow transactions dependent on `A` (which the smart contract chains after `A`)
+to be executed, and if any of them succeeds, then they will perform operations
+which the smart contract intended to be performed only if the operations of `A`
+had been performed.  This race is probably uncommon, because smart contracts
+tend to set expiration times much farther out than SDKs (on the order of
+anywhere from hours to years), but potentially arbitrarily bad for a smart
+contract that did hit it.
+
+Additionally, externalizing transaction sets that contain significant numbers
+of transactions that are doomed to fail with `txTOO_LATE` during ledger close
+might mean that we're choosing sets that we would have considered worse than
+other available candidates if we had taken into account that those transactions
+were doomed.
+
+### Goals Alignment
+* The Stellar Network should be a blockchain that people trust.  Though the
+events that could lead to a smart contract executing inconsistently because of
+the problem that this CAP intends to address might be very rare, they might
+also be of a kind of which even a single event could significantly damage trust.
+* The Stellar Network should be fast and reliable (the stellar-core OKRs contain
+specific targets).  Keeping transactions which are guaranteed to fail later in
+externalized transaction sets is wasteful.
+
+## Abstract
+We change the (deterministic) mechanism by which the ballot protocol computes
+the `StellarValue` to be externalized from producing a value comprising the
+best transaction set (selected by a deterministic heuristic which favors larger
+sets) of any candidate, the maximum `closeTime` of any candidate, and a set of
+"maximal" ledger `upgrades` from all candidates to one which simply produces the
+exact candidate `StellarValue` which contains the best transaction set (the
+heuristic which decides which transaction set is best does not change).
+
+In particular, this affects how soon the core notices when transactions expire,
+and also how soon ledger upgrades are performed (as different proposed upgrades
+from different nodes in the same set of candidate `StellarValues` will no
+longer be performed all at once; only one set will be selected at a time).
+
+## Specification
+
+### XDR
+There are no changes to any XDR in this CAP.  The treatment of the `closeTime`
+and the `upgrades` in the `StellarValue` XDR in some code paths changes, but
+not the XDR itself.
+
+### Semantics
+
+The changes required are limited to stellar-core, and comprise the following:
+
+- The protocol version is bumped (currently anticipated to be to 14).
+- `HerderSCPDriver::validateValueHelper()` changes to expect, in new protocols
+only, that values used by the nomination protocol are `SIGNED`.
+- `HerderSCPDriver::combineCandidates()`, in new protocols only (it preserves
+its old behavior in old protocols), changes in the following ways:
+  - It produces a "combined" `closeTime` and set of `upgrades` directly from
+  the `StellarValue` containing the winning transaction set.  In particular,
+  it does not attempt any trimming of the winning transaction set.  (That set
+  should already be consistent with `closeTime` and `upgrades`, since they
+  all came from one nominated `StellarValue`.)
+  - It returns `SIGNED` `StellarValue`s.
+
+## Design Rationale
+The proposal in this CAP is the third that we have considered as a means of
+fixing the problems described in the "Motivation" section.
+
+- The first idea was to change the point at which the current ledger header's
+`closeTime` was updated from before transactions are applied to afterwards.
+(Ledger `upgrades` are already performed after transaction applies.)  This
+would prevent an increase in the `closeTime` brought about by candidates other
+than the one whose transaction set was selected from causing transactions to
+fail (it would not occur until after they had succeeded).  However, this
+raised the concern that a long delay between ledger closes would lead to
+transactions succeeding despite having expiration times arbitrarily earlier
+than the real time of their committing -- possibly long after clients had
+already assumed that they would expire (and acted on that assumption).
+
+- The second idea was to remove transactions from the best transaction set
+selected by the combine-candidates code in the ballot protocol based on the
+(maximal) `closeTime` just selected by that code. However, that would seem to
+have opened up a new denial-of-service attack on the ledger:
+an attacker could create large numbers of transactions with very high fees,
+making it likely that they would be accepted into transaction sets, but with
+extremely short expiration times, that would almost certainly come before the
+next ledger close.  They could therefore fill up the transaction sets with
+such transactions, yet avoid the transaction fees, leaving the transaction
+sets produced by the ballot protocol with little or no room for legitimate
+transactions.
+
+After rejecting those two possibilities for the reasons given above, we hit
+on the idea in the current proposal of making the ballot protocol simply
+select the `closeTime` from the `StellarValue` containing the best transaction
+set (rather than doing any combining with the `closeTimes` from other
+`StellarValue`s).  A (well-behaved) node that had nominated the ultimately
+winning `StellarValue` would already have left any expired transactions out of
+its transaction set, and those transactions would neither pay fees nor consume
+sequence numbers.
+
+Always choosing a `closeTime` from the same `StellarValue` as the winning
+transaction set raised the possibility, already described in the "Simple
+Summary", of also preserving the _signature_ of that `StellarValue`,
+from the node that nominated it (specifically, returning a `SIGNED` rather
+than `BASIC` `StellarValue` from `HerderSCPDriver::combineCandidates()`).
+Experimenting with implementing that possibility raised a further design choice:
+because the `SIGNED` `StellarValue` also contains a set of ledger `upgrades`,
+we could only start using `SIGNED` `StellarValue`s in the ballot protocol if
+we also changed `upgrades` to come from the single winning `StellarValue`
+rather than being combined in a way similar to the `closeTime`s as they are
+in the current protocol.
+
+Here are some factors we considered before deciding to propose to change the
+`upgrade` selection as well as the `closeTime` selection, and thereby to allow
+(and to propose) preserving `SIGNED` `StellarValue`s in the ballot protocol.
+
+- Upgrades do happen independently of the transactions: we could view the ledger
+closes as an optimization of alternating transaction set applies and ledger
+upgrades. So we did have a free choice between proposing to change upgrades in
+the same way as we were proposing to change closetime, or not doing so and
+letting them work differently (single-candidate-selection for closetimes and
+txsets, candidate-combination for upgrades).
+
+- It's clear that we could gain the new information of `SIGNED` `StellarValues`,
+identifying one particular node which nominated the whole `StellarValue`, only
+if we do change `upgrades` in the same way as `closeTime`s. What isn't clear
+to us yet is how valuable that is (if at all).  If someone were to come up with
+a clearly-valuable way of putting the new information to use, then that would
+provide a push in the direction of changing `upgrades` to match `closeTimes`.
+
+- There isn't any correlation that we've found among the different upgrades that
+we combine, in the sense that today, if changing any given parameter `A` from
+`1` to `2` and changing any given parameter `B` from `1` to `2` would each be
+sensible individually, then it would also be sensible to make both changes at
+once. But if the type of possible upgrades could expand to one in which certain
+combinations of parameter changes would be insensible, then it might no longer
+make sense just to combine each parameter individually -- the simple union of
+individual changes could produce a result which was inconsistent. In that case,
+the right way to combine them consistently would become ambiguous; there could
+be multiple options of which none would include all candidate parameter changes.
+That would not be inherently wrong -- we have the same situation with
+transaction sets, and we make a heuristic but deterministic choice as to which
+is the best. But it might be some argument for choosing a single nominated
+upgrade in the same way as we already choose a single nominated tx set, and are
+proposing to choose a single nominated closetime, just because choosing the best
+single nominated set of upgrades could be simpler (there might be far fewer
+options to choose among) than choosing the best consistent combination of all
+tx sets.
+
+- A risk of not combining upgrades might be that upgrades that some nodes
+proposed, which would be positives from their perspective, and happened to be
+neutral from other nodes' perspectives, and therefore good for the network as a
+whole, might never happen, because the nodes that considered the upgrades
+neutral (and therefore did not propose them) always had their tx sets nominated,
+perhaps because they were much higher-traffic validators. (Open question:  is
+that what would happen, or would the high-traffic validators, upon seeing
+upgrade nominations that they found neutral, start flooding them too on the
+grounds that some nodes' wanting them and the others' not actively not wanting
+them meant that they were a net good for the network as a whole?)
+
+## Protocol Upgrade Transition
+
+### Backwards Incompatibilities
+Semantic changes resulting from this CAP we hope to be limited to the
+correction of the potential inconsistency of smart contract behavior
+described in the "Motivation".  Smart contracts, and other clients, should
+not have _relied_ on that inconsistency, as it requires a race to occur,
+and one which is probably very rare.
+
+### Impacts on layers outside Core
+We do not expect any layers outside of stellar-core to have to make any code
+changes as a result of this CAP.  The principal potentially visible (though
+perhaps not yet seen) effect that we intend is to prevent smart contracts
+structured as described in the "Motivation" section from encountering the race
+also described therein.
+
+### Resource Utilization
+Using `SIGNED` `StellarValue`s in the ballot protocol will result in the ballot
+protocol checking signatures in some code paths where there were previously no
+signatures to check.  However, the signatures will be from nodes that had
+nominated values in the latest round of the nomination protocol, so the
+signature cache might ameliorate any noticeable effects of this.
+
+## Security Concerns
+Two classes of security concerns in particular have come up repeatedly during
+consideration of this proposal:
+
+- Consistency in treatment of transactions across different phases of the
+stellar-core protocol.  The most severe problem that we are attempting to
+address is the result of what smart contracts might at least interpret as an
+inconsistency between the consumption of a sequence number and the rejection
+of a transaction as not having committed soon enough; the problem which caused
+us to reject our first considered solution was what clients might at least
+interpret as an inconsistency between the clock time and the success of a
+transaction (which had been set to expire long before the clock time when the
+client finds out that it has succeeded).
+
+- Denial-of-service attacks on the ledger.  The problem which caused us to
+reject our second considered solution was that it would have allowed attackers
+to spam transactions but avoid fees.
+
+So far, we have not found analogous problems introduced by this third attempt.
+
+## Test Cases
+- The "SCP Driver" test, which, using the current protocol and an older one,
+confirms that a set of candidate `StellarValue`s produces the expected
+combined `closeTime`, changes in three ways:
+  - It tests that old protocols continue to behave the same way, but that
+  in new protocols, the combined `closeTime` is simply that of the
+  `StellarValue` containing the best transaction set, not the maximum
+  `closeTime` of all candidates.
+  - It is enhanced to test for expected `upgrades` as well -- and these, like
+  `closeTime`, behave differently between old and new protocols.
+  - It tests that old protocols produce `BASIC` `StellarValue`s, but new
+  protocols produce `SIGNED` ones.
+- The `PendingEnvelopes recvSCPEnvelope` and `quorum tracker` change to
+  expect `BASIC` `StellarValue`s from old protocols (as before this CAP) but
+  `SIGNED` `StellarValue`s from new protocols (as after this CAP).
+
+## Implementation
+An implementation of this protocol has been written, which as far as the
+author knows is complete and correct in its semantics; it will certainly need
+refactoring, but it might constitute a demonstration of the practicality of
+implementing this proposal, and of a bound on the scope of the changes required,
+as well as a way of experimenting with its consequences:
+    https://github.com/stellar/stellar-core/compare/master...rokopt:proto-622-closetime?w=1