Draft of CAP-0034 (change closetime semantics) (#671)

First draft of CAP-0034, intended to address stellar-protocol issue #622, by changing the semantics of the selection of the closetime when combining candidates in the SCP nomination protocol. The current nomination protocol produces an unsigned ("basic") form of union of the candidates which uses the maximum closetime of all candidates as the closetime to nominate. The protocol proposed by this CAP would produce a signed StellarValue, whose transaction set and upgrades are chosen in the same way as in the current protocol, but whose closetime is precisely the one in the same StellarValue as the best transaction set. (The change to closetime semantics is sufficient by itself to address stellar-protocol issue #622; the change to make the nominated StellarValue signed is an opportunity that we take which the closetime semantics change provides. The signature is on the <txSet, closetime> pair, so by taking the exact closetime from the same StellarValue as the chosen transaction set, we allow ourselves to use the signature from the node that proposed that StellarValue.)
stellar · Jul 9, 2020 · 1a155df · 1a155df
1 parent a0e4114
commit 1a155df
Show file tree

Hide file tree

Showing 2 changed files with 315 additions and 0 deletions.
diff --git a/core/README.md b/core/README.md
@@ -63,6 +63,7 @@
 | [CAP-0029](cap-0029.md) | AllowTrust when not AUTH_REQUIRED | Tomer Weller | Draft |
 | [CAP-0032](cap-0032.md) | Trustline Preauthorization | Jonathan Jove | Draft |
 | [CAP-0033](cap-0033.md) | Sponsored Reserve with EphemeralSponsorshipEntry | Jonathan Jove | FCP |
+| [CAP-0034](cap-0034.md) | Preserve Transaction-Set/Close-Time Affinity During Nomination | Terence Rokop | Draft |
 
 ### Rejected Proposals
 | Number | Title | Author | Status |

diff --git a/core/cap-0034.md b/core/cap-0034.md
@@ -0,0 +1,314 @@
+# CAP-0034: Preserve Transaction-Set/Close-Time Affinity During Nomination
+
+## Preamble
+
+```text
+CAP: To Be Assigned
+Title: Preserve Transaction-Set/Close-Time Affinity During Nomination
+Authors: Terence Rokop, Nicolas Barry
+Status: Draft
+Created: 2020-07-06
+Discussion: https://github.com/stellar/stellar-protocol/issues/622
+Protocol version: TBD
+```
+
+## Simple Summary
+
+Make the nomination protocol select the close time from the single candidate
+whose transaction set it selects (rather than the current protocol's selection
+of the maximum close time of all candidates), and preserve the signature of the
+chosen `<close time, transaction set>` pair.
+
+## Motivation
+
+With the current protocol, the following sequence of events can occur:
+
+- A transaction `T` with an expiration time makes it into a candidate
+`StellarValue`'s transaction set, with its expiration being no later than the
+`closeTime` in that `StellarValue`.
+- That transaction set is nominated, but with a later `closeTime`, as some
+other candidate(s) had a later `closeTime`, and the current protocol selects
+the maximum.
+- The ledger-closing process begins, and the ledger header is updated to reflect
+the newly-chosen `closeTime`.
+- The ledger-closing process reaches the point of applying transactions.
+Each transaction is charged a fee and consumes its sequence number, and is then
+re-validated (and then applied if and only if the re-validation succeeds).
+`T`'s `closeTime` turns out to be before the newly-chosen `closeTime`.  `T`
+therefore returns `txTOO_LATE` without being applied.
+
+So `T` consumes a sequence number (and is charged a fee) but is never applied.
+There is a design pattern in some smart contracts which breaks, allowing
+inconsistent transactions to be committed, if it encounters this race:  a smart
+contract might submit a chain of transactions, with incrementing sequence
+numbers, each intended to be executed only if the previous ones were.  If there
+were transactions that were intended to be constrained by the incrementing
+sequence numbers only to execute if `T` executed, they would do so because `T`
+had consumed its sequence number, even though it was never applied. The
+dependent transactions would then perform operations which the smart contract
+had intended to be performed only if `T` had been executed. This would
+potentially be arbitrarily bad for the smart contract.
+
+A less catastrophic problem, but an inefficiency, is that externalizing
+transaction sets that contain significant numbers of transactions that are
+doomed to return `txTOO_LATE` during ledger close might mean that we're
+choosing sets that we would have considered worse than other available
+candidates if we had taken into account that those transactions were doomed.
+
+More cosmetically, it might also come across as unfair, or at least
+counterintuitive, considering that getting into a transaction set is what a
+client pays a fee for, that the occasional transaction might be charged a fee
+despite being rejected for not having made it into a transaction set in time.
+Although this race is different from the perspective of stellar-core from the
+usual case of `txTOO_LATE` in which the transaction is discovered to have
+expired before being accepted into the transaction set, the race looks the same
+from the client's perspective as any other `txTOO_LATE` -- except that when the
+race is the cause, the client is, inexplicably from its perspective, charged a
+fee.
+
+### Goals Alignment
+
+This CAP is aligned with the following Stellar Network Goals:
+
+- The Stellar Network should be secure and reliable.
+
+- The Stellar Network should run at scale and at low cost to all users.
+
+It also aligns with the following Stellar Network Value:
+
+- The protocol should bias toward simplicity.
+
+## Abstract
+
+Currently, the nomination protocol produces, from a set of candidate
+`StellarValue`s, a nominated `StellarValue` comprising the best transaction
+set (as selected by a deterministic heuristic which favors larger sets) of
+any candidate, the maximum `closeTime` of any candidate, and a set of
+"maximal" ledger `upgrades` from all candidates.
+
+We propose to change the protocol to take the `closeTime` from the same
+candidate `StellarValue` as the best transaction set (the heuristic which
+decides which transaction set is best does not change; nor does the method of
+generating `upgrades`).
+
+In particular, this affects how soon the core notices when transactions expire.
+
+## Specification
+
+There are no changes to any XDR in this CAP.  The treatment of the `closeTime`
+in the `StellarValue` XDR in some code paths changes, as does the use of
+`SIGNED` `StellarValue`s in some places where `BASIC` ones are currently used,
+but the XDR itself does not change.
+
+This CAP proposes to change the externalized-value generation to remove the
+current combining of closetimes and simply select the one from the same
+nominated value as the best transaction set.  (It proposes to preserve the
+selection heuristic for "best transaction set" unchanged.)
+
+This CAP also proposes to make the nomination protocol produce a signed
+value (currently we externalize an unsigned, or "basic", value, because, as a
+combination of candidate values, it may not be equal to any one candidate value
+for which we have a signature). It does not propose any specific use of that
+signature yet, but preserving the signature as part of the same protocol
+change that allows it to be preserved will allow us to use it in the future if
+we discover a way to do so.
+
+As with any protocol change, the new code must remember and maintain
+older protocols' behavior, and continue to use old behavior until after
+the network externalizes the ledger upgrade to the new protocol, through
+consensus.
+
+Once the network has upgraded to the new protocol, the following semantic
+changes occur:
+
+- The selection during the nomination protocol of a `StellarValue` to
+externalize from the set of candidates occurs by choosing the `closeTime`
+from the `StellarValue` which contains the "best" transaction set (according
+to the same heuristic metric that already exists for "best transaction set").
+The signature of that `StellarValue` is preserved, so it remains `SIGNED`.
+The externalized `upgrades` set is still computed as a maximal combination of
+all the candidate upgrade sets, but that does not affect the signature -- we
+were already (even in older protocols) signing only a version of the
+`StellarValue` with the `upgrades` stripped.
+
+- Validation of values during the ballot protocol phase expects to find
+`SIGNED` rather than `BASIC` values, and checks their signatures.
+
+## Design Rationale
+
+The proposal in this CAP is the third that we have considered as a means of
+fixing the problems described in the "Motivation" section.  It has seemed
+from the beginning that the problem could be addressed only through a CAP:
+it is a question of what relationships to maintain among the payment of
+fees, the consumption of sequence numbers, and the validity of transactions,
+that also leads to a question of what value to externalize given a set of
+candidates, and all of those questions must be answered by the Core through
+consensus.
+
+- The first idea was to change the point at which the current ledger header's
+`closeTime` was updated from before transactions are applied to afterwards.
+(Ledger `upgrades` are already performed after transaction applies.)  This
+would prevent an increase in the `closeTime` brought about by candidates other
+than the one whose transaction set was selected from causing transactions to
+fail (it would not occur until after they had been applied).  However, this
+raised the concern that a long delay between ledger closes would lead to
+transactions succeeding despite having expiration times arbitrarily earlier
+than the real time of their committing -- possibly long after clients had
+already assumed that they would expire (and acted on that assumption).
+
+- The second idea was to remove transactions from the best transaction set
+selected by the combine-candidates code in the nomination protocol based on the
+(maximal) `closeTime` just selected by that code. However, that would seem to
+have opened up a new denial-of-service attack on the ledger:
+an attacker could create large numbers of transactions with very high fees,
+making it likely that they would be accepted into transaction sets, but with
+extremely short expiration times, that would almost certainly come before the
+next ledger close.  They could therefore fill up the transaction sets with
+such transactions, yet avoid the transaction fees, leaving the transaction
+sets produced by the nomination protocol with little or no room for legitimate
+transactions.
+
+After rejecting those two possibilities for the reasons given above, we hit
+on the idea in the current proposal of making the nomination protocol simply
+select the `closeTime` from the `StellarValue` containing the best transaction
+set (rather than doing any combining with the `closeTimes` from other
+`StellarValue`s).  A (well-behaved) node that had nominated the ultimately
+winning `StellarValue` would already have left any expired transactions out of
+its transaction set, and those transactions would neither pay fees nor consume
+sequence numbers.
+
+Always choosing a `closeTime` from the same `StellarValue` as the winning
+transaction set raised the possibility, already described in the "Simple
+Summary", of also preserving the _signature_ of that `StellarValue`,
+from the node that nominated it (making it a `SIGNED` rather than a `BASIC`
+`StellarValue`). Experimenting with implementing that possibility raised a
+further design choice: because the `SIGNED` `StellarValue` also contains a set
+of ledger `upgrades`, we could choose additionally to change `upgrades` to come
+from the single winning `StellarValue` rather than being combined in a way
+similar to the `closeTime`s as they are in the current protocol.
+
+Here are some factors we considered before deciding not to propose to change the
+`upgrade` selection while changing the `closeTime` selection.
+
+- Upgrades do happen independently of the transactions: we could view the ledger
+closes as an optimization of alternating transaction set applies and ledger
+upgrades. So we did have a free choice between proposing to change upgrades in
+the same way as we were proposing to change closetime, or not doing so and
+letting them work differently (single-candidate-selection for closetimes and
+txsets, candidate-combination for upgrades).
+
+- There isn't any correlation that we've found among the different upgrades that
+we combine, in the sense that today, if changing any given parameter `A` from
+`1` to `2` and changing any given parameter `B` from `1` to `2` would each be
+sensible individually, then it would also be sensible to make both changes at
+once. But if the type of possible upgrades could expand to one in which certain
+combinations of parameter changes would be insensible, then it might no longer
+make sense just to combine each parameter individually -- the simple union of
+individual changes could produce a result which was inconsistent. In that case,
+the right way to combine them consistently would become ambiguous; there could
+be multiple options of which none would include all candidate parameter changes.
+That would not be inherently wrong -- we have the same situation with
+transaction sets, and we make a heuristic but deterministic choice as to which
+is the best. But it might be some argument for choosing a single nominated
+upgrade in the same way as we already choose a single nominated tx set, and are
+proposing to choose a single nominated closetime, just because choosing the best
+single nominated set of upgrades could be simpler (there might be far fewer
+options to choose among) than choosing the best consistent combination of all
+tx sets.
+
+- The selection of a single `StellarValue` from the set of candidates
+during the nomination protocol reflects the literature on and formal models of
+the Stellar Consensus Protocol better than the current implementation's
+combination of candidates.
+
+- The selection of a single `StellarValue` from the set of candidates
+is the simplest way of ensuring that the combination function, given a
+set of self-consistent candidates, produces a consistent externalized value.
+
+Overall, changing the selection of `upgrades` to match that of `closeTime`
+could be said to make the resulting protocol simpler -- but the CAP change
+itself more complicated.  (We did start with and do some testing of an
+implementation that also changed `upgrades` selection, but in the latest
+implementation we have undone that part and are changing only `closeTime`.)
+
+Regarding potential performance impacts, using `SIGNED` `StellarValue`s in the
+ballot protocol will result in the ballot protocol checking signatures in some
+code paths where there were previously no signatures to check.  However, the
+signatures will be from nodes that had nominated values in the latest round of
+the nomination protocol, so the signature cache might ameliorate any noticeable
+effects of this.
+
+## Backwards Incompatibilities
+
+Semantic changes resulting from this CAP we hope to be limited to the
+correction of the potential inconsistency of smart contract behavior
+described in the "Motivation".  Smart contracts, and other clients, should
+not have _relied_ on that inconsistency, as it requires a race to occur,
+and one which is probably very rare.
+
+## Security Concerns
+
+So far, we have not found new potential issues with safety or denial-of-service
+introduced by the proposal in this CAP, as we did with earlier ideas that we
+then rejected, as described above in the "Design Rationale".
+
+## Test Cases
+
+- The "SCP Driver" test, which, using the current protocol and an older one,
+confirms that a set of candidate `StellarValue`s produces the expected
+combined `closeTime`, changes in three ways:
+  - It tests that old protocols continue to behave the same way, but that
+  in new protocols, the combined `closeTime` is simply that of the
+  `StellarValue` containing the best transaction set, not the maximum
+  `closeTime` of all candidates.
+  - It is enhanced to test for expected `upgrades` as well (but these should
+  behave the same between old and new protocols).
+  - It tests that old protocols produce `BASIC` `StellarValue`s, but new
+  protocols produce `SIGNED` ones.
+- The `PendingEnvelopes recvSCPEnvelope` and `quorum tracker` tests change to
+  expect `BASIC` `StellarValue`s from old protocols (as before this CAP) but
+  `SIGNED` `StellarValue`s from new protocols (as after this CAP).
+- A test must be added to confirm that when a node selects a `StellarValue`
+to nominate, it trims from the selected transaction set precisely those
+transactions which would be expired according to the `closeTime` in that
+`StellarValue`, and no others (in previous protocol versions, it should
+trim those which it estimates are likely to expire by next transaction close).
+- A test must be added to confirm that nominated `StellarValue`s are checked
+for internal consistency between their close times and transaction sets --
+that is, that they contain no transactions which are expired according to
+their own close times (a nominated `StellarValue` that does not meet that
+condition should be rejected by other nodes).
+
+## Implementation
+
+An implementation of this protocol has been written, which as far as the
+author knows is complete and correct in its semantics; it will certainly need
+refactoring, but it might constitute a demonstration of the practicality of
+implementing this proposal, and of a bound on the scope of the changes required,
+as well as a way of experimenting with its consequences:
+
+[Closetime non-whitespace diffs](https://github.com/stellar/stellar-core/compare/master...rokopt:proto-622-closetime?w=1)
+
+The changes therein are limited to stellar-core, and comprise the following:
+
+- The protocol version is bumped (currently anticipated to be to 14).
+- The herder's choice of which transactions to include in the transaction set
+that it is going to nominate to depend, in new protocols only, on the close
+time that it is going to nominate, rather than (as in old protocols) an
+estimate of when the next ledger is likely to close.
+- The herder's validation of `StellarValue`s nominated by other nodes changes
+to expect, in new protocols only, that values used by the ballot protocol
+are `SIGNED`.
+- The herder's validation of `StellarValue`s nominated by other nodes changes
+which `closeTime` to validate the transactions in the `StellarValue` against
+from the last ledger close time to the `closeTime` in the `StellarValue`.
+- The herder's combining of nominated `StellarValue`s into a single value to
+externalize, in new protocols only (it preserves its old behavior in old
+protocols), changes in the following ways:
+  - It chooses a "combined" `closeTime` directly from the `StellarValue`
+  containing the winning transaction set.  In particular, it does not attempt
+  any trimming of the winning transaction set.  (That set should already be
+  consistent with `closeTime`, since it came from one nominated `StellarValue`,
+  which the herder has already validated by the time it combines them.)
+  - It returns `SIGNED` `StellarValue`s.
+- Tests are modified and extended as described above under "Test Cases".