Protocol Request: change how the close time is used when applying transactions #622
Comments
Expanding on this a bit, on why this is annoying. Regular transactions submitted by SDKs set an expiration time to make it easier to know when they can safely re-submit a transaction (with a potentially new sequence number). This guards against clients running in parallel and against issues caused by mempool and network congestion/surge pricing. With this bug, they're charged a transaction fee for no good reason.

The larger issue is with smart contracts that today are written as chains of pre-signed transactions; some have "branches" with 2 mutually exclusive "paths" based on time. For example, one of the transactions is only valid before a certain time (let's call it

With this bug in the protocol, it's possible for a transaction to be accepted in the candidate tx set even though it will fail during consensus, because a future timestamp is used when executing that transaction. So basically

Running into this in the context of smart contracts is not super likely, as the timescale for expiration of those contracts is on the order of hours or days (or even years); but nevertheless, we should close the possibility of failure.
Actually there might be a different way to fix this: right now this function is a no-op. The only downside with this is that when we change the transaction set here, we introduce a new value on the network; that said, as it's a deterministic function, nodes that are in sync should construct the new transaction set without having to rely on peers.
Actually this last solution (changing the way we combine candidates) doesn't work: if we were to remove transactions at that stage, it would imply that somebody spamming the network with "close to expiration" transactions would not pay any transaction fees; worse, they could spam with "very large fees" and take over ledgers entirely (only to see those transactions removed by the ballot protocol), so the network would happily close empty ledgers.
Possible alternative: use the ct associated with the txset (instead of the max), and punt the requirement to the txset stage, before transactions get into the candidate set.
I've started experimenting with this, and testing has brought up a question. Nicolas pointed out that if we just take the best txset, with its own specific closetime, rather than combining txsets and (by taking the maximum of all candidates) their closetimes, then we will have an answer to the question "who proposed this chosen txset?" (currently there is no answer to that question, because a txset, in particular its closetime, may be the result of combining several candidates, and not equal any one specific candidate proposed by any given node), which in turn would allow us to pass around a

However, we're currently combining not only the candidates' closetimes (by taking the maximum) to produce the externalized value, but also all the candidates' upgrades (again, by taking various maxima, such as of proposed ledger version and maximum txset size). If we continue to do that, then we still won't be able to identify an externalized StellarValue as having been proposed by a specific node, and therefore still won't be able to sign it, because the upgrades might constitute some combination of the candidates' upgrades that doesn't equal any one candidate's.

I have only a tiny amount of context so far with which to consider the question: should we also change the selection of upgrades to be that of the StellarValue containing the best txset, or should we continue to combine upgrades while changing only the closetime to be that of the StellarValue containing the best txset?
Some musings:
I'm going to start the code and the CAP by proposing to change upgrades as well as closetimes/tx-sets to come from one single specific nomination; I like the idea of being able to identify the unique proposer of a specific ledger, and I like the sound of making the means of choosing a value to externalize consistent across all components of the ledger (closetime, txset, upgrades). But I am extremely far from being a protocol expert, so I would easily be swayed by an argument to the contrary by such an expert.
First draft of CAP-0034, intended to address stellar-protocol issue stellar#622, by changing the semantics of the combination of candidates by the SCP ballot protocol from producing an unsigned form of union of the candidates (taking, for example, the maximum closetime of all candidates as the closetime to externalize) to selecting one particular signed, nominated StellarValue (comprising transaction set, closetime, and upgrades) as the one to externalize.
See the draft PR for a CAP and the draft implementation.
Is there a reason that this is not also a problem when a transaction consumes a sequence number but fails for some other reason besides
This sounds almost tautological, but any good smart contract should be designed such that transactions (Tc) contingent on the success of another transaction (T) are not executable unless the first transaction succeeds. There are a few ways that this can be achieved. Some examples:
The second technique is very powerful, because it allows conditionality on whether T succeeds or fails. If T succeeds, then the sequence number gap is bridged. If T fails, then the next sequence number becomes available. This can permit two different branches of execution (for example, perhaps some kind of cooperative recovery in the event of failure). The issue with
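The sequence-number-gap technique above can be sketched as a toy model. This is hypothetical code, not stellar-core code, and it simplifies Stellar's semantics: a transaction consumes its sequence number when included in a ledger even if its operations fail, while a BumpSequence-style operation only takes effect if the operations succeed.

```python
# Toy model of conditional branching via sequence-number gaps
# (hypothetical; simplifies Stellar's sequence-number semantics).

class Account:
    def __init__(self, seq):
        self.seq = seq

def submit(account, tx_seq, ops_succeed, bump_to=None):
    if tx_seq != account.seq + 1:
        return "txBAD_SEQ"          # invalid: wrong sequence number
    account.seq = tx_seq            # seq consumed on inclusion, pass or fail
    if ops_succeed and bump_to is not None:
        account.seq = bump_to       # the bump runs only if operations succeed
    return "txSUCCESS" if ops_succeed else "txFAILED"

# T has seq 11 and bumps to 12 on success; the success-contingent branch
# Tc needs seq 13, while the failure branch Tf needs seq 12.
success_path = Account(seq=10)
assert submit(success_path, 11, ops_succeed=True, bump_to=12) == "txSUCCESS"
assert submit(success_path, 12, True) == "txBAD_SEQ"   # Tf blocked
assert submit(success_path, 13, True) == "txSUCCESS"   # Tc executable

failure_path = Account(seq=10)
assert submit(failure_path, 11, ops_succeed=False, bump_to=12) == "txFAILED"
assert submit(failure_path, 13, True) == "txBAD_SEQ"   # Tc blocked
assert submit(failure_path, 12, True) == "txSUCCESS"   # Tf executable
```

Exactly one of the two branches is ever executable, which is what makes the gap act as a conditional.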
Thanks -- I think I might be closer to getting this now, but possibly not all the way. Let me see if I can state this in words of my own, and find out whether they're sensible or not. I can see why you call that second approach powerful -- it does sound rigorous to me. I guess that the

What I'm not sure I understand is how you can ever use the first approach, even without time bounds. Is the point that the
First draft of CAP-0034, intended to address stellar-protocol issue stellar#622, by changing the semantics of the selection of the closetime when combining candidates in the SCP nomination protocol. The current nomination protocol produces an unsigned ("basic") form of union of the candidates which uses the maximum closetime of all candidates as the closetime to nominate. The protocol proposed by this CAP would produce a signed StellarValue, whose transaction set and upgrades are chosen in the same way as in the current protocol, but whose closetime is precisely the one in the same StellarValue as the best transaction set. (The change to closetime semantics is sufficient by itself to address stellar-protocol issue stellar#622; the change to make the nominated StellarValue signed is an opportunity that we take which the closetime semantics change provides. The signature is on the <txSet, closetime> pair, so by taking the exact closetime from the same StellarValue as the chosen transaction set, we allow ourselves to use the signature from the node that proposed that StellarValue.)
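The difference between the current combination and the proposed selection can be sketched as follows. This is a hypothetical Python model (not stellar-core code); the ranking by txset size and the field names are illustrative stand-ins.

```python
# Hypothetical model of combining nomination candidates (not stellar-core code).

from dataclasses import dataclass

@dataclass
class StellarValue:
    tx_set: frozenset      # stand-in for a transaction set
    close_time: int
    signature: str         # signature of the proposing node

def combine_current(candidates):
    # Current protocol: the best txset, but the *maximum* closetime of
    # all candidates -- the result may match no single signed proposal,
    # so it can only be a BASIC (unsigned) value.
    best = max(candidates, key=lambda c: len(c.tx_set))
    return StellarValue(best.tx_set,
                        max(c.close_time for c in candidates),
                        signature="")

def combine_proposed(candidates):
    # Proposed (CAP-0034): take the closetime from the same StellarValue
    # as the best txset, so the proposer's signature remains usable.
    return max(candidates, key=lambda c: len(c.tx_set))

a = StellarValue(frozenset({"t1", "t2"}), close_time=1000, signature="sigA")
b = StellarValue(frozenset({"t3"}), close_time=1007, signature="sigB")

old = combine_current([a, b])
assert old.tx_set == a.tx_set and old.close_time == 1007  # hybrid of a and b
assert combine_proposed([a, b]) == a                      # exactly a's proposal
```

In the old scheme the result pairs a's txset with b's closetime, a value nobody signed; in the proposed scheme the result is a's proposal verbatim.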
Indeed,

The gist of what you wrote is correct, but it's not how I would state it. I would flip the statement around. It is possible to use operations that can be guaranteed not to fail due to established preconditions (and the required preconditions might be the empty set), and the only exception to that is failures caused by
The first two are kind of silly, but the third is a real example where you can guarantee the preconditions for a payment. Many situations will be simplified with CAP-23. |
First draft of CAP-0034, intended to address stellar-protocol issue #622, by changing the semantics of the selection of the closetime when combining candidates in the SCP nomination protocol. The current nomination protocol produces an unsigned ("basic") form of union of the candidates which uses the maximum closetime of all candidates as the closetime to nominate. The protocol proposed by this CAP would produce a signed StellarValue, whose transaction set and upgrades are chosen in the same way as in the current protocol, but whose closetime is precisely the one in the same StellarValue as the best transaction set. (The change to closetime semantics is sufficient by itself to address stellar-protocol issue #622; the change to make the nominated StellarValue signed is an opportunity that we take which the closetime semantics change provides. The signature is on the <txSet, closetime> pair, so by taking the exact closetime from the same StellarValue as the chosen transaction set, we allow ourselves to use the signature from the node that proposed that StellarValue.)
Adding a few comments from the CAP.
nit: use a numbered list as it makes it a lot easier to reference
I mentioned this in previous conversations. There is a missing bullet here (made possible by the previous bullet) that describes how we change the validity criteria for a

In fact you're already referring to this in the "design rationale", so you just forgot to describe it.
That section should only contain information that supports the existing approach and avoid describing what it's not. In particular:
If you feel that this is adding value you can still include something like this in the CAP (in an appendix/separate sections) - but written in such a way as to support the existing approach (similarly to how other CAPs answer questions like "Why are Signer Sponsoring IDs Stored Separately from Signers?" for CAP-0033). In this particular case, we would be justifying why we don't want to change the behavior later in consensus (either by trimming transactions at the end of nomination or by changing the behavior when externalizing).
I am not sure I understand what this means : the composite function described in the white paper doesn't say anything about this; and we do not propagate any one of the values from nomination into the ballot protocol either.
Same thing, I do not understand what this means.
Based on today's protocol meeting, I think it would be worth expanding on practical implications to contrast the conditions necessary for a transaction to be included (only to fail later during consensus) with what happens under the proposed behavior. In particular, I think it can be useful to list out all the things that need to happen "just right" in order to have a practical difference, and add some commentary on how this highlights broken/very precarious situations:
I've opened up PR #679, a draft pull request for commentary on the in-progress second draft of the document (the first draft having been merged in PR #671). I'll reply to Nicolas's comments here, and commit changes to that branch, and we'll have the option of making further comments in the PR as well.
I'm not sure whether you meant that you'd like all lists to be numbered, or whether it's just "Semantics" specifically that would benefit from being able to be referred to by number -- for now I've only changed that one, but if you meant the former, I'll change all the lists.
Yes, it's in the rationale -- and also the code! -- but I forgot it in the "Semantics". I've added it now.
Okay, I've reorganized that section to begin with the explanations of what we are proposing, and I've moved the descriptions of rejected ideas to sub-sections at the end of the "Rationale" (like appendixes to the Rationale section). Let me know if that reads well now.
You're right to not understand it; those were me not understanding how the paper and the code corresponded. Having gone back to re-read that part of the paper after having been in the code for a bit, I see that the paper is much more flexible than I remembered it as having been, admitting an arbitrary (pure) compositing function in the nominating phase (which we take advantage of; it's
I'll work on that. I also plan to add one or two tables to try to illustrate the proposed semantic changes in what I hope will be a clearer way (rows being places in the code whose semantics change, with two columns, for old and new behavior -- a sort of document version of a State pattern).
By the way, it occurs to me that I don't understand why we weren't already validating this. It's true that before my changes, we didn't know exactly what the
After more thought, that gives me an idea for a whole new proposal for fixing

This new proposal combines ideas from three sources, with a couple of slight
(It makes no use of the idea from the v1 proposal in the description of this

Here is the new set of changes I propose (contingent, again, upon the quoted
Here are some of the things that I like about this v4 proposal:
Sorry I am not really following. On the quoted paragraph, right now we use
Nit: it's not "externalized" (that's what happens at the end of the ballot protocol), it's just the current best candidate.

I am not sure I understand what this means. Are you saying that you'd still want to remove transactions from within the

As for

Of note (not super important, but it helps to understand why the solution has to work holistically):
Yes.
Right, sorry, I keep doing that. I meant "nominated".
Yes.
That bias would exist, yes, but I don't see how it's a worst case -- specifically, it seems to me that it's strictly better than what we have today. It can't be the worst case -- today's is worse! Here's what I mean: that bias will indeed favor sets with the larger

So, yes, I would very much like to have a bias in favor of transactions that have a chance of being applied over transactions that are doomed to fail and (if they're part of smart contracts that behave as you described) potentially trigger catastrophic bugs. :) Therefore, I argue that introducing that bias represents a strict improvement over what we do today. You might argue that it's not as much of an improvement as v3 of the CAP would make in some respects, and I would agree, but I'll get to more detail on that later.

I want to note that when you say "I guess a slight variation of it", I think you are acknowledging (and I think it is true) that this v4 proposal is not actually vulnerable to the same spamming issue that you linked to, which is the one that doomed the v2 proposal. The reason is that the additional change of validating against the same
No, that's not just for completeness. I'm specifying the v4 proposal with respect to the current master branch, not with respect to the v3 proposal. The v4 proposal shares code change #2 with v3, and it adds code change #1 with respect to v3, but it also lacks one of the code changes in v3, namely the change in combined
Yes, understood and agreed.
Yes, also agreed. This is a good point to return to my earlier discussion that I paused with "not as much of an improvement as v3 of the CAP would make in some respects". The "maximum

The v3 proposal, unlike my v4 proposal, does make an attempt to solve the "maximum

Here's what I mean, in case it's not already clear. The v4 proposal is an attempt to fix the

And this makes v3 technically safer and cognitively less of a load, and therefore less prone to resistance, because it is, or is very close to, as far as I can tell, a strict improvement over the current protocol's behavior. In other words, I think it's close to demonstrable that there's no sequence of events that v4 could be reasonably suggested to make worse.

Here's the demonstration, expressed in terms of the two code changes I propose. What I called "#3" above was actually just a note that the code didn't need to change, because Siddharth has already done what would otherwise have been necessary. So that one can be ignored in this argument. Also, with apologies, I'm going to reverse the order of #1 and #2, because I realize now that in my argument, the "pure-improvement" quality of what I called #2 above does not depend on #1 having been made, but the "pure-improvement" quality of what I called #1 above does depend on #2 having been made. So I should have flipped them in the first place.
To me, that suggests the following plan:
Then we could fight those two battles separately, and have separably-

And if we can win that smaller battle, then we can build upon it a

Am I making sense? :)
I was talking about worse than solution 3, as your comment is a counter to proposal 3.
Kind of: the reason solution 2 was bad was that we could end up with "empty ledgers" (or close to it) instead of failed transactions. This achieves the same result: basically the network does less (potentially close to no) work instead of doing "bad" work like processing failed transactions. The other bad property from 2 (preserved in this proposal 4) is that we introduce new values to the network, which may have to be propagated (as the set of accumulated values may not be the same on all nodes until convergence of the nomination protocol), which potentially increases latency quite a bit.

Stepping back, though: it sounds like you've been looking into solution 4 with the assumption that the controversial bit in this CAP is the choice of close time; it's not. I suggest we sync offline and post back here with the conclusion.
A few comments on terminology and correctness
Terminology is slightly wrong here. "Nominated" implies confirmed nominated. Here, we're talking about transactions to include in a nominate message at the very beginning of a consensus round (so, we haven't confirmed anything yet). It would be clearer to say "Which transactions to include (not to trim) in StellarValue a node votes to nominate?"
Let's use language consistent with the whitepaper, there is a definition for "compositing candidates into single nominee" - composite value. So you can simply say "Which close time to use in a composite value?"
Having both bullets is redundant here. You are producing a composite value to feed into the ballot protocol, so the answers to 5 and 6 will always be the same. I recommend combining these into a single question: "Expect BASIC or SIGNED StellarValue in a composite value to be used in the ballot protocol"?
This is unrelated to Byzantine fault tolerance, which is needed to detect if some nodes failed based on messages received from other nodes. Nominating a value with maximum close time is valid according to the protocol. Instead, this proposal helps mitigate a specific attack you describe later.
Thanks -- I've tried to incorporate these in an in-progress third draft, in this commit: rokopt@99e2b0e . |
A few more things, as I spent more time with the implementation and was referring to the CAP for details:
This sounds like the proposal eliminates this attack completely, but I don't think it does. If a transaction set introduced by the misbehaving node ends up being the composite value, then you end up with the same problem. So while this proposal makes it harder to execute the attack, it still seems possible to me.
I had a hard time parsing this sentence, and would really appreciate it if you re-flowed it into multiple simpler sentences.

A few more terminology nits (which I think are important, nevertheless):
See this conversation for exploration of the semantics of
I've attempted to correct the problems Marta pointed out in her second round of review in this commit: rokopt@4e9feae |
The CAP has also been updated to reflect the latest implementation changes; see PR #691 . |
This commit implements the protocol change documented in CAP-0034: https://github.com/stellar/stellar-protocol/blob/master/core/cap-0034.md (It does not bump the protocol version, because the version has already been bumped since the last stellar-core release.)

That proposal came from the discussion of stellar-protocol issue stellar#622: stellar/stellar-protocol#622

The CAP enumerates all the semantic changes in this commit. Here are a couple of implementation notes too specific to the code to be mentioned in the CAP:

- The interfaces that had an "upperBoundCloseTimeOffset" added by PR 2608 have now also been extended with "lowerBoundCloseTimeOffset"s, which analogously affect the conditions under which isTooEarly() returns true, allowing the new CAP-34 code to validate minTime as well as maxTime (the latter uses "upperBoundCloseTimeOffset") against the next close time (rather than the last ledger close time).
- testSCPDriver() has been not only extended to test new behaviors, but also refactored to allow more test cases to be added by adding more test case parameters to a vector.
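The role of the close-time offsets can be sketched like this. The function and parameter names echo the ones mentioned in the commit message (isTooEarly, lowerBoundCloseTimeOffset, upperBoundCloseTimeOffset), but the bodies are a simplified hypothetical model, not the actual stellar-core implementation.

```python
# Illustrative sketch of time-bound checks with close-time offsets
# (names follow the commit message; bodies are a simplified model).

def is_too_early(tx_min_time, close_time, lower_bound_close_time_offset=0):
    # A positive offset lets minTime be validated against a close time
    # later than the last ledger's (i.e. the next close time).
    return tx_min_time > close_time + lower_bound_close_time_offset

def is_too_late(tx_max_time, close_time, upper_bound_close_time_offset=0):
    # maxTime == 0 conventionally means "no upper bound".
    return (tx_max_time != 0
            and tx_max_time < close_time + upper_bound_close_time_offset)

last_close = 1000
# With offset 0 (validating against the last close time) a tx with
# maxTime 1003 looks fine; validating against a close time 5 seconds
# later (offset 5) correctly flags it as expired.
assert not is_too_late(1003, last_close)
assert is_too_late(1003, last_close, upper_bound_close_time_offset=5)
# Analogously, minTime can be validated against the next close time.
assert is_too_early(1004, last_close)
assert not is_too_early(1004, last_close, lower_bound_close_time_offset=5)
```

The point of the symmetric lower-bound offset is that the same trick applied to maxTime (checking against the next close time, not the last) can also be applied to minTime.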
Implemented in stellar/stellar-core#2625
What problem does your feature solve?
Right now when applying transactions, we use the close time decided during consensus, which is a different close time than what was used to validate transactions. This creates the possibility that a transaction gets accepted in a ledger only to fail later. This can break certain smart contracts if transactions are submitted at exactly the wrong time.
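The mismatch described above can be sketched as a tiny hypothetical model (not stellar-core code): the candidate check uses the previous ledger's close time, while application uses the possibly later close time decided during consensus.

```python
# Hypothetical model of the bug (not stellar-core code).

def valid_for_candidate_set(tx_max_time, last_close_time):
    # Pre-consensus: accept the tx if its maxTime has not expired
    # relative to the *last* ledger's close time.
    return tx_max_time >= last_close_time

def apply_result(tx_max_time, consensus_close_time):
    # Post-consensus: apply against the externalized close time,
    # which may be later than the one used for validation.
    return "txSUCCESS" if tx_max_time >= consensus_close_time else "txTOO_LATE"

last_close, consensus_close = 1000, 1010  # consensus picks a later close time
tx_max_time = 1005                        # tx expires between the two

assert valid_for_candidate_set(tx_max_time, last_close)            # accepted...
assert apply_result(tx_max_time, consensus_close) == "txTOO_LATE"  # ...then fails
```

The transaction is accepted into a ledger and charged a fee, yet is guaranteed to fail when applied, which is exactly the window this issue asks to close.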
What would you like to see?
Close time should be set for the ledger after transactions get applied, in the same way the protocol version is updated afterwards.
What alternatives are there?
We could try to move the checks upstream, but this may be challenging, as "combine candidates" (SCP) can modify the transaction set and close time independently.