Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fail_hard is not working as described #2847

Closed
ximinez opened this issue Feb 5, 2019 · 13 comments
Closed

fail_hard is not working as described #2847

ximinez opened this issue Feb 5, 2019 · 13 comments
Labels
API Change Bug Good First Issue Great issue for a new contributor

Comments

@ximinez
Copy link
Collaborator

ximinez commented Feb 5, 2019

The fail_hard parameter to submit (https://developers.ripple.com/submit.html) is described as

If true, and the transaction fails locally, do not retry or relay the transaction to other servers

This is not the current rippled behavior. Instead, only ter result codes (except for terQUEUED) are not retried or relayed. All other failure codes do retry. This is unlikely to be a problem because those failure codes will not succeed on most retry events, BUT some of them could potentially succeed if the protocol rules change while it's retrying, eg. an amendment goes live.

@ximinez ximinez self-assigned this Feb 5, 2019
@ximinez ximinez added API Change Good First Issue Great issue for a new contributor Bug labels Feb 5, 2019
@ximinez
Copy link
Collaborator Author

ximinez commented Feb 5, 2019

To keep implementation simple for now, continue to treat terQUEUED specially. ie. those transactions will still be relayed, and retried locally if necessary.

@intelliot
Copy link
Collaborator

As a client of rippled, I care about which category the result code falls under:

  1. The transaction has been forwarded (or will be forwarded) to other server(s).
  2. The transaction was NOT forwarded to other servers, and never will be.

In case of (1), I'll watch fully validated ledgers to see if they included my transaction, and if one does, I'll look to see what effect my transaction had on the ledger.

In case of (2), I'll know that my transaction "failed" with finality, so I can feel free to create/submit a new transaction re-using that tx's sequence number.

Currently, I have to assume everything is (1) because even tem txs get held for a few tries. With fail_hard: true, I would like to have an additional field in the submit response that indicates which of the above two categories applies to the transaction. It could be a boolean called hard_failed.

@mDuo13
Copy link
Collaborator

mDuo13 commented Feb 6, 2019

My 2¢:

  • Yeah, terQUEUED should be treated as not a hard failure. All other ter, tem, tef, and tel codes should be hard failures.
  • Maybe another "stronger" option could instruct the server to skip the queue, so that terQUEUED would also be a hard failure.
  • Possibly even tec codes could be treated as hard failures and not relayed (but I guess removing them from the open ledger could be messy?). That way, if you're using your own server, you don't have to destroy XRP for every time you screw up and got tecPATH_DRY because you forgot to include proper Paths or something. You might want to limit this to admins if it's significantly more work than rejecting tel / tef / etc. Or just assume that tec transactions are forwarded and will probably make it into a ledger.
  • temDISABLED should really be more like a terDISABLED because the transaction is (at least somewhat) likely to become valid in the foreseeable future. That said, it doesn't really make sense to keep and retry those unless the amendment in question is nearing the point of becoming enabled. This whole point is probably outside the scope of this issue, but it's related. Obviously we can't guarantee that a tem code is truly final without knowing every future amendment that could ever be invented and enabled, but I'd like to move as close to "tem codes are final" as we can.
    • Crazy-talk: we could totally have a tefDISABLED which means, "The amendment you need for this transaction can't possibly become enabled until after this transaction's LastLedgerSequence has passed." Of course, without knowing how much realtime a flag ledger takes, you would only know this if the LLS is less than 1 flag ledger away. (I'm assuming that the minimum number of ledger versions for an amendment to become enabled is two flag ledgers—one for it to gain a majority, one for it to still have that majority, with two weeks' time in between.)

@ximinez
Copy link
Collaborator Author

ximinez commented Feb 7, 2019

In case of (2), I'll know that my transaction "failed" with finality, so I can feel free to create/submit a new transaction re-using that tx's sequence number.

Just adding a friendly reminder that you can't know 100% that your transaction is in (2), if you don't trust / control the server you're submitting to, or the wire you're submitting over. That aside, you can always attempt to submit a new transaction with the same sequence number. Even if the old transaction is in the local list, the new one may go into the queue or open ledger, which would cause later attempts on the old one to fail with tefPAST_SEQ. As long as you do it before something happens to let that old, "bad" transaction succeed, you're good to go.

Currently, I have to assume everything is (1) because even tem txs get held for a few tries. With fail_hard: true, I would like to have an additional field in the submit response that indicates which of the above two categories applies to the transaction. It could be a boolean called hard_failed.

I would not object to adding such a field, but I'm not sure if it's necessary if we implement things correctly.

  • Yeah, terQUEUED should be treated as not a hard failure. All other ter, tem, tef, and tel codes should be hard failures.

You know... While we're in there making changes, it would be "easy" enough to force those types of failures to never be retried, regardless of the value of fail_hard.

  • Possibly even tec codes could be treated as hard failures and not relayed (but I guess removing them from the open ledger could be messy?). That way, if you're using your own server, you don't have to destroy XRP for every time you screw up and got tecPATH_DRY because you forgot to include proper Paths or something. You might want to limit this to admins if it's significantly more work than rejecting tel / tef / etc. Or just assume that tec transactions are forwarded and will probably make it into a ledger.

I think that's technically possible with flags, so it wouldn't be super difficult to implement. However, it would not be a good idea unless it was limited to admins. On a more public node, we want anybody who can pay a fee to pay the fee to discourage them from experimenting or abusing that node. This merits further discussion. I think this would be a breaking change requiring an amendment, because it changes whether a transaction gets put into the open ledger at all, even if that is only in a local context.

  • temDISABLED should really be more like a terDISABLED because the transaction is (at least somewhat) likely to become valid in the foreseeable future. That said, it doesn't really make sense to keep and retry those unless the amendment in question is nearing the point of becoming enabled. This whole point is probably outside the scope of this issue, but it's related. Obviously we can't guarantee that a tem code is truly final without knowing every future amendment that could ever be invented and enabled, but I'd like to move as close to "tem codes are final" as we can.
  • Crazy-talk: we could totally have a tefDISABLED which means, "The amendment you need for this transaction can't possibly become enabled until after this transaction's LastLedgerSequence has passed." Of course, without knowing how much realtime a flag ledger takes, you would only know this if the LLS is less than 1 flag ledger away. (I'm assuming that the minimum number of ledger versions for an amendment to become enabled is two flag ledgers—one for it to gain a majority, one for it to still have that majority, with two weeks' time in between.)

I agree about temDISABLED being the wrong code, but I disagree that should make any decision based on whether the amendment is about to go active. I think it's fair to have a ruling that if you jump the gun on an amendment, then it's on you to hold on to that transaction and resubmit it yourself when the amendment goes live. Also, on a technical level, making that decision is probably unnecessarily difficult.

@intelliot
Copy link
Collaborator

Just adding a friendly reminder that you can't know 100% that your transaction is in (2), if you don't trust / control the server you're submitting to, or the wire you're submitting over.

That's not a big deal because you always have to use https/wss and trust the server you're using. fail_hard is irrelevant here; if the server is not trusted, then it could lie to you about anything, including transaction results. For example, the server could claim your transaction was finalized as a tec when it was actually tes, tricking you into re-sending money.

You know... While we're in there making changes, it would be "easy" enough to force those types of failures to never be retried, regardless of the value of fail_hard.

That sounds fine.

  • Possibly even tec codes could be treated as hard failures and not relayed (but I guess removing them from the open ledger could be messy?). That way, if you're using your own server, you don't have to destroy XRP for every time you screw up and got tecPATH_DRY because you forgot to include proper Paths or something. You might want to limit this to admins if it's significantly more work than rejecting tel / tef / etc. Or just assume that tec transactions are forwarded and will probably make it into a ledger.

I like this idea. For admins, with fail_hard: true, I would like tec to be a hard failure and not relayed. Basically treating/converting them to a tem (or maybe tef).

I think this would be a breaking change requiring an amendment, because it changes whether a transaction gets put into the open ledger at all, even if that is only in a local context.

No, I don't think this should require an amendment because it should only affect the local context. We can avoid a breaking change by using an API version parameter, called e.g. api_version and using the new behavior for api_version: 2.

I think it's fair to have a ruling that if you jump the gun on an amendment, then it's on you to hold on to that transaction and resubmit it yourself when the amendment goes live.

Agreed.

@JoelKatz
Copy link
Collaborator

JoelKatz commented Feb 8, 2019 via email

@intelliot
Copy link
Collaborator

I would like to emphasize:

It would definitely be nice to have a flag in the response that told you whether or not the server promises not to continue trying to process the transaction.

That would be very helpful.

@sublimator
Copy link
Contributor

sublimator commented Aug 22, 2019 via email

@intelliot
Copy link
Collaborator

I do not think FirstLedgerSequence is relevant here

@sublimator
Copy link
Contributor

sublimator commented Aug 22, 2019 via email

@intelliot
Copy link
Collaborator

When we fix fail_hard, I think we should make it so that tec codes are treated as hard failures and not relayed.

However, it would not be a good idea unless it was limited to admins. On a more public node, we want anybody who can pay a fee to pay the fee to discourage them from experimenting or abusing that node. This merits further discussion.

I understand the reasoning here, as it's an anti-abuse mechanism and theoretically cuts down on spam by claiming transaction fees and destroying XRP. But in practice, I don't think this is is a valuable defense. If a node can be abused by sending invalid transactions to it, then we have bigger problems. This should be solved with rate limiting instead.

After all, there is already processing involved for transactions that end up with tem or tef, and users aren't charged XRP for those. I'm not sure what the existing rate limits are, but we should make sure they are restrictive enough that it's fine for users to submit transactions that ultimately tec without harming the node.

I think this would be a breaking change requiring an amendment, because it changes whether a transaction gets put into the open ledger at all, even if that is only in a local context.

I do not think it should require an amendment since it only affects the local context, and only when fail_hard is set. In general, there will still be tec transactions in the open ledger and in validated ledgers, since not every transaction will be submitted with the fail_hard option.

@JoelKatz
Copy link
Collaborator

JoelKatz commented Sep 16, 2019 via email

@MarkusTeufelberger
Copy link
Collaborator

Well, servers don't provide proof trees so far, so nobody wrote clients that would understand them.

@intelliot intelliot added this to the 2019-10 milestone Oct 15, 2019
undertome added a commit to undertome/rippled that referenced this issue Nov 15, 2019
FIXES: XRPLF#2847

* Transactions that are submitted with the fail_hard flag
  and that result in any TER code besides tesSUCCESS shall
  not be queued or held.
undertome added a commit to undertome/rippled that referenced this issue Nov 15, 2019
FIXES: XRPLF#2847

* Transactions that are submitted with the fail_hard flag
  and that result in any TER code besides tesSUCCESS shall
  be neither queued nor held.
undertome added a commit to undertome/rippled that referenced this issue Jan 9, 2020
FIXES: XRPLF#2847

* Transactions that are submitted with the fail_hard flag
  and that result in any TER code besides tesSUCCESS shall
  be neither queued nor held.
cjcobb23 pushed a commit to cjcobb23/rippled that referenced this issue Jan 10, 2020
FIXES: XRPLF#2847

* Transactions that are submitted with the fail_hard flag
  and that result in any TER code besides tesSUCCESS shall
  be neither queued nor held.

[FOLD] Keep tec results out of the open ledger when fail_hard:

* Improve TransactionStatus const correctness, and remove redundant
  `local` check
* Check open ledger tx count in fail_hard tests
* Fix some wrapping
* Remove duplicate test
@ximinez ximinez removed their assignment Jan 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Change Bug Good First Issue Great issue for a new contributor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants