New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recovering from a restart/crash where an onchain transaction is in the mempool but not yet mined #2933

Closed
LefterisJP opened this Issue Oct 30, 2018 · 5 comments

Comments

Projects
None yet
2 participants
@LefterisJP
Collaborator

LefterisJP commented Oct 30, 2018

Problem Definition

We have seen many cases lately where people's nodes get a TransactionUnderpriced error or similar at restart. This can probably happen when we have an on-chain transaction having been sent, added to the mempool but not yet mined and then we restart or the node crashes and we restart.

It does not seem that we are able to recover from this at the moment as at recovering from a restart we will process all pending transactions which will essentially re-process the pending ContractSendXXX event and resend the on-chain transaction leading to the nonce reuse or transaction underpriced error.

Task

We need to be able to handle this scenario as with the mainnet transactions taking too long to get mined this is really easy to hit.

How can we achieve this?

We should keep the transaction hash of all in flight chain transactions somewhere. Probably either in or close to the pending transactions of the ChainState. So whenever we send a transaction to the proxy the chainstate should already have it inside the pending transactions and we can add the pending transaction hash there.

Then at restarting of the node when we process the pending
transactions we can actually get the transaction hash of each pending transaction. Then for each of them call getTransactionByHash() and if this returns None continue by forwarding it to the raiden event handler for processing. But if on the other hand it returns a transaction then we know that the transaction is still pending to be mined so we don't do anything.

Notes:

  • It will take a bit of work to make the chainstate accessible from inside the proxy. Maybe a better/cleaner approach exists to achieve the same results. Open to suggestions.

  • Another facet of this issue can also be seen at #2801 where the same idea is taken even further by also thinking on canceling or replacing pending on-chain transactions.

@hackaugusto

This comment has been minimized.

Collaborator

hackaugusto commented Oct 30, 2018

We have seen many cases lately where people's nodes get a TransactionUnderpriced error or similar at restart

I have three ideas why this could happen:

  • There is a pending transaction with the same nonce in the pool. Either:
    • The same account is being used by another application.
    • The eth_getTransactionCount RPC call is returning an invalid nonce.
  • The error is also being used for something unrelated to transaction overwriting (what?)

Before working on this I recommend finding out which of the above (or other) reason is causing the underpriced error.

@LefterisJP

This comment has been minimized.

Collaborator

LefterisJP commented Oct 30, 2018

I have three ideas why this could happen:

Why not also the node crashed/restarted while an on-chain transaction is pending to be mined, and when we restart it's still pending?

That's what I think is happening. Because at that point we will try to resend the transaction once we get it out of the pending transactions at restart but we will hit the same nonce since the pending transactions are not yet mined.

I think your comment here:

The eth_getTransactionCount RPC call is returning an invalid nonce.

hits the spot. We write 'pending' there but as far as I recall that does not check the mempool. At least for geth there is an issue open for 2.5 years now for this.

Parity has the same problem in this issue.

@hackaugusto

This comment has been minimized.

Collaborator

hackaugusto commented Oct 31, 2018

Uh-oh, if the node doesn't tell us how many transactions are in the mempool the restart code will not work at all, either we have to switch to remote signing, or store the nonce locally.

@LefterisJP LefterisJP added this to the Red eyes testnet 16 milestone Oct 31, 2018

@LefterisJP

This comment has been minimized.

Collaborator

LefterisJP commented Oct 31, 2018

As discussed in breakfast either the transaction_hash approach or as you say storing the nonce locally would work for the short term perhaps. Let me investigate a bit when I got more time. All input is welcome!

@LefterisJP

This comment has been minimized.

Collaborator

LefterisJP commented Nov 1, 2018

Done by: #2943

@LefterisJP LefterisJP closed this Nov 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment