Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hermes issues related to Tendermint-Go & SDK #1397

Closed
5 of 10 tasks
adizere opened this issue Sep 29, 2021 · 1 comment
Closed
5 of 10 tasks

Hermes issues related to Tendermint-Go & SDK #1397

adizere opened this issue Sep 29, 2021 · 1 comment
Assignees
Labels
A: admin Admin: general administrative & planning issue I: dependencies Internal: related to dependencies
Milestone

Comments

@adizere
Copy link
Member

adizere commented Sep 29, 2021

Summary

This is a meta-issue capturing various problems we encountered in Hermes. Solving them requires some tendermint-go familiarity (or even other dependency), requires further context, and may be potentially blocked until an upstream fix appears in tm-go.

Sub-issues

The issues below are relatively high importance (we don't document here long-standing problems or wishes). Most of them should be fixed within 1-2 months ideally.

The full context for this issue is as follows:

  • Our initial attempt at a solution is here Retry send_tx on account sequence mismatch #1349
    • This PR attempt to solve the problem by this logic:
      1. parsing the error code which the chain replies to Hermes whenever the relayer uses the wrong sequence number, ref here;
      2. upon encountering the predefined error code, Hermes re-fetches the sequence number by contacting the full node
  • This PR, however, does not take into account a more severe problem w.r.t. to the sequence number, which can be reproduced cf. this comment.
  • To solve the more severe problem, it seems we cannot rely on the information which the full node provides, i.e., step (2) in the logic above is not an appropriate fix.
  • Different solutions, such as avoiding caching altogether and always querying the full node for the seq. nr, also also not appropriate. The full node cannot be trusted, because one of Hermes' transactions remains stuck in that node's mempool, thus the node mistakenly believes Hermes should use a higher sequence number.
  • The only fix we know about is as follows: the relayer operator must restart the full node & explicitly flush the mempool, then Hermes has to reconnect to the full node

To summarize, we cannot solve this problem by making Hermes query the full node for the up-to-date seq. number, since the full node provides us with an incorrect value. Hermes cannot rely either on the deliverTx error message (which reflects the seq. number as the rest of the network sees it), because the full node which Hermes talk to will refuse that sequence number. For a principled fix, we need to get a deeper understanding of the mempool, and potentially submit a new feature request to tm-go mempool maintainers. This comment comprises some further ideas, but they are incomplete.


Status:





For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere adizere added the I: dependencies Internal: related to dependencies label Sep 29, 2021
@adizere adizere added this to the 11.2021 milestone Sep 29, 2021
@adizere adizere changed the title Hermes issues related to Tendermint-Go Hermes issues related to Tendermint-Go & SDK Sep 30, 2021
@adizere adizere added the A: admin Admin: general administrative & planning issue label Oct 4, 2021
@romac romac added the A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews label Nov 2, 2021
@adizere adizere modified the milestones: 11.2021, 12.2021 Nov 2, 2021
@adizere adizere removed the A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews label May 4, 2022
@adizere
Copy link
Member Author

adizere commented May 27, 2022

Closing this issue to reduce admin burden. There is only one last outstanding item (#1400) that we're handling individually.

@adizere adizere closed this as completed May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: admin Admin: general administrative & planning issue I: dependencies Internal: related to dependencies
Projects
No open projects
Status: Closed
Development

No branches or pull requests

4 participants