
Sequence number caching in Hermes is unreliable #1264

Closed · Tracked by #1350 ...
gamarin2 opened this issue Aug 6, 2021 · 18 comments · Fixed by #1349
Labels: A: bug (Admin: something isn't working), E: gravity (External: related to Gravity DEX), I: logic (Internal: related to the relaying logic)

gamarin2 commented Aug 6, 2021

Crate

Hermes

Summary

Every X seconds, query the state of every chain in the config for the relayer account's sequence number, and update the cached sequence number with it. Make X a configurable parameter.

Problem Definition

At some point (around sequence number 1008 on the Cosmos Hub), for some reason, the cached sequence number on our relayer instance was incremented, but the actual on-chain sequence number wasn't. From there, our relayer hasn't been able to relay, producing the following error:

Aug 06 15:51:41.681  INFO [irishub-1:transfer/channel-12 -> cosmoshub-4] result events:
    ChainErrorEv(deliver_tx on chain cosmoshub-4 for Tx hash EC50AD139769F2585F351CDE69636B0C82F522A5B29ADDD1C279737088F4E098 reports error: code=Err(32), log=Log("account sequence mismatch, expected 1008, got 1358: incorrect account sequence"))

While we don't know what caused this mismatch, we couldn't detect it and believed our relayer was running properly. It would be good to figure out why the mismatch happened, but it would also be good to have a failsafe that periodically updates the cached sequence number with the actual on-chain value so that the relayer can recover on its own.
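
A very rough sketch of the failsafe described above (the names, types, and query helper are hypothetical, not the actual Hermes code), with the refresh interval intended to become a config parameter:

use std::thread;
use std::time::Duration;

/// Illustrative cache of the relayer account state for one chain.
struct AccountCache {
    cached_sequence: u64,
}

/// Hypothetical query of the on-chain account sequence (e.g. via gRPC).
fn query_onchain_sequence(chain_id: &str) -> Result<u64, String> {
    // ... query the chain's full node for the relayer account ...
    unimplemented!()
}

/// Failsafe loop: every `refresh_interval`, overwrite the cached sequence
/// with the value reported by the chain, so a drifted cache self-heals.
fn run_sequence_refresher(chain_id: &str, cache: &mut AccountCache, refresh_interval: Duration) {
    loop {
        if let Ok(onchain_seq) = query_onchain_sequence(chain_id) {
            if onchain_seq != cache.cached_sequence {
                // Drift detected: trust the chain, not the local counter.
                cache.cached_sequence = onchain_seq;
            }
        }
        thread::sleep(refresh_interval);
    }
}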


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
andynog (Contributor) commented Aug 6, 2021

I've done a preliminary investigation. Here are some thoughts on what might be happening.

The sequence is incremented at the end of send_tx here: https://github.com/informalsystems/ibc-rs/blob/142c7dc29c7636d1c9ece51ce8fd6f7c50ca3477/relayer/src/chain/cosmos.rs#L384

Then the response from broadcast_tx_sync is returned from send_tx, but there are no checks or validation on the response at this stage. I'm assuming that if the response is "unsuccessful" or "unexpected", the internal sequence increments anyway, which could lead to situations where the account sequence mismatches because the internal sequence is ahead of the sequence on chain.

I believe the sequence is only retrieved once and cached here: https://github.com/informalsystems/ibc-rs/blob/142c7dc29c7636d1c9ece51ce8fd6f7c50ca3477/relayer/src/chain/cosmos.rs#L521

An account is returned in the logic below, and the sequence is retrieved from this account:
https://github.com/informalsystems/ibc-rs/blob/142c7dc29c7636d1c9ece51ce8fd6f7c50ca3477/relayer/src/chain/cosmos.rs#L544

I believe this sequence-caching logic was implemented to improve performance, since otherwise every transaction would require an additional query to fetch the sequence, which could slow things down.

Maybe another approach is to have an internal retry counter or logic to "refresh" the sequence before all retries are exhausted and Hermes gives up on that transaction.

It looks like if you restart Hermes it is able to relay the transaction, because the sequence is then retrieved from the chain again.
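
A minimal sketch of the kind of response check described above, using simplified stand-in types (illustrative only, not the actual cosmos.rs code):

/// Simplified view of a broadcast_tx_sync response, for illustration.
struct TxSyncResponse {
    /// 0 means the tx passed CheckTx and entered the mempool.
    code: u32,
}

/// Simplified view of the cached account state.
struct Account {
    sequence: u64,
}

/// Only advance the cached sequence when the node actually accepted the tx.
/// On a CheckTx failure the on-chain sequence does not move, so the cached
/// one shouldn't either (or it should be re-queried from the chain).
fn update_sequence_after_broadcast(account: &mut Account, response: &TxSyncResponse) {
    if response.code == 0 {
        account.sequence += 1;
    }
}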

@adizere adizere added A: bug Admin: something isn't working E: gravity External: related to Gravity DEX I: logic Internal: related to the relaying logic labels Aug 10, 2021
@adizere adizere added this to the 09.2021 milestone Aug 10, 2021
adizere (Member) commented Aug 10, 2021

I think what we might try to do in Hermes is to parse the exact error message and update the cached sequence number on the fly, only when an error occurs. The message looks as follows, and should be easy for Hermes to interpret in order to update the cached value:

	ChainErrorEv(deliver_tx on chain test-1 for Tx hash 21C1871B90E00523BD6E4A42E4F5AD747714D760EAB77595946FE31BAC24F4F3 reports error: code=Err(32), log=Log("account sequence mismatch, expected 8, got 7: incorrect account sequence"))
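
A sketch of how that parsing might look, using the regex crate; the error format is taken from the log above, while the function name and surrounding logic are illustrative:

use regex::Regex;

/// Extract the sequence number the chain expects from an error string such as:
/// account sequence mismatch, expected 8, got 7: incorrect account sequence
/// Returns None if the message does not have that shape.
fn expected_sequence_from_error(log: &str) -> Option<u64> {
    let re = Regex::new(r"account sequence mismatch, expected (\d+), got (\d+)").ok()?;
    let caps = re.captures(log)?;
    caps.get(1)?.as_str().parse::<u64>().ok()
}

// On such an error, Hermes could overwrite the cached sequence with the
// parsed "expected" value before retrying, instead of keeping the drifted
// local counter.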

adizere (Member) commented Sep 20, 2021

I managed to reproduce this issue using a slightly more complex scenario, with help from Greg and Mircea. The scenario I'll describe below mimics production issues; the situation we're trying to reach is as follows:

  • Hermes is connected to a full node a, say rpc_addr = 'a', on network X
  • Suppose this node accepts a transaction tx1 from Hermes, and this transaction enters the mempool of node a
  • This transaction tx1 will not be deliver-ed (for reasons we'll detail below), and will remain in the mempool of node a instead
  • Now if Hermes attempts to submit another transaction tx2 to this node, the node will reject this transaction on grounds of having incorrect sequence number; the log will report:
Sep 20 12:36:11.334 DEBUG [a] send_tx: retrieved account sequence=8 number=1
Sep 20 12:36:11.334 DEBUG [a] send_tx: sending 2 messages using nonce 8
Sep 20 12:36:11.351 TRACE [a] send_tx: based on the estimated gas, adjusting fee from Fee { amount: [Coin { denom: "stake", amount: "500" }], gas_limit: 500000, payer: "", granter: "" } to Fee { amount: [Coin { denom: "stake", amount: "500" }], gas_limit: 500000, payer: "", granter: "" }
Sep 20 12:36:11.360 DEBUG [a] send_tx: broadcast_tx_sync: Response { code: Err(32), data: Data([]), log: Log("account sequence mismatch, expected 9, got 8: incorrect account sequence"), hash: transaction::Hash(7BF91E0641594119F1E29BF5E9DBACD7F3152603F94A13727D924188716AD080) }
Sep 20 12:36:11.361  INFO [Async~>a] response(s): 1; Err(32):7BF91E0641594119F1E29BF5E9DBACD7F3152603F94A13727D924188716AD080

Sep 20 12:36:11.361  INFO [b:transfer/channel-0 -> a] success
Sep 20 12:36:11.361 TRACE [a:transfer/channel-0 -> b] putting error response in error event queue: Response { code: Err(32), data: Data([]), log: Log("account sequence mismatch, expected 9, got 8: incorrect account sequence"), hash: transaction::Hash(7BF91E0641594119F1E29BF5E9DBACD7F3152603F94A13727D924188716AD080) }
Sep 20 12:36:11.361 TRACE Packet worker produced relay summary: RelaySummary { events: [ChainError("deliver_tx on chain a for Tx hash 7BF91E0641594119F1E29BF5E9DBACD7F3152603F94A13727D924188716AD080 reports error: code=Err(32), log=Log(\"account sequence mismatch, expected 9, got 8: incorrect account sequence\")")] }

Notice that the sequence number fetched is 8, but then the node rejects the submitted tx reporting that it should use s.n. 9. The explanation is that s.n. 8 is already used by an earlier transaction, which is stuck in the mempool. At this point, Hermes is unable to make further progress, because it cannot rely on the fetched sequence number 8.

The only solution we know of is to flush the mempool of this node. Even that only partially fixes it, because this same node might have broadcast the problematic tx (stuck with s.n. 8) to other nodes, which may broadcast it back to our node.

To reproduce this problem, the steps are as follows, using gm.

  1. use this config file as your gm.toml: https://gist.github.com/adizere/fe9542b764ceed891e7a311f40de7c44
  2. gm start and then gm stop to generate all the necessary config and genesis files
  3. set minimum-gas-prices = "0.5stake" in file .gm/a/config/app.toml
    • by default, this should be minimum-gas-prices = ""
  4. gm reset
  5. gm start
  6. gm hermes config
  7. gm hermes keys
  8. hermes create client a b
    • this should fail with "Failed Tx: no confirmation"
    • the explanation is that the full node (called node-1 in the gm.toml) which receives the CreateClient transaction from Hermes will accept this transaction, but the network will reject the transaction
    • the network is configured with a certain minimum gas price (see step 3 above) which the node is not respecting
    • the tx will remain stuck in the node's mempool
  9. hermes create client a b
    • now we should see the sequence number problem occurring:

Sep 20 16:41:04.138 WARN [a] waiting for commit of tx hashes(s) 431E4AF496A1DB8280E745F33EC73BCA032DDA8B978C0501A7A6DC8D7E7B844E
Success: ChainError(
"deliver_tx on chain a for Tx hash 431E4AF496A1DB8280E745F33EC73BCA032DDA8B978C0501A7A6DC8D7E7B844E reports error: code=Err(32), log=Log("account sequence mismatch, expected 1, got 0: incorrect account sequence")",
)

hu55a1n1 (Member) commented:
Would it be useful if Tendermint full-nodes communicated a cryptographic commitment/hash of their (consensus-related) config to other nodes and warned users if there was a mismatch?

adizere (Member) commented Sep 22, 2021

I think the problem is that there are multiple levels of configuration. The Tendermint-specific configuration is not at fault here; rather, the app.toml -- the application-specific configuration -- is what causes transactions to hang in the mempool in the scenario outlined above. Not sure if a commitment proof would help there. But note that there may be different reasons why txs could remain stuck in the mempool (connectivity issues, misbehavior).

@adizere adizere modified the milestones: 09.2021, 10.2021 Sep 27, 2021
@adizere adizere changed the title Hermes: Periodic queries to sync state seq with local seq Sequence number caching in Hermes is unreliable Sep 29, 2021
@adizere adizere modified the milestones: 10.2021, 11.2021 Sep 30, 2021
@adizere adizere modified the milestones: 11.2021, 12.2021 Nov 2, 2021
mircea-c commented Dec 3, 2021

Looks like this goes a bit deeper. I've just discovered that the sequence number issue comes up even with a fully synced node. Not sure why. I've attached a couple of log snippets. This first log shows the timeline from a successful Osmosis TX to the next TX, which now throws a sequence number error.

2021-12-03T16:15:44.623338Z DEBUG ThreadId(60) [osmosis-1] send_tx: sending 2 messages using account sequence 166792
 2021-12-03T16:15:44.623860Z DEBUG ThreadId(60) [osmosis-1] send_tx: max fee, for use in tx simulation: Fee { amount: "501uosmo", gas_limit: 5000000 }
 2021-12-03T16:15:44.966066Z DEBUG ThreadId(60) [osmosis-1] estimate_gas: tx simulation successful, gas amount used: 1358253
 2021-12-03T16:15:44.966130Z DEBUG ThreadId(60) [osmosis-1] send_tx: using 1358253 gas, fee Fee { amount: "150uosmo", gas_limit: 1494079 }
 2021-12-03T16:15:45.155148Z DEBUG ThreadId(60) [osmosis-1] send_tx: broadcast_tx_sync: Response { code: Ok, data: Data([]), log: Log("[]"), hash: transaction::Hash(C819745777EA7921B9E45FB7463C4F8DFEF08F4583E0933EBCB7FB4DC22E5D6C) }
 2021-12-03T16:15:45.155241Z  INFO ThreadId(332) [Async~>osmosis-1] response(s): 1; Ok:C819745777EA7921B9E45FB7463C4F8DFEF08F4583E0933EBCB7FB4DC22E5D6C
 
 2021-12-03T16:15:45.155259Z  INFO ThreadId(332) [sentinelhub-2:transfer/channel-0 -> osmosis-1] success
 2021-12-03T16:15:47.961640Z  INFO ThreadId(68) [core-1:transfer/channel-6 -> osmosis-1] clearing pending packets height=Some(Height { revision: 1, height: 3752199 })
 2021-12-03T16:15:49.326799Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:15:49.328207Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] juno-1 => SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:49.729998Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:49.730049Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-888330)
 2021-12-03T16:15:49.764730Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-888330), delayed by: 34.671901ms [try 1/5]
 2021-12-03T16:15:49.764773Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] prepending Destination client update @ height 1-888330
 2021-12-03T16:15:49.881922Z  INFO ThreadId(68) [core-1:transfer/channel-6 -> osmosis-1] finished scheduling pending packets clearing height=Some(Height { revision: 1, height: 3752199 })
 2021-12-03T16:15:49.956622Z  WARN ThreadId(84) [juno-1 -> osmosis-1:07-tendermint-1457] src header Timestamp(2021-12-03T16:15:42.368880883+00:00) is after dst latest header Timestamp(2021-12-03T16:14:05.968261873+00:00) + client state drift 5s, wait for next height on osmosis-1
 2021-12-03T16:15:50.258558Z ERROR ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] worker: schedule execution encountered error: failed during a client operation: update header from juno-1 with height 1-888330 and time Timestamp(2021-12-03T16:15:42.368880883+00:00) is in the future compared with latest header on osmosis-1 with height 1-2241409 and time Timestamp(2021-12-03T16:14:12.379102539+00:00), adjusted with drift 5s
 2021-12-03T16:15:51.258694Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:15:51.260064Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] juno-1 => SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:51.332890Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:51.332928Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-888330)
 2021-12-03T16:15:51.380005Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-888330), delayed by: 47.068267ms [try 1/5]
 2021-12-03T16:15:51.380046Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] prepending Destination client update @ height 1-888330
 2021-12-03T16:15:51.446319Z  INFO ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] generate messages from batch with 1 events
 2021-12-03T16:15:51.494865Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] osmosis-1 => WriteAcknowledgementEv(WriteAcknowledgement - h:1-2241408, seq:27164, path:channel-0/transfer->channel-42/transfer, toh:1-2241550, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:51.495907Z  WARN ThreadId(84) [juno-1 -> osmosis-1:07-tendermint-1457] src header Timestamp(2021-12-03T16:15:42.368880883+00:00) is after dst latest header Timestamp(2021-12-03T16:14:12.379102539+00:00) + client state drift 5s, wait for next height on osmosis-1
 2021-12-03T16:15:51.497732Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] WriteAcknowledgement - h:1-2241408, seq:27164, path:channel-0/transfer->channel-42/transfer, toh:1-2241550, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:15:51.797838Z ERROR ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] worker: schedule execution encountered error: failed during a client operation: update header from juno-1 with height 1-888330 and time Timestamp(2021-12-03T16:15:42.368880883+00:00) is in the future compared with latest header on osmosis-1 with height 1-2241410 and time Timestamp(2021-12-03T16:14:18.398713323+00:00), adjusted with drift 5s
 2021-12-03T16:15:52.807991Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:15:52.809328Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] juno-1 => SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:53.367801Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:53.367846Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-888330)
 2021-12-03T16:15:53.374377Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-888330), delayed by: 6.520016ms [try 1/5]
 2021-12-03T16:15:53.374429Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] prepending Destination client update @ height 1-888330
 2021-12-03T16:15:53.427802Z  INFO ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] generate messages from batch with 1 events
 2021-12-03T16:15:53.466756Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] osmosis-1 => SendPacketEv(SendPacket - h:1-2241409, seq:34145, path:channel-42/transfer->channel-0/transfer, toh:1-888805, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:53.467068Z  WARN ThreadId(84) [juno-1 -> osmosis-1:07-tendermint-1457] src header Timestamp(2021-12-03T16:15:42.368880883+00:00) is after dst latest header Timestamp(2021-12-03T16:14:18.398713323+00:00) + client state drift 5s, wait for next height on osmosis-1
 2021-12-03T16:15:53.468493Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] SendPacket - h:1-2241409, seq:34145, path:channel-42/transfer->channel-0/transfer, toh:1-888805, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:15:53.768628Z ERROR ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] worker: schedule execution encountered error: failed during a client operation: update header from juno-1 with height 1-888330 and time Timestamp(2021-12-03T16:15:42.368880883+00:00) is in the future compared with latest header on osmosis-1 with height 1-2241411 and time Timestamp(2021-12-03T16:14:25.221697141+00:00), adjusted with drift 5s
 2021-12-03T16:15:54.788764Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:15:54.790183Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] juno-1 => SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:54.890601Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:54.890679Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-888330)
 2021-12-03T16:15:55.132114Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-888330), delayed by: 241.423197ms [try 1/5]
 2021-12-03T16:15:55.132155Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] prepending Destination client update @ height 1-888330
 2021-12-03T16:15:55.228690Z  WARN ThreadId(84) [juno-1 -> osmosis-1:07-tendermint-1457] src header Timestamp(2021-12-03T16:15:42.368880883+00:00) is after dst latest header Timestamp(2021-12-03T16:14:25.221697141+00:00) + client state drift 5s, wait for next height on osmosis-1
 2021-12-03T16:15:55.530461Z ERROR ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] worker: schedule execution encountered error: failed during a client operation: update header from juno-1 with height 1-888330 and time Timestamp(2021-12-03T16:15:42.368880883+00:00) is in the future compared with latest header on osmosis-1 with height 1-2241412 and time Timestamp(2021-12-03T16:14:31.319009583+00:00), adjusted with drift 5s
 2021-12-03T16:15:56.560599Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:15:56.561915Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] juno-1 => SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:56.568240Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:56.568281Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-888330)
 2021-12-03T16:15:56.573545Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-888330), delayed by: 5.255016ms [try 1/5]
 2021-12-03T16:15:56.573583Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] prepending Destination client update @ height 1-888330
 2021-12-03T16:15:56.675096Z  WARN ThreadId(84) [juno-1 -> osmosis-1:07-tendermint-1457] src header Timestamp(2021-12-03T16:15:42.368880883+00:00) is after dst latest header Timestamp(2021-12-03T16:14:37.552683596+00:00) + client state drift 5s, wait for next height on osmosis-1
 2021-12-03T16:15:58.486070Z ERROR ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] worker: schedule execution encountered error: failed during a client operation: update header from juno-1 with height 1-888330 and time Timestamp(2021-12-03T16:15:42.368880883+00:00) is in the future compared with latest header on osmosis-1 with height 1-2241414 and time Timestamp(2021-12-03T16:14:48.558922254+00:00), adjusted with drift 5s
 2021-12-03T16:15:59.526240Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:15:59.527821Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] juno-1 => SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:59.654127Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:59.654176Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-888330)
 2021-12-03T16:15:59.712119Z  INFO ThreadId(68) [core-1:transfer/channel-6 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:15:59.761218Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-888330), delayed by: 107.031714ms [try 1/5]
 2021-12-03T16:15:59.761251Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] prepending Destination client update @ height 1-888330
 2021-12-03T16:15:59.762621Z DEBUG ThreadId(68) [core-1:transfer/channel-6 -> osmosis-1] core-1 => WriteAcknowledgementEv(WriteAcknowledgement - h:1-3752202, seq:16110, path:channel-4/transfer->channel-6/transfer, toh:1-3752345, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:15:59.795814Z DEBUG ThreadId(68) [core-1:transfer/channel-6 -> osmosis-1] WriteAcknowledgement - h:1-3752202, seq:16110, path:channel-4/transfer->channel-6/transfer, toh:1-3752345, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:15:59.892870Z  WARN ThreadId(84) [juno-1 -> osmosis-1:07-tendermint-1457] src header Timestamp(2021-12-03T16:15:42.368880883+00:00) is after dst latest header Timestamp(2021-12-03T16:14:48.558922254+00:00) + client state drift 5s, wait for next height on osmosis-1
 2021-12-03T16:16:00.797795Z ERROR ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] worker: schedule execution encountered error: failed during a client operation: update header from juno-1 with height 1-888330 and time Timestamp(2021-12-03T16:15:42.368880883+00:00) is in the future compared with latest header on osmosis-1 with height 1-2241415 and time Timestamp(2021-12-03T16:14:54.759623082+00:00), adjusted with drift 5s
 2021-12-03T16:16:01.636967Z  INFO ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] generate messages from batch with 1 events
 2021-12-03T16:16:01.638348Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] osmosis-1 => SendPacketEv(SendPacket - h:1-2241414, seq:34146, path:channel-42/transfer->channel-0/transfer, toh:1-888469, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:16:01.640235Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] SendPacket - h:1-2241414, seq:34146, path:channel-42/transfer->channel-0/transfer, toh:1-888469, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:16:01.847953Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:16:01.849410Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] juno-1 => SendPacketEv(SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:16:02.532325Z DEBUG ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] SendPacket - h:1-888329, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:16:02.643907Z  INFO ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] generate messages from batch with 1 events
 2021-12-03T16:16:02.645690Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] osmosis-1 => WriteAcknowledgementEv(WriteAcknowledgement - h:1-2241415, seq:27165, path:channel-0/transfer->channel-42/transfer, toh:1-2241553, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:16:02.648494Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] WriteAcknowledgement - h:1-2241415, seq:27165, path:channel-0/transfer->channel-42/transfer, toh:1-2241553, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:16:07.729850Z  INFO ThreadId(957) [osmosis-1:transfer/channel-4 -> core-1] generate messages from batch with 1 events
 2021-12-03T16:16:07.731401Z DEBUG ThreadId(957) [osmosis-1:transfer/channel-4 -> core-1] osmosis-1 => SendPacketEv(SendPacket - h:1-2241419, seq:16110, path:channel-4/transfer->channel-6/transfer, toh:1-3752345, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:16:07.733539Z DEBUG ThreadId(957) [osmosis-1:transfer/channel-4 -> core-1] SendPacket - h:1-2241419, seq:16110, path:channel-4/transfer->channel-6/transfer, toh:1-3752345, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:16:09.589043Z  INFO ThreadId(322) [osmosis-1:transfer/channel-2 -> sentinelhub-2] generate messages from batch with 1 events
 2021-12-03T16:16:09.591219Z DEBUG ThreadId(322) [osmosis-1:transfer/channel-2 -> sentinelhub-2] osmosis-1 => WriteAcknowledgementEv(WriteAcknowledgement - h:1-2241420, seq:25241, path:channel-0/transfer->channel-2/transfer, toh:1-2241879, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:16:09.593793Z DEBUG ThreadId(322) [osmosis-1:transfer/channel-2 -> sentinelhub-2] WriteAcknowledgement - h:1-2241420, seq:25241, path:channel-0/transfer->channel-2/transfer, toh:1-2241879, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:16:13.968157Z  INFO ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] generate messages from batch with 1 events
 2021-12-03T16:16:13.969958Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] osmosis-1 => WriteAcknowledgementEv(WriteAcknowledgement - h:1-2241423, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:16:13.973030Z DEBUG ThreadId(85) [osmosis-1:transfer/channel-42 -> juno-1] WriteAcknowledgement - h:1-2241423, seq:27166, path:channel-0/transfer->channel-42/transfer, toh:1-2241526, tos:Timestamp(NoTimestamp)) already handled
 2021-12-03T16:16:37.723996Z  INFO ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] generate messages from batch with 1 events
 2021-12-03T16:16:37.725994Z DEBUG ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] osmosis-1 => SendPacketEv(SendPacket - h:1-2241429, seq:10340, path:channel-6/transfer->channel-3/transfer, toh:1-12775266, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:16:37.732677Z DEBUG ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] irishub-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-2241429, seq:10340, path:channel-6/transfer->channel-3/transfer, toh:1-12775266, tos:Timestamp(NoTimestamp)))

 2021-12-03T16:16:37.732714Z  INFO ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] scheduling op. data with 1 msg(s) for Destination (height 1-2241430)
 2021-12-03T16:16:37.738312Z  INFO ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] relay op. data of 1 msgs(s) to Destination (height 1-2241430), delayed by: 5.590259ms [try 1/5]
 2021-12-03T16:16:37.738356Z  INFO ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] prepending Destination client update @ height 1-2241430
 2021-12-03T16:16:37.805367Z DEBUG ThreadId(1696) [osmosis-1 -> irishub-1:07-tendermint-5] MsgUpdateAnyClient from trusted height 1-2241384 to target height 1-2241430
 2021-12-03T16:16:37.805697Z  INFO ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] assembled batch of 2 message(s)
 2021-12-03T16:16:37.805751Z DEBUG ThreadId(52) send_messages_and_wait_check_tx with 2 messages
 2021-12-03T16:16:37.805771Z DEBUG ThreadId(52) [irishub-1] send_tx: sending 2 messages using account sequence 3308
 2021-12-03T16:16:37.806303Z DEBUG ThreadId(52) [irishub-1] send_tx: max fee, for use in tx simulation: Fee { amount: "2001uiris", gas_limit: 2000000 }
 2021-12-03T16:16:37.839079Z DEBUG ThreadId(52) [irishub-1] estimate_gas: tx simulation successful, gas amount used: 333649
 2021-12-03T16:16:37.839135Z DEBUG ThreadId(52) [irishub-1] send_tx: using 333649 gas, fee Fee { amount: "368uiris", gas_limit: 367014 }
 2021-12-03T16:16:37.844667Z DEBUG ThreadId(52) [irishub-1] send_tx: broadcast_tx_sync: Response { code: Ok, data: Data([]), log: Log("[]"), hash: transaction::Hash(3ABF3F68CE738E940BC7DC75A47753FF83E22A635B74C70E4B26B8F9DB995CD2) }
 2021-12-03T16:16:37.844753Z  INFO ThreadId(1696) [Async~>irishub-1] response(s): 1; Ok:3ABF3F68CE738E940BC7DC75A47753FF83E22A635B74C70E4B26B8F9DB995CD2
 
 2021-12-03T16:16:37.844772Z  INFO ThreadId(1696) [osmosis-1:transfer/channel-6 -> irishub-1] success
 2021-12-03T16:16:47.104938Z  INFO ThreadId(332) [sentinelhub-2:transfer/channel-0 -> osmosis-1] clearing pending packets height=Some(Height { revision: 2, height: 3541699 })
 2021-12-03T16:16:48.847104Z  INFO ThreadId(332) [sentinelhub-2:transfer/channel-0 -> osmosis-1] finished scheduling pending packets clearing height=Some(Height { revision: 2, height: 3541699 })
 2021-12-03T16:17:01.120178Z  INFO ThreadId(1959) [impacthub-3:transfer/channel-4 -> osmosis-1] clearing pending packets height=Some(Height { revision: 3, height: 1601699 })
 2021-12-03T16:17:01.147441Z  INFO ThreadId(1959) [impacthub-3:transfer/channel-4 -> osmosis-1] finished scheduling pending packets clearing height=Some(Height { revision: 3, height: 1601699 })
 2021-12-03T16:17:03.634916Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:17:03.636219Z DEBUG ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] regen-1 => SendPacketEv(SendPacket - h:1-3223686, seq:8378, path:channel-1/transfer->channel-8/transfer, toh:1-2241581, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:17:03.851079Z DEBUG ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-3223686, seq:8378, path:channel-1/transfer->channel-8/transfer, toh:1-2241581, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:17:03.851145Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-3223687)
 2021-12-03T16:17:03.859993Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-3223687), delayed by: 8.834422ms [try 1/5]
 2021-12-03T16:17:03.860036Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] prepending Destination client update @ height 1-3223687
 2021-12-03T16:17:03.930378Z DEBUG ThreadId(1064) [regen-1 -> osmosis-1:07-tendermint-6] MsgUpdateAnyClient from trusted height 1-3223626 to target height 1-3223687
 2021-12-03T16:17:03.930603Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] assembled batch of 2 message(s)
 2021-12-03T16:17:03.930661Z DEBUG ThreadId(60) send_messages_and_wait_check_tx with 2 messages
 2021-12-03T16:17:03.930678Z DEBUG ThreadId(60) [osmosis-1] send_tx: sending 2 messages using account sequence 166793
 2021-12-03T16:17:03.931200Z DEBUG ThreadId(60) [osmosis-1] send_tx: max fee, for use in tx simulation: Fee { amount: "501uosmo", gas_limit: 5000000 }
 2021-12-03T16:17:03.934970Z ERROR ThreadId(60) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: InvalidArgument, message: "account sequence mismatch, expected 166791, got 166793: incorrect account sequence: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
 2021-12-03T16:17:03.935058Z ERROR ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] worker: schedule execution encountered error: failed with underlying error: gRPC call failed with status: status: InvalidArgument, message: "account sequence mismatch, expected 166791, got 166793: incorrect account sequence: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
 2021-12-03T16:17:04.935224Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:17:04.936779Z DEBUG ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] regen-1 => SendPacketEv(SendPacket - h:1-3223686, seq:8378, path:channel-1/transfer->channel-8/transfer, toh:1-2241581, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:17:04.942811Z DEBUG ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:1-3223686, seq:8378, path:channel-1/transfer->channel-8/transfer, toh:1-2241581, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:17:04.942849Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-3223687)
 2021-12-03T16:17:04.948341Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-3223687), delayed by: 5.485438ms [try 1/5]
 2021-12-03T16:17:04.948375Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] prepending Destination client update @ height 1-3223687
 2021-12-03T16:17:05.014692Z DEBUG ThreadId(1064) [regen-1 -> osmosis-1:07-tendermint-6] MsgUpdateAnyClient from trusted height 1-3223626 to target height 1-3223687
 2021-12-03T16:17:05.014876Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] assembled batch of 2 message(s)
 2021-12-03T16:17:05.014934Z DEBUG ThreadId(60) send_messages_and_wait_check_tx with 2 messages
 2021-12-03T16:17:05.014958Z DEBUG ThreadId(60) [osmosis-1] send_tx: sending 2 messages using account sequence 166793
 2021-12-03T16:17:05.015632Z DEBUG ThreadId(60) [osmosis-1] send_tx: max fee, for use in tx simulation: Fee { amount: "501uosmo", gas_limit: 5000000 }
 2021-12-03T16:17:05.020174Z ERROR ThreadId(60) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: InvalidArgument, message: "account sequence mismatch, expected 166791, got 166793: incorrect account sequence: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
 2021-12-03T16:17:05.020264Z ERROR ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] worker: schedule execution encountered error: failed with underlying error: gRPC call failed with status: status: InvalidArgument, message: "account sequence mismatch, expected 166791, got 166793: incorrect account sequence: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }

The second one shows that, even after 35 minutes, Hermes was still getting the same error, which in turn was causing all Osmosis tx submissions to fail and relaying to stop. Only after a Hermes restart did the issue clear.

2021-12-03T16:51:20.405416Z ERROR ThreadId(60) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: InvalidArgument, message: "account sequence mismatch, expected 166791, got 166793: incorrect account sequence: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
 2021-12-03T16:51:20.405517Z ERROR ThreadId(68) [core-1:transfer/channel-6 -> osmosis-1] worker: schedule execution encountered error: failed with underlying error: gRPC call failed with status: status: InvalidArgument, message: "account sequence mismatch, expected 166791, got 166793: incorrect account sequence: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
 2021-12-03T16:51:20.412916Z DEBUG ThreadId(65) [crypto-org-chain-mainnet-1:transfer/channel-10 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgAcknowledgement from WriteAcknowledgementEv(WriteAcknowledgement - h:1-3479364, seq:30344, path:channel-5/transfer->channel-10/transfer, toh:1-3479510, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:51:20.412968Z  INFO ThreadId(65) [crypto-org-chain-mainnet-1:transfer/channel-10 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-3479365)
 2021-12-03T16:51:20.413386Z DEBUG ThreadId(1686) [irishub-1:transfer/channel-3 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgAcknowledgement from WriteAcknowledgementEv(WriteAcknowledgement - h:1-12775128, seq:10340, path:channel-6/transfer->channel-3/transfer, toh:1-12775266, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:51:20.413415Z  INFO ThreadId(1686) [irishub-1:transfer/channel-3 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 1-12775129)
 2021-12-03T16:51:20.413912Z DEBUG ThreadId(221) [iov-mainnet-ibc:transfer/channel-2 -> osmosis-1] osmosis-1 <= /ibc.core.channel.v1.MsgRecvPacket from SendPacketEv(SendPacket - h:0-6555675, seq:1567, path:channel-2/transfer->channel-15/transfer, toh:1-2241783, tos:Timestamp(NoTimestamp)))
 2021-12-03T16:51:20.413958Z  INFO ThreadId(221) [iov-mainnet-ibc:transfer/channel-2 -> osmosis-1] scheduling op. data with 1 msg(s) for Destination (height 0-6555676)
 2021-12-03T16:51:20.418821Z  INFO ThreadId(65) [crypto-org-chain-mainnet-1:transfer/channel-10 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-3479365), delayed by: 5.842251ms [try 1/5]
 2021-12-03T16:51:20.418868Z  INFO ThreadId(65) [crypto-org-chain-mainnet-1:transfer/channel-10 -> osmosis-1] prepending Destination client update @ height 1-3479365
 2021-12-03T16:51:20.420975Z  INFO ThreadId(1686) [irishub-1:transfer/channel-3 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-12775129), delayed by: 7.553282ms [try 1/5]
 2021-12-03T16:51:20.421053Z  INFO ThreadId(1686) [irishub-1:transfer/channel-3 -> osmosis-1] prepending Destination client update @ height 1-12775129
 2021-12-03T16:51:20.424912Z  WARN ThreadId(65) [crypto-org-chain-mainnet-1 -> osmosis-1:07-tendermint-4] resolving trusted height from the full list of consensus state heights for target height 1-3479365; this may take a while
 2021-12-03T16:51:20.430265Z  WARN ThreadId(1686) [irishub-1 -> osmosis-1:07-tendermint-5] resolving trusted height from the full list of consensus state heights for target height 1-12775129; this may take a while
 2021-12-03T16:51:21.431067Z  INFO ThreadId(62) [akashnet-2:transfer/channel-9 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:51:22.215780Z  INFO ThreadId(68) [core-1:transfer/channel-6 -> osmosis-1] clearing pending packets height=Some(Height { revision: 1, height: 3752299 })
 2021-12-03T16:51:24.531656Z  INFO ThreadId(84) [juno-1:transfer/channel-0 -> osmosis-1] generate messages from batch with 1 events
 2021-12-03T16:51:25.963152Z  INFO ThreadId(332) [sentinelhub-2:transfer/channel-0 -> osmosis-1] clearing pending packets height=Some(Height { revision: 2, height: 3541999 })
 2021-12-03T16:51:31.636600Z  INFO ThreadId(1064) [regen-1:transfer/channel-1 -> osmosis-1] clearing pending packets height=Some(Height { revision: 1, height: 3223999 })
 2021-12-03T16:51:39.661945Z  INFO ThreadId(01) using default configuration from '/home/hermes/.hermes/config.toml'
 2021-12-03T16:51:39.663034Z  INFO ThreadId(01) starting REST API server listening at http://0.0.0.0:3000
 2021-12-03T16:51:39.668173Z  INFO ThreadId(01) Hermes has started
 2021-12-03T16:51:39.680474Z DEBUG ThreadId(41) [akashnet-2] starting event monitor
 

@mircea-c mircea-c closed this as completed Dec 3, 2021
@mircea-c mircea-c reopened this Dec 3, 2021
adizere (Member) commented Dec 3, 2021

The second one shows that, even after 35 minutes, Hermes was still getting the same error, which in turn was causing all Osmosis tx submissions to fail and relaying to stop. Only after a Hermes restart did the issue clear.

You did not try restarting the Osmosis node, right (i.e., the mempool flushing approach)? Only Hermes was restarted.

Looking at the logs, it's possible that the error message is trustworthy:

2021-12-03T16:17:03.934970Z ERROR ThreadId(60) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: InvalidArgument, message: "account sequence mismatch, expected 166791, got 166793: incorrect account sequence: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }

The top of the log you shared shows a transaction with s.n. 166792. This transaction has hash C819745777EA7921B9E45FB7463C4F8DFEF08F4583E0933EBCB7FB4DC22E5D6C, which does not appear on Mintscan, hinting that this tx was unsuccessful -- it never left the mempool. So whatever the underlying problem was, it manifested in the full node around the top of the log, when this tx was submitted to the mempool, or even earlier.

We might consider reviving Soares' PR #1349, which had an account refresh method (ref). Though it's not clear from these logs if that would fix it.

But this is all speculation. We need to test whether we can trust either the query_account gRPC (the solution Soares proposed with this PR), or the error message we receive as a response, as seen in the snippet above.
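
For reference, a rough sketch of what the query_account gRPC check could look like, using the generated Cosmos SDK auth client (crate and module paths are assumptions and vary across SDK versions; this is not the Hermes implementation):

use cosmos_sdk_proto::cosmos::auth::v1beta1::{
    query_client::QueryClient, BaseAccount, QueryAccountRequest,
};
use prost::Message;

/// Ask the chain (not the local cache) what the account sequence currently is.
async fn query_onchain_sequence(
    grpc_addr: &str,
    address: &str,
) -> Result<u64, Box<dyn std::error::Error>> {
    let mut client = QueryClient::connect(grpc_addr.to_string()).await?;
    let response = client
        .account(QueryAccountRequest { address: address.to_string() })
        .await?
        .into_inner();
    // The response wraps the account in a protobuf Any; decode it as a BaseAccount.
    let any = response.account.ok_or("account not found")?;
    let base = BaseAccount::decode(any.value.as_slice())?;
    Ok(base.sequence)
}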

dualsystems (Contributor) commented Dec 3, 2021

I see several ways:

  1. Shift the work of tracking the account sequence to gRPC and update the account sequence before sending each txn -- a transaction in broadcast mode should not be faster than the next block arrives. (I'd like a discussion about this.)
  2. Process the event as an error and update self.account before the next txn -- refresh on error.

mircea-c commented Dec 3, 2021

The top of the log you shared shows a transaction with s.n. 166792. This transaction has hash C819745777EA7921B9E45FB7463C4F8DFEF08F4583E0933EBCB7FB4DC22E5D6C, which does not appear on Mintscan, hinting that this tx was unsuccessful -- it never left the mempool. So whatever the underlying problem was, it manifested in the full node around the top of the log, when this tx was submitted to the mempool, or even earlier.

The node was not restarted. Only Hermes was restarted and resumed operations normally.

dualsystems (Contributor) commented Dec 4, 2021

cosmos/cosmos-sdk#5731

sync: Wait for the tx to pass/fail CheckTx
async: Don't wait for pass/fail CheckTx; send and return tx immediately
block: Wait for the tx to pass/fail CheckTx, DeliverTx, and be committed in a block (not-recommended)
It's best to always use sync.

Why is block mode not recommended?

I still think incrementing the account sequence is not fully correct with broadcast-mode sync.

adizere (Member) commented Dec 6, 2021

Why is block mode not recommended?

I don't have a complete answer to this question. We simply noticed it was not reliable, so we switched away from block to sync some time ago (in #986). This comment provides some rationale for that change: #977 (comment).

I still think incrementing the account sequence is not fully correct with broadcast-mode sync.

Indeed. Part of the reason is that Hermes can be quite aggressive in submitting transactions, and as this comment hints (cosmos/cosmos-sdk#4186 (comment)), that approach is prone to failure; we just haven't found a more robust option yet.

The node was not restarted. Only Hermes was restarted and resumed operations normally.
Process the event as an error and update self.account before the next txn -- refresh on error

I think we'll resurrect the refresh-on-error approach and give that a go.
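
As a rough illustration of the refresh-on-error flow (hypothetical helper closures, not the actual PR code):

/// On an account sequence mismatch, re-query the account from the chain,
/// overwrite the cached sequence, and retry once with the fresh value.
fn send_with_refresh_on_error(
    cached_sequence: &mut u64,
    submit: impl Fn(u64) -> Result<(), String>,
    query_onchain_sequence: impl Fn() -> Result<u64, String>,
) -> Result<(), String> {
    match submit(*cached_sequence) {
        Err(err) if err.contains("account sequence mismatch") => {
            // Trust the chain over the local counter and retry once.
            *cached_sequence = query_onchain_sequence()?;
            submit(*cached_sequence)
        }
        other => other,
    }
}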

mircea-c commented Dec 6, 2021

A bit more context here.

If the node is slow in any way to gossip transactions, Hermes will keep submitting new ones and incrementing the counter in the cache. Even after the mempool clears, Hermes still has an incorrect sequence number and keeps trying to submit txs with it. There is no way to recover from this unless Hermes is restarted.

adizere (Member) commented Dec 7, 2021

Thanks Mircea! We would like to move forward with a potential solution and will prepare a branch for testing. We'd appreciate it if you could help us test it!

mircea-c commented:
It seems that upgrading to IBC 2.0 is causing a lot more sequence number issues. All networks that have been upgraded to the latest version of the SDK now generate these errors where they did not before.

adizere (Member) commented Dec 16, 2021

PR #1349 is ready for testing. Would you consider trying to run from that dev branch, Mircea?

We're also waiting on feedback from another relayer operator who was willing to try this branch. It would be great to have you on it as well!

dualsystems (Contributor) commented:
When I restart the Osmosis node, I get a situation where a timed-out txn puts the Hermes instance into an infinite loop:

2021-12-17T12:13:45.430305Z ERROR ThreadId(1510) [osmosis-1:transfer/channel-6 -> irishub-1] timed out while confirming TxHashes: count=1; CAC57E3637C51B34CC371A7E385A532E845A692ABA00D30E24D72A130F32C985
2021-12-17T12:13:45.430380Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-12918776), delayed by: 229629.282217579s [try 1/5]
2021-12-17T12:13:45.430403Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] prepending Destination client update @ height 1-12918776
2021-12-17T12:13:45.433564Z  WARN ThreadId(1510) [irishub-1 -> osmosis-1:07-tendermint-5] resolving trusted height from the full list of consensus state heights for target height 1-12918776; this may take a while
2021-12-17T12:13:45.604013Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] assembled batch of 2 message(s)
2021-12-17T12:13:45.616997Z ERROR ThreadId(22) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:45.617127Z ERROR ThreadId(1510) error processing pending events in destination chain: failed with underlying error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:45.818738Z ERROR ThreadId(1510) [osmosis-1:transfer/channel-6 -> irishub-1] timed out while confirming TxHashes: count=1; CAC57E3637C51B34CC371A7E385A532E845A692ABA00D30E24D72A130F32C985
2021-12-17T12:13:45.818825Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-12918776), delayed by: 229629.670661734s [try 1/5]
2021-12-17T12:13:45.818864Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] prepending Destination client update @ height 1-12918776
2021-12-17T12:13:45.822413Z  WARN ThreadId(1510) [irishub-1 -> osmosis-1:07-tendermint-5] resolving trusted height from the full list of consensus state heights for target height 1-12918776; this may take a while
2021-12-17T12:13:45.988462Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] assembled batch of 2 message(s)
2021-12-17T12:13:45.998275Z ERROR ThreadId(22) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:45.998678Z ERROR ThreadId(1510) error processing pending events in destination chain: failed with underlying error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:46.200827Z ERROR ThreadId(1510) [osmosis-1:transfer/channel-6 -> irishub-1] timed out while confirming TxHashes: count=1; CAC57E3637C51B34CC371A7E385A532E845A692ABA00D30E24D72A130F32C985
2021-12-17T12:13:46.200884Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-12918776), delayed by: 229630.052722239s [try 1/5]
2021-12-17T12:13:46.200906Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] prepending Destination client update @ height 1-12918776
2021-12-17T12:13:46.203637Z  WARN ThreadId(1510) [irishub-1 -> osmosis-1:07-tendermint-5] resolving trusted height from the full list of consensus state heights for target height 1-12918776; this may take a while
2021-12-17T12:13:46.376525Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] assembled batch of 2 message(s)
2021-12-17T12:13:46.389308Z ERROR ThreadId(22) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:46.390305Z ERROR ThreadId(1510) error processing pending events in destination chain: failed with underlying error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:46.591871Z ERROR ThreadId(1510) [osmosis-1:transfer/channel-6 -> irishub-1] timed out while confirming TxHashes: count=1; CAC57E3637C51B34CC371A7E385A532E845A692ABA00D30E24D72A130F32C985
2021-12-17T12:13:46.591917Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-12918776), delayed by: 229630.443755254s [try 1/5]
2021-12-17T12:13:46.591936Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] prepending Destination client update @ height 1-12918776
2021-12-17T12:13:46.594990Z  WARN ThreadId(1510) [irishub-1 -> osmosis-1:07-tendermint-5] resolving trusted height from the full list of consensus state heights for target height 1-12918776; this may take a while
2021-12-17T12:13:46.760441Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] assembled batch of 2 message(s)
2021-12-17T12:13:46.778062Z ERROR ThreadId(22) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:46.778370Z ERROR ThreadId(1510) error processing pending events in destination chain: failed with underlying error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:46.980044Z ERROR ThreadId(1510) [osmosis-1:transfer/channel-6 -> irishub-1] timed out while confirming TxHashes: count=1; CAC57E3637C51B34CC371A7E385A532E845A692ABA00D30E24D72A130F32C985
2021-12-17T12:13:46.980118Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-12918776), delayed by: 229630.831955123s [try 1/5]
2021-12-17T12:13:46.980153Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] prepending Destination client update @ height 1-12918776
2021-12-17T12:13:46.983278Z  WARN ThreadId(1510) [irishub-1 -> osmosis-1:07-tendermint-5] resolving trusted height from the full list of consensus state heights for target height 1-12918776; this may take a while
2021-12-17T12:13:47.146116Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] assembled batch of 2 message(s)
2021-12-17T12:13:47.158125Z ERROR ThreadId(22) [osmosis-1] estimate_gas: failed to simulate tx with non-recoverable error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:47.158215Z ERROR ThreadId(1510) error processing pending events in destination chain: failed with underlying error: gRPC call failed with status: status: Unknown, message: "account sequence mismatch, expected 590, got 591: incorrect account sequence", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "2422098"} }
2021-12-17T12:13:47.359842Z ERROR ThreadId(1510) [osmosis-1:transfer/channel-6 -> irishub-1] timed out while confirming TxHashes: count=1; CAC57E3637C51B34CC371A7E385A532E845A692ABA00D30E24D72A130F32C985
2021-12-17T12:13:47.359899Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] relay op. data of 1 msgs(s) to Destination (height 1-12918776), delayed by: 229631.211736785s [try 1/5]
2021-12-17T12:13:47.359920Z  INFO ThreadId(1510) [irishub-1:transfer/channel-3 -> osmosis-1] prepending Destination client update @ height 1-12918776
2021-12-17T12:13:47.364098Z  WARN ThreadId(1510) [irishub-1 -> osmosis-1:07-tendermint-5] resolving trusted height from the full list of consensus state heights for target height 1-12918776; this may take a while

dualsystems (Contributor) commented Dec 17, 2021

Why is block mode not recommended?

I don't have a complete answer to this question. We simply noticed it was not reliable, so we switched away from block to sync some time ago (in #986). This comment provides some rationale for that change: #977 (comment).

I still think incrementing the account sequence is not fully correct with broadcast-mode sync.

Indeed. Part of the reason is that Hermes can be quite aggressive in submitting transactions, and as this comment hints (cosmos/cosmos-sdk#4186 (comment)), that approach is prone to failure; we just haven't found a more robust option yet.

The node was not restarted. Only Hermes was restarted and resumed operations normally.
Process the event as an error and update self.account before the next txn -- refresh on error

I think we'll resurrect the refresh-on-error approach and give that a go.

Thanks for the detailed answer. The reasons for the problems with the sequence number are now clearer.
If we use a more aggressive method for sending txns, we need to use equally aggressive methods for updating the account information.

adizere (Member) commented Dec 17, 2021

Thanks for the detailed answer. The reasons for the problems with the sequence number are now clearer.

Since you are noticing the sequence number caching problem (according to your logs above), consider running Hermes from the dev branch in #1349 and let us know if it resolves the problem. We'd appreciate any feedback!
