Hermes clears packet out of order, even on ordered channels #2670

Closed · romac opened this issue Sep 23, 2022 · 7 comments · Fixed by #2687

Assignee: romac
Labels: A: bug (Admin: something isn't working), I: logic (Internal: related to the relaying logic)
Milestone: v1.1

@romac (Member) commented Sep 23, 2022

Summary of Bug

The logs below were provided by @sainoe after he tried to clear packets stuck on an ordered channel in the Interchain Security testnet. He ran the following command:

$ hermes clear packets --chain consumer --channel channel-0 --port consumer
2022-09-22T13:40:36.408616Z  INFO ThreadId(01) using default configuration from '/root/.hermes/config.toml'
2022-09-22T13:40:36.434076Z  INFO ThreadId(01) PacketRecvCmd{src_chain=consumer src_port=consumer src_channel=channel-0 dst_chain=provider}: 33 unreceived packets found: [ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 ] 

... (see full logs below)

There are 33 unreceived packets with packet sequences: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36.

In the logs, we can see the following:

  • Lines 7-9: Hermes builds a SendPacket event for sequence 4
  • Lines 10-13: Hermes builds a SendPacket event for sequence 33
  • Lines 10-21: Hermes builds SendPacket events for sequences 34, 35, 36
  • Lines 22-24: Hermes builds a SendPacket event for sequence 5
  • Lines 25-105: Hermes builds SendPacket events for sequences 6 to 32
  • Line 116: Hermes gets back an error during the simulate tx step of gas estimation:

    failed to execute message; message index: 2: receive packet verification failed: packet sequence ≠ next receive sequence (33 ≠ 5): packet sequence is out of order

Full logs

It appears that Hermes is clearing the packets out of order. Namely, it relays the SendPacket event for packet 4, but then jumps to packet 33; the chain rejects the latter because the next expected receive sequence is 5, and out-of-order receipt is invalid on an ordered channel.
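
For context, an ordered channel requires exact sequence matching on receipt. Below is a hypothetical sketch of the check behind this error (an illustration, not the actual ibc-go code): the chain tracks a per-channel next-receive sequence and rejects any RecvPacket whose sequence does not match it exactly.

/// Hypothetical sketch of the ordered-channel receive check
/// (illustrative only, not the actual ibc-go implementation).
fn check_ordered_recv(next_sequence_recv: u64, packet_sequence: u64) -> Result<(), String> {
    if packet_sequence != next_sequence_recv {
        // Mirrors the error seen in the logs above.
        return Err(format!(
            "packet sequence ≠ next receive sequence ({packet_sequence} ≠ {next_sequence_recv}): packet sequence is out of order"
        ));
    }
    Ok(())
}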

Version

Hermes v1.0.0 from my branch romac/block-search-evidence.

Steps to Reproduce

TBD

In the meantime, see the logs linked above.

Acceptance Criteria

Hermes clears packets in order, at the very least on ordered channels, though there is no reason not to do the same on unordered channels.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@romac romac added the A: bug Admin: something isn't working label Sep 23, 2022
@romac romac self-assigned this Sep 23, 2022
@romac romac added the I: logic Internal: related to the relaying logic label Sep 23, 2022
@romac romac added this to the v1.1 milestone Sep 23, 2022
@romac romac changed the title Hermes clears packet out of order which fails on ordered channels Hermes clears packet out of order, even on ordered channels Sep 23, 2022
@romac (Member, Author) commented Sep 29, 2022

So I think I figured out the reason the packets were cleared out of order.

In the logs linked above, we see the following on lines 2-5:

2. 2022-09-22T13:40:36.434076Z  INFO ThreadId(01) PacketRecvCmd{src_chain=consumer src_port=consumer src_channel=channel-0 dst_chain=provider}: 33 unreceived packets found: [ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 ] 
3. 2022-09-22T13:40:36.598826Z TRACE ThreadId(01) PacketRecvCmd{src_chain=consumer src_port=consumer src_channel=channel-0 dst_chain=provider}:query_send_packet_events{h=0-170293}: start_block_events [SendPacket(SendPacket { packet: Packet { sequence: Sequence(4), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 105, 100, 97, 116, 111, 114, 34, 58, 123, 34, 97, 100, 100, 114, 101, 115, 115, 34, 58, 34, 107, 120, 107, 68, 85, 119, 72, 97, 85, 109, 122, 72, 106, 99, 56, 88, 83, 107, 101, 110, 84, 52, 70, 65, 69, 112, 69, 61, 34, 44, 34, 112, 111, 119, 101, 114, 34, 58, 34, 49, 34, 125, 44, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 55, 57, 57, 53, 34, 44, 34, 105, 110, 102, 114, 97, 99, 116, 105, 111, 110, 34, 58, 34, 73, 78, 70, 82, 65, 67, 84, 73, 79, 78, 95, 84, 89, 80, 69, 95, 68, 79, 85, 66, 76, 69, 95, 83, 73, 71, 78, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 283 }, time: Time { hour: 19, minute: 56, second: 54, nanosecond: 534408050 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(33), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 105, 100, 97, 116, 111, 114, 34, 58, 123, 34, 97, 100, 100, 114, 101, 115, 115, 34, 58, 34, 107, 120, 107, 68, 85, 119, 72, 97, 85, 109, 122, 72, 106, 99, 56, 88, 83, 107, 101, 110, 84, 52, 70, 65, 69, 112, 69, 61, 34, 44, 34, 112, 111, 119, 101, 114, 34, 58, 34, 49, 34, 125, 44, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 55, 57, 57, 53, 34, 44, 34, 105, 110, 102, 114, 97, 99, 116, 105, 111, 110, 34, 58, 34, 73, 78, 70, 82, 65, 67, 84, 73, 79, 78, 95, 84, 89, 80, 69, 95, 68, 79, 87, 78, 84, 73, 77, 69, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 287 }, time: Time { hour: 16, minute: 9, second: 1, nanosecond: 722859949 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(34), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 105, 100, 97, 116, 111, 114, 34, 58, 123, 34, 97, 100, 100, 114, 101, 115, 115, 34, 58, 34, 108, 122, 106, 88, 79, 120, 68, 117, 77, 53, 49, 51, 120, 76, 117, 47, 102, 67, 112, 43, 102, 71, 56, 78, 97, 49, 81, 61, 34, 44, 34, 112, 111, 119, 101, 114, 34, 58, 34, 49, 48, 34, 125, 44, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 49, 52, 52, 51, 50, 55, 34, 44, 34, 105, 110, 102, 114, 97, 99, 116, 105, 111, 110, 34, 58, 34, 73, 78, 70, 82, 65, 67, 84, 73, 79, 78, 95, 84, 89, 80, 69, 95, 68, 79, 87, 78, 84, 73, 77, 69, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 293 }, time: Time { hour: 12, minute: 38, second: 48, nanosecond: 77914079 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(35), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 
118, 97, 108, 105, 100, 97, 116, 111, 114, 34, 58, 123, 34, 97, 100, 100, 114, 101, 115, 115, 34, 58, 34, 117, 43, 53, 87, 85, 79, 49, 83, 105, 88, 70, 75, 74, 56, 118, 80, 97, 71, 80, 121, 48, 53, 81, 122, 71, 81, 52, 61, 34, 44, 34, 112, 111, 119, 101, 114, 34, 58, 34, 49, 54, 34, 125, 44, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 49, 52, 52, 51, 50, 55, 34, 44, 34, 105, 110, 102, 114, 97, 99, 116, 105, 111, 110, 34, 58, 34, 73, 78, 70, 82, 65, 67, 84, 73, 79, 78, 95, 84, 89, 80, 69, 95, 68, 79, 87, 78, 84, 73, 77, 69, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 293 }, time: Time { hour: 12, minute: 43, second: 59, nanosecond: 235887832 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(36), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 105, 100, 97, 116, 111, 114, 34, 58, 123, 34, 97, 100, 100, 114, 101, 115, 115, 34, 58, 34, 69, 78, 54, 82, 54, 52, 89, 107, 53, 73, 88, 88, 56, 51, 105, 111, 119, 67, 47, 65, 112, 107, 67, 86, 114, 57, 73, 61, 34, 44, 34, 112, 111, 119, 101, 114, 34, 58, 34, 52, 34, 125, 44, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 49, 52, 52, 51, 50, 55, 34, 44, 34, 105, 110, 102, 114, 97, 99, 116, 105, 111, 110, 34, 58, 34, 73, 78, 70, 82, 65, 67, 84, 73, 79, 78, 95, 84, 89, 80, 69, 95, 68, 79, 87, 78, 84, 73, 77, 69, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 293 }, time: Time { hour: 12, minute: 43, second: 59, nanosecond: 235887832 } })) } } })]
4. 2022-09-22T13:40:36.599089Z TRACE ThreadId(01) PacketRecvCmd{src_chain=consumer src_port=consumer src_channel=channel-0 dst_chain=provider}:query_send_packet_events{h=0-170293}: tx_events []
5. 2022-09-22T13:40:36.599102Z TRACE ThreadId(01) PacketRecvCmd{src_chain=consumer src_port=consumer src_channel=channel-0 dst_chain=provider}:query_send_packet_events{h=0-170293}: end_block_events [SendPacket(SendPacket { packet: Packet { sequence: Sequence(5), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 50, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 12, minute: 39, second: 43, nanosecond: 657006488 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(6), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 51, 56, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 12, minute: 50, second: 8, nanosecond: 783353271 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(7), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 50, 52, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 12, minute: 54, second: 22, nanosecond: 933134201 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(8), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 49, 48, 51, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 25, second: 28, nanosecond: 990289481 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(9), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 50, 48, 52, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 30, second: 39, nanosecond: 287339772 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(10), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 50, 50, 54, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 31, second: 45, nanosecond: 701601135 } 
})) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(11), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 50, 55, 55, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 34, second: 28, nanosecond: 565335278 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(12), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 51, 48, 48, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 35, second: 44, nanosecond: 517870761 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(13), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 52, 48, 48, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 40, second: 54, nanosecond: 799962855 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(14), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 52, 48, 56, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 41, second: 20, nanosecond: 180084234 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(15), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 52, 52, 52, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 43, second: 12, nanosecond: 77474556 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(16), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 53, 56, 51, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 50, second: 24, nanosecond: 14030808 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(17), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), 
destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 54, 49, 54, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 52, second: 5, nanosecond: 384470143 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(18), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 55, 48, 54, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 56, second: 49, nanosecond: 689032809 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(19), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 50, 55, 53, 51, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 14, minute: 59, second: 12, nanosecond: 97616457 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(20), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 48, 54, 52, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 15, second: 29, nanosecond: 424852695 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(21), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 49, 50, 49, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 18, second: 32, nanosecond: 393831664 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(22), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 49, 51, 57, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 19, second: 23, nanosecond: 58159851 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(23), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 
105, 100, 34, 58, 34, 51, 50, 52, 56, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 25, second: 13, nanosecond: 814154173 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(24), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 50, 55, 51, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 26, second: 35, nanosecond: 211667523 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(25), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 50, 56, 49, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 27, second: 0, nanosecond: 562352675 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(26), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 50, 57, 49, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 27, second: 30, nanosecond: 972795163 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(27), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 51, 54, 54, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 31, second: 40, nanosecond: 371802144 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(28), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 53, 56, 50, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 15, minute: 43, second: 22, nanosecond: 172124031 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(29), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 51, 56, 48, 54, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, 
ordinal: 285 }, time: Time { hour: 15, minute: 55, second: 50, nanosecond: 89294638 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(30), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 53, 51, 48, 54, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 17, minute: 18, second: 52, nanosecond: 461784643 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(31), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 53, 52, 48, 52, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 17, minute: 24, second: 22, nanosecond: 979847788 } })) } } }), SendPacket(SendPacket { packet: Packet { sequence: Sequence(32), source_port: PortId("consumer"), source_channel: ChannelId("channel-0"), destination_port: PortId("provider"), destination_channel: ChannelId("channel-0"), data: [123, 34, 118, 97, 108, 115, 101, 116, 95, 117, 112, 100, 97, 116, 101, 95, 105, 100, 34, 58, 34, 55, 57, 57, 53, 34, 125], timeout_height: Never, timeout_timestamp: Timestamp { time: Some(Time(PrimitiveDateTime { date: Date { year: 2022, ordinal: 285 }, time: Time { hour: 19, minute: 49, second: 8, nanosecond: 457825333 } })) } } })]

On line 2 we find 33 unreceived packets, listed in order from 4 to 36.

Then we fetch the corresponding events for these sequence numbers with query_send_packet_events, but we don't find any events using query_txs:

let tx_events: Vec<IbcEvent> = src_chain
    .query_txs(QueryTxRequest::Packet(query.clone()))
    .map_err(LinkError::relayer)?
    .into_iter()
    .map(|ev_with_height| ev_with_height.event)
    .collect();

so we fall back to query_blocks to extract the events emitted at StartBlock and EndBlock:

let (start_block_events, end_block_events) = if !query.sequences.is_empty() {
    src_chain
        .query_blocks(QueryBlockRequest::Packet(query))
        .map_err(LinkError::relayer)?
} else {
    Default::default()
};

In StartBlock, we get events for packet sequences 4, 33, 34, 35, 36 and in EndBlock for packet sequences 5 to 32.

The vectors containing the StartBlock and EndBlock events are then appended together, with the StartBlock events coming first and the EndBlock events last, meaning that the sequence numbers of the resulting RecvPacket messages are now out of order, i.e. 4, 33, 34, 35, 36, 5, 6, ..., 32.

// events must be ordered in the following fashion -
// start-block events followed by tx-events followed by end-block events
events_result.extend(start_block_events);
events_result.extend(tx_events);
events_result.extend(end_block_events);

It is not clear to me why we are doing it this way, and the comment does not give any rationale.
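
One possible quick mitigation (suggested in the next comment) is to reorder the combined batch by packet sequence before relaying. A minimal sketch; the seq_of accessor is an assumption for illustration, not the actual ibc-relayer API:

/// Restore sequence order after concatenating start-block, tx, and
/// end-block events, so that RecvPacket messages are submitted in the
/// order an ordered channel requires. `seq_of` stands in for whatever
/// accessor extracts the packet sequence from an event.
fn sort_by_sequence<E>(events: &mut [E], seq_of: impl Fn(&E) -> u64) {
    events.sort_by_key(|ev| seq_of(ev));
}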

@romac (Member, Author) commented Sep 29, 2022

I have not figured out how to reproduce this on my local setup yet. Nevertheless, even if we had a fix for this, e.g. by (partially) ordering the events by packet sequence, there are more issues with packet clearing on ordered channels.

Hold on to your hats for what's coming!

Using our fork of Gaia v6 with support for ordered channels on the transfer port, I have tried sending lots of packets on an ordered channel with tx ft-transfer (without the relayer running) and then tried to clear them using clear packets.

For example, if I send 300 packets on a fresh ordered channel, and then trigger a packet clear, Hermes will correctly see that there are 300 unreceived packets with sequence numbers 1 to 300.

Note: The commands below were run with max_msg_num = 30 and tx_confirmation = true.

❯ hermes tx ft-transfer --dst-chain ordered-1 --src-chain ordered-0 --src-port transfer --src-channel channel-2 --amount 42 --timeout-height-offset 1000 --number-msgs 300
❯ hermes clear packets --chain ordered-0 --channel channel-2 --port transfer

Hermes will then pull 300 SendPacket events for these sequences and submit the corresponding RecvPacket messages.

2022-09-29T12:18:50.403629Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}: 300 unreceived packets found: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300 ]

Issue 1

If sequential_batch_tx is not set, then Hermes will submit a bunch of txs concurrently to the node.

It might work the first time:

2022-09-29T12:18:51.134538Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}: pulled packet data for 50 events: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 ]; events_total=300 events_left=250
2022-09-29T12:18:54.728512Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}:relay{odata=packet-recv -> destination @ 0-4442; len=50}: assembled batch of 51 message(s)
2022-09-29T12:18:54.737410Z DEBUG ThreadId(29) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}:relay{odata=packet-recv -> destination @ 0-4442; len=50}:send_messages_and_wait_commit{chain=ordered-1 tracking_id=packet-recv}: sending 51 messages as 2 batches to chain ordered-1 in parallel
2022-09-29T12:18:57.049838Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}:relay{odata=packet-recv -> destination @ 0-4442; len=50}: submitted

Here, Hermes submitted two txs, one for the first 29 events (1-29) and one for events 30-50 (51 messages in total, due to the UpdateClient).
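
The split follows from max_msg_num = 30: each tx carries at most 30 messages. A small illustrative sketch of the arithmetic (not the relayer's actual batching code):

/// Split n_msgs messages into batches of at most max_msg_num.
fn batch_sizes(n_msgs: usize, max_msg_num: usize) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut left = n_msgs;
    while left > 0 {
        let take = left.min(max_msg_num);
        sizes.push(take);
        left -= take;
    }
    sizes
}

// batch_sizes(51, 30) == [30, 21]: the UpdateClient plus 29 RecvPacket
// messages in the first tx, the remaining 21 RecvPacket messages in the second.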

But at some point:

2022-09-29T12:18:57.778676Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}: pulled packet data for 50 events: [ 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 ]; events_total=300 events_left=200
2022-09-29T12:18:57.946636Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}:relay{odata=packet-recv -> destination @ 0-4442; len=50}: assembled batch of 51 message(s)
2022-09-29T12:18:57.980274Z ERROR ThreadId(29) relay_recv_packet_and_timeout_messages{src_chain=ordered-0 src_port=transfer src_channel=channel-2 dst_chain=ordered-1}:relay{odata=packet-recv -> destination @ 0-4442; len=50}:send_messages_and_wait_commit{chain=ordered-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=ordered-1 account.sequence=159}: gas estimation failed or encountered another unrecoverable error error=
   0: gRPC call failed with status: status: Unknown, message: "failed to execute message; message index: 0: receive packet verification failed: packet sequence ≠ next receive sequence (80 ≠ 51): packet sequence is out of order", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "4442"} }

Here, Hermes submitted two txs, one for events 51-79 and one for events 80-100 (not counting the UpdateClient), but the chain chose to execute the tx containing events 80-100 before the tx containing events 51-79, which results in the following error:

failed to execute message; message index: 0: receive packet verification failed: packet sequence ≠ next receive sequence (80 ≠ 51): packet sequence is out of order

Solution 1

This can be fixed by setting sequential_batch_tx = true, which we could enable automatically when clearing packets.
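
For reference, sequential_batch_tx is a per-chain option in Hermes' config.toml; a sketch, with the chain id and surrounding settings being illustrative:

# Submit batches of txs sequentially, waiting for each batch to
# commit before sending the next one (illustrative placement).
[[chains]]
id = 'ordered-1'   # example chain id
# ... other chain settings ...
sequential_batch_tx = true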

I went on to set sequential_batch_tx = true manually in the config for both chains and re-ran clear packets...

Issue 2

But it failed again, albeit at a different stage this time and for another reason.

Hermes managed to clear all SendPacket events by submitting the corresponding RecvPacket messages, which yielded 300 WriteAck events.

Hermes then checks for any unreceived acks, and finds the 300 sequence numbers corresponding to the 300 WriteAck events previously yielded.

It then pulls the corresponding WriteAck events using query_write_ack_events, and everything goes well for the first 250 events, pulled in batches of 50.

2022-09-29T12:31:10.548427Z  INFO ThreadId(01) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}: pulled packet data for 50 events: [ 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250 ]; events_total=300 events_left=50

...

2022-09-29T12:31:14.287717Z DEBUG ThreadId(11) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}:relay{odata=packet-ack -> destination @ 1-4576; len=50}:send_messages_and_wait_commit{chain=ordered-0 tracking_id=packet-ack}: wait_for_block_commits: waiting for commit of tx hashes(s) 297EA94A59F4EF679F80A3083E07E37BECF409F264960232BE9FD019704395B4, 6F4BDAAC2D1FCDEEDF18A85B83EB8A13BFAAA953958B61F0E22409CF302F2595 id=ordered-0
2022-09-29T12:31:19.352566Z  INFO ThreadId(01) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}:relay{odata=packet-ack -> destination @ 1-4576; len=50}: [Sync->ordered-0] result events:
	UpdateClient(UpdateClient { common: Attributes { client_id: 07-tendermint-0, client_type: ClientType(07-tendermint), consensus_height: 1-4577 }, header: None }) at height 0-4587
	AcknowledgePacket(AcknowledgePacket { packet: seq:201, path:channel-2/transfer->channel-2/transfer, toh:1-5432, tos:Timestamp(NoTimestamp))}) at height 0-4587
	AcknowledgePacket(AcknowledgePacket { packet: seq:202, path:channel-2/transfer->channel-2/transfer, toh:1-5432, tos:Timestamp(NoTimestamp))}) at height 0-4587
...
	AcknowledgePacket(AcknowledgePacket { packet: seq:249, path:channel-2/transfer->channel-2/transfer, toh:1-5432, tos:Timestamp(NoTimestamp))}) at height 0-4588
	AcknowledgePacket(AcknowledgePacket { packet: seq:250, path:channel-2/transfer->channel-2/transfer, toh:1-5432, tos:Timestamp(NoTimestamp))}) at height 0-4588

2022-09-29T12:31:19.354162Z  INFO ThreadId(01) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}:relay{odata=packet-ack -> destination @ 1-4576; len=50}: submitted

But then, when pulling the packet data for the last 50 sequence numbers (251-300):

2022-09-29T12:31:19.383166Z  INFO ThreadId(01) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}: pulled packet data for 21 events: [ 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300 ]; events_total=300 events_left=0

Hermes only gets the WriteAck events for 21 sequence numbers instead of 50 ("pulled packet data for 21 events")...

Specifically, it only gets the events for sequence numbers 280 to 300, but not for 251 to 279.

So, when it builds a tx with the acks for sequences 280 to 300:

2022-09-29T12:31:19.456315Z  INFO ThreadId(01) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}:relay{odata=packet-ack -> destination @ 1-4576; len=21}: assembled batch of 22 message(s)
ack: 280
ack: 281
ack: 282
ack: 283
ack: 284
ack: 285
ack: 286
ack: 287
ack: 288
ack: 289
ack: 290
ack: 291
ack: 292
ack: 293
ack: 294
ack: 295
ack: 296
ack: 297
ack: 298
ack: 299
ack: 300
2022-09-29T12:31:19.458952Z DEBUG ThreadId(11) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}:relay{odata=packet-ack -> destination @ 1-4576; len=21}:send_messages_and_wait_commit{chain=ordered-0 tracking_id=packet-ack}: sending 22 messages as 1 batches to chain ordered-0 in serial
2022-09-29T12:31:19.465753Z ERROR ThreadId(11) relay_ack_packet_messages{src_chain=ordered-1 src_port=transfer src_channel=channel-2 dst_chain=ordered-0}:relay{odata=packet-ack -> destination @ 1-4576; len=21}:send_messages_and_wait_commit{chain=ordered-0 tracking_id=packet-ack}:send_tx_with_account_sequence_retry{chain=ordered-0 account.sequence=241}:estimate_gas: failed to simulate tx. propagating error to caller: gRPC call failed with status: status: Unknown, message: "failed to execute message; message index: 1: acknowledge packet verification failed: packet sequence ≠ next ack sequence (280 ≠ 251): packet sequence is out of order", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "x-cosmos-block-height": "4588"} }

The chain rejects the tx with:

failed to execute message; message index: 1: acknowledge packet verification failed: packet sequence ≠ next ack sequence (280 ≠ 251): packet sequence is out of order

Attempt 2.1

I initially thought that this was happening because some of the events were emitted at BeginBlock or EndBlock, and we don't handle this when pulling WriteAck events, unlike what we do for SendPacket events.

But that was not it: I couldn't find the missing events using query_blocks either...

Attempt 2.2

Hoping that this was instead a transient issue with the node, I modified the code to repeat the query_txs step until we got as many events as expected (see code below), but to no avail. These events just appear to be missing, and we never get them via either query_txs or query_blocks.

fn query_until<R, E>(
    do_query: impl Fn() -> Result<R, E>,
    pred: impl Fn(&R) -> bool,
) -> Result<R, E> {
    loop {
        let result = do_query()?;

        if pred(&result) {
            return Ok(result);
        } else {
            tracing::debug!("did not get enough events, retrying...");
            std::thread::sleep(std::time::Duration::from_millis(500));
        }
    }
}

/// Returns an iterator on batches of packet events.
pub fn query_packet_events_with<'a, ChainA>(
    sequences: &'a [Sequence],
    query_height: Height,
    src_chain: &'a ChainA,
    path: &'a PathIdentifiers,
    query_fn: impl Fn(&ChainA, &PathIdentifiers, &[Sequence], Height) -> Result<Vec<IbcEvent>, LinkError>
        + 'a,
) -> impl Iterator<Item = Vec<IbcEventWithHeight>> + 'a
where
    ChainA: ChainHandle,
{
    let events_total = sequences.len();
    let mut events_left = events_total;

    sequences
        .chunks(QUERY_RESULT_LIMIT)
        .map_while(move |chunk| {
            let events = query_until(
                || query_fn(src_chain, path, chunk, query_height),
                |events| events.len() == chunk.len(),
            );
            // ... (remainder of the snippet elided)
        })
}

Preliminary conclusion

It seems to me that this is a chain bug, for reasons totally unclear at the moment. But I am not ruling out the possibility that Hermes is doing something wrong when clearing SendPacket events.

The search continues...

@romac (Member, Author) commented Sep 29, 2022

@plafer (Contributor) commented Sep 29, 2022

> It seems to me that this is a chain bug, for reasons totally unclear at the moment.

Not sure if this ibc-go bug I reported a while back has to do with any of this?

@ancazamfir (Collaborator) commented:

> The vectors containing the StartBlock and EndBlock events are then appended together, with the StartBlock events coming first and the EndBlock events last, meaning that the sequence numbers of the resulting RecvPacket messages are now out of order, i.e. 4, 33, 34, 35, 36, 5, 6, ..., 32.

hermes/crates/relayer/src/link/packet_events.rs, lines 106 to 110 at 1f15e6b:

// events must be ordered in the following fashion -
// start-block events followed by tx-events followed by end-block events
events_result.extend(start_block_events);
events_result.extend(tx_events);
events_result.extend(end_block_events);

> It is not clear to me why we are doing it this way, and the comment does not give any rationale.

Ah, there is a bug here in the code. Within a given block, the ordering above is correct. However, I believe in this case we end up with events from multiple blocks that contain packets with those sequences. So let's say those packets were included in blocks b1 and b2. We return the events in this order:
b1.start_block_events + b2.start_block_events + b1.tx_events + b2.tx_events + b1.end_block_events + b2.end_block_events.

Instead it should be:
b1.start_block_events + b1.tx_events + b1.end_block_events + b2.start_block_events + b2.tx_events + b2.end_block_events.

For unordered channels the issue is not visible, but we probably do some unfair processing (we may relay packets in a different order than they were sent).
For ordered channels we see the problem described in this issue. The quick fix for ordered channels is to reorder the events based on the sequence numbers (see the sketch below).
Long term, we should review the existing code and try to find a proper fix.
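
A minimal sketch of that intended per-block ordering (the types are illustrative, not the actual ibc-relayer ones):

/// Events collected for a single block.
struct BlockEvents<E> {
    height: u64,
    start_block: Vec<E>,
    tx: Vec<E>,
    end_block: Vec<E>,
}

/// Flatten events block by block (b1 before b2), keeping the
/// start-block -> tx -> end-block order within each block.
fn flatten_in_block_order<E>(mut blocks: Vec<BlockEvents<E>>) -> Vec<E> {
    blocks.sort_by_key(|b| b.height);
    blocks
        .into_iter()
        .flat_map(|b| b.start_block.into_iter().chain(b.tx).chain(b.end_block))
        .collect()
}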

Regarding Issues 1 and 2 that you describe, I believe the root cause is the same as the one described in #2249, whose fix addressed the simulation failure for the account sequence mismatch. A first thought is to identify this error and use the default gas here as well, but that would be a hack, as is already the case in #2298 (the fix for #2249).

The root cause here is that the simulate_tx gRPC call is allowed to happen after a block is created and before the Tendermint mempool finishes rechecking txs (details in #2249). It should instead block until the recheck is finished. There was an SDK issue for this; I will try to find it.

@ancazamfir (Collaborator) commented Oct 3, 2022

For Issue 2 with the acks, there is a bug here:

// Call to /block_results doesn't get SendPacket events, so an additional tx_search is required.
// Without this check, WriteAcknowledgment will be sent twice.
if result.is_empty() {

It should be something like:

if result.len() < request.sequences.len() {
    for seq in &request.sequences {
        // only do the tx_search below if seq is not in result
        // ...
        // query the first (and only) tx that includes the event specified in the query request
        let response = rpc_client
            .tx_search(

and then sort the events in result by sequence.
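
A sketch of the "only query what is missing" step (plain Rust with illustrative names, not the actual ibc-relayer code):

use std::collections::HashSet;

/// Return the requested sequences for which no event was found via
/// /block_results, i.e. the ones that still need a tx_search query.
fn missing_sequences(found: &[u64], requested: &[u64]) -> Vec<u64> {
    let have: HashSet<u64> = found.iter().copied().collect();
    requested
        .iter()
        .copied()
        .filter(|seq| !have.contains(seq))
        .collect()
}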

I am also not sure anymore what we are trying to achieve with querying for events in the block at query height. We optimize a bit at query height but not for events that happened at previous heights. I tried to remove these lines:

let tm_height = match request.height {
    QueryHeight::Latest => tendermint::block::Height::default(),
    QueryHeight::Specific(h) => {
        tendermint::block::Height::try_from(h.revision_height()).unwrap()
    }
};

let height = Height::new(chain_id.version(), u64::from(tm_height))
    .map_err(|_| Error::invalid_height_no_source())?;

let exact_block_results = rpc_client
    .block_results(tm_height)
    .await
    .map_err(|e| Error::rpc(rpc_address.clone(), e))?
    .txs_results;

if let Some(txs) = exact_block_results {
    for tx in txs.iter() {
        let tx_copy = tx.clone();
        result.append(
            &mut tx_copy
                .events
                .into_iter()
                .filter_map(|e| filter_matching_event(e, &request, &request.sequences))
                .map(|e| IbcEventWithHeight::new(e, height))
                .collect(),
        )
    }
}

// Call to /block_results doesn't get SendPacket events, so an additional tx_search is required.
// Without this check, WriteAcknowledgment will be sent twice.
if result.is_empty() {

and all looks good.

This code that gets the packet events (send and ack) has been reworked quite a bit, and maybe we missed something, or it was always wrong. I think query_send_packet_events() and query_write_ack_events() should look the same (they can actually be merged). query_txs() should not call block_results(), as the latter is already called from query_blocks().

Maybe we should clean this up and look at optimizations later.

@ancazamfir (Collaborator) commented:

Looks like #2512 introduced "Issue 2", so when fixing it we also need to look at the new CLI option introduced there and make sure it still works.
