discovery: use block ranges from ReplyChannelRange to determine end marker #3836
cfromknecht left a comment
Solid changes, just some minor comments.
halseth left a comment
Great PR, easy to follow changes. LGTM 💯
Shouldn't the logic to support legacy
Roasbeef left a comment
Solid fix, will be nice to finally (lol) have proper cross-implementation compatibility for the gossip queries feature. My main comments concern the alleged lack of compatibility with eclair, and whether we're properly handling the manipulation and interpretation of the NumBlocks field for legacy nodes.
For cases where the requested range fits into a single response, won't this trigger a false positive? I think this only applies to our new message pattern, since we'll no longer use max uint32.
Good point, but we don't know if our request will span multiple replies at this point.
However, if the change from max uint32 were reverted, this should properly fall into the legacy reply bucket.
Why are both variants allowed? It feels like the spec should be stricter in its requirements here (either overlapping ranges are allowed or they aren't). Or is this an artefact of resolving the issues with the implicit inclusivity bounds?
This can happen if there are too many short channel IDs for a given range that exceed the maximum we're allowed to send in a single message.
FilterChannelRange takes an inclusive range, so it was possible for us to return channels for an additional block that was not requested.
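The off-by-one described above can be illustrated with a small, hedged sketch. It assumes a query covers the heights FirstBlockHeight through FirstBlockHeight + NumBlocks - 1; `filterHeights` is a hypothetical stand-in for an inclusive filter like FilterChannelRange (it works on plain block heights, not short channel IDs), and `lastBlockHeight` is a hypothetical helper, not lnd's API.

```go
package main

import (
	"fmt"
	"math"
)

// filterHeights keeps the heights in [start, end], inclusive on both ends.
// Hypothetical stand-in for an inclusive range filter.
func filterHeights(heights []uint32, start, end uint32) []uint32 {
	var out []uint32
	for _, h := range heights {
		if h >= start && h <= end {
			out = append(out, h)
		}
	}
	return out
}

// lastBlockHeight returns the last height covered by a query that starts at
// firstBlock and spans numBlocks blocks (numBlocks is assumed to be >= 1).
// The sum is done in uint64 and clamped, so a numBlocks of math.MaxUint32
// cannot wrap around.
func lastBlockHeight(firstBlock, numBlocks uint32) uint32 {
	last := uint64(firstBlock) + uint64(numBlocks) - 1
	if last > math.MaxUint32 {
		return math.MaxUint32
	}
	return uint32(last)
}

func main() {
	heights := []uint32{100, 101, 102, 103}
	first, num := uint32(100), uint32(3) // requests blocks 100, 101 and 102

	// Passing first+num as an inclusive end leaks block 103.
	fmt.Println(filterHeights(heights, first, first+num)) // [100 101 102 103]

	// Passing the last covered height returns exactly what was requested.
	fmt.Println(filterHeights(heights, first, lastBlockHeight(first, num))) // [100 101 102]
}
```

Clamping matters because legacy queries used max uint32 for NumBlocks, as discussed in the review above.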
To properly adhere to the spec, when handling a QueryChannelRange message, we must reply with a series of ReplyChannelRange messages that, when consumed together, cover the entire requested block range.
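A minimal sketch of one way to build such a series of replies follows. It is not lnd's implementation. It assumes channels are represented only by their (sorted) confirmation heights, uses a hypothetical `reply` struct in place of ReplyChannelRange, and uses a hypothetical `maxPerReply` limit. When one block holds more channels than fit in a single reply, the next reply covers that block again, which is the overlap case raised in the review above. The final reply is stretched to the end of the requested range, so the replies cover it fully even where there are no channels.

```go
package main

import "fmt"

// reply is a simplified, hypothetical stand-in for a ReplyChannelRange
// message: a block range plus the heights of the channels it carries.
type reply struct {
	FirstBlockHeight uint32
	NumBlocks        uint32
	Heights          []uint32
}

// buildReplies splits the sorted channel heights found in [first, first+num-1]
// into replies carrying at most maxPerReply entries (maxPerReply >= 1,
// num >= 1). Together, the returned replies cover the whole requested range.
func buildReplies(heights []uint32, first, num uint32, maxPerReply int) []reply {
	last := first + num - 1 // assumes this does not overflow uint32
	var replies []reply
	cursor := first
	for len(heights) > maxPerReply {
		chunk := heights[:maxPerReply]
		heights = heights[maxPerReply:]
		end := chunk[len(chunk)-1]
		replies = append(replies, reply{
			FirstBlockHeight: cursor,
			NumBlocks:        end - cursor + 1,
			Heights:          chunk,
		})
		// If block `end` had more channels than fit, the next reply must
		// cover that block again; otherwise it starts right after it.
		if heights[0] == end {
			cursor = end
		} else {
			cursor = end + 1
		}
	}
	// The last reply runs to the end of the requested range.
	replies = append(replies, reply{
		FirstBlockHeight: cursor,
		NumBlocks:        last - cursor + 1,
		Heights:          heights,
	})
	return replies
}

func main() {
	// Three channels were confirmed in block 101, more than fit in one reply.
	heights := []uint32{100, 101, 101, 101, 103}
	for _, r := range buildReplies(heights, 100, 5, 2) {
		fmt.Printf("first=%d num=%d heights=%v\n", r.FirstBlockHeight, r.NumBlocks, r.Heights)
	}
	// first=100 num=2 heights=[100 101]
	// first=101 num=1 heights=[101 101]
	// first=102 num=3 heights=[103]
}
```

An empty result set still produces one reply spanning the whole query, so the requester can tell the range has been answered.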
It's not possible to send another reply once all replies have been sent, unless another request is received. The same check is already performed by another test, TestGossipSyncerReplyChanRangeQueryNoNewChans, so it can be removed from here.
We move away from our legacy way of interpreting ReplyChannelRange messages, which was incorrect. Previously, we'd rely on the Complete field of the ReplyChannelRange message to determine when our peer had sent all of their replies. Now, we properly adhere to the specification by interpreting the block ranges of these messages as intended. Due to the large number of nodes deployed with the previous method, we still maintain the legacy behavior and detect when we are communicating with such nodes, so that we can still sync with them for backwards compatibility.
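The end-of-replies check described above might look roughly like the sketch below. It is hedged: `blockRange`, `isLegacyReply`, and `repliesDone` are hypothetical names, and treating a reply that echoes the query's range unchanged as "legacy" is an assumed heuristic, not necessarily how lnd detects legacy peers. For spec-style peers, the final reply is the one whose range reaches the end of the query's range; for legacy peers, the Complete field still decides.

```go
package main

import "fmt"

// blockRange holds the two range fields shared by QueryChannelRange and
// ReplyChannelRange (simplified, hypothetical stand-in).
type blockRange struct {
	FirstBlockHeight uint32
	NumBlocks        uint32
}

// end returns the first height after the range, computed in uint64 so a
// NumBlocks of math.MaxUint32 cannot overflow.
func (r blockRange) end() uint64 {
	return uint64(r.FirstBlockHeight) + uint64(r.NumBlocks)
}

// isLegacyReply assumes a peer uses the legacy scheme when its reply echoes
// the query's block range unchanged.
func isLegacyReply(query, reply blockRange) bool {
	return reply == query
}

// repliesDone reports whether reply is the last one to expect for query.
func repliesDone(query, reply blockRange, complete bool) bool {
	if isLegacyReply(query, reply) {
		// Legacy peers mark their final reply through Complete.
		return complete
	}
	// Spec-style peers are done once a reply reaches the end of the query.
	return reply.end() >= query.end()
}

func main() {
	query := blockRange{FirstBlockHeight: 100, NumBlocks: 10} // blocks 100..109

	// Spec-style replies: only the one reaching block 109 ends the sync.
	fmt.Println(repliesDone(query, blockRange{100, 4}, false)) // false
	fmt.Println(repliesDone(query, blockRange{104, 6}, false)) // true

	// Legacy replies: every reply echoes the query, so Complete decides.
	fmt.Println(repliesDone(query, query, false)) // false
	fmt.Println(repliesDone(query, query, true))  // true
}
```

Under this check, a peer whose last reply stops short of the query's end never satisfies the condition, which is consistent with the failed eclair syncs described in this PR.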
The message now shows the block range the reply spans, which is a lot more useful.
The legacy way relies on every reply carrying the same block range as the original QueryChannelRange message, whereas eclair implements it correctly by replying with ReplyChannelRange messages whose block ranges correspond to the short channel IDs sent. However, their replies don't cover the complete requested range.
In this PR, we make a series of changes to properly adhere to the specification for the QueryChannelRange and ReplyChannelRange messages. Previously, we'd look at the Complete field of a ReplyChannelRange message to determine whether we had received all replies to a query. That isn't the purpose of the field, however; instead, we should inspect the block range of each ReplyChannelRange message to determine this.
One of the main takeaways that motivates most of the changes outlined in this PR is the following:
Due to the large number of nodes out there that rely on the previous mechanism for graph sync, we still maintain it for backwards compatibility. In the future, once most nodes have upgraded, we can consider removing this.
I ran a series of graph sync tests and was able to successfully sync against other lnd nodes, both with and without this change, and against c-lightning nodes. Syncs against eclair failed because they don't adhere to the point above: their responses don't cover the complete requested range.
This is a follow-up of #3785 and fixes #3728.