discovery: use block ranges from ReplyChannelRange to determine end marker #3836
solid changes, just some minor comments
LGTM 🤘
Great PR, easy to follow changes. LGTM 💯
Shouldn't the logic to support legacy |
Solid fix, will be nice to finally (lol) have proper cross-implementation compatibility for the gossip queries feature. Main comments concern the alleged lack of compatibility with eclair, and whether we're properly handling manipulation and interpretation of the NumBlocks field for legacy nodes.
func isLegacyReplyChannelRange(query *lnwire.QueryChannelRange,
	reply *lnwire.ReplyChannelRange) bool {

	return reply.QueryChannelRange == *query
For cases where the range requested fits into a single response, won't this trigger a false positive? I think this only applies to our new message pattern as we'll no longer use max uint32.
Good point, but we don't know if our request will span multiple replies at this point.
However, if the change from max uint32 was reverted, then this should properly fall into the legacy reply bucket.
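To make the concern in this thread concrete, here is a minimal sketch of the equality-based detection. The blockRange type and the isLegacyReply helper are hypothetical stand-ins for illustration, not the lnwire types. It shows that a single reply covering the whole requested range is indistinguishable from a legacy echo, while a partial reply is not:

```go
package main

import "fmt"

// blockRange is a hypothetical stand-in for the range fields carried by
// QueryChannelRange and echoed inside ReplyChannelRange. It is not the
// real lnwire type.
type blockRange struct {
	FirstBlockHeight uint32
	NumBlocks        uint32
}

// isLegacyReply mirrors the equality check from the diff above: a reply is
// classified as legacy when it repeats the query's range verbatim.
func isLegacyReply(query, reply blockRange) bool {
	return reply == query
}

func main() {
	query := blockRange{FirstBlockHeight: 600000, NumBlocks: 100}

	// A legacy peer echoes the query's range in every reply.
	legacyEcho := query
	// A peer whose only reply happens to cover the full requested range.
	singleReply := blockRange{FirstBlockHeight: 600000, NumBlocks: 100}
	// A peer that splits its answer and reports only the blocks it covers.
	partialReply := blockRange{FirstBlockHeight: 600000, NumBlocks: 40}

	fmt.Println(isLegacyReply(query, legacyEcho))   // true
	fmt.Println(isLegacyReply(query, singleReply))  // true (the false positive)
	fmt.Println(isLegacyReply(query, partialReply)) // false
}
```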
// The current reply can either start from the previous
// reply's last block, if there are still more channels
// for the same block, or the block after.
if msg.FirstBlockHeight != prevReplyLastHeight &&
Why are both variants allowed? Feels like the spec should be stricter in its requirements here (either overlapping ranges are allowed or they aren't). Or is this an artefact from resolving the issues with the implicit inclusivity bounds?
This can happen if there are too many short channel IDs for a given range that exceed the maximum we're allowed to send in a single message.
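The continuity rule discussed in this thread can be sketched roughly as follows. The condition in the hunk above is cut off, so the second half of the check here is an assumption inferred from the code comment, and validReplyStart is a hypothetical helper name:

```go
package main

import "fmt"

// validReplyStart reports whether a reply may begin at firstHeight, given
// that the previous reply ended at prevLastHeight. Both variants from the
// code comment are accepted: the same block (more channels remained for
// it) or the block immediately after. Heights are widened to uint64 so
// that prevLastHeight+1 cannot wrap around.
func validReplyStart(prevLastHeight, firstHeight uint32) bool {
	prev := uint64(prevLastHeight)
	first := uint64(firstHeight)
	return first == prev || first == prev+1
}

func main() {
	fmt.Println(validReplyStart(500, 500)) // same block continues: true
	fmt.Println(validReplyStart(500, 501)) // next block: true
	fmt.Println(validReplyStart(500, 503)) // gap in coverage: false
	fmt.Println(validReplyStart(500, 499)) // goes backwards: false
}
```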
FilterChannelRange takes an inclusive range, so it was possible for us to return channels for an additional block that was not requested.
In order to properly adhere to the spec, when handling a QueryChannelRange message, we must reply with a series of ReplyChannelRange messages, that when consumed together cover the entirety of the block range requested.
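As a rough illustration of the requirement above, the following sketch splits a query's block range into replies that together cover it. It is not the code from this PR: the reply type, the splitReplies helper, and the per-reply limit are hypothetical, and it assumes the channel heights are sorted and lie within the requested range.

```go
package main

import "fmt"

// reply is a hypothetical, simplified reply: an inclusive block range plus
// the block heights of the channels it carries.
type reply struct {
	First, Last uint32
	Heights     []uint32
}

// splitReplies chunks sorted channel heights into replies of at most
// maxPerReply channels each. The first reply starts at the query's first
// block and the final reply always ends at the query's last block, so the
// replies cover the whole requested range when consumed together.
func splitReplies(first, last uint32, heights []uint32, maxPerReply int) []reply {
	if maxPerReply < 1 {
		maxPerReply = 1
	}
	var replies []reply
	start := first
	for len(heights) > maxPerReply {
		chunk := heights[:maxPerReply]
		end := chunk[len(chunk)-1]
		replies = append(replies, reply{First: start, Last: end, Heights: chunk})
		heights = heights[maxPerReply:]
		// If channels for the same block remain, the next reply starts at
		// that block again; otherwise it starts at the block after.
		if heights[0] == end {
			start = end
		} else {
			start = end + 1
		}
	}
	return append(replies, reply{First: start, Last: last, Heights: heights})
}

func main() {
	heights := []uint32{100, 101, 101, 101, 105}
	for _, r := range splitReplies(100, 109, heights, 2) {
		fmt.Printf("blocks %d-%d: %v\n", r.First, r.Last, r.Heights)
	}
}
```

With these inputs the three replies span blocks 100-101, 101-101, and 102-109, which shows both start variants a receiver has to accept.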
It's not possible to send another reply once all replies have been sent without another request. The same check is also performed in another test, TestGossipSyncerReplyChanRangeQueryNoNewChans, so it can be removed from here.
We move away from our legacy way of interpreting ReplyChannelRange messages, which was incorrect. Previously, we'd rely on the Complete field of the ReplyChannelRange message to determine when our peer had sent all of their replies. Now, we properly adhere to the specification by interpreting the block ranges of these messages as intended. Due to the large number of nodes deployed with the previous method, we still detect when we are communicating with them and keep the legacy behavior for them, so that we can still sync with them for backwards compatibility.
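A minimal sketch of this end-marker logic, under stated assumptions: chanRange and repliesDone are hypothetical names, the legacy branch reuses the equality-based detection from the review hunk, and the code is not the implementation from this PR.

```go
package main

import "fmt"

// chanRange is a hypothetical stand-in for the block range of a query or
// reply; it is not the real lnwire type.
type chanRange struct {
	FirstBlockHeight uint32
	NumBlocks        uint32
}

// lastBlock returns the inclusive final height of the range. The sum is
// computed in uint64 so that large ranges cannot overflow, and an empty
// range is treated as ending at its first block.
func (r chanRange) lastBlock() uint64 {
	if r.NumBlocks == 0 {
		return uint64(r.FirstBlockHeight)
	}
	return uint64(r.FirstBlockHeight) + uint64(r.NumBlocks) - 1
}

// repliesDone reports whether reply is the final reply to query. A reply
// that echoes the query's range is treated as legacy, in which case the
// Complete flag marks the end. Otherwise the reply's block range decides.
func repliesDone(query, reply chanRange, complete bool) bool {
	if reply == query {
		return complete
	}
	return reply.lastBlock() >= query.lastBlock()
}

func main() {
	query := chanRange{FirstBlockHeight: 100, NumBlocks: 50} // blocks 100-149

	fmt.Println(repliesDone(query, chanRange{100, 20}, true)) // ends at 119: false
	fmt.Println(repliesDone(query, chanRange{120, 30}, true)) // ends at 149: true
	fmt.Println(repliesDone(query, query, false))             // legacy, more to come: false
	fmt.Println(repliesDone(query, query, true))              // legacy, final reply: true
}
```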
The message now shows the block range the reply spans, which is a lot more useful.
The legacy way is based on the same query channel range message always being sent, whereas eclair implements it correctly by replying with a query channel range message that corresponds to the short channel IDs sent, but their replies don't cover the complete requested range.
LGTM 🥂
In this PR, we make a series of changes to properly adhere to the specification with the QueryChannelRange and ReplyChannelRange messages. Previously, we'd look at the Complete field of a ReplyChannelRange message to determine if we had received all replies to a query. This isn't the use of the field however, and instead we should be inspecting the block range of each ReplyChannelRange message to determine this.
One of the main takeaways that motivates most of the changes outlined in this PR is the following:
Due to the large number of nodes out there that rely on the previous mechanism for graph sync, we still maintain it for backwards compatibility. In the future, once most nodes have upgraded, we can consider removing this.
I did a series of graph sync tests and I was able to successfully sync against other lnd nodes with and without this change, and c-lightning nodes. Syncs against eclair failed as they don't adhere to the point above, their responses don't cover the complete requested range.
This is a follow-up of #3785 and fixes #3728.