Skip to content

discovery: use block ranges from ReplyChannelRange to determine end marker #3836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 8, 2020
Merged

discovery: use block ranges from ReplyChannelRange to determine end marker #3836

merged 6 commits into from
Jan 8, 2020

Conversation

wpaulino
Copy link
Contributor

In this PR, we make a series of changes to properly adhere to the specification with the QueryChannelRange and ReplyChannelRange messages. Previously, we'd look at the Complete field of a ReplyChannelRange message to determine if we had received all replies to a query. This isn't the use of the field however, and instead we should be inspecting the block range of each ReplyChannelRange message to determine this.

One of the main takeaways that motivates most of the changes outlined in this PR is the following:

  - MUST respond with one or more `reply_channel_range` whose combined range
	cover the requested `first_blocknum` to `first_blocknum` plus
	`number_of_blocks` minus one.

Due to the large number of nodes out there that rely on the previous mechanism for graph sync, we still maintain it for backwards compatibility. In the future, once most nodes have upgraded, we can consider removing this.

I did a series of graph sync tests and I was able to successfully sync against other lnd nodes with and without this change, and c-lightning nodes. Syncs against eclair failed as they don't adhere to the point above, their responses don't cover the complete requested range.

This is a follow-up of #3785 and fixes #3728.

@wpaulino wpaulino added bug fix gossip interop interop with other implementations v0.9.0 labels Dec 14, 2019
@wpaulino wpaulino added this to the 0.9.0 milestone Dec 14, 2019
@halseth halseth removed their request for review December 16, 2019 09:20
Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solid changes, just some minor comments

@wpaulino wpaulino requested a review from cfromknecht December 18, 2019 19:56
@halseth halseth self-requested a review December 19, 2019 20:06
Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🤘

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR, easy to follow changes. LGTM 💯

@Roasbeef
Copy link
Member

Syncs against eclair failed as they don't adhere to the point above, their responses don't cover the complete requested range.

Shouldn't the logic to support legacy lnd nodes also cover the quirks in eclair's implementation?

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix, will be nice to finally (lol) have proper cross-implementation compatibility for the gossip queries feature. Main comments concern the alleged lack of compatibility with eclair, and if we're properly handling manipulation+interpretation of the NumBlocks field for legacy nodes.

func isLegacyReplyChannelRange(query *lnwire.QueryChannelRange,
reply *lnwire.ReplyChannelRange) bool {

return reply.QueryChannelRange == *query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cases where the range requested fits into a single response, won't this trigger a false positive? I think this only applies to our new message pattern as we'll no longer use max uint32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but we don't know if our request will span multiple replies at this point.

Copy link
Member

@Roasbeef Roasbeef Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if the change from max uint32 was reverted, then this should properly fall into the legacy replay bucket.

// The current reply can either start from the previous'
// reply's last block, if there are still more channels
// for the same block, or the block after.
if msg.FirstBlockHeight != prevReplyLastHeight &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are both variatns allowed? Feels like the spec should be stricter in its requirements here (either overlapping ranges are allowed or they aren't). Or is this an artefact from resolving the issues with the implicit inclusivity bounds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen if there are too many short channel IDs for a given range that exceed the maximum we're allowed to send in a single message.

FilterChannelRange takes an inclusive range, so it was possible for us
to return channels for an additional block that was not requested.
In order to properly adhere to the spec, when handling a
QueryChannelRange message, we must reply with a series of
ReplyChannelRange messages, that when consumed together cover the
entirety of the block range requested.
It's not possible to send another reply once all replies have been sent
without another request. The purpose of the check is also done within
another test, TestGossipSyncerReplyChanRangeQueryNoNewChans, so it can
be removed from here.
We move from our legacy way of interpreting ReplyChannelRange messages
which was incorrect. Previously, we'd rely on the Complete field of the
ReplyChannelRange message to determine when our peer had sent all of
their replies. Now, we properly adhere to the specification by
interpreting the block ranges of these messages as intended.

Due to the large number of nodes deployed with the previous method, we
still maintain and detect when we are communicating with them, such that
we are still able to sync with them for backwards compatibility.
The message now shows the block range the reply spans, which is a lot
more useful.
@wpaulino
Copy link
Contributor Author

wpaulino commented Jan 6, 2020

Syncs against eclair failed as they don't adhere to the point above, their responses don't cover the complete requested range.

Shouldn't the logic to support legacy lnd nodes also cover the quirks in eclair's implementation?

The legacy way is based on the same query channel range message always being sent, whereas eclair implements it correctly by replying with a query channel range message that corresponds to the short channel IDs sent, but their replies don't cover the complete requested range.

@wpaulino wpaulino requested a review from Roasbeef January 6, 2020 22:50
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🥂

@Roasbeef Roasbeef merged commit b8f6a55 into lightningnetwork:master Jan 8, 2020
@wpaulino wpaulino deleted the interpret-query-channel-range branch January 8, 2020 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix gossip interop interop with other implementations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LND mis-implements gossip_queries
4 participants