
discovery: use block ranges from ReplyChannelRange to determine end marker #3836

Merged


@wpaulino (Collaborator) commented Dec 14, 2019

In this PR, we make a series of changes to properly adhere to the specification with regard to the QueryChannelRange and ReplyChannelRange messages. Previously, we'd look at the Complete field of a ReplyChannelRange message to determine whether we had received all replies to a query. That isn't the purpose of the field, however; instead, we should inspect the block range of each ReplyChannelRange message to make this determination.

One of the main takeaways that motivates most of the changes outlined in this PR is the following:

> - MUST respond with one or more `reply_channel_range` whose combined range cover the requested `first_blocknum` to `first_blocknum` plus `number_of_blocks` minus one.
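To make the requirement concrete, here is a minimal Go sketch of a receiver-side check that a sequence of replies covers the requested range. It is not lnd's implementation: the `replyRange` type and the `lastBlock`/`coversQuery` helpers are hypothetical, standing in for the `FirstBlockHeight` and `NumBlocks` fields of a ReplyChannelRange. It assumes replies arrive in ascending block order and does the arithmetic in `uint64` so that `first_blocknum + number_of_blocks - 1` cannot overflow.

```go
package main

// replyRange is a hypothetical stand-in for the FirstBlockHeight and
// NumBlocks fields of a ReplyChannelRange message.
type replyRange struct {
	FirstBlockHeight uint32
	NumBlocks        uint32
}

// lastBlock returns the inclusive last block first+num-1, computed in
// uint64 so it cannot overflow. Callers must ensure num > 0.
func lastBlock(first, num uint32) uint64 {
	return uint64(first) + uint64(num) - 1
}

// coversQuery reports whether replies, taken in ascending order, jointly
// cover every block from queryFirst to queryFirst+queryNum-1.
func coversQuery(queryFirst, queryNum uint32, replies []replyRange) bool {
	if queryNum == 0 {
		return false
	}
	want := lastBlock(queryFirst, queryNum)

	// next is the lowest block not yet covered by the replies seen so far.
	next := uint64(queryFirst)
	for _, r := range replies {
		if r.NumBlocks == 0 {
			continue // an empty range covers nothing
		}
		if uint64(r.FirstBlockHeight) > next {
			return false // gap: block `next` is never covered
		}
		if end := lastBlock(r.FirstBlockHeight, r.NumBlocks); end+1 > next {
			next = end + 1
		}
		if next > want {
			return true
		}
	}
	return false
}
```

Under this check, a set of replies that stops short of the last requested block never satisfies the query, which is consistent with the eclair sync failures described below.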

Due to the large number of nodes out there that rely on the previous mechanism for graph sync, we still maintain it for backwards compatibility. In the future, once most nodes have upgraded, we can consider removing this.

I ran a series of graph sync tests and was able to sync successfully against other lnd nodes, both with and without this change, as well as against c-lightning nodes. Syncs against eclair failed because eclair doesn't adhere to the point above: its responses don't cover the complete requested range.

This is a follow-up of #3785 and fixes #3728.

@wpaulino force-pushed the interpret-query-channel-range branch from faadeba to 341cb1b Dec 14, 2019
@wpaulino added the bug fix, gossip, interop, and v0.9.0 labels Dec 14, 2019
@wpaulino added this to WIP in v0.9.0-beta via automation Dec 14, 2019
@wpaulino added this to the 0.9.0 milestone Dec 14, 2019
@wpaulino moved this from WIP to Needs Review in v0.9.0-beta Dec 14, 2019
@halseth removed their request for review Dec 16, 2019

@cfromknecht (Collaborator) left a comment

solid changes, just some minor comments

discovery/syncer.go (outdated, resolved)
peer.go (outdated, resolved)
@wpaulino force-pushed the interpret-query-channel-range branch from 341cb1b to 2e9e7a7 Dec 18, 2019
@wpaulino requested a review from cfromknecht Dec 18, 2019
@halseth self-requested a review Dec 19, 2019

@cfromknecht (Collaborator) left a comment

LGTM 🤘

@halseth (Collaborator) left a comment

Great PR, easy to follow changes. LGTM 💯

lnwire/query_channel_range.go (resolved)
v0.9.0-beta automation moved this from Needs Review to Approved Dec 21, 2019

@Roasbeef (Member) commented Dec 23, 2019

> Syncs against eclair failed as they don't adhere to the point above, their responses don't cover the complete requested range.

Shouldn't the logic to support legacy lnd nodes also cover the quirks in eclair's implementation?

@Roasbeef (Member) left a comment

Solid fix; it will be nice to finally (lol) have proper cross-implementation compatibility for the gossip queries feature. My main comments concern the alleged lack of compatibility with eclair, and whether we're properly handling the manipulation and interpretation of the NumBlocks field for legacy nodes.

discovery/syncer.go (resolved)
discovery/syncer_test.go (outdated, resolved)
discovery/syncer_test.go (resolved)
discovery/syncer.go (outdated, resolved)
```go
// A legacy reply echoes the exact QueryChannelRange we sent.
func isLegacyReplyChannelRange(query *lnwire.QueryChannelRange,
	reply *lnwire.ReplyChannelRange) bool {

	return reply.QueryChannelRange == *query
}
```

@Roasbeef (Member) commented Dec 23, 2019

For cases where the range requested fits into a single response, won't this trigger a false positive? I think this only applies to our new message pattern as we'll no longer use max uint32.

@wpaulino (Collaborator, Author) commented Jan 6, 2020

Good point, but we don't know if our request will span multiple replies at this point.

@Roasbeef (Member) commented Jan 8, 2020

However, if the change from max uint32 were reverted, then this should properly fall into the legacy reply bucket.
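The false positive raised in this thread can be reproduced with a small sketch. The types below are hypothetical, simplified mirrors of `lnwire.QueryChannelRange` and `lnwire.ReplyChannelRange` (the chain hash is omitted), and `isLegacyReply` copies the equality check from the diff. It assumes, as the comments imply, that the older query pattern set `NumBlocks` to (nearly) max uint32, and that a spec-compliant peer answering with a single reply may return exactly the requested range.

```go
package main

// queryChannelRange is a hypothetical, simplified mirror of
// lnwire.QueryChannelRange (the chain hash field is omitted).
type queryChannelRange struct {
	FirstBlockHeight uint32
	NumBlocks        uint32
}

// replyChannelRange is a hypothetical mirror of lnwire.ReplyChannelRange,
// which embeds the query's fields alongside the reply payload.
type replyChannelRange struct {
	queryChannelRange
	Complete     uint8
	ShortChanIDs []uint64
}

// isLegacyReply copies the check from the diff: a reply counts as legacy
// when its embedded range is identical to the query we sent.
func isLegacyReply(query *queryChannelRange, reply *replyChannelRange) bool {
	return reply.queryChannelRange == *query
}
```

With a bounded query, a single compliant reply spanning exactly the requested range compares equal to the query and is classified as legacy. With a max-uint32 style query, a compliant reply reporting a narrower range (an assumption about the peer's behavior) does not.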

discovery/syncer.go (outdated, resolved)
discovery/syncer.go (outdated, resolved)
```go
// The current reply can either start from the previous
// reply's last block, if there are still more channels
// for the same block, or the block after.
if msg.FirstBlockHeight != prevReplyLastHeight &&
```

@Roasbeef (Member) commented Dec 23, 2019

Why are both variants allowed? It feels like the spec should be stricter in its requirements here (either overlapping ranges are allowed or they aren't). Or is this an artefact of resolving the issues with the implicit inclusivity bounds?

@wpaulino (Collaborator, Author) commented Jan 6, 2020

This can happen when a given range contains more short channel IDs than we're allowed to send in a single message, in which case the next reply continues from the same block.
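The rule in the code fragment above (a reply starts either at the previous reply's last block or at the block after) can be sketched as a standalone validator. The `rng` type and `checkNextReply` function are hypothetical names for illustration, not lnd's code; the sketch tracks the previous reply's inclusive last block as a `uint64` and returns the new one.

```go
package main

import "fmt"

// rng is a hypothetical (FirstBlockHeight, NumBlocks) pair taken from a reply.
type rng struct {
	First uint32
	Num   uint32
}

// checkNextReply validates a reply against prevLast, the inclusive last
// block of the previous reply, and returns the reply's own last block.
// The reply may start at prevLast (the previous reply ran out of room
// while that block still had channels) or at prevLast+1.
func checkNextReply(prevLast uint64, r rng) (uint64, error) {
	if r.Num == 0 {
		return 0, fmt.Errorf("reply covers no blocks")
	}
	first := uint64(r.First)
	if first != prevLast && first != prevLast+1 {
		return 0, fmt.Errorf("reply starts at block %d, want %d or %d",
			first, prevLast, prevLast+1)
	}
	return first + uint64(r.Num) - 1, nil
}
```

Accepting both starting points lets a responder split one block's channels across messages without leaving a gap in the combined range.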

peer.go (resolved)
lnwire/query_channel_range.go (resolved)
v0.9.0-beta automation moved this from Approved to Needs Review Dec 23, 2019
@wpaulino force-pushed the interpret-query-channel-range branch from 2e9e7a7 to f397ec7 Jan 6, 2020
@wpaulino added 6 commits Jan 6, 2020

FilterChannelRange takes an inclusive range, so it was possible for us to return channels for an additional block that was not requested.

In order to properly adhere to the spec, when handling a QueryChannelRange message, we must reply with a series of ReplyChannelRange messages that, when consumed together, cover the entirety of the requested block range.

It's not possible to send another reply once all replies have been sent without another request. The purpose of this check is also served by another test, TestGossipSyncerReplyChanRangeQueryNoNewChans, so it can be removed from here.

We move away from our legacy way of interpreting ReplyChannelRange messages, which was incorrect. Previously, we'd rely on the Complete field of the ReplyChannelRange message to determine when our peer had sent all of their replies. Now, we properly adhere to the specification by interpreting the block ranges of these messages as intended.

Due to the large number of nodes deployed with the previous method, we still maintain it and detect when we are communicating with such nodes, so that we can still sync with them for backwards compatibility.

The message now shows the block range the reply spans, which is a lot more useful.
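On the responding side, the commits above imply that the replies must jointly cover the whole inclusive range while respecting a per-message limit on short channel IDs. The following Go sketch shows one way to split the results; `chanEntry`, `reply`, `splitReplies`, and the `maxPerMsg` limit are all hypothetical, and it assumes `chans` is sorted by block height and that `first + num` fits in a `uint32`. When a batch fills up in the middle of a block, the next reply starts at that same block; otherwise it starts at the block after, and the final reply is stretched to the end of the query so the range is covered even when no channels exist.

```go
package main

// chanEntry is a hypothetical (block height, short channel ID) pair.
type chanEntry struct {
	Height uint32
	SCID   uint64
}

// reply is a hypothetical reply: an inclusive block range plus its SCIDs.
type reply struct {
	First, Num uint32
	SCIDs      []uint64
}

// splitReplies builds replies for the query [first, first+num-1] from chans
// (sorted by height), with at most maxPerMsg SCIDs per reply.
func splitReplies(first, num uint32, chans []chanEntry, maxPerMsg int) []reply {
	if num == 0 || maxPerMsg <= 0 {
		return nil
	}
	queryLast := uint64(first) + uint64(num) - 1 // inclusive end of the query
	start := uint64(first)                       // first block of the current reply
	var (
		replies   []reply
		batch     []uint64
		batchLast uint64 // height of the last SCID added to batch
	)
	for _, c := range chans {
		h := uint64(c.Height)
		if h < uint64(first) || h > queryLast {
			continue // outside the inclusive query range
		}
		if len(batch) == maxPerMsg {
			replies = append(replies, reply{uint32(start), uint32(batchLast - start + 1), batch})
			// Continue from the same block if it still has channels,
			// otherwise from the block after.
			if h == batchLast {
				start = batchLast
			} else {
				start = batchLast + 1
			}
			batch = nil
		}
		batch = append(batch, c.SCID)
		batchLast = h
	}
	// The final reply always extends to the last block of the query.
	replies = append(replies, reply{uint32(start), uint32(queryLast - start + 1), batch})
	return replies
}
```

Filtering on `h > queryLast` (rather than `first + num`) is what keeps an extra, unrequested block out of the replies.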
@wpaulino force-pushed the interpret-query-channel-range branch from f397ec7 to 9a7f66f Jan 6, 2020

@wpaulino (Collaborator, Author) commented Jan 6, 2020

> Syncs against eclair failed as they don't adhere to the point above, their responses don't cover the complete requested range.

> Shouldn't the logic to support legacy lnd nodes also cover the quirks in eclair's implementation?

The legacy way relies on the same QueryChannelRange message always being sent back in every reply, whereas eclair implements this correctly by replying with a QueryChannelRange that corresponds to the short channel IDs sent. However, their replies don't cover the complete requested range.

@wpaulino requested a review from Roasbeef Jan 6, 2020

@Roasbeef (Member) left a comment

LGTM 🥂

v0.9.0-beta automation moved this from Needs Review to Approved Jan 8, 2020
@Roasbeef merged commit b8f6a55 into lightningnetwork:master Jan 8, 2020
1 of 2 checks passed
v0.9.0-beta automation moved this from Approved to Done Jan 8, 2020
@wpaulino deleted the interpret-query-channel-range branch Jan 8, 2020
Labels: bug fix, gossip, interop, v0.9.0
Projects: v0.9.0-beta (Done)
4 participants