
discovery+rpc: expose graph synced status within GetInfo #3355

Merged
merged 3 commits from is-graph-synced into lightningnetwork:master on Aug 13, 2019

Conversation

wpaulino
Collaborator

@wpaulino wpaulino commented Jul 29, 2019

Fixes #3289.

One thing to note is that the current implementation will lead to GetNetworkInfo blocking if it's called while lnd is waiting for the backend to be considered synced. This is because we don't initialize lnd until that's done first. This can be addressed, but I've decided to leave it out unless we think it's something definitely worth addressing.

@wpaulino wpaulino requested a review from halseth as a code owner July 29, 2019 22:06
@wpaulino wpaulino added this to the 0.8.0 milestone Jul 29, 2019
@Roasbeef Roasbeef added the discovery, enhancement, and v0.8 labels Jul 29, 2019
@Roasbeef
Member

One thing to note is that the current implementation will lead to GetNetworkInfo blocking if it's called while lnd is waiting for the backend to be considered synced.

This'll have an effect on tools such as lndmon, as they won't be able to plot the increase in chans known over time. Still need to dive into the diff, but can't we just return false in the scenario that would cause us to block?

discovery/sync_manager.go (review thread resolved)
rpcserver.go (outdated diff)
@@ -4029,6 +4029,7 @@ func (r *rpcServer) GetNetworkInfo(ctx context.Context,
MaxChannelSize: int64(maxChannelSize),
MedianChannelSizeSat: int64(medianChanSize),
NumZombieChans: numZombies,
SyncedToGraph: r.server.authGossiper.SyncManager().IsGraphSynced(),
Member

If you make the return value channel, then we don't need to block here and can fall through to false in that case.
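A rough sketch of the non-blocking pattern suggested here: a signal channel that is closed once the initial sync completes, read with a `select`/`default` so the RPC never blocks. The `graphSyncStatus` type and `syncedSignal` field are hypothetical names for illustration, not lnd's actual API.

```go
package main

import "fmt"

// graphSyncStatus holds a channel that is closed once the initial
// historical sync completes. Names here are illustrative only.
type graphSyncStatus struct {
	syncedSignal chan struct{}
}

// isGraphSynced performs a non-blocking read: a receive on a closed
// channel succeeds immediately, so we report true once the signal has
// fired; otherwise we fall through to false instead of blocking the
// GetNetworkInfo call.
func (g *graphSyncStatus) isGraphSynced() bool {
	select {
	case <-g.syncedSignal:
		return true
	default:
		return false
	}
}

func main() {
	g := &graphSyncStatus{syncedSignal: make(chan struct{})}
	fmt.Println(g.isGraphSynced()) // false: sync not yet complete
	close(g.syncedSignal)
	fmt.Println(g.isGraphSynced()) // true: signal channel closed
}
```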

Collaborator Author

Went with a different approach instead.

@cfromknecht
Contributor

for some reason i was under the impression we wanted to expose a percentage rather than just a boolean, was that not the case? otherwise the code changes look pretty good

@wpaulino
Collaborator Author

@cfromknecht I went for the simplest approach possible as exposing a percentage seemed a bit more involved, though I'm not opposed to doing that.

Contributor

@cfromknecht cfromknecht left a comment

LGTM 👨‍🚀 We can defer on exposing a percentage-based indicator if it proves useful

@sergioabril

@cfromknecht IMO, a percentage would be much more useful. The boolean is ok, but feels incomplete to me.

@wpaulino
Collaborator Author

wpaulino commented Aug 1, 2019

@sergioabril could you provide more insight for your use case of the percentage? Would it just be to expose the percentage to users in a higher level application? With the boolean, it should quickly flip to true unless it's your first time syncing the graph or you've been offline for a while.

@sergioabril

sergioabril commented Aug 1, 2019 via email

@cfromknecht
Contributor

@wpaulino

With the boolean, it should quickly flip to true unless it's your first time syncing the graph or you've been offline for a while.

If we connect to a peer that doesn't respond, will this continue to show false indefinitely?

@wpaulino
Collaborator Author

wpaulino commented Aug 1, 2019

If we connect to a peer that doesn't respond, will this continue to show false indefinitely?

@cfromknecht wasn't the case previously, but should be fixed in the latest version!

@cfromknecht
Contributor

@cfromknecht wasn't the case previously, but should be fixed in the latest version!

just looked at the latest diff, how is this solved? if we send a query_channel_range and the initial peer we connect to is slow to reply, then is_synced will continue to show false. in the case of the app, that means a peer not responding can prevent you from making payments

@wpaulino
Collaborator Author

wpaulino commented Aug 2, 2019

See fcb6a8b. We’ll pick up on the status of the next historical sync if our initial one hasn’t completed by then. Did you have another solution in mind?

@cfromknecht
Contributor

See fcb6a8b. We’ll pick up on the status of the next historical sync if our initial one hasn’t completed by then.

i don't think we're talking about the same issue. the is_synced field can only become true if a peer responds to our query_channel_range message. if none of our peers respond, then it will always be false. if we gate payments in the app based on this field, then you will never be able to make a payment if you connect to poor/malicious peers.

Did you have another solution in mind?

one option is to add a deadline after which it will flip to true regardless, that way if a peer doesn't respond we're still covered.

in my mind, there's really three states:

  1. UNKNOWN
  2. SYNCING
  3. SYNCED

when we come up, we start at UNKNOWN. after receiving any reply from our historical syncer[s] we then transition to:

  • SYNCING, if they have unknown channels, transitioning to SYNCED once a syncer completes
  • SYNCED, if they don't

my proposal is to add a timeout where if we stay in UNKNOWN for too long, we'll transition to SYNCED anyway and is_synced becomes true.

alternatively, these enums could be exposed over rpc to allow higher-level applications to act on. for example, if an app sees that we're still in UNKNOWN after some timeout, it can make the judgement call as to whether to allow payments or not

@wpaulino
Collaborator Author

wpaulino commented Aug 2, 2019

if we gate payments in the app based on this field, then you will never be able to make a payment if you connect to poor/malicious peers.

Why not just alert users rather than preventing them from making payments entirely? Higher level applications can choose to interpret this field as they wish. They can decide to only gate payments up to some timeout as you suggested, gate payments regardless of how long it takes, alert users, or something else entirely.

when we come up, we start at UNKNOWN. after receiving any reply from our historical syncer[s] we then transition to:

  • SYNCING, if they have unknown channels, transitioning to SYNCED once a syncer completes
  • SYNCED, if they don't

my proposal is to add a timeout where if we stay in UNKNOWN for too long, we'll transition to SYNCED anyway and is_synced becomes true.

Transitioning to true even though we haven't synced the graph seems to defeat the purpose of the flag, which is just to give an indication that we have done our part to sync the graph with at least one peer fully and will continue to do so through the subsequent historical syncs. Just because they replied once doesn't mean we can guarantee they'll continue to do so, so we can still be in the SYNCING state. If they don't reply, do we flip back to UNKNOWN? Do we try another peer after a timeout? The additional complexity doesn't seem worth it.

alternatively, these enums could be exposed over rpc to allow higher-level applications to act on. for example, if an app sees that we're still in UNKNOWN after some timeout, it can make the judgement call as to whether to allow payments or not

They can already do this with how things are at the moment, i.e. synced_to_graph is false after some timeout, though I guess they wouldn't be able to differentiate between being stalled and actually syncing.

continue
}

// Otherwise, we'll track the peer we've performed a
Collaborator

interesting. So this is really an existing bug, but it wouldn't really matter that the initialHistoricalSyncCompleted was never set?

Collaborator Author

Exactly. It would matter as it could prevent us from transitioning to active syncers.

@wpaulino wpaulino requested a review from halseth August 6, 2019 21:05
discovery/sync_manager.go (review thread resolved)
lnrpc/rpc.proto (outdated, resolved)
…l sync

This ensures that the graph synced status is marked true at some point
once a historical sync has completed. Before this commit, a stalled
historical sync could cause us to never mark the graph as synced.
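The rotation behaviour this commit message describes could be sketched as follows. This is a simplified illustration only: `rotateHistoricalSync` and its callback signatures are hypothetical, and lnd's actual SyncManager rotates on a historical-sync ticker rather than a plain loop.

```go
package main

import "fmt"

// rotateHistoricalSync keeps kicking off a historical sync with the
// next candidate peer until one of them completes, so a single
// stalled peer can't keep the graph-synced flag false forever. It
// returns the peers it attempted, in order. Names are illustrative.
func rotateHistoricalSync(completed func() bool, candidates []string,
	start func(peer string)) []string {

	var attempted []string
	for _, peer := range candidates {
		if completed() {
			break
		}
		start(peer)
		attempted = append(attempted, peer)
		// In lnd, the sync manager would wait one historical-sync
		// tick here before checking completion and rotating.
	}
	return attempted
}

func main() {
	done := false
	attempted := rotateHistoricalSync(
		func() bool { return done },
		[]string{"alice", "bob", "carol"},
		func(peer string) {
			// Simulate: alice stalls, bob finishes the sync.
			if peer == "bob" {
				done = true
			}
		},
	)
	fmt.Println(attempted) // [alice bob]
}
```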
@Roasbeef
Member

Roasbeef commented Aug 7, 2019

On the question of if we need to care about a peer stalling: that seems to be more of an issue for our peer selection, and eclipse attack mitigations, no? If a peer isn't responding, then we can move to time them out, and rotate to another peer.

Our initial use case for this is also pretty simple, in just greying things out, or displaying this as an active indicator. Also practically, in the case of the application, they're connected to mostly known "golden" peers as well.

@wpaulino wpaulino changed the title discovery+rpc: expose graph synced status within GetNetworkInfo discovery+rpc: expose graph synced status within GetInfo Aug 7, 2019
@halseth halseth removed their request for review August 7, 2019 09:11
Contributor

@cfromknecht cfromknecht left a comment

LGTM 💰

Member

@Roasbeef Roasbeef left a comment

LGTM 🌋

@Roasbeef Roasbeef merged commit 4e62e8a into lightningnetwork:master Aug 13, 2019
@wpaulino wpaulino deleted the is-graph-synced branch August 13, 2019 01:26
@githorray
Contributor

I get "synced_to_graph": true via the getinfo rpc call, but not via lncli getinfo.

@wpaulino
Collaborator Author

Thanks @githorray, fixed with #3400.

Labels
discovery: Peer and route discovery / whisper protocol related issues/PRs
enhancement: Improvements to existing features / behaviour
Projects
None yet
Development

Successfully merging this pull request may close these issues.

discovery+rpc: expose rough graph sync progress externally
6 participants