
discovery: introduce gossiper syncManager subsystem #2740

Merged: 13 commits merged into lightningnetwork:master from wpaulino:gossip-sync-manager on Apr 3, 2019

Conversation

@wpaulino (Collaborator) commented Mar 7, 2019

Gossip Sync Manager

In this PR, we introduce a new subsystem for the gossiper: the SyncManager. This subsystem is a major overhaul of the way the daemon performs the graph query sync state machine with peers.

Along with this subsystem, we also introduce the concept of an active syncer. An active syncer is simply a GossipSyncer currently operating under an ActiveSync sync type. Before this commit, all GossipSyncers acted as active syncers, which meant we received new graph updates from all of them. This isn't necessary and greatly increases bandwidth usage as the network grows. The SyncManager changes this by capping the number of active syncers: once we reach the specified number, any future peers will have a GossipSyncer with a PassiveSync sync type.
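To make the active-syncer cap concrete, here is a minimal sketch in Go of the decision the SyncManager makes when a new peer's GossipSyncer is registered. The activeSyncers map and NumActiveSyncers field mirror names that appear later in this review, but chooseSyncType itself is a hypothetical helper, not the PR's actual registration code:

```go
// chooseSyncType is a sketch only: it illustrates the cap on active
// syncers described above, not the exact code in this PR.
func (m *SyncManager) chooseSyncType() SyncerType {
	m.Lock()
	defer m.Unlock()

	// Until we reach our target number of active syncers, every new peer
	// both answers our queries and forwards new graph updates to us.
	if len(m.activeSyncers) < m.cfg.NumActiveSyncers {
		return ActiveSync
	}

	// Beyond that cap, additional peers only respond to gossip queries;
	// we don't ask them to forward new graph updates to us.
	return PassiveSync
}
```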

Responsibilities

The SyncManager is responsible for three main things:

  1. Choosing different peers randomly to receive graph updates from to ensure we don't only receive them from the same set of peers.
  2. Choosing different peers to force a historical sync with to ensure we have as much of the public network as possible.
  3. Managing an in-order queue of active syncers, where the next cannot be started until the current one has completed its state machine. This ensures they don't overlap and request the same set of channels, which significantly reduces bandwidth usage and addresses a number of issues.

Gossip Syncer Sync Transitions

In order for the SyncManager to achieve its first two responsibilities, GossipSyncers must be able to transition between sync types, so this PR also introduces that ability. This allows us to be more flexible with our graph query sync logic, e.g., we can now request that a GossipSyncer force a historical sync of the graph with the remote peer to ensure we converge on the same graph with them.

We introduce two different sync types:

  1. ActiveSync: gossip syncers operating under this sync type should exercise their default behavior. This includes converging on the set of missing graph updates with the remote peer and receiving new updates from them.

  2. PassiveSync: gossip syncers operating under this sync type should not attempt to converge on the set of missing graph updates with the remote peer or receive new updates from them, though they should still respond to queries from the remote peer.

It's possible to transition from any of these types to any other. Certain transitions require some additional wire messages to be sent, like in the case of an ActiveSync GossipSyncer transitioning to a PassiveSync type.
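As a rough illustration, the two sync types and a demotion of an active syncer might look like the following sketch. The ActiveSync and PassiveSync values are from this PR; the ProcessSyncTransition method name and the surrounding code are assumptions:

```go
// SyncerType mirrors the two entry states introduced in this PR.
type SyncerType uint32

const (
	// ActiveSync syncers converge on the set of missing graph updates
	// with the remote peer and receive new updates from them.
	ActiveSync SyncerType = iota

	// PassiveSync syncers don't receive new graph updates from the
	// remote peer; they only respond to the peer's gossip queries.
	PassiveSync
)

// demote is a sketch of turning an active syncer passive once it has
// reached its terminal chansSynced state. Per the description above, this
// transition also requires an extra wire message so the remote peer stops
// forwarding new updates to us.
func demote(s *GossipSyncer) error {
	return s.ProcessSyncTransition(PassiveSync)
}
```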

Gossip Syncer Synced Signal

In order for the SyncManager to achieve its third responsibility, we introduce another feature to the GossipSyncer in which it can deliver a signal to an external caller once it reaches its terminal chansSynced state. This is necessary for our new round-robin sync mechanism, as we need to wait for one active syncer to finish syncing before moving on to the next.
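A minimal sketch of how the round-robin could consume that signal inside the SyncManager's handler loop is shown below. The syncedSignal accessor, startNextActiveSyncer helper, and quit channel are assumed names; ActiveSyncerTimeoutTicker is one of the SyncManager defaults quoted later in this review:

```go
// Sketch: wait for the current active syncer to reach its terminal
// chansSynced state before starting the next pending active syncer, but
// don't wait on an unresponsive peer forever.
select {
// The GossipSyncer signals on this channel once it reaches chansSynced;
// syncedSignal() is a hypothetical accessor for that channel.
case <-s.syncedSignal():
	m.startNextActiveSyncer()

// If the peer never finishes syncing, time out and move on to the next
// pending active syncer in the queue.
case <-m.cfg.ActiveSyncerTimeoutTicker.Ticks():
	log.Warnf("Timed out waiting for GossipSyncer to reach chansSynced")
	m.startNextActiveSyncer()

case <-m.quit:
	return
}
```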

CLI Changes

Since the SyncManager now relies on being told how many active syncers it should strive to maintain, a new CLI flag has been added to allow users to manually tune this to their preference: --numgraphsyncpeers. This flag replaces the current --nochanupdates flag and can be used to specify the number of peers the daemon should receive new graph updates from.
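For example, a node operator who wants to receive new graph updates from five peers rather than the default of 3 used elsewhere in this PR could start the daemon with:

    lnd --numgraphsyncpeers=5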

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from 25ecc29 to a51a80a Mar 7, 2019

@wpaulino wpaulino requested review from halseth and Roasbeef Mar 7, 2019

@wpaulino wpaulino added this to the 0.6 milestone Mar 7, 2019

RotateTicker: ticker.NewForce(DefaultSyncerRotationInterval),
FullSyncTicker: ticker.NewForce(DefaultFullSyncInterval),
ActiveSyncerTimeoutTicker: ticker.NewForce(DefaultActiveSyncerTimeout),
NumActiveSyncers: 3,

@cfromknecht (Collaborator) commented Mar 15, 2019
would you mind adding a commit to move node pubkey into Config?

@wpaulino (Author, Collaborator) commented Mar 21, 2019
Fixed.

@cfromknecht (Collaborator) commented Apr 2, 2019
fwiw i think the commit got lost

@wpaulino (Author, Collaborator) commented Apr 2, 2019
Should be this: 0de8463

@cfromknecht (Collaborator) commented Apr 2, 2019
That’s for the gossip syncer, not the gossiper

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch 5 times, most recently from 23aa6f2 to 4397a14 Mar 21, 2019

@Roasbeef (Member) left a comment

Excellent work! I've completed a first pass, but still need to complete my mental model w.r.t. the various state transitions, and whether some of the currently imposed roles match up to reality and my initial mental model of this change. Starting to run this on one of my testnet nodes now with logging cranked up so I can get a feel for how it fares in the wild.

 // horizon to ensure we don't miss any newer
 // items.
-updateHorizon := time.Now().Add(-time.Hour * 1)
+updateHorizon := time.Now()

@Roasbeef (Member) commented Mar 22, 2019
FWIW, when forcing a full sync, we still might miss some past updates since the sync is based off of the channel IDs, and not also the channel updates themselves. ACINQ has a proposal up to modify the query chan range response to also include the timestamps of each of the channel updates for a proper full sync. In the end, for routing nodes, it isn't a big deal to have out-of-date data or to not even attempt to sync the updates at all. For clients, there are bandwidth/error tradeoffs to be made.


// syncTransitionTimeout is the default timeout in which we'll wait up
// to when attempting to perform a sync transition.
syncTransitionTimeout = 5 * time.Second

@Roasbeef (Member) commented Mar 22, 2019
Timeout for them to respond to and for us to finish processing all the past updates? This might be insufficient in the case of a neutrino node that's pre-empty chain view.

@wpaulino (Author, Collaborator) commented Mar 23, 2019
I think your comment is referring to the timeout used when starting the next syncer? That one is 5 minutes.

syncTransitionTimeout is just a timeout to ensure the syncManager doesn't get blocked when attempting to transition a syncer that is not in a chansSynced state.

@halseth (Collaborator) commented Mar 27, 2019
What could cause such blocking?

@wpaulino (Author, Collaborator) commented Mar 27, 2019
If the GossipSyncer transitions away from chansSynced for whatever reason between the time we're filtering for valid candidates and actually transitioning the syncer.
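In other words, the transition request is handed off to the syncer's main goroutine, and the caller only waits up to syncTransitionTimeout for it to be accepted and processed. A sketch of that pattern, with assumed channel and error names rather than the PR's exact code:

```go
// Sketch: request a sync transition and bound how long we wait, so the
// SyncManager can't block indefinitely on a syncer that isn't currently
// in its chansSynced state.
req := &syncTransitionReq{
	newSyncType: newSyncType,
	errChan:     make(chan error, 1),
}

// Hand the request to the syncer's goroutine (syncTransitionReqs is an
// assumed channel name).
select {
case s.syncTransitionReqs <- req:
case <-time.After(syncTransitionTimeout):
	return ErrSyncTransitionTimeout // assumed error value
}

// Wait for the result of processing the transition.
select {
case err := <-req.errChan:
	return err
case <-time.After(syncTransitionTimeout):
	return ErrSyncTransitionTimeout
}
```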

// Our current active syncer has reached its terminal
// chansSynced state, so we'll proceed to starting the next
// pending active syncer if there is one.
case <-startNext:

@Roasbeef (Member) commented Mar 22, 2019
Why do we care if active syncers reach a synced state? They should start in this state more or less, since passive->active is just a matter of setting a new gossip timestamp.

@wpaulino (Author, Collaborator) commented Mar 23, 2019
> Why do we care if active syncers reach a synced state?

This is how the round-robin works to ensure we only have one syncer negotiating at a time.

> They should start in this state more or less, since passive->active is just a matter of setting a new gossip timestamp.

passiveSync -> activeSync also triggers the state machine to start from syncingChans.

@halseth (Collaborator) commented Mar 27, 2019
Couldn't it be a problem if we start a new syncer that (almost) immediately goes to chansSynced, triggering us to start a new one, and the same thing happens again, etc.? Possibly we need an active syncer to stay our chosen one for a while.

@wpaulino (Author, Collaborator) commented Mar 27, 2019
I don't see a problem with that. The round-robin is just to ensure that we do the sync exchange with one active syncer at a time. New graph updates can be received from any active syncer, regardless of their order in the round-robin.

@halseth (Collaborator) commented Mar 28, 2019
Imagine there are no new updates, and all nodes are perfectly in sync. The problem now is that you'll choose an active syncer, but since you are already in sync you'll immediately move to chansSynced and trigger the startNext signal. Now you'll continue rapidly switching active syncers. Or is there something here that prevents that scenario?

@wpaulino (Author, Collaborator) commented Mar 28, 2019
Again, the round-robin is just to ensure we only query one peer at a time for historical updates. Once a syncer reaches chansSynced, we can receive new updates from the peer at any point.

@Roasbeef (Member) commented Mar 22, 2019

We'll also need to test out this new behavior in the wild to ensure that the new operating modes play nicely with CL and eclair.

@Roasbeef (Member) commented Mar 22, 2019

If we expose some more status information from the sync manager, then the server can query for the gossip syncer status of each peer. With this information, we can add some annotations to listpeers in the form of new fields. Alternatively, the rpc server can query and populate this information itself.

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from 4397a14 to d377535 Mar 23, 2019

@wpaulino (Author, Collaborator) commented Mar 23, 2019

@Roasbeef @halseth @cfromknecht PTAL, comments addressed. I've also expanded the listpeers RPC with the current sync type.

@halseth (Collaborator) left a comment

This is starting to look very good! Only a few comments from me (nits really), will start testing this to see the effect of more streamlined gossip 👍


@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from d377535 to 7e2fda2 Mar 25, 2019

@cfromknecht (Collaborator) left a comment

Few more pending comments (forgot to submit over the weekend)

@wpaulino (Author, Collaborator) commented Mar 25, 2019

Comments addressed, PTAL @halseth @cfromknecht.

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from 7e2fda2 to aa8ed14 Mar 25, 2019

@cfromknecht (Collaborator) left a comment

Latest batch of changes looking really nice. Have been running this on my node a bit, here's some bandwidth stats after letting this run for a few hours:

PEER TABLE (46)
bytes_sent | bytes_recvd
---------- | ----------- 
      4000 |        5140 
   3695057 |     1307719 
   4234013 |        6226 
   4232219 |       13636 
   4234427 |        4516 
   4234979 |        2944 
   4233522 |        9313 
   2542379 |      225669 
   4233047 |       10216 
   3523226 |        1277 
   4234542 |        5201 
   4207241 |      116806 
   4233875 |        7504 
   4234703 |        3946 
   4234151 |        5656 
   4232761 |       12360 
   4234289 |        5794 
   4234151 |        6226 
   4234289 |        5086 
   4233599 |        7936 
   4223387 |       52396 
   4233047 |       10216 
   4234979 |        2806 
   4360194 |       11926 
   4234583 |        6796 
   4234289 |        6934 
   4232434 |        2275 
   4234289 |        6364 
   4224628 |        2921 
   4233737 |       10354 
   4226891 |        3376 
      1862 |     1392605 
   4232771 |       11356 
   4218695 |       90724 
   4225272 |        4389 
   4233461 |        8506 
   4235209 |        3036 
   4233461 |        8506 
   4069103 |     1288717 
   4234151 |        5656 
   4233185 |        9646 
   4234013 |        6796 
   4234289 |        5086 
   4227029 |        2806 
   4233599 |       10786 
   4234013 |        6226 
 --------- | -----------
 183233041 |     4734376

LND-LND connections have a very obvious bandwidth footprint; the bytes_sent/bytes_recvd columns are very symmetric and uniform for all stable LND-LND peers. Using bytes_sent as an approximation for bytes_recvd, this PR reduces the amount of data requested from my node by its 46 peers by 97.5%, from 183 MB to 4.73 MB. Once a large portion of the network deploys this more intelligent syncing paradigm, we can probably expect similar reductions in bytes_sent, at least for LND-LND connections.

Some additional small comments are left inline


@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from aa8ed14 to a90df3e Mar 27, 2019

// syncer's turn, the transition will happen. If this transition fails,
// then any new gossip syncers will be used as active syncers until
// reaching NumActiveSyncers.
if newSyncType == ActiveSync {

@halseth (Collaborator) commented Mar 27, 2019
Since this method doesn't actually transition syncers to the active state, consider handling that transition by calling enqueueActiveSyncer directly.

@wpaulino (Author, Collaborator) commented Mar 27, 2019
What's wrong with the current approach? Would rather have one call in which a syncer can perform any transition.

@halseth (Collaborator) commented Mar 28, 2019
What is wrong is that the syncerHandler is sending messages to itself with the current approach, which complicates the logic.

Maybe I'm confused about the distinction between active and pending-active syncers, but could it be enough to just let rotateActiveSyncerCandidates shuffle the pending-active map without doing any transitioning?

@wpaulino (Author, Collaborator) commented Mar 28, 2019
Even if we do that, we'll still need to transition it once the ticker fires, so it'll send itself a message anyway. I agree that the handler sending messages to itself can be somewhat weird, but it seems like the only alternative is to have a separate handler responsible for just the tickers.

// necessary, a sync transition to an ActiveSync sync type will be
// performed to ensure we receive new graph updates from this peer.
startSyncer := func(s *activeSyncer) error {
if s.transition {

@halseth (Collaborator) commented Mar 27, 2019
Is this transition bool really needed? We could rather start all syncers in the passive state, and always transition them to active when using them.

// If we've yet to reach our desired number of active syncers, then
// we'll use this one.
if len(m.activeSyncers) < m.cfg.NumActiveSyncers {
s.syncType = uint32(ActiveSync)

@halseth (Collaborator) commented Mar 27, 2019
As mentioned, I think it is easier to just let this be Passive, then let the call to enqueueActiveSyncer transition it into Active.

Same for HistoricalSync above: let it start Passive, then signal to the syncerHandler to force a historical sync.

@wpaulino (Author, Collaborator) commented Mar 27, 2019
The transition requires the syncer to be in a chansSynced state, so if we go with this approach they'll have to go through the state machine twice (once the first time they're started, another after the transition).

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from a90df3e to 6f20017 Mar 27, 2019

@wpaulino (Author, Collaborator) commented Apr 3, 2019

Comments addressed, PTAL @Roasbeef @halseth.

> The default time of 6hrs for peer rotation is too long IMO, it can be much shorter (like 10-20 mins maybe?) given that after the first real exchange, it's just a spot check, and we want to ensure for light clients that we converge to the known graph as soon as possible

I ran some manual light client tests and noticed historical syncs can take some time to completely finish (10-15 mins), so I settled on 20 mins.

@halseth (Collaborator) left a comment

Alright, this is getting really close now IMO! 😀

@@ -1079,6 +1099,7 @@ func (g *GossipSyncer) handleSyncTransition(req *syncTransitionReq) error {
firstTimestamp = time.Now()
timestampRange = math.MaxUint32
newState = syncingChans
g.genHistoricalChanRangeQuery = false

@halseth (Collaborator) commented Apr 3, 2019
This needs a comment

@wpaulino (Author, Collaborator) commented Apr 3, 2019
Ended up removing the line completely. If it's done a historical sync before, then it shouldn't be a problem to do it again as it'll most likely be a NOP.

// transitions any pending active syncers.
func (m *SyncManager) signalNewActiveSyncer() {
select {
case m.newActiveSyncer <- struct{}{}:

@halseth (Collaborator) commented Apr 3, 2019
If we buffer this channel with 1, we can just select on it and default here, making sure it will never block.

@halseth (Collaborator) commented Apr 3, 2019
However, I'm not opposed to keeping it as is, since it is the same pattern used in signalStaleActiveSyncer.

@wpaulino (Author, Collaborator) commented Apr 3, 2019
I believe not making it block can introduce a slight race condition where the syncer isn't transitioned until a new one comes in.
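For context, the non-blocking variant being discussed would look roughly like this sketch (in the PR as written, the channel is unbuffered and the send blocks):

```go
// Assumes the channel was created with a buffer of one:
//   newActiveSyncer: make(chan struct{}, 1)
//
// The send never blocks: if a signal is already pending, the duplicate is
// simply dropped. That coalescing is what wpaulino notes above could delay
// a transition until the next signal arrives.
select {
case m.newActiveSyncer <- struct{}{}:
default:
}
```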

return
}
m.queueActiveSyncer(candidate)
m.signalNewActiveSyncer()

@halseth (Collaborator) commented Apr 3, 2019
Just let queueActiveSyncer send the signal after adding the syncer to the map?

@wpaulino (Author, Collaborator) commented Apr 3, 2019
There needs to be a distinction since the roundRobinHandler also calls queueActiveSyncer but doesn't need the signal.

// If we've yet to reach our desired number of active syncers, then
// we'll use this one.
case numActiveSyncers < m.cfg.NumActiveSyncers:
m.queueActiveSyncer(s)

@halseth (Collaborator) commented Apr 3, 2019
I still think it would make the concurrency aspect a bit easier to reason about. This is not totally bulletproof, as the real numActiveSyncers might have changed at this point if InitSyncState is called concurrently multiple times.

// We'll start by stopping the GossipSyncer for the disconnected peer.
s.Stop()

// If it's a non-active syncer, then we can just exit now.

@halseth (Collaborator) commented Apr 3, 2019
Why not just call signalStaleActiveSyncer and let the syncerHandler handle the remaining logic?

@wpaulino (Author, Collaborator) commented Apr 3, 2019
Fixed.

// Register this peer's gossip syncer with the gossiper.
// This blocks synchronously to ensure the gossip syncer is
// registered with the gossiper before attempting to read
// messages from the remote peer.
p.server.authGossiper.InitSyncState(p, recvUpdates)
//
// TODO(wilmer): Only sync updates from non-channel peers. This

@halseth (Collaborator) commented Apr 3, 2019
So what's the rationale for this TODO?


@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch 3 times, most recently from 1c80950 to 0e4a5b7 Apr 3, 2019

// move on to the next and avoid retrying as its already been
// transitioned.
case <-m.cfg.ActiveSyncerTimeoutTicker.Ticks():
log.Debugf("Timed out waiting for GossipSyncer(%x) "+

@halseth (Collaborator) commented Apr 3, 2019
Warn?

@wpaulino (Author, Collaborator) commented Apr 3, 2019
Fixed.

m.signalNewActiveSyncer()
return
}
m.Unlock()

@halseth (Collaborator) commented Apr 3, 2019
can also just defer the unlock after locking

@wpaulino (Author, Collaborator) commented Apr 3, 2019
We don't want to hold it while calling signalNewActiveSyncer, since that can potentially block for a bit if we're also instantiating many other syncers.
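That is, the pattern being described is roughly this sketch:

```go
m.Lock()
// ... update the syncer maps while holding the mutex ...
m.Unlock()

// Signal outside the critical section: the send blocks until the handler
// receives it, and we don't want other callers (e.g. many peers connecting
// at once) contending on the mutex while we wait.
m.signalNewActiveSyncer()
```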

halseth approved these changes Apr 3, 2019

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from 0e4a5b7 to d194300 Apr 3, 2019

@cfromknecht (Collaborator) commented Apr 3, 2019

Needs rebase now that scb rest has been merged

wpaulino added some commits Mar 23, 2019

discovery: replace GossipSyncer syncChanUpdates flag with SyncerType
In this commit, we introduce a new type: SyncerType. This type denotes
the type of sync a GossipSyncer is currently under. We only introduce
the two possible entry states, ActiveSync and PassiveSync. An ActiveSync
GossipSyncer will exchange channels with the remote peer and receive new
graph updates from them, while a PassiveSync GossipSyncer will not and
will only respond to the remote peer's queries.

This commit does not modify the behavior and is only meant to be a
refactor.
discovery: set GossipSyncer update horizon to current time
With the introduction of the gossip sync manager in a later commit,
retrieving the backlog of updates within the last hour is no longer
necessary as we'll be forcing full syncs periodically.
discovery: introduce GossipSyncer sync transitions
In this commit, we introduce the ability for GossipSyncer's to
transition their sync type. This allows us to be more flexible with our
gossip syncers, as we can now prevent them from receiving new graph
updates at any time. It's now possible to transition between the
different sync types, as long as the GossipSyncer has reached its
terminal chansSynced sync state. Certain transitions require some
additional wire messages to be sent, like in the case of an ActiveSync
GossipSyncer transitioning to a PassiveSync type.
discovery: allow gossip syncer to perform historical syncs
In this commit, we introduce the ability for gossip syncers to perform
historical syncs. This allows us to reconcile any channels we're missing
that the remote peer has starting from the genesis block of the chain.
This commit serves as a prerequisite to the SyncManager, introduced in a
later commit, where we'll be able to make spot checks by performing
historical syncs with peers to ensure we have as much of the graph as
possible.
discovery: introduce GossipSyncer signal delivery of chansSynced state
In this commit, we introduce another feature to the GossipSyncer in
which it can deliver a signal to an external caller once it reaches its
terminal chansSynced state. This is yet to be used, but will prove
useful with a round-robin sync mechanism, where we wait to finish
syncing with a specific peer before moving on to the next.
discovery: introduce gossiper SyncManager subsystem
In this commit, we introduce a new subsystem for the gossiper: the
SyncManager. This subsystem is a major overhaul on the way the daemon
performs the graph query sync state machine with peers.

Along with this subsystem, we also introduce the concept of an active
syncer. An active syncer is simply a GossipSyncer currently operating
under an ActiveSync sync type. Before this commit, all GossipSyncer's
would act as active syncers, which means that we were receiving new
graph updates from all of them. This isn't necessary, as it greatly
increases bandwidth usage as the network grows. The SyncManager changes
this by requiring a specific number of active syncers. Once we reach
this specified number, any future peers will have a GossipSyncer with a
PassiveSync sync type.

It is responsible for three main things:

1. Choosing different peers randomly to receive graph updates from to
ensure we don't only receive them from the same set of peers.

2. Choosing different peers to force a historical sync with to ensure we
have as much of the public network as possible. The first syncer
registered with the manager will also attempt a historical sync.

3. Managing an in-order queue of active syncers where the next cannot be
started until the current one has completed its state machine to ensure
they don't overlap and request the same set of channels, which
significantly reduces bandwidth usage and addresses a number of issues.

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from d194300 to 4e246ff Apr 3, 2019

wpaulino added some commits Mar 23, 2019

config+peer: replace NoChanUpdates flag with NumGraphSyncPeers
In this commit, we replace the NoChanUpdates flag with a flag that
allows us to specify the number of peers we want to actively receive new
graph updates from. This will be required when integrating the new
gossiper SyncManager subsystem with the rest of lnd.
discovery: make timestamp range check inclusive within FilterGossipMsgs
As required by the spec:

> SHOULD send all gossip messages whose timestamp is greater or equal to
> first_timestamp, and less than first_timestamp plus timestamp_range.
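A sketch of the inclusive lower bound this implies within FilterGossipMsgs (variable names assumed):

```go
// A gossip message now passes the remote peer's filter when:
//   first_timestamp <= timestamp < first_timestamp + timestamp_range
// i.e. the lower bound is inclusive, matching the spec excerpt above.
passesFilter := msgTimestamp >= firstTimestamp &&
	msgTimestamp < firstTimestamp+timestampRange
```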

@wpaulino wpaulino force-pushed the wpaulino:gossip-sync-manager branch from 4e246ff to ca01695 Apr 3, 2019

@Roasbeef Roasbeef merged commit 30f2b1c into lightningnetwork:master Apr 3, 2019


@wpaulino wpaulino deleted the wpaulino:gossip-sync-manager branch Apr 3, 2019
