server: stagger initial reconnects
This commit adds optional jitter to our initial reconnection to our
persistent peers. Currently we will attempt reconnections to all peers
simultaneously, which results in a large amount of contention as the
number of channels a node has grows.

We resolve this by adding a randomized delay between 0 and 30 seconds
for all persistent peers. This spreads out the load and contention to
resources such as the database, read/write pools, and memory
allocations. On my node, this allows startup with about 80% of the
memory burst seen with the all-at-once approach.

This also has a second-order effect in better distributing messages sent
at constant intervals, such as pings. This reduces the concurrent jobs
submitted to the read and write pools at any given time, resulting in
better reuse of read/write buffers and fewer bursty allocation and
garbage collection cycles.
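
To make the second-order effect concrete, here is a rough simulation sketch. The peer count and the 60-second message interval are assumptions picked for illustration, not values taken from lnd, and `peakPerSecond` is a hypothetical helper. It counts how many peers would send a fixed-interval message in the same one-second slot, with and without staggered start offsets.

```go
package main

import (
	"fmt"
	"math/rand"
)

// peakPerSecond returns the largest number of peers whose periodic message
// lands in the same one-second slot of the interval, given each peer's start
// offset in seconds.
func peakPerSecond(offsets []int, intervalSecs int) int {
	slots := make([]int, intervalSecs)
	peak := 0
	for _, off := range offsets {
		slot := off % intervalSecs
		slots[slot]++
		if slots[slot] > peak {
			peak = slots[slot]
		}
	}
	return peak
}

func main() {
	const (
		numPeers     = 300 // assumed peer count, for illustration only
		intervalSecs = 60  // assumed message interval, for illustration only
		maxDelaySecs = 30  // matches maxInitReconnectDelay in this commit
	)

	rng := rand.New(rand.NewSource(1))

	// All offsets are zero: every peer connects at the same moment.
	allAtOnce := make([]int, numPeers)

	// Each peer starts at a random offset in [0, maxDelaySecs).
	staggered := make([]int, numPeers)
	for i := range staggered {
		staggered[i] = rng.Intn(maxDelaySecs)
	}

	fmt.Println("peak per second, all at once:", peakPerSecond(allAtOnce, intervalSecs))
	fmt.Println("peak per second, staggered:  ", peakPerSecond(staggered, intervalSecs))
}
```

With all offsets at zero the peak equals the number of peers. With staggered offsets the same messages are spread across up to 30 slots, so the per-slot peak is typically much smaller.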
cfromknecht committed Apr 5, 2019
1 parent 4de7d0c commit e0c4b24
Showing 2 changed files with 46 additions and 1 deletion.
2 changes: 2 additions & 0 deletions config.go
@@ -252,6 +252,8 @@ type config struct {

RejectPush bool `long:"rejectpush" description:"If true, lnd will not accept channel opening requests with non-zero push amounts. This should prevent accidental pushes to merchant nodes."`

StaggerInitialReconnect bool `long:"stagger-initial-reconnect" description:"If true, will apply a randomized staggering between 0s and 30s when reconnecting to persistent peers on startup. The first 10 reconnections will be attempted instantly, regardless of the flag's value"`

net tor.Net

Routing *routing.Conf `group:"routing" namespace:"routing"`
45 changes: 44 additions & 1 deletion server.go
@@ -8,6 +8,7 @@ import (
"fmt"
"image/color"
"math/big"
prand "math/rand"
"net"
"path/filepath"
"regexp"
@@ -60,6 +61,18 @@ const (
// durations exceeding this value will be eligible to have their
// backoffs reduced.
defaultStableConnDuration = 10 * time.Minute

// numInstantInitReconnect specifies how many persistent peers we should
// always attempt outbound connections to immediately. After this value
// is surpassed, the remaining peers will be randomly delayed using
// maxInitReconnectDelay.
numIntsantInitReconnect = 10

// maxInitReconnectDelay specifies the maximum delay in seconds we will
// apply in attempting to reconnect to persistent peers on startup. The
// value used for a particular peer will be chosen between 0s and this
// value.
maxInitReconnectDelay = 30
)

var (
@@ -1931,6 +1944,7 @@ func (s *server) establishPersistentConnections() error {

// Iterate through the combined list of addresses from prior links and
// node announcements and attempt to reconnect to each node.
var numOutboundConns int
for pubStr, nodeAddr := range nodeAddrsMap {
// Add this peer to the set of peers we should maintain a
// persistent connection with.
@@ -1961,13 +1975,42 @@ func (s *server) establishPersistentConnections() error {
s.persistentConnReqs[pubStr] = append(
s.persistentConnReqs[pubStr], connReq)

- go s.connMgr.Connect(connReq)
// We'll connect to the first 10 peers immediately, then
// randomly stagger any remaining connections if the
// stagger initial reconnect flag is set. This ensures
// that mobile nodes or nodes with a small number of
// channels obtain connectivity quickly, but larger
// nodes are able to disperse the costs of connecting to
// all peers at once.
if numOutboundConns < numIntsantInitReconnect ||
!cfg.StaggerInitialReconnect {

go s.connMgr.Connect(connReq)
} else {
go s.delayInitialReconnect(connReq)
}
}

numOutboundConns++
}

return nil
}

// delayInitialReconnect will attempt a reconnection using the passed connreq
// after sampling a value for the delay between 0s and the
// maxInitReconnectDelay.
//
// NOTE: This method MUST be run as a goroutine.
func (s *server) delayInitialReconnect(connReq *connmgr.ConnReq) {
delay := time.Duration(prand.Intn(maxInitReconnectDelay)) * time.Second
select {
case <-time.After(delay):
s.connMgr.Connect(connReq)
case <-s.quit:
}
}

// prunePersistentPeerConnection removes all internal state related to
// persistent connections to a peer within the server. This is used to avoid
// persistent connection retries to peers we do not have any open channels with.