server: improve initial peer bootstrapping #1205
Conversation
Really like how small the diff for this PR turned out :) A much needed optimization!
server.go (Outdated)

    // new connection via our public endpoint, which will require the lock
    // and add the peer to the server's internal state.
    if done != nil {
        done <- struct{}{}
you can close the channel instead, which will never block :)
Fixed.
server.go (Outdated)

    if err != nil {
        return err
    }

    // ignore is a set used to keep track of peers already retrieved
    // from our bootstrappers in order to avoid duplicates.
    ignore := make(map[autopilot.NodeID]struct{})
this can be initialized inside initialPeerBootstrap?
Yep, forgot to move this. Fixed.
server.go (Outdated)

    // quickly by discarding peers that are slowing
    // us down.
    select {
    case <-done:
If the connection fails, then this won't be sent on, and we will wait the 3 seconds before trying the next peers. Should we instead make this channel a chan error, where we can send an error in case it fails, and close it in case it succeeds?
Good idea, fixed!
Force-pushed 6b23ef7 to 8cffbf5
LGTM 👌
Tested locally and confirmed that we'll aggressively converge on our initial set of peers as advertised.
Needs a rebase to fix conflicts as well; also, we can manually implement the shuffle using …
Force-pushed c216ba5 to 8826b8d
Rebased.
LGTM, but could be useful to add some more logs to the logic I think, for easier debugging of potential connection issues.
server.go (Outdated)

    // quickly by discarding peers that are slowing
    // us down.
    select {
    case <-errChan:
log error?
Fixed.
server.go (Outdated)

    case <-errChan:
    // TODO: tune timeout? 3 seconds might be *too*
    // aggressive but works well.
    case <-time.After(3 * time.Second):
handle the timeout somehow? At least log.
The timeout doesn't need to be handled since we'll just continue to the next peer. Added a trace log though.
server.go (Outdated)

    errChan := make(chan error, 1)
    s.connectToPeer(a, errChan)
    select {
    case <-errChan:
log error?
Fixed.
    select {
    case err := <-errChan:
        return err
    case <-s.quit:
can also return ErrServerShuttingDown in this case
Fixed.
In this commit, we address an existing issue with regards to the initial peer bootstrapping stage. At times, the bootstrappers can be unreliable by providing addresses for peers that no longer exist or are currently offline. This would lead to nodes quickly entering the exponential backoff method used to maintain a minimum target of peers without first achieving said target.

We address this by separating the peer bootstrapper into two stages: the initial peer bootstrapping, and maintaining a target set of nodes to keep an up-to-date view of the network. The initial peer bootstrapping stage has been made aggressive in order to provide such a view of the network as quickly as possible. Once done, we continue with the existing exponential backoff method responsible for maintaining a target set of nodes.

In this commit, we also randomize the order of the different bootstrappers in order to prevent always querying potentially unreliable bootstrappers first.
Force-pushed 8826b8d to 800b136