[server]: Start Peers Asynchronously #1658

Merged
merged 4 commits into from Aug 16, 2018

Conversation

@cfromknecht
Collaborator

cfromknecht commented Jul 31, 2018

This PR adds asynchronous starting of peers
in order to avoid potential DoS vectors. Currently,
we hold the server's mutex while peers exchange
Init messages and perform other setup. Thus, a remote
peer that does not reply with an Init message will
cause the server to block for 15s per attempt.

We also modify the startup behavior to spawn
peerTerminationWatchers before starting the
peer itself, ensuring that a peer is properly
cleaned up if the initialization fails. Currently,
failing to start a peer does not execute the bulk
of the teardown logic, since it is not spawned
until after a successful Start occurs.
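
A minimal sketch of the startup order described above, not the code from this PR: the watcher is spawned first, then Start runs in its own goroutine outside the server's mutex, and a failed Start calls Disconnect so the watcher still performs cleanup. The `peer` and `server` types here are stripped-down stand-ins, and `addPeer`, `newPeer`, `numPeers`, and `errNoInit` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNoInit models a remote peer that never replies with an Init message.
var errNoInit = errors.New("no init message received")

// peer is a hypothetical, stripped-down stand-in for the real peer type.
type peer struct {
	id       string
	startErr error // non-nil simulates a failed Start
	quit     chan struct{}
	quitOnce sync.Once
	wg       sync.WaitGroup
}

func newPeer(id string, startErr error) *peer {
	return &peer{id: id, startErr: startErr, quit: make(chan struct{})}
}

// Start stands in for the Init exchange. On success it spawns one goroutine
// that runs until the peer is disconnected.
func (p *peer) Start() error {
	if p.startErr != nil {
		return p.startErr
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		<-p.quit
	}()
	return nil
}

func (p *peer) Disconnect() { p.quitOnce.Do(func() { close(p.quit) }) }

func (p *peer) WaitForDisconnect(ready chan struct{}) {
	select {
	case <-ready:
	case <-p.quit:
	}
	p.wg.Wait()
}

type server struct {
	mu    sync.Mutex
	peers map[string]*peer
	wg    sync.WaitGroup
}

// addPeer registers the peer under the mutex, then launches the watcher
// and the (possibly slow) Start outside the lock. The ready channel is
// returned only so this example can observe a successful startup.
func (s *server) addPeer(p *peer) <-chan struct{} {
	s.mu.Lock()
	s.peers[p.id] = p
	s.mu.Unlock()

	ready := make(chan struct{})
	s.wg.Add(2)
	go s.peerTerminationWatcher(p, ready)
	go func() {
		defer s.wg.Done()
		if err := p.Start(); err != nil {
			p.Disconnect() // unblocks the watcher so cleanup still runs
			return
		}
		close(ready)
	}()
	return ready
}

func (s *server) peerTerminationWatcher(p *peer, ready chan struct{}) {
	defer s.wg.Done()
	p.WaitForDisconnect(ready)
	s.mu.Lock()
	delete(s.peers, p.id)
	s.mu.Unlock()
}

func (s *server) numPeers() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.peers)
}

func main() {
	s := &server{peers: make(map[string]*peer)}
	s.addPeer(newPeer("bad", errNoInit))
	s.wg.Wait()
	fmt.Println("peers after failed start:", s.numPeers())
}
```

In this sketch the mutex is held only for the map updates, so a peer that stalls during Start delays its own goroutine rather than the whole server.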

The final commit is purely a code move to place
the relevant methods in closer proximity, and
organize them roughly in the expected execution
order.

Prerequisites:

@Roasbeef

Member

Roasbeef commented Jul 31, 2018

Running this on the faucet now!

@cfromknecht cfromknecht added this to the 0.5 milestone Jul 31, 2018

@cfromknecht cfromknecht added the P2 label Jul 31, 2018

@cfromknecht cfromknecht force-pushed the cfromknecht:async-peer-start branch from c2b8d35 to 921112f Aug 1, 2018

@cfromknecht

Collaborator

cfromknecht commented Aug 1, 2018

Added a commit to bump the peer write timeout to 50s, and rebased on master

server.go Outdated
@@ -1719,10 +1719,10 @@ func (s *server) findPeerByPubStr(pubStr string) (*peer, error) {
// the cleanup routine to exit early.
//
// NOTE: This MUST be launched as a goroutine.
-func (s *server) peerTerminationWatcher(p *peer) {
+func (s *server) peerTerminationWatcher(p *peer, ready chan struct{}) {

@halseth

halseth Aug 1, 2018

Collaborator

ready should be explained in the godoc

@cfromknecht

cfromknecht Aug 2, 2018

Collaborator

fixed!

// Otherwise, signal to the peerTerminationWatcher that the peer startup
// was successful, and to begin watching the peer's wait group.
close(ready)

@halseth

halseth Aug 1, 2018

Collaborator

What's the reason for doing this here, and not as the last thing in this method?

@cfromknecht

cfromknecht Aug 1, 2018

Collaborator

It could be done later, as at this point even a Disconnect would unblock the select in WaitForDisconnect. I chose to put it here for clarity, and to stop selecting as soon as possible to avoid futex inflation

@cfromknecht cfromknecht force-pushed the cfromknecht:async-peer-start branch 3 times, most recently from 90f5fb2 to b90365f Aug 2, 2018

@cfromknecht cfromknecht dismissed stale reviews from wpaulino and halseth via 2965f67 Aug 3, 2018

@cfromknecht cfromknecht force-pushed the cfromknecht:async-peer-start branch 2 times, most recently from 2965f67 to 329a69c Aug 3, 2018

@cfromknecht

Collaborator

cfromknecht commented Aug 3, 2018

Been thinking about the ramifications of this PR, and I'd prefer to have #1551 (and transitively, #1668) before this lands in master. Marking this as blocked in the meantime

@cfromknecht cfromknecht added the blocked label Aug 3, 2018

@Roasbeef

Member

Roasbeef commented Aug 14, 2018

Can be rebased now that the two dependent PRs have been merged.

cfromknecht added some commits Jul 31, 2018

peer: add ready chan arg to WaitForDisconnect
This commit adds additional synchronization logic to
WaitForDisconnect, such that it can be spawned before
Start has been executed by the server. Without
modification, the current version will return
immediately since no goroutines will have been
spawned.

To solve this, we modify WaitForDisconnect to block until either:
 1) the peer is disconnected, or
 2) the peer is successfully started,
before watching the waitgroup.

In the first case, waiting on the waitgroup will block until all
(if any) spawned goroutines have exited. Otherwise, if
Start is successful, we can switch to watching the
waitgroup, knowing that the waitgroup counter is positive.
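
The two-phase wait described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the `peer` struct carries only a `quit` channel and a wait group, and the worker goroutine in `main` is a hypothetical stand-in for whatever Start spawns.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peer is a stripped-down stand-in; only the fields needed to show the
// synchronization are included.
type peer struct {
	wg   sync.WaitGroup
	quit chan struct{}
}

// WaitForDisconnect first blocks until either ready or quit is closed, and
// only then waits on the wait group. This lets it be spawned before Start
// without returning immediately on a zero counter.
func (p *peer) WaitForDisconnect(ready chan struct{}) {
	select {
	case <-ready:
	case <-p.quit:
	}
	p.wg.Wait()
}

func main() {
	p := &peer{quit: make(chan struct{})}
	ready := make(chan struct{})
	done := make(chan struct{})

	// The watcher is spawned before the peer is started.
	go func() {
		p.WaitForDisconnect(ready)
		close(done)
	}()

	// Simulate a successful Start: spawn a worker, then signal ready.
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		<-p.quit
	}()
	close(ready)

	select {
	case <-done:
		fmt.Println("watcher returned early")
	case <-time.After(50 * time.Millisecond):
		fmt.Println("watcher is waiting on the wait group")
	}

	close(p.quit) // simulate Disconnect
	<-done
	fmt.Println("watcher finished")
}
```

Because the wait group counter is incremented before `ready` is closed, the watcher never observes a zero counter after a successful start.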

server: add async peer Start() + safer cleanup
This commit adds asynchronous starting of peers
in order to avoid potential DoS vectors. Currently,
we hold the server's mutex while peers exchange
Init messages and perform other setup. Thus, a remote
peer that does not reply with an Init message will
cause the server to block for 15s per attempt.

We also modify the startup behavior to spawn
peerTerminationWatchers before starting the
peer itself, ensuring that a peer is properly
cleaned up if the initialization fails. Currently,
failing to start a peer does not execute the bulk
of the teardown logic, since it is not spawned
until after a successful Start occurs.

cfromknecht added some commits Jul 31, 2018

peer: increase peer write timeout to 50 seconds
Sometimes when performing an initial sync, the remote
node isn't able to pull messages off the wire because
of long-running tasks, and queues become saturated. With
a shorter write timeout, we will give up trying to send
messages and tear down the connection, even though the
peer is still active.
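
As a rough illustration of how a write timeout interacts with a remote that has stopped reading, the sketch below arms a deadline with the standard `net.Conn.SetWriteDeadline` call before each write. The constant name `writeMessageTimeout` and the helper functions are assumptions made for this example, and the demo uses a short timeout over an in-memory `net.Pipe` rather than the 50 second value.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// writeMessageTimeout reflects the 50 second value from the commit title.
// The constant name is an assumption made for this sketch.
const writeMessageTimeout = 50 * time.Second

// writeWithTimeout arms a write deadline and then attempts the write. If the
// remote side does not drain the data in time, Write returns a timeout error.
func writeWithTimeout(conn net.Conn, msg []byte, timeout time.Duration) error {
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	_, err := conn.Write(msg)
	return err
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// stalledWriteTimesOut writes to an in-memory connection whose other end
// never reads, using a short timeout so the demo finishes quickly.
func stalledWriteTimesOut() bool {
	local, remote := net.Pipe()
	defer local.Close()
	defer remote.Close()
	return isTimeout(writeWithTimeout(local, []byte("ping"), 20*time.Millisecond))
}

func main() {
	fmt.Println("configured timeout:", writeMessageTimeout)
	fmt.Println("stalled write timed out:", stalledWriteTimesOut())
}
```

A longer timeout trades slower detection of dead connections for fewer teardowns of peers that are merely busy, which is the trade-off this commit message argues for.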

@cfromknecht cfromknecht force-pushed the cfromknecht:async-peer-start branch from 329a69c to d4d9097 Aug 14, 2018

@cfromknecht cfromknecht removed the blocked label Aug 14, 2018

@cfromknecht

Collaborator

cfromknecht commented Aug 14, 2018

rebased and 💚

@Roasbeef

LGTM 🚜

// call to Start returns no error. Otherwise, if the peer fails to start,
// calling Disconnect will signal the quit channel and the method will not
// block, since no goroutines were spawned.
func (p *peer) WaitForDisconnect(ready chan struct{}) {

@Roasbeef

@Roasbeef Roasbeef merged commit 15eeded into lightningnetwork:master Aug 16, 2018

1 of 2 checks passed

coverage/coveralls Coverage decreased (-0.06%) to 54.642%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details