Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[htlcswitch]: revert replace link, ensure removed links are stopped #1551

Merged

Conversation

cfromknecht
Copy link
Contributor

@cfromknecht cfromknecht commented Jul 13, 2018

This PR reverts the switch's current AddLink behavior, such that it rejects duplicate links with the same channel id. At the moment, we will hot swap in another link without fully shutting down the first, which is dangerous as the two can share a mailbox.

In a related commit, we changed RemoveLink such that it called go link.Stop(), and processed in the background to avoid deadlocking on the forwarding index mutex. The result of this is that it is possible for a prior Stop() to still be executing by the time a new link is added with the same channel id.

In order to recreate the behavior of replacing the old link with a new one, RemoveLink has been altered to return the removed link if it was found, and to not call/spawn link.Stop() at all. Instead, the caller is now responsible for stopping the link, which gets around the deadlocking w/in the switch.

Finally, though AddLink rejects duplicate links, we've modified the peer to first try and remove any links going by the same channel id. If one is found, we ensure that link.Stop() finishes before moving on to adding the link to the switch.

@cfromknecht cfromknecht changed the title [htlcswitch]: revert replace link, ensure removed links are stoped [htlcswitch]: revert replace link, ensure removed links are stopped Jul 13, 2018
@cfromknecht cfromknecht added needs review PR needs review by regular contributors htlcswitch safety General label for issues/PRs related to the safety of using the software labels Jul 13, 2018
@Roasbeef Roasbeef added P2 should be fixed if one has time bug fix labels Jul 13, 2018
@Roasbeef Roasbeef requested a review from wpaulino July 17, 2018 06:25
@Roasbeef Roasbeef added this to the 0.5 milestone Jul 17, 2018
Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK!

server.go Outdated
@@ -632,7 +632,11 @@ func newServer(listenAddrs []net.Addr, chanDB *channeldb.DB, cc *chainControl,
ChainIO: cc.chainIO,
MarkLinkInactive: func(chanPoint wire.OutPoint) error {
chanID := lnwire.NewChanIDFromOutPoint(&chanPoint)
return s.htlcSwitch.RemoveLink(chanID)
link, err := s.htlcSwitch.RemoveLink(chanID)
if err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: return early in case of error

peer.go Outdated
// down to ensure that the mailboxes are only ever under the control of
// one link.
oldLink, err := p.server.htlcSwitch.RemoveLink(link.ChanID())
if err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log error

peer.go Outdated
quit: p.quit,
channel: channel,
unregisterChannel: func(chanID lnwire.ChannelID) error {
_, err := p.server.htlcSwitch.RemoveLink(chanID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link should not be stopped here?

peer.go Outdated
return nil

case err == htlcswitch.ErrChannelLinkNotFound:
peerLog.Warnf("unable remove channel link with "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"unable to"

@halseth halseth added the needs testing PR hasn't yet been actively tested on testnet/mainnet label Jul 17, 2018
@@ -1851,7 +1853,7 @@ func (s *Switch) getLinkByShortID(chanID lnwire.ShortChannelID) (ChannelLink, er

// RemoveLink is used to initiate the handling of the remove link command. The
// request will be propagated/handled to/in the main goroutine.
func (s *Switch) RemoveLink(chanID lnwire.ChannelID) error {
func (s *Switch) RemoveLink(chanID lnwire.ChannelID) (ChannelLink, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godoc should be updated to reflect that it is now the caller's responsibility to stop the link.

@@ -431,6 +416,24 @@ func (l *channelLink) Start() error {
return fmt.Errorf("unable to trim circuits above "+
"local htlc index %d: %v", localHtlcIndex, err)
}

// Sine the link is live, before we start the link we'll update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/Sine/Since/

@wpaulino wpaulino added first pass review done PR has had first pass of review, needs more tho and removed needs review PR needs review by regular contributors labels Jul 18, 2018
@cfromknecht cfromknecht force-pushed the switch-revert-replace-link branch 4 times, most recently from 7836565 to 26732c4 Compare July 24, 2018 21:04
// Now that all pending and live links have been removed from
// the forwarding indexes, stop each one before shutting down.
for _, link := range linksToStop {
link.Stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The danger here is that a link could be sending a message to the switch for forwarding. So this could result in a deadlock. This is why we typically stopped the link in a goroutine. We'll need to do a sweep to ensure that any time the link is trying to forward, we're able to reliably cause it to exit via a signal from the switch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, we now thread through the link's quit channel into ForwardPackets so that calling Stop breaks and blocking forwarding calls


return s.removeLink(chanID)
if link != nil {
link.Stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here.

if err := s.RemoveLink(chanID1); err != nil {
t.Fatalf("unable to remove alice link: %v", err)
}
s.RemoveLink(chanID1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we no longer checking the return value here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics are now such that it can't fail, it will block until all state related to a channel is removed or doesn't exist. The new function signature on RemoveLink doesn't return anything.

srvrLog.Errorf("unable to remove channel link: %v",
err)
}
p.server.htlcSwitch.RemoveLink(link.ChanID())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message looks to not actually match what's in the commit itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but still looks diff, it says:

server: use blocking RemoveLink to shutdown links

Yet it just now ignores the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveLink doesn't return an error. It just blocks until everything is purged

@@ -556,6 +553,12 @@ func (p *peer) addLink(chanPoint *wire.OutPoint,

link := htlcswitch.NewChannelLink(linkCfg, lnChan)

// Before adding our new link, purge the switch of any pending or live
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I think we might also need this in the execution path from funding mgr -> channel manager in the peer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't both pathways call addLink?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, scratch that.

@@ -431,6 +416,24 @@ func (l *channelLink) Start() error {
return fmt.Errorf("unable to trim circuits above "+
"local htlc index %d: %v", localHtlcIndex, err)
}

// Since the link is live, before we start the link we'll update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on making this synchronous instead of dispatching? As is, it's not guaranteed to register before making a state update

@@ -3725,7 +3725,6 @@ func (h *persistentLinkHarness) restart(restartSwitch bool,

// First, remove the link from the switch.
h.coreLink.cfg.Switch.RemoveLink(h.link.ChanID())
h.coreLink.WaitForShutdown()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rational behind removing this?

Copy link
Contributor Author

@cfromknecht cfromknecht Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new RemoveLink inherently blocks until shutdown

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its godoc should be updated to note this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

@cfromknecht cfromknecht force-pushed the switch-revert-replace-link branch 4 times, most recently from ca941d8 to 0807d61 Compare July 30, 2018 21:35
@cfromknecht
Copy link
Contributor Author

Added a test to ensure that it is safe to call Stop(), even when forwarding incoming Adds (which must be done synchronously)

@cfromknecht cfromknecht force-pushed the switch-revert-replace-link branch 5 times, most recently from f78a8c3 to c8d624e Compare August 1, 2018 00:41
@cfromknecht cfromknecht force-pushed the switch-revert-replace-link branch 9 times, most recently from f4bfa51 to 9d26e3c Compare August 9, 2018 09:33
@Roasbeef
Copy link
Member

Can be rebased now that #1668 is in!

cfromknecht and others added 11 commits August 10, 2018 11:42
The new RemoveLink method blocks until the link has
been fully stopped, so we no longer need to wait
for it explicitly.
This commit adds a test that verifies Stop does not block
if the link is concurrently forwarding incoming Adds to
the switch. This test fails prior to the commits that
thread through the link's quit channel.
In this commit, we thread through a link's quit channel into
routeAsync, the primary helper method allowing links to send
htlcPackets through the switch. This is intended to remove
deadlocks from happening, where the link is synchronously
blocking on forwarding packets to the switch, but also
needs to shutdown.
@cfromknecht
Copy link
Contributor Author

rebased, PTAL

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🍕

Will run this on the faucet for a bit to attempt to flake out any regressions before merging into master.

srvrLog.Errorf("unable to remove channel link: %v",
err)
}
p.server.htlcSwitch.RemoveLink(link.ChanID())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but still looks diff, it says:

server: use blocking RemoveLink to shutdown links

Yet it just now ignores the error.

@@ -556,6 +553,12 @@ func (p *peer) addLink(chanPoint *wire.OutPoint,

link := htlcswitch.NewChannelLink(linkCfg, lnChan)

// Before adding our new link, purge the switch of any pending or live
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, scratch that.

// switch, any failed packets will be returned to the provided
// ChannelLink. The link's quit signal should be provided to allow
// cancellation of forwarding during link shutdown.
ForwardPackets func(chan struct{}, ...*htlcPacket) chan error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noice

bobSwitch := n.firstBobChannelLink.cfg.Switch
ticker := bobSwitch.cfg.LogEventTicker.(*ticker.Mock)
timeout := time.After(15 * time.Second)
for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent usage of the for/select idiom!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix first pass review done PR has had first pass of review, needs more tho htlcswitch needs testing PR hasn't yet been actively tested on testnet/mainnet P2 should be fixed if one has time safety General label for issues/PRs related to the safety of using the software
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants