[htlcswitch]: revert replace link, ensure removed links are stopped #1551
Force-pushed from c69203f to f09660b.
Force-pushed from 2293ccf to 982b394.
Concept ACK!
server.go
@@ -632,7 +632,11 @@ func newServer(listenAddrs []net.Addr, chanDB *channeldb.DB, cc *chainControl,
ChainIO: cc.chainIO,
MarkLinkInactive: func(chanPoint wire.OutPoint) error {
chanID := lnwire.NewChanIDFromOutPoint(&chanPoint)
-return s.htlcSwitch.RemoveLink(chanID)
+link, err := s.htlcSwitch.RemoveLink(chanID)
+if err == nil {
style nit: return early in case of error
peer.go
// down to ensure that the mailboxes are only ever under the control of
// one link.
oldLink, err := p.server.htlcSwitch.RemoveLink(link.ChanID())
if err == nil {
log error
peer.go
quit: p.quit,
channel: channel,
unregisterChannel: func(chanID lnwire.ChannelID) error {
_, err := p.server.htlcSwitch.RemoveLink(chanID)
Shouldn't the link be stopped here?
peer.go
return nil

case err == htlcswitch.ErrChannelLinkNotFound:
peerLog.Warnf("unable remove channel link with "+
Typo: "unable remove" should be "unable to remove".
htlcswitch/switch.go
@@ -1851,7 +1853,7 @@ func (s *Switch) getLinkByShortID(chanID lnwire.ShortChannelID) (ChannelLink, er

// RemoveLink is used to initiate the handling of the remove link command. The
// request will be propagated/handled to/in the main goroutine.
-func (s *Switch) RemoveLink(chanID lnwire.ChannelID) error {
+func (s *Switch) RemoveLink(chanID lnwire.ChannelID) (ChannelLink, error) {
The godoc should be updated to reflect that it is now the caller's responsibility to stop the link.
htlcswitch/link.go
@@ -431,6 +416,24 @@ func (l *channelLink) Start() error {
return fmt.Errorf("unable to trim circuits above "+
"local htlc index %d: %v", localHtlcIndex, err)
}

// Sine the link is live, before we start the link we'll update
nit: s/Sine/Since/
Force-pushed from 7836565 to 26732c4.
// Now that all pending and live links have been removed from
// the forwarding indexes, stop each one before shutting down.
for _, link := range linksToStop {
link.Stop()
The danger here is that a link could be in the middle of sending a packet to the switch for forwarding, so stopping it synchronously could result in a deadlock. This is why we typically stopped the link in a goroutine. We'll need to do a sweep to ensure that whenever a link is trying to forward, the switch can reliably signal it to exit.
Fixed: we now thread the link's quit channel through to ForwardPackets, so that calling Stop breaks out of any blocking forwarding calls.

return s.removeLink(chanID)
if link != nil {
link.Stop()
Similar comment here.
htlcswitch/switch_test.go
-if err := s.RemoveLink(chanID1); err != nil {
-t.Fatalf("unable to remove alice link: %v", err)
-}
+s.RemoveLink(chanID1)
Why are we no longer checking the return value here?
The semantics are now such that it can't fail: it blocks until all state related to the channel has been removed, or until it is clear that no such state exists. The new RemoveLink signature doesn't return anything.
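A blocking, infallible RemoveLink might be structured roughly as below. This is a hypothetical sketch, not the switch's actual code: it assumes removal requests are served by a single goroutine that owns the link index (the RemoveLink godoc quoted earlier says the request is handled in the main goroutine), and `removeLinkCmd`, `run`, and the string channel IDs are invented for illustration. Calling Stop from that goroutine is only safe if a link can never be stuck waiting on the switch, which is what threading the quit channel into forwarding is meant to guarantee.

```go
package main

import (
	"fmt"
	"sync"
)

// removeLinkCmd is a hypothetical request sent to the switch's main
// goroutine; done is closed once the link (if any) is fully stopped.
type removeLinkCmd struct {
	chanID string
	done   chan struct{}
}

// link is a hypothetical link that records whether Stop was called.
type link struct {
	mu      sync.Mutex
	stopped bool
}

func (l *link) Stop() {
	l.mu.Lock()
	l.stopped = true
	l.mu.Unlock()
}

// Switch is a hypothetical toy switch; only run touches links after start.
type Switch struct {
	links   map[string]*link
	removes chan *removeLinkCmd
	quit    chan struct{}
}

func (s *Switch) run() {
	for {
		select {
		case cmd := <-s.removes:
			if l, ok := s.links[cmd.chanID]; ok {
				delete(s.links, cmd.chanID)
				l.Stop()
			}
			// Closing done unblocks RemoveLink whether or not a link
			// existed, so the call cannot fail.
			close(cmd.done)
		case <-s.quit:
			return
		}
	}
}

// RemoveLink blocks until all state for chanID has been removed and the
// link stopped, or until it is known that no such link exists.
func (s *Switch) RemoveLink(chanID string) {
	cmd := &removeLinkCmd{chanID: chanID, done: make(chan struct{})}
	select {
	case s.removes <- cmd:
	case <-s.quit:
		return
	}
	select {
	case <-cmd.done:
	case <-s.quit:
	}
}

func main() {
	l := &link{}
	s := &Switch{
		links:   map[string]*link{"chan-1": l},
		removes: make(chan *removeLinkCmd),
		quit:    make(chan struct{}),
	}
	go s.run()
	defer close(s.quit)

	s.RemoveLink("chan-1")
	s.RemoveLink("chan-1") // No-op: the link is already gone.

	l.mu.Lock()
	fmt.Println("stopped:", l.stopped)
	l.mu.Unlock()
}
```

The second call returns immediately because `done` is closed even when no link is found, which is why an explicit wait such as WaitForShutdown becomes unnecessary after the call.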
-srvrLog.Errorf("unable to remove channel link: %v",
-err)
-}
+p.server.htlcSwitch.RemoveLink(link.ChanID())
The commit message doesn't seem to match what's actually in the commit.
Fixed
Not blocking, but it still looks different. The commit message says:
server: use blocking RemoveLink to shutdown links
yet the commit now just ignores the error.
RemoveLink doesn't return an error. It just blocks until everything is purged.
@@ -556,6 +553,12 @@ func (p *peer) addLink(chanPoint *wire.OutPoint,

link := htlcswitch.NewChannelLink(linkCfg, lnChan)

// Before adding our new link, purge the switch of any pending or live
👍
I think we might also need this in the execution path from funding mgr -> channel manager in the peer.
Don't both pathways call addLink?
Yep, scratch that.
@@ -431,6 +416,24 @@ func (l *channelLink) Start() error {
return fmt.Errorf("unable to trim circuits above "+
"local htlc index %d: %v", localHtlcIndex, err)
}

// Since the link is live, before we start the link we'll update
👍
Thoughts on making this synchronous instead of dispatching? As is, it isn't guaranteed to register before a state update is made.
@@ -3725,7 +3725,6 @@ func (h *persistentLinkHarness) restart(restartSwitch bool,

// First, remove the link from the switch.
h.coreLink.cfg.Switch.RemoveLink(h.link.ChanID())
-h.coreLink.WaitForShutdown()
Rationale behind removing this?
The new RemoveLink inherently blocks until shutdown.
Its godoc should be updated to note this.
fixed!
Force-pushed from ca941d8 to 0807d61.
Added a test to ensure that it is safe to call
Force-pushed from f78a8c3 to c8d624e.
Force-pushed from f4bfa51 to 9d26e3c.
Can be rebased now that #1668 is in!
This reverts commit e60d2b7.
current initialization methods
The new RemoveLink method blocks until the link has been fully stopped, so we no longer need to wait for it explicitly.
This commit adds a test that verifies Stop does not block if the link is concurrently forwarding incoming Adds to the switch. This test fails prior to the commits that thread through the link's quit channel.
In this commit, we thread a link's quit channel through to routeAsync, the primary helper method allowing links to send htlcPackets through the switch. This is intended to prevent deadlocks in which the link is synchronously blocked forwarding packets to the switch but also needs to shut down.
Force-pushed from 9d26e3c to f84cd14.
rebased, PTAL
LGTM 🍕
Will run this on the faucet for a bit to attempt to flake out any regressions before merging into master.
// switch, any failed packets will be returned to the provided
// ChannelLink. The link's quit signal should be provided to allow
// cancellation of forwarding during link shutdown.
ForwardPackets func(chan struct{}, ...*htlcPacket) chan error
noice
bobSwitch := n.firstBobChannelLink.cfg.Switch
ticker := bobSwitch.cfg.LogEventTicker.(*ticker.Mock)
timeout := time.After(15 * time.Second)
for {
Excellent usage of the for/select idiom!
This PR reverts the switch's current AddLink behavior, such that it rejects duplicate links with the same channel ID. At the moment, we hot-swap in another link without fully shutting down the first, which is dangerous because the two can share a mailbox.

In a related commit, we changed RemoveLink such that it called go link.Stop() and processed the shutdown in the background, to avoid deadlocking on the forwarding index mutex. As a result, a prior Stop() may still be executing by the time a new link is added with the same channel ID.

In order to recreate the behavior of replacing the old link with a new one, RemoveLink has been altered to return the removed link if it was found, and to not call or spawn link.Stop() at all. Instead, the caller is now responsible for stopping the link, which avoids the deadlock within the switch.

Finally, though AddLink rejects duplicate links, we've modified the peer to first try to remove any link with the same channel ID. If one is found, we ensure that link.Stop() finishes before moving on to add the new link to the switch.