Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

htlcswitch: pipeline settles to switch #3143

Merged

Conversation

@Crypt-iQ
Copy link
Contributor

commented May 30, 2019

This PR makes the outgoing link pipeline the settle to the
switch as soon as it receives it. Previously, it would wait for a
revocation before sending it, which caused increased latency on
payments as well as possibly never settling on the incoming link.
A duplicate settle is still sent to the switch, but it is handled
gracefully. A new AckEventTicker was added to the switch which
acknowledges any pending settle / fail entries in an outgoing
link's fwd pkgs in batch. This was needed in order to reduce the
number of db txn's which would have been incurred by acking whenever
we receive a duplicate settle without batching.

Fixes #3069

@Crypt-iQ Crypt-iQ changed the title [WIP] htlcswitch: pipeline settles to switch htlcswitch: pipeline settles to switch May 31, 2019

@wpaulino wpaulino added this to the 0.7.1 milestone Jul 2, 2019

@wpaulino wpaulino added the v0.7.1 label Jul 2, 2019

@cfromknecht
Copy link
Collaborator

left a comment

@Crypt-iQ this looks great, only one minor comment. looking forward to benchmarking the performance improvement!

htlcswitch/link.go Outdated Show resolved Hide resolved
@Crypt-iQ

This comment has been minimized.

Copy link
Contributor Author

commented Jul 3, 2019

I have been running flakehunter unit tester on the failed test in travis and it hasn't failed... so I think it's unrelated. I have seen that error on my mac though (unrelated to this branch).

@cfromknecht
Copy link
Collaborator

left a comment

@Crypt-iQ a couple other small comments, fixups can be squashed as well!

htlcswitch/switch.go Outdated Show resolved Hide resolved
htlcswitch/switch.go Show resolved Hide resolved

@Roasbeef Roasbeef requested a review from wpaulino Jul 9, 2019

@wpaulino
Copy link
Collaborator

left a comment

Not a blocker, but since there'll be duplicate settles we'll see some error logs that aren't necessary and might lead to users thinking there's an issue. Below are the trace logs for a completed payment:

2019-07-09 18:55:57.835 [DBG] HSWC: ChannelLink(668de44edf52c85c7ec6e54e604dcfe88d6a312cb08afb7d25bf49eb11ecedb4:0): sampled fee rate for 3 block conf: 12500 sat/kw
2019-07-09 18:56:02.893 [TRC] HSWC: Committing fresh circuits: ([]channeldb.CircuitKey) (len=1 cap=1) {
 (channeldb.CircuitKey) (Chan ID=0:0:0, HTLC ID=3)
}
2019-07-09 18:56:02.919 [TRC] HSWC: ChannelLink(401:1:0) received switch packet inkey=(Chan ID=0:0:0, HTLC ID=3), outkey=(Chan ID=401:1:0, HTLC ID=0)
2019-07-09 18:56:02.919 [TRC] HSWC: ChannelLink(401:1:0) Received downstream htlc: payment_hash=71b691fe313daa0a8ff7b18824f755240561a11512a18fe12c7c745dee7ed532, local_log_index=2, batch_size=1
2019-07-09 18:56:02.919 [DBG] HSWC: ChannelLink(401:1:0) Queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=3)->(Chan ID=401:1:0, HTLC ID=2)
2019-07-09 18:56:02.972 [TRC] HSWC: Opening finalized circuits: ([]htlcswitch.Keystone) (len=1 cap=1) {
 (htlcswitch.Keystone) (Chan ID=0:0:0, HTLC ID=3) --> (Chan ID=401:1:0, HTLC ID=2)
}
2019-07-09 18:56:03.008 [DBG] HSWC: ChannelLink(401:1:0) removing Add packet (Chan ID=0:0:0, HTLC ID=3) from mailbox
2019-07-09 18:56:03.008 [TRC] HSWC: Deleting resolved circuits: ([]channeldb.CircuitKey) <nil>
2019-07-09 18:56:03.067 [TRC] HSWC: ChannelLink(401:1:0) processing 0 remote adds for height 5
2019-07-09 18:56:03.211 [DBG] HSWC: Closed completed SETTLE circuit for 71b691fe313daa0a8ff7b18824f755240561a11512a18fe12c7c745dee7ed532: (0:0:0, 3) <-> (401:1:0, 2)
2019-07-09 18:56:03.211 [DBG] HSWC: Tearing down open circuit with SETTLE pkt, removing circuit=(Chan ID=0:0:0, HTLC ID=3) with keystone=(Chan ID=401:1:0, HTLC ID=2)
2019-07-09 18:56:03.218 [TRC] HSWC: Deleting resolved circuits: ([]channeldb.CircuitKey) (len=1 cap=1) {
 (channeldb.CircuitKey) (Chan ID=0:0:0, HTLC ID=3)
}
2019-07-09 18:56:03.257 [DBG] HSWC: Closed completed SETTLE circuit for 71b691fe313daa0a8ff7b18824f755240561a11512a18fe12c7c745dee7ed532: (0:0:0, 3) <-> (401:1:0, 2)
2019-07-09 18:56:03.364 [TRC] HSWC: Deleting resolved circuits: ([]channeldb.CircuitKey) <nil>
2019-07-09 18:56:03.440 [DBG] HSWC: ChannelLink(401:1:0): settle-fail-filter &{1 [0]}
2019-07-09 18:56:03.440 [TRC] HSWC: ChannelLink(401:1:0) processing 0 remote adds for height 6
2019-07-09 18:56:03.440 [ERR] HSWC: Unable to find target channel for HTLC settle/fail: channel ID = 401:1:0, HTLC ID = 2
2019-07-09 18:56:03.441 [ERR] HSWC: ChannelLink(401:1:0) unhandled error while forwarding htlc packet over htlcswitch: Unable to find target channel for HTLC settle/fail: channel ID = 401:1:0, HTLC ID = 2
2019-07-09 18:56:12.673 [DBG] HSWC: Sent 100 satoshis and received 0 satoshis in the last 10 seconds (0.200000 tx/sec)
2019-07-09 18:56:18.475 [DBG] HSWC: Acked 1 settle fails: ([]channeldb.SettleFailRef) (len=1 cap=1) {
 (channeldb.SettleFailRef) {
  Source: (lnwire.ShortChannelID) 401:1:0,
  Height: (uint64) 6,
  Index: (uint16) 0
 }
}
htlcswitch/switch.go Show resolved Hide resolved
htlcswitch/switch.go Outdated Show resolved Hide resolved
server.go Outdated
@@ -430,6 +430,8 @@ func newServer(listenAddrs []net.Addr, chanDB *channeldb.DB, cc *chainControl,
htlcswitch.DefaultFwdEventInterval),
LogEventTicker: ticker.New(
htlcswitch.DefaultLogInterval),
AckEventTicker: ticker.New(

This comment has been minimized.

Copy link
@wpaulino

wpaulino Jul 10, 2019

Collaborator

Style nit: wrapping should be (if it doesn't fit in 80 chars):

ticker.New(
    htlcswitch.DefaultAckInterval,
)
// because the settles are pipelined to the switch and otherwise
// the bandwidth won't be updated by the time Alice receives a
// response here.
time.Sleep(2 * time.Second)

This comment has been minimized.

Copy link
@wpaulino

wpaulino Jul 10, 2019

Collaborator

We could use a mock ticker to send a force tick and avoid the sleep.

This comment has been minimized.

Copy link
@Crypt-iQ

Crypt-iQ Jul 11, 2019

Author Contributor

I'm a little confused about how this would work. We can feed ticks to the AckEventTicker, but that won't help since we need to actually wait for the revocation from Bob (the duplicate settle doesn't matter here) and there doesn't seem to be a straightforward way to be notified upon receiving a revocation. We could loop and timeout after some timeout interval, but that's not so clean either...

This comment has been minimized.

Copy link
@Crypt-iQ

Crypt-iQ Jul 11, 2019

Author Contributor

It seems TestChannelLinkSingleHopPayment fails for a similar reason and can be updated with this.

This comment has been minimized.

Copy link
@wpaulino

wpaulino Jul 11, 2019

Collaborator

Ah sorry, you're right. I see why the sleep is indeed needed, seems fine to me as is for now.

This comment has been minimized.

Copy link
@Crypt-iQ

Crypt-iQ Jul 11, 2019

Author Contributor

Hm, the TestChannelLinkSingleHopPayment no longer fails on my machine with flake-unit, so maybe it's just my Mac.

This comment has been minimized.

Copy link
@Crypt-iQ

Crypt-iQ Jul 12, 2019

Author Contributor

Nevermind, it flakes!

This comment has been minimized.

Copy link
@halseth

halseth Jul 18, 2019

Collaborator

Can use lntest.WaitPredicate instead?

This comment has been minimized.

Copy link
@Crypt-iQ

Crypt-iQ Jul 18, 2019

Author Contributor

Get an import cycle then, but we could just copy the function?

@Crypt-iQ

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2019

@wpaulino I wasn't entirely sure what to do about the extra error logs - I could remove it if the packet has a destRef field? What was once erroneous behavior is the norm so I think that'd make sense. What do you think?

@Crypt-iQ Crypt-iQ force-pushed the Crypt-iQ:pipelining_settle_0525 branch from 09f980e to 9f065dc Jul 11, 2019

@Crypt-iQ Crypt-iQ requested a review from Roasbeef as a code owner Jul 11, 2019

@wpaulino

This comment has been minimized.

Copy link
Collaborator

commented Jul 11, 2019

@wpaulino I wasn't entirely sure what to do about the extra error logs - I could remove it if the packet has a destRef field? What was once erroneous behavior is the norm so I think that'd make sense. What do you think?

Sounds good to me. I think @cfromknecht had some thoughts on possibly refactoring that portion of code to improve clarity of the new behavior, though I'd say it's not a blocker for this PR.

@Crypt-iQ

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2019

@wpaulino I wasn't entirely sure what to do about the extra error logs - I could remove it if the packet has a destRef field? What was once erroneous behavior is the norm so I think that'd make sense. What do you think?

Sounds good to me. I think @cfromknecht had some thoughts on possibly refactoring that portion of code to improve clarity of the new behavior, though I'd say it's not a blocker for this PR.

If we don't log the error, we should still return the error and it will still get reported in the link though. So idk if the change is worth it?

htlcswitch/switch.go Outdated Show resolved Hide resolved
htlcswitch/switch.go Show resolved Hide resolved
@Roasbeef
Copy link
Member

left a comment

This diff turned out to be much smaller than I had originally anticipated, me gusta!


// DefaultAckInterval is the duration between attempts to ack any settle
// fails in a forwarding package.
DefaultAckInterval = 15 * time.Second

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jul 17, 2019

Member

I would this we want this value to be much lower, like in the ms? The two above are for non-critical operations like logging.

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jul 17, 2019

Collaborator

this isn't too critical, it just needs to happen at some point so we stop resending the htlc internally. if it doesn't happen before shutdown it will be done when on the next connection

lntest/itest/lnd_test.go Outdated Show resolved Hide resolved

@Crypt-iQ Crypt-iQ force-pushed the Crypt-iQ:pipelining_settle_0525 branch 3 times, most recently from 6f1c1c3 to 029956e Jul 17, 2019

@cfromknecht
Copy link
Collaborator

left a comment

LGTM! Nice work on this optimization @Crypt-iQ !! 💯

@Roasbeef

This comment has been minimized.

Copy link
Member

commented Jul 18, 2019

Needs a rebase @Crypt-iQ.

htlcswitch: pipeline settles to switch
This commit makes the outgoing link pipeline the settle to the
switch as soon as it receives it. Previously, it would wait for a
revocation before sending it, which caused increased latency on
payments as well as possibly never settling on the incoming link.
A duplicate settle is still sent to the switch, but it is handled
gracefully. A new AckEventTicker was added to the switch which
acknowledges any pending settle / fail entries in an outgoing
link's fwd pkgs in batch. This was needed in order to reduce the
number of db txn's which would have been incurred by acking whenever
we receive a duplicate settle without batching.

@Crypt-iQ Crypt-iQ force-pushed the Crypt-iQ:pipelining_settle_0525 branch from 029956e to 00814dc Jul 18, 2019

@Crypt-iQ

This comment has been minimized.

Copy link
Contributor Author

commented Jul 18, 2019

Rebased and squashed fixups @Roasbeef

@Roasbeef
Copy link
Member

left a comment

LGTM

@cfromknecht cfromknecht merged commit 807012f into lightningnetwork:master Jul 20, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.04%) to 60.527%
Details
@cfromknecht

This comment has been minimized.

Copy link
Collaborator

commented Jul 20, 2019

@Crypt-iQ Cheers to lower latency for anyone routing through LND!! 🍾🎉🔥

@Crypt-iQ Crypt-iQ deleted the Crypt-iQ:pipelining_settle_0525 branch Jul 20, 2019

@kincaidoneil kincaidoneil referenced this pull request Jul 29, 2019
0 of 4 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.