
htlcswitch: implement strict forwarding for locally dispatched payments #1527

Conversation

wpaulino
Contributor

In this PR, we address an issue that could arise when using the
SendToRoute RPC. In this RPC, we specify the exact hops that a payment
should take. However, within the switch, we would set a constraint for
the first hop to be any hop as long as the first peer was at the end of
it. This would cause discrepancies when attempting to use the RPC as the
payment would actually go through another hop with the same peer. We fix
this by explicitly specifying the channel ID of the first hop.

Fixes #1500 and #1515.
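For illustration, the gist of the fix can be sketched as follows. The types below are pared-down stand-ins for lnd's internal switch packet, not the actual diff: the point is that a locally dispatched packet now carries the exact first-hop channel ID rather than just the destination peer.

```go
package main

import "fmt"

// ShortChannelID pins down one specific channel, unlike a peer pubkey,
// which several parallel channels to the same peer would share.
type ShortChannelID uint64

// sourceHop marks a packet that originates locally rather than from a peer.
const sourceHop ShortChannelID = 0

// htlcPacket is a pared-down stand-in for lnd's internal switch packet.
type htlcPacket struct {
	incomingChanID ShortChannelID
	outgoingChanID ShortChannelID
	amountMsat     uint64
}

// newLocalPacket builds a locally dispatched packet whose outgoing
// channel is pinned to the exact first hop of the caller's route,
// instead of "any channel that ends at the first peer".
func newLocalPacket(firstHop ShortChannelID, amtMsat uint64) *htlcPacket {
	return &htlcPacket{
		incomingChanID: sourceHop,
		outgoingChanID: firstHop,
		amountMsat:     amtMsat,
	}
}

func main() {
	pkt := newLocalPacket(689530843, 1_000_000)
	fmt.Println(pkt.outgoingChanID)
}
```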

@Roasbeef Roasbeef added routing htlcswitch P3 might get fixed, nice to have labels Jul 11, 2018
@Roasbeef Roasbeef added needs review PR needs review by regular contributors needs rebase PR has merge conflicts labels Jul 11, 2018
@federicobond
Contributor

@wpaulino this fixes only the first hop channel, not the rest of the channels. Is that correct?

@wpaulino
Contributor Author

@federicobond Correct. We might consider adding a flag that enables strict forwarding all throughout.

@wpaulino wpaulino added needs testing PR hasn't yet been actively tested on testnet/mainnet and removed needs rebase PR has merge conflicts labels Jul 20, 2018
@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from 71d40bf to 7e00637 Compare July 20, 2018 19:05
@federicobond
Contributor

federicobond commented Jul 20, 2018

The BOLT for Onion Routing mentions a short_channel_id in the per_hop data of the Sphinx packet but I see that the Sphinx construction for lnd uses a pubkey, which makes it harder to strictly forward such packets.

Is the specification outdated or am I misreading it?

I'm working on a simulation using lnd and need such strict forwarding to compare results.

@Roasbeef
Member

@federicobond it uses the pubkey as it's possible that we have multiple links to a particular node. In that case, we know better than the source which link we actually want to forward on. AFAIK, we're the only implementation that even supports having multiple channels with a particular peer.

@federicobond
Contributor

federicobond commented Jul 21, 2018

Yes, and I agree it’s the most sensible thing to do for each hop, though I would be more careful with the answer from SendToRoute, which assumes that those are the actual channels taken along the route.

Do you think I might be able to provide optional channel hints in the extra data field? Would you be interested in incorporating that changeset upstream using a feature flag?

We need it for some rebalancing simulations we are doing at Muun.

@Roasbeef
Member

though I would be more careful with the answer from SendToRoute, which assumes that those are the actual channels taken along the route.

For SendToRoute, strict forwarding only matters at the first outbound hop. Also note that it isn't possible to enforce that nodes send on a particular channel, only that they send to a particular pubkey destination. You may want to re-read this PR, as well as the original issue.

Do you think I might be able to provide optional channel hints in the extra data field? Would you be interested in incorporating that changeset upstream using a feature flag?

I'm not sure what you're referring to here.

@federicobond
Contributor

For SendToRoute, strict forwarding only matters at the first outbound hop. Also note that it isn't possible to enforce that nodes send on a particular channel, only that they send to a particular pubkey destination.

For end users, that's true. But our use case is different: we are simulating flows of payments for rebalancing research, and we need those payments to be routed exactly through certain channels. As we were running these simulations, we were thrown off by the response of SendToRoute, because it returns a list of specific channels when those channels may not be the ones that were actually used. I think those channel IDs should be removed from the response, as they can only be guaranteed to be true for the first hop.

I'm not sure what you're referring to here.

We need some way to hint the next hop as to what specific channel should be chosen (the node can be configured to either read or ignore it).

@Roasbeef Roasbeef added this to the 0.5 milestone Aug 8, 2018
Contributor

@halseth halseth left a comment

Reads well to me at first read-through!

@@ -608,16 +611,18 @@ func TestSendPaymentErrorPathPruning(t *testing.T) {
t.Fatalf("unable to fetch source node pub: %v", err)
}

roasbeefLuoji := lnwire.NewShortChanIDFromInt(689530843)
Contributor

Where are these IDs coming from?

Contributor Author

From basic_graph.json 😛

Contributor

would recommend using genIDs() to ensure we don't reuse short chan ids

Contributor Author

This is different as the point of the test is to skip the hop between roasbeef and luoji and take an alternate path within the graph in basic_graph.json.

Contributor

Gotcha, I missed that this test also uses that graph. I had assumed you were reusing those chan ids in a different test. I think it should be okay then.

@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch 2 times, most recently from 1bae284 to df7a0ec Compare August 10, 2018 10:03
Contributor

@cfromknecht cfromknecht left a comment

Nice work! Changes look good to me apart from minor comments in test

@@ -608,16 +611,18 @@ func TestSendPaymentErrorPathPruning(t *testing.T) {
t.Fatalf("unable to fetch source node pub: %v", err)
}

roasbeefLuoji := lnwire.NewShortChanIDFromInt(689530843)
Contributor

would recommend using genIDs() to ensure we don't reuse short chan ids

@@ -370,7 +371,7 @@ func TestSendPaymentErrorRepeatedFeeInsufficient(t *testing.T) {
// We'll also fetch the first outgoing channel edge from roasbeef to
// son goku. We'll obtain this as we'll need to to generate the
// FeeInsufficient error that we'll send back.
-chanID := uint64(3495345)
+chanID := uint64(12345)
Contributor

similar comment wrt randomizing sids

@joostjager
Collaborator

I am working on the routing retry behaviour in #1706. If a route fails, it may be that edges are either marked as "failed once" or (on second failure) pruned out completely (not attempted anymore for this payment).

In the light of the discussion in this PR, it could be useful to also acknowledge in the retry behaviour that the actual channel being used isn't known. It may be better to associate failures not with specific channels, but with the two endpoints (pubkeys) of the channel. So instead of pruning channel 123, we prune the set of channels from node A to node B. (In case of a single channel, the result is the same)
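A hedged sketch of what such endpoint-pair pruning could look like (the names here are hypothetical, not lnd's actual router API): failures are keyed on the ordered (from, to) pubkey pair, so one failure prunes every parallel channel between the two nodes at once.

```go
package main

import "fmt"

// nodePair identifies the directed set of channels between two nodes,
// regardless of which specific channel was (reportedly) attempted.
type nodePair struct {
	from, to [33]byte // compressed pubkeys
}

// edgePruner tracks endpoint pairs excluded for the current payment.
type edgePruner struct {
	pruned map[nodePair]struct{}
}

func newEdgePruner() *edgePruner {
	return &edgePruner{pruned: make(map[nodePair]struct{})}
}

// reportFailure prunes every channel from `from` to `to` at once, since
// we can't know which of their channels the hop actually attempted.
func (p *edgePruner) reportFailure(from, to [33]byte) {
	p.pruned[nodePair{from, to}] = struct{}{}
}

// usable reports whether a channel between the pair may still be tried.
func (p *edgePruner) usable(from, to [33]byte) bool {
	_, bad := p.pruned[nodePair{from, to}]
	return !bad
}

func main() {
	var a, b [33]byte
	a[0], b[0] = 0x02, 0x03
	p := newEdgePruner()
	p.reportFailure(a, b)
	fmt.Println(p.usable(a, b), p.usable(b, a))
}
```

With a single channel between the pair, this degenerates to per-channel pruning, matching the parenthetical above.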

@@ -364,8 +365,8 @@ func (s *Switch) SendHTLC(nextNode [33]byte, htlc *lnwire.UpdateAddHTLC,
// system and something wrong happened.
packet := &htlcPacket{
incomingChanID: sourceHop,
outgoingChanID: firstHop,
Contributor

@cfromknecht cfromknecht Aug 15, 2018

Would move this to be after incomingHTLCID for clarity; each chanid/htlcid pair corresponds to one circuit key. Maintaining those groupings has proved helpful.

Contributor Author

Fixed.

@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from df7a0ec to 864abd6 Compare August 15, 2018 22:44
@cfromknecht
Contributor

@joostjager most certainly, that's something we've discussed before as it's very related to this PR.

Since we don't have certainty about what channel the other hops try, the feedback to the router is lossy (and potentially malicious/incorrect). With sufficient fees, multiple channels between the same peers can be effectively treated as one channel from the source's perspective, so the suggestion to prune all channels connecting the pubkeys makes sense to me.

It also shouldn't be an issue if we assume that a node will always try the "best" (or at least a sufficient) channel first, as then there'd be no point retrying if the others are worse. However, this change could still cause us to miss a possible route if they maintain a strict forwarding policy.

At the moment, we try to find the hop with the most bandwidth. I think c-lightning has a max of one channel per peer, so it wouldn't be an issue there. Eclair looks like it abides by the channel requested in the payload, though as a disclaimer, my Scala is a little rusty. Based on that knowledge, it may produce less-than-optimal results when routing through an eclair node that has multiple outgoing channels to the next peer.

@ZapUser77

Don't we encapsulate which channels to use? If we just specify the nodes... we could end up with problems: Say I have two channels from A to B:

I then set the fees of Channel 1 to zero, and channel 2 to 5 BTC... if the sender doesn't know which channel it's going to use... wouldn't this be a problem if others tried to route through that node? If the sender is leaving the determination of which channel to use to the hops... couldn't this be exploited in some way, having different fees for different channels?

Multiple channels definitely should not be treated the same. The best way to get liquidity is to open a channel to a peer and then have them open a channel back. You start out with two channels that are each heavily unbalanced, but as a pair they are perfectly balanced.

C-lightning allows one channel to be opened from each side, so two channels total between peers.

We don't try to find the hop with the most bandwidth; we try to find one with enough bandwidth that costs the least in fees. And there's no way to tell what the local balance is on a remote node's channel, so you have to try all of them that fit.

And from my understanding (which definitely could be wrong) eclair nodes explicitly do not allow "being routed through." But that could just be the phone app... I haven't looked into their desktop app.

As far as pruning an entire route... what exactly do you mean? Do you mean remove all edges along that path from ever being tried again, individually? Or Just the route as a collection? How would that affect channels with rapidly shifting balances that just happened to be insufficient that millisecond that you tried to send through it... but is fine the next?

Is there no way to get feedback on exactly which channel failed?

@Roasbeef
Member

@ZapUser77 I'd recommend taking a look at the code. All those questions can be answered at a quick glance. All the information is available, just on the RPC layer it may not always be bubbled up. There's a PR to return more accurate failure information back to each peer. The sender always knows which channel it's using. Note that this PR is for local dispatch, so we always traverse the channel specified.

@joostjager
Collaborator

@cfromknecht I opened a new PR with some additional analysis on the pruning question, #1734

@ZapUser77

"I'd recommend taking a look at the code. All those questions can be answered at a quick glance."
No. No they can't. You can't expect everyone that you speak with to have in-depth knowledge of every bit of code in LND. Something that's a 'quick glance' for you... could be hours/days for others.

I was replying specifically to "Since we don't have certainty about what channel the other hops try", which your statement "The sender always knows which channel it's using." directly contradicts. And from the bits of code that I had seen, I believed the sender always knew... but didn't want to be a **** about it.

@Roasbeef
Member

@ZapUser77 if you're attempting to join discussions concerning code architecture and design tradeoffs, you're expected to do your homework so that you can positively contribute to the conversation. This is open source, resources are scarce, and we can't sit down and explain all prior decisions or manually walk through the code; you're expected to have a certain level of knowledge if you jump into a discussion.

The sender always knows. We use source routing.

@cfromknecht
Contributor

@joostjager thanks, I will take a look!

@ZapUser77 the sender always knows which channels it wants to use. The sender also knows the outgoing channel they actually use because they choose a channel they own and explicitly update that channel with the direct peer. Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel and there's no way for us to verify which channel they used.

Is there no way to get feedback on exactly which channel failed?

Nope, a node other than your direct peer could lie about which channel it tried to update, or even if it tried at all.

@ZapUser77

Thank you @cfromknecht.

@Roasbeef: I have looked at the code... but no, I haven't read and walked through all 100,000+ lines. Rather than being condescending, perhaps you can just ignore my comments that aren't directed towards you? I'm not asking you to walk me through anything. I know you're busy. And I intentionally do not direct any of my statements to you, as every time you communicate with me, I get this distinct feeling of extreme disdain you have for me. [And if you treat all newcomers to the project like that... that may be why you only have 3-5 people doing 95% of the work.]

"The sender always knows. We use source routing." I know that. However, what you are saying directly contradicts what @cfromknecht is saying: "Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel".

Anyway, I'm trying to lighten your workload, not add more to it. It'll take some time for me to work through all the code and understand it fully.

@Roasbeef
Member

Roasbeef commented Aug 17, 2018

@ZapUser77 if you treat us with respect, we'll do the same towards you. Your veiled jabs at those that actively contribute to the project are unwarranted, and unproductive.

Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel

They aren't required, but they can only send the HTLC to the target node. Therefore, they can't deviate beyond the path aside from cancelling or just holding onto the HTLC. Based on the discussion here, I think we're going to go with strict forwarding everywhere, so even when we forward remote HTLCs.

@ZapUser77

@Roasbeef o_0
Umm, what? I've never made any jabs or shown any disrespect to any of the LND contributors. I have the utmost respect for all of them, esp you. I know you all have drastically more specific knowledge in this field than me.

If you could point me towards anything that you felt was disrespectful to anyone, I'll apologize for it.

"They aren't required" So what Conner said is correct, and your statement "The sender always knows" is incorrect? I was under the impression that the source picked the channels to use, but you're saying it only picks the nodes and 'recommends' channels.

@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from 864abd6 to b24551b Compare August 18, 2018 00:25
@wpaulino
Contributor Author

This PR has been amended to enable strict forwarding all throughout.

In this commit, we address an issue that could arise when using the
SendToRoute RPC. In this RPC, we specify the exact hops that a payment
should take. However, within the switch, we would set a constraint for
the first hop to be any hop as long as the first peer was at the end of
it. This would cause discrepancies when attempting to use the RPC as the
payment would actually go through another hop with the same peer. We fix
this by explicitly specifying the channel ID of the first hop.

In this commit, we constrain our packet forwarding so that the packet is
_deterministically_ forwarded to the outgoing channel as before we would
attempt to select the best link out of all possible links with the
destination peer. This is needed as users begin to manually rebalance
their channels with the SendToRoute RPC, expecting their payment to take
the _exact_ route they crafted.
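The behavioral change described in the commit messages above can be sketched as the difference between two lookups (toy types, not lnd's actual switch code):

```go
package main

import "fmt"

// link is a toy stand-in for a channel link to the destination peer.
type link struct {
	chanID    uint64
	bandwidth uint64 // spendable balance on our side, msat
}

// bestLink models the previous behavior: among every link to the peer,
// pick the one with the most available bandwidth.
func bestLink(links []link) link {
	best := links[0]
	for _, l := range links[1:] {
		if l.bandwidth > best.bandwidth {
			best = l
		}
	}
	return best
}

// strictLink models the amended behavior: forward only on the channel
// the route names, even when a sibling link has more bandwidth.
func strictLink(links []link, chanID uint64) (link, bool) {
	for _, l := range links {
		if l.chanID == chanID {
			return l, true
		}
	}
	return link{}, false
}

func main() {
	links := []link{{chanID: 1, bandwidth: 500_000}, {chanID: 2, bandwidth: 900_000}}
	fmt.Println(bestLink(links).chanID) // highest-bandwidth sibling: 2
	l, _ := strictLink(links, 1)
	fmt.Println(l.chanID) // the exact channel the route named: 1
}
```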
@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from b24551b to 5f5e44b Compare August 18, 2018 00:32
@joostjager
Collaborator

@wpaulino Is this really a good idea? If a forwarding node has multiple channels with varying balances open to the same next hop, the source node needs to try them all out until it finds the one that can carry the payment. To me it looks like sacrificing the performance of normal payments to benefit the specialized operation of rebalancing.

Besides, you can never be sure that every node is really forwarding strictly. A routing node might reverse this change to generate fewer failed payments and improve its reputation once there is some kind of reputation management for nodes in lnd. Also, as far as I know, the discussion hasn't been settled on whether rebalancing should be an active operation or can also be carried out passively by manipulating fees.

And for rebalancing, how exactly does it hurt to have the forwarding node select the channel? And if it does hurt and makes the operation less efficient, I am quite sure it is still possible to come up with an algorithm that takes into account the uncertainty of the chosen channel. For the execution of payments, a similar challenge exists: how to best process the failure message to maximize the chance of the next route succeeding.

@ZapUser77

Balancing cannot be performed by fees. Fees only affect 'outgoing' transactions, so they're of no benefit for channels that you need incoming transactions on.

"how exactly does it hurt to have the forwarding node select the channel?" Can they not set the fee of one channel really low to get it to show up as the "cheapest hop" then refuse to use it, and instead pick a much more expensive channel to the same node? Is there some mechanism already to prevent that?

@Roasbeef
Member

and instead pick a much more expensive channel to the same node? Is there some mechanism already to prevent that?

The sender will select that cheapest hop and provide the exact amount required to traverse that link. So all fees are already pre-computed; if one node takes more than advertised, then there won't be enough fees to distribute to the rest of the nodes, so the route will fail downstream.
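As a toy illustration of that pre-computation (hypothetical fee policies, using the standard base-plus-proportional fee formula): the sender works backwards from the amount the destination must receive, adding each hop's advertised fee, so a hop that skims extra leaves the downstream hops underfunded.

```go
package main

import "fmt"

// feePolicy is an advertised forwarding fee: a base charge in msat plus
// a proportional part in millionths of the forwarded amount.
type feePolicy struct {
	baseMsat          uint64
	feeRateMillionths uint64
}

func (f feePolicy) fee(amtMsat uint64) uint64 {
	return f.baseMsat + amtMsat*f.feeRateMillionths/1_000_000
}

// perHopAmounts computes, from destination back to source, the exact
// amount each hop must receive so the final hop delivers amtMsat.
func perHopAmounts(amtMsat uint64, hops []feePolicy) []uint64 {
	amts := make([]uint64, len(hops)+1)
	amts[len(hops)] = amtMsat
	for i := len(hops) - 1; i >= 0; i-- {
		amts[i] = amts[i+1] + hops[i].fee(amts[i+1])
	}
	return amts
}

func main() {
	hops := []feePolicy{
		{baseMsat: 1000, feeRateMillionths: 100},
		{baseMsat: 1000, feeRateMillionths: 100},
	}
	fmt.Println(perHopAmounts(1_000_000, hops))
}
```

On a 1,000,000 msat payment over two fee-charging hops this yields [1002200 1001100 1000000]: every hop's cut is fixed in advance, leaving no slack for an overcharging intermediary.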

Balancing can not be performed by fees.

It's one piece of the solution. Consider that you can increase fees on a link in order to prevent its bandwidth from being further depleted. Also consider that by decreasing fees on certain links, I can incentivize HTLCs to be routed towards me, thereby giving me higher bandwidth on a select set of links.

I tend to agree that, for remote HTLCs, it's better network-wide for us to select the best link to the best of our ability.

@ZapUser77

Scenario:
Channel A is all on my side, Channel B is balanced, Channel C is all on far side.
Higher fees only help prevent money from going out of a specific channel; they don't help make money come in on a specific channel.
So, adding a high fee to C wouldn't matter as it's already unbalanced to the max.
And lowering fees doesn't make that channel be used either.
Setting fees on A to zero won't help if 100% of the natural flow is inward.
An actual rebalance command is a valid solution to the need for balancing. (Which I should have in a day or three)

P.S. Thank you for being respectful with your reply.

@Roasbeef
Member

@federicobond there are no actual guarantees w.r.t which link a node will select. As a result, any rebalancing code should itself be written to account for nodes having leeway w.r.t which channel they forward over.

@Roasbeef
Member

Simplified version (no strict remote) committed as 7535371.

@Roasbeef Roasbeef closed this Aug 21, 2018
@wpaulino wpaulino deleted the local-dispatch-strict-forwarding branch August 21, 2018 15:26
Successfully merging this pull request may close these issues.

payinvoice gives wrong path
7 participants