
htlcswitch: implement strict forwarding for locally dispatched payments #1527

Conversation

wpaulino
Contributor

In this PR, we address an issue that could arise when using the
SendToRoute RPC. In this RPC, we specify the exact hops that a payment
should take. However, within the switch, we would set a constraint for
the first hop to be any hop as long as the first peer was at the end of
it. This would cause discrepancies when attempting to use the RPC as the
payment would actually go through another hop with the same peer. We fix
this by explicitly specifying the channel ID of the first hop.

Fixes #1500 and #1515.
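For illustration, the gist of the fix can be sketched as follows. The types below are pared-down stand-ins for lnd's internal switch packet, not the actual diff: the point is that a locally dispatched packet now carries the exact first-hop channel ID rather than just the destination peer.

```go
package main

import "fmt"

// ShortChannelID pins down one specific channel, unlike a peer pubkey,
// which several parallel channels to the same peer would share.
type ShortChannelID uint64

// sourceHop marks a packet that originates locally rather than from a peer.
const sourceHop ShortChannelID = 0

// htlcPacket is a pared-down stand-in for lnd's internal switch packet.
type htlcPacket struct {
	incomingChanID ShortChannelID
	outgoingChanID ShortChannelID
	amountMsat     uint64
}

// newLocalPacket builds a locally dispatched packet whose outgoing
// channel is pinned to the exact first hop of the caller's route,
// instead of "any channel that ends at the first peer".
func newLocalPacket(firstHop ShortChannelID, amtMsat uint64) *htlcPacket {
	return &htlcPacket{
		incomingChanID: sourceHop,
		outgoingChanID: firstHop,
		amountMsat:     amtMsat,
	}
}

func main() {
	pkt := newLocalPacket(689530843, 1_000_000)
	fmt.Println(pkt.outgoingChanID)
}
```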

@Roasbeef Roasbeef added routing htlcswitch P3 might get fixed, nice to have labels Jul 11, 2018
@Roasbeef Roasbeef added needs review PR needs review by regular contributors needs rebase PR has merge conflicts labels Jul 11, 2018
@federicobond
Contributor

@wpaulino this fixes only the first hop channel, not the rest of the channels. Is that correct?

@wpaulino
Contributor Author

@federicobond Correct. We might consider adding a flag that enables strict forwarding all throughout.

@wpaulino wpaulino added needs testing PR hasn't yet been actively tested on testnet/mainnet and removed needs rebase PR has merge conflicts labels Jul 20, 2018
@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from 71d40bf to 7e00637 Compare July 20, 2018 19:05
@federicobond
Contributor

federicobond commented Jul 20, 2018

The BOLT for Onion Routing mentions a short_channel_id in the per_hop data of the Sphinx packet but I see that the Sphinx construction for lnd uses a pubkey, which makes it harder to strictly forward such packets.

Is the specification outdated or am I misreading it?

I'm working on a simulation using lnd and need such strict forwarding to compare results.

@Roasbeef
Member

@federicobond it uses the pubkey as it's possible that we have multiple links to a particular node. In that case, we know better than the source which link we actually want to forward on. AFAIK, we're the only implementation that even supports having multiple channels with a particular peer.

@federicobond
Contributor

federicobond commented Jul 21, 2018

Yes, and I agree it’s the most sensible thing to do for each hop, though I would be more careful with the answer from SendToRoute, which assumes that those are the actual channels taken along the route.

Do you think I might be able to provide optional channel hints in the extra data field? Would you be interested in incorporating that changeset upstream using a feature flag?

We need it for some rebalancing simulations we are doing at Muun.

@Roasbeef
Member

though I would be more careful with the answer from SendToRoute, which assumes that those are the actual channels taken along the route.

For SendToRoute, strict forwarding only matters at the first outbound hop. Also note that it isn't possible to enforce that nodes send on a particular channel, only that they send to a particular pubkey destination. You may want to re-read this PR, as well as the original issue.

Do you think I might be able to provide optional channel hints in the extra data field? Would you be interested in incorporating that changeset upstream using a feature flag?

I'm not sure what you're referring to here.

@federicobond
Contributor

For SendToRoute, strict forwarding only matters at the first outbound hop. Also note that it isn't possible to enforce that nodes send on a particular channel, only that they send to a particular pubkey destination.

For end users, that's true. But our use case is different: we are simulating flows of payments for rebalancing research, and we need those payments to be routed exactly through certain channels. As we were running these simulations, we were thrown off by the response of SendToRoute, because it returns a list of specific channels when those channels may not be the ones that were actually used. I think those channel IDs should be removed from the response, as they can only be guaranteed to be true for the first hop.

I'm not sure what you're referring to here.

We need some way to hint the next hop as to what specific channel should be chosen (the node can be configured to either read or ignore it).

@Roasbeef Roasbeef added this to the 0.5 milestone Aug 8, 2018
Contributor

@halseth halseth left a comment

Reads well to me at first read-through!

@@ -608,16 +611,18 @@ func TestSendPaymentErrorPathPruning(t *testing.T) {
t.Fatalf("unable to fetch source node pub: %v", err)
}

roasbeefLuoji := lnwire.NewShortChanIDFromInt(689530843)
Contributor

Where are these IDs coming from?

Contributor Author

From basic_graph.json 😛

Contributor

would recommend using genIDs() to ensure we don't reuse short chan ids

Contributor Author

This is different as the point of the test is to skip the hop between roasbeef and luoji and take an alternate path within the graph in basic_graph.json.

Contributor

Gotcha, I missed that this test also uses that graph. I had assumed you were reusing those chan ids in a different test. I think it should be okay then.

@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch 2 times, most recently from 1bae284 to df7a0ec Compare August 10, 2018 10:03
Contributor

@cfromknecht cfromknecht left a comment

Nice work! Changes look good to me apart from minor comments in test

@@ -608,16 +611,18 @@ func TestSendPaymentErrorPathPruning(t *testing.T) {
t.Fatalf("unable to fetch source node pub: %v", err)
}

roasbeefLuoji := lnwire.NewShortChanIDFromInt(689530843)
Contributor

would recommend using genIDs() to ensure we don't reuse short chan ids

@@ -370,7 +371,7 @@ func TestSendPaymentErrorRepeatedFeeInsufficient(t *testing.T) {
// We'll also fetch the first outgoing channel edge from roasbeef to
// son goku. We'll obtain this as we'll need to to generate the
// FeeInsufficient error that we'll send back.
-chanID := uint64(3495345)
+chanID := uint64(12345)
Contributor

similar comment wrt randomizing sids

@joostjager
Collaborator

I am working on the routing retry behaviour in #1706. If a route fails, it may be that edges are either marked as "failed once" or (on second failure) pruned out completely (not attempted anymore for this payment).

In the light of the discussion in this PR, it could be useful to also acknowledge in the retry behaviour that the actual channel being used isn't known. It may be better to associate failures not with specific channels, but with the two endpoints (pubkeys) of the channel. So instead of pruning channel 123, we prune the set of channels from node A to node B. (In case of a single channel, the result is the same)
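A hedged sketch of what such endpoint-pair pruning could look like (the names here are hypothetical, not lnd's actual router API): failures are keyed on the ordered (from, to) pubkey pair, so one failure prunes every parallel channel between the two nodes at once.

```go
package main

import "fmt"

// nodePair identifies the directed set of channels between two nodes,
// regardless of which specific channel was (reportedly) attempted.
type nodePair struct {
	from, to [33]byte // compressed pubkeys
}

// edgePruner tracks endpoint pairs excluded for the current payment.
type edgePruner struct {
	pruned map[nodePair]struct{}
}

func newEdgePruner() *edgePruner {
	return &edgePruner{pruned: make(map[nodePair]struct{})}
}

// reportFailure prunes every channel from `from` to `to` at once, since
// we can't know which of their channels the hop actually attempted.
func (p *edgePruner) reportFailure(from, to [33]byte) {
	p.pruned[nodePair{from, to}] = struct{}{}
}

// usable reports whether a channel between the pair may still be tried.
func (p *edgePruner) usable(from, to [33]byte) bool {
	_, bad := p.pruned[nodePair{from, to}]
	return !bad
}

func main() {
	var a, b [33]byte
	a[0], b[0] = 0x02, 0x03
	p := newEdgePruner()
	p.reportFailure(a, b)
	fmt.Println(p.usable(a, b), p.usable(b, a))
}
```

With a single channel between the pair, this degenerates to per-channel pruning, matching the parenthetical above.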

@@ -364,8 +365,8 @@ func (s *Switch) SendHTLC(nextNode [33]byte, htlc *lnwire.UpdateAddHTLC,
// system and something wrong happened.
packet := &htlcPacket{
incomingChanID: sourceHop,
outgoingChanID: firstHop,
Contributor

@cfromknecht cfromknecht Aug 15, 2018

Would move this to be after incomingHTLCID for clarity; each chanid/htlcid pair corresponds to one circuit key. Maintaining those groupings has proved helpful.

Contributor Author

Fixed.

@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from df7a0ec to 864abd6 Compare August 15, 2018 22:44
@cfromknecht
Contributor

@joostjager most certainly, that's something we've discussed before as it's very related to this PR.

Since we don't have certainty about what channel the other hops try, the feedback to the router is lossy (and potentially malicious/incorrect). With sufficient fees, multiple channels between the same peers can be effectively treated as one channel from the source's perspective, so the suggestion to prune all channels connecting the pubkeys makes sense to me.

It also shouldn't be an issue if we assume that a node will always try the "best" (or at least a sufficient) channel first, as then there'd be no point retrying if the others are worse. However, this change could still cause us to miss a possible route if they maintain a strict forwarding policy.

At the moment, we try to find the hop with the most bandwidth. I think c-lightning has a max of one channel per peer, so it wouldn't be an issue there. Eclair looks like it abides by the channel requested in the payload, though as a disclaimer, my Scala is a little rusty. Based on that knowledge, it may produce less-than-optimal results when routing through an eclair node that has multiple outgoing channels to the next peer.

@ZapUser77

Don't we encapsulate which channels to use? If we just specify the nodes... we could end up with problems: Say I have two channels from A to B:

I then set the fees of Channel 1 to zero, and channel 2 to 5 BTC... if the sender doesn't know which channel it's going to use... wouldn't this be a problem if others tried to route through that node? If the sender is leaving the determination of which channel to use to the hops... couldn't this be exploited in some way, having different fees for different channels?

Multiple channels definitely should not be treated the same. The best way to get liquidity is to open a channel to a peer and then have them open a channel back. You start out with two channels that are each heavily unbalanced, but as a pair they are perfectly balanced.

C-lightning allows one channel to be opened from each side, so two channels total between peers.

We don't try to find the hop with the most bandwidth; we try to find one with enough bandwidth that costs the least in fees. And there's no way to tell what the local balance is on a remote node's channel, so you have to try all of them that fit.

And from my understanding (which definitely could be wrong) eclair nodes explicitly do not allow "being routed through." But that could just be the phone app... I haven't looked into their desktop app.

As far as pruning an entire route... what exactly do you mean? Do you mean remove all edges along that path from ever being tried again, individually? Or Just the route as a collection? How would that affect channels with rapidly shifting balances that just happened to be insufficient that millisecond that you tried to send through it... but is fine the next?

Is there no way to get feedback on exactly which channel failed?

@Roasbeef
Member

@ZapUser77 I'd recommend taking a look at the code. All those questions can be answered at a quick glance. All the information is available, just on the RPC layer it may not always be bubbled up. There's a PR to return more accurate failure information back to each peer. The sender always knows which channel it's using. Note that this PR is for local dispatch, so we always traverse the channel specified.

@joostjager
Collaborator

@cfromknecht I opened a new PR with some additional analysis on the pruning question, #1734

@ZapUser77

"I'd recommend taking a look at the code. All those questions can be answered at a quick glance."
No. No they can't. You can't expect everyone that you speak with to have in-depth knowledge of every bit of code in LND. Something that's a 'quick glance' for you... could be hours/days for others.

I was replying specifically to "Since we don't have certainty about what channel the other hops try", which your statement "The sender always knows which channel it's using." directly contradicts. And from the bits of code that I had seen, I believed the sender always knew... but didn't want to be a **** about it.

@Roasbeef
Member

@ZapUser77 if you're attempting to join discussions concerning code architecture and design tradeoffs, you're expected to do your homework so that you can positively contribute to the conversation. This is open source, resources are scarce, and we can't sit down and explain all prior decisions or manually walk through the code; you're expected to have a certain level of knowledge if you jump into a discussion.

The sender always knows. We use source routing.

@cfromknecht
Contributor

@joostjager thanks, I will take a look!

@ZapUser77 the sender always knows which channels it wants to use. The sender also knows the outgoing channel they actually use because they choose a channel they own and explicitly update that channel with the direct peer. Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel and there's no way for us to verify which channel they used.

Is there no way to get feedback on exactly which channel failed?

Nope, a node other than your direct peer could lie about which channel it tried to update, or even if it tried at all.

@ZapUser77

Thank you @cfromknecht.

@Roasbeef: I have looked at the code... but no, I haven't read and walked through all 100,000+ lines. Rather than being condescending, perhaps you can just ignore my comments that aren't directed towards you? I'm not asking you to walk me through anything. I know you're busy. And I intentionally do not direct any of my statements to you, as every time you communicate with me, I get this distinct feeling of extreme disdain you have for me. [And if you treat all newcomers to the project like that... that may be why you only have 3-5 people doing 95% of the work.]

"The sender always knows. We use source routing." I know that. However, what you are saying directly contradicts what @cfromknecht is saying: "Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel".

Anyway, I'm trying to lighten your workload, not add more to it. It'll take some time for me to work through all the code and understand it fully.

@Roasbeef
Member

Roasbeef commented Aug 17, 2018

@ZapUser77 if you treat us with respect, we'll do the same towards you. Your veiled jabs at those that actively contribute to the project are unwarranted, and unproductive.

Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel

They aren't required, but they can only send the HTLC to the target node. Therefore, they can't deviate beyond the path aside from cancelling or just holding onto the HTLC. Based on the discussion here, I think we're going to go with strict forwarding everywhere, so even when we forward remote HTLCs.

@ZapUser77

@Roasbeef o_0
Umm, what? I've never made any jabs or shown any disrespect to any of the LND contributors. I have the utmost respect for all of them, esp you. I know you all have drastically more specific knowledge in this field than me.

If you could point me towards anything that you felt was disrespectful to anyone, I'll apologize for it.

"They aren't required" So what Conner said is correct, and your statement "The sender always knows" is incorrect? I was under the impression that the source picked the channels to use, but you're saying it only picks the nodes and 'recommends' channels.

@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from 864abd6 to b24551b Compare August 18, 2018 00:25
@wpaulino
Contributor Author

This PR has been amended to enable strict forwarding all throughout.

In this commit, we address an issue that could arise when using the
SendToRoute RPC. In this RPC, we specify the exact hops that a payment
should take. However, within the switch, we would set a constraint for
the first hop to be any hop as long as the first peer was at the end of
it. This would cause discrepancies when attempting to use the RPC as the
payment would actually go through another hop with the same peer. We fix
this by explicitly specifying the channel ID of the first hop.

In this commit, we constrain our packet forwarding so that the packet is
_deterministically_ forwarded to the outgoing channel as before we would
attempt to select the best link out of all possible links with the
destination peer. This is needed as users begin to manually rebalance
their channels with the SendToRoute RPC, expecting their payment to take
the _exact_ route they crafted.
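The behavioral change described in the commit messages above can be sketched as the difference between two lookups (toy types, not lnd's actual switch code):

```go
package main

import "fmt"

// link is a toy stand-in for a channel link to the destination peer.
type link struct {
	chanID    uint64
	bandwidth uint64 // spendable balance on our side, msat
}

// bestLink models the previous behavior: among every link to the peer,
// pick the one with the most available bandwidth.
func bestLink(links []link) link {
	best := links[0]
	for _, l := range links[1:] {
		if l.bandwidth > best.bandwidth {
			best = l
		}
	}
	return best
}

// strictLink models the amended behavior: forward only on the channel
// the route names, even when a sibling link has more bandwidth.
func strictLink(links []link, chanID uint64) (link, bool) {
	for _, l := range links {
		if l.chanID == chanID {
			return l, true
		}
	}
	return link{}, false
}

func main() {
	links := []link{{chanID: 1, bandwidth: 500_000}, {chanID: 2, bandwidth: 900_000}}
	fmt.Println(bestLink(links).chanID) // highest-bandwidth sibling: 2
	l, _ := strictLink(links, 1)
	fmt.Println(l.chanID) // the exact channel the route named: 1
}
```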
@wpaulino wpaulino force-pushed the local-dispatch-strict-forwarding branch from b24551b to 5f5e44b Compare August 18, 2018 00:32
@joostjager
Collaborator

@wpaulino Is this really a good idea? If a forwarding node has multiple channels with varying balances open to the same next hop, the source node needs to try them all out until it finds the one that can carry the payment. To me it looks like sacrificing the performance of normal payments to benefit the specialized operation of rebalancing.

Besides, you can never be sure that every node is really forwarding strictly. A routing node might reverse this change to generate fewer failed payments and improve its reputation once there is some kind of reputation management for nodes in lnd. Also, as far as I know, the discussion hasn't been settled on whether rebalancing should be an active operation or can also be carried out passively by manipulating fees.

And for rebalancing, how exactly does it hurt to have the forwarding node select the channel? And if it does hurt and makes the operation less efficient, I am quite sure it is still possible to come up with an algorithm that takes into account the uncertainty of the chosen channel. For the execution of payments, a similar challenge exists: how to best process the failure message to maximize the chance of the next route succeeding.

@ZapUser77

Balancing cannot be performed by fees. Fees only affect 'outgoing' transactions, so they're of no benefit for channels that you need incoming transactions on.

"how exactly does it hurt to have the forwarding node select the channel?" Can they not set the fee of one channel really low to get it to show up as the "cheapest hop" then refuse to use it, and instead pick a much more expensive channel to the same node? Is there some mechanism already to prevent that?

@Roasbeef
Member

and instead pick a much more expensive channel to the same node? Is there some mechanism already to prevent that?

The sender will select that cheapest hop and provide the exact amount required to traverse that link. So all fees are already pre-computed; if one node takes more than advertised, then there won't be enough fees to distribute to the rest of the nodes, so the route will fail downstream.
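As a toy illustration of that pre-computation (hypothetical fee policies, using the standard base-plus-proportional fee formula): the sender works backwards from the amount the destination must receive, adding each hop's advertised fee, so a hop that skims extra leaves the downstream hops underfunded.

```go
package main

import "fmt"

// feePolicy is an advertised forwarding fee: a base charge in msat plus
// a proportional part in millionths of the forwarded amount.
type feePolicy struct {
	baseMsat          uint64
	feeRateMillionths uint64
}

func (f feePolicy) fee(amtMsat uint64) uint64 {
	return f.baseMsat + amtMsat*f.feeRateMillionths/1_000_000
}

// perHopAmounts computes, from destination back to source, the exact
// amount each hop must receive so the final hop delivers amtMsat.
func perHopAmounts(amtMsat uint64, hops []feePolicy) []uint64 {
	amts := make([]uint64, len(hops)+1)
	amts[len(hops)] = amtMsat
	for i := len(hops) - 1; i >= 0; i-- {
		amts[i] = amts[i+1] + hops[i].fee(amts[i+1])
	}
	return amts
}

func main() {
	hops := []feePolicy{
		{baseMsat: 1000, feeRateMillionths: 100},
		{baseMsat: 1000, feeRateMillionths: 100},
	}
	fmt.Println(perHopAmounts(1_000_000, hops))
}
```

On a 1,000,000 msat payment over two fee-charging hops this yields [1002200 1001100 1000000]: every hop's cut is fixed in advance, leaving no slack for an overcharging intermediary.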

Balancing can not be performed by fees.

It's one piece of the solution. Consider that you can increase fees on a link in order to prevent its bandwidth from being further depleted. Also consider that by decreasing fees on certain links, I can incentivize HTLCs to be routed towards me, thereby giving me higher bandwidth on a select set of links.

I tend to agree that, for remote HTLCs, it's better network-wide for us to select the best link to the best of our ability.

@ZapUser77

Scenario:
Channel A is all on my side, Channel B is balanced, Channel C is all on far side.
Higher fees only help prevent money from going out of a specific channel; they don't help make money come in on a specific channel.
So, adding a high fee to C wouldn't matter as it's already unbalanced to the max.
And lowering fees doesn't make that channel be used either.
Setting fees on A to zero won't help if 100% of the natural flow is inward.
An actual rebalance command is a valid solution to the need for balancing. (Which I should have in a day or three)

P.S. Thank you for being respectful with your reply.

@Roasbeef
Member

@federicobond there are no actual guarantees w.r.t which link a node will select. As a result, any rebalancing code should itself be written to account for nodes having leeway w.r.t which channel they forward over.

@Roasbeef
Member

Simplified version (no strict remote) committed as 7535371.

@Roasbeef Roasbeef closed this Aug 21, 2018
@wpaulino wpaulino deleted the local-dispatch-strict-forwarding branch August 21, 2018 15:26
Successfully merging this pull request may close these issues.

payinvoice gives wrong path
7 participants