htlcswitch: implement strict forwarding for locally dispatched payments #1527

Closed
wants to merge 4 commits into master from wpaulino:local-dispatch-strict-forwarding
Conversation

7 participants
@wpaulino
Collaborator

wpaulino commented Jul 10, 2018

In this PR, we address an issue that could arise when using the SendToRoute RPC. In this RPC, we specify the exact hops that a payment should take. However, within the switch, the only constraint on the first hop was that it ended at the first peer, so the payment could leave over any channel with that peer. This caused discrepancies when using the RPC, as the payment could actually go out over a different channel with the same peer than the one requested. We fix this by explicitly specifying the channel ID of the first hop.

Fixes #1500 and #1515.
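
For illustration, here is a minimal sketch of the idea. The `htlcPacket` field names follow the diff in this PR, but the rest are simplified stand-ins rather than lnd's actual types:

```go
package main

import "fmt"

// ShortChanID stands in for lnwire.ShortChannelID.
type ShortChanID uint64

// sourceHop marks a packet that originates from the local node.
const sourceHop ShortChanID = 0

// htlcPacket is a pared-down version of the switch's packet type.
type htlcPacket struct {
	incomingChanID ShortChanID
	outgoingChanID ShortChanID
	amountMsat     uint64
}

// newLocalPacket sketches the fix: instead of addressing the first hop
// by peer pubkey and letting the switch pick any link to that peer, the
// caller pins the exact channel the payment must leave on.
func newLocalPacket(firstHop ShortChanID, amountMsat uint64) *htlcPacket {
	return &htlcPacket{
		incomingChanID: sourceHop,
		outgoingChanID: firstHop, // strict: exactly this channel
		amountMsat:     amountMsat,
	}
}

func main() {
	pkt := newLocalPacket(689530843, 10000)
	fmt.Printf("dispatching over channel %d\n", pkt.outgoingChanID)
}
```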

@federicobond


federicobond commented Jul 20, 2018

@wpaulino this fixes only the first hop channel, not the rest of the channels. Is that correct?

@wpaulino


Collaborator

wpaulino commented Jul 20, 2018

@federicobond Correct. We might consider adding a flag that enables strict forwarding all throughout.

@wpaulino wpaulino added needs testing and removed needs rebase labels Jul 20, 2018

@wpaulino wpaulino force-pushed the wpaulino:local-dispatch-strict-forwarding branch from 71d40bf to 7e00637 Jul 20, 2018

@federicobond


federicobond commented Jul 20, 2018

The BOLT for Onion Routing mentions a short_channel_id in the per_hop data of the Sphinx packet but I see that the Sphinx construction for lnd uses a pubkey, which makes it harder to strictly forward such packets.

Is the specification outdated or am I misreading it?

I'm working on a simulation using lnd and need such strict forwarding to compare results.
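
For reference, this is my reading of the legacy per-hop payload from BOLT #4, sketched as a Go struct; it is not lnd's sphinx code, and serialization details are elided. The routing instruction is indeed a short_channel_id on the wire, but a forwarding node only needs it to identify the next peer:

```go
// hopData sketches the BOLT #4 (legacy, realm 0) per-hop payload.
// Field sizes follow the spec's 65-byte layout.
type hopData struct {
	Realm          byte     // 0x00 for the legacy payload format
	ShortChannelID uint64   // channel the sender asks us to forward over
	AmtToForward   uint64   // amount (msat) to forward to the next hop
	OutgoingCLTV   uint32   // CLTV value to use on the outgoing HTLC
	Padding        [12]byte // zero padding up to 32 bytes of per-hop data
	HMAC           [32]byte // authenticates the remainder of the packet
}
```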

@Roasbeef


Member

Roasbeef commented Jul 21, 2018

@federicobond it uses the pubkey as it's possible that we have multiple links to a particular node. In that instance, we know better than the source w.r.t which link we want to actually forward on. AFAIK, we're the only implementation that even supports having multiple channels with a particular peer.
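
As a toy illustration of that selection (hypothetical types, not the switch's real API), picking among several links to the same peer might look like:

```go
// link is a hypothetical stand-in for a channel link to a peer.
type link struct {
	chanID    uint64
	bandwidth uint64 // spendable balance in msat
}

// bestLinkTo picks among our possibly-many links to one peer. The
// sender only named the peer's pubkey, so the forwarding node is free
// to choose whichever of its links can actually carry the amount.
func bestLinkTo(links []link, amtMsat uint64) (link, bool) {
	var best link
	found := false
	for _, l := range links {
		if l.bandwidth >= amtMsat && (!found || l.bandwidth > best.bandwidth) {
			best, found = l, true
		}
	}
	return best, found
}
```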

@federicobond


federicobond commented Jul 21, 2018

Yes, and I agree it’s the most sensible thing to do for each hop, though I would be more careful with the answer from SendToRoute, which assumes that those are the actual channels taken along the route.

Do you think I might be able to provide optional channel hints in the extra data field? Would you be interested in incorporating that changeset upstream using a feature flag?

We need it for some rebalancing simulations we are doing at Muun.

@Roasbeef


Member

Roasbeef commented Jul 22, 2018

> though I would be more careful with the answer from SendToRoute, which assumes that those are the actual channels taken along the route.

For SendToRoute, strict forwarding only matters at the first outbound hop. Also note that it isn't possible to enforce that nodes send on a particular channel, only that they send to a particular pubkey destination. You may want to re-read this PR, as well as the original issue.

> Do you think I might be able to provide optional channel hints in the extra data field? Would you be interested in incorporating that changeset upstream using a feature flag?

I'm not sure what you're referring to here.

@federicobond


federicobond commented Jul 23, 2018

> For SendToRoute, strict forwarding only matters at the first outbound hop. Also note that it isn't possible to enforce that nodes send on a particular channel, only that they send to a particular pubkey destination.

For end users, that's true. But our use case is different: we are simulating flows of payments for rebalancing research, and we need those payments to be routed exactly through certain channels. As we were running these simulations, we were thrown off by the response of SendToRoute, because it returns a list of specific channels, when those channels may not be the ones that were actually used. I think those channel IDs should be removed from the response, as they can only be guaranteed to be true for the first hop.

> I'm not sure what you're referring to here.

We need some way to hint the next hop as to what specific channel should be chosen (the node can be configured to either read or ignore it).

@Roasbeef Roasbeef added this to the 0.5 milestone Aug 8, 2018

@halseth

Reads well to me at first read through!

@@ -608,16 +611,18 @@ func TestSendPaymentErrorPathPruning(t *testing.T) {
t.Fatalf("unable to fetch source node pub: %v", err)
}
roasbeefLuoji := lnwire.NewShortChanIDFromInt(689530843)


@halseth

halseth Aug 10, 2018

Collaborator

Where are these IDs coming from?


@wpaulino

wpaulino Aug 10, 2018

Collaborator

From basic_graph.json 😛


@cfromknecht

cfromknecht Aug 10, 2018

Collaborator

would recommend using genIDs() to ensure we don't reuse short chan ids


@wpaulino

wpaulino Aug 10, 2018

Collaborator

This is different as the point of the test is to skip the hop between roasbeef and luoji and take an alternate path within the graph in basic_graph.json.


@cfromknecht

cfromknecht Aug 15, 2018

Collaborator

Gotcha, I missed that this test also uses that graph. I had assumed you were reusing those chan ids in a diff test. I think it should be okay then.

@wpaulino wpaulino force-pushed the wpaulino:local-dispatch-strict-forwarding branch 2 times, most recently from 1bae284 to df7a0ec Aug 10, 2018

@cfromknecht

Nice work! Changes look good to me apart from minor comments in test


@@ -370,7 +371,7 @@ func TestSendPaymentErrorRepeatedFeeInsufficient(t *testing.T) {
// We'll also fetch the first outgoing channel edge from roasbeef to
// son goku. We'll obtain this as we'll need it to generate the
// FeeInsufficient error that we'll send back.
- chanID := uint64(3495345)
+ chanID := uint64(12345)


@cfromknecht

cfromknecht Aug 10, 2018

Collaborator

similar comment wrt randomizing sids

@joostjager


Collaborator

joostjager commented Aug 13, 2018

I am working on the routing retry behaviour in #1706. If a route fails, edges are either marked as "failed once" or (on second failure) pruned out completely (not attempted anymore for this payment).

In light of the discussion in this PR, it could be useful for the retry behaviour to also acknowledge that the actual channel being used isn't known. It may be better to associate failures not with specific channels, but with the two endpoints (pubkeys) of the channel. So instead of pruning channel 123, we prune the set of channels from node A to node B. (In the case of a single channel, the result is the same.)
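
A sketch of that pruning idea (hypothetical types; lnd's actual bookkeeping differs): failures are recorded against the (from, to) pubkey pair, so all parallel channels between two nodes are pruned together:

```go
// nodePair identifies the endpoints shared by one or more channels.
type nodePair struct {
	from, to [33]byte // compressed pubkeys
}

// prunedPairs tracks edges to avoid for the rest of a payment attempt,
// keyed by endpoints rather than a specific channel ID, since we can't
// know which of the parallel channels actually failed.
type prunedPairs map[nodePair]struct{}

func (p prunedPairs) prune(from, to [33]byte) {
	p[nodePair{from: from, to: to}] = struct{}{}
}

func (p prunedPairs) isPruned(from, to [33]byte) bool {
	_, ok := p[nodePair{from: from, to: to}]
	return ok
}
```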

@@ -364,8 +365,8 @@ func (s *Switch) SendHTLC(nextNode [33]byte, htlc *lnwire.UpdateAddHTLC,
// system and something wrong happened.
packet := &htlcPacket{
incomingChanID: sourceHop,
outgoingChanID: firstHop,


@cfromknecht

cfromknecht Aug 15, 2018

Collaborator

would move this to be after incomingHTLCID for clarity, each chanid/htlcid pair corresponds to one circuit key. maintaining those groupings has proved helpful


@wpaulino

wpaulino Aug 15, 2018

Collaborator

Fixed.
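
For context on the grouping mentioned above, roughly sketched (simplified from lnd's actual circuit map types): each chanID/htlcID pair forms one circuit key, and a circuit pairs the incoming and outgoing halves of a forward:

```go
// circuitKey uniquely identifies one half of a payment circuit: a
// specific HTLC index within a specific channel.
type circuitKey struct {
	chanID uint64 // short channel ID of the link
	htlcID uint64 // index of the HTLC within that channel
}

// circuit joins the incoming and outgoing halves of a forward, which
// is why keeping each chanID next to its htlcID aids readability.
type circuit struct {
	incoming circuitKey
	outgoing circuitKey
}
```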

@wpaulino wpaulino force-pushed the wpaulino:local-dispatch-strict-forwarding branch from df7a0ec to 864abd6 Aug 15, 2018

@cfromknecht


Collaborator

cfromknecht commented Aug 15, 2018

@joostjager most certainly, that's something we've discussed before as it's very related to this PR.

Since we don't have certainty about what channel the other hops try, the feedback to the router is lossy (and potentially malicious/incorrect). With sufficient fees, multiple channels between the same peers can be effectively treated as one channel from the source's perspective, so the suggestion to prune all channels connecting the pubkeys makes sense to me.

It also shouldn't be an issue if we assume that a node will always try the "best" (or at least a sufficient) channel first, as then there'd be no point retrying if the others are worse. However, this change could still cause us to miss a possible route if they maintain a strict forwarding policy.

Atm we will try to find the hop with the most bandwidth. I think c-lightning has a max of one channel per peer, so it wouldn't be an issue there. Eclair looks like it abides by the channel requested in the payload, though as a disclaimer, my scala is a little rusty. Just based on that knowledge, it may produce less-than-optimal results when routing through an eclair node that has multiple outgoing channels with the next peer.

@ZapUser77


ZapUser77 commented Aug 16, 2018

Don't we encapsulate which channels to use? If we just specify the nodes... we could end up with problems: Say I have two channels from A to B:

I then set the fees of Channel 1 to zero, and channel 2 to 5 BTC... if the sender doesn't know which channel it's going to use... wouldn't this be a problem if others tried to route through that node? If the sender is leaving the determination of which channel to use to the hops... couldn't this be exploited in some way, having different fees for different channels?

Multiple channels definitely should not be treated the same. The best way to get liquidity is to open a channel to a peer, and then have them open a channel back. You start out with two channels that are excessively unbalanced, but as a pair they are perfectly balanced.

C-lightning allows one channel to be opened from each side, so two channels total between peers.

We don't try to find the hop with the most bandwidth, we try to find one with enough bandwidth that costs the least in fees. And there's no way to tell what the local balance is on a remote node's channel, so you have to try all of them that fit.

And from my understanding (which definitely could be wrong) eclair nodes explicitly do not allow "being routed through." But that could just be the phone app... I haven't looked into their desktop app.

As far as pruning an entire route... what exactly do you mean? Do you mean remove all edges along that path from ever being tried again, individually? Or just the route as a collection? How would that affect channels with rapidly shifting balances that just happened to be insufficient the millisecond you tried to send through them... but are fine the next?

Is there no way to get feedback on exactly which channel failed?

@Roasbeef


Member

Roasbeef commented Aug 16, 2018

@ZapUser77 I'd recommend taking a look at the code. All those questions can be answered at a quick glance. All the information is available, just on the RPC layer it may not always be bubbled up. There's a PR to return more accurate failure information back to each peer. The sender always knows which channel it's using. Note that this PR is for local dispatch, so we always traverse the channel specified.

@joostjager


Collaborator

joostjager commented Aug 16, 2018

@cfromknecht I opened a new PR with some additional analysis on the pruning question, #1734

@ZapUser77


ZapUser77 commented Aug 16, 2018

"I'd recommend taking a look at the code. All those questions can be answered at a quick glance."
No. No they can't. You can't expect everyone that you speak with to have in-depth knowledge of every bit of code in LND. Something that's a 'quick glance' for you... could be hours/days for others.

I was replying specifically to "Since we don't have certainty about what channel the other hops try", which what you're saying ("The sender always knows which channel it's using.") directly contradicts. And from the bits of code that I had seen, I believed the sender always knew... but didn't want to be a **** about it.

@Roasbeef


Member

Roasbeef commented Aug 17, 2018

@ZapUser77 if you're attempting to join discussions concerning code architecture and design tradeoffs, you're expected to do your homework such that you can positively contribute to the conversation. This is open source, resources are scarce, and we can't sit down and explain all prior decisions or manually walk through the code; you're expected to have a certain level of knowledge if you jump into a discussion.

The sender always knows. We use source routing.

@cfromknecht


Collaborator

cfromknecht commented Aug 17, 2018

@joostjager thanks, I will take a look!

@ZapUser77 the sender always knows which channels it wants to use. The sender also knows the outgoing channel they actually use because they choose a channel they own and explicitly update that channel with the direct peer. Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel and there's no way for us to verify which channel they used.

> Is there no way to get feedback on exactly which channel failed?

Nope, a node other than your direct peer could lie about which channel it tried to update, or even if it tried at all.

@ZapUser77


ZapUser77 commented Aug 17, 2018

Thank you @cfromknecht.

@Roasbeef: I have looked at the code... but no, I haven't read and walked through all 100,000+ lines. Rather than be condescending, perhaps you can just ignore my comments that aren't directed towards you? I'm not asking you to walk me through anything. I know you're busy. And I intentionally do not direct any of my statements to you, as every time you communicate with me, I get this distinct feeling of extreme disdain you have for me. [And if you treat all newcomers to the project like that... that may be why you only have 3-5 people doing 95% of the work.]

"The sender always knows. We use source routing" I know that. However, what You are saying is directly contradicting what @cfromknecht is saying "Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel".

Anyway, I'm trying to lighten your workload, not add more to it. It'll take some time for me to work through all the code and understand it fully.

@Roasbeef


Member

Roasbeef commented Aug 17, 2018

@ZapUser77 if you treat us with respect, we'll do the same towards you. Your veiled jabs at those that actively contribute to the project are unwarranted, and unproductive.

> Once the HTLC travels beyond the first channel, the other nodes aren't required to follow the requested channel

They aren't required to, but they can only send the HTLC to the target node. Therefore, they can't deviate from the path aside from cancelling or just holding onto the HTLC. Based on the discussion here, I think we're going to go with strict forwarding everywhere, even when we forward remote HTLCs.

@ZapUser77


ZapUser77 commented Aug 17, 2018

@Roasbeef o_0
Umm, what? I've never made any jabs or shown any disrespect to any of the LND contributors. I have the utmost respect for all of them, esp you. I know you all have drastically more specific knowledge in this field than me.

If you could point me towards anything that you felt was disrespectful to anyone, I'll apologize for it.

"They aren't required" So what Conner said is correct, and you're statement "The sender always knows" is incorrect? I was under the impression that the source picked the channels to use, but you're saying it only picks the nodes, and 'recommends' channels.

@wpaulino wpaulino force-pushed the wpaulino:local-dispatch-strict-forwarding branch from 864abd6 to b24551b Aug 18, 2018

@wpaulino


Collaborator

wpaulino commented Aug 18, 2018

This PR has been amended to enable strict forwarding all throughout.

wpaulino added some commits Jul 10, 2018

htlcswitch: implement strict forwarding for locally dispatched payments
In this commit, we address an issue that could arise when using the
SendToRoute RPC. In this RPC, we specify the exact hops that a payment
should take. However, within the switch, the only constraint on the
first hop was that it ended at the first peer, so the payment could
leave over any channel with that peer. This caused discrepancies when
using the RPC, as the payment could actually go out over a different
channel with the same peer than the one requested. We fix this by
explicitly specifying the channel ID of the first hop.
htlcswitch: implement strict forwarding for forwarded payments
In this commit, we constrain our packet forwarding so that the packet
is _deterministically_ forwarded to the specified outgoing channel;
before, we would attempt to select the best link out of all possible
links with the destination peer. This is needed as users begin to
manually rebalance their channels with the SendToRoute RPC, expecting
their payment to take the _exact_ route they crafted.
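
As an illustration of the amended behavior (again with toy types, not lnd's actual switch code), the best-link selection is replaced by a deterministic lookup that fails rather than falling back to a sibling channel:

```go
package main

import (
	"errors"
	"fmt"
)

// link is a hypothetical stand-in for a channel link.
type link struct {
	chanID    uint64
	bandwidth uint64 // spendable balance in msat
}

// getLinkStrict forwards only over the exact channel requested in the
// onion payload. If that link is missing or can't carry the amount,
// the HTLC fails back to the sender instead of being rerouted over
// another channel with the same peer.
func getLinkStrict(links map[uint64]link, outgoingChanID, amtMsat uint64) (link, error) {
	l, ok := links[outgoingChanID]
	if !ok {
		return link{}, errors.New("unknown outgoing channel")
	}
	if l.bandwidth < amtMsat {
		return link{}, errors.New("insufficient bandwidth on requested channel")
	}
	return l, nil
}

func main() {
	links := map[uint64]link{12345: {chanID: 12345, bandwidth: 50000}}
	if _, err := getLinkStrict(links, 12345, 60000); err != nil {
		fmt.Println("forward failed:", err) // no fallback is attempted
	}
}
```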

@wpaulino wpaulino force-pushed the wpaulino:local-dispatch-strict-forwarding branch from b24551b to 5f5e44b Aug 18, 2018

@joostjager


Collaborator

joostjager commented Aug 18, 2018

@wpaulino Is this really a good idea? If a forwarding node has multiple channels with varying balances open to the same next hop, the source node needs to try them all out until it finds the one that can carry the payment. To me it looks like sacrificing the performance of normal payments to benefit the specialized operation of rebalancing.

Besides, you can never be sure that every node is really forwarding strictly. A routing node might reverse this change to generate fewer failed payments and improve its reputation once there is some kind of reputation management for nodes in lnd. Also, as far as I know, the discussion hasn't been settled on whether rebalancing should be an active operation or can also be achieved passively by manipulating fees.

And for rebalancing, how exactly does it hurt to have the forwarding node select the channel? And if it does hurt and makes the operation less efficient, I am quite sure it is still possible to come up with an algorithm that takes into account the uncertainty of the chosen channel. A similar challenge exists for the execution of payments: how to best process the failure message to maximize the chance that the next route succeeds.

@ZapUser77


ZapUser77 commented Aug 18, 2018

Balancing cannot be performed by fees. Fees only affect 'outgoing' transactions, so they'd be of no benefit to channels that you need incoming transactions on.

"how exactly does it hurt to have the forwarding node select the channel?" Can they not set the fee of one channel really low to get it to show up as the "cheapest hop" then refuse to use it, and instead pick a much more expensive channel to the same node? Is there some mechanism already to prevent that?

@Roasbeef


Member

Roasbeef commented Aug 21, 2018

> and instead pick a much more expensive channel to the same node? Is there some mechanism already to prevent that?

The sender will select that cheapest hop, and provide the exact amount required to traverse that link. So all fees are already pre-computed: if one node takes more than advertised, there won't be enough fees to distribute to the rest of the nodes, and the route will fail downstream.

> Balancing can not be performed by fees.

It's one piece of the solution. Consider that you can increase fees for a link in order to prevent its bandwidth from being further depleted. Also consider that by decreasing fees on certain links, I can incentivize HTLCs to be routed towards me, thereby giving me higher bandwidth on a select set of links.

I tend to agree that it's better network-wide for us to select the best link to the best of our ability in the case of remote HTLCs.
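
To make the pre-computed fees point concrete, here's a toy sender-side calculation (the feePolicy type and all numbers are invented for illustration). The sender works backwards from the destination amount, so every hop's fee is fixed up front; a hop that takes more than it advertised leaves too little for the hops after it:

```go
package main

import "fmt"

// feePolicy is an advertised channel fee schedule (illustrative).
type feePolicy struct {
	baseMsat uint64 // flat fee per forward, in msat
	ratePPM  uint64 // proportional fee, parts per million
}

func (f feePolicy) fee(amtMsat uint64) uint64 {
	return f.baseMsat + amtMsat*f.ratePPM/1_000_000
}

func main() {
	const destAmt = 1_000_000 // msat the destination should receive

	// Hops ordered from the destination back towards the sender: each
	// hop must receive what it forwards plus its own advertised fee.
	hops := []feePolicy{
		{baseMsat: 1000, ratePPM: 100}, // hop closest to the destination
		{baseMsat: 2000, ratePPM: 500},
	}
	amt := uint64(destAmt)
	for i, h := range hops {
		amt += h.fee(amt)
		fmt.Printf("hop %d must receive %d msat\n", len(hops)-i, amt)
	}
	fmt.Printf("sender sends %d msat in total\n", amt)
}
```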

@ZapUser77


ZapUser77 commented Aug 21, 2018

Scenario:
Channel A is all on my side, Channel B is balanced, Channel C is all on far side.
Higher fees only help prevent money from going out of a specific channel. They don't help make money come in on a specific channel.
So, adding a high fee to C wouldn't matter as it's already unbalanced to the max.
And lowering fees doesn't make that channel get used either.
Setting fees on A to zero won't help if 100% of the natural flow is inward.
An actual rebalance command is a valid solution to the need for balancing. (Which I should have in a day or three)

P.S. Thank you for being respectful with your reply.

@Roasbeef


Member

Roasbeef commented Aug 21, 2018

@federicobond there are no actual guarantees w.r.t which link a node will select. As a result, any rebalancing code should itself be written to account for nodes having leeway w.r.t which channel they forward over.

@Roasbeef


Member

Roasbeef commented Aug 21, 2018

Simplified version (no strict remote) committed as 7535371.

@Roasbeef Roasbeef closed this Aug 21, 2018

@wpaulino wpaulino deleted the wpaulino:local-dispatch-strict-forwarding branch Aug 21, 2018
