routing: prune based on channel sets instead of channels #1734
Things have moved on since this PR was opened. I pushed an updated version of node pair pruning in mission control.
Advantages of node pair pruning:
ORIGINAL PR DESCRIPTION
In light of the discussion in #1527, it could be useful for the retry behaviour to also acknowledge that the channel actually used for forwarding isn't known. It may be better to associate failures not with specific channels, but with the two endpoints (pubkeys) of the channel. So instead of pruning channel 123, we prune the set of channels from node A to node B. (If there is only a single channel, the result is the same.)
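To make the difference concrete, here is a minimal sketch contrasting the two pruning styles on a toy graph with two parallel channels from A to B. The `Channel` type and `usable_channels` helper are hypothetical illustrations, not lnd code, and node pubkeys are simplified to names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    chan_id: int
    from_node: str  # forwarding node (pubkey simplified to a name)
    to_node: str    # next node


def usable_channels(graph, pruned_chans=frozenset(), pruned_pairs=frozenset()):
    """Return the channels excluded by neither channel pruning nor pair pruning."""
    return [
        c for c in graph
        if c.chan_id not in pruned_chans
        and (c.from_node, c.to_node) not in pruned_pairs
    ]


graph = [
    Channel(123, "A", "B"),
    Channel(124, "A", "B"),  # parallel channel between the same two nodes
    Channel(200, "A", "C"),
]

# Channel-based pruning: only the reported channel 123 is excluded.
by_channel = [c.chan_id for c in usable_channels(graph, pruned_chans={123})]
# Channel-set pruning: every channel from A to B is excluded.
by_pair = [c.chan_id for c in usable_channels(graph, pruned_pairs={("A", "B")})]
print(by_channel)  # [124, 200]
print(by_pair)     # [200]
```

With channel pruning the next attempt can still route over 124; with channel-set pruning it must avoid A->B entirely.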
So for Eclair what would happen:
In the last case, channel set pruning will result in a sub-optimal outcome. With the current channel-based pruning, the third route calculation may retry channel 125 with the new policy applied, or come up with a third channel E->F that hasn't been tried yet.
If the forwarding node (L) is an lnd node,
So actually, in both the eclair and lnd cases, channel set pruning can be sub-optimal.
So maybe it isn't a good idea at all for policy failures?
Other scenarios to think about involve a malicious or buggy node. I am not sure whether such a node could keep the source node busy rerouting for a while by sending back channel updates for channels other than the one that was requested (setting failed-once markers on irrelevant channels, so S keeps trying). It might be worthwhile to match the channel id in the update against the requested channel id as a way to mitigate this.
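The suggested mitigation could look roughly like the sketch below. It assumes a simplified model in which the sender grants each channel one retry after a policy failure (the "failed once" marker) and prunes on the second failure. `ChannelUpdate`, `accept_update` and the `failed_once` set are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelUpdate:
    chan_id: int
    fee_base_msat: int
    fee_rate_ppm: int


def accept_update(requested_chan_id, update, failed_once):
    """Decide whether a channel update returned in a failure may be applied.

    Returns True if the update should be applied and the route retried.
    `failed_once` holds channel ids that already used their single retry.
    """
    if update.chan_id != requested_chan_id:
        # The update is for a different channel than the one we asked the
        # node to use: ignore it instead of marking an unrelated channel.
        return False
    if requested_chan_id in failed_once:
        # Second failure on the same channel: do not retry, prune instead.
        return False
    failed_once.add(requested_chan_id)
    return True
```

Because a mismatched update never sets a marker, a node cannot make the sender cycle through markers on channels it was never asked to use.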
On the other hand, edges are also pruned without a channel update:
I think a fundamental decision is whether to base the logic on
Ideas are welcome.
@joostjager really dig these changes; they should help drive down the number of failed attempts needed to make a successful payment. Did an initial pass and the changes read well to me.
One thing that I'm not sure of is if we should always prune the edge set in response to a
In the most literal sense, it would seem that we should only apply this to the specific edges that failed. Given the current state of the network, however, these failures could also be symptoms of unstable hops, which perhaps should be avoided altogether. There seems to be a balance, but I don't have enough data to really point one way or another. Do you have any more thoughts on this?
Down the road, these failures could be fed directly into a stochastic mission control, which would learn or apply the correct heuristic. At the moment, though, we may have to determine this through experimentation.
The problem remains that you cannot be sure which edge really failed. The more I think about it, the more I am in favor of not making any assumptions, not only to deal with buggy nodes properly, but also in case malicious nodes appear in the future.
Suppose the hop payload did not contain a channel id, but just the pubkey of the next hop.
Edge set pruning was already a smaller potential problem for
For c-lightning it wasn't a problem, as you said, because it only allows a single channel per peer.
That leaves Eclair. What could happen there is that we wouldn't try all channels from the forwarding node to the next node, but would stop after two failures, even though a channel with enough balance may exist. I don't know how many relevant Eclair forwarding nodes there currently are, what Eclair's plans are for changing its forwarding logic to what lnd does (which would also increase the fees earned by forwarding nodes), how often routing needs to probe more than two channels between the same pair of nodes, how unbalanced those channels are, etc. It is an uncertainty, but maybe we can try it out when we are working on the probability machine anyway, since we will need to evaluate different models and parameters there as well.
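The Eclair concern can be illustrated with a deliberately simplified simulation. It assumes the forwarding node only uses the requested channel, that path finding tries parallel channels in a fixed order, and that pair pruning gives up on the pair after a hypothetical `max_pair_failures = 2` failures. `attempts_until_success` is an illustrative helper, not lnd code.

```python
def attempts_until_success(channels, working, pair_pruning, max_pair_failures=2):
    """Count payment attempts against a node that only uses the requested channel.

    channels: parallel channel ids between the same two nodes, in the order
        path finding would try them.
    working: the one channel id with enough balance to forward the payment.
    Returns the number of attempts needed, or None if the pair gets pruned
    (or all channels are exhausted) before the working channel is tried.
    """
    pair_failures = 0
    for attempt, chan_id in enumerate(channels, start=1):
        if chan_id == working:
            return attempt
        pair_failures += 1
        if pair_pruning and pair_failures >= max_pair_failures:
            # Every channel between the two nodes is now excluded.
            return None
    return None


# Channel pruning eventually reaches channel 3; pair pruning stops after two failures.
print(attempts_until_success([1, 2, 3], working=3, pair_pruning=False))  # 3
print(attempts_until_success([1, 2, 3], working=3, pair_pruning=True))   # None
```

When the working channel happens to be tried first or second, both strategies succeed, which is why the real impact depends on how often more than two channels must be probed.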
Based on the findings regarding Eclair above, and also the section on non-strict forwarding that is to be added to the spec, I think we can safely proceed with these changes now. The one open question I have (possibly answered above) is how we will deal with fee errors when there are multiple channels to a node with distinct fees. Will we simply assume that there's no reason to do this, and not try anything fancy with respect to errors sent back to the sender?
I would say that we only prune the channels that would require a fee of at most what we paid in the attempt for which we got the error. If all channels have the same policy, this means we prune all channels. If there are cheaper channels, we prune those too. More expensive ones are left in for a new path finding round. The time lock needs to be taken into account too. So: prune all channels with a lower or equal fee and a shorter or equal time lock delta.
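A minimal sketch of this rule, assuming the fee a channel requires is computed with the BOLT 7 formula `fee_base_msat + amount * fee_proportional_millionths / 1_000_000` (integer division) and that the attempt's fee and time lock delta are known. `Policy`, `required_fee` and `channels_to_prune` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    chan_id: int
    fee_base_msat: int
    fee_rate_ppm: int     # fee_proportional_millionths
    time_lock_delta: int  # cltv_expiry_delta


def required_fee(policy, amt_msat):
    """Fee this channel would charge to forward amt_msat (BOLT 7 formula)."""
    return policy.fee_base_msat + amt_msat * policy.fee_rate_ppm // 1_000_000


def channels_to_prune(policies, amt_msat, paid_fee_msat, paid_time_lock_delta):
    """Ids of channels between the failing pair that ask for no more than the
    failed attempt offered, in both fee and time lock delta."""
    return {
        p.chan_id
        for p in policies
        if required_fee(p, amt_msat) <= paid_fee_msat
        and p.time_lock_delta <= paid_time_lock_delta
    }


policies = [
    Policy(1, 1000, 1, 40),   # requires 1001 msat: same as the attempt
    Policy(2, 1000, 100, 40), # requires 1100 msat: more expensive, kept
    Policy(3, 0, 1, 144),     # cheap, but a longer time lock delta: kept
    Policy(4, 500, 1, 40),    # requires 501 msat: cheaper, pruned
]
pruned = channels_to_prune(policies, amt_msat=1_000_000,
                           paid_fee_msat=1001, paid_time_lock_delta=40)
print(sorted(pruned))  # [1, 4]
```

Channels 2 and 3 stay available for the next path finding round because they could still succeed under a more generous fee or time lock.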
halseth left a comment
Nice, I really like the direction this is going! Diff was much smaller than I expected, and separating pair failures from node failures also makes the code easier to reason about.
LGTM, but only did a high-level pass for now, will circle back to it.
Previously, mission control tracked failures on a per-node, per-channel basis. This commit changes this to tracking at the level of directed node pairs. The goal of moving to this coarser-grained level is to reduce the number of required payment attempts without compromising payment reliability.
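As a rough sketch of what tracking at the level of directed node pairs, kept separate from node failures, might look like: the `MissionControl` class, its `prune_window` expiry (a hypothetical one-hour value) and the method names below are illustrative assumptions, not lnd's actual API. Time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class MissionControl:
    # Hypothetical: how long (seconds) a recorded failure affects path finding.
    prune_window: float = 3600.0
    node_failures: dict = field(default_factory=dict)  # pubkey -> failure time
    pair_failures: dict = field(default_factory=dict)  # (from, to) -> failure time

    def report_pair_failure(self, from_node, to_node, now):
        self.pair_failures[(from_node, to_node)] = now

    def report_node_failure(self, node, now):
        self.node_failures[node] = now

    def _recent(self, ts, now):
        return ts is not None and now - ts < self.prune_window

    def is_pruned(self, from_node, to_node, now):
        """True if no channel from from_node to to_node should be used."""
        # A failed node excludes every pair it takes part in.
        if self._recent(self.node_failures.get(from_node), now):
            return True
        if self._recent(self.node_failures.get(to_node), now):
            return True
        # Pairs are directed: a failure for A->B says nothing about B->A.
        return self._recent(self.pair_failures.get((from_node, to_node)), now)


mc = MissionControl()
mc.report_pair_failure("A", "B", now=0)
print(mc.is_pruned("A", "B", now=10))  # True
print(mc.is_pruned("B", "A", now=10))  # False
```

Keying pair failures by the ordered tuple keeps the direction, so a failure forwarding from A to B does not penalize payments flowing from B to A.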