lnrpc: deprecate QueryRoutes with more than one route #2497

joostjager · 2019-01-17T13:45:24Z

This PR marks the num_routes argument of the query routes rpc as deprecated.

The use case for returning multiple routes is limited at best, but it does require maintaining a dedicated algorithm (k-shortest) in lnd.

It also encourages blindly attempting the second best route without interpreting the outcome of the best route, which is likely to result in less efficient payments.

The recommended approach is to request a single route, attempt payment, process the outcome if it failed (e.g. blacklist edges), and request another route that takes the outcome of the first attempt into account.

Currently QueryRoutes with multiple routes may have a use in special scripts (probing, rebalancing, ..). To keep facilitating those, a follow up PR will add any source to destination queries to QueryRoutes.

halseth · 2019-01-17T20:17:07Z

Is the workaround you mention (blacklisting edges) currently possible? Maybe we should provide that before deprecating this.

joostjager · 2019-01-17T20:46:47Z

No, not possible currently.

But if we want to go this way, I thought it is better to signal it sooner rather than later.

Roasbeef · 2019-01-18T00:35:08Z

People can still use it while we mark it deprecated. Before we remove it altogether, we should have the replacement in place though.

lnrpc/rpc.proto

halseth · 2019-01-22T09:30:47Z

Change LGTM.

Question is still whether we should wait to have a replacement before deprecating, but as you say maybe best to just start signaling right away.

EDIT: Would be nice to be able to point to the "new way" of doing it when deprecating.

treygriffith · 2019-01-22T22:20:55Z

As a user of the QueryRoutes rpc, our use case for multiple routes is to implement our own maximum time lock similar to the maximum fee that is already supported by the rpc. I suppose you could do this with edge blacklisting as well, but I'm not sure that's the right tool for the job.
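The client-side maximum time lock described here amounts to filtering a batch of returned routes. A minimal sketch, assuming each route is a plain dictionary; the keys `total_time_lock` and `total_fees` and the helper name are illustrative and are not taken from this thread:

```python
def cheapest_within_time_lock(routes, max_time_lock):
    """Return the lowest-fee route whose time lock does not exceed the limit."""
    # Keep only routes that satisfy the caller's time lock limit.
    eligible = [r for r in routes if r["total_time_lock"] <= max_time_lock]
    # min() with default=None returns None when no route qualifies.
    return min(eligible, key=lambda r: r["total_fees"], default=None)


# Hypothetical batch of routes, as a multi-route query might return them.
routes = [
    {"total_time_lock": 600, "total_fees": 1},
    {"total_time_lock": 300, "total_fees": 3},
    {"total_time_lock": 200, "total_fees": 5},
]
print(cheapest_within_time_lock(routes, max_time_lock=400))
```

The cheapest route overall is rejected because its time lock is too high, which is why a single best-by-fee route is not enough for this use case unless the limit is applied during path finding itself.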

joostjager · 2019-01-23T07:54:33Z

@treygriffith thanks for mentioning your use case.

I don't think it works very well with black listing, but #1941 should do it.

ZapUser77 · 2019-02-07T05:01:05Z

This seems like a bad idea.
Scenario:
"best" path yielded is A-B-C-D-E-F
However, on B-C 100% of the balance is on C's side, so this route can't be used. With only 1 route returned, you have no alternative, and no way of knowing which channel is the problem. And "blacklisting" a channel isn't a solution unless it's a single-use blacklist and you have some way to determine exactly which edge is the problem.

"To keep facilitating those, a follow up PR will add any source to destination queries to QueryRoutes."
This will definitely be beneficial.

joostjager · 2019-02-07T08:14:56Z

First thing to note is that returning multiple routes alleviates the problem you describe above somewhat, but it is still not a real solution. It could still be that all returned routes fail and that the route that would work is not in the set. You could ask for all existing routes, but that quickly becomes infeasible.

The plan is that before we actually remove num_routes, we would have any-to-any route queries with black list implemented. Then it is always possible to implement the k-shortest algorithm client side.

In addition to that, we are working towards returning structured errors from SendToRoute (#1662), which will allow a client to pinpoint the failing edge, update a (single use) black list and pass that into query routes.
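The loop described above (query one route, attempt it, blacklist the failing edge, query again) can be sketched as follows. This is only an illustration under stated assumptions: `find_route` is a local Dijkstra over a toy fee graph standing in for a single-route query, and `attempt` is a hypothetical callback that returns the failing edge or `None` on success. Neither is an lnd API.

```python
import heapq


def find_route(graph, source, target, ignored_edges=frozenset()):
    """Cheapest path by fee; a stand-in for one single-route query."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, fee in graph.get(node, {}).items():
            if (node, neighbor) not in ignored_edges and neighbor not in visited:
                heapq.heappush(queue, (cost + fee, neighbor, path + [neighbor]))
    return None


def pay_with_retries(graph, source, target, attempt, max_attempts=10):
    """Query one route, attempt it, blacklist the failing edge, query again."""
    ignored = set()  # single-use blacklist, valid for this payment only
    for _ in range(max_attempts):
        route = find_route(graph, source, target, ignored)
        if route is None:
            return None  # nothing left that avoids the blacklisted edges
        failed_edge = attempt(route)
        if failed_edge is None:
            return route
        ignored.add(failed_edge)
    return None


# Toy graph: node -> {neighbor: fee}. The B->D channel has no usable balance.
graph = {"A": {"B": 1, "C": 5}, "B": {"D": 1}, "C": {"D": 1}, "D": {}}
depleted = {("B", "D")}


def attempt(route):
    """Hypothetical payment attempt: report the first depleted edge, if any."""
    for edge in zip(route, route[1:]):
        if edge in depleted:
            return edge
    return None


print(pay_with_retries(graph, "A", "D", attempt))
```

The blacklist is created inside the function so that it is discarded after the payment, matching the "single use" requirement raised earlier in the thread.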

ZapUser77 · 2019-02-07T08:34:01Z

Thanks.
Would the sender know exactly which channel failed in the new errors?
Is "insufficient local balance" covered under "temporary channel failure"?
This is something that perplexes me about AMP as well... how will one know the available channel balance of a non-directly connected channel? [All falls under the "how do we know which channels to avoid" due to insufficient local balance.]

joostjager · 2019-03-05T21:08:50Z

@Roasbeef route cache restored

This check was a left over from when the fee limit wasn't checked yet in the path finding algorithm.

This allows removing a lot of empty map initialization code and makes the code more readable.

This commit allows the execution of QueryRoutes to be controlled using lists of black-listed edges and nodes. Any path returned will not pass through the edges and/or nodes on the list.

Currently public keys are represented either as a 33-byte array (Vertex) or as a btcec.PublicKey struct. The latter isn't usable as an index into maps and cannot easily be used in comparisons. Therefore the 33-byte array representation is used predominantly throughout the code base. This commit converts the argument types of source and target nodes for path finding to Vertex. Path finding executes no crypto operations and using Vertex simplifies the code. Additionally, it prepares for the path finding source parameter to be exposed over rpc in a follow up commit without requiring conversion back and forth between Vertex and btcec.PublicKey.

This commit allows execution of QueryRoutes from any source node. Previously this was restricted to only the self node.

Now that QueryRoutes has gained the ability to route from any source node and takes in edge and node black lists, all pieces are in place for users to implement their own k-shortest paths algorithm, or any other algorithm they might wish to use and currently can't. This commit marks the num_routes field as deprecated in preparation for removing k-shortest from lnd.
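To illustrate why an arbitrary source plus edge and node black lists are enough for a client-side k-shortest paths implementation, here is a minimal sketch using Yen's algorithm. Yen's algorithm is chosen here for illustration and is not claimed to be what lnd used internally. `shortest_path` is a local Dijkstra over a toy fee graph standing in for one single-route query; all names are hypothetical.

```python
import heapq


def shortest_path(graph, source, target, ignored_edges=(), ignored_nodes=()):
    """Dijkstra returning (cost, path) or None; stands in for one query."""
    ignored_edges, ignored_nodes = set(ignored_edges), set(ignored_nodes)
    queue = [(0, [source])]
    done = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, fee in graph.get(node, {}).items():
            if nxt in done or nxt in ignored_nodes or (node, nxt) in ignored_edges:
                continue
            heapq.heappush(queue, (cost + fee, path + [nxt]))
    return None


def path_cost(graph, path):
    """Sum of the fees along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def k_shortest_paths(graph, source, target, k):
    """Yen's algorithm built only from single-route queries with ignore lists."""
    first = shortest_path(graph, source, target)
    if first is None:
        return []
    accepted = [first[1]]
    candidates = []  # heap of (cost, path)
    while len(accepted) < k:
        previous = accepted[-1]
        for i in range(len(previous) - 1):
            spur_node, root = previous[i], previous[:i + 1]
            # Ignore the next edge of every accepted path sharing this root.
            ignored_edges = {(p[i], p[i + 1]) for p in accepted
                             if len(p) > i + 1 and p[:i + 1] == root}
            # Ignore earlier root nodes so the combined path stays loop-free.
            ignored_nodes = set(root[:-1])
            # This query starts at the spur node, not at the payer.
            spur = shortest_path(graph, spur_node, target,
                                 ignored_edges, ignored_nodes)
            if spur is None:
                continue
            candidate = root[:-1] + spur[1]
            entry = (path_cost(graph, candidate), candidate)
            if candidate not in accepted and entry not in candidates:
                heapq.heappush(candidates, entry)
        if not candidates:
            break
        accepted.append(heapq.heappop(candidates)[1])
    return accepted


graph = {"A": {"B": 1, "C": 2}, "B": {"D": 1, "C": 1}, "C": {"D": 1}, "D": {}}
for path in k_shortest_paths(graph, "A", "D", 3):
    print(path_cost(graph, path), path)
```

Each spur query starts from an intermediate node and excludes specific edges and nodes, which is exactly the combination that was not expressible before the source parameter and the ignore lists were added.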

Roasbeef

The next chapter on the path to removing ksp! First pass review completed, pretty straightforward diffs with no major comments. The only thing I think we should do before merging this is to update the QueryRoutes integration test to exercise this new behavior, and the new parsing logic along the way.

routing/pathfind.go

This commit moves the query routes backend logic from the main rpc server into the sub server. It is another step towards splitting up the main rpc server code. In addition to this, a unit test is added to verify rpc parameter parsing.

joostjager · 2019-03-11T09:00:42Z

> The only thing I think we should do before merging this is to update the QueryRoutes integration test to exercise this new behavior, and the new parsing logic along the way.

I think we should be careful with integration tests and only use them to test code that isn't testable otherwise. The queryroutes rpc behaviour can be tested in a unit test as well. I added this test in the last commit.

Roasbeef

LGTM 💎

wamde · 2019-04-12T21:45:12Z

Does this mean that moving forward advanced routing functionality will be seen as something which should live outside lnd? Will this be extendable via plugins, or are we encouraged to build on top of the gRPC API?
I am asking because I am re-implementing KSP for rebalance-lnd, but until now it was not clear to me whether it would be a short term fix or something we should invest more into.

joostjager · 2019-04-16T08:03:15Z

@wamde What do you mean exactly with 'advanced routing functionality'?

KSP specifically has in our opinion not many good use cases, so that is why we decided to move it out of lnd. All code has a recurring maintenance cost that must weigh up against the benefits.

Can you tell me why you need ksp for rebalancing rather than trying the best route first and after processing the outcome of that, query again for the (then) best route?

wamde · 2019-04-19T08:34:54Z

Let me explain (re-stating some obvious stuff for context):

Rebalancing tries to find loops in the graph, with some constraints. We may want to force an exit channel (to drain that one specifically), a re-entry channel (to top up that one specifically), or both.
Due to many factors (let's say the network is still in its infancy), the shortest path may not work (unbalanced channel, offline node, you name it). So we may want to try the 2nd best route, 3rd and so on. My personal experience with rebalance-lnd is that it takes 1-10 paths to complete a rebalance, and for poorly connected channels it may not succeed at all.
QueryRoutes currently gives k routes sorted by fee_milli_msat, but does not seem to try and maximise the chance of these paths working (yet -- I know there is more to come but for now that's what we get), so we typically take n routes and try them all sequentially. We may want to finetune this by weighing certain nodes specifically, not just looking at the total fee in msats but in actual sats, by also optimising for short routes (in terms of # of hops and not just fees) etc.
Finding routes is slow, so if we can compute n routes and try them all without having to wait for another re-computation after a failure, it helps. Even better, if there was a way to thread a pathfinding service and have another thread probing/trying such routes, the speediness of rebalances would greatly improve.

All of this to say that I think that we are fine re-implementing all of that and adding flexibility to it on top of lnd on our own, but:

- I would like not to re-invent the wheel, so I was seeking clarification on the direction being taken at the lnd level
- it would be great, if generic pathfinding stayed in lnd, to have a better framework to plug new optimisers or whatever as plugins as opposed to rebuilding the whole thing on top of graph RPCs

joostjager · 2019-04-19T08:59:13Z

Thanks for the clarification @wamde

> Finding routes is slow, so if we can compute n routes and try them all without having to wait for another re-computation after a failure, it helps

In your experience, is calling QueryRoutes for 20 routes at once significantly faster than calling it 20 times and requesting just a single route in every call?

I am sensitive to the performance argument in general. There are probably many ways to speed up path finding. One good candidate is keeping the graph in memory, but this is quite a refactor which we started but haven't completed yet.

> if there was a way to thread a pathfinding service and have another thread probing/trying such routes

These calls can be made concurrently, but ideally you only want to start the next path finding round when the previous probe has completed. To take into account the outcome of that probe.

> I would like not to re-invent the wheel, so I was seeking clarification on the direction being taken at the lnd level

Yes understood. On the lnd level we want to offer the right tools. If ksp is a way around slow path finding, the right thing to do is to optimize path finding. Ksp started to take more maintenance effort than we'd like, so we decided to move it out. This loads some of the work onto the user, but it also creates an opportunity to implement it in a better, more tailored way perhaps.

> would be great, if generic pathfinding stayed in lnd, to have a better framework to plug new optimisers or whatever as plugins as opposed to rebuilding the whole thing on top of graph RPCs

What do you mean with generic pathfinding exactly? We want to move the RPCs in a direction where they remain relatively simple, but still powerful enough to solve real world path finding problems. Any feedback on that is always welcome. A caller defined hop limit could also be an extension of the current api.

Plugins are not on the roadmap atm.

manreo · 2019-04-25T21:57:09Z

Hi @joostjager this is really nice, and I think it is the right way to go given the difficulty in path-finding.
Don't you also need to update the lncli?
https://github.com/lightningnetwork/lnd/blob/master/cmd/lncli/commands.go#L2994

joostjager · 2019-04-26T06:06:28Z

@mrmanpew As this is advanced functionality, we considered adding the flags to lncli low priority.

wamde · 2019-04-28T17:52:27Z

> Thanks for the clarification @wamde

> > Finding routes is slow, so if we can compute n routes and try them all without having to wait for another re-computation after a failure, it helps

> In your experience, is calling QueryRoutes for 20 routes at once significantly faster than calling it 20 times and requesting just a single route in every call?

I can't answer this directly, but I can describe what we do:
We query n routes (can't remember what our default n is), try them, and if all of them fail, request another batch of n routes. That new batch takes a while to come, sometimes dozens of seconds.

> I am sensitive to the performance argument in general. There are probably many ways to speed up path finding. One good candidate is keeping the graph in memory, but this is quite a refactor which we started but haven't completed yet.

Yes, that sounds like an obvious one, along with keeping heuristics from previous requests and SendToRoute attempts.

> > if there was a way to thread a pathfinding service and have another thread probing/trying such routes

> These calls can be made concurrently, but ideally you only want to start the next path finding round when the previous probe has completed. To take into account the outcome of that probe.

Yes, I probably worded that wrong. I mean one thread to find paths, passing them as it finds them to another thread trying them in a sequential manner.
We could go further and probe routes with dummy preimages (I think, if I understood one of Alex's cryptic tweets correctly), and that could be parallelised, but it would probably require deeper interaction with the inner HTLC manager which I think we don't have access to using the API.
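The split described here, one thread finding paths and handing each one over as soon as it is found while another tries them sequentially, can be sketched with a queue. This is a minimal illustration assuming a hypothetical iterable of candidate routes and a hypothetical `attempt` callback; it does not call any lnd API.

```python
import queue
import threading


def find_routes(candidates, out):
    """Producer: hand over each route as soon as it is found, then a sentinel."""
    for route in candidates:
        out.put(route)
    out.put(None)  # sentinel: no more routes


def rebalance(candidates, attempt):
    """Consumer: try routes strictly one after another, keep the first success."""
    routes = queue.Queue()
    producer = threading.Thread(target=find_routes, args=(candidates, routes))
    producer.start()
    winner = None
    while True:
        route = routes.get()  # blocks until the producer delivers something
        if route is None:
            break
        # After a success, remaining routes are drained without being tried.
        if winner is None and attempt(route):
            winner = route
    producer.join()
    return winner


def slow_path_finder():
    """Hypothetical stand-in for a slow path finding service."""
    yield ["A", "B", "D"]
    yield ["A", "C", "D"]


print(rebalance(slow_path_finder(), lambda route: "C" in route))
```

Note that in this arrangement the producer does not learn from failed attempts, which is the trade-off raised in the quoted reply above.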

> > I would like not to re-invent the wheel, so I was seeking clarification on the direction being taken at the lnd level

> Yes understood. On the lnd level we want to offer the right tools. If ksp is a way around slow path finding, the right thing to do is to optimize path finding. Ksp started to take more maintenance effort than we'd like, so we decided to move it out. This loads some of the work onto the user, but it also creates an opportunity to implement it in a better, more tailored way perhaps.

The current KSP is a bit slow and could certainly use heuristics and caching, but my main pet peeve is around its lack of flexibility. A recent PR added the source pubkey parameter, which is great, but I still think that for rebalancing use cases we need more flexibility.

> > would be great, if generic pathfinding stayed in lnd, to have a better framework to plug new optimisers or whatever as plugins as opposed to rebuilding the whole thing on top of graph RPCs

> What do you mean with generic pathfinding exactly? We want to move the RPCs in a direction where they remain relatively simple, but still powerful enough to solve real world path finding problems. Any feedback on that is always welcome. A caller defined hop limit could also be an extension of the current api.

Hop limit is one way. To give you an example of what I am currently working on in a branch of rebalance-lnd, I compute the route's weight by using 25 sats * # of hops + total_fee. The 25 parameter can obviously be tweaked in the future, but it could also be used to reflect other preferences, per channel or node: depending on their uptime, level of trust, etc. If such a function could be re-defined via a plugin we could virtually implement any kind of path finding optimiser.
We could also set infinite weights on certain channels, effectively blacklisting them from the pathfinding algorithm. I do some of that in my path finder to effectively remove channels where n*amount > chan_capacity in order to reduce the amount of failures due to poorly balanced channels.
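The weighting and pre-filtering described above can be written down in a few lines. A minimal sketch, assuming routes and channels are plain dictionaries; the keys `hops`, `total_fee_sat` and `capacity_sat`, the function names, and the default `n=2` are illustrative and not taken from rebalance-lnd.

```python
def route_weight(route, hop_penalty_sat=25):
    """Weight = hop penalty in sats * number of hops + total fee in sats."""
    return hop_penalty_sat * len(route["hops"]) + route["total_fee_sat"]


def usable_channels(channels, amount_sat, n=2):
    """Drop channels where n * amount exceeds the channel capacity."""
    return [c for c in channels if n * amount_sat <= c["capacity_sat"]]


routes = [
    {"hops": ["a", "b", "c", "d"], "total_fee_sat": 10},  # cheap but long
    {"hops": ["a", "b"], "total_fee_sat": 40},            # pricier but short
]
best = min(routes, key=route_weight)
print(route_weight(routes[0]), route_weight(routes[1]), best["hops"])

channels = [{"id": 1, "capacity_sat": 100_000}, {"id": 2, "capacity_sat": 500_000}]
print(usable_channels(channels, amount_sat=100_000))
```

With the 25 sat hop penalty, the four-hop route weighs 110 and the two-hop route weighs 90, so the shorter route wins even though its fee is higher.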

> Plugins are not on the roadmap atm.

Understood.

joostjager · 2019-04-29T07:18:49Z

> I can't answer this directly, but I can describe what we do:
> We query n routes (can't remember what our default n is), try them, and if all of them fail, request another batch of n routes. That new batch takes a while to come, sometimes dozens of seconds.

If you would request the routes one at a time, I expect that it would in total not take significantly more time than requesting n routes at once.

Assuming the performance is not degraded by the api change, do I understand it right that at the moment all the tools are there to implement KSP client side?

With regards to further influencing path finding, I would suggest to open a new issue describing the use case and the extra levers that you'd need. Redefining the ignored edges and nodes list as more of a weight penalty which can be set to infinity could be a direction to take.
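One way to picture the suggested direction, an ignore list generalised into per-edge weight penalties where infinity means "never use", is the sketch below. It is a toy Dijkstra over a hypothetical fee graph; the `edge_penalty` parameter is an assumption for illustration and not an existing QueryRoutes field.

```python
import heapq
import math


def cheapest_route(graph, source, target, edge_penalty=None):
    """Dijkstra where each edge costs fee + penalty; an infinite penalty bans it."""
    edge_penalty = edge_penalty or {}
    queue = [(0, [source])]
    done = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, fee in graph.get(node, {}).items():
            penalty = edge_penalty.get((node, nxt), 0)
            if nxt in done or math.isinf(penalty):
                continue  # an infinite penalty behaves like a blacklist entry
            heapq.heappush(queue, (cost + fee + penalty, path + [nxt]))
    return None


graph = {"A": {"B": 1, "C": 2}, "B": {"D": 1}, "C": {"D": 1}, "D": {}}
print(cheapest_route(graph, "A", "D"))
print(cheapest_route(graph, "A", "D", {("A", "B"): 5}))
print(cheapest_route(graph, "A", "D", {("A", "B"): math.inf, ("A", "C"): math.inf}))
```

A finite penalty merely discourages an edge, so the route through it remains available if nothing better exists, while an infinite penalty reproduces the current ignore-list behaviour.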

joostjager requested a review from halseth January 21, 2019 13:03

Roasbeef added the rpc, code health, gRPC, and P3 labels Jan 21, 2019

halseth reviewed Jan 22, 2019

joostjager requested a review from Roasbeef January 24, 2019 07:43

joostjager force-pushed the querysingleroute branch from 53d1b3f to cbb4344 Compare February 1, 2019 09:05

Roasbeef added this to the 0.6 milestone Feb 5, 2019

joostjager force-pushed the querysingleroute branch 3 times, most recently from 070e5ad to 9bca98a Compare February 6, 2019 07:04

joostjager force-pushed the querysingleroute branch from 9bca98a to bbf8f24 Compare February 7, 2019 07:58

joostjager mentioned this pull request Feb 14, 2019

routing: add cltv limit #2640

Merged

joostjager force-pushed the querysingleroute branch from bbf8f24 to e9e83eb Compare February 14, 2019 07:27

joostjager force-pushed the querysingleroute branch 6 times, most recently from 0e723a6 to 643a7fd Compare March 5, 2019 16:54

joostjager force-pushed the querysingleroute branch from 7e1fbaa to d062139 Compare March 5, 2019 21:07

joostjager force-pushed the querysingleroute branch from 3c7c456 to 44ffcd8 Compare March 5, 2019 21:18

joostjager added 9 commits March 6, 2019 15:30

routing: add todo describing fee limit bug

f4cc2e2

routing: remove redundant fee limit check in newRoute

4937304

routing: export RestrictParams and EdgeLocator

b2b28b4

routing: allow nil maps for ignored edges and nodes

4376f3e

lnrpc+routing: add edges and nodes restrictions to query routes

b09adc3

routing: add source parameter to query routes

c62c9d6

routing: add todo describing route cache bug

6cc82b4

joostjager force-pushed the querysingleroute branch from 44ffcd8 to 6cc82b4 Compare March 6, 2019 14:31

Roasbeef requested changes Mar 9, 2019

routerrpc: move query routes into sub server

293971c

Roasbeef approved these changes Mar 13, 2019

Roasbeef merged commit ad88490 into lightningnetwork:master Mar 13, 2019

C-Otto mentioned this pull request Apr 12, 2019

Important: Work around QueryRoutes numroutes deprecation C-Otto/rebalance-lnd#51

Closed

Crypt-iQ mentioned this pull request Apr 15, 2019

routing: add support for pegged hops in findPaths #2444

Closed

joostjager mentioned this pull request May 7, 2019

lnrpc+routing: remove k shortest path finding #3054

Merged

C-Otto mentioned this pull request Jun 15, 2019

QueryRoutes always returns same route #3206

Closed
