Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lnrpc: deprecate QueryRoutes with more than one route #2497

Merged
merged 10 commits into from Mar 13, 2019

Conversation

@joostjager
Copy link
Collaborator

@joostjager joostjager commented Jan 17, 2019

This PR marks the num_routes argument of the query routes rpc as deprecated.

The use case for returning multiple routes is limited at best, but it does require maintaining a dedicated algorithm (k-shortest) in lnd.

It also encourages blindly attempting the second best route without interpreting the outcome of the best route, which is likely to result in less efficient payments.

The recommended approach is to request a single route, attempt payment, process the outcome if it failed (eg black list edges) and request another route, taking into account the outcome of the first attempt.

Currently QueryRoutes with multiple routes may have a use in special scripts (probing, rebalancing, ..). To keep facilitating those, a follow up PR will add any source to destination queries to QueryRoutes.

@halseth
Copy link
Collaborator

@halseth halseth commented Jan 17, 2019

Is the workaround you mention (blacklisting edges) currently possible? Maybe we should provide that before deprecating this.

@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Jan 17, 2019

No, not possible currently.

But if we want to go this way, I thought it is better to signal it sooner rather than later.

@Roasbeef
Copy link
Member

@Roasbeef Roasbeef commented Jan 18, 2019

People can still use it while we mark it deprecated, before we remove it all together, we should have the replacement in place though.

lnrpc/rpc.proto Outdated Show resolved Hide resolved
@halseth
Copy link
Collaborator

@halseth halseth commented Jan 22, 2019

Change LGTM.

Question is still whether we should wait to have a replacement before deprecating, but as you say maybe best to just start signaling right away.

EDIT: Would be nice to be able to point to the "new way" of doing it when deprecating.

@treygriffith
Copy link

@treygriffith treygriffith commented Jan 22, 2019

As a user of the QueryRoutes rpc, our use case for multiple routes is to implement our own maximum time lock similar to the maximum fee that is already supported by the rpc. I suppose you could do this with edge blacklisting as well, but I'm not sure that's the right tool for the job.

@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Jan 23, 2019

@treygriffith thanks for mentioning your use case.

I don't think it works very well with black listing, but #1941 should do it.

@joostjager joostjager requested a review from Roasbeef Jan 24, 2019
@joostjager joostjager force-pushed the joostjager:querysingleroute branch from 53d1b3f to cbb4344 Feb 1, 2019
@Roasbeef Roasbeef added this to the 0.6 milestone Feb 5, 2019
@joostjager joostjager force-pushed the joostjager:querysingleroute branch 3 times, most recently from 070e5ad to 9bca98a Feb 5, 2019
@ZapUser77
Copy link

@ZapUser77 ZapUser77 commented Feb 7, 2019

This seems like a bad idea.
Scenario:
"best" path yielded is A-B-C-D-E-F
However, B-C 100% of the balance is on C side, so this can't be used. With only 1 routed returned, you have no alternative, and no way of knowing which channel is the problem. And "blacklisting" a channel isn't a solution unless it's a single-use blacklist and you have some way to determine exactly which edge is the problem.

"To keep facilitating those, a follow up PR will add any source to destination queries to QueryRoutes."
This will definitely be beneficial.

@joostjager joostjager force-pushed the joostjager:querysingleroute branch from 9bca98a to bbf8f24 Feb 7, 2019
@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Feb 7, 2019

First thing to note is that returning multiple routes alleviates the problem you describe above somewhat, but it is still not a real solution. It could still be that all returned routes fail and that the route that would work is not in the set. You could ask for all existing routes, but that quickly becomes infeasible.

The plan is that before we actually remove num_routes, we would have any-to-any route queries with black list implemented. Then it is always possible to implement the k-shortest algorithm client side.

In addition to that, we are working towards returning structured errors from SendToRoute (#1662), which will allow a client to pinpoint the failing edge, update a (single use) black list and pass that into query routes.

@ZapUser77
Copy link

@ZapUser77 ZapUser77 commented Feb 7, 2019

Thanks.
Would the sender know exactly which channel failed in the new errors?
Is "insufficient local balance" covered under "temporary channel failure"?
This is something that perplexes me about AMP as well... how will one know the available channel balance of a non-directly connected channel? [All falls under the "how do we know which channels to avoid" due to insufficient local balance.]

@joostjager joostjager force-pushed the joostjager:querysingleroute branch from bbf8f24 to e9e83eb Feb 14, 2019
@joostjager joostjager force-pushed the joostjager:querysingleroute branch 6 times, most recently from 0e723a6 to 643a7fd Mar 1, 2019
@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Mar 5, 2019

@Roasbeef route cache restored

@joostjager joostjager force-pushed the joostjager:querysingleroute branch from 3c7c456 to 44ffcd8 Mar 5, 2019
joostjager added 9 commits Mar 5, 2019
This check was a left over from when the fee limit wasn't checked yet in
the path finding algorithm.
This allows removing a lot of empty map initialization code and makes
the code more readable.
This commit allows the execution of QueryRoutes to be controlled using
lists of black-listed edges and nodes. Any path returned will not pass
through the edges and/or nodes on the list.
Currently public keys are represented either as a 33-byte array (Vertex) or as a
btcec.PublicKey struct. The latter isn't useable as index into maps and
cannot be used easily in compares. Therefore the 33-byte array
representation is used predominantly throughout the code base.

This commit converts the argument types of source and target nodes for
path finding to Vertex. Path finding executes no crypto operations and
using Vertex simplifies the code.

Additionally, it prepares for the path finding source parameter to be
exposed over rpc in a follow up commit without requiring conversion back
and forth between Vertex and btcec.PublicKey.
This commit allows execution of QueryRoutes from any source node.
Previously this was restricted to only the self node.
Now that QueryRoutes gained the ability to route from any source node
and takes in edge and node black lists, all pieces are in place to have
users implemented their own k-shortests path algorithm. Or any other algorithm
they might wish to use and currently can't.

This commit marks the num_routes field as deprecated as a preparation
for removing k-shortest for lnd.
@joostjager joostjager force-pushed the joostjager:querysingleroute branch from 44ffcd8 to 6cc82b4 Mar 6, 2019
Copy link
Member

@Roasbeef Roasbeef left a comment

The next chapter on the path to removing ksp! First pass review completed, pretty straight forward diffs with no major comments. The only thing I think we should do before merging this is to update the QueryRoutes integration test to exercise this new behavior, and the new parsing logic along the way.

routing/pathfind.go Show resolved Hide resolved
This commit moves the query routes backend logic from the main
rpc server into the sub server. It is another step towards splitting up
the main rpc server code.

In addition to this, a unit test is added to verify rpc parameter
parsing.
@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Mar 11, 2019

The only thing I think we should do before merging this is to update the QueryRoutes integration test to exercise this new behavior, and the new parsing logic along the way.

I think we should be careful with integration tests and only use them to test code that isn't testable otherwise. The queryroutes rpc behaviour can be tested in a unit test as well. I added this test in the last commit.

Copy link
Member

@Roasbeef Roasbeef left a comment

LGTM 💎

@Roasbeef Roasbeef merged commit ad88490 into lightningnetwork:master Mar 13, 2019
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.07%) to 60.257%
Details
@wamde
Copy link

@wamde wamde commented Apr 12, 2019

Does this mean that moving forward advanced routing functionality will be seen as something which should live outside lnd? Will this be extendable via plugins, or are we encouraged to build on top of the gRPC API?
I am asking because I am re-implementing KSP for rebalance-lnd, but until now it was not clear to me whether it would be a short term fix or something we should invest more into.

@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Apr 16, 2019

@wamde What do you mean exactly with 'advanced routing functionality'?

KSP specifically has in our opinion not many good use cases, so that is why we decided to move it out of lnd. All code has a recurring maintenance cost that must weigh up against the benefits.

Can you tell me why you need ksp for rebalancing rather than trying the best route first and after processing the outcome of that, query again for the (then) best route?

@wamde
Copy link

@wamde wamde commented Apr 19, 2019

Let me explain (re-stating some obvious stuff for context):

  • Rebalancing tries to find loops in the graph, with some constraints. We may want to force an exit channel (to drain that one specifically), a re-entry channel (to top up that one specifically), or both.
  • Due to many factors (let's say the network is still in its infancy), the shortest path may not work (unbalanced channel, offline node, you name it). So we may want to try the 2nd best route, 3rd and so on. My personal experience with rebalance-lnd is that it takes 1-10 paths to complete a rebalance, and for poorly connected channels it may not succeed at all.
  • QueryRoutes currently gives k routes sorted by fee_milli_msat, but does not seem to try and maximise the chance of these paths working (yet -- I know there is more to come but for now that's what we get), so we typically take n routes and try them all sequentially. We may want to finetune this by weighing certain nodes specifically, not just looking at the total fee in msats but in actual sats, by also optimising for short routes (in terms of # of hops and not just fees) etc.
  • Finding routes is slow, so if we can compute n routes and try them all without having to wait for another re-computation after a failure, it helps. Even better, if there was a way to thread a pathfinding service and have another thread probing/trying such routes, the speediness of rebalances would greatly improve.

All of this to say that I think that we are fine re-implementing all of that and adding flexibility to it on top of lnd on our own, but:

  • I would like not to re-invent the wheel, so I was seeking clarification on the direction being taken at the lnd level
  • it would be great, if generic pathfinding stayed in lnd, to have a better framework to plug new optimisers or whatever as plugins as opposed to rebuilding the whole thing on top of graph RPCs
@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Apr 19, 2019

Thanks for the clarification @wamde

Finding routes is slow, so if we can compute n routes and try them all without having to wait for another re-computation after a failure, it helps

In your experience, is calling QueryRoutes for 20 routes at once significantly faster than calling it 20 times and requesting just a single route in every call?

I am sensitive to the performance argument in general. There are probably many ways to speed up path finding. One good candidate is keeping the graph in memory, but this is quite a refactor which we started but haven't completed yet.

if there was a way to thread a pathfinding service and have another thread probing/trying such routes

These calls can be made concurrently, but ideally you only want to start the next path finding round when the previous probe has completed. To take into account the outcome of that probe.

I would like not to re-invent the wheel, so I was seeking clarification on the direction being taken at the lnd level

Yes understood. On the lnd level we want to offer the right tools. If ksp is a way around slow path finding, the right thing to do is to optimize path finding. Ksp started to take more maintenance effort than we'd like, so we decided to move it out. This loads some of the work onto the user, but it also creates an opportunity to implement it in a better, more tailored way perhaps.

would be great, if generic pathfinding stayed in lnd, to have a better framework to plug new optimisers or whatever as plugins as opposed to rebuilding the whole thing on top of graph RPCs

What do you mean with generic pathfinding exactly? We want to move the RPCs in a direction where they remain relatively simple, but still powerful enough to solve real world path finding problems. Any feedback on that is always welcome. A caller defined hop limit could also be an extension of the current api.

Plugins are not on the roadmap atm.

@LightningHelper
Copy link
Contributor

@LightningHelper LightningHelper commented Apr 25, 2019

Hi @joostjager this is really nice, and I think it is the right way to go giving the difficulty in path-finding.
Don't you also need to update the lncli?
https://github.com/lightningnetwork/lnd/blob/master/cmd/lncli/commands.go#L2994

@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Apr 26, 2019

@MrManPew As this is advanced functionality, we considered adding the flags to lncli low priority.

@wamde
Copy link

@wamde wamde commented Apr 28, 2019

Thanks for the clarification @wamde

Finding routes is slow, so if we can compute n routes and try them all without having to wait for another re-computation after a failure, it helps

In your experience, is calling QueryRoutes for 20 routes at once significantly faster than calling it 20 times and requesting just a single route in every call?

I can't answer this directly, but I can describe what we do:
We query n routes (can't remember what our default n is), try them, and if all of them fail, request another batch of n routes. That new batch takes a while to come, sometimes dozens of seconds.

I am sensitive to the performance argument in general. There are probably many ways to speed up path finding. One good candidate is keeping the graph in memory, but this is quite a refactor which we started but haven't completed yet.

Yes, that sounds like an obvious one, along with keeping heuristics from previous requests and SendToRoute attempts.

if there was a way to thread a pathfinding service and have another thread probing/trying such routes

These calls can be made concurrently, but ideally you only want to start the next path finding round when the previous probe has completed. To take into account the outcome of that probe.

Yes, I probably worded that wrong. I mean one thread to find paths, passing them as it finds them to another thread trying them in a sequential manner.
We could go further and probe routes with dummy preimages (I think, if I understood one of Alex's cryptic tweets correctly), and that could be parallelised, but it would probably require deeper interaction with the inner HTLC manager which I think we don't have access to using the API.

I would like not to re-invent the wheel, so I was seeking clarification on the direction being taken at the lnd level

Yes understood. On the lnd level we want to offer the right tools. If ksp is a way around slow path finding, the right thing to do is to optimize path finding. Ksp started to take more maintenance effort than we'd like, so we decided to move it out. This loads some of the work onto the user, but it also creates an opportunity to implement it in a better, more tailored way perhaps.

The current KSP is a bit slow and could certainly use heuristics and caching, but my main pet peeve is around its lack of flexibility. A recent PR added the source pubkey parameter, which is great, but I still think that for rebalancing use cases we need more flexibility.

would be great, if generic pathfinding stayed in lnd, to have a better framework to plug new optimisers or whatever as plugins as opposed to rebuilding the whole thing on top of graph RPCs

What do you mean with generic pathfinding exactly? We want to move the RPCs in a direction where they remain relatively simple, but still powerful enough to solve real world path finding problems. Any feedback on that is always welcome. A caller defined hop limit could also be an extension of the current api.

Hop limit is one way. To give you an example of what I am currently working on in a branch of rebalance-lnd, I compute the route's weight by using 25 sats * # of hops + total_fee. The 25 parameter can obviously be tweaked in the future, but it could also be used to reflect other preferences, per channel or node: depending on their uptime, level of trust, etc. If such a function could be re-defined via a plugin we could virtually implement any kind of path finding optimiser.
We could also set infinite weights on certain channels, effectively blacklisting them from the pathfinding algorithm. I do some of that in my path finder to effectively remove channels where n*amount > chan_capacity in order to reduce the amount of failures due to poorly balanced channels.

Plugins are not on the roadmap atm.

Understood.

@joostjager
Copy link
Collaborator Author

@joostjager joostjager commented Apr 29, 2019

I can't answer this directly, but I can describe what we do:
We query n routes (can't remember what our default n is), try them, and if all of them fail, request another batch of n routes. That new batch takes a while to come, sometimes dozens of seconds.

If you would request the routes one at a time, I expect that it would in total not take significantly more time than requesting n routes at once.

Assuming the performance is not degraded by the api change, do I understand it right that at the moment all the tools are there to implement KSP client side?

With regards to further influencing path finding, I would suggest to open a new issue describing the use case and the extra levers that you'd need. Redefining the ignored edges and nodes list as more of a weight penalty which can be set to infinity could be a direction to take.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants