Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing+routerrpc: improve prob. estimation for untried connections #3462

Merged
merged 7 commits into from
Oct 23, 2019

Conversation

joostjager
Copy link
Contributor

@joostjager joostjager commented Sep 4, 2019

This PR changes mission control to partially base the estimated probability for untried connections on historical results obtained in previous payment attempts. This prevents the payment loop from getting stuck in an area of the network that is not of good quality. It also incentivizes routing nodes to keep all
of their channels in good shape.

The configuration parameter --routerrpc.aprioriweight is a value in the range [0, 1] that defines to what extent historical results should be extrapolated to untried connections. Setting it to 1 will completely ignore historical results and always assume the configured a priori probability for untried connections. A value of 0 will ignore the a priori probability completely and only base the probability on historical results, unless there are none available.

Using this parameter, senders can configure how aggressive nodes should be penalized. A low a priori weight will make mission control switch quickly to using a different node if a tried node doesn't perform well. The downside of this is that the sender may miss out on low fee channels that could possibly complete the payment successfully.

Indirectly, the collective of senders also shapes the network with their settings of this parameter. When senders start to demand a higher performance from routing nodes, routing nodes need to meet that expectation in order to keep generating revenue.

@joostjager joostjager added incomplete WIP PR, not fully finalized, but light review possible mission control payments Related to invoices/payments routing labels Sep 4, 2019
@joostjager joostjager requested review from halseth and removed request for Roasbeef and cfromknecht September 4, 2019 16:28
@joostjager joostjager force-pushed the mc-extrapolate branch 2 times, most recently from b93992e to d80f08c Compare September 6, 2019 06:50
@joostjager
Copy link
Contributor Author

Did a bit of performance testing with and without these changes. With a clean mission control history, there is no noticeable difference. This is expected, because there are hardly iterations in the loop inside GetNodeProbability. For the test call to QueryRoutes, 330 nodes were visited and 7661 edges expanded.

To see a performance effect, it is probably needed to have a fully loaded mission control history. It is a question how likely this situation is, especially on low-powered devices where pathfinding performance becomes important. There is also a default limit to the mission control history of 1000 past payments, although this can be changed.

I tried a simulation of a full mission control by inserting 10 fake results for every node visited, but didn't see an effect either. Db access during path finding is likely still where time is mainly spent.

I'd propose to not prematurely add optimizations like caching of the node probability in this pr and first see what feedback we get on it from users. I've added another commit that logs useful performance metrics.

@joostjager joostjager removed the incomplete WIP PR, not fully finalized, but light review possible label Sep 6, 2019
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
// The value of the apriori weight is in the range [0, 1]. Convert it to
// a factor that properly expresses the intention of the weight in the
// following weight average calculation.
aprioriFactor := 1/(1-m.cfg.AprioriWeight) - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not entirely obvious to me what this formula is doing... care to add a more detailed comment on how the factor is different from the weight?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explanation added

routing/missioncontrol.go Outdated Show resolved Hide resolved

return probability
now := m.now()
for _, result := range results {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the critical change of this PR. Would be nice to have some examples on how the resulting probability will look like given a few results. Maybe the examples can be given as unit tests for this formula?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular I'm interested in the scenarios we talked about earlier:
A node having 100s of untried channels, and 1 tried channel (success/failure)
A node having 100s of untried channels, and X tried channels (success/failure), where X is about 2-5
A node having few channels, and some result (say 3 channels, 2 failures/successes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted probability estimation from mission control and added more unit tests.

The number of untried channels isn't a factor in the calculation at all, so I can't write a unit test for that. Probability estimates are only based on previous payment results.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the mc-extrapolate branch 5 times, most recently from ef1b2e9 to 9316e0f Compare September 18, 2019 14:54
@joostjager joostjager requested a review from halseth September 18, 2019 15:00
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed an initial surface-level pass. Will do another one focusing on the core estimation changes to ensure I fully grok them. Overall I think this is a step in the right direction, but wonder how we can test these assumptions in the wild, and also if it's too early to make a change like this given that as we know it, node operators aren't very sophisticated yet.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/result_interpretation.go Show resolved Hide resolved
i.pairResults[inPair.Reverse()] = pairResult{}

// If not the ultimate node, mark the outgoing connection as failed for
// the node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have A <-> B <-> C and B failed, the incoming is A -> B and outgoing is B -> C?

Copy link
Contributor Author

@joostjager joostjager Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Added helper variables to make this clearer.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
@joostjager
Copy link
Contributor Author

Completed an initial surface-level pass. Will do another one focusing on the core estimation changes to ensure I fully grok them. Overall I think this is a step in the right direction, but wonder how we can test these assumptions in the wild, and also if it's too early to make a change like this given that as we know it, node operators aren't very sophisticated yet.

We can always set the default a priori weight quite high, so the effect of this will be light initially. Give some time for users and operators to become aware of the new scoring.

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity and conceived impact of this PR! I'm down with the current approach, deemed that we test it a bit to make sure we don't have any obvious regressions.

In addition to logging performance metrics as now, should we consider logging more (on trace) how the probabilities are calculated? I think we now log the final probability found for a channel, but not how it was calculated. Could maybe be useful to log whether it was calculated from a previous success, failure or node level failure?


// Mark the incoming connection as failed for the node.
inPair, _ := getPair(rt, idx-1)
i.pairResults[inPair.Reverse()] = pairResult{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set success=false in pairResult for clarity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, there were comments in other prs about not adding redundant initializers. Not sure what you standard is here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I like adding them (exceptions being nil initializers perhaps), and in this case I feel it makes sense, since an empty struct is not obvious what it means.

routing/missioncontrol.go Outdated Show resolved Hide resolved
m.lastNodeFailure[*i.nodeFailure] = result.timeReply
m.setAllPairResult(*i.nodeFailure, &timedPairResult{
timestamp: result.timeReply,
pairResult: pairResult{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about setting success=false

routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator_test.go Outdated Show resolved Hide resolved
routing/probability_estimator_test.go Show resolved Hide resolved
routing/probability_estimator_test.go Show resolved Hide resolved
routing/pathfind.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the mc-extrapolate branch 3 times, most recently from 00b3271 to 07e1ad1 Compare September 30, 2019 11:07
@joostjager
Copy link
Contributor Author

joostjager commented Sep 30, 2019

Moved new querymcprob call to #3556

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but would be nice IMO to set success: false to more clearly indicate failure 😄

i.nodeFailure = &rt.Hops[idx-1].PubKeyBytes

// Mark the incoming connection as failed for the node.
incomingChannelIdx := idx - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use this in indexing above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routing/result_interpretation.go Show resolved Hide resolved
lnrpc/routerrpc/router.proto Show resolved Hide resolved
routing/pathfind.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
@@ -446,8 +452,7 @@ func (m *MissionControl) applyPaymentResult(
"node=%v", *i.nodeFailure)

m.setAllPairResult(*i.nodeFailure, timedPairResult{
timestamp: result.timeReply,
pairResult: pairResult{},
timestamp: result.timeReply,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you agree that just from looking at this call it is not obvious that we are recording a failure here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added constructors to make it more obvious without adding redundant field initializers.


reserved 3, 4, 5, 6;

PairData data = 7 [json_name="data"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name suggestion: history?

Copy link
Contributor Author

@joostjager joostjager Sep 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #3556 and renamed

routing/result_interpretation.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/pathfind.go Outdated Show resolved Hide resolved
@joostjager joostjager added this to the 0.9 milestone Oct 8, 2019
routing/missioncontrol.go Show resolved Hide resolved
if idx < len(rt.Hops) {
outgoingChannelIdx := idx
outPair, _ := getPair(rt, outgoingChannelIdx)
i.pairResults[outPair] = failPairResult(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we thread through the amount of the HTLC that resulted in this failure at this stage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need it for this PR. Here it is intentionally set to zero, because it should be a penalty to any amount, not just >= amt.

// Mark the incoming connection as failed for the node.
incomingChannelIdx := idx - 1
inPair, _ := getPair(rt, incomingChannelIdx)
i.pairResults[inPair.Reverse()] = failPairResult(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to reverse the pair here? If I'm following, then the intention is to fail A -> B, not B <- A.

Prior context (in case it's hidden):

If we have A <-> B <-> C and B failed, the incoming is A -> B and outgoing is B -> C?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We intent to penalize as much as we can for a node level failure, including future outgoing traffic for this connection. The pair as it is returned by getPair is directed towards the failed node. Therefore we first reverse the pair.

Added this as comment.

lnrpc/routerrpc/router.proto Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator.go Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the mc-extrapolate branch 3 times, most recently from ed598c0 to a0b76b6 Compare October 9, 2019 12:09
@joostjager joostjager requested a review from Roasbeef October 9, 2019 12:10
@joostjager joostjager requested a review from guggero October 21, 2019 09:13
Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, clean PR! I was able to follow the changes quite easily even though the subject matter is very new to me. Only found a few nits in the code.

Not scope of this PR, just thoughts in general:
The main thing I'm not sure about are the memory and CPU time implications of these changes when we get to a larger number of channels/peers. It would be very useful if we had something like the "1 million channel project" for lnd.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol_test.go Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator_test.go Outdated Show resolved Hide resolved
To make it explicit whether a failure or a success result is
instantiated.
This commit changes the in-memory structure of the mission control
state. It prepares for calculation of a node probability. For this we
need to be able to efficiently look up the last results for all channels
of a node.
This commit modifies the interpretation of node-level failures.
Previously only the failing node was marked. With this commit, also the
incoming and outgoing connections involved in the route are marked as
failed.

The change prepares for the removal of node-level failures in mission
control probability estimation.
Probability estimates are amount dependent. Previously we assumed an
amount, but that starts to make less sense when we make probability more
dependent on amounts in the future.
This commit changes mission control to partially base the estimated
probability for untried connections on historical results obtained in
previous payment attempts. This incentivizes routing nodes to keep all
of their channels in good shape.
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🏹

@Roasbeef Roasbeef merged commit 8ed7583 into lightningnetwork:master Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mission control payments Related to invoices/payments routing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants