Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing+routerrpc: improve prob. estimation for untried connections #3462

Merged
merged 7 commits into from Oct 23, 2019

Conversation

@joostjager
Copy link
Collaborator

joostjager commented Sep 4, 2019

This PR changes mission control to partially base the estimated probability for untried connections on historical results obtained in previous payment attempts. This prevents the payment loop from getting stuck in an area of the network that is not of good quality. It also incentivizes routing nodes to keep all
of their channels in good shape.

The configuration parameter --routerrpc.aprioriweight is a value in the range [0, 1] that defines to what extent historical results should be extrapolated to untried connections. Setting it to 1 will completely ignore historical results and always assume the configured a priori probability for untried connections. A value of 0 will ignore the a priori probability completely and only base the probability on historical results, unless there are none available.

Using this parameter, senders can configure how aggressive nodes should be penalized. A low a priori weight will make mission control switch quickly to using a different node if a tried node doesn't perform well. The downside of this is that the sender may miss out on low fee channels that could possibly complete the payment successfully.

Indirectly, the collective of senders also shapes the network with their settings of this parameter. When senders start to demand a higher performance from routing nodes, routing nodes need to meet that expectation in order to keep generating revenue.

@joostjager joostjager requested review from cfromknecht and Roasbeef as code owners Sep 4, 2019
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from 95a93c8 to 4d4e209 Sep 4, 2019
@joostjager joostjager requested review from halseth and removed request for Roasbeef and cfromknecht Sep 4, 2019
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch 2 times, most recently from b93992e to d80f08c Sep 5, 2019
@joostjager

This comment has been minimized.

Copy link
Collaborator Author

joostjager commented Sep 6, 2019

Did a bit of performance testing with and without these changes. With a clean mission control history, there is no noticeable difference. This is expected, because there are hardly iterations in the loop inside GetNodeProbability. For the test call to QueryRoutes, 330 nodes were visited and 7661 edges expanded.

To see a performance effect, it is probably needed to have a fully loaded mission control history. It is a question how likely this situation is, especially on low-powered devices where pathfinding performance becomes important. There is also a default limit to the mission control history of 1000 past payments, although this can be changed.

I tried a simulation of a full mission control by inserting 10 fake results for every node visited, but didn't see an effect either. Db access during path finding is likely still where time is mainly spent.

I'd propose to not prematurely add optimizations like caching of the node probability in this pr and first see what feedback we get on it from users. I've added another commit that logs useful performance metrics.

@joostjager joostjager removed the incomplete label Sep 6, 2019
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from 03c6ad6 to 9614d33 Sep 11, 2019
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
// The value of the apriori weight is in the range [0, 1]. Convert it to
// a factor that properly expresses the intention of the weight in the
// following weight average calculation.
aprioriFactor := 1/(1-m.cfg.AprioriWeight) - 1

This comment has been minimized.

Copy link
@halseth

halseth Sep 18, 2019

Collaborator

it's not entirely obvious to me what this formula is doing... care to add a more detailed comment on how the factor is different from the weight?

This comment has been minimized.

Copy link
@joostjager

joostjager Sep 18, 2019

Author Collaborator

explanation added

routing/missioncontrol.go Outdated Show resolved Hide resolved

return probability
now := m.now()
for _, result := range results {

This comment has been minimized.

Copy link
@halseth

halseth Sep 18, 2019

Collaborator

this is the critical change of this PR. Would be nice to have some examples on how the resulting probability will look like given a few results. Maybe the examples can be given as unit tests for this formula?

This comment has been minimized.

Copy link
@halseth

halseth Sep 18, 2019

Collaborator

In particular I'm interested in the scenarios we talked about earlier:
A node having 100s of untried channels, and 1 tried channel (success/failure)
A node having 100s of untried channels, and X tried channels (success/failure), where X is about 2-5
A node having few channels, and some result (say 3 channels, 2 failures/successes).

This comment has been minimized.

Copy link
@joostjager

joostjager Sep 18, 2019

Author Collaborator

Extracted probability estimation from mission control and added more unit tests.

The number of untried channels isn't a factor in the calculation at all, so I can't write a unit test for that. Probability estimates are only based on previous payment results.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch 5 times, most recently from ef1b2e9 to 9316e0f Sep 18, 2019
@joostjager joostjager requested a review from halseth Sep 18, 2019
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from 9316e0f to 4c01b5a Sep 18, 2019
@joostjager joostjager requested a review from cfromknecht Sep 19, 2019
Copy link
Member

Roasbeef left a comment

Completed an initial surface-level pass. Will do another one focusing on the core estimation changes to ensure I fully grok them. Overall I think this is a step in the right direction, but wonder how we can test these assumptions in the wild, and also if it's too early to make a change like this given that as we know it, node operators aren't very sophisticated yet.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/result_interpretation.go Show resolved Hide resolved
i.pairResults[inPair.Reverse()] = pairResult{}

// If not the ultimate node, mark the outgoing connection as failed for
// the node.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Sep 20, 2019

Member

If we have A <-> B <-> C and B failed, the incoming is A -> B and outgoing is B -> C?

This comment has been minimized.

Copy link
@joostjager

joostjager Sep 27, 2019

Author Collaborator

Yes. Added helper variables to make this clearer.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
@joostjager

This comment has been minimized.

Copy link
Collaborator Author

joostjager commented Sep 20, 2019

Completed an initial surface-level pass. Will do another one focusing on the core estimation changes to ensure I fully grok them. Overall I think this is a step in the right direction, but wonder how we can test these assumptions in the wild, and also if it's too early to make a change like this given that as we know it, node operators aren't very sophisticated yet.

We can always set the default a priori weight quite high, so the effect of this will be light initially. Give some time for users and operators to become aware of the new scoring.

@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from 4c01b5a to c315ee3 Sep 25, 2019
Copy link
Collaborator

halseth left a comment

I like the simplicity and conceived impact of this PR! I'm down with the current approach, deemed that we test it a bit to make sure we don't have any obvious regressions.

In addition to logging performance metrics as now, should we consider logging more (on trace) how the probabilities are calculated? I think we now log the final probability found for a channel, but not how it was calculated. Could maybe be useful to log whether it was calculated from a previous success, failure or node level failure?


// Mark the incoming connection as failed for the node.
inPair, _ := getPair(rt, idx-1)
i.pairResults[inPair.Reverse()] = pairResult{}

This comment has been minimized.

Copy link
@halseth

halseth Sep 27, 2019

Collaborator

set success=false in pairResult for clarity?

This comment has been minimized.

Copy link
@joostjager

joostjager Sep 27, 2019

Author Collaborator

If I remember correctly, there were comments in other prs about not adding redundant initializers. Not sure what you standard is here.

This comment has been minimized.

Copy link
@halseth

halseth Sep 30, 2019

Collaborator

Personally I like adding them (exceptions being nil initializers perhaps), and in this case I feel it makes sense, since an empty struct is not obvious what it means.

routing/missioncontrol.go Outdated Show resolved Hide resolved
m.lastNodeFailure[*i.nodeFailure] = result.timeReply
m.setAllPairResult(*i.nodeFailure, &timedPairResult{
timestamp: result.timeReply,
pairResult: pairResult{},

This comment has been minimized.

Copy link
@halseth

halseth Sep 27, 2019

Collaborator

same comment about setting success=false

routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator_test.go Outdated Show resolved Hide resolved
routing/probability_estimator_test.go Show resolved Hide resolved
routing/probability_estimator_test.go Show resolved Hide resolved
routing/pathfind.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch 2 times, most recently from 00b3271 to 07e1ad1 Sep 30, 2019
@joostjager

This comment has been minimized.

Copy link
Collaborator Author

joostjager commented Sep 30, 2019

Moved new querymcprob call to #3556

Copy link
Collaborator

halseth left a comment

LGTM, but would be nice IMO to set success: false to more clearly indicate failure 😄

i.nodeFailure = &rt.Hops[idx-1].PubKeyBytes

// Mark the incoming connection as failed for the node.
incomingChannelIdx := idx - 1

This comment has been minimized.

Copy link
@halseth

halseth Sep 30, 2019

Collaborator

Can use this in indexing above.

This comment has been minimized.

Copy link
@joostjager
routing/result_interpretation.go Show resolved Hide resolved
lnrpc/routerrpc/router.proto Show resolved Hide resolved
routing/pathfind.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
@@ -446,8 +452,7 @@ func (m *MissionControl) applyPaymentResult(
"node=%v", *i.nodeFailure)

m.setAllPairResult(*i.nodeFailure, timedPairResult{
timestamp: result.timeReply,
pairResult: pairResult{},
timestamp: result.timeReply,

This comment has been minimized.

Copy link
@halseth

halseth Sep 30, 2019

Collaborator

don't you agree that just from looking at this call it is not obvious that we are recording a failure here?

This comment has been minimized.

Copy link
@joostjager

joostjager Sep 30, 2019

Author Collaborator

I've added constructors to make it more obvious without adding redundant field initializers.


reserved 3, 4, 5, 6;

PairData data = 7 [json_name="data"];

This comment has been minimized.

Copy link
@halseth

halseth Sep 30, 2019

Collaborator

name suggestion: history?

This comment has been minimized.

Copy link
@joostjager

joostjager Sep 30, 2019

Author Collaborator

Moved to #3556 and renamed

@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from 07e1ad1 to ac28dd3 Sep 30, 2019
routing/result_interpretation.go Outdated Show resolved Hide resolved
routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/pathfind.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from ac28dd3 to 414a5f1 Oct 4, 2019
@joostjager joostjager added the v0.9.0 label Oct 8, 2019
@joostjager joostjager added this to the 0.9 milestone Oct 8, 2019
routing/missioncontrol.go Show resolved Hide resolved
if idx < len(rt.Hops) {
outgoingChannelIdx := idx
outPair, _ := getPair(rt, outgoingChannelIdx)
i.pairResults[outPair] = failPairResult(0)

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Oct 9, 2019

Member

Why don't we thread through the amount of the HTLC that resulted in this failure at this stage?

This comment has been minimized.

Copy link
@joostjager

joostjager Oct 9, 2019

Author Collaborator

We don't need it for this PR. Here it is intentionally set to zero, because it should be a penalty to any amount, not just >= amt.

// Mark the incoming connection as failed for the node.
incomingChannelIdx := idx - 1
inPair, _ := getPair(rt, incomingChannelIdx)
i.pairResults[inPair.Reverse()] = failPairResult(0)

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Oct 9, 2019

Member

Why do we need to reverse the pair here? If I'm following, then the intention is to fail A -> B, not B <- A.

Prior context (in case it's hidden):

If we have A <-> B <-> C and B failed, the incoming is A -> B and outgoing is B -> C?

This comment has been minimized.

Copy link
@joostjager

joostjager Oct 9, 2019

Author Collaborator

We intent to penalize as much as we can for a node level failure, including future outgoing traffic for this connection. The pair as it is returned by getPair is directed towards the failed node. Therefore we first reverse the pair.

Added this as comment.

lnrpc/routerrpc/router.proto Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator.go Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
server.go Outdated Show resolved Hide resolved
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch 3 times, most recently from ed598c0 to a0b76b6 Oct 9, 2019
@joostjager joostjager requested a review from Roasbeef Oct 9, 2019
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from a0b76b6 to 24e3299 Oct 9, 2019
@joostjager joostjager requested a review from guggero Oct 21, 2019
Copy link
Collaborator

guggero left a comment

Nice, clean PR! I was able to follow the changes quite easily even though the subject matter is very new to me. Only found a few nits in the code.

Not scope of this PR, just thoughts in general:
The main thing I'm not sure about are the memory and CPU time implications of these changes when we get to a larger number of channels/peers. It would be very useful if we had something like the "1 million channel project" for lnd.

routing/missioncontrol.go Outdated Show resolved Hide resolved
routing/missioncontrol_test.go Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator.go Outdated Show resolved Hide resolved
routing/probability_estimator_test.go Outdated Show resolved Hide resolved
joostjager added 7 commits Sep 4, 2019
To make it explicit whether a failure or a success result is
instantiated.
This commit changes the in-memory structure of the mission control
state. It prepares for calculation of a node probability. For this we
need to be able to efficiently look up the last results for all channels
of a node.
This commit modifies the interpretation of node-level failures.
Previously only the failing node was marked. With this commit, also the
incoming and outgoing connections involved in the route are marked as
failed.

The change prepares for the removal of node-level failures in mission
control probability estimation.
Probability estimates are amount dependent. Previously we assumed an
amount, but that starts to make less sense when we make probability more
dependent on amounts in the future.
This commit changes mission control to partially base the estimated
probability for untried connections on historical results obtained in
previous payment attempts. This incentivizes routing nodes to keep all
of their channels in good shape.
@joostjager joostjager force-pushed the joostjager:mc-extrapolate branch from 24e3299 to 1fac41d Oct 22, 2019
Copy link
Member

Roasbeef left a comment

LGTM 🏹

@Roasbeef Roasbeef merged commit 8ed7583 into lightningnetwork:master Oct 23, 2019
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.03%) to 62.96%
Details
@joostjager joostjager added this to Done in v0.9.0-beta Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
v0.9.0-beta
  
Done
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.