routing: use correct fees by searching backwards #1321
@Roasbeef This is work in progress. Can you do an early check and tell me what you think of the general idea?
What's this meant to fix? As they say, if it ain't broken, don't fix it...
…On Mon, Jun 4, 2018, 2:12 PM Joost Jager ***@***.***> wrote:
@Roasbeef <https://github.com/Roasbeef> This is work in progress. Can you
do a early check and tell me what you think?
The original findPath implementation can come up with a path that is rejected in newRoute because of "channel graph has insufficient capacity". The reason is that findPath calculates with the same amt all the way, while newRoute takes into account that amt needs to increase with every hop (looking from target back to source). This change makes findPath use the correct fees and eliminates the possibility that newRoute returns the insufficient capacity error.
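To make the mismatch concrete, here is a minimal sketch of the two ways of assigning amounts to hops. The `hop` type, its field names, and the numbers are hypothetical and are not lnd's types; the only thing taken from the description above is that the amount grows by each hop's fee when walking from target back to source.

```go
package main

import "fmt"

// hop is a hypothetical, simplified channel on a route; it is not lnd's
// ChannelHop type.
type hop struct {
	capacityMsat int64 // amount the channel can carry
	baseFeeMsat  int64 // fixed fee for forwarding over this channel
	feeRatePpm   int64 // proportional fee, in parts per million
}

// amountsBackwards returns the amount each channel on the route has to
// carry, computed from the last hop back to the first.
func amountsBackwards(route []hop, paymentMsat int64) []int64 {
	amts := make([]int64, len(route))
	running := paymentMsat
	for i := len(route) - 1; i >= 0; i-- {
		amts[i] = running
		// The fee for forwarding over channel i is collected from the
		// preceding channel, so it raises the amount for hop i-1.
		running += route[i].baseFeeMsat + running*route[i].feeRatePpm/1000000
	}
	return amts
}

// fits reports whether every channel can carry its assigned amount.
func fits(route []hop, amts []int64) bool {
	for i, h := range route {
		if h.capacityMsat < amts[i] {
			return false
		}
	}
	return true
}

func main() {
	route := []hop{
		{capacityMsat: 101000},
		{capacityMsat: 200000, baseFeeMsat: 1000, feeRatePpm: 10000},
	}
	const payment = 100000

	// Same amount on every hop, as the original search assumed.
	flat := []int64{payment, payment}
	fmt.Println("flat amounts fit:", fits(route, flat))

	// Amounts grown by fees towards the source, as route construction does.
	grown := amountsBackwards(route, payment)
	fmt.Println("backward amounts:", grown, "fit:", fits(route, grown))
}
```

In this made-up route the first channel can carry the bare payment but not the payment plus the second hop's fee, which is the situation where a search using a flat amount accepts a path that route construction later rejects.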
Gotcha, can you update the PR description, and also the commit messages with this information? Thanks! In the past this wasn't an issue as each instance kicked off a k-shortest paths routine, so the failing paths would be excluded with a |
The fee limit PR as it is now will fail finding a route if the lowest weight route exceeds the max fee. There may be a possible route, but it isn't found because the time/fee weight balance doesn't favor that route. When the fee limits are merged, I can update this PR to also take the fee limits into account already during findPath. This will lead to fewer failed route calculations, because the algorithm will continue exploring the graph for higher weight paths that do satisfy the fee limit (which means a higher total time lock).
@joostjager can you add a test case that exercises the failure scenario you are trying to fix? It will help in understanding the fix :)
Yes, I am working on that.
ecbbfd0 to 77f11f5
Quite some changes were needed to add this unit test in a maintainable way. I think trying to work towards a single test graph where many scenarios can be tested is better than setting up different graphs for different test cases. Also, better visualization of the test graph would be useful. Maybe replace the ASCII art with a Graphviz generation script. I see more potential work in the routing area, but will leave the scope of this PR as it is now to keep it manageable. Waiting for #1113
77f11f5 to 8b8a7eb
The first graph there is just the most basic one; the other graphs themselves are a bit more complex. Agreed that having a rendering of them would make it easier to set up tests. However, it's also pretty straightforward to add additional vertexes/edges to the basic graph at runtime.
Yes indeed, runtime modifications are another option. It only makes it slightly more difficult to reason about what is happening. The graph JSON files contain info that could also be generated at runtime. Another idea would be to describe the graph with only the essential values. For example, specify only aliases and generate the pubkeys (deterministically) in memory when the test runs. It makes the graph description short, and it can probably be described in Go code instead of a separate JSON file. Then, for a special test case like the one described in this issue, it is not much effort to bring up a custom graph. This contradicts my comment above about working towards a single graph. It depends on the amount of effort required.
Implemented an alternative way to describe the graph in #1358
This is an excellent observation! We might not see these routing failures that often, as long as fees are small compared to payment amounts, but I totally agree that this is the way we should find paths correctly.
My main concern atm is that the path finding procedure reads a bit convoluted after these changes. Since we are shoehorning the target->source search into the existing block of code, the variable names, map indexes, method names, comments and steps taken now look a bit off and are hard to follow. Instead we should try to do a bigger rewrite, where it is turned into a proper target->source search, where running amounts are kept as "distances". I don't think the changes should be that large.
You also mentioned that edge weights are a bit off since we are rounding to sat in some places instead of using msat. I think it would be worthwhile to fix this, and make sure we are using msat everywhere. How big of a change would this be?
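As a rough illustration of what a target->source search with running amounts could look like, here is a small sketch. It is not the code in this PR: the `edge` and `nodeState` types and the tiny graph are hypothetical, the edge weight is the fee alone (the time lock penalty is left out), and a simple linear scan stands in for a priority queue.

```go
package main

import (
	"fmt"
	"math"
)

// edge is a hypothetical directed channel edge; fees are charged by the
// from node for forwarding over it.
type edge struct {
	from, to     string
	capacityMsat int64
	baseFeeMsat  int64
	feeRatePpm   int64
}

// nodeState is the per-node bookkeeping of the search.
type nodeState struct {
	dist    int64 // accumulated fees from this node to the target
	amt     int64 // amount this node must receive
	next    *edge // edge to follow towards the target
	visited bool
}

// findPathBackwards searches from target to source, growing the amount by
// each hop's fee. Node names are assumed to be non-empty.
func findPathBackwards(edges []edge, source, target string, paymentMsat int64) ([]string, int64, bool) {
	state := map[string]*nodeState{}
	for i := range edges {
		for _, n := range []string{edges[i].from, edges[i].to} {
			if state[n] == nil {
				state[n] = &nodeState{dist: math.MaxInt64}
			}
		}
	}
	if state[source] == nil || state[target] == nil {
		return nil, 0, false
	}
	state[target].dist = 0
	state[target].amt = paymentMsat

	for {
		// Pick the reachable, unvisited node with the smallest distance.
		pivot := ""
		for n, s := range state {
			if !s.visited && s.dist != math.MaxInt64 &&
				(pivot == "" || s.dist < state[pivot].dist) {
				pivot = n
			}
		}
		if pivot == "" || pivot == source {
			break
		}
		state[pivot].visited = true
		amountToSend := state[pivot].amt

		// Relax every edge that leads into the pivot.
		for i := range edges {
			e := &edges[i]
			if e.to != pivot || e.capacityMsat < amountToSend {
				continue
			}
			var fee int64
			if e.from != source { // the sender pays itself no fee
				fee = e.baseFeeMsat + amountToSend*e.feeRatePpm/1000000
			}
			if tempDist := state[pivot].dist + fee; tempDist < state[e.from].dist {
				s := state[e.from]
				s.dist, s.amt, s.next = tempDist, amountToSend+fee, e
			}
		}
	}

	if state[source].next == nil {
		return nil, 0, false
	}
	path := []string{source}
	for n := source; n != target; {
		n = state[n].next.to
		path = append(path, n)
	}
	return path, state[source].amt, true
}

func main() {
	edges := []edge{
		{from: "a", to: "b", capacityMsat: 101000},
		{from: "b", to: "d", capacityMsat: 200000, baseFeeMsat: 1000, feeRatePpm: 10000},
		{from: "a", to: "c", capacityMsat: 200000},
		{from: "c", to: "d", capacityMsat: 200000, baseFeeMsat: 3000},
	}
	path, total, ok := findPathBackwards(edges, "a", "d", 100000)
	fmt.Println(path, total, ok)
}
```

In this example the cheaper route through b is skipped because the a->b channel cannot carry the payment plus b's fee, so the search settles on the route through c. A search that uses the bare payment amount on every hop would have picked the route through b.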
routing/pathfind_test.go
Outdated
// the route search was executed backwards.
{ target: "elst", paymentAmt: 100000, totalTimeLock: 103,
	expectedHops: []expectedHop{
		{alias: "phamnuwen", fwdAmount: 100000200, fee: 100010000, timeLock: 102},
I don't get this to add up, maybe I'm missing something.
For a payment of 100,000,000 msat, the expected fees should be:
sophon->elst: 200 msat + 100,000,000 msat * 0/1,000,000 msat = 200 msat
pham->sophon: 10,000 msat + 100,000,200 msat * 1,000,000/1,000,000 msat = 100,010,200 msat
We must send paymentAmt+fees = 100,000,000 msat + 200 msat + 100,010,200 msat = 200,010,400 msat. This should not be able to go through the 120,000,000 msat channel?
Great catch! This is indeed not correct. I hope that conversion to table-driven made this easier to spot. And I think it actually uncovers two existing bugs in newRoute on master:
- Fee calculation for a hop does not include the fee that needs to be paid to the next hop
- The incoming channel capacity "sanity" check does not include the fee to be paid to the current hop
I also made the expected total amount for a route explicit in the test table. We need to be careful when calculating expected values in unit tests, to avoid duplicating the same (incorrect) logic. In this case, the calculation of the expected value was ok, but still.
See c21c980 for fixes.
With regard to your other comments:
39f1d23 to eee132a
routing/pathfind_test.go
Outdated
name: "three hop with fee carry over",
paymentAmount: 100000,
feeRates: []lnwire.MilliSatoshi{10000, 10000, 10000},
expectedTotalAmount: 102010,
@halseth This test on master fails. newRoute on master returns 102000 for total amount.
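For reference, the two totals can be reproduced with a few lines of arithmetic. This is only a sketch under assumptions: it reads the three feeRates values as proportional rates in parts per million, takes base fees to be zero, and assumes the sender's own first channel charges nothing. Under that reading the numbers match the 102010 expected by the test and the 102000 returned on master; the function names are hypothetical.

```go
package main

import "fmt"

// feeFor returns the proportional fee for forwarding amt at the given
// rate in parts per million. Base fees are assumed to be zero here.
func feeFor(amt, ratePpm int64) int64 {
	return amt * ratePpm / 1000000
}

// totalWithCarryOver computes the amount the sender must send when every
// hop's fee is calculated on the amount it actually forwards, which
// already includes the fees of the hops after it. rates[0] belongs to the
// sender's own channel and is not charged.
func totalWithCarryOver(payment int64, rates []int64) int64 {
	total := payment
	for i := len(rates) - 1; i >= 1; i-- {
		total += feeFor(total, rates[i])
	}
	return total
}

// totalWithoutCarryOver calculates every fee on the bare payment amount,
// ignoring the fees owed to later hops.
func totalWithoutCarryOver(payment int64, rates []int64) int64 {
	total := payment
	for i := len(rates) - 1; i >= 1; i-- {
		total += feeFor(payment, rates[i])
	}
	return total
}

func main() {
	rates := []int64{10000, 10000, 10000}
	fmt.Println("with carry over:   ", totalWithCarryOver(100000, rates))
	fmt.Println("without carry over:", totalWithoutCarryOver(100000, rates))
}
```

The 10 unit difference is the fee the second-to-last forwarding node charges on the 1000 units it forwards as fee for the last one.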
Good catch! Does it make sense to add this test + fix to a separate PR?
Good idea. The fee limit changes are dependent on the previous (backward search) work, but this one is independent.
285f3f9 to 76098dd
I'm really starting to like these changes! 😀
Table-driven tests remove a lot of code duplication, and make it easier to spot errors in the tests. 👏
Most of my comments are about code cleanup and style to make the routing algorithm easier to follow.
for _, testCase := range basicGraphPathFindingTests {
	t.Run(testCase.target, func(subT *testing.T) {
		testBasicGraphPathFindingCase(subT, graph, aliases, &testCase)
Awesome refactor! 💯
👍
routing/pathfind_test.go
Outdated
}

var basicGraphPathFindingTests = []basicGraphPathFindingTestCase{
	// Basic route with one intermediate hop
nit: period at end of sentences.
Fixed
routing/pathfind_test.go
Outdated
{alias: "songoku", fwdAmount: 100000, fee: 110, timeLock: 101},
{alias: "sophon", fwdAmount: 100000, fee: 0, timeLock: 101},
}},
// Basic direct (one hop) route
nit: add a newline between test cases.
Fixed
routing/testdata/basic_graph.json
Outdated
" │ │ └──────┘ ",
" │ │ ▲ ",
" │ │ | 100k sat ",
" │ │ | ",
nit: you forgot to add "▼" to the new channels 😛
Fixed
rpcserver.go
Outdated
@@ -3188,7 +3188,8 @@ func unmarshallRoute(rpcroute *lnrpc.Route,

routingHop := &routing.ChannelHop{
ChannelEdgePolicy: channelEdgePolicy,
Capacity: btcutil.Amount(hop.ChanCapacity),
Bandwidth: lnwire.NewMSatFromSatoshis(
btcutil.Amount(hop.ChanCapacity)),
hm, something weird about the formatting here?
Fixed
routing/pathfind.go
Outdated
if tempDist < distance[fromVertex].dist && bandwidth >= amountToSend &&
	amountToSend >= edge.MinHTLC && edge.TimeLockDelta != 0 {

	amountToReceive := amountToSend + fee
Add a comment explaining this calculation :)
Comment added.
// distance map with a distance of 0. This indicates our starting
// point in the graph traversal.
sourceVertex := Vertex(sourceNode.PubKeyBytes)
distance[sourceVertex] = nodeWithDist{
we must still make sure the source is added to the map with dist infinity
The source node is already added in the graph.ForEachNode loop further up. The reason it was explicitly added here previously was to set (overwrite) the distance to 0. But because of backwards searching, this is no longer necessary.
Hmm, we are then assuming that the source node will always be returned from graph.ForEachNode. That's a reasonable assumption to make, but I think it won't hurt to also handle the case where it's not.
Maybe that's fine anyway, because we will just finish our path finding, and then discover no path was found to the source node? Or do we depend on it being in the map?
It is always returned. If you look at
Line 276 in b5a2288
node, err := fetchLightningNode(nodes, selfPub)
it can be seen that even channeldb itself relies on the source pubkey being present in the node bucket.
If these semantics were to change at some point, path finding would never find a path anymore, so the resulting bug would surface quickly.
I'd prefer not to add a check anyway, because it would only raise questions from developers about why this could happen.
Comment added.
Reasonable, agreed!
routing/pathfind.go
Outdated
// further our graph traversal.
pivot := Vertex(bestNode.PubKeyBytes)
err := bestNode.ForEachChannel(tx, func(tx *bolt.Tx,
	edgeInfo *channeldb.ChannelEdgeInfo,
	outEdge, _ *channeldb.ChannelEdgePolicy) error {
	outEdge *channeldb.ChannelEdgePolicy, inEdge *channeldb.ChannelEdgePolicy) error {
can be written outEdge, inEdge *channeldb.ChannelEdgePolicy
Fixed
routing/pathfind.go
Outdated
pathEdges = append(pathEdges, prev[prevNode].edge)

prevNode = Vertex(prev[prevNode].prevNode)
// If the potential route is below the max hop limit, then we'll use
nit: This comment should not mention the hop limit, as that is first checked below.
Fixed. Probably something that was moved earlier.
routing/pathfind_test.go
Outdated
@@ -616,6 +638,163 @@ func TestKShortestPathFinding(t *testing.T) {
	assertExpectedPath(t, paths[1], "roasbeef", "satoshi", "luoji")
}

func TestNewRoute(t *testing.T) {
The newRoute changes should be removed, since they were separated into a new PR.
This PR is based upon the other PR, because I need the TestNewRoute function. Those changes are shown here too because the base of this PR is master. But really, all commits in this PR are stacked on top of the newRoute bugfix. Please let me know how to do this better.
👍
d60dff0 to 7e384c6
This is getting really close now. Most comments are about style and the removal of the (now) unnecessary DB migration.
Very excited to get this in!
autopilot/graph.go
Outdated
@@ -85,6 +85,11 @@ func (d dbNode) ForEachChannel(cb func(ChannelEdge) error) error {
	return d.node.ForEachChannel(d.tx, func(tx *bolt.Tx,
		ei *channeldb.ChannelEdgeInfo, ep, _ *channeldb.ChannelEdgePolicy) error {

		// Skip channels for which no outgoing edge policy is available.
		if ep == nil {
Looking at where it is used, I think it should only need to know whether there's a channel or not between the nodes, not if the policy is known or not.
So in this case, we could extract both pubkeys from the ei, and compare them to the current dbNode to figure out the key of the peer.
channeldb/graph.go
Outdated
if err := putChanEdgePolicyUnknown(edges, edge.ChannelID,
	key[:]); err != nil {
To separate long lines like this, the first thing would be
err := putCha...
if err != nil {
return err
}
// connecting node. If the callback returns an error, then the iteration is
// halted with the error propagated back up to the caller.
//
// Unknown policies are passed into the callback as nil values.
👍
channeldb/graph.go
Outdated
incomingNode := toEdgePolicy.Node.PubKeyBytes[:]
fromEdgePolicy, err := fetchChanEdgePolicy(
	edges, chanID, incomingNode, nodes,
// Using the incoming node, the incoming edge policy
Excellent! 😀
channeldb/graph.go
Outdated
@@ -2014,6 +2041,16 @@ func (c *ChannelEdgeInfo) BitcoinKey2() (*btcec.PublicKey, error) {
	return key, nil
}

// OtherNodeKeyBytes returns the node key bytes of the other end of
// the channel.
func (c *ChannelEdgeInfo) OtherNodeKeyBytes(thisNodeKey []byte) [33]byte {
👍
	return pureFee + timeLockPenalty
	return int64(fee) + timeLockPenalty
}

// findPath attempts to find a path from the source node within the
Could add to the method description here that we search backwards now, and why 😄
Added
// that are not known to us yet in the distance map.
for vertex := range additionalEdges {
additionalEdgesWithSrc := make(map[Vertex][]*edgePolicyWithSource)
for vertex, outgoingEdgePolicies := range additionalEdges {
Yeah, we probably shouldn't change ChannelEdgePolicy, as it directly maps to the wire messages being sent on the network. I'm fine with keeping this as is now, as I agree that this could be considered an implementation detail the caller shouldn't need to worry about.
routing/pathfind.go
Outdated
// If the estimated band width of the channel edge is not able
// to carry the amount that needs to be send, return.
if bandwidth < amountToSend {
This could be checked earlier in processEdge?
Moved up
// If the amountToSend is less than the minimum required amount,
// return.
if amountToSend < edge.MinHTLC {
same, could check earlier for an early return.
Moved up
routing/pathfind_test.go
Outdated
expectedHops: []expectedHop{
	{alias: "luoji", fwdAmount: 100000, fee: 0, timeLock: 101},
}},
// Basic route with one intermediate hop.
Looks good now!
autopilot/graph.go
Outdated
if err != nil {
	return err
}
Ok, the semantics change compared to master, but I think it is better like this. As you say, policies are not required for the autopilot.
But the change became bigger than I would have liked, because of the lookup of LightningNode, which was previously done inside LightningNode.ForEachChannel. I needed to gain access to db somehow.
I kept the changes apart in commit d258efe. Maybe there is a cleaner way?
Hm, yeah this might not even work, since calling FetchLightningNode inside an already open DB transaction might cause problems.
I don't see an immediate way of achieving what we want without doing more changes to the database to make it possible to fetch the node without the edge policy. I'm okay with keeping your original implementation (skipping a nil-policy) with an added TODO saying that we ideally should support the case where the policy is nil.
Ok, pulling out that commit.
@halseth I think we have resolved all points now.
channeldb/graph.go
Outdated
// with the node's public key and sends with the compact edge ID.
// For each chanID, there will be two entries within the bucket, as the
// graph is directed: nodes may have different policies w.r.t to fees
// for their respective directions. An unknown policy is represented as
👍
channeldb/graph.go
Outdated
//
// maps: pubKey || edgeID -> edge policy for node
// maps: pubKey || chanID -> channel edge policy for node
For some reason I thought edgePolicies were stored in a bucket in addition to this. So I can understand why we were talking about different things now.. 😛 The final solution you came up with looks good to me though 😄
Ah, that explains it 😄
channeldb/graph.go
Outdated
if edgeBytes := edges.Get(edgeKey[:]); edgeBytes != nil {
// An unknown policy value does not have a update time recorded, so
// it also does not need to be removed.
if edgeBytes := edges.Get(edgeKey[:]);
It would be even safer to check !bytes.Equal(edgeBytes, unknownPolicy) (the unknown policy would then not be required to be an empty slice), where unknownPolicy is a constant defined as the empty slice.
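A small sketch of such a sentinel check follows. The names `policyState` and `classify` are hypothetical, and the sketch assumes that the lookup returns nil for a missing key, as the `edgeBytes != nil` check in the quoted diff implies. Two Go details matter here: slices cannot be constants, so the sentinel has to be a package-level variable, and `bytes.Equal` treats a nil slice and an empty slice as equal, so the nil check has to come first.

```go
package main

import (
	"bytes"
	"fmt"
)

// unknownPolicy is a hypothetical sentinel value marking a policy that has
// not been received yet. Go has no constant slices, so it is a variable.
var unknownPolicy = []byte{}

// policyState describes what a lookup in the edges bucket found.
type policyState int

const (
	policyMissing policyState = iota // no entry for the key at all
	policyUnknown                    // entry exists, policy not known yet
	policyKnown                      // entry holds a serialized policy
)

// classify interprets the raw value returned by a key lookup.
func classify(edgeBytes []byte) policyState {
	// bytes.Equal(nil, []byte{}) is true, so the missing-key case must be
	// ruled out before comparing against the sentinel.
	if edgeBytes == nil {
		return policyMissing
	}
	if bytes.Equal(edgeBytes, unknownPolicy) {
		return policyUnknown
	}
	return policyKnown
}

func main() {
	fmt.Println(classify(nil))            // missing
	fmt.Println(classify([]byte{}))       // unknown
	fmt.Println(classify([]byte{0x01}))   // known
}
```

Comparing against a named sentinel instead of checking the length keeps the call sites unchanged if the marker is ever given a non-empty encoding.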
channeldb/graph.go
Outdated
@@ -2836,6 +2890,11 @@ func fetchChanEdgePolicy(edges *bolt.Bucket, chanID []byte,
	return nil, ErrEdgeNotFound
}

// No need to deserialize unknown policy.
if len(edgeBytes) == 0 {
Again, compare against the unknownPolicy constant.
When this is addressed this PR can be squashed and rebased, and I think it should be good 👍
channeldb/migrations.go
Outdated
_, err := fetchChanEdgePolicy(edges,
	channelID[:], keyBytes[:], nodes)

if err == ErrEdgeNotFound {
In case of any other error, we must return it so that the migration can be aborted.
Good one, fixed.
channeldb/migrations.go
Outdated
func migrateEdgePolicies(tx *bolt.Tx) error {
	nodes := tx.Bucket(nodeBucket)
	if nodes == nil {
		return ErrGraphNotFound
I think we can safely return nil if the buckets are not found? We are probably then trying to migrate an empty DB.
Yes, indeed.
channeldb/migrations.go
Outdated
log.Tracef("Adding unknown edge policy present for node %x, channel %v",
	keyBytes, channelId)

putChanEdgePolicyUnknown(edges, channelId, keyBytes)
error must be handled here also.
Good point, added.
Rebased after gofmt commits on master
The commit ensures that for every channel, there will always be two entries in the edges bucket. If the policy from one or both ends of the channel is unknown, it is marked as such. This allows efficient lookup of incoming edges. This is required for backwards payment path finding.
Amended all commits with gofmt
This change looks mostly LGTM now, awesome work! Should get review from someone else before merge, however.
channeldb/graph.go
Outdated
if edgeBytes := edges.Get(edgeKey[:]); edgeBytes != nil {
// An unknown policy value does not have a update time recorded, so
// it also does not need to be removed.
if edgeBytes := edges.Get(edgeKey[:]); edgeBytes != nil && !bytes.Equal(edgeBytes[:], unknownPolicy) {
split over two lines
channeldb/graph.go
Outdated
var incomingPolicy, outgoingPolicy *ChannelEdgePolicy

if policy1 != nil {
I know I suggested earlier to do it this way, but I didn't anticipate we would need these switch statements 😛
Could we instead do:
chanID := nodeEdge[33:]
edgeInfo, err := fetchChanEdgeInfo(edgeIndex, chanID)
if err != nil {
	return err
}
toEdgePolicy, err := fetchChanEdgePolicy(
	edges, chanID, nodePub, nodes,
)
...
otherNode, err := edgeInfo.OtherNodeKeyBytes(nodePub)
if err != nil {
	return err
}
fromEdgePolicy, err := fetchChanEdgePolicy(
	edges, chanID, otherNode, nodes,
)
(or just deserialize the toEdgePolicy directly after first checking that it is not unknownPolicy, as you suggested initially)
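A helper like OtherNodeKeyBytes could look roughly as follows. This is a sketch, not the code from the PR: the `channelEdgeInfo` type is a hypothetical stand-in with just the two node keys, and the error-returning signature follows the suggestion above rather than the earlier diff, which returned only the key.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// channelEdgeInfo is a hypothetical, stripped-down stand-in that holds
// only the compressed public keys of the two channel endpoints.
type channelEdgeInfo struct {
	NodeKey1Bytes [33]byte
	NodeKey2Bytes [33]byte
}

var errNodeNotInChannel = errors.New("node is not a participant in this channel")

// otherNodeKeyBytes returns the key of the channel peer of thisNodeKey, or
// an error if thisNodeKey is neither endpoint of the channel.
func (c *channelEdgeInfo) otherNodeKeyBytes(thisNodeKey []byte) ([33]byte, error) {
	switch {
	case bytes.Equal(c.NodeKey1Bytes[:], thisNodeKey):
		return c.NodeKey2Bytes, nil
	case bytes.Equal(c.NodeKey2Bytes[:], thisNodeKey):
		return c.NodeKey1Bytes, nil
	default:
		return [33]byte{}, errNodeNotInChannel
	}
}

func main() {
	var info channelEdgeInfo
	info.NodeKey1Bytes[0] = 0x02
	info.NodeKey2Bytes[0] = 0x03

	other, err := info.otherNodeKeyBytes(info.NodeKey1Bytes[:])
	fmt.Println(other[0], err)

	_, err = info.otherNodeKeyBytes(make([]byte, 33))
	fmt.Println(err)
}
```

With the peer key in hand, the incoming policy is a single lookup under that key and the same channel ID, which is what removes the need for the switch statements.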
Yes, this is indeed better than the ugly switches. Glad to hear my initial creation was not so bad 😉 but I agree with you that readability and not duplicating code are important too.
channeldb/graph.go
Outdated
@@ -2899,6 +2943,20 @@ func putChanEdgePolicy(edges *bolt.Bucket, edge *ChannelEdgePolicy, from, to []b
	return edges.Put(edgeKey[:], b.Bytes()[:])
}

// putChanEdgePolicyUnknown marks the edge policy as unknown in the edges bucket.
nit: both lines a bit long
Review comments addressed
Excellent work! This PR includes quite a few nice clean-ups w.r.t. the testing code, and fixes a few legacy bugs related to incorrectly applying the fee limit during path finding.
The PR looks mostly good to me; however, I noticed that in one area it'll create a DB transaction inside of an existing one. Instead, it should carry over the same transaction: in the past, nested transactions have led to deadlocks, and creating a new transaction is also an expensive operation. This is my only major comment in this PR.
I'll also move to start running this on a few nodes doing path finding, just to see if anything weird pops up.
channeldb/graph.go
Outdated
if err != nil && err != ErrEdgeNotFound &&
	err != ErrGraphNodeNotFound {
	return err
policy1, policy2, err := fetchChanEdgePolicies(edgeIndex,
Forgetting to check the error here.
We already have the edge info here, it's the value of the cursor seek, why fetch it again? Not blocking, more of a nit that code seems to have been unnecessarily changed.
Initially it was unchanged. I modified it after discussion with @halseth for readability (over performance), see comments above.
channeldb/graph.go
Outdated
@@ -2824,6 +2864,20 @@ func putChanEdgePolicy(edges *bolt.Bucket, edge *ChannelEdgePolicy, from, to []b | |||
return edges.Put(edgeKey[:], b.Bytes()[:]) | |||
} | |||
|
|||
// putChanEdgePolicyUnknown marks the edge policy as unknown in the edges bucket. | |||
func putChanEdgePolicyUnknown(edges *bolt.Bucket, channelID uint64, from []byte) error { |
👍
|
||
for _, testCase := range basicGraphPathFindingTests { | ||
t.Run(testCase.target, func(subT *testing.T) { | ||
testBasicGraphPathFindingCase(subT, graph, aliases, &testCase) |
👍
channeldb/graph.go
Outdated
return fmt.Errorf("Unexpected node in policy") | ||
} | ||
} | ||
incomingPolicy, err := fetchChanEdgePolicy( |
Ignoring the error here and above.
routing/pathfind.go
Outdated
// If the edge has no time lock delta, the payment will always | ||
// fail, so return. | ||
|
||
// TODO(joostjager): Is this really true? Can't it be that |
Well, it's just a dumb value to set, as it more or less guarantees that the node won't be able to properly handle a timed out HTLC, and will likely lose money.
@@ -684,13 +798,6 @@ func findPath(tx *bolt.Tx, graph *channeldb.ChannelGraph, | |||
"too many hops") | |||
} | |||
|
|||
// As our traversal of the prev map above walked backwards from the |
Yay, no more reversal!
return err | ||
} | ||
|
||
// Lookup the full node details in order to be able to |
This looks a bit dangerous and could possibly lead to a deadlock: we're creating a new DB transaction inside of an existing db transaction.
bolt allows concurrent readers, but I'm wary of making additional transactions inside of callbacks like this. We started to expose the transaction object in these methods to safely allow nested calls without creating new database transactions. We could possibly add methods onto the EdgeInfo struct that accept the existing transaction (or create a new one if it is nil), in order to allow the caller to fetch the full LightningNode struct without creating a new database transaction.
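To illustrate the suggestion above, here is a minimal sketch of the "reuse the transaction if one is passed, otherwise open one" pattern. It does not use bolt: `db`, `tx`, `view`, and `fetchNode` are hypothetical stand-ins invented for this example, and the real lnd method names and signatures may well differ.

```go
package main

import "fmt"

// tx is a hypothetical stand-in for a database transaction.
type tx struct{ id int }

// db is a hypothetical stand-in for the database; it counts how many
// transactions have been opened so the effect of reuse is visible.
type db struct{ opened int }

// view opens a new read transaction and runs f inside it.
func (d *db) view(f func(*tx) error) error {
	d.opened++
	return f(&tx{id: d.opened})
}

// fetchNode looks up a node. If the caller already holds a transaction,
// it is reused; only when t is nil is a new transaction opened.
func fetchNode(d *db, t *tx, pubKey string) (string, error) {
	var node string
	lookup := func(_ *tx) error {
		// A real implementation would read from the node bucket here.
		node = "node:" + pubKey
		return nil
	}

	if t != nil {
		err := lookup(t)
		return node, err
	}
	err := d.view(lookup)
	return node, err
}

func main() {
	d := &db{}

	// Called from inside an existing transaction: no second one is opened.
	_ = d.view(func(t *tx) error {
		node, err := fetchNode(d, t, "alice")
		fmt.Println(node, "transactions opened:", d.opened)
		return err
	})
}
```

With this shape, a callback that already runs inside a transaction passes it along, while standalone callers can simply pass nil.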
Actually, I'll merge this as is, and then add a follow up commit to avoid the double DB transaction issue. Thanks for bearing with us through this review process! The final PR really turned out well! |
Committed as 29b6bae! |
I am happy that this one has been merged now. Will work on the follow-up. |
Ah, I see you fixed the open issues yourselves. I was thinking of adding a tx parameter to FetchLightningNode, but this works too of course. |
The original findPath implementation can come up with a path that is rejected in newRoute because of "channel graph has insufficient capacity". The reason is that findPath calculates with the same amt all the way, while newRoute takes into account that amt needs to increase with every hop (looking from target back to source).
This change makes findPath use the correct fees and eliminates the possibility that newRoute returns the insufficient capacity error.
In this PR: