Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing: use correct fees by searching backwards #1321

Closed
wants to merge 5 commits into from

Conversation

joostjager
Copy link
Collaborator

@joostjager joostjager commented Jun 4, 2018

The original findPath implementation can come up with a path that is rejected in newRoute because of "channel graph has insufficient capacity". The reason is that findPath calculates with the same amt all the way, while newRoute takes into account that amt needs to increase with every hop (looking from target back to source).

This change makes findPath use the correct fees and eliminates the possibility that newRoute returns the insufficient capacity error.

In this PR:

  • Refactor of the path finding unit test to be able add more cases without code duplication (table driven).
  • Unit test for invalid "channel graph has insufficient capacity" result
  • Reworked search algorithm to already take into account increasing amounts that need to be send to next hops

@joostjager
Copy link
Collaborator Author

joostjager commented Jun 4, 2018

@Roasbeef This is work in progress. Can you do an early check and tell me what you think of the general idea?

@Roasbeef
Copy link
Member

Roasbeef commented Jun 4, 2018 via email

@joostjager
Copy link
Collaborator Author

The original findPath implementation can come up with a path that is rejected in newRoute because of "channel graph has insufficient capacity". The reason is that findPath calculates with the same amt all the way, while newRoute takes into account that amt needs to increase with every hop (looking from target back to source).

This change make findPath use the correct fees and eliminates the possibility that newRoute returns the insufficient capacity error.

@Roasbeef
Copy link
Member

Roasbeef commented Jun 4, 2018

Gotcha, can you update the PR description, and also the commit messages with this information? Thanks!

In the past this wasn't an issue as each instance kicked off a k-shortest paths routine, so the failing paths would be excluded with a continue. We'll want to first get in the PR which adds fee limits to the path finding routine (say you don't want to pay more than 2% or w/e), as that will affect this implementation.

@joostjager
Copy link
Collaborator Author

The fee limit PR as it is now will fail finding a route if the lowest weight route exceeds the max fee. There may be a possible route, but it isn't found because the time/fee weight balance doesn't favor that route.

When the fee limits are merged, I can update this PR to also take the fee limits into account already during findPath. This will lead to less failed route calculations, because the algorithm will continue exploring the graph for higher weight paths that do satisfy the fee limit (which means a higher total time lock).

@halseth
Copy link
Contributor

halseth commented Jun 6, 2018

@joostjager can you add a test case that exercise the failure scenario you are trying to fix? Will help understanding the fix :)

@joostjager
Copy link
Collaborator Author

Yes, I am working on that.

@joostjager joostjager force-pushed the routing branch 2 times, most recently from ecbbfd0 to 77f11f5 Compare June 7, 2018 11:59
@joostjager
Copy link
Collaborator Author

Quite some changes were needed to add this unit test in a maintainable way. I think trying to work towards a single test graph where many scenarios can be tested is better than setting up different graphs for different test cases.

Also, better visualization of the test graph would be useful. Maybe replace the ascii art with a graphviz generation script.

I see more potential work in the routing area, but will leave the scope of this PR as it is now to keep it manageable. Waiting for #1113

@joostjager joostjager force-pushed the routing branch 2 times, most recently from 77f11f5 to 8b8a7eb Compare June 7, 2018 13:19
@Roasbeef
Copy link
Member

Roasbeef commented Jun 8, 2018

Also, better visualization of the test graph would be useful. Maybe replace the ascii art with a graphviz generation script.

The first graph there is just the most basic one, the other graphs themselves are a bit more complex, and agreed that having a rendering of them would make it easier to set up tests, however, it's also pretty straight forward to add additional vertexes/edges to the basic graph at runtime.

@joostjager
Copy link
Collaborator Author

Yes indeed, runtime modifications is another option. Only makes it slightly more difficult to reason about what is happening.

The graph jsons contain info that could also be generated runtime. Another idea would be to describe the graph with only the essential values. For example, only aliases and generate the pubkeys (deterministically) in memory when the test runs. It makes the graph description short and can probably be described in go code instead of a separate json. Then, for a special test case like the one described in this issue, it is not much effort to bring up a custom graph.

This contradicts my comment above about working towards a single graph. It depends on the amount of effort required.

@joostjager
Copy link
Collaborator Author

Implemented alternative way to describe graph in #1358

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent observation! We might not see this routing failures that often, as long as fees are small compared to payment amounts, but totally agree that this is the way we should find paths correctly.

My main concern atm is that the path finding procedure does read a bit convoluted after these changes. Since we are shoehorning the target->source search into the existing block of code, variable names, map indexes, method names, comments and the steps taken now looks a bit off and is hard to follow. Instead we should try to do a bigger rewrite, where it is turned into a proper target->source search, where running amounts are kept as "distances". I don't think the changes should be that large.

You also mentioned that edge weights are a bit off since we are rounding to sat some places instead of using msat. I think it would be worthwhile to fix this, and make sure we are using msat everywhere. How big of a change would this be?

// the route search was executed backwards.
{ target: "elst", paymentAmt: 100000, totalTimeLock: 103,
expectedHops: []expectedHop{
{alias: "phamnuwen", fwdAmount: 100000200, fee: 100010000, timeLock: 102},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this to add up, maybe I'm missing something.

For a payment of 100,000,000 msat, the expected fees should be:
sophon->elst: 200 msat + 100,000,000 msat * 0/1,000,000 msat = 200 msat
pham->sophon: 10,000 msat + 100,000,200 msat * 1,000,000/1,000,000 msat = 100,010,200 msat

We must send paymentAmt+fees = 100,000,000 msat + 200 msat + 100,010,200 msat = 200,010,400 msat. This should not be able to go through the 120,000,000 msat channel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! This is indeed not correct. I hope that conversion to table-driven made this easier to spot. And I think it actually uncovers two existing bugs in newRoute on master:

  • Fee calculation for a hop does not include the fee that needs to be paid to the next hop
  • The incoming channel capacity "sanity" check does not include the fee to be paid to the current hop

I also made the expected total amount for a route explicit in the test table. Need to be careful with calculating expected values in unit tests, to prevent duplication of the same (incorrect logic). In this case, the calculation of the expected value was ok, but still.

See c21c980 for fixes.

@joostjager
Copy link
Collaborator Author

joostjager commented Jun 12, 2018

With regards to you other comments:

  • Naming and structure: Now that we are on the same page about the general idea of this PR, I will start that kind of finishing work.

    One problem that I encountered in making this as simple as possible, is that ChannelEdgePolicy does not contain the "from" node. Therefore I needed to create a reverse lookup table and pass an extra argument to processNode. Do you think it would be good to add this property to ChannelEdgePolicy? Or, because it is currently specific to the backwards search (I think), deal with it in findPath and accept the extra complexity?

  • I will look at msat vs sat. But channel capacities are an on-chain thing expressed in whole satoshis and I think it is good to keep that explicit by using btcutil.amount. Where did I say that edge weights are a bit off because of rounding? I know that the channel capacity check was not correct because of rounding, but don't think it affects the weights. Edge weights (on master) are indeed a bit off, but for another reason. And that is again that there is no running amount and fees (and therefore weights) are only based on payment amount.

@joostjager joostjager force-pushed the routing branch 3 times, most recently from 39f1d23 to eee132a Compare June 12, 2018 12:29
name: "three hop with fee carry over",
paymentAmount: 100000,
feeRates: []lnwire.MilliSatoshi{10000, 10000, 10000},
expectedTotalAmount: 102010,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halseth This test on master fails. newRoute on master returns 102000 for total amount.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Does it make sense to add this test + fix to a separate PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. The fee limit changes are dependent on the previous (backward search) work, but this one is independent.

@joostjager joostjager force-pushed the routing branch 5 times, most recently from 285f3f9 to 76098dd Compare June 13, 2018 10:09
Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really starting to like these changes! 😀

Table driven tests removes a lot of code duplication, and makes it easier to spot errors in the tests. 👏

Most of my comments are about code cleanup and style to make the routing algorithm easier to follow.


for _, testCase := range basicGraphPathFindingTests {
t.Run(testCase.target, func(subT *testing.T) {
testBasicGraphPathFindingCase(subT, graph, aliases, &testCase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome refactor! 💯

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

var basicGraphPathFindingTests = []basicGraphPathFindingTestCase{
// Basic route with one intermediate hop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: period at end of sentences.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

{alias: "songoku", fwdAmount: 100000, fee: 110, timeLock: 101},
{alias: "sophon", fwdAmount: 100000, fee: 0, timeLock: 101},
}},
// Basic direct (one hop) route
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a newline between test cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

" │ │ └──────┘ ",
" │ │ ",
" │ │ | 100k sat ",
" │ │ | ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you forgot to add "▼" to the new channels 😛

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

rpcserver.go Outdated
@@ -3188,7 +3188,8 @@ func unmarshallRoute(rpcroute *lnrpc.Route,

routingHop := &routing.ChannelHop{
ChannelEdgePolicy: channelEdgePolicy,
Capacity: btcutil.Amount(hop.ChanCapacity),
Bandwidth: lnwire.NewMSatFromSatoshis(
btcutil.Amount(hop.ChanCapacity)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, something weird about the formatting here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

if tempDist < distance[fromVertex].dist && bandwidth >= amountToSend &&
amountToSend >= edge.MinHTLC && edge.TimeLockDelta != 0 {

amountToReceive := amountToSend + fee
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining this calculation :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.

// distance map with a distance of 0. This indicates our starting
// point in the graph traversal.
sourceVertex := Vertex(sourceNode.PubKeyBytes)
distance[sourceVertex] = nodeWithDist{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we must still make sure the source is added to the map with dist infinity

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source node is already added in the graph.ForEachNode loop further up. The reason it was explictly added here previously, was to set (overwrite) the distance to 0. But because of backwards searching, this is no longer necessary.

Copy link
Contributor

@halseth halseth Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we are then assuming that the source node will always be returned from graph.ForEachNode. That's a reasonable assumption to make, but I think it won't hurt to also handle the case where it's not.

Maybe that's fine anyway, because we will just finish our path finding, and then discover no path was found to the source node? Or do we depend on it being in the map?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is always returned. If you look at

node, err := fetchLightningNode(nodes, selfPub)

it can be seen that even channeldb itself relies on the source pubkey being present in the node bucket.

If these semantics would change at some point, path finding would never find a path anymore. So the bug that would be created, will quickly surface.

I'd prefer not to add a check anyway, because it can only elicit questions from developer about the reason there might be for this to happen.

Comment added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable, agreed!

// further our graph traversal.
pivot := Vertex(bestNode.PubKeyBytes)
err := bestNode.ForEachChannel(tx, func(tx *bolt.Tx,
edgeInfo *channeldb.ChannelEdgeInfo,
outEdge, _ *channeldb.ChannelEdgePolicy) error {
outEdge *channeldb.ChannelEdgePolicy, inEdge *channeldb.ChannelEdgePolicy) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be written outEdge, inEdge *channeldb.ChannelEdgePolicy

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

pathEdges = append(pathEdges, prev[prevNode].edge)

prevNode = Vertex(prev[prevNode].prevNode)
// If the potential route is below the max hop limit, then we'll use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This comment should not mention the hop limit, as that is first checked below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Probably something that was moved earlier.

@@ -616,6 +638,163 @@ func TestKShortestPathFinding(t *testing.T) {
assertExpectedPath(t, paths[1], "roasbeef", "satoshi", "luoji")
}

func TestNewRoute(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the newRoute changes should be removed, since they were separated into a new PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is based upon the other PR, because I need the TestNewRoute function. Those changes are shown here to because the base of this PR is master. But really, all commits in this PR are stacked on top of the newRoute bugfix. Please let me know how to do this better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@joostjager joostjager force-pushed the routing branch 3 times, most recently from d60dff0 to 7e384c6 Compare July 28, 2018 12:41
Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting reallly close now, most comments about style, and removal of the (now) unnecessary DB migration.

Very excited to get this in!

@@ -85,6 +85,11 @@ func (d dbNode) ForEachChannel(cb func(ChannelEdge) error) error {
return d.node.ForEachChannel(d.tx, func(tx *bolt.Tx,
ei *channeldb.ChannelEdgeInfo, ep, _ *channeldb.ChannelEdgePolicy) error {

// Skip channels for which no outgoing edge policy is available.
if ep == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at where it is used, I think it should only need to know whether there's a channel or not between the nodes, not if the policy is known or not.

So in this case, we could extract both pubkeys from the ei, and compare them to the current dbNode to figure out the key of the peer.


if err := putChanEdgePolicyUnknown(edges, edge.ChannelID,
key[:]); err != nil {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To separate long lines like this, the first thing would be

err := putCha...
if err != nil {
    return err 
}

// connecting node. If the callback returns an error, then the iteration is
// halted with the error propagated back up to the caller.
//
// Unknown policies are passed into the callback as nil values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

incomingNode := toEdgePolicy.Node.PubKeyBytes[:]
fromEdgePolicy, err := fetchChanEdgePolicy(
edges, chanID, incomingNode, nodes,
// Using the incoming node, the incoming edge policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! 😀

@@ -2014,6 +2041,16 @@ func (c *ChannelEdgeInfo) BitcoinKey2() (*btcec.PublicKey, error) {
return key, nil
}

// OtherNodeKeyBytes returns the node key bytes of the other end of
// the channel.
func (c *ChannelEdgeInfo) OtherNodeKeyBytes(thisNodeKey []byte) [33]byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


return pureFee + timeLockPenalty
return int64(fee) + timeLockPenalty
}

// findPath attempts to find a path from the source node within the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add to the method description here that we search backwards now, and why 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

// that are not known to us yet in the distance map.
for vertex := range additionalEdges {
additionalEdgesWithSrc := make(map[Vertex][]*edgePolicyWithSource)
for vertex, outgoingEdgePolicies := range additionalEdges {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we probably shouldn't change ChannelEdgePolicy, as it directly maps to the wire messages being sent on the network. I'm fine with keeping this as is now, as I agree that this could be considered an implementation detail the caller shouldn't need to worry about.


// If the estimated band width of the channel edge is not able
// to carry the amount that needs to be send, return.
if bandwidth < amountToSend {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be checked earlier in processEdge?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved up


// If the amountToSend is less than the minimum required amount,
// return.
if amountToSend < edge.MinHTLC {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, could check earlier for an early return.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved up

expectedHops: []expectedHop{
{alias: "luoji", fwdAmount: 100000, fee: 0, timeLock: 101},
}},
// Basic route with one intermediate hop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now!

if err != nil {
return err
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, semantics change compared to master, but I think it is better like this. As you say, policies are not required for the auto pilot.

But the change become bigger than I liked, because of the lookup of LightningNode which was previously done inside LightningNode.ForEachChannel. Needed to gain access to db somehow.

I kept the changes apart in commit d258efe. Maybe there is a cleaner way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah this might not even work, since calling FetchLightningNode inside an already open DB transaction might cause problems.

I don't see an immediate way of achieving what we want without doing more changes to the database to make it possible to fetch the node without the edge policy. I'm okay with keeping your original implementation (skipping a nil-policy) with an added TODO saying that we ideally should support the case where the policy is nil.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, pulling out that commit.

@joostjager
Copy link
Collaborator Author

@halseth I think we have resolved all points now.

// with the node's public key and sends with the compact edge ID.
// For each chanID, there will be two entries within the bucket, as the
// graph is directed: nodes may have different policies w.r.t to fees
// for their respective directions. An unknown policy is represented as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

//
// maps: pubKey || edgeID -> edge policy for node
// maps: pubKey || chanID -> channel edge policy for node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I thought edgePolicies were stored in a bucket in addition to this. So I can understand why we were talking about different things now.. 😛 The final solution you came up with looks good to me though 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that explains it 😄

if edgeBytes := edges.Get(edgeKey[:]); edgeBytes != nil {
// An unknown policy value does not have a update time recorded, so
// it also does not need to be removed.
if edgeBytes := edges.Get(edgeKey[:]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be even safer to check !bytes.Equals(edgeBytes, unknownPolicy) (unknown policy wouldn't be required to be empty slice), where unknownPolicy is a constant defined as the empty slice.

@@ -2836,6 +2890,11 @@ func fetchChanEdgePolicy(edges *bolt.Bucket, chanID []byte,
return nil, ErrEdgeNotFound
}

// No need to deserialize unknown policy.
if len(edgeBytes) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, compare agains unknownPolicy constant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this is addressed this PR can be squashed and rebased, and I think it should be good 👍

_, err := fetchChanEdgePolicy(edges,
channelID[:], keyBytes[:], nodes)

if err == ErrEdgeNotFound {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of other error we must return it such that the migration can be aborted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, fixed.

func migrateEdgePolicies(tx *bolt.Tx) error {
nodes := tx.Bucket(nodeBucket)
if nodes == nil {
return ErrGraphNotFound
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can safely return nil if the buckets are not found? We are probably then trying to migrate an empty DB.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed.

log.Tracef("Adding unknown edge policy present for node %x, channel %v",
keyBytes, channelId)

putChanEdgePolicyUnknown(edges, channelId, keyBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error must be handled here also.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added.

@joostjager
Copy link
Collaborator Author

unknownPolicy constant restored as empty slice and rebased.

@joostjager
Copy link
Collaborator Author

Rebased after gofmt commits on master

The commit ensures that for every channel, there will always
be two entries in the edges bucket. If the policy from one or
both ends of the channel is unknown, it is marked as such.

This allows efficient lookup of incoming edges. This is
required for backwards payment path finding.
@joostjager
Copy link
Collaborator Author

Amended all commits with gofmt

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks mostly LGTM now, awesome work! Should get review from someone else before merge, however.

if edgeBytes := edges.Get(edgeKey[:]); edgeBytes != nil {
// An unknown policy value does not have a update time recorded, so
// it also does not need to be removed.
if edgeBytes := edges.Get(edgeKey[:]); edgeBytes != nil && !bytes.Equal(edgeBytes[:], unknownPolicy) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split over two lines


var incomingPolicy, outgoingPolicy *ChannelEdgePolicy

if policy1 != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I suggested earlier to do it this way, but I didn't anticipate we would need these switch statements 😛

Could we instead do:

chanID := nodeEdge[33:]
edgeInfo, err := fetchChanEdgeInfo(edgeIndex, chanID)
    if err != nil {
        return err
    }

toEdgePolicy, err := fetchChanEdgePolicy(
    edges, chanID, nodeNub, nodes,
)
...

otherNode, err := edgeInfo.OtherNodeKeyBytes(nodePub)
if err != nil {
    return err
}

toEdgePolicy, err := fetchChanEdgePolicy(
    edges, chanID, otherNode, nodes,
)

(or just deserialize the toEdgePolicy directly if first checked if not unknownPolicy, as you suggested initially)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is better indeed than the ugly switches. Glad to hear to my initial creation was not so bad 😉 but I agree with you that readability and not duplicating code is important too.

@@ -2899,6 +2943,20 @@ func putChanEdgePolicy(edges *bolt.Bucket, edge *ChannelEdgePolicy, from, to []b
return edges.Put(edgeKey[:], b.Bytes()[:])
}

// putChanEdgePolicyUnknown marks the edge policy as unknown in the edges bucket.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: both lines a bit long

@joostjager
Copy link
Collaborator Author

Review comments addressed

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! This PR includes quite a few nice clean ups w.r. to the testing code, and fixes a few legacy bugs related to incorrectly applying the fee limit during path finding.

The PR looks mostly good to me, however I noticed that in one area, it'll create a DB transaction inside of an existing one. Instead, it should carry over the same transaction as in the past, this has lead to deadlocks, and also creating a new transaction is an expensive activity. This is my only major comment in this PR.

I'll also move to start running this on a few nodes doing path finding, just to see if anything weird pops up.

if err != nil && err != ErrEdgeNotFound &&
err != ErrGraphNodeNotFound {
return err
policy1, policy2, err := fetchChanEdgePolicies(edgeIndex,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgetting to check the error here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the edge info here, it's the value of the cursor seek, why fetch it again? Not blocking, more of a nit that code seems to have been unnecessarily changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially it was unchanged. I modified it after discussion with @halseth for readability (over performance), see comments above.

@@ -2824,6 +2864,20 @@ func putChanEdgePolicy(edges *bolt.Bucket, edge *ChannelEdgePolicy, from, to []b
return edges.Put(edgeKey[:], b.Bytes()[:])
}

// putChanEdgePolicyUnknown marks the edge policy as unknown in the edges bucket.
func putChanEdgePolicyUnknown(edges *bolt.Bucket, channelID uint64, from []byte) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


for _, testCase := range basicGraphPathFindingTests {
t.Run(testCase.target, func(subT *testing.T) {
testBasicGraphPathFindingCase(subT, graph, aliases, &testCase)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

return fmt.Errorf("Unexpected node in policy")
}
}
incomingPolicy, err := fetchChanEdgePolicy(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring the error here and above.

// If the edge has no time lock delta, the payment will always
// fail, so return.

// TODO(joostjager): Is this really true? Can't it be that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's just a dumb value to set, as it more or less guarantees that the node won't be able to properly handle a timed out HTLC, and will likely lose money.

@@ -684,13 +798,6 @@ func findPath(tx *bolt.Tx, graph *channeldb.ChannelGraph,
"too many hops")
}

// As our traversal of the prev map above walked backwards from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, no more reversal!

return err
}

// Lookup the full node details in order to be able to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit dangerous and could possibly lead to a deadlock: we're creating a new DB transaction inside of an existing db transaction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bolt allow concurrent readers, but I'm weary of making additional transactions inside of callbacks like this. We started to expose the transaction object in these methods to safely allow nested calls without creating new database transactions. We could possibly add methods onto the EdgeInfo struct, which accepts the existing transaction (or if nil creates a new one), in order to allow the caller to fetch the full LightningNode struct without creating a new database transaction.

@Roasbeef
Copy link
Member

Actually, I'll merge this as is, and then add a follow up commit to avoid the double DB transaction issue. Thanks for bearing with us through this review process! The final PR really turned out well!

@Roasbeef
Copy link
Member

Committed as 29b6bae!

@Roasbeef Roasbeef closed this Aug 10, 2018
@joostjager
Copy link
Collaborator Author

joostjager commented Aug 10, 2018

I am happy that this one has been merged now. Will work on the follow-up.

@joostjager
Copy link
Collaborator Author

Ah, I see you fixed the open issues yourselves. I was thinking of adding a tx parameter to FetchLightningNode, but this works too of course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fees Related to the fees paid for transactions (both LN and funding/commitment transactions) first pass review done PR has had first pass of review, needs more tho needs rebase PR has merge conflicts needs testing PR hasn't yet been actively tested on testnet/mainnet P2 should be fixed if one has time routing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants