Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing, channeldb: several optimizations for path finding #3418

Merged
merged 6 commits into from Oct 25, 2019

Conversation

@champo
Copy link
Contributor

@champo champo commented Aug 20, 2019

This PR brings down path finding time by using the channel cache and some other small optimizations. The goal was to bring the time down to ~10ms range. Using a benchmark, described below, it started at 220ms/72mb per path finding operation and with these changes it comes down to 9ms/1.1mb.

Benchmarking

I've written a benchmark based on a dump of the main net graph (its a few months old now) in order to test these changes. The bench is roughly as follows:

  • The first time only, a DB is built using the graph dump JSON
  • The DB is used for a simple path finding run to prime the channel cache
  • Path finding runs in a nested benchmark as many times as go test sees fit

A second bench, almost equal to the first one runs the nested benchmark in parallel to stress concurrency.

If you wish to run it, first uncomment the DB generation code and run the test with a ~20m timeout to let it generate the full DB. Then comment that again and run your benchmarks.

Findings

Bench @ master

BenchmarkBigGraphCachePerformance/findPath-4         	     100	 203460951 ns/op	72351182 B/op	 1459914 allocs/op
BenchmarkBigGraphCacheConcurrency/findPath-4         	      50	 204061673 ns/op	72480387 B/op	 1460464 allocs/op

At first glance path finding spent a lot of its time in three things:

  • Reading from the DB
  • Parsing the contents read
  • GC'ing

Adding the channel cache brings down the time and allocations considerably:

BenchmarkBigGraphCachePerformance/findPath-4         	     500	  27721522 ns/op	 9830663 B/op	  155751 allocs/op
BenchmarkBigGraphCacheConcurrency/findPath-4         	     200	  65236737 ns/op	 9910162 B/op	  156245 allocs/op

This is mostly because it avoids hitting disk and allocating new copies of channels. The wire format parsing code seems to be heavy on small allocations, so avoiding that is also a huge win.

The next find was that the hot path has a log on trace level. That log creates a lot of small allocations regardless of the current log level. Adding a log level check around it avoids that for most configs around and removes quite a bit of allocations, also bringing down the runtime by 18%:

BenchmarkBigGraphCachePerformance/findPath-4         	    1000	  22749674 ns/op	 7599558 B/op	   97047 allocs/op
BenchmarkBigGraphCacheConcurrency/findPath-4         	     200	  59610474 ns/op	 7676371 B/op	   97534 allocs/op

The distance map was being pre-populated with all nodes, consuming about 20% of the path finding time. Turns out the code used this fact in only one place to check if the candidate distance was better than the existing one. Changing that code to check for existance first let us remove the pre-population:

BenchmarkBigGraphCachePerformance/findPath-4         	    1000	  13426902 ns/op	 4516310 B/op	   19026 allocs/op
BenchmarkBigGraphCacheConcurrency/findPath-4         	     200	  62240894 ns/op	 4592095 B/op	   19509 allocs/op

Path finding uses several structures that grow to hold references to a big part of the graph. Tweaking the capacity for those resulted in lower memory usage and some modest speed bump:

BenchmarkBigGraphCachePerformance/findPath-4         	    1000	  12426815 ns/op	 3432225 B/op	   18771 allocs/op
BenchmarkBigGraphCacheConcurrency/findPath-4         	     300	  55189725 ns/op	 3507752 B/op	   19248 allocs/op

There were 3 structures used: distance, next and distanceHeap. distance and next could be merged by adding a nextHop pointer to nodeWithDist this is updated whenever a cheaper path is found. Also, nodeWithDist is a big struct now and distanceHeap held it by value, changing it to use a pointer removed a lot of copying work. Both these changes reduced memory usage by about half and runtime by 20%.

BenchmarkBigGraphCachePerformance/findPath-4         	    2000	  10203189 ns/op	 1639554 B/op	   18759 allocs/op
BenchmarkBigGraphCacheConcurrency/findPath-4         	     300	  55169048 ns/op	 1716604 B/op	   19259 allocs/op

The last insight was that a function introduced to access the channel cache was returning a *ChannelEdge. That struct is small and the pointer was introducing an allocation that added up since its part of the hot path. Changing that function to return by value reduced time by 10% and memory by 39%.

BenchmarkBigGraphCachePerformance/findPath-4         	    2000	   9136682 ns/op	 1171369 B/op	    4083 allocs/op
BenchmarkBigGraphCacheConcurrency/findPath-4         	     300	  50998898 ns/op	 1247789 B/op	    4579 allocs/op

Pull Request Checklist

  • If this is your first time contributing, we recommend you read the Code
    Contribution Guidelines
  • All changes are Go version 1.12 compliant
  • The code being submitted is commented according to Code Documentation and Commenting
  • For new code: Code is accompanied by tests which exercise both
    the positive and negative (error paths) conditions (if applicable)
  • For bug fixes: Code is accompanied by new tests which trigger
    the bug being fixed to prevent regressions
  • Any new logging statements use an appropriate subsystem and
    logging level
  • Code has been formatted with go fmt
  • For code and documentation: lines are wrapped at 80 characters
    (the tab character should be counted as 8 characters, not 4, as some IDEs do
    per default)
  • Running make check does not fail any tests
  • Running go vet does not report any issues
  • Running make lint does not report any new issues that did not
    already exist
  • All commits build properly and pass tests. Only in exceptional
    cases it can be justifiable to violate this condition. In that case, the
    reason should be stated in the commit message.
  • Commits have a logical structure according to Ideal Git Commit Structure
@Roasbeef Roasbeef removed their request for review Aug 20, 2019
Copy link
Collaborator

@joostjager joostjager left a comment

This is an awesome pr. It goes after a bottleneck that we've had for a long time. So long that it is actually the oldest currently open pr #379.

Especially on low powered devices, path finding consumes a very significant part of the total time required to complete the payment.

One question that I have is regarding increased memory usage. I don't mean allocation/deallocation or gc action, but just the requirements to keep the full channel set in memory. For path finding, a worst case scenario can be triggered by specifying a source node that isn't in the graph. The algorithm will then work backwards from the target to find the source, but it won't find it. When it terminates, it has explored nearly all edges.

If we already used channel cache in a way previously that ended with having all channels in memory, then it seems there is no new problem. But not sure if that is the case.

Loading

channeldb/graph.go Outdated Show resolved Hide resolved
Loading
channeldb/graph.go Outdated Show resolved Hide resolved
Loading
channeldb/graph.go Outdated Show resolved Hide resolved
Loading
channeldb/graph.go Outdated Show resolved Hide resolved
Loading
channeldb/graph.go Outdated Show resolved Hide resolved
Loading
routing/pathfind.go Outdated Show resolved Hide resolved
Loading
routing/pathfind.go Outdated Show resolved Hide resolved
Loading
log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
"probability=%v", fromVertex, toNode, edgeProbability)
if log.Level() <= btclog.LevelTrace {
log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
Copy link
Collaborator

@joostjager joostjager Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an idea why this has so much impact? It seems that the first thing log.Tracef does internally is checking the log level like you do here.

Loading

Copy link
Contributor Author

@champo champo Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't got an exact answer. The working theory is that it generates some allocations to hold the var args that are too small to care most of the time but that it adds up because it's called a lot of times per findPath

Loading

Copy link
Collaborator

@joostjager joostjager Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be easy to test by adding a test function locally in pathfind.go with var args and call that instead of log.Tracef. You could then also modify it to take four (non var) args and see the difference.

Loading

Copy link
Contributor Author

@champo champo Aug 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely some overhead due to var args. The fixed arguments version had the same performance as the if. However, the var arg version had the same performance as calling log.Tracef directly.

Loading

Copy link
Collaborator

@joostjager joostjager Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok interesting. Maybe you could use a logClosure here to prevent the if statement.

Loading

Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of the closure is that it is only called when the log level is high enough and only then the performance hit is incurred.

log.Debug(newLogClosure(func() string {
		return fmt.Sprintf("%v %v %v", a, b, c)
	}))

Loading

Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps add a comment here explaining for findings around the var args. So that that experiment is captured for future reference.

Loading

Copy link
Collaborator

@halseth halseth Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer using a logClosure, if that works

Loading

Copy link
Contributor Author

@champo champo Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested log closure and it adds about 1MB of extra allocations and a couple ms of extra runtime. If it is ok with you, I'd keep the if with an explicative comment.

Loading

Copy link
Collaborator

@joostjager joostjager Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @halseth offline. For consistency, we prefer the log closure. We can re-evaluate the hot path once the cache changes are in, see how much more significant this becomes.

Loading

routing/pathfind.go Show resolved Hide resolved
Loading
@champo
Copy link
Contributor Author

@champo champo commented Aug 23, 2019

The overall memory usage should remain similar. The cache is limited to a number of channels set by a config value. If the number of channels exceeds that value, this code will still fallback to reading from the DB. If the graph is big in relation to the cache, the performance gains will be lower.

I'll address the coments in the next few days 🙂

Loading

@champo champo force-pushed the routing_is_fast_now branch from f77e4e1 to 4a689a5 Aug 24, 2019
@champo
Copy link
Contributor Author

@champo champo commented Aug 25, 2019

@joostjager updated with the feedback and rebasing against latest master. I've fixed a bug along the way relating to edge policies with missing nodes. Lastly, a few smaller changes to bring memory usage down by about 1mb.

Loading

@champo
Copy link
Contributor Author

@champo champo commented Aug 25, 2019

Added 4 new changes bringing down the time to 9ms and memory usage to 1.1mb. This is the last of the updated I'll send, I've run out of ideas I can easily implement.

Loading

channeldb/graph.go Outdated Show resolved Hide resolved
Loading
channeldb/graph.go Outdated Show resolved Hide resolved
Loading
log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
"probability=%v", fromVertex, toNode, edgeProbability)
if log.Level() <= btclog.LevelTrace {
log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
Copy link
Collaborator

@joostjager joostjager Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok interesting. Maybe you could use a logClosure here to prevent the if statement.

Loading

currentNode := source
for currentNode != target { // TODO(roasbeef): assumes no cycles
// Determine the next hop forward using the next map.
nextNode := next[currentNode]
nextNode, ok := distance[currentNode]
Copy link
Collaborator

@joostjager joostjager Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change var name nextNode?

Loading

Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean change to currentNodeDist or something

Loading

channeldb/graph.go Outdated Show resolved Hide resolved
Loading
@joostjager
Copy link
Collaborator

@joostjager joostjager commented Aug 26, 2019

Good changes again 👍

Main question is what to do with the added if node == nil and if db == nil code. I think that with the right design this should be avoidable, but it probably requires more changes. On the other hand, building on top of something that shouldn't be there in the first place defers paying those costs to the future.

Final comment is about commit structure. In the end, these commits should be reorganized so that it becomes a logical stack. For example (some of these are already a separate commit now):

  • all the cache changes in graph.go can be done first in one or multiple commits first.
  • add NumNodes to graph.go and pre-allocate map
  • merging nextNode into the path finding heap
  • change to nodeWithDist pointers
  • remove unused k-shortest code
  • remove distance map infinity initialization
  • fix log performance

What you generally want to avoid in one pr is to make changes in a commit and change them again later. For example, initializing the map first with 10000 and later replacing it by NumNodes. I find it easier if those two are just squashed together and there is one commit that presents the final changes in a particular area.

You could also consider making two prs. One with the internal path finding improvements that are pretty straight-forward to review and get approval on. Then a second one to carefully do the cache.

Loading

@champo
Copy link
Contributor Author

@champo champo commented Aug 28, 2019

I'll update this PR to leave the channel cache out and send another one for the cache. It will take me a few days to investigate the comments on the cache.

Thanks for the feedback!

Loading

Copy link
Collaborator

@joostjager joostjager left a comment

Very pleasant to review with the updated commit structure. And I think it is good to get these changes in first.

Loading

log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
"probability=%v", fromVertex, toNode, edgeProbability)
if log.Level() <= btclog.LevelTrace {
log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps add a comment here explaining for findings around the var args. So that that experiment is captured for future reference.

Loading

routing/pathfind.go Show resolved Hide resolved
Loading
@@ -190,6 +190,36 @@ func (c *ChannelGraph) Database() *DB {
return c.db
}

// NumNodes returns the number of nodes currently in the graph
func (c *ChannelGraph) NumNodes(tx *bbolt.Tx) (int, error) {

Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove newline

Loading

return nodes.ForEach(func(pubKey, _ []byte) error {
if len(pubKey) == 33 {
result += 1
}
Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there also something with a different length in this bucket?

Loading

Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result++

Loading

Copy link
Contributor Author

@champo champo Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the if from

if bytes.Equal(pubKey, sourceKey) || len(pubKey) != 33 {

The first impl for this didnt have it and reported about ~1000 extra "nodes" that weren't listed by ForEachNode

Loading

Copy link
Collaborator

@joostjager joostjager Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange. Is this an old db where there are left-overs from something?

Loading

Copy link
Contributor Author

@champo champo Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there shouldn't be? The DB was generated using the path finding test harness code a few months ago.

Loading

Copy link
Collaborator

@joostjager joostjager Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you find out what causes those non-33 byte keys to be written? I'd be hesitant to add the extra condition without fully understanding why.

Loading

Copy link
Contributor Author

@champo champo Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only point I found is SetSourceNode uses a hardcoded "source" as key in this bucket. I can't tell if there's any other place.

Loading

channeldb/graph.go Outdated Show resolved Hide resolved
Loading
distHeap := distanceHeap{
pubkeyIndices: make(map[route.Vertex]int),
pubkeyIndices: make(map[route.Vertex]int, numNodes),
Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also want to set the capacity of the distanceHeap.nodes slice?

Loading

Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numNodes is really an upper bound for the size of the heap. It contains only the active set, the frontier of what path finding is exploring. Worst case the frontier can be all nodes (one center node with all other nodes connected to it), but it will be less typically. I don't think it is problem and also wouldn't know how to get a better estimated of the required size for the heap. But maybe this is something that can be explained (if the reasoning is correct) in a comment.

Loading

routing/pathfind.go Outdated Show resolved Hide resolved
Loading
routing/pathfind.go Outdated Show resolved Hide resolved
Loading
currentNode := source
for currentNode != target { // TODO(roasbeef): assumes no cycles
// Determine the next hop forward using the next map.
nextNode := next[currentNode]
nextNode, ok := distance[currentNode]
Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean change to currentNodeDist or something

Loading

@@ -588,6 +587,7 @@ func findPath(g *graphParams, r *RestrictParams, cfg *PathFindingConfig,
// from the heap.
partialPath := heap.Pop(&nodeHeap).(*nodeWithDist)
pivot := partialPath.node
pivotWithDist := distance[pivot]
Copy link
Collaborator

@joostjager joostjager Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again pointing to something not right with what we store in the heap.

Loading

Copy link
Collaborator

@halseth halseth left a comment

Cool PR :)

Loading

log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
"probability=%v", fromVertex, toNode, edgeProbability)
if log.Level() <= btclog.LevelTrace {
log.Tracef("path finding probability: fromnode=%v, tonode=%v, "+
Copy link
Collaborator

@halseth halseth Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer using a logClosure, if that works

Loading

// for the node set with a distance of "infinity". graph.ForEachNode
// also returns the source node, so there is no need to add the source
// node explicitly.
// Holds the current best distance for a given node
Copy link
Collaborator

@halseth halseth Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: punctation

Loading

channeldb/graph.go Outdated Show resolved Hide resolved
Loading
// First we'll initialize an empty heap which'll help us to quickly
// locate the next edge we should visit next during our graph
// traversal.
nodeHeap := newDistanceHeap(numNodes)
Copy link
Collaborator

@halseth halseth Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of fetching the exact number of nodes in the db, could we make an educated constant guess on this number? Avoids the extra db code, and I don't think it matters since we don't know if it will be filled anyway.

Loading

Copy link
Collaborator

@joostjager joostjager Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what way of making that guess do you have in mind?

Loading

Copy link
Contributor Author

@champo champo Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, counting from the DB caused little to no impact in the benchmarks. Since it's only using the index it's most likely already cached in memory. I tried to use some boltd guesstimates but they were really far from the real number (like number of leafs).

Loading

Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous commit removed the graph traversal, and this reeintroduces it. This is probably less expensive, since we only count, but I wonder if we would get the same effect from just setting numNodes = 1000 for instance.

Loading

Copy link
Collaborator

@joostjager joostjager Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @halseth offline. Doubts remain whether the perf. improvement is worth the extra db scan through all nodes. Maybe a good middle ground is to add a reasonable constant now. For example 10000 nodes. This is little extra memory, because it is only a list of pointers, and it should do the job too. Then, when the cache is being used, we can possibly take a count from the cache. That way, we'll have a better estimate after a few runs.

Loading

Copy link
Contributor Author

@champo champo Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add a constant for this. The only real gain with having this call is a bit of "future proofing".

Loading

@@ -666,7 +671,7 @@ func findPath(g *graphParams, r *RestrictParams, cfg *PathFindingConfig,

// Use the nextHop map to unravel the forward path from source to
// target.
pathEdges := make([]*channeldb.ChannelEdgePolicy, 0, len(next))
pathEdges := make([]*channeldb.ChannelEdgePolicy, 0)
Copy link
Collaborator

@halseth halseth Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shy this change?

Loading

Copy link
Contributor Author

@champo champo Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you're pointing at

Loading

Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*why

I was wondering what removing the cap size achieves.

Loading

Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just do var pathedges []*channeldb.ChannelEdgePolicy perhaps?

Loading

Copy link
Contributor Author

@champo champo Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the thing here is that next doesnt exist anymore, and it's not easy to find out the length of the path with the current structures. Considering a path will probably be a between 1-10 hops, pre-setting the capacity didn't make much sense. I'll update changing the style.

Loading

@champo champo force-pushed the routing_is_fast_now branch from 930a9b8 to 03d19a7 Sep 6, 2019
Copy link
Contributor Author

@champo champo left a comment

I've updated the PR addresing your feedback

Loading

return nodes.ForEach(func(pubKey, _ []byte) error {
if len(pubKey) == 33 {
result += 1
}
Copy link
Contributor Author

@champo champo Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the if from

if bytes.Equal(pubKey, sourceKey) || len(pubKey) != 33 {

The first impl for this didnt have it and reported about ~1000 extra "nodes" that weren't listed by ForEachNode

Loading

routing/pathfind.go Outdated Show resolved Hide resolved
Loading
@@ -666,7 +671,7 @@ func findPath(g *graphParams, r *RestrictParams, cfg *PathFindingConfig,

// Use the nextHop map to unravel the forward path from source to
// target.
pathEdges := make([]*channeldb.ChannelEdgePolicy, 0, len(next))
pathEdges := make([]*channeldb.ChannelEdgePolicy, 0)
Copy link
Contributor Author

@champo champo Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you're pointing at

Loading

routing/pathfind.go Outdated Show resolved Hide resolved
Loading
channeldb/graph.go Outdated Show resolved Hide resolved
Loading
Copy link
Collaborator

@halseth halseth left a comment

Awesome work on this PR! With the latest iteration, the size of the PR is very manageable, and the commit structure and messages makes it easy to follow and review! 👍

To me this is pretty much good for merge, only had a few questions/suggestions. Also, would it make sense to add the performance benchmark to the codebase (as a new PR that we could merge before this change) so that we could make sure we don't have regressions in the future?

Loading

@@ -563,7 +543,7 @@ func findPath(g *graphParams, r *RestrictParams, cfg *PathFindingConfig,
// route, return. It is important to also return if the distance
// is equal, because otherwise the algorithm could run into an
// endless loop.
if tempDist >= distance[fromVertex].dist {
if current, ok := distance[fromVertex]; ok && tempDist >= current.dist {
Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: split over two lines

Loading

if tx != nil {
err = count(tx)
} else {
err = c.db.View(count)
Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this codepath ever taken?

Loading

Copy link
Contributor Author

@champo champo Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I followed the pattern in other methods in this file. In any case, when I update to use a hardcoded node count this code will be removed.

Loading

// First we'll initialize an empty heap which'll help us to quickly
// locate the next edge we should visit next during our graph
// traversal.
nodeHeap := newDistanceHeap(numNodes)
Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous commit removed the graph traversal, and this reeintroduces it. This is probably less expensive, since we only count, but I wonder if we would get the same effect from just setting numNodes = 1000 for instance.

Loading

@@ -666,7 +671,7 @@ func findPath(g *graphParams, r *RestrictParams, cfg *PathFindingConfig,

// Use the nextHop map to unravel the forward path from source to
// target.
pathEdges := make([]*channeldb.ChannelEdgePolicy, 0, len(next))
pathEdges := make([]*channeldb.ChannelEdgePolicy, 0)
Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*why

I was wondering what removing the cap size achieves.

Loading

routing/heap.go Show resolved Hide resolved
Loading
routing/pathfind.go Show resolved Hide resolved
Loading
nextNode := next[currentNode]
currentNodeWithDist, ok := distance[currentNode]

if !ok || currentNodeWithDist.nextHop == nil {
Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the currentNodeWithDist.nextHop == nil case expected to ever happen? Might be worth returning a different error in that case.

Loading

Copy link
Contributor Author

@champo champo Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well actually, no! It was expected before removing the distance prepopulation. What are you thinking in terms of different error? It's probably a programming error.

Loading

Copy link
Collaborator

@halseth halseth Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is always sent when adding nodes to the distance map, we can just assume it is there I think

Loading

@@ -587,6 +586,7 @@ func findPath(g *graphParams, r *RestrictParams, cfg *PathFindingConfig,
// from the heap.
partialPath := heap.Pop(&nodeHeap).(*nodeWithDist)
pivot := partialPath.node
pivotWithDist := distance[pivot]
Copy link
Collaborator

@halseth halseth Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from partialPath?

Loading

Copy link
Contributor Author

@champo champo Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not, removed!

Loading

@halseth
Copy link
Collaborator

@halseth halseth commented Sep 10, 2019

Also, would it make sense to add the performance benchmark to the codebase (as a new PR that we could merge before this change) so that we could make sure we don't have regressions in the future?

I wouldn't consider this blocking.

Loading

@champo champo force-pushed the routing_is_fast_now branch from 03d19a7 to 43ea8a2 Sep 15, 2019
Copy link
Contributor Author

@champo champo left a comment

@halseth the benchmark I have needs a pre-built DB that weights about 20mb. Some aspects of the perf can be measured without so many nodes.

I've updated with the requested changes :)

Loading

@@ -562,7 +542,9 @@ func findPath(g *graphParams, r *RestrictParams, cfg *PathFindingConfig,
// route, return. It is important to also return if the distance
// is equal, because otherwise the algorithm could run into an
// endless loop.
if tempDist >= distance[fromVertex].dist {
if current, ok := distance[fromVertex];
Copy link
Collaborator

@halseth halseth Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead do

current, ok := ...
if ok && tempDist ...

Loading


// estimatedNodeCount is used to preallocate the path finding structures
// to avoid resizing and copies. It should be number on the same order as
// the number of active noeds in the network.
Copy link
Collaborator

@halseth halseth Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: noeds -> nodes

Loading

routing/heap.go Show resolved Hide resolved
Loading
nextNode := next[currentNode]
currentNodeWithDist, ok := distance[currentNode]

if !ok || currentNodeWithDist.nextHop == nil {
Copy link
Collaborator

@halseth halseth Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is always sent when adding nodes to the distance map, we can just assume it is there I think

Loading

@halseth
Copy link
Collaborator

@halseth halseth commented Oct 9, 2019

@champo This is close! Do you want to make the last fixups? :)

Loading

@champo
Copy link
Contributor Author

@champo champo commented Oct 9, 2019

@halseth I'll work on it this weekend :D

Loading

@champo champo force-pushed the routing_is_fast_now branch from 43ea8a2 to f118b91 Oct 12, 2019
@champo
Copy link
Contributor Author

@champo champo commented Oct 12, 2019

@halseth done!

Loading

Copy link
Collaborator

@halseth halseth left a comment

This LGTM now, pending the last comments about the estimatedNodeCount constant 👍

Loading

// estimatedNodeCount is used to preallocate the path finding structures
// to avoid resizing and copies. It should be number on the same order as
// the number of active nodes in the network.
estimatedNodeCount = 0
Copy link
Collaborator

@halseth halseth Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set this to an actual number. Worst case it will be the number of reachable nodes on the network, so maybe 10k or something should be good?

Loading

Copy link
Collaborator

@halseth halseth Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also name it heapSize?

Loading

Copy link
Collaborator

@joostjager joostjager Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick experiment with this again on testnet. Set the source node in the queryroutes call to something that doesn't exist, so that it will explore a relatively large portion of the graph. Result:

2019-10-23 12:03:04.616 [DBG] CRTR: Pathfinding perf metrics debug: distance_size=690, heap_key_map_size=372, heap_slice_size=372
2019-10-23 12:03:04.616 [DBG] CRTR: Pathfinding perf metrics: nodes=690, edges=4841, time=86.268991ms

It seems that the two slices and map don't really grow that large in comparison to the number of edges that are explored. Also, I didn't see much difference in execution time.

We've been discussing setting this upper bound for a while now. There isn't a natural default for it and querying from the db is slow. We could make it adaptive based on the previous run, but in the end I doubt whether it all matters much. There is also still that other slice in the heap that isn't pre-sized in the current version of this PR.

I wouldn't mind dropping this commit really, considering what we know now.

Loading

Copy link
Collaborator

@halseth halseth Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that mainnet is much larger, do you get the same results there?

Loading

Copy link
Contributor Author

@champo champo Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I tested all changes using a (old) snapshot of mainnet with ~6000 nodes and these changes had measurable impact. Though I tested it with the channel cache changeset applied so smaller changes were more noticeable.

Loading

Copy link
Collaborator

@joostjager joostjager Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then also preallocated the slice in the heap and set the constant to 10k?

Loading

@halseth
Copy link
Collaborator

@halseth halseth commented Oct 23, 2019

Also, linter is failing (run make lint).

Loading

@halseth halseth added this to the 0.9.0 milestone Oct 23, 2019
@halseth halseth added this to Needs Review in v0.9.0-beta Oct 23, 2019
@champo champo force-pushed the routing_is_fast_now branch 2 times, most recently from 0bd776c to 4d5750d Oct 23, 2019
@champo
Copy link
Contributor Author

@champo champo commented Oct 23, 2019

@halseth @joostjager updated with the estimated node count changes and linter fixes :)

Loading

Copy link
Collaborator

@halseth halseth left a comment

Excellent set of changes, and thanks for sticking through the whole review process! I think the result turned out great. LGTM 👍

Loading

// target.
var pathEdges []*channeldb.ChannelEdgePolicy
currentNode := source
for currentNode != target { // TODO(roasbeef): assumes no cycles
// Determine the next hop forward using the next map.
nextNode := next[currentNode]
currentNodeWithDist, ok := distance[currentNode]

Copy link
Collaborator

@halseth halseth Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove newline

Loading

currentNodeWithDist, ok := distance[currentNode]

if !ok {
// If the node doesnt have a next hop it means we didn't find a path
Copy link
Collaborator

@halseth halseth Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing period.

Loading