routing, channeldb: several optimizations for path finding #3418
This PR brings down path finding time by using the channel cache and some other small optimizations. The goal was to bring the time down to the ~10ms range. Using a benchmark, described below, path finding started at 220ms/72MB per operation, and with these changes it comes down to 9ms/1.1MB.
I've written a benchmark based on a dump of the mainnet graph (it's a few months old now) in order to test these changes. The bench is roughly as follows:
A second bench, almost identical to the first, runs the nested benchmark in parallel to stress concurrency.
If you wish to run it, first uncomment the DB generation code and run the test with a ~20m timeout to let it generate the full DB. Then comment it out again and run the benchmarks.
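The two benches can be sketched roughly as below. This is a standalone illustration using `testing.Benchmark`, not the actual test code: `findPath` is a placeholder for the real path finding call, and the real bench loads the bolt DB dump of the graph first.

```go
package main

import (
	"fmt"
	"testing"
)

// findPath stands in for the real path finding operation; the actual
// benchmark runs it against the mainnet graph dump.
func findPath() int {
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i
	}
	return sum
}

// benchSerial mirrors the first bench: one path finding operation per
// iteration, with allocation reporting enabled to track bytes/op.
func benchSerial() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			findPath()
		}
	})
}

// benchParallel mirrors the second bench: the same work driven through
// RunParallel to stress concurrent access.
func benchParallel() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				findPath()
			}
		})
	})
}

func main() {
	fmt.Println("serial:  ", benchSerial())
	fmt.Println("parallel:", benchParallel())
}
```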
Bench @ master
At first glance path finding spent a lot of its time in three things:
Adding the channel cache brings down the time and allocations considerably:
This is mostly because it avoids hitting disk and allocating new copies of channels. The wire format parsing code seems to be heavy on small allocations, so avoiding that is also a huge win.
The next find was that the hot path contains a trace-level log. That log creates a lot of small allocations regardless of the current log level. Adding a log level check around it avoids that for most configurations and removes quite a few allocations, also bringing down the runtime by 18%:
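The guard pattern looks roughly like this. The logger below is a minimal stand-in, not lnd's actual btclog API; the point is that without the check, the variadic call boxes its arguments on every visited edge even when trace output is discarded.

```go
package main

import "fmt"

type Level int

const (
	LevelTrace Level = iota
	LevelDebug
	LevelInfo
)

// Logger is an illustrative stand-in for the real logger.
type Logger struct {
	level   Level
	Formats int // counts how often arguments were actually formatted
}

func (l *Logger) Tracef(format string, args ...interface{}) {
	l.Formats++
	_ = fmt.Sprintf(format, args...)
}

func (l *Logger) TraceEnabled() bool { return l.level <= LevelTrace }

func exploreEdge(log *Logger, edgeID uint64, weight int64) {
	// The level check keeps the hot path free of the boxing and
	// formatting allocations for any config above trace level.
	if log.TraceEnabled() {
		log.Tracef("explored edge %v with weight %v", edgeID, weight)
	}
}

func main() {
	log := &Logger{level: LevelInfo}
	for i := 0; i < 1000; i++ {
		exploreEdge(log, uint64(i), 42)
	}
	// At info level the guard skips all 1000 format calls.
	fmt.Println("formats performed:", log.Formats)
}
```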
Path finding uses several structures that grow to hold references to a large part of the graph. Tweaking the capacity for those resulted in lower memory usage and a modest speed bump:
Three structures were used:
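The tweak can be sketched as follows; the structure names and the node-count estimate here are illustrative. Sizing the maps and the heap's backing slice up front avoids the repeated growth and rehashing that otherwise happens while most of the graph is explored.

```go
package main

import "fmt"

// Vertex stands in for lnd's 33-byte node pubkey key type.
type Vertex [33]byte

// Hypothetical sizing hint: roughly the node count of the graph.
const estimatedNodeCount = 10000

// newPathFindingState preallocates the per-search structures with a
// capacity hint so they are allocated once instead of growing as the
// search expands across the graph.
func newPathFindingState() (map[Vertex]int64, map[Vertex]Vertex, []Vertex) {
	distance := make(map[Vertex]int64, estimatedNodeCount)
	next := make(map[Vertex]Vertex, estimatedNodeCount)
	heapNodes := make([]Vertex, 0, estimatedNodeCount)
	return distance, next, heapNodes
}

func main() {
	distance, next, heapNodes := newPathFindingState()
	fmt.Println(len(distance), len(next), len(heapNodes), cap(heapNodes))
}
```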
The last insight was that a function introduced to access the channel cache was returning a
Pull Request Checklist
joostjager left a comment
This is an awesome PR. It goes after a bottleneck that we've had for a long time; so long that it is actually the subject of the oldest currently open PR, #379.
Especially on low powered devices, path finding consumes a very significant part of the total time required to complete the payment.
One question that I have is regarding increased memory usage. I don't mean allocation/deallocation or gc action, but just the requirements to keep the full channel set in memory. For path finding, a worst case scenario can be triggered by specifying a source node that isn't in the graph. The algorithm will then work backwards from the target to find the source, but it won't find it. When it terminates, it has explored nearly all edges.
If we already used the channel cache in a way that previously ended with having all channels in memory, then there is no new problem here, but I'm not sure whether that is the case.
The overall memory usage should remain similar. The cache is limited to a number of channels set by a config value. If the number of channels exceeds that value, this code will still fall back to reading from the DB. If the graph is big in relation to the cache, the performance gains will be lower.
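A minimal sketch of that bounded read-through behavior is below, assuming hypothetical names (the real cache's insertion and eviction policy may differ); the point is only that memory stays bounded by the configured capacity and misses fall back to a DB read.

```go
package main

import "fmt"

type ChannelEdge struct {
	ID       uint64
	Capacity int64
}

// channelCache fronts the channel DB: reads hit the map first and only
// fall back to the (here simulated) disk read on a miss. Once the
// configured capacity is reached, new entries are simply not stored,
// so memory stays bounded by config.
type channelCache struct {
	capacity int
	channels map[uint64]ChannelEdge
	DBReads  int // counts simulated disk reads
}

func newChannelCache(capacity int) *channelCache {
	return &channelCache{
		capacity: capacity,
		channels: make(map[uint64]ChannelEdge, capacity),
	}
}

// fetchFromDB simulates deserializing a channel from disk.
func (c *channelCache) fetchFromDB(id uint64) ChannelEdge {
	c.DBReads++
	return ChannelEdge{ID: id, Capacity: 1000000}
}

func (c *channelCache) Get(id uint64) ChannelEdge {
	if edge, ok := c.channels[id]; ok {
		return edge
	}
	edge := c.fetchFromDB(id)
	if len(c.channels) < c.capacity {
		c.channels[id] = edge
	}
	return edge
}

func main() {
	c := newChannelCache(2)
	c.Get(1)
	c.Get(1) // cache hit, no DB read
	c.Get(2)
	c.Get(3) // cache full: not stored, read again next time
	c.Get(3)
	fmt.Println("DB reads:", c.DBReads)
}
```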
I'll address the comments in the next few days.
Good changes again
Main question is what to do with the added
Final comment is about commit structure. In the end, these commits should be reorganized so that it becomes a logical stack. For example (some of these are already a separate commit now):
What you generally want to avoid in one PR is making changes in a commit and then changing them again later. For example, initializing the map first with 10000 and later replacing it by
You could also consider making two PRs: one with the internal path finding improvements that are pretty straightforward to review and get approval on, then a second one to carefully do the cache.
halseth left a comment
Awesome work on this PR! With the latest iteration, the size of the PR is very manageable, and the commit structure and messages make it easy to follow and review!
To me this is pretty much good for merge, only had a few questions/suggestions. Also, would it make sense to add the performance benchmark to the codebase (as a new PR that we could merge before this change) so that we could make sure we don't have regressions in the future?