Replace std's unmaintained bench with criterion #2235
Tagging 116, as this is the first step towards a large scorer refactor, which I think we should prioritize because it substantially improves success rates.
a9f44aa to b08541f
Oh this is nice. Actually was unrelatedly looking at criterion. I believe it’s also trivial to generate graphs of benchmark changes with it (or maybe that needs a separate crate locally). Anyway, concept ACK.
Codecov Report

@@            Coverage Diff             @@
##             main    #2235      +/-   ##
==========================================
+ Coverage   90.91%   91.53%   +0.61%
==========================================
  Files         104      104
  Lines       52760    52034     -726
  Branches    52760    52034     -726
==========================================
- Hits        47969    47628     -341
+ Misses       4791     4406     -385
b08541f to 51d7777
Addressed comments originally on #2176.
Done some pretty extensive test runs (beyond those of last week) with this and not running into any issues. The stats are pretty handy.
Nice, generally looks good after first pass!
use crate::ln::features::InvoiceFeatures;
use crate::routing::gossip::NetworkGraph;
use crate::util::config::UserConfig;
use crate::util::ser::ReadableArgs;
Seems this shadows the ReadableArgs in L. 5660, which is hence unused in cargo test?
They're in separate modules, but the fixup I pushed to switch to the other util makes the one on L5660 unnecessary now.
51d7777 to 3debd6a
LGTM and works for me. Happy for squash.
Let's land #2237 first cause it'll probably conflict a good bit and I'll take the rebase hit.
There are a few route tests which do the same thing as the benchmarks, as they're also a good test. However, they didn't share code, which is somewhat wasteful, so we fix that here.
When benchmarking our router, we previously only ever tested with amounts under 1,000 sats, which is an incredibly small amount. While this ensures we have the maximal number of available channels to consider, it prevents our scorer from being exercised across its range. Further, we only score the immediate path we are expecting to send over, and not randomly but rather based on the amount sent. Here we try to make the benchmarks a bit more realistic by adding a new benchmark which attempts to send around 100K sats, which is a reasonable amount to send over a channel today. We also convert the scoring data to be randomized based on the seed, and attempt to (possibly) find a new route for a much larger value and score based on that. This potentially allows us to score multiple paths between the source and destination, as the large route-find may return an MPP result.
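As a rough illustration of what "randomized based on the seed" can look like (this is not the code from this PR), the sketch below derives a reproducible success/failure decision for each hop of a path from a seed. `XorShift64`, `Outcome`, and `random_outcomes` are hypothetical names invented for this example; a real benchmark would feed each decision into the scorer rather than collect it into a `Vec`.

```rust
/// Tiny xorshift64 generator. A stand-in (assumption) for whatever seeded
/// PRNG the benchmarks actually use.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical result we would report to a scorer for one hop.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    Failure,
}

/// Picks an outcome per hop from the seed, so two runs with the same seed
/// score exactly the same way.
fn random_outcomes(seed: u64, path_len: usize) -> Vec<Outcome> {
    let mut rng = XorShift64::new(seed);
    (0..path_len)
        .map(|_| {
            if rng.next_u64() & 1 == 0 {
                Outcome::Success
            } else {
                Outcome::Failure
            }
        })
        .collect()
}
```

Because the decisions depend only on the seed, a benchmark run stays reproducible while no longer tying the scoring data directly to the amount sent.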
Rather than using the std benchmark framework (which isn't maintained and is unlikely to get any further maintenance), we swap for criterion, which at least gets us a variable number of test runs so our benchmarks don't take forever. We also fix the RGS benchmark to pass now that the file in use is stale compared to today's date.
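To make the "variable number of test runs" point concrete, here is a std-only sketch of the general idea: keep calling the benchmarked closure until a time budget is spent, rather than running a fixed iteration count. This is not criterion's API or its actual sampling algorithm; `bench_with_budget` is a hypothetical helper written only for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `f` repeatedly until `budget` has elapsed.
/// Returns the number of calls made and the mean time per call in
/// nanoseconds. `f` is always called at least once.
fn bench_with_budget<F: FnMut()>(budget: Duration, mut f: F) -> (u64, u128) {
    let start = Instant::now();
    let mut iters: u64 = 0;
    loop {
        f();
        iters += 1;
        if start.elapsed() >= budget {
            break;
        }
    }
    // `iters` is at least 1 here, so the division is safe.
    let mean_nanos = start.elapsed().as_nanos() / u128::from(iters);
    (iters, mean_nanos)
}
```

A slow routine gets few iterations and a fast one gets many, so total wall time stays bounded either way. With criterion itself, the equivalent is to register the closure through `Criterion::bench_function` and `Bencher::iter` and let the crate pick the iteration count.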
3debd6a to 4b27cc4
Squashed and rebased, more mechanical changes but it changed the patchset a good bit.
No lol, next week is fine, we just are starting to pile up the PRs so trying to move things forward.
reACK 4b27cc4
LGTM. Re-reviewed and no objections.
LGTM