Add a benchmark for adding transactions to the mempool #4400
A convenience function to create an override for the mempool capacity using the provided number of bytes.
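The PR description only names the helper, so here is a minimal standalone sketch of what such a convenience function could look like. The types below are hypothetical copies modelled on the consensus mempool capacity types; the real definitions live in `Ouroboros.Consensus.Mempool`, and the `Word32` representation is an assumption.

```haskell
import Data.Word (Word32)

-- Hypothetical stand-in for the mempool capacity, measured in bytes.
-- The Word32 representation is an assumption for this sketch.
newtype MempoolCapacityBytes = MempoolCapacityBytes
  { getMempoolCapacityBytes :: Word32 }
  deriving (Eq, Show)

-- Hypothetical stand-in for the override: either keep the default
-- capacity or use an explicit number of bytes.
data MempoolCapacityBytesOverride
  = NoMempoolCapacityBytesOverride
  | MempoolCapacityBytesOverride !MempoolCapacityBytes
  deriving (Eq, Show)

-- | Build an override for the mempool capacity from a number of bytes.
mkCapacityBytesOverride :: Word32 -> MempoolCapacityBytesOverride
mkCapacityBytesOverride = MempoolCapacityBytesOverride . MempoolCapacityBytes
```

For example, `mkCapacityBytesOverride 1024` saves a benchmark from spelling out both constructors when it opens a mempool with a fixed capacity.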
1ce84b2 to ead040a
LGTM! Ideally we would dump/publish these results somewhere and enable comparisons across changes; otherwise, I am afraid we won't reap the benefits of this approach.
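Comparing across changes could be as simple as checking the current run against a stored baseline. A minimal sketch, assuming results are available as (benchmark name, mean time in seconds) pairs; `regressions` and the tolerance parameter are hypothetical, not part of any benchmark library:

```haskell
-- | Names of benchmarks whose mean time grew by more than the given
-- relative tolerance (0.1 = 10%) compared with the baseline run.
-- Benchmarks that have no baseline entry are skipped.
regressions :: Double -> [(String, Double)] -> [(String, Double)] -> [String]
regressions tolerance baseline current =
  [ name
  | (name, new) <- current
  , Just old <- [lookup name baseline]  -- no baseline: nothing to compare
  , new > old * (1 + tolerance)
  ]
```

With a 10% tolerance, a benchmark that went from 2.0 s to 2.5 s is reported, while one that went from 1.0 s to 1.05 s is not.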
Added some remarks!
EDIT: I've noticed that CI does not yet build benchmark suites, so we should probably enable that.
@@ -326,3 +326,45 @@ test-suite test-infra
    -Wredundant-constraints
    -Wmissing-export-lists
    -fno-ignore-asserts

benchmark bench-mempool
If we plan on writing more benchmarks for other consensus components, would we want to have a single benchmark executable or one for each component that we target?
What about one per component, to keep them isolated? But I don't know whether the other approach has more advantages. Thoughts?
I'm not opposed to having one per component. An advantage of one executable for all is that we can reuse code, but I don't know how often we will actually be in a situation where we can reuse code. One advantage of having multiple executables is that benchmark running times tend to be on the lengthy side, so not having one monolithic executable that takes ages to run is probably a good thing.
    -Wredundant-constraints
    -Wmissing-export-lists
    -Wno-unticked-promoted-constructors
    -fno-ignore-asserts
If the code under benchmark contains relatively costly assertions, not ignoring assertions might inflate the results. This is just something to take into consideration; it is not something that should change now.
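To illustrate how an assertion can dominate a measurement: GHC's `-O` turns on `-fignore-asserts`, which rewrites `assert p e` to `e`, while `-fno-ignore-asserts` keeps the check. In the hypothetical function below (not taken from the mempool code), the operation itself is O(1) but the invariant check is O(n), so a benchmark built with assertions enabled would mostly measure `isSorted`.

```haskell
import Control.Exception (assert)

-- | An O(n) invariant check.
isSorted :: Ord a => [a] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- | Smallest element of a list that the caller promises is sorted.
-- The lookup is O(1), but when assertions are enabled (for example with
-- -fno-ignore-asserts) every call also pays for the O(n) 'isSorted' check.
minimumOfSorted :: Ord a => [a] -> Maybe a
minimumOfSorted xs = assert (isSorted xs) $ case xs of
  []    -> Nothing
  y : _ -> Just y
```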
True! I missed that. Thanks!
- Added `mkCapacityBytesOverride`, a convenience function to create an override
  for the mempool capacity using the provided number of bytes.
Did we decide on present or past tense? I'm mainly using present tense.
Good point. We have no consistency at the moment. I'll raise this with the team.
-- | Apply the payload to a ticked state directly to the payload dependent state
-- portion of it, leaving the rest of the input ticked state unaltered.
-- | Apply the payload to a ticked state directly to the payload dependent state
-- portion of it, leaving the rest of the input ticked state unaltered.
-- | Apply the payload directly to the payload dependent state
-- portion of a ticked state, leaving the rest of the input ticked state unaltered.
This is much simpler ❤️ Thanks!
  (Ticked (LedgerState (TestBlockWith ptype)))
applyDirectlyToPayloadDependentState (TickedTestLedger st) tx = do
  payloadDepSt' <- applyPayload (payloadDependentState st) tx
  pure $ TickedTestLedger $ st { payloadDependentState = payloadDepSt' }
  pure $ TickedTestLedger $ st { payloadDependentState = payloadDepSt' }
  pure $ TickedTestLedger $ st { payloadDependentState = payloadDepSt' }
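The pattern in `applyDirectlyToPayloadDependentState` is a record update that touches only the payload-dependent field. A self-contained sketch with simplified, hypothetical types (the real test ledger is parameterised over a payload type, and `Either String` stands in for its actual error monad):

```haskell
-- Hypothetical, simplified test ledger state.
data TestLedger = TestLedger
  { lastAppliedSlot       :: Int    -- payload-independent part
  , payloadDependentState :: [Int]  -- e.g. ids of applied transactions
  } deriving (Eq, Show)

newtype TickedTestLedger = TickedTestLedger TestLedger
  deriving (Eq, Show)

-- | Hypothetical payload application that rejects duplicate transactions.
applyPayload :: [Int] -> Int -> Either String [Int]
applyPayload st tx
  | tx `elem` st = Left ("duplicate tx " ++ show tx)
  | otherwise    = Right (tx : st)

-- | Apply the payload directly to the payload-dependent portion of a
-- ticked state, leaving the rest of the ticked state unaltered.
applyDirectlyToPayloadDependentState
  :: TickedTestLedger -> Int -> Either String TickedTestLedger
applyDirectlyToPayloadDependentState (TickedTestLedger st) tx = do
  payloadDepSt' <- applyPayload (payloadDependentState st) tx
  pure $ TickedTestLedger $ st { payloadDependentState = payloadDepSt' }
```

Because the update uses record syntax, `lastAppliedSlot` (and any other field) is carried over unchanged, and a failed application leaves no partially updated state.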
      length mempoolTxs @?= n
  ]
  where
    benchAddNTxs n = bench (show n) $ nfIO $ addNTxs' $ mkNTryAddTxs n
If the cost of adding the txs is what you want to benchmark, and not opening the mocked mempool or creating the mocked transactions you want to add, I would suggest using env/envWithCleanup to do that work before running the benchmark, and then only calling the function under benchmark with those inputs. By then, env will already have forced the inputs to normal form, so the benchmark only measures the time spent adding the txs. Using env might not be strictly necessary if the setup is much cheaper than adding the txs, but I don't think it hurts to use env regardless.
It would look something like:
benchAddNTxs n = bench (show n) $ nfIO $ addNTxs' $ mkNTryAddTxs n
benchAddNTxs n = env (doSetup n) $ \(mempool, txs) ->
  bench (show n) $ nfIO $ addNTxs' mempool txs
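The point of `env` is ordering: setup runs, and is forced, before the measured region starts. The base-only sketch below makes that ordering explicit. `timeAfterSetup` and its parameters are hypothetical and not part of criterion or tasty-bench, whose `env` should be used for real measurements; the forcing functions play the role that `NFData` plays there.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- | Run the setup, force its result, and only then time the action.
timeAfterSetup
  :: IO env            -- setup, e.g. opening a mocked mempool
  -> (env -> ())       -- forces the setup result completely
  -> (env -> IO r)     -- action under benchmark, e.g. adding the txs
  -> (r -> ())         -- forces the action's result
  -> IO (r, Integer)   -- result and elapsed CPU time in picoseconds
timeAfterSetup setup forceEnv action forceResult = do
  e <- setup
  _ <- evaluate (forceEnv e)     -- setup cost is paid before the clock starts
  start <- getCPUTime
  r <- action e
  _ <- evaluate (forceResult r)  -- force the result inside the timed region
  end <- getCPUTime
  pure (r, end - start)
```

For example, `timeAfterSetup (pure [1 .. 1000 :: Int]) (\xs -> sum xs `seq` ()) (pure . length) (`seq` ())` builds and forces the list outside the timed region and measures only `length`.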
Absolutely! I overlooked this. Thank you 🙏
Hmm, env's documentation recommends using withResource, which does not seem to perform any normal form evaluation.
Generating transactions looked relatively cheap, so maybe opening the mempool was the costly part.
  [ testCase "Transactions are added" $ do
      let
        n   = 1000
        txs = mkNTryAddTxs n
      mempool    <- addNTxs txs
      mempoolTxs <- getTxs mempool
      mempoolTxs @?= txsAddedInCmds txs
      length mempoolTxs @?= n
  ]
To be sure, I would run this test for each value of n you use in the benchmarks above, e.g., by moving the test into benchAddNTxs. Each benchmark then has a sanity check.
Another addition would be a regular QuickCheck test that runs the test for arbitrary values of n, so that we're also leveraging property testing in addition to the sanity checks for the benchmarks.
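One way to give every benchmark size its own sanity check is to express the check as a predicate on n and evaluate it for each size. The sketch below uses a pure, hypothetical model of a byte-bounded mempool (`addTxsModel` and `allAdded` are illustrative names, not the consensus API) and approximates the suggested QuickCheck property with an exhaustive check over small n.

```haskell
-- | Hypothetical pure model: add transactions (given by size) in order;
-- a transaction that does not fit in the remaining capacity is left out.
-- Returns (accepted, left out).
addTxsModel :: Int -> [Int] -> ([Int], [Int])
addTxsModel capacity = go 0 [] []
  where
    go _ acc rej [] = (reverse acc, reverse rej)
    go used acc rej (tx : txs)
      | used + tx <= capacity = go (used + tx) (tx : acc) rej txs
      | otherwise             = go used acc (tx : rej) txs

-- | Sanity check shared by every benchmark size: with enough capacity,
-- all n transactions end up in the mempool, in order.
allAdded :: Int -> Bool
allAdded n =
  let txs                  = replicate n 1
      (accepted, rejected) = addTxsModel n txs
  in accepted == txs && null rejected && length accepted == n
```

`all allAdded [0 .. 200]` then covers every small n; with QuickCheck available, the same predicate could be wrapped as a property over `NonNegative n`.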
Good point. In this case, I wanted to avoid the cost of having to test each of the values of n. Eventually, we should test the interpretation of commands by means of the mempool property tests, but I think it makes sense to test each case.
-------------------------------------------------------------------------------}

addNTxs' :: [MempoolCmd TestBlock] -> IO ()
addNTxs' = void . addNTxs
I can't check now because the module containing MempoolWithMockedLedgerItf is not available on the branch, but could the void here be preventing some evaluation inside the mempool that we would want to force?
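To make the question concrete: `nfIO` forces the result of the benchmarked action to normal form, so when that result is `()` nothing inside the discarded value is forced, although the action's effects still run. The base-only sketch below uses hypothetical names, with an `error` thunk standing in for deferred work inside a returned mempool handle.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Control.Monad (void)

-- | An action whose result contains deferred work; the 'error' thunk
-- makes it observable whether that work is ever forced.
lazyResult :: IO Int
lazyResult = pure (error "deferred work was forced")

-- | Discarding the result: the thunk is never evaluated.
discarded :: IO ()
discarded = void lazyResult

-- | Forcing the result to WHNF with 'evaluate' runs the deferred work.
forced :: IO Int
forced = lazyResult >>= evaluate

isErrorCall :: Either ErrorCall a -> Bool
isErrorCall = either (const True) (const False)

-- | True if running the action throws an 'ErrorCall'.
throwsErrorCall :: IO a -> IO Bool
throwsErrorCall act = isErrorCall <$> try act
```

Running `discarded` completes normally while `forced` throws, which is the difference `void` could make in a benchmark: any work the mempool defers into its returned value would go unmeasured.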
I removed this function in favour of the new approach using env.
ead040a to 3ecdba7
ouroboros-consensus-test/bench-mempool/Bench/Consensus/MempoolWithMockedLedgerItf.hs
import Ouroboros.Consensus.Mempool (Mempool)
import qualified Ouroboros.Consensus.Mempool as Mempool
Makes sense. Do you have any references I could look into to see how this is done? We can do this in a following step.
We can definitely do this in a following step. The way I do it is usually quite simple: generate a markdown file containing your data, publish it as an artifact as part of the CI, then download it somewhere else (e.g., in the docs) for publication.
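The "generate a markdown file" step can be a small pure function. A minimal sketch, assuming the results have already been collected as (benchmark name, mean time in microseconds) pairs; `renderMarkdownTable` and that input format are hypothetical:

```haskell
-- | Render benchmark results as a Markdown table that CI could publish
-- as an artifact. The time column is right-aligned.
renderMarkdownTable :: [(String, Integer)] -> String
renderMarkdownTable rows = unlines (header : separator : map renderRow rows)
  where
    header    = "| Benchmark | Mean (us) |"
    separator = "|---|---:|"
    renderRow (name, micros) = "| " ++ name ++ " | " ++ show micros ++ " |"
```

Writing the output with `writeFile` in a CI step, then uploading that file as an artifact, matches the workflow described above.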
bors r+
Part of #4382
Checklist
changelog.d directory created using scriv. If in doubt, see the Consensus release process.
interface-CHANGELOG.md