-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: draft MicroReadTVarIO.hs microbenchmark #513
Conversation
My next step here would be another microbenchmark that is more similar to the real code's scenario (notably: indirecting through Edit: see next message. |
c665ff7
to
6dca85f
Compare
I favored my momentum, so I added |
e041297
to
84c7964
Compare
I tidied it up, esp by separating out the bash scripts. I also fixed the final timing estimates, since I had originally ran them with |
ed29a44
to
d927842
Compare
d927842
to
5d4a150
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to move these benchmarks to another repo, like ouroboros-consensus-tools
?
# Introduction | ||
|
||
The goal of the microbenchmarks in this directory is to demystify the consequences of replacing an `atomically . readTVar` with `readTVarIO`. | ||
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compare to the baseline of `8.5.0-pre` (circa 2023 Nov 1). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think we need to replace the GH link.
- I wouldn't include the Slack link. I understand we need to keep track of things internally, but perhaps we could move the entire micro-benchmark experiment to a separate repository (where we know the consensus team will be the only likely contributors).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chatted on the call. I'll make it more explicit that I'm summarizing the thread here.
# Introduction | ||
|
||
The goal of the microbenchmarks in this directory is to demystify the consequences of replacing an `atomically . readTVar` with `readTVarIO`. | ||
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compare to the baseline of `8.5.0-pre` (circa 2023 Nov 1). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compare to the baseline of `8.5.0-pre` (circa 2023 Nov 1). | |
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compared to the baseline of `8.5.0-pre` (circa 2023 Nov 1). |
|
||
# Simple Case | ||
|
||
In the simplest possible case, the `atomically . readTVar` variant costs about 20 nanoseconds more to call (21.7 nanoseconds unoptimized, 1.37 nanoseconds optimized) than does the equivalent `readTVarIO`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a reference to this? I mean, where does this information come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are the results of me running the benchmark defined in this PR.
|
||
The system-level benchmarks are generally quite reproducible, as evidenced by occasional variance analaysis runs (repeats of the same run showing final stats within 1 millisecond, eg). | ||
So it seems likely that there is in fact a savings on the order of 10%. | ||
But our working hypothesis is that that savings is only indirectly cause by this particular manual optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we could expand the explanation of how these indirect savings are enabled by the manual optimization (or by the reduction of 48 bytes per call in heap allocation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I can address this question any better than the sentence that starts the next paragraph already does. Do you have any idea how to flesh out that list of "things that could be behaving surprisingly" without too much extra research?
The underlying motivation of this PR was to explain a regression in acquiring the BlockFetch context. This has recently been explained by a computation in the node as part of an enriched tracer, completely unrelated to Consensus. Therefore, we can close this PR; the document will remain available; it might be useful in another context. |
A microbenchmark investigation of the benefits of
readTVarIO
versusatomically . readTVar
. See the directory'sREADME.md
for the motivation.