
docs: draft MicroReadTVarIO.hs microbenchmark #513

Closed
wants to merge 1 commit

Conversation

nfrisby
Contributor

@nfrisby nfrisby commented Nov 16, 2023

A microbenchmark investigation of the benefits of `readTVarIO` versus `atomically . readTVar`. See the directory's README.md for the motivation.
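For readers unfamiliar with the distinction, the following minimal sketch (not the PR's benchmark itself; the function names are illustrative) shows the two variants side by side. Both observe the same value, but `readTVarIO` performs a single read without setting up an STM transaction, whereas `atomically . readTVar` runs the full transaction machinery:

```haskell
-- Minimal sketch contrasting the two ways of reading a TVar.
-- Everything here is in base (GHC.Conc); no extra packages needed.
import GHC.Conc (TVar, atomically, newTVarIO, readTVar, readTVarIO)

-- Full STM transaction: start, read, validate, commit.
readViaAtomically :: TVar Int -> IO Int
readViaAtomically tv = atomically (readTVar tv)

-- Direct read of the current value, no transaction log.
readViaIO :: TVar Int -> IO Int
readViaIO = readTVarIO

main :: IO ()
main = do
  tv <- newTVarIO (42 :: Int)
  a <- readViaAtomically tv
  b <- readViaIO tv
  print (a == b)  -- both variants observe the same value
```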

@nfrisby nfrisby added the documentation Improvements or additions to documentation label Nov 16, 2023
@nfrisby
Contributor Author

nfrisby commented Nov 16, 2023

My next step here would be another microbenchmark that is more similar to the real code's scenario (notably: indirecting through MonadSTM without specialization, and preventing sharing of the thunk across calls to atomically).

Edit: see next message.

@nfrisby
Contributor Author

nfrisby commented Nov 16, 2023

I favored my momentum, so I added MicroReadTVarIO2.hs already.

@nfrisby nfrisby force-pushed the nfrisby/readTVarIO branch 4 times, most recently from e041297 to 84c7964 on November 16, 2023 21:57
@nfrisby
Contributor Author

nfrisby commented Nov 16, 2023

I tidied it up, especially by separating out the bash scripts. I also fixed the final timing estimates, since I had originally run them with -ticky, which has significant run-time overhead.

@nfrisby nfrisby force-pushed the nfrisby/readTVarIO branch from d927842 to 5d4a150 on January 10, 2024 16:35
@nfrisby nfrisby marked this pull request as ready for review January 10, 2024 16:36
@nfrisby nfrisby requested a review from a team as a code owner January 10, 2024 16:36
@nfrisby nfrisby requested review from dnadales and amesgen January 10, 2024 16:37
Member

@dnadales dnadales left a comment


Would it make sense to move these benchmarks to another repo, like ouroboros-consensus-tools?

# Introduction

The goal of the microbenchmarks in this directory is to demystify the consequences of replacing an `atomically . readTVar` with `readTVarIO`.
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compare to the baseline of `8.5.0-pre` (circa 2023 Nov 1).
Member


  • I think we need to replace the GH link.
  • I wouldn't include the Slack link. I understand we need to keep track of things internally, but perhaps we could move the entire micro-benchmark experiment to a separate repository (where we know the consensus team will be the only likely contributors).

Contributor Author


We chatted on the call. I'll make it more explicit that I'm summarizing the thread here.

Member


Suggested change
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compare to the baseline of `8.5.0-pre` (circa 2023 Nov 1).
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compared to the baseline of `8.5.0-pre` (circa 2023 Nov 1).


# Simple Case

In the simplest possible case, the `atomically . readTVar` variant costs about 20 nanoseconds more to call (21.7 nanoseconds unoptimized, 1.37 nanoseconds optimized) than does the equivalent `readTVarIO`.
Member


Do we have a reference to this? I mean, where does this information come from?

Contributor Author


Those are the results of me running the benchmark defined in this PR.
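For context, a crude, base-only sketch of how such a timing comparison might be set up follows (the PR's MicroReadTVarIO.hs is more careful; the `timeIt` helper and iteration count here are illustrative, and a real measurement would use a proper benchmarking harness and optimized builds):

```haskell
-- Crude timing sketch: compare atomically . readTVar against readTVarIO.
-- Uses only base (GHC.Clock requires base >= 4.11 / GHC >= 8.4).
import Control.Monad (replicateM_)
import GHC.Clock (getMonotonicTimeNSec)
import GHC.Conc (atomically, newTVarIO, readTVar, readTVarIO)

-- Illustrative helper: time an IO action over many calls, report ns/call.
timeIt :: String -> IO Int -> IO ()
timeIt label act = do
  let iters = 1000000 :: Int
  t0 <- getMonotonicTimeNSec
  replicateM_ iters (act >>= \x -> x `seq` pure ())  -- force each result
  t1 <- getMonotonicTimeNSec
  putStrLn (label ++ ": "
            ++ show (fromIntegral (t1 - t0) / fromIntegral iters :: Double)
            ++ " ns/call")

main :: IO ()
main = do
  tv <- newTVarIO (0 :: Int)
  timeIt "atomically . readTVar" (atomically (readTVar tv))
  timeIt "readTVarIO"            (readTVarIO tv)
```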


The system-level benchmarks are generally quite reproducible, as evidenced by occasional variance-analysis runs (e.g., repeats of the same run showing final stats within 1 millisecond).
So it seems likely that there is in fact a savings on the order of 10%.
But our working hypothesis is that that savings is only indirectly caused by this particular manual optimization.
Member


I don't know if we could expand the explanation of how these indirect savings are enabled by the manual optimization (or by the reduction of 48 bytes per call in heap allocation).

Contributor Author


I don't think I can address this question any better than the sentence that starts the next paragraph already does. Do you have any idea how to flesh out that list of "things that could be behaving surprisingly" without too much extra research?

@amesgen
Member

amesgen commented May 31, 2024

The underlying motivation of this PR was to explain a regression in acquiring the BlockFetch context. That regression has since been explained by a computation in the node that is part of an enriched tracer, completely unrelated to Consensus. Therefore, we can close this PR; the document will remain available, and it might be useful in another context.

@amesgen amesgen closed this May 31, 2024
@amesgen amesgen deleted the nfrisby/readTVarIO branch May 31, 2024 15:17