
docs: draft MicroReadTVarIO.hs microbenchmark #513

Closed
wants to merge 1 commit

Conversation

nfrisby
Contributor

@nfrisby nfrisby commented Nov 16, 2023

A microbenchmark investigation of the benefits of `readTVarIO` versus `atomically . readTVar`. See the directory's README.md for the motivation.
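For readers unfamiliar with the distinction, the following minimal sketch (not the PR's benchmark itself; the function names are illustrative) shows the two variants side by side. Both observe the same value, but `readTVarIO` performs a single read without setting up an STM transaction, whereas `atomically . readTVar` runs the full transaction machinery:

```haskell
-- Minimal sketch contrasting the two ways of reading a TVar.
-- Everything here is in base (GHC.Conc); no extra packages needed.
import GHC.Conc (TVar, atomically, newTVarIO, readTVar, readTVarIO)

-- Full STM transaction: start, read, validate, commit.
readViaAtomically :: TVar Int -> IO Int
readViaAtomically tv = atomically (readTVar tv)

-- Direct read of the current value, no transaction log.
readViaIO :: TVar Int -> IO Int
readViaIO = readTVarIO

main :: IO ()
main = do
  tv <- newTVarIO (42 :: Int)
  a <- readViaAtomically tv
  b <- readViaIO tv
  print (a == b)  -- both variants observe the same value
```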

@nfrisby nfrisby added the documentation Improvements or additions to documentation label Nov 16, 2023
@nfrisby
Contributor Author

nfrisby commented Nov 16, 2023

My next step here would be another microbenchmark that is more similar to the real code's scenario (notably: indirecting through MonadSTM without specialization, and preventing sharing of the thunk across calls to atomically).

Edit: see next message.

@nfrisby
Contributor Author

nfrisby commented Nov 16, 2023

I favored my momentum, so I added MicroReadTVarIO2.hs already.

@nfrisby nfrisby force-pushed the nfrisby/readTVarIO branch 4 times, most recently from e041297 to 84c7964 on November 16, 2023 21:57
@nfrisby
Contributor Author

nfrisby commented Nov 16, 2023

I tidied it up, especially by separating out the bash scripts. I also fixed the final timing estimates, since I had originally run them with -ticky, which has significant run-time overhead.

@nfrisby nfrisby force-pushed the nfrisby/readTVarIO branch from d927842 to 5d4a150 on January 10, 2024 16:35
@nfrisby nfrisby marked this pull request as ready for review January 10, 2024 16:36
@nfrisby nfrisby requested a review from a team as a code owner January 10, 2024 16:36
@nfrisby nfrisby requested review from dnadales and amesgen January 10, 2024 16:37
Member

@dnadales dnadales left a comment


Would it make sense to move these benchmarks to another repo, like ouroboros-consensus-tools?

# Introduction

The goal of the microbenchmarks in this directory is to demystify the consequences of replacing an `atomically . readTVar` with `readTVarIO`.
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compare to the baseline of `8.5.0-pre` (circa 2023 Nov 1).
Member


  • I think we need to replace the GH link.
  • I wouldn't include the Slack link. I understand we need to keep track of things internally, but perhaps we could move the entire micro-benchmark experiment to a separate repository (where we know the consensus team will be the only likely contributors).

Contributor Author


We chatted on the call. I'll make it more explicit that I'm summarizing the thread here.

Member


Suggested change
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compare to the baseline of `8.5.0-pre` (circa 2023 Nov 1).
In particular, a benchmarking run reported in [this #perf-announce message](https://input-output-rnd.slack.com/archives/C4Q7MF25U/p1698842470166369) on the IOG Slack shows that the system-level benchmarks indicate that [making that change in the `forkBlockForging` function](https://github.com/input-output-hk/ouroboros-consensus/commit/ea76c4662743e129bf56d206fd212e2fe45685c9) yields an improvement on the order of 10% CPU usage and 10% allocation compared to the baseline of `8.5.0-pre` (circa 2023 Nov 1).


# Simple Case

In the simplest possible case, the `atomically . readTVar` variant costs about 20 nanoseconds more to call (21.7 nanoseconds unoptimized, 1.37 nanoseconds optimized) than does the equivalent `readTVarIO`.
Member


Do we have a reference to this? I mean, where does this information come from?

Contributor Author


Those are the results of me running the benchmark defined in this PR.
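For context, a crude, base-only sketch of how such a timing comparison might be set up follows (the PR's MicroReadTVarIO.hs is more careful; the `timeIt` helper and iteration count here are illustrative, and a real measurement would use a proper benchmarking harness and optimized builds):

```haskell
-- Crude timing sketch: compare atomically . readTVar against readTVarIO.
-- Uses only base (GHC.Clock requires base >= 4.11 / GHC >= 8.4).
import Control.Monad (replicateM_)
import GHC.Clock (getMonotonicTimeNSec)
import GHC.Conc (atomically, newTVarIO, readTVar, readTVarIO)

-- Illustrative helper: time an IO action over many calls, report ns/call.
timeIt :: String -> IO Int -> IO ()
timeIt label act = do
  let iters = 1000000 :: Int
  t0 <- getMonotonicTimeNSec
  replicateM_ iters (act >>= \x -> x `seq` pure ())  -- force each result
  t1 <- getMonotonicTimeNSec
  putStrLn (label ++ ": "
            ++ show (fromIntegral (t1 - t0) / fromIntegral iters :: Double)
            ++ " ns/call")

main :: IO ()
main = do
  tv <- newTVarIO (0 :: Int)
  timeIt "atomically . readTVar" (atomically (readTVar tv))
  timeIt "readTVarIO"            (readTVarIO tv)
```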


The system-level benchmarks are generally quite reproducible, as evidenced by occasional variance-analysis runs (e.g., repeats of the same run showing final stats within 1 millisecond).
So it seems likely that there is in fact a savings on the order of 10%.
But our working hypothesis is that that savings is only indirectly caused by this particular manual optimization.
Member


I don't know if we could expand the explanation of how these indirect savings are enabled by the manual optimization (or by the reduction of 48 bytes per call in heap allocation).

Contributor Author


I don't think I can address this question any better than the sentence that starts the next paragraph already does. Do you have any idea how to flesh out that list of "things that could be behaving surprisingly" without too much extra research?

@amesgen
Member

amesgen commented May 31, 2024

The underlying motivation of this PR was to explain a regression in acquiring the BlockFetch context. That regression has since been explained by a computation in the node that is part of an enriched tracer, completely unrelated to Consensus. Therefore, we can close this PR; the document will remain available, and it might be useful in another context.

@amesgen amesgen closed this May 31, 2024
@amesgen amesgen deleted the nfrisby/readTVarIO branch May 31, 2024 15:17