
Number of warm-up runs should not be the same every time #1565

Open
twharmon opened this issue Jan 12, 2024 · 1 comment

twharmon (Contributor) commented Jan 12, 2024
I found an irregularity in a handful of benchmarks where the duration was 10 times the typical duration, apparently because the assigned number of warm-up runs caused garbage collection to fall on the timed run (or something like that).

In my example, select row took ~6 ms when the number of warm-up runs was 0, 10, or 20, but 60 ms with 5 warm-up runs.

I was using TinyGo to compile to wasm, so the issue may be specific to that allocator/runtime.

The difference is easy to see in the screenshot below. goui-alt is the same app compiled with the standard Go compiler. TinyGo performs better than Go for the most part, but in a few instances the warm-up runs cause TinyGo's runtime to be slow in a consistent way.
[Screenshot: results comparison]

Simply changing the number of warm-up runs to 8 completely changed the results:
[Screenshot: results with 8 warm-up runs]

Suggestion

Instead of 5 warm-ups for everything, why don't we use rand(5), rand(10), or nth-run % 5?
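The suggested schemes could be sketched like this. A hypothetical TypeScript helper; none of these names appear in the benchmark runner, and rand(n) is taken to mean a uniform integer in [0, n):

```typescript
// Random alternative: rand(5) or rand(10) -> uniform integer in [0, n).
function randWarmupCount(n: number): number {
  return Math.floor(Math.random() * n);
}

// Deterministic alternative: nth-run % 5, so consecutive runs never all
// hit the same allocator state before the timed iteration.
function cyclicWarmupCount(runIndex: number, n = 5): number {
  return runIndex % n;
}
```

The deterministic variant has the advantage of being reproducible across benchmark runs while still varying the warm-up count between iterations.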

twharmon changed the title from "Number of warm-up runs should be random within a range" to "Number of warm-up runs should not be the same every time" on Jan 12, 2024
krausest (Owner) commented Jan 12, 2024

Originally I wanted to include GC in the results (and thus penalize high GC pressure), but it turned out that this conflicted with the goal of reducing variance. So some time ago I decided to run GC before each benchmark iteration:
https://github.com/krausest/js-framework-benchmark/blob/master/webdriver-ts/src/forkedBenchmarkRunnerPuppeteer.ts#L127

So I think the problem you describe occurs only for WASM frameworks, since they manage memory on their own.
I'm not sure what the best way to handle that is.

  • Keeping forced GC for JS might penalize WASM frameworks
  • Removing forced GC from JS introduces noise that makes comparisons between close contenders worthless

Over the next few days I'll check how big the impact is when GC is included again. Randomizing the warm-up count may make sense then.
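For context, the GC-before-iteration step linked above can be sketched as follows. This is a hedged illustration, not the runner's actual code: it assumes a Chrome DevTools Protocol session object such as the one Puppeteer returns from `page.createCDPSession()`, and it uses the real `HeapProfiler.collectGarbage` CDP command. Crucially, this only collects the JS engine's heap, which is why a WASM framework's own allocator (e.g. TinyGo's) is unaffected:

```typescript
// Minimal shape of a CDP session; Puppeteer's CDPSession satisfies this
// (an assumption about the surrounding setup, kept abstract for clarity).
interface CdpSession {
  send(method: string): Promise<unknown>;
}

// Ask V8 for a full garbage collection before a timed iteration.
// HeapProfiler.collectGarbage only touches the JS heap, so memory managed
// inside a WASM module's linear memory is not freed by this call.
async function forceGC(session: CdpSession): Promise<void> {
  await session.send("HeapProfiler.collectGarbage");
}
```

Because the wasm module's allocator runs inside linear memory that V8 never scans, this pre-iteration GC cannot equalize allocator state for TinyGo the way it does for JS frameworks.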
