Implement automatic warmup timing #135

evanphx · 2024-01-13T00:52:26Z

The idea is to run the item until the cycles per 100ms timing
is within 1% of the previous run. This means that the default is still
2 seconds like it was originally, but now if those 2 seconds didn't
yield runs that were close enough together, the warmup will continue to
run.

It will run for a maximum of 30 seconds.

There are undoubtedly more sophisticated ways to terminate the warmup, but this is a simple metric that is easy for end users to understand.

Fixes #92

eregon

This looks convenient but I'm worried/concerned that it will not work well for some JITs and some benchmarks.
For instance if there is a lot to compile the performance might look stable for some time but might not be peak performance.

For a practical example let's look at this warmup plot:

(from this blog post)
We can see multiple regions where it's flat but yet it's far from peak performance.

This is also what papers like https://arxiv.org/pdf/1602.00602.pdf found:

Kalibera and Jones 2013 convincingly show the limitations of such approaches, presenting instead a manual approach to determining if and when a steady state has been reached

that in the general case it is not possible to detect warmup reliably.
On a given set of benchmarks and Ruby engines it might be, but on unknown benchmarks it's not.

eregon · 2024-01-13T17:23:08Z

lib/benchmark/ips/job.rb

+              prev / per
+            end
+
+            if diff - 1.0 <= 0.1


Suggested change

if diff - 1.0 <= 0.1

if diff - 1.0 <= 0.01

For 1%

eregon · 2024-01-13T17:43:46Z

Auto-detecting warmup is always tempting and certainly convenient when it works and for instance I wrote this (PR) for yjit-bench.
It starts with a 0.1% goal and uses the median absolute deviation over all iterations so far.
This means it has a better idea of warmup because it considers all iterations and not just the last two (the current approach in this PR).
It also increases the goal by 0.01% per iteration so that it allows more margin for more noisy benchmarks (and gives up after 20 minutes).
It's not perfect though and e.g. can fail to find steady state if the benchmark is quite noisy or if there is a very long flat region (i.e. approximately as long as half of all iterations so far) which is not actually peak.

So I think the safest would be to not add auto warmup to benchmark-ips, because it will fail in some cases and then it would be misleading.
If we still want to add it then I think we should try to detect when it fails and warn in such a case and check it on many benchmarks and Ruby implementations.

evanphx · 2024-01-13T23:30:43Z

@eregon Good context. Perhaps another way to look at this is, by having a simple autowarm by default that isn't sophisticated, it's better than the default (strictly 2 seconds) without making people overly confident that it's always correct.

Folks testing JITs, etc are always going to want to specify the warmup, I don't we could come up with a general purpose algorithm that would work for those folks and folks using benchmark-ips to just test out 2 different chunks of code.

This would be aimed at that later group, where the warmup can catch potential areas where their default warmup wasn't sufficiant.

eregon · 2024-01-15T13:52:58Z

I think these two groups overlap quite a bit, e.g., folks just using benchmark-ips and running on YJIT.
Or of course folks just using benchmark-ips and running on JRuby/TruffleRuby (although I guess less frequent).

For CRuby interpreter no-JIT, then autowarmup is not needed, and the default 2 seconds is fine.
For YJIT it's less clear, we would need to look at warmup curves on bigger benchmarks and see if autowarmup detects it correctly there.
On small benchmarks, the 2 seconds should be enough for YJIT and probably for TruffleRuby Native too, while it might not be enough for TruffleRuby JVM and JRuby (where it's likely most difficult to autowarmup correctly).

I think we'd need to look at some bigger benchmarks and see if the autowarmup is better than the default 2 seconds.

nateberkopec · 2024-02-03T10:17:38Z

Should there me a message printed if you hit the full 30 seconds? It's kind of a "failure" in some sense.
1% feels aggressive to me. The only time I see 1% variance in a benchmark-ips result is when I'm testing the most braindead low-level Ruby stuff... certainly any Rails bench involving database interaction at all usually ends up with a few % in either direction. Maybe changing the goal over time (as eregon suggested) is the solution.

CRuby interpreter no-JIT, then autowarmup is not needed, and the default 2 seconds is fine.

Plenty of people use benchmark-ips to test application-level code which may be ~0.5-1second per iteration. I like autowarmup because it will automatically give these users longer, more useful warmups.

Implement automatic warmup timing

218e2ee

evanphx requested a review from nateberkopec January 13, 2024 00:52

eregon reviewed Jan 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement automatic warmup timing #135

Implement automatic warmup timing #135

evanphx commented Jan 13, 2024

eregon left a comment

eregon Jan 13, 2024

eregon commented Jan 13, 2024 •

edited

Loading

evanphx commented Jan 13, 2024

eregon commented Jan 15, 2024

nateberkopec commented Feb 3, 2024

Implement automatic warmup timing #135

Are you sure you want to change the base?

Implement automatic warmup timing #135

Conversation

evanphx commented Jan 13, 2024

eregon left a comment

Choose a reason for hiding this comment

eregon Jan 13, 2024

Choose a reason for hiding this comment

eregon commented Jan 13, 2024 • edited Loading

evanphx commented Jan 13, 2024

eregon commented Jan 15, 2024

nateberkopec commented Feb 3, 2024

eregon commented Jan 13, 2024 •

edited

Loading