Make runtime benchmark harness more flexible to use #1644
Merged
Some runtime benchmarks only need read-only access to the input data they operate on. Previously, there was no easy way to express this in the runtime benchmark harness: even though a benchmark only needed to read some data, that data had to be re-created for every iteration of the benchmark (twice per iteration, actually, since we run each benchmark two times, to gather instruction counts and time).
This meant that the data for each benchmark was re-created many times, which could slow down the execution of the whole benchmark group. It also meant that the data was always allocated anew, possibly at a different memory address, which could introduce additional noise into the benchmark, because each iteration potentially operated on a different region of memory.
After this change, benchmarks that only need read-only access to some input data can create that data once and then read it repeatedly. In theory this means the data is more likely to stay resident in the L1/L2/L3 caches, but that is an orthogonal problem. It is even a question whether we want to avoid this; we could instead warm up the cache and perform measurements on a "prepared cache state", which could further reduce measurement variation. In any case, I have some code prepared to flush the caches if this is considered a problem.
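To illustrate the difference, here is a minimal, self-contained sketch of the two measurement patterns. The names (`prepare_input`, `run_benchmark_iteration`, `ITERATIONS`) are hypothetical and do not reflect the actual harness API; the point is only where the read-only input is created relative to the measured loop.

```rust
// Hypothetical sketch; not the real benchmark harness API.
const ITERATIONS: usize = 100;

/// Expensive, read-only input data used by the benchmarked code.
fn prepare_input() -> Vec<u8> {
    vec![0u8; 1024 * 1024]
}

/// The code whose instruction count / time we want to measure.
fn run_benchmark_iteration(input: &[u8]) -> u64 {
    input.iter().map(|b| *b as u64).sum()
}

fn main() {
    // Before: the input is re-created for every iteration, so setup cost and
    // a potentially different memory address leak into each measurement.
    for _ in 0..ITERATIONS {
        let input = prepare_input();
        std::hint::black_box(run_benchmark_iteration(&input));
    }

    // After: the read-only input is created once and only read inside the
    // measured loop, so every iteration touches the same memory region.
    let input = prepare_input();
    for _ in 0..ITERATIONS {
        std::hint::black_box(run_benchmark_iteration(&input));
    }
}
```

In the second pattern the setup cost is paid once per benchmark rather than once per iteration, and the measurements all target the same allocation, which is the noise-reduction effect described above.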