CI benchmarks: consider using an adaptive noise threshold

The current CI benchmarks use a hardcoded threshold of 0.20%. If the instruction count of any benchmark differs by more than 0.20%, the CI job fails and the reviewer is expected to triage the benchmark results.
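For reference, the current check amounts to something like the following minimal Rust sketch. The constant and function names are hypothetical and not taken from the actual CI code. The sketch only illustrates comparing the relative instruction-count change against the fixed 0.20% bound.

```rust
/// Hardcoded noise threshold: 0.20%, expressed as a fraction.
const NOISE_THRESHOLD: f64 = 0.002;

/// Relative change in instruction count from `baseline` to `candidate`.
fn relative_change(baseline: u64, candidate: u64) -> f64 {
    (candidate as f64 - baseline as f64) / baseline as f64
}

/// Returns true if the change (in either direction) exceeds the threshold,
/// i.e. the CI job should fail and the reviewer needs to triage the results.
fn exceeds_threshold(baseline: u64, candidate: u64) -> bool {
    relative_change(baseline, candidate).abs() > NOISE_THRESHOLD
}
```

With this rule, going from 1,000,000 to 1,003,000 instructions (+0.30%) fails the job, while going to 1,001,000 (+0.10%) passes.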

During a conversation, @nnethercote (known for his work optimizing the Rust compiler) strongly advised using an adaptive threshold instead, i.e. looking at the history of measurements for each benchmark and automatically deriving a threshold from them. This is necessary because even stable benchmarks can suddenly become noisy without a clear cause (see [dealing with noise](https://kobzol.github.io/rust/rustc/2023/08/18/rustc-benchmark-suite.html#dealing-with-noise) by @kobzol, who also works on benchmarking Rust). I do think it makes sense to keep 0.20% as a hardcoded lower bound.

There is some complexity associated with using an adaptive threshold, though:

1. We need to keep track of past measurements somewhere (will this require setting up additional infrastructure?)
2. We need to come up with a sound way to derive the threshold from the historical data (maybe we can just copy whatever Rust is doing)
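To make both points more concrete, here is a minimal sketch of one possible approach. It assumes a rolling in-memory window of the last 30 relative changes per benchmark; the window size and all names are hypothetical, and persisting the history across CI runs (point 1) is left open. The threshold is derived with an interquartile-range outlier fence (Q3 + 1.5 × IQR), a standard statistical rule. I have not verified whether this is exactly what rustc-perf does. The 0.20% lower bound is kept as a floor.

```rust
use std::collections::VecDeque;

/// Hardcoded lower bound for the noise threshold: 0.20%.
const MIN_THRESHOLD: f64 = 0.002;
/// Hypothetical number of past runs to keep per benchmark.
const WINDOW: usize = 30;
/// Multiplier for the interquartile range (Tukey's fence).
const IQR_MULTIPLIER: f64 = 1.5;

/// Rolling history of absolute relative instruction-count changes for one benchmark.
struct BenchHistory {
    changes: VecDeque<f64>,
}

impl BenchHistory {
    fn new() -> Self {
        Self { changes: VecDeque::with_capacity(WINDOW) }
    }

    /// Records a new relative change, discarding the oldest once the window is full.
    fn record(&mut self, change: f64) {
        if self.changes.len() == WINDOW {
            self.changes.pop_front();
        }
        self.changes.push_back(change.abs());
    }

    /// Derives the threshold as Q3 + 1.5 * IQR of the historical changes,
    /// never going below MIN_THRESHOLD.
    fn threshold(&self) -> f64 {
        if self.changes.len() < 4 {
            // Too little history to estimate quartiles: use the hardcoded bound.
            return MIN_THRESHOLD;
        }
        let mut sorted: Vec<f64> = self.changes.iter().copied().collect();
        sorted.sort_by(f64::total_cmp);
        let q1 = quantile(&sorted, 0.25);
        let q3 = quantile(&sorted, 0.75);
        (q3 + IQR_MULTIPLIER * (q3 - q1)).max(MIN_THRESHOLD)
    }

    /// Returns true if `change` should be flagged for triage.
    fn is_significant(&self, change: f64) -> bool {
        change.abs() > self.threshold()
    }
}

/// Linear-interpolation quantile of an already sorted, non-empty slice.
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}
```

Quartiles are less sensitive than a mean and standard deviation to the occasional genuine regression that ends up in the history, which is one reason to prefer them here.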

IMO there is also a chance we can get away with the current hardcoded approach. Although we have more than 20 benchmarks, they all boil down to three basic scenarios (full handshake, resumed handshake, and data transfer), and things seem to have worked well since we hooked the benchmarks up to CI. Manually adjusting the noise thresholds as needed might require less work than developing and maintaining an adaptive setup.
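For comparison, the manual alternative could be as simple as a per-benchmark override table on top of the 0.20% default. This is a hypothetical sketch; the benchmark name and the override value are made up for illustration.

```rust
/// Default noise threshold for every benchmark: 0.20%, expressed as a fraction.
const DEFAULT_THRESHOLD: f64 = 0.002;

/// Manually maintained per-benchmark overrides (names and values are hypothetical).
const OVERRIDES: &[(&str, f64)] = &[
    // Raised after this benchmark became noisy; worth revisiting periodically.
    ("handshake_resumed_example", 0.005),
];

/// Returns the override for `bench` if one exists, otherwise the default.
fn threshold_for(bench: &str) -> f64 {
    OVERRIDES
        .iter()
        .find(|(name, _)| *name == bench)
        .map(|&(_, t)| t)
        .unwrap_or(DEFAULT_THRESHOLD)
}
```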

@djc @ctz @cpu since you do most of the PR reviewing and are in that sense users of the CI benchmarks, do you have an opinion on this?

@nnethercote @kobzol feel free to comment if you have any additional insights to share

CI benchmarks: consider using an adaptive noise threshold #1485

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

CI benchmarks: consider using an adaptive noise threshold #1485

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions