
CI benchmarks: consider using an adaptive noise threshold #1485

@aochagavia


The current CI benchmarks use a hardcoded threshold of 0.20%. If there is an instruction count difference greater than 0.20% for any benchmark, the CI job fails and the reviewer is expected to triage the benchmark results.
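To make the current rule concrete, here is a minimal sketch of a fixed-threshold check. It is not the actual CI code. The names `NOISE_THRESHOLD`, `relative_change` and `exceeds_threshold` are hypothetical, and it assumes each benchmark yields one instruction count for the baseline and one for the candidate.

```rust
/// Hypothetical fixed noise threshold: 0.20% expressed as a fraction.
const NOISE_THRESHOLD: f64 = 0.002;

/// Relative change of `candidate` compared to `baseline` (both instruction counts).
fn relative_change(baseline: u64, candidate: u64) -> f64 {
    (candidate as f64 - baseline as f64) / baseline as f64
}

/// True if the change exceeds the hardcoded threshold in either direction,
/// i.e. the case in which the CI job would fail and ask for triage.
fn exceeds_threshold(baseline: u64, candidate: u64) -> bool {
    relative_change(baseline, candidate).abs() > NOISE_THRESHOLD
}

fn main() {
    // 1_000_000 -> 1_001_000 is a +0.10% change: below the threshold.
    println!("{}", exceeds_threshold(1_000_000, 1_001_000));
    // 1_000_000 -> 1_003_000 is a +0.30% change: above the threshold.
    println!("{}", exceeds_threshold(1_000_000, 1_003_000));
}
```

The weakness the rest of this issue discusses is visible here: a single constant applies to every benchmark, no matter how noisy that benchmark has been in the past.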

In a conversation, @nnethercote (known for his work optimizing the Rust compiler) strongly advised using an adaptive threshold instead, i.e. looking at the history of measurements for each specific benchmark and automatically deriving a threshold from them. This is necessary because even stable benchmarks can suddenly become noisy without a clear cause (see dealing with noise by @Kobzol, who also works on benchmarking Rust). I do think it makes sense to keep 0.20% as a hardcoded lower bound.

There is some complexity associated with using an adaptive threshold, though:

  1. We need to keep track of past measurements somewhere (will that require setting up additional infrastructure?)
  2. We need to come up with a sound way to derive the threshold from the historical data (maybe we can just copy whatever the Rust project is doing)
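As one possible answer to the second point, the sketch below derives a per-benchmark threshold from the history of instruction counts using the interquartile-range (Tukey fence) rule, floored at the 0.20% lower bound suggested above. The IQR rule is swapped in as a common outlier heuristic; it is not claimed to be what the Rust project does. The names `adaptive_threshold`, `quantile` and `MIN_THRESHOLD`, the multiplier `k`, and the assumption that `history` holds time-ordered measurements of unchanged code are all hypothetical.

```rust
/// Lower bound taken from the current hardcoded threshold (0.20%).
const MIN_THRESHOLD: f64 = 0.002;

/// Quantile `q` in [0, 1] of a sorted, non-empty slice, using linear interpolation.
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    sorted[lo] + (sorted[hi] - sorted[lo]) * frac
}

/// Derives a noise threshold for one benchmark from its past instruction counts.
/// `history` is assumed to be ordered by time; `k` is the IQR multiplier.
fn adaptive_threshold(history: &[u64], k: f64) -> f64 {
    // Magnitudes of the relative changes between consecutive measurements.
    let mut changes: Vec<f64> = history
        .windows(2)
        .map(|w| ((w[1] as f64 - w[0] as f64) / w[0] as f64).abs())
        .collect();
    if changes.is_empty() {
        // Not enough history yet: fall back to the hardcoded bound.
        return MIN_THRESHOLD;
    }
    changes.sort_by(|a, b| a.total_cmp(b));
    let q1 = quantile(&changes, 0.25);
    let q3 = quantile(&changes, 0.75);
    // Upper Tukey fence, never below the hardcoded lower bound.
    (q3 + k * (q3 - q1)).max(MIN_THRESHOLD)
}

fn main() {
    // Quiet benchmark: consecutive runs differ by at most 0.01%, so the floor applies.
    let quiet: [u64; 5] = [1_000_000, 1_000_100, 1_000_000, 1_000_050, 1_000_000];
    // Noisy benchmark: consecutive runs differ by 0.5% to 1%.
    let noisy: [u64; 5] = [1_000_000, 1_010_000, 1_000_000, 1_005_000, 1_000_000];
    println!("quiet: {:.4}", adaptive_threshold(&quiet, 3.0));
    println!("noisy: {:.4}", adaptive_threshold(&noisy, 3.0));
}
```

A benchmark that suddenly becomes noisy gets a wider threshold automatically, while quiet benchmarks stay at 0.20%. The price is exactly the first point above: the history has to be stored somewhere and fed into this computation.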

IMO there is also a chance we might get away with the current hardcoded approach. Though we have more than 20 benchmarks, they all boil down to 3 basic scenarios (full handshake, resumed handshake, data transfer), and things seem to have worked well since we hooked up the benchmarks to the CI. Manually adjusting the noise thresholds as needed may require less work than developing and maintaining an adaptive noise setup.
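The manual alternative could be as simple as a table of per-benchmark overrides on top of the 0.20% default. This is a minimal sketch, not the actual CI configuration; `DEFAULT_THRESHOLD`, `threshold_for`, and the benchmark names are hypothetical.

```rust
use std::collections::HashMap;

/// Default threshold (0.20%) used for every benchmark without an override.
const DEFAULT_THRESHOLD: f64 = 0.002;

/// Returns the threshold for `benchmark`, falling back to the default.
fn threshold_for(overrides: &HashMap<&str, f64>, benchmark: &str) -> f64 {
    overrides.get(benchmark).copied().unwrap_or(DEFAULT_THRESHOLD)
}

fn main() {
    // Hypothetical benchmark name: a reviewer raised its threshold to 0.50%
    // after it became noisy.
    let mut overrides: HashMap<&str, f64> = HashMap::new();
    overrides.insert("handshake_resume_example", 0.005);

    println!("{}", threshold_for(&overrides, "handshake_resume_example"));
    println!("{}", threshold_for(&overrides, "transfer_example"));
}
```

Each override is a deliberate, reviewable change, which fits the observation that the benchmarks reduce to only three basic scenarios. The downside is that someone has to notice the noise and edit the table by hand.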

@djc @ctz @cpu since you do most of the PR reviewing and are in that sense users of the CI benchmarks, do you have an opinion on this?

@nnethercote @Kobzol feel free to comment if you have any additional insights to share
