
CI benchmarks: DX tweaks and next steps #1515

@aochagavia

Description

As mentioned in #1487, we now have a dedicated server to run the CI benchmarks. Before we start using it, we need to think about two points that have an impact on DX (developer experience).

  1. The benchmarks will run in an environment that is no longer ephemeral, which means we need to be extra careful when allowing benchmarks to run (because we are running arbitrary code from the internet);
  2. We can use custom software running on the server to coordinate the benchmark runs, giving us more freedom in how we trigger them and how we report results.

Here are some ideas on how this affects DX:

  1. Automatically triggering icount benchmarks: the automatic trigger has already been useful for identifying regressions (e.g. here), and it would be nice to keep it. We can still use a GitHub Actions job to trigger the icount bench run, as long as we make the job require maintainer approval for users who are not collaborators of the repository. Implementation-wise, the benchmarks would no longer run in the GitHub runner itself; instead, the runner would send a signed POST request to our server, telling it to enqueue a bench run. That could happen in two flavors:
    1. The server keeps the HTTP connection open until the benchmarks are done and sends the results in the response. This could take ~2 minutes in the best case, or longer if we have to wait for previously enqueued runs to complete (wasting GitHub Actions time and potentially causing contention if multiple runners are blocked waiting for the bench server). The main benefit is that we could keep reporting significant results the way we do now (i.e. as a CI failure plus a summary). Having a CI failure is nice, because it is practically impossible to miss. Waiting for a response from the server would also surface any unexpected server errors.
    2. The server immediately returns an empty response (i.e. the GitHub Actions job is always successful, unless there are connectivity issues). Later, when the results become available, they are posted as a comment to the PR (in a way similar to the test coverage comment, which gets updated automatically after each push). This is more friendly towards the GitHub Actions runners, but we lose the ability to let the CI fail. There is also a chance that internal errors in the bench runner go unnoticed (e.g. a crash that kills the current bench run and prevents the results comment from ever being posted to GitHub).
  2. Manually triggering time-based benchmarks: these benchmarks are not necessary for most PRs, but they could be manually triggered by posting a comment to the PR addressed to a bot (something like @rustls-bot bench). The results would be posted back by the bot as a comment, similar to how this is done for the development of the Rust compiler.
  3. Automatically triggering icount + walltime benchmark runs for each push to main: this allows us to keep track of results over time, both for visualization purposes and for calculating the appropriate noise threshold per benchmark. We should definitely keep the results around locally, and it might make sense to push them to something like bencher.dev to get fancy graphs.
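To make idea 1 concrete, here is a minimal sketch of how the Actions runner could sign the enqueue request and how the server could verify it. Everything here is an assumption for illustration: the secret name, the payload fields, and the header the signature would travel in are not specified in this issue.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret between the GitHub Actions workflow and the
# bench server (it would live in the repository's Actions secrets).
BENCH_SECRET = b"example-secret"

def sign_enqueue_request(payload: dict) -> tuple[bytes, str]:
    """Serialize the payload and compute an HMAC-SHA256 signature over it.

    The runner sends the raw body plus the signature (e.g. in a header
    like X-Bench-Signature; the name is an assumption). Because the bench
    machine is no longer ephemeral, only requests the server can verify
    should be allowed to enqueue runs.
    """
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(BENCH_SECRET, body, hashlib.sha256).hexdigest()
    return body, signature

def verify(body: bytes, signature: str) -> bool:
    """Server side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(BENCH_SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Example payload (field names are made up for illustration).
body, signature = sign_enqueue_request(
    {"pr": 1515, "commit": "deadbeef", "kind": "icount"}
)
```

A tampered body fails verification, so a stray request from the internet cannot start a run on the server.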
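For the per-benchmark noise thresholds mentioned in idea 3, one simple approach (a sketch of a possible heuristic, not something this issue prescribes) is to derive the threshold from the spread of historical results on main:

```python
import statistics

def noise_threshold(historical_counts: list[float], sigmas: float = 3.0) -> float:
    """Derive a relative noise threshold from historical results on main.

    A change is only considered significant if it moves the metric by
    more than `sigmas` standard deviations of the historical runs,
    expressed as a fraction of the historical mean.
    """
    mean = statistics.mean(historical_counts)
    stdev = statistics.stdev(historical_counts)
    return sigmas * stdev / mean

def is_regression(baseline: float, candidate: float, threshold: float) -> bool:
    """Flag the candidate only if it exceeds the baseline by more than the threshold."""
    return (candidate - baseline) / baseline > threshold

# Example: instruction counts from recent runs on main (made-up numbers).
history = [1000.0, 1004.0, 998.0, 1002.0, 996.0]
threshold = noise_threshold(history)
```

With stable icount benchmarks the historical spread is tiny, so the threshold can be tight; noisier walltime benchmarks would automatically get a wider one.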

After this wall of text, here are two questions:

  1. For the icount benchmarks, my gut feeling is that we should let the GitHub Actions job finish early. The lack of CI failures is IMO compensated for by the PR comment with the bench results. Do you agree? Regarding the possibility of server failures going unnoticed, I think we can make the server robust enough that the only point of failure is GitHub being unavailable when we try to post the results (in case of an error, we would post that as a comment too).
  2. Do you have any other remarks or suggestions?
