
Add continuous benchmarks #4537

Open
1 of 4 tasks
MadVikingGod opened this issue Sep 20, 2023 · 14 comments
Labels: enhancement (New feature or request)

@MadVikingGod (Contributor) commented Sep 20, 2023

Problem Statement

I want to understand how the Go SDK's performance has changed over time.

Proposed Solution

Use the newly available dedicated environments in the OTel org to run benchmarks automatically.

Tasks to make this happen

  • Decide which 3-4 benchmarks are most important. [1]
  • Create a branch that GitHub Pages can be hosted from. [2]
  • (Optional) Run the benchmarks against historic versions to generate data.
  • Set up a workflow similar to other languages', like Java's.

[1]: Currently, there are approximately 300 benchmarks, each reporting 3 data points (ns/op, allocations/op, and B/op). The current tools generate a graph for each one, which is overwhelming.
[2]: This can be done via:

```
git checkout --orphan benchmarks
git commit --allow-empty -m "Init"
git push origin benchmarks
```
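For context on footnote [1], each Go benchmark reports the three data points mentioned there. A minimal sketch of collecting them programmatically with the standard `testing` package (the benchmark body below is a stand-in for illustration, not one of the SDK's actual benchmarks):

```go
package main

import (
	"fmt"
	"testing"
)

// benchAppend is a stand-in benchmark body; the real suite has ~300 of these.
func benchAppend(b *testing.B) {
	b.ReportAllocs()
	var s []int
	for i := 0; i < b.N; i++ {
		s = append(s[:0], 1, 2, 3)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	r := testing.Benchmark(benchAppend)
	// The same three data points the issue mentions: ns/op, B/op, allocs/op.
	fmt.Printf("ns/op=%d B/op=%d allocs/op=%d\n",
		r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```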

Other Notes

This can't be run against every PR. There isn't a way to run securely against a PR without leaving the runner open to exploitation. Instead, this should run against releases.

@MadVikingGod MadVikingGod added the enhancement New feature or request label Sep 20, 2023
@dmathieu (Member) commented:

See also #2791

We already run go test -bench=. in a GH Action on pushes to main.
https://github.com/open-telemetry/opentelemetry-go/blob/main/.github/workflows/benchmark.yml

However, that run is rather unreliable, as the runs may have noisy neighbors. Switching to a dedicated environment would definitely help.

@pellared pellared changed the title Add continious Benchmarks to the go repository. Add continuous benchmarks Sep 21, 2023
@XSAM (Member) commented Jun 19, 2024

This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploration. Instead, this should be run against releases.

@MadVikingGod, I missed the context here. Could you please elaborate on the security issue of running a benchmark CI against every PR? Thank you

@dmathieu (Member) commented:

Also note that pushing the benchmark results to GitHub Pages seems to fall outside the acceptable use for GitHub apps/extensions, since bots should only be reading data, not writing to the repository.

https://github.com/open-telemetry/community/blob/main/docs/using-github-extensions.md

@MadVikingGod (Contributor, Author) commented:

This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploration. Instead, this should be run against releases.

@MadVikingGod, I missed the context here. Could you please elaborate on the security issue of running a benchmark CI against every PR? Thank you

We accept PRs from everyone, and because those PRs can change what commands are run on the dedicated hardware, it is a security risk to run them on every PR. GitHub does not recommend using self-hosted runners with public repositories.

@XSAM (Member) commented Jun 25, 2024

@MadVikingGod, is it OK if I assign this to myself? I want to push this issue.

@XSAM XSAM self-assigned this Jun 25, 2024
@XSAM (Member) commented Jun 26, 2024

To make continuous benchmarks work, we at least need to solve the following issues:

  • Have a stable environment to run the benchmarks in, so we can make meaningful comparisons.
  • Remove alert-threshold: "400%" in benchmark.yml, as it effectively disables the alert for performance degradation.
  • Run against every commit pushed to the main branch (or every PR), so the historical performance data has finer granularity.
  • Publish visual historical data (probably on GitHub Pages), so we can easily see performance trends and find turning points. However, there are limits on the permissions a bot can have. https://github.com/open-telemetry/community/blob/main/docs/using-github-extensions.md#using-github-apps-extensions-and-bots

Proposal:

  • Enable a self-hosted runner to have a dedicated environment for running benchmarks and getting stable results.
  • Once we have stable results, we can remove alert-threshold: "400%".
  • Run against every commit pushed to the main branch instead of against every PR. The workflows on the main branch should be safe enough to run on the self-hosted runner.
  • We can let the benchmark-action action generate the HTML files, and then use our own command in the workflow to push them. With this setup, even if the benchmark-action action is compromised, it can only affect our GitHub Pages, not our main branch.
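The proposal above could be sketched as a workflow along these lines. This is a rough, untested sketch: the runner label, file names, branch name, and the final push step are assumptions, not the final configuration.

```yaml
on:
  push:
    branches: [main]  # main-branch commits only, never untrusted PRs

jobs:
  benchmark:
    runs-on: self-hosted  # dedicated environment for stable results
    steps:
      - uses: actions/checkout@v4
      - run: go test -run='^$' -bench=. -benchmem ./... | tee output.txt
      - uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: go
          output-file-path: output.txt
          auto-push: false  # generate HTML, but do not let the action push
      - name: Push generated results ourselves
        run: git push origin benchmarks  # schematic; assumes results were committed locally
```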

Other optional goals:

  • Highlight 3-4 important benchmarks at the front of the visual page, so we don't need to check hundreds of benchmark graphs every time. This may require contributing such a feature to the benchmark-action repo.
  • Add benchmark comparison with other projects, like comparing the metrics benchmark with Prometheus SDK. Related: Performance vs. Prometheus SDK #5542

@dmathieu (Member) commented:

Enable self-hosted runner to have a dedicated environment to run benchmarks and gain stable results.

There have been several discussions about this at the community level, aiming to set it up in an automated way where no single human holds the keys. No definitive solution has arisen yet (the closest would be actuated, but those aren't dedicated runners AFAIK).
Related: open-telemetry/community#1162

@XSAM (Member) commented Jun 26, 2024

IIUC, we have two types of GitHub runners available to use:

Related issue:


I can try actuated's machines first to see if they can produce stable benchmark results. If they don't, I can try self-hosted runners as a backup.

@yurishkuro (Member) commented:

The OTel collector uses this type of runner for testing

testing, but not benchmarks?

@XSAM (Member) commented Jun 26, 2024

The OTel collector uses this type of runner for testing

testing, but not benchmarks?

Yes, not benchmarks, only testing. https://github.com/open-telemetry/opentelemetry-collector/blob/227fb823dbf33e6a62b842aef5a70e10ed1db84f/.github/workflows/build-and-test-arm.yml#L27

IIUC, the collector CI is not running any benchmarks.

@XSAM (Member) commented Jul 17, 2024

Update: The actuated runner cannot run for forked repositories: https://github.com/XSAM/opentelemetry-rust/actions/runs/9968339009/job/27543447028 (the job will wait forever for a runner to pick it up).

@XSAM (Member) commented Jul 18, 2024

Tasks

@MrAlias (Contributor) commented Jul 18, 2024

@XSAM the resource limits I was thinking about in the SIG meeting were actually just Go flags: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/4301d25c8fbe97a8153eca91acc370b990011af2/.github/workflows/build-and-test-windows.yml#L53-L54

Not sure that will have an impact on the consistency of benchmarks. You can likely just disregard.

XSAM added a commit that referenced this issue Jul 23, 2024
Part of #4537

Changes:
- Run against every commit on the main branch.
- Use actuated runner `actuated-arm64-4cpu-4gb` for benchmark.
XSAM added a commit that referenced this issue Jul 30, 2024
Part of #4537

Here is the recent benchmark result from the CI:
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10067601185
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10068023613
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10072452529

Though the results are pretty stable, they cover only a very limited set of benchmarks, which cannot reflect the stability of running more benchmarks.

---

This PR enables all benchmarks in CI so we can obtain accurate stability
results. I also removed the `timeout` flag for the benchmark test, as
some benchmarks contain many sub-benchmarks, which would time out
anyway with a 60-second timeout. (Go does not offer a timeout for a
single benchmark or test.)
XSAM added a commit that referenced this issue Aug 1, 2024
…s commit (#5664)

Part of #4537

The benchmark action does not compare against the exact result from the previous
commit. For instance, a recent action
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10175392022
compares against 2a454a7, a commit
that was pushed a week ago.

This is because `actions/cache@v4` will not refresh the cache when the
key exists.

These are the logs from
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10175392022/job/28170968531
on the `Post Download previous benchmark data` step.

```
Cache hit occurred on the primary key Linux-benchmark, not saving cache.
```

Moreover, GitHub Actions only evicts cache entries that have not been
accessed in over 7 days, which in our case means never, as long as we
keep pushing new commits to the main branch.


https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy

---

To fix this, I use the current SHA as the cache key and then use the
previous SHA to fetch the cache, so our benchmark can compare against the
exact result from the previous commit.

Action result from my fork:
https://github.com/XSAM/opentelemetry-go/actions/runs/10189773887/job/28188416879
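The key scheme described above might look roughly like this in the workflow. This is a sketch only: `github.event.before` holds the previous commit's SHA on push events, and the cache path and key prefix are assumptions.

```yaml
- name: Download previous benchmark data
  uses: actions/cache@v4
  with:
    path: ./benchmark-data
    # The current SHA never exists as a key yet, so restore falls back to
    # the entry saved by the previous commit; the post step then saves
    # under the current SHA for the next run to find.
    key: ${{ runner.os }}-benchmark-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-benchmark-${{ github.event.before }}
```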
XSAM added a commit that referenced this issue Aug 9, 2024
Part of #4537

These are some benchmark results from the `actuated` runner. It is
pretty clear to me that this runner does not have a stable environment
to produce stable benchmark results:
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10278899889
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10275384692

Thus, I am switching to the self-hosted runner, which uses
bare metal, to run the benchmarks.

This is the request to use this type of runner.
open-telemetry/community#2266

The underlying machine is m3.small.x86. It has 8 cores and 64 GiB of RAM.
https://deploy.equinix.com/product/servers/m3-small/

---

This is an example of using this runner from otel java.
https://github.com/open-telemetry/opentelemetry-java/actions/runs/10277337397/job/28439208528

---

This runner cannot be triggered on a forked repo:
XSAM@faab7b9.
So it is quite safe if we only let it run on the main branch.
XSAM added a commit that referenced this issue Aug 9, 2024
Part of #4537,
replace #5671

I tried to use `save-always` to simplify it, but this flag never worked (actions/cache#1315).

So, I split the cache action into two steps, one for restore and one for save.
The save step always runs, even if an earlier step failed.

Some actions I run to prove it could work:
- On success:
https://github.com/XSAM/opentelemetry-go/actions/runs/10292883964/job/28488154161
- On failure:
https://github.com/XSAM/opentelemetry-go/actions/runs/10292907887/job/28488227777
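The always-run save step described above can be expressed with a step-level condition (a sketch; the path and key are placeholders):

```yaml
- name: Save benchmark data
  if: always()  # run even when an earlier step in the job failed
  uses: actions/cache/save@v4
  with:
    path: ./benchmark-data
    key: ${{ runner.os }}-benchmark-${{ github.sha }}
```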
@XSAM (Member) commented Aug 16, 2024

benchmark-action does not recognize the Go package name; thus, it considers benchmarks with the same name from different packages to be identical. This causes the benchmark comparison in the summary to be incorrect, as it may compare the wrong benchmarks.

For example, BenchmarkBool/AsBool from the https://github.com/open-telemetry/opentelemetry-go/actions/runs/10424262745 action shows up twice.

| Benchmark suite | Current: 2db4ef2 | Previous: f587241 | Ratio |
|---|---|---|---|
| BenchmarkBool/AsBool | 0.9276 ns/op, 0 B/op, 0 allocs/op | 1.001 ns/op, 0 B/op, 0 allocs/op | 0.93 |
| BenchmarkBool/AsBool | 2.087 ns/op, 0 B/op, 0 allocs/op | 1.001 ns/op, 0 B/op, 0 allocs/op | 2.08 |

It might be worth introducing a group field to the result of benchmark-action.
