
Add continuous benchmarks #4537

Open
1 of 4 tasks
MadVikingGod opened this issue Sep 20, 2023 · 14 comments
Labels: enhancement (New feature or request)

@MadVikingGod (Contributor) commented Sep 20, 2023

Problem Statement

I want to understand how the Go SDK's performance has changed over time.

Proposed Solution

Use the newly available dedicated environments in the OTel org to run benchmarks automatically.

Tasks to make this happen

  • Decide which 3-4 benchmarks are most important. [1]
  • Create a branch that GitHub Pages can be hosted from. [2]
  • (Optional) Run the benchmarks against historic versions to generate data.
  • Set up a workflow similar to other languages', like Java's.

[1]: Currently, there are approximately 300 benchmarks, each reporting 3 data points (ns/op, allocations/op, and B/op). The current tools generate a graph for each one, which is overwhelming.
[2]: This can be done via:

```
git checkout --orphan benchmarks
git commit --allow-empty -m "Init"
git push origin benchmarks
```
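For context on footnote [1], each Go benchmark reports the three data points mentioned there. A minimal sketch of collecting them programmatically with the standard `testing` package (the benchmark body below is a stand-in for illustration, not one of the SDK's actual benchmarks):

```go
package main

import (
	"fmt"
	"testing"
)

// benchAppend is a stand-in benchmark body; the real suite has ~300 of these.
func benchAppend(b *testing.B) {
	b.ReportAllocs()
	var s []int
	for i := 0; i < b.N; i++ {
		s = append(s[:0], 1, 2, 3)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	r := testing.Benchmark(benchAppend)
	// The same three data points the issue mentions: ns/op, B/op, allocs/op.
	fmt.Printf("ns/op=%d B/op=%d allocs/op=%d\n",
		r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```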

Other Notes

This can't be run against every PR. There isn't a way to run securely against a PR without leaving the runner open to exploitation. Instead, this should run against releases.

@MadVikingGod MadVikingGod added the enhancement New feature or request label Sep 20, 2023
@dmathieu (Member) commented:

See also #2791

We already run go test -bench=. in a GH Action on pushes to main.
https://github.com/open-telemetry/opentelemetry-go/blob/main/.github/workflows/benchmark.yml

However, that run is rather unreliable, as the runs may have noisy neighbors. Switching to a dedicated environment would definitely help.

@pellared pellared changed the title Add continious Benchmarks to the go repository. Add continuous benchmarks Sep 21, 2023
@XSAM (Member) commented Jun 19, 2024

This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploration. Instead, this should be run against releases.

@MadVikingGod, I missed the context here. Could you please elaborate on the security issue of running a benchmark CI against every PR? Thank you

@dmathieu (Member) commented:

Also note that pushing the benchmark results to GitHub Pages seems to fall outside the acceptable use for GitHub apps/extensions, since bots should only be reading data, not writing to the repository.

https://github.com/open-telemetry/community/blob/main/docs/using-github-extensions.md

@MadVikingGod (Contributor, Author) commented:

This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploration. Instead, this should be run against releases.

@MadVikingGod, I missed the context here. Could you please elaborate on the security issue of running a benchmark CI against every PR? Thank you

We accept PRs from everyone, and because those PRs can change what commands are run on the dedicated hardware, it is a security risk to run them on every PR. GitHub does not recommend using self-hosted runners with public repositories.

@XSAM (Member) commented Jun 25, 2024

@MadVikingGod, is it OK if I assign this to myself? I want to push this issue.

@XSAM XSAM self-assigned this Jun 25, 2024
@XSAM (Member) commented Jun 26, 2024

To make continuous benchmarks work, we at least need to solve the following issues:

  • Have a stable environment to run the benchmarks in, so we can make meaningful comparisons.
  • Remove alert-threshold: "400%" in benchmark.yml, as it effectively disables the alert for performance degradation.
  • Run against every commit pushed to the main branch (or every PR), so the historical performance data has finer granularity.
  • Publish visual historical data (probably on GitHub Pages), so we can easily see performance trends and find turning points. However, there are limits on the permissions a bot can have. https://github.com/open-telemetry/community/blob/main/docs/using-github-extensions.md#using-github-apps-extensions-and-bots

Proposal:

  • Enable a self-hosted runner to have a dedicated environment for running benchmarks and getting stable results.
  • Once we have stable results, we can remove alert-threshold: "400%".
  • Run against every commit pushed to the main branch instead of against every PR. The workflows on the main branch should be safe enough to run on the self-hosted runner.
  • We can let the benchmark-action action generate the HTML files, and then use our own command in the workflow to push them. With this setup, even if the benchmark-action action is compromised, it can only affect our GitHub Pages, not our main branch.
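The proposal above could be sketched as a workflow along these lines. This is a rough, untested sketch: the runner label, file names, branch name, and the final push step are assumptions, not the final configuration.

```yaml
on:
  push:
    branches: [main]  # main-branch commits only, never untrusted PRs

jobs:
  benchmark:
    runs-on: self-hosted  # dedicated environment for stable results
    steps:
      - uses: actions/checkout@v4
      - run: go test -run='^$' -bench=. -benchmem ./... | tee output.txt
      - uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: go
          output-file-path: output.txt
          auto-push: false  # generate HTML, but do not let the action push
      - name: Push generated results ourselves
        run: git push origin benchmarks  # schematic; assumes results were committed locally
```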

Other optional goals:

  • Highlight 3-4 important benchmarks at the front of the visual page, so we don't need to check hundreds of benchmark graphs every time. This may require contributing such a feature to the benchmark-action repo.
  • Add benchmark comparison with other projects, like comparing the metrics benchmark with Prometheus SDK. Related: Performance vs. Prometheus SDK #5542

@dmathieu (Member) commented:

Enable self-hosted runner to have a dedicated environment to run benchmarks and gain stable results.

There have been several discussions about this at the community level, aiming to set it up in an automated way where no single human holds the keys. No definitive solution has arisen yet (the closest would be actuated, but those aren't dedicated runners AFAIK).
Related: open-telemetry/community#1162

@XSAM (Member) commented Jun 26, 2024

IIUC, we have two types of GitHub runners available to use:

Related issue:


I can try actuated's machines first to see if they can produce stable benchmark results. If they don't, I can try self-hosted runners as a backup.

@yurishkuro (Member) commented:

The OTel collector uses this type of runner for testing

testing, but not benchmarks?

@XSAM (Member) commented Jun 26, 2024

The OTel collector uses this type of runner for testing

testing, but not benchmarks?

Yes, not benchmarks, only testing. https://github.com/open-telemetry/opentelemetry-collector/blob/227fb823dbf33e6a62b842aef5a70e10ed1db84f/.github/workflows/build-and-test-arm.yml#L27

IIUC, the collector CI is not running any benchmarks.

@XSAM (Member) commented Jul 17, 2024

Update: The actuated runner cannot run for forked repositories: https://github.com/XSAM/opentelemetry-rust/actions/runs/9968339009/job/27543447028 (the job will wait forever for a runner to pick it up).

@XSAM (Member) commented Jul 18, 2024

Tasks

@MrAlias (Contributor) commented Jul 18, 2024

@XSAM the resource limits I was thinking about in the SIG meeting were actually just Go flags: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/4301d25c8fbe97a8153eca91acc370b990011af2/.github/workflows/build-and-test-windows.yml#L53-L54

Not sure that will have an impact on the consistency of benchmarks. You can likely just disregard.

XSAM added a commit that referenced this issue Jul 23, 2024
Part of #4537

Changes:
- Run against every commit on the main branch.
- Use actuated runner `actuated-arm64-4cpu-4gb` for benchmark.
XSAM added a commit that referenced this issue Jul 30, 2024
Part of #4537

Here is the recent benchmark result from the CI:
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10067601185
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10068023613
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10072452529

Though the results are pretty stable, they cover only a very limited set of benchmarks, which cannot reflect the stability of running more benchmarks.

---

This PR enables all benchmarks in CI so we can obtain accurate stability
results. I also removed the `timeout` flag for the benchmark test, as
some benchmarks contain many sub-benchmarks, which would time out
anyway with a 60-second timeout. (Go does not offer a timeout for a
single benchmark or test.)
XSAM added a commit that referenced this issue Aug 1, 2024
…s commit (#5664)

Part of #4537

The benchmark action does not compare against the exact result from the previous
commit. For instance, a recent action
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10175392022
compares against 2a454a7, a commit
that was pushed a week ago.

This is because `actions/cache@v4` will not refresh the cache when the
key exists.

These are the logs from
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10175392022/job/28170968531
on the `Post Download previous benchmark data` step.

```
Cache hit occurred on the primary key Linux-benchmark, not saving cache.
```

Moreover, GitHub Actions only evicts cache entries that have not been
accessed in over 7 days, which in our case means never, as long as we
keep pushing new commits to the main branch.


https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy

---

To fix this, I use the current SHA as the cache key and then use the
previous SHA to fetch the cache, so our benchmark can compare against the
exact result from the previous commit.

Action result from my fork:
https://github.com/XSAM/opentelemetry-go/actions/runs/10189773887/job/28188416879
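The key scheme described above might look roughly like this in the workflow. This is a sketch only: `github.event.before` holds the previous commit's SHA on push events, and the cache path and key prefix are assumptions.

```yaml
- name: Download previous benchmark data
  uses: actions/cache@v4
  with:
    path: ./benchmark-data
    # The current SHA never exists as a key yet, so restore falls back to
    # the entry saved by the previous commit; the post step then saves
    # under the current SHA for the next run to find.
    key: ${{ runner.os }}-benchmark-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-benchmark-${{ github.event.before }}
```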
XSAM added a commit that referenced this issue Aug 9, 2024
Part of #4537

These are some benchmark results from the `actuated` runner. It is
pretty clear to me that this runner does not have a stable environment
to produce stable benchmark results:
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10278899889
-
https://github.com/open-telemetry/opentelemetry-go/actions/runs/10275384692

Thus, I am switching to the self-hosted runner, which uses
bare metal, to run the benchmarks.

This is the request to use this type of runner.
open-telemetry/community#2266

The underlying machine is m3.small.x86. It has 8 cores and 64 GiB of RAM.
https://deploy.equinix.com/product/servers/m3-small/

---

This is an example of using this runner from otel java.
https://github.com/open-telemetry/opentelemetry-java/actions/runs/10277337397/job/28439208528

---

This runner cannot be triggered on a forked repo:
XSAM@faab7b9.
So it is quite safe if we only let it run on the main branch.
XSAM added a commit that referenced this issue Aug 9, 2024
Part of #4537,
replace #5671

I tried to use `save-always` to simplify it, but this flag never worked (actions/cache#1315).

So, I split the cache action into two steps, one for restore and one for save.
The save step always runs, even if an earlier step failed.

Some actions I run to prove it could work:
- On success:
https://github.com/XSAM/opentelemetry-go/actions/runs/10292883964/job/28488154161
- On failure:
https://github.com/XSAM/opentelemetry-go/actions/runs/10292907887/job/28488227777
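The always-run save step described above can be expressed with a step-level condition (a sketch; the path and key are placeholders):

```yaml
- name: Save benchmark data
  if: always()  # run even when an earlier step in the job failed
  uses: actions/cache/save@v4
  with:
    path: ./benchmark-data
    key: ${{ runner.os }}-benchmark-${{ github.sha }}
```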
@XSAM (Member) commented Aug 16, 2024

benchmark-action does not recognize the Go package name; thus, it considers benchmarks with the same name from different packages to be identical. This causes the benchmark comparison in the summary to be incorrect, as it may compare the wrong benchmarks.

For example, BenchmarkBool/AsBool from the https://github.com/open-telemetry/opentelemetry-go/actions/runs/10424262745 action shows up twice.

| Benchmark suite | Current: 2db4ef2 | Previous: f587241 | Ratio |
|---|---|---|---|
| BenchmarkBool/AsBool | 0.9276 ns/op, 0 B/op, 0 allocs/op | 1.001 ns/op, 0 B/op, 0 allocs/op | 0.93 |
| BenchmarkBool/AsBool | 2.087 ns/op, 0 B/op, 0 allocs/op | 1.001 ns/op, 0 B/op, 0 allocs/op | 2.08 |

It might be worth introducing a group field to the result of benchmark-action.
