Add PR benchmarking #750

Merged: 1 commit merged into etcd-io:main on Jun 28, 2024

Conversation

@ivanvc (Member) commented May 7, 2024

This PR introduces benchmarking for pull requests, comparing the results against the PR's base. It currently displays the results in the job summary. If the performance difference is greater than 5%, the job fails.

Supersedes #691.
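For illustration, a gate like that can be a small post-processing step over the benchstat comparison. The following is a minimal sketch, not the actual scripts/compare_benchmarks.sh from this PR; the benchstat.txt file name and the reliance on the geomean row are assumptions:

```sh
#!/usr/bin/env bash
# Hypothetical sketch: fail the job when the geomean performance delta
# reported by benchstat exceeds 5%.
set -euo pipefail

THRESHOLD=5.0

# Assumes benchstat's comparison was saved to benchstat.txt and that the
# delta (e.g. "-9.43%") is the last field of the "geomean" row.
delta=$(awk '/^geomean/ {print $NF}' benchstat.txt | tr -d '%+-')

if awk -v d="$delta" -v t="$THRESHOLD" 'BEGIN {exit !(d > t)}'; then
  echo "Performance difference ${delta}% exceeds ${THRESHOLD}% threshold" >&2
  exit 1
fi
```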

Review threads on .github/workflows/compare-benchmarks.yaml and Makefile (outdated, resolved)
@ahrtr mentioned this pull request on May 8, 2024
@ahrtr (Member) commented May 8, 2024

Please also consider documenting/explaining the results in the markdown table.

Is it possible to mark the workflow check as failed if the performance diff exceeds a threshold, e.g., 5%?

@ivanvc (Member, author) commented May 8, 2024

Hi @ahrtr, sorry, I forgot to mention that this is in a PoC state; I was just looking to get the desired output. I'll clean it up once we decide which direction to follow.

@ivanvc force-pushed the add-pr-benchmark branch 2 times, most recently from 7f35340 to ad4a6ae on May 17, 2024 03:41
@ivanvc (Member, author) commented May 17, 2024

I've cleaned up the code and squashed the commits. It's very similar to the original implementation from #691 but adds some configuration and a timeout so the job won't fail.

However, what doesn't convince me is that, under the same conditions, the op/sec appears highly variable. For example, in the latest run (https://github.com/etcd-io/bbolt/actions/runs/9122610992), the difference is 4.51% without any changes to the code.

Is it possible to mark the workflow check as failed if the performance diff exceeds a threshold, e.g., 5%?

I could scan the results, check them against a threshold, and make the check fail. However, see my previous paragraph: I don't know whether this benchmark (or at least the op/sec) is accurate (or representative).

I've noticed locally that more runs produce more accurate results. However, the Go benchmarks in this repository are slow: running with a count of 5 (what benchstat suggests for 95% confidence) takes about 45 minutes per run (and it has to run twice, once each for the PR's base and head).

Would you say this is good enough for a first approach to benchmarking, @ahrtr? Document the markdown table, add a threshold, and get it going?
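For context, the comparison described here boils down to roughly the following sketch (branch names, package selection, and output file names are illustrative):

```sh
# Illustrative only: benchmark the PR base and head 5 times each,
# then let benchstat compute the statistical summary.
git checkout main
go test -run '^$' -bench . -count 5 ./... > base.txt

git checkout add-pr-benchmark
go test -run '^$' -bench . -count 5 ./... > head.txt

# Reports per-benchmark deltas with p-values; "~" means no
# statistically significant difference.
benchstat base.txt head.txt
```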

@ahrtr (Member) commented May 17, 2024

Would you say this is good enough for a first approach to benchmarking, @ahrtr? Document the markdown table, add a threshold, and get it going?

Thanks for the work. Overall, it looks good to me.

We can continue to enhance it, or revert the change if we see any big problems in the future.

@ivanvc (Member, author) commented May 23, 2024

@ahrtr, do you have any suggestions on where to document the table? I was about to add it as part of the output, but I'm not sure whether you think it would be better to keep it just in the code.

@ahrtr (Member) commented May 23, 2024

I was about to add it as part of the output

It works for me.

@ivanvc force-pushed the add-pr-benchmark branch 3 times, most recently from fe29042 to 1250765 on May 24, 2024 20:23
@ivanvc changed the title from "[do not merge] Add pr benchmark" to "Add PR benchmarking" on May 24, 2024
@ivanvc marked this pull request as ready for review on May 24, 2024 21:28
@ivanvc (Member, author) commented May 24, 2024

@ahrtr, could you PTAL at this? I've set it as ready for review. I worked on what we agreed on, and the results for this PR are here: https://github.com/etcd-io/bbolt/actions/runs/9229526551?pr=750

@ivanvc requested a review from ahrtr on May 24, 2024 21:29
Review threads on Makefile, scripts/compare_benchmarks.sh, and .github/workflows/benchmark-pr.yaml (outdated, resolved)
@tjungblu (Contributor) commented:

That's neat! Can we get it to post the resulting MD table as a comment as well?

Review thread on Makefile (resolved)
@ivanvc (Member, author) commented May 28, 2024

That's neat! Can we get it to post the resulting MD table as a comment as well?

@tjungblu, I was researching a GitHub action that can comment (and update the original comment on new runs). I think it would be even better if we added a prow command (or a GitHub label) and only ran these benchmarks when the label is present... However, I want to get this PR merged first, and then add the next round of improvements :)

@ivanvc (Member, author) commented Jun 1, 2024

Please hold off on reviewing until I address the build failure.

@ivanvc (Member, author) commented Jun 5, 2024

Yes, using the existing bench is the simplest solution. In that case, can you still use benchstat to compute the statistical summaries?

No, unless we change our bench output format to match go test -bench's.

Refer to the cmd/benchstat documentation:

Each input file should be in the Go benchmark format (https://golang.org/design/14313-benchmark-format), such as the output of “go test -bench .”.
...
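For illustration, input in that format looks roughly like this (benchmark names and numbers below are made up):

```
BenchmarkWrite-8    1000000    726.8 ns/op
BenchmarkRead-8    10000000      8.1 ns/op
```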

@ahrtr (Member) commented Jun 6, 2024

Sorry if this is repetitive given all the back and forth. But IMHO, what would make sense is:

  1. Close this PR.
  2. Define a set of running parameters for our internal bench command.
  3. Open a new PR with behavior similar to this one, using the parameters from 2.

We could even do 2 and 3 iteratively, so we can start measuring and making progress sooner rather than later.

I am OK with the approach for now.

You'd need to do the statistical summaries yourself; pebble/ycsb.go might be a good reference (I haven't had time to dig into it yet; thanks @tjungblu for providing the reference).

@tjungblu (Contributor) commented Jun 6, 2024

I contributed an etcd driver in https://github.com/pingcap/go-ycsb/tree/master/db, and there's one for boltdb, in case you aren't totally set on reusing the bench command.

Otherwise, I think we could also wrap the bench command in a testing.B loop with little refactoring. Conversely, testing.BenchmarkResult could be used to aggregate the results, so we would get the correct output format for free.

@tjungblu (Contributor) commented Jun 6, 2024

I also quickly cobbled together the YCSB driver in pingcap/go-ycsb#300; it's really interesting to see the perf differences between boltdb and bbolt :)

It might be another alternative to the bench cmd, because the workloads are fairly well defined. We would need to check out their repo, though, update the go.mod to point to a given commit hash/branch, and rebuild and run from there. Comparison is also more manual with it...
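A rough sketch of what that setup might look like (the replace target, placeholder commit, and workload file are assumptions):

```sh
# Illustrative only: build go-ycsb against a specific bbolt revision.
git clone https://github.com/pingcap/go-ycsb.git
cd go-ycsb

# Point the bbolt dependency at the commit/branch under test.
go mod edit -replace go.etcd.io/bbolt=go.etcd.io/bbolt@<commit-sha>
go mod tidy

# Rebuild, then run a workload against the boltdb backend.
make
./bin/go-ycsb run boltdb -P workloads/workloada
```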

@ivanvc (Member, author) commented Jun 6, 2024

I like that YCSB has a well-defined set of benchmarks to run. However, the reasons I would prefer to use our bench command are:

  1. With the gobench output option added in #765, it's trivial to add benchstat and benefit from it to generate the statistical summary. As pointed out, with go-ycsb we'd need to build that manually.
  2. The setup to make go-ycsb work with a specific commit sounds more intricate.

But if you feel that's a better direction, we can pursue that path :)

@tjungblu (Contributor) commented Jun 7, 2024

I don't mind, tbh; in the end, we should choose something representative of etcd and k8s as a benchmark profile.
I just noticed that YCSB doesn't really have a delete component; that's important for us because of the events/leases and the fragmentation we should account for.

@ivanvc (Member, author) commented Jun 7, 2024

Just noticed that YCSB doesn't really have a delete component.

I think you have also just exposed that we don't benchmark deletes 😅.

@ahrtr (Member) commented Jun 8, 2024

I lean toward using our own bench:

  • We have more flexibility to change and tune the test.
  • I had a quick look at pingcap/go-ycsb#300; each bbolt TXN has only one write (put) operation. Our bench supports multiple write operations in each transaction.

@ivanvc (Member, author) commented Jun 15, 2024

I did a PoC with our bench, using the first benchmark identified by #720 (params: -profile-mode n -count 100000 -batch-size 25000). It looks more promising than using go test -bench.

Running 10 iterations with no changes to the source code, the results are:

        │    BASE     │                HEAD                │
        │   sec/op    │   sec/op     vs base               │
Write     726.8n ± 3%   720.1n ± 4%       ~ (p=0.315 n=10)
Read      8.134n ± 3%   8.110n ± 0%       ~ (p=0.280 n=10)
geomean   76.89n        76.42n       -0.61%

Running 10 iterations against the commit before merging #741:

        │    BASE     │                HEAD                 │
        │   sec/op    │   sec/op     vs base                │
Write     865.2n ± 5%   713.2n ± 3%  -17.56% (p=0.000 n=10)
Read      8.726n ± 5%   8.682n ± 4%        ~ (p=0.436 n=10)
geomean   86.89n        78.69n        -9.43%
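For reference, a sketch of how such a comparison can be driven end to end. It assumes the gobench output option from #765 (the -gobench-output flag name is an assumption), that results go to stdout, and that bbolt-base/bbolt-head are binaries built from the respective commits:

```sh
# Illustrative only: run the built-in bench 10 times on binaries built
# from the base and head commits, then compare with benchstat.
for i in $(seq 1 10); do
  ./bin/bbolt-base bench -profile-mode n -count 100000 -batch-size 25000 \
    -gobench-output >> base.txt
  ./bin/bbolt-head bench -profile-mode n -count 100000 -batch-size 25000 \
    -gobench-output >> head.txt
done

benchstat base.txt head.txt
```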

@ahrtr (Member) commented Jun 24, 2024

Looks good. Regarding the parameters, please refer to #739 (comment). For example,

  • options.BatchSize: 10,000 (10K)
  • options.Iterations: 2,000,000 (2M)

@ivanvc (Member, author) commented Jun 26, 2024

Looks good. Regarding the parameters, please refer to #739 (comment). For example,

  • options.BatchSize: 10,000 (10K)
  • options.Iterations: 2,000,000 (2M)

@ahrtr, I updated the pull request with these changes and to use our cmd/bbolt bench. This is the job summary for this PR: https://github.com/etcd-io/bbolt/actions/runs/9687730093?pr=750
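With those values, each benchmark run boils down to roughly the following (a sketch; only -count and -batch-size come from the discussion above, and other flags or defaults may differ):

```sh
# Illustrative invocation of the built-in benchmark with the agreed
# parameters: 2M iterations written in batches of 10K.
go run ./cmd/bbolt bench -count 2000000 -batch-size 10000
```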

Review threads on scripts/compare_benchmarks.sh and .github/workflows/benchmark-pr.yaml (resolved)
This adds benchmarking using cmd/bbolt's bench, inspired by what is
used in kube-state-metrics.

Co-authored-by: Manuel Rüger <manuel@rueg.eu>
Signed-off-by: Ivan Valdes <ivan@vald.es>

wip

Signed-off-by: Ivan Valdes <ivan@vald.es>
@ahrtr (Member) left a comment

Great work!

Thank you! @ivanvc

cc @fuweid @tjungblu to take a look

@tjungblu (Contributor) commented:

/lgtm
As well, thanks @ivanvc. Let's see what we can find with it in the other PRs :)

@ahrtr merged commit cc78a5a into etcd-io:main on Jun 28, 2024
17 checks passed
@ivanvc deleted the add-pr-benchmark branch on June 28, 2024 15:51
@fuweid (Member) commented Jul 2, 2024

LGTM

@ivanvc Thank you. It looks good to me.

        │    BASE     │                HEAD                │
        │   sec/op    │   sec/op     vs base               │
Write     1.133µ ± 3%   1.129µ ± 3%       ~ (p=0.540 n=10)
Read      16.68n ± 1%   16.73n ± 1%       ~ (p=0.402 n=10)
geomean   137.5n        137.4n       -0.01%
