Abbreviate benchmark results posted to pull requests #6124
Conversation
We are accumulating more and more benchmarks. Posting all of the results to the pull request can be a daunting amount of raw data. Instead, we should categorize and sort them and just provide abbreviated tables, leaving the full tables somewhere else. This commit adjusts the pull request comment script with the above improvements. The full benchmark tables are posted as GitHub Gists, which is still on GitHub infrastructure and provides nice Markdown rendering and such.
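The categorize-and-sort step described above could be sketched roughly like this. This is a minimal Python sketch, not the actual comment script: the function name, field names, and threshold are all assumptions for illustration.

```python
# Hypothetical sketch of the categorization logic; the real script's
# names and thresholds may differ.

def categorize_results(results, threshold=0.05):
    """Split benchmark results into regressed/improved/similar buckets.

    `results` is a list of dicts with (assumed) keys "name", "mean", and
    "base_mean". A result whose mean latency moved by more than
    `threshold` relative to the base is flagged.
    """
    regressed, improved, similar = [], [], []
    for r in results:
        change = (r["mean"] - r["base_mean"]) / r["base_mean"]
        if change > threshold:
            regressed.append((r, change))
        elif change < -threshold:
            improved.append((r, change))
        else:
            similar.append((r, change))
    # Sort each bucket by magnitude of change so the worst regressions
    # and best improvements appear first in the abbreviated tables.
    regressed.sort(key=lambda rc: -rc[1])
    improved.sort(key=lambda rc: rc[1])
    return regressed, improved, similar
```

The abbreviated tables would then render only the top few entries of each bucket, with the full, uncategorized tables going to the Gist.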
Abbreviated Benchmark Summary
@ commit 08438eb9ae9993195460e95e3fd8992f62a872cf (vs. base ea65a2e14accc3af40470499c2d13250ce71474f)
Regressed Benchmarks 🚩
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV3Small [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 95 (vs. 81, 17.28%↑) | 98 | 12 |
MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 374 (vs. 342, 9.36%↑) | 373 | 2 |
MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 239 (vs. 224, 6.70%↑) | 236 | 27 |
Improved Benchmarks 🎉
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 30 (vs. 34, 11.76%↓) | 30 | 1 |
For more information:
Well, I'm not expecting to see regressed/improved benchmarks for this change. Looks like there are stability issues with the S20, which is at my desk without a cooling pad. It probably got rebooted due to overheating after pinning down the GPU frequency. We will have a rooted phone in the lab soon and hopefully it's more stable there. But at least the above summary shows what I'd like to see. :)
Abbreviated Benchmark Summary
@ commit e845163a2331c939eeca8dc858441012d53d5e9b (vs. base ea65a2e14accc3af40470499c2d13250ce71474f)
Regressed Benchmarks 🚩
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 375 (vs. 342, 9.65%↑) | 375 | 2 |
MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 236 (vs. 224, 5.36%↑) | 234 | 18 |
Improved Benchmarks 🎉
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 29 (vs. 34, 14.71%↓) | 29 | 1 |
MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 413 (vs. 461, 10.41%↓) | 408 | 48 |
For more information:
Abbreviated Benchmark Summary
@ commit e58b79faa5a354cbd215128a5646a70a55a7a625 (vs. base 0dbd1ac8730c5c73a0ef74c888badddca1d2dde5)
Improved Benchmarks 🎉
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV3Small [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 23 (vs. 28, 17.86%↓) | 23 | 0 |
MobileNetV2 [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 58 (vs. 66, 12.12%↓) | 59 | 12 |
MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 429 (vs. 458, 6.33%↓) | 424 | 40 |
[Top 3 out of 4 benchmark results shown]
For more information:
why is the bot added to reviewers? that seems weird and will mess up dashboards for tracking things like whether a PR is blocked on a review (as the bot will always show as blocking). could we replace existing comments instead of always appending? once you've pushed changes the prior results are just noise (and they can still be viewed via the history if they were replaced) - this is much better than the full dump seen in #6143 but still feels like spam, as it drowns out any human communication that may be happening on the PR. I'd not want to run benchmarks until right before submitting, as I wouldn't want the trail of a dozen benchmark results forever in the PR history :(
(randomly googling, https://github.com/peter-evans/find-comment seems to be useful for getting the existing comment id to use with an update action like https://github.com/marketplace/actions/create-or-update-comment - don't know how it's getting posted today though :)
I believe it's automatically added when I call the GitHub API to create a review using the bot's account. (The behavior isn't specified in the API doc though.) The bot wasn't really added as a "reviewer": you won't find it in the "3 pending reviewers" at the end, and you can request a review from it again in the top right reviewers section (right now it does not have a tick ahead of it). I'm afraid anybody who left a comment on the pull request will show up in the top right corner? I'm not super familiar with this part of GitHub. It's annoying, but I'm not sure how we can avoid that entirely. I can investigate deeper later.
Sure thing! We can always look into improving the flow! This commit is about categorizing and sorting multiple tables. I'd prefer to land it if it's a good improvement on its own. I can have targeted follow-up pull requests for those features. Promise. :)
Right now this is also doable. You just need to add the
Those are beasts, each with 6000+ LOC. :D They for sure support more features than we need. I think we can do simpler logic here: just list all reviews and filter with the bot's account and the "Abbreviated Benchmark Summary" keyword to figure out which one to update (or create a new one). We need to do everything in Buildkite though. But it's just a matter of figuring out the GitHub APIs.
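The filter-and-find logic proposed here could look roughly like the following. This is a hypothetical sketch: the comment shape mirrors GitHub's issue-comment JSON (each comment has a `user` login, an `id`, and a `body`), and the bot account name and keyword are taken from this discussion; everything else is assumed.

```python
# Sketch of finding the bot's previous summary comment to update.
BOT_LOGIN = "iree-github-actions-bot"
KEYWORD = "Abbreviated Benchmark Summary"

def find_comment_to_update(comments):
    """Return the id of the latest matching bot comment, or None.

    `comments` is assumed to be the JSON list returned by
    GET /repos/{owner}/{repo}/issues/{number}/comments, oldest first.
    """
    for comment in reversed(comments):
        if (comment["user"]["login"] == BOT_LOGIN
                and KEYWORD in comment["body"]):
            return comment["id"]
    # No previous summary: the caller should create a new comment instead.
    return None
```

The caller would then PATCH the found comment's body with the new summary, or POST a new comment when `None` is returned.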
I've never seen bots show up in the reviewers list on GitHub before, so I think it's something on our side configured weirdly.
It is possible to add comments without leaving a "review". The https://docs.github.com/en/rest/reference/pulls#create-a-review-comment-for-a-pull-request
Issue numbers and PR numbers are roughly interchangeable, so a regular comment would POST to the issues API instead. See also the APIs for updating comments on reviews:
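To make the endpoint distinction concrete, here is a small sketch of the URL construction. The routes follow GitHub's documented REST API (plain comments go through the issues routes, which accept a PR number, and updates address the comment id); the helper function itself and its parameters are illustrative, not part of any existing script.

```python
# Sketch of building the create/update URLs for a plain (non-review)
# PR comment via the issues API.
API = "https://api.github.com"

def issue_comment_urls(owner, repo, pr_number, comment_id=None):
    """Return (create_url, update_url) for a plain PR comment.

    Creating uses the PR number (PRs are issues); updating uses the
    comment id, so `update_url` is None until a comment exists.
    """
    create = f"{API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    update = (f"{API}/repos/{owner}/{repo}/issues/comments/{comment_id}"
              if comment_id is not None else None)
    return create, update
```

A POST to `create` adds a new comment, and a PATCH to `update` replaces an existing comment's body, which should keep the bot out of the review flow entirely.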
Also maybe it's because our bot (iree-github-actions-bot) is a "real" GitHub user account, which is different from bots like google-cla, which is a GitHub app? |
I'll merge this as-is: the concerns are addressed in the follow-up #6145. :)
This is a follow-up to #6124, where we categorized and sorted the benchmark result tables. But it's still noisy given that we are posting many abbreviated summary tables. This commit tries to reduce that by finding the last previous comment from the bot's account and updating it. It also uses issue API endpoints instead of pull request ones, in the hope of avoiding marking the bot as a "reviewer".