
Abbreviate benchmark results posted to pull requests #6124

Merged
merged 3 commits into iree-org:main from abbr-pr-comment
Jun 8, 2021

Conversation

antiagainst
Contributor

We are accumulating more and more benchmarks. Posting all of the
results to the pull request produces a daunting amount of raw data.
Instead, we should categorize and sort them and just provide
abbreviated tables, leaving the full tables somewhere else.

This commit adjusts the pull request comment script with the above
improvements. The full benchmark tables are posted as GitHub Gists,
which are still on GitHub infrastructure and provide nice Markdown
rendering and such.
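
For reference, posting the full tables as a Gist is essentially one authenticated POST to the Gist API. A minimal sketch, not the actual script (the function name, filename, and token handling are placeholders):

```python
import requests

def post_full_tables_as_gist(api_token, markdown_tables):
    # Hypothetical helper; the real script's structure may differ.
    headers = {
        "Accept": "application/vnd.github.v3+json",
        "Authorization": f"token {api_token}",
    }
    payload = {
        "description": "Full benchmark result tables",
        "public": True,
        "files": {"benchmark-results.md": {"content": markdown_tables}},
    }
    response = requests.post("https://api.github.com/gists",
                             json=payload, headers=headers)
    response.raise_for_status()
    # The returned html_url is what the abbreviated PR comment can link to.
    return response.json()["html_url"]
```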

@antiagainst antiagainst added the infrastructure, infrastructure/benchmark, and buildkite:benchmark-android labels Jun 5, 2021
@antiagainst antiagainst requested a review from benvanik June 5, 2021 15:17
@antiagainst antiagainst added this to In progress in Benchmarking Flow Improvement via automation Jun 5, 2021
@google-cla google-cla bot added the cla: yes label Jun 5, 2021
@antiagainst antiagainst linked an issue Jun 5, 2021 that may be closed by this pull request
Contributor Author

@antiagainst antiagainst left a comment


Abbreviated Benchmark Summary

@ commit 08438eb9ae9993195460e95e3fd8992f62a872cf (vs. base ea65a2e14accc3af40470499c2d13250ce71474f)

Regressed Benchmarks 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 95 (vs. 81, 17.28%↑) | 98 | 12 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 374 (vs. 342, 9.36%↑) | 373 | 2 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 239 (vs. 224, 6.70%↑) | 236 | 27 |

Improved Benchmarks 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 30 (vs. 34, 11.76%↓) | 30 | 1 |

For more information:

@antiagainst
Contributor Author

Well, I'm not expecting to see regressed/improved benchmarks for this change. Looks like there are stability issues with the S20, which is at my desk without a cooling pad. It probably got rebooted due to overheating after I pinned down the GPU frequency. We will have a rooted phone in the lab soon, and hopefully it will be more stable there. But at least the above summary shows what I'd like to see. :)

Contributor Author

@antiagainst antiagainst left a comment


Abbreviated Benchmark Summary

@ commit e845163a2331c939eeca8dc858441012d53d5e9b (vs. base ea65a2e14accc3af40470499c2d13250ce71474f)

Regressed Benchmarks 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 375 (vs. 342, 9.65%↑) | 375 | 2 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 236 (vs. 224, 5.36%↑) | 234 | 18 |

Improved Benchmarks 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 29 (vs. 34, 14.71%↓) | 29 | 1 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 413 (vs. 461, 10.41%↓) | 408 | 48 |

For more information:

Contributor

@iree-github-actions-bot iree-github-actions-bot left a comment


Abbreviated Benchmark Summary

@ commit e58b79faa5a354cbd215128a5646a70a55a7a625 (vs. base 0dbd1ac8730c5c73a0ef74c888badddca1d2dde5)

Improved Benchmarks 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 23 (vs. 28, 17.86%↓) | 23 | 0 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 58 (vs. 66, 12.12%↓) | 59 | 12 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 429 (vs. 458, 6.33%↓) | 424 | 40 |

[Top 3 out of 4 benchmark results shown]

For more information:

@benvanik
Collaborator

benvanik commented Jun 8, 2021

why is the bot added to reviewers? that seems weird and will mess up dashboards for tracking things like whether a PR is blocked on a review (as the bot will always show as blocking).

could we replace existing comments instead of always appending? once you've pushed changes the prior results are just noise (and can still be viewed if they were replaced via the history) - this is much better than the full dump that is seen in #6143 but still feels like spam as it drowns out any human communication that may be happening on the PR. I'd not want to run benchmarks until right before submitting as I wouldn't want the trail of a dozen benchmarks forever in the PR history :(

@benvanik
Collaborator

benvanik commented Jun 8, 2021

(randomly googling, https://github.com/peter-evans/find-comment seems to be useful for getting the existing comment ID to use with an update action like https://github.com/marketplace/actions/create-or-update-comment - don't know how it's getting posted today though :)

@antiagainst
Contributor Author

why is the bot added to reviewers? that seems weird and will mess up dashboards for tracking things like whether a PR is blocked on a review (as the bot will always show as blocking).

I believe it's automatically added when I call the GitHub API to create a review using the bot's account. (The behavior isn't specified in the API doc though.)

The bot wasn't really added as a "reviewer": you don't find it in the "3 pending reviewers" at the end, and you can request a review from it again in the top-right reviewers section (right now it does not have a tick ahead of it). I'm afraid anybody who left a comment on the pull request will show up in the top-right corner? I'm not super familiar with this part of GitHub. It's annoying, but I'm not sure how we can avoid that entirely. I can investigate deeper later.

could we replace existing comments instead of always appending? once you've pushed changes the prior results are just noise (and can still be viewed if they were replaced via the history) - this is much better than the full dump that is seen in #6143 but still feels like spam as it drowns out any human communication that may be happening on the PR.

Sure thing! We can always look into improving the flow further!

This commit is about categorizing and sorting multiple tables. I'd prefer to land it if it's a good improvement on its own. I can send targeted follow-up pull requests for those features. Promise. :)

I'd not want to run benchmarks until right before submitting as I wouldn't want the trail of a dozen benchmarks forever in the PR history :(

Right now this is also doable. You just need to add the buildkite:benchmark label when you are ready. Without it, the pull request won't be benchmarked.

(randomly googling, https://github.com/peter-evans/find-comment seems to be useful for getting the existing comment-id to use with an update action like https://github.com/marketplace/actions/create-or-update-comment - don't know how its getting posted today though :)

Those are beasts, each with 6000+ LOC. :D They for sure support more features than we need. I think we can do simpler logic here: just list all reviews, filter by the bot's account and the "Abbreviated Benchmark Summary" keyword, and figure out which one to update (or create a new one).

We need to do everything in Buildkite though, but it's just a matter of figuring out the GitHub APIs. A rough sketch of the idea follows.
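
Roughly, that simpler logic could look like the sketch below (placeholder names, not the actual script; it goes through the plain issue-comment endpoints rather than reviews, and ignores pagination):

```python
import requests

GITHUB_API = "https://api.github.com"  # placeholder constant for illustration
SUMMARY_MARKER = "Abbreviated Benchmark Summary"

def update_or_create_summary_comment(repo, pr_number, bot_login, new_body, token):
    headers = {
        "Accept": "application/vnd.github.v3+json",
        "Authorization": f"token {token}",
    }
    # List existing timeline comments on the PR (PR numbers double as issue
    # numbers for this endpoint).
    comments = requests.get(
        f"{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments",
        headers=headers).json()
    previous = [
        c for c in comments
        if c["user"]["login"] == bot_login and SUMMARY_MARKER in c["body"]
    ]
    if previous:
        # Update the most recent summary comment in place.
        comment_id = previous[-1]["id"]
        requests.patch(
            f"{GITHUB_API}/repos/{repo}/issues/comments/{comment_id}",
            json={"body": new_body}, headers=headers)
    else:
        # No previous summary: create a fresh comment.
        requests.post(
            f"{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments",
            json={"body": new_body}, headers=headers)
```
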

@benvanik
Collaborator

benvanik commented Jun 8, 2021

I've never seen bots show up in the reviewers list on github before so I think it's something on our side configured weirdly.

@ScottTodd
Collaborator

ScottTodd commented Jun 8, 2021

I've never seen bots show up in the reviewers list on github before so I think it's something on our side configured weirdly.

It is possible to add comments without leaving a "review".

The /reviews section of this code is suspect to me:
https://github.com/google/iree/blob/da141ef1ab287b004c4adfd8aa1e645316080cac/build_tools/android/post_benchmarks_as_pr_comment.py#L215-L218

https://docs.github.com/en/rest/reference/pulls#create-a-review-comment-for-a-pull-request

Creates a review comment in the pull request diff. To add a regular comment to a pull request timeline, see "Create an issue comment."

Issue numbers and PR numbers are roughly interchangeable, so a regular comment would POST to /repos/{owner}/{repo}/issues/{issue_number}/comments.

See also APIs for updating comments on reviews:
https://docs.github.com/en/rest/reference/pulls#update-a-review-comment-for-a-pull-request
edit: and on issue comments: https://docs.github.com/en/rest/reference/issues#update-an-issue-comment
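
For concreteness, a minimal sketch contrasting the two endpoints (placeholder names, not the existing script): the /pulls/.../reviews URL is roughly what the linked code goes through, while the /issues/.../comments URL posts a plain timeline comment.

```python
import requests

GITHUB_API = "https://api.github.com"  # placeholder constant for illustration

def review_url(repo, pr_number):
    # Creating a review is what pulls the bot into the reviewers UI.
    return f"{GITHUB_API}/repos/{repo}/pulls/{pr_number}/reviews"

def timeline_comment_url(repo, pr_number):
    # A regular timeline comment; PR numbers double as issue numbers here,
    # so no review machinery is involved.
    return f"{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments"

def post_timeline_comment(repo, pr_number, body, token):
    headers = {
        "Accept": "application/vnd.github.v3+json",
        "Authorization": f"token {token}",
    }
    response = requests.post(timeline_comment_url(repo, pr_number),
                             json={"body": body}, headers=headers)
    response.raise_for_status()
    return response.json()["id"]
```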

@antiagainst
Contributor Author

I've never seen bots show up in the reviewers list on github before so I think it's something on our side configured weirdly.

Also maybe it's because our bot (iree-github-actions-bot) is a "real" GitHub user account, which is different from bots like google-cla, which is a GitHub app?

@antiagainst
Contributor Author

I'll merge this as-is: the concerns are addressed in the follow-up #6145. :)

@antiagainst antiagainst merged commit 4775aff into iree-org:main Jun 8, 2021
Benchmarking Flow Improvement automation moved this from In progress to Done Jun 8, 2021
@antiagainst antiagainst deleted the abbr-pr-comment branch June 8, 2021 21:21
antiagainst added a commit that referenced this pull request Jun 9, 2021
This is a follow-up to #6124,
where we categorized and sorted benchmark result tables.
But it's still noisy given we are posting many abbreviated
summary tables. This tries to reduce that by finding
the last previous comment from the bot's account and
updating that.

Also, it uses issue API endpoints instead of pull request ones,
in the hope of avoiding putting the bot as a "reviewer".
@NatashaKnk NatashaKnk mentioned this pull request Jun 9, 2021