Abbreviate benchmark results posted to pull requests #6124
Conversation
We are accumulating more and more benchmarks. Posting all of the results to the pull request can be a daunting amount of raw data. Instead, we should categorize and sort them and just provide abbreviated tables, leaving the full tables somewhere else. This commit adjusts the pull request comment script with the above improvements. The full benchmark tables are posted as GitHub Gists, which is still on GitHub infrastructure and provides nice Markdown rendering and such.
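The categorize-and-sort step described above could be sketched roughly like this. This is a minimal Python sketch, not the actual comment script: the function name, field names, and threshold are all assumptions for illustration.

```python
# Hypothetical sketch of the categorization logic; the real script's
# names and thresholds may differ.

def categorize_results(results, threshold=0.05):
    """Split benchmark results into regressed/improved/similar buckets.

    `results` is a list of dicts with (assumed) keys "name", "mean", and
    "base_mean". A result whose mean latency moved by more than
    `threshold` relative to the base is flagged.
    """
    regressed, improved, similar = [], [], []
    for r in results:
        change = (r["mean"] - r["base_mean"]) / r["base_mean"]
        if change > threshold:
            regressed.append((r, change))
        elif change < -threshold:
            improved.append((r, change))
        else:
            similar.append((r, change))
    # Sort each bucket by magnitude of change so the worst regressions
    # and best improvements appear first in the abbreviated tables.
    regressed.sort(key=lambda rc: -rc[1])
    improved.sort(key=lambda rc: rc[1])
    return regressed, improved, similar
```

The abbreviated tables would then render only the top few entries of each bucket, with the full, uncategorized tables going to the Gist.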
Abbreviated Benchmark Summary
@ commit 08438eb9ae9993195460e95e3fd8992f62a872cf (vs. base ea65a2e14accc3af40470499c2d13250ce71474f)
Regressed Benchmarks 🚩
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV3Small [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 95 (vs. 81, 17.28%↑) | 98 | 12 |
MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 374 (vs. 342, 9.36%↑) | 373 | 2 |
MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 239 (vs. 224, 6.70%↑) | 236 | 27 |
Improved Benchmarks 🎉
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 30 (vs. 34, 11.76%↓) | 30 | 1 |
For more information:
Well, I'm not expecting to see regressed/improved benchmarks for this change. Looks like there are stability issues with the S20, which is at my desk without a cooling pad. It probably got rebooted due to overheating after pinning down the GPU frequency. We will have a rooted phone in the lab soon and hopefully it's more stable there. But at least the above summary shows what I'd like to see. :)
Abbreviated Benchmark Summary
@ commit e845163a2331c939eeca8dc858441012d53d5e9b (vs. base ea65a2e14accc3af40470499c2d13250ce71474f)
Regressed Benchmarks 🚩
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 375 (vs. 342, 9.65%↑) | 375 | 2 |
MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 236 (vs. 224, 5.36%↑) | 234 | 18 |
Improved Benchmarks 🎉
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 29 (vs. 34, 14.71%↓) | 29 | 1 |
MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 413 (vs. 461, 10.41%↓) | 408 | 48 |
For more information:
Abbreviated Benchmark Summary
@ commit e58b79faa5a354cbd215128a5646a70a55a7a625 (vs. base 0dbd1ac8730c5c73a0ef74c888badddca1d2dde5)
Improved Benchmarks 🎉
Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
---|---|---|---|
MobileNetV3Small [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 23 (vs. 28, 17.86%↓) | 23 | 0 |
MobileNetV2 [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 58 (vs. 66, 12.12%↓) | 59 | 12 |
MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 429 (vs. 458, 6.33%↓) | 424 | 40 |
[Top 3 out of 4 benchmark results shown]
For more information:
why is the bot added to reviewers? that seems weird and will mess up dashboards for tracking things like whether a PR is blocked on a review (as the bot will always show as blocking). could we replace existing comments instead of always appending? once you've pushed changes the prior results are just noise (and they can still be viewed via the history if they were replaced) - this is much better than the full dump seen in #6143 but still feels like spam, as it drowns out any human communication that may be happening on the PR. I'd not want to run benchmarks until right before submitting, as I wouldn't want the trail of a dozen benchmark results forever in the PR history :(
(randomly googling, https://github.com/peter-evans/find-comment seems to be useful for getting the existing comment id to use with an update action like https://github.com/marketplace/actions/create-or-update-comment - don't know how it's getting posted today though :)
I believe it's automatically added when I call the GitHub API to create a review using the bot's account. (The behavior isn't specified in the API doc though.) The bot wasn't really added as a "reviewer": you won't find it in the "3 pending reviewers" at the end, and you can request a review from it again in the top right reviewers section (right now it does not have a tick ahead of it). I'm afraid anybody who left a comment on the pull request will show up in the top right corner? I'm not super familiar with this part of GitHub. It's annoying, but I'm not sure how we can avoid that entirely. I can investigate deeper later.
Sure thing! We can always look into improving the flow! This commit is about categorizing and sorting multiple tables. I'd prefer to land it if it's a good improvement on its own. I can have targeted follow-up pull requests for those features. Promise. :)
Right now this is also doable. You just need to add the
Those are beasts, each with 6000+ LOC. :D They for sure support more features than we need. I think we can do simpler logic here: just list all reviews and filter with the bot's account and the "Abbreviated Benchmark Summary" keyword to figure out which one to update (or create a new one). We need to do everything in Buildkite though. But it's just a matter of figuring out the GitHub APIs.
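The filter-and-find logic proposed here could look roughly like the following. This is a hypothetical sketch: the comment shape mirrors GitHub's issue-comment JSON (each comment has a `user` login, an `id`, and a `body`), and the bot account name and keyword are taken from this discussion; everything else is assumed.

```python
# Sketch of finding the bot's previous summary comment to update.
BOT_LOGIN = "iree-github-actions-bot"
KEYWORD = "Abbreviated Benchmark Summary"

def find_comment_to_update(comments):
    """Return the id of the latest matching bot comment, or None.

    `comments` is assumed to be the JSON list returned by
    GET /repos/{owner}/{repo}/issues/{number}/comments, oldest first.
    """
    for comment in reversed(comments):
        if (comment["user"]["login"] == BOT_LOGIN
                and KEYWORD in comment["body"]):
            return comment["id"]
    # No previous summary: the caller should create a new comment instead.
    return None
```

The caller would then PATCH the found comment's body with the new summary, or POST a new comment when `None` is returned.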
I've never seen bots show up in the reviewers list on GitHub before, so I think it's something on our side configured weirdly.
It is possible to add comments without leaving a "review". The https://docs.github.com/en/rest/reference/pulls#create-a-review-comment-for-a-pull-request
Issue numbers and PR numbers are roughly interchangeable, so a regular comment would POST to the issues API instead. See also the APIs for updating comments on reviews:
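To make the endpoint distinction concrete, here is a small sketch of the URL construction. The routes follow GitHub's documented REST API (plain comments go through the issues routes, which accept a PR number, and updates address the comment id); the helper function itself and its parameters are illustrative, not part of any existing script.

```python
# Sketch of building the create/update URLs for a plain (non-review)
# PR comment via the issues API.
API = "https://api.github.com"

def issue_comment_urls(owner, repo, pr_number, comment_id=None):
    """Return (create_url, update_url) for a plain PR comment.

    Creating uses the PR number (PRs are issues); updating uses the
    comment id, so `update_url` is None until a comment exists.
    """
    create = f"{API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    update = (f"{API}/repos/{owner}/{repo}/issues/comments/{comment_id}"
              if comment_id is not None else None)
    return create, update
```

A POST to `create` adds a new comment, and a PATCH to `update` replaces an existing comment's body, which should keep the bot out of the review flow entirely.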
Also maybe it's because our bot (iree-github-actions-bot) is a "real" GitHub user account, which is different from bots like google-cla, which is a GitHub app? |
I'll merge this as-is: the concerns are addressed in the follow-up #6145. :)
This is a follow-up to #6124, where we categorized and sorted the benchmark result tables. But it's still noisy given that we are posting many abbreviated summary tables. This commit tries to reduce that by finding the last previous comment from the bot's account and updating it. It also uses issue API endpoints instead of pull request ones, in the hope of avoiding marking the bot as a "reviewer".