
99th_percentile_frame_rasterizer_time_millis in [flutter_gallery_ios__transition_perf, flutter_gallery__transition_perf_e2e_ios32] regression #68316

Closed
kf6gpe opened this issue Oct 16, 2020 · 24 comments
Labels
c: performance Relates to speed or footprint issues (see "perf:" labels) c: regression It was better in the past than it is now engine flutter/engine repository. See also e: labels. P0 Critical issues such as a build break or regression perf: speed Performance issues related to (mostly rendering) speed team: benchmark Performance issues found by inspecting benchmarks

Comments

@kf6gpe
Contributor

kf6gpe commented Oct 16, 2020

https://flutter-flutter-perf.skia.org/e/?begin=1602654448&end=1602714273&keys=X1ff0283333c366cd3e7c124d9966ab3e&num_commits=50&request_type=1&xbaroffset=22001

Started at commit #68128, but I'm wondering if it could be due to #68148.

@kf6gpe kf6gpe added c: regression It was better in the past than it is now engine flutter/engine repository. See also e: labels. c: performance Relates to speed or footprint issues (see "perf:" labels) perf: speed Performance issues related to (mostly rendering) speed team: benchmark Performance issues found by inspecting benchmarks P0 Critical issues such as a build break or regression labels Oct 16, 2020
@gaaclarke
Member

@jonahwilliams Is there an expected performance change from #68128?

@kf6gpe In this issue, too, you suspect the engine roll. Do you think there is an error in our performance tests where attribution is incorrectly recorded?

@jonahwilliams
Member

The change linked only uninstalls applications after the driver script has run. It should not affect performance metrics.
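For context on why that ordering matters: cleanup sits outside the measurement window. A minimal sketch of that structure (a hypothetical harness, not the actual devicelab task code; `drive` and `uninstall` are placeholder stand-ins for the real steps):

```python
import time


def run_transition_perf(drive, uninstall):
    """Run the driver, capture the metric, then clean up.

    `drive` and `uninstall` are stand-ins for the real steps
    (e.g. running the driver script and uninstalling the app).
    The metric window closes before the uninstall runs, so
    cleanup time cannot leak into the reported numbers.
    """
    start = time.monotonic()
    drive()                                            # driver script under test
    elapsed_ms = (time.monotonic() - start) * 1000.0   # metric recorded here
    uninstall()                                        # strictly after measurement
    return elapsed_ms


# Usage with no-op stand-ins; a slow uninstall does not inflate the metric:
elapsed = run_transition_perf(lambda: None, lambda: time.sleep(0.05))
```

Under this structure, however long the uninstall takes, it cannot directly move the recorded numbers; any effect has to be indirect (e.g. state the uninstall destroys).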

@gaaclarke
Member

@jonahwilliams Unless the time it takes to uninstall the app is now being incorporated into the measurement. The only other explanation is that our performance tests are incorrect, right?

@jonahwilliams
Member

The apps were being uninstalled before, so that seems unlikely. There was only a period of 5-10 commits where they weren't always uninstalled.

@gaaclarke
Member

This appears to be another case of misattribution. The commit that the perf tool points to doesn't seem to create a problem when tested locally. Here are the results I got:

33903c1 - "99th_percentile_frame_rasterizer_time_millis": 47.611
767fb98 - "99th_percentile_frame_rasterizer_time_millis": 47.24
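As a reminder of what this metric summarizes: it is the 99th percentile over the per-frame rasterizer times collected during a run. A minimal nearest-rank sketch with made-up frame times (illustrative only, not the devicelab data):

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value with at
    least p percent of the samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


# One slow frame dominates the p99 even when typical frames are fast:
frame_times_ms = [4.1, 4.7, 4.8, 4.9, 5.0, 5.1, 5.3, 5.5, 6.2, 47.6]
print(percentile(frame_times_ms, 99))   # -> 47.6
```

This is why a handful of slow frames (for example, frames paying shader compilation costs) can move this metric dramatically while the median barely changes.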

@gaaclarke
Member

gaaclarke commented Oct 16, 2020

I tested #68148 locally and saw a change of 5ms, which doesn't match the 30ms increase shown in Skia perf. I'll dig deeper.

ed2a480 - "99th_percentile_frame_rasterizer_time_millis": 46.346,
12bea61 - "99th_percentile_frame_rasterizer_time_millis": 41.092,

edit: 33903c1 actually comes before ed2a480 and measured 47.611 locally, so there is no regression there; the 41ms reading seems to be an outlier.

@gaaclarke
Member

Hey @keyonghan, I'm seeing a 30ms jump in Skia performance monitoring that I can't reproduce locally. Is there something interfering with the devicelab device? An Amber alert, etc.?

@keyonghan
Contributor

Unfortunately we currently have no way to remotely monitor or dismiss pop-up windows on the ios32 device.

@jonahwilliams
Member

What device are you running on locally, @gaaclarke?

@gaaclarke
Member

iPhone 8, iOS 13.x. This is only happening on the 32-bit iOS device? finger gun in mouth

@jonahwilliams
Member

In that case you're probably going to see a difference in perf between the two device types.

@gaaclarke
Member

That's not what the graph shows: there is a regression on both the 32-bit and the 64-bit device. That makes a hardware issue less likely, but it still doesn't explain why I can't reproduce it locally.

@gaaclarke
Member

gaaclarke commented Oct 16, 2020

Here's a recap of where we are:

  1. The author of the commit expressing the regression is confident that it isn't to blame. I agree it is very unlikely to be the cause.
  2. Local testing of that commit doesn't reproduce the regression.
  3. An engine roll was suspected; local testing shows no problem there either.
  4. The regression appears on both 32-bit and 64-bit devices, which suggests it isn't a hardware problem with one device and lessens the likelihood that the problem is device-specific (although things like Amber alerts would show up on both devices).

There is a chance that this problem for some reason doesn't show up on my iOS 13 iPhone 8 but does show up on both devicelab devices. @keyonghan, what is the 64-bit device we are using in devicelab?

@gaaclarke gaaclarke changed the title 99th_percentile_frame_rasterizer_time_millis flutter_gallery__transition_perf_e2e_ios32flutter_gallery_ios__transition_perf regression 99th_percentile_frame_rasterizer_time_millis in [flutter_gallery_ios__transition_perf, flutter_gallery__transition_perf_e2e_ios32] regression Oct 16, 2020
@keyonghan
Contributor

All the other iPhones are 64-bit (go/flutter-devicelab-hardware).

@gaaclarke
Member

I'm not sure how to proceed here. The commit associated with the regression is one the author doesn't believe caused it, and I agree with that assessment. Local testing doesn't show a regression; in fact, it shows an improvement. Yet the regression shows up on two of our devicelab units. I have no plausible explanation for why the commit in question would cause a regression, let alone cause one on one type of device and not another.

I don't even know why the engine roll was suspected, since it has no proximity to the appearance of the regression. If someone could suggest a plausible hypothesis for this regression, I could continue exploring it; otherwise I'm just stabbing around in the dark.

@gaaclarke
Member

I'm going to revert #68128 and see if the numbers drop back down.

@jonahwilliams
Member

Please do not revert devicelab stability fixes.

@jonahwilliams
Member

If the tests cannot function properly with the IPA/APK being uninstalled after each run, they need to be disabled.

@cbracken
Member

cbracken commented Oct 16, 2020

The benchmarks clearly show a regression at that commit. Looking at it, I agree that if that commit is responsible, then there's something wrong with the benchmark itself. I also agree that benchmarks and tests should leave devices in the state they found them, so uninstalling at the end is very definitely the right behaviour.

We should revert the commit, let it sit for five or six commits to check the effect, then re-land the commit. If, during the revert period, we see the benchmarks revert back to where they were, we should investigate what's wrong with the benchmark itself.

Keeping the commit reverted is definitely not the right answer but verifying the benchmark behaviour during the revert period is incredibly useful in diagnosing this. We've definitely seen weirder things before.

@gaaclarke
Member

You know, now that I think about it: if a regression in performance appeared after we started uninstalling the app, one likely culprit would be deleting the shader cache, which could result in slower performance. A dip in performance would thus be expected.
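That hypothesis is easy to sanity-check with a toy model: a freshly installed app starts with an empty shader cache, so the first run pays a compile cost for each new shader, while later runs hit the cache. A simulation sketch (all numbers assumed for illustration, not real Skia compile costs):

```python
def simulate_run(cache, shaders_needed, base_ms=4.0, compile_ms=8.0):
    """Each frame pays a compile penalty only when its shader is
    missing from the persistent on-disk cache."""
    frame_times = []
    for shader in shaders_needed:
        penalty = compile_ms if shader not in cache else 0.0
        cache.add(shader)            # compiled shaders persist across runs
        frame_times.append(base_ms + penalty)
    return frame_times


cache = set()                            # uninstalling the app empties this
cold = simulate_run(cache, range(100))   # first run after a fresh install
warm = simulate_run(cache, range(100))   # same run again, cache populated
print(max(cold), max(warm))              # -> 12.0 4.0
```

Under this model the worst frames in a cold run cost base_ms + compile_ms, while a warm run never pays the penalty, which is consistent with the benchmark dropping once the test is run repeatedly without uninstalling in between.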

@gaaclarke
Member

I was finally able to reproduce this at 33903c1! It requires running the test multiple times; I went from ~46ms to ~6ms because of the shader cache.

So, this test needs to be re-baselined, and we need to add a new test that measures performance with a full shader cache. I'll file a bug for that and close the issue I created about misattribution. I'll also close the revert PR. Thanks @jonahwilliams and @cbracken for the help. That one was a doozie.

@gaaclarke
Member

@kf6gpe it looks like the tests auto-rebaseline? Can you make sure it's baselined correctly, please?

@cbracken
Member

Awesome find. Thanks for digging into this, @gaaclarke!

@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 11, 2021