Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

color_filter_and_fade_perf__timeline_summary regression worst_frame_rasterizer_time_millis #54095

Closed
kf6gpe opened this issue Apr 6, 2020 · 18 comments · Fixed by #54220
Closed
Assignees
Labels
c: performance Relates to speed or footprint issues (see "perf:" labels) c: regression It was better in the past than it is now perf: speed Performance issues related to (mostly rendering) speed team: benchmark Performance issues found by inspecting benchmarks team Infra upgrades, team productivity, code health, technical debt. See also team: labels.

Comments

@kf6gpe
Copy link
Contributor

kf6gpe commented Apr 6, 2020

Clear jump at this engine roll: #53890

Screen Shot 2020-04-06 at 7 58 17 AM

@kf6gpe kf6gpe added engine flutter/engine repository. See also e: labels. c: performance Relates to speed or footprint issues (see "perf:" labels) ⚠ TODAY perf: speed Performance issues related to (mostly rendering) speed team: benchmark Performance issues found by inspecting benchmarks labels Apr 6, 2020
@tvolkert
Copy link
Contributor

tvolkert commented Apr 6, 2020

@liyuqian @chinmaygarde thoughts?

@liyuqian
Copy link
Contributor

liyuqian commented Apr 6, 2020

The Clang upgrade flutter/engine#17483 (CC @dnfield ) might be the only PR in the engine roll that could have this much impact on this Android performance test? I'll try to reproduce this regression locally and check if this is an engine issue or an infra issue.

@gw280
Copy link
Contributor

gw280 commented Apr 6, 2020

I don't have an Android device I'm afraid :(

@chinmaygarde
Copy link
Member

I'll try to reproduce this regression locally and check if this is an engine issue or an infra issue.

@liyuqian I didn't review flutter/engine#17486 so not sure what the context was but if we are rendering to a texture now instead of a surface, the extra onscreen composition could totally cause this regression. I would try bisection from there. But yeah, the toolchain thing is a massive change too. But that would have affected absolutely all targets.

@liyuqian liyuqian self-assigned this Apr 6, 2020
@liyuqian
Copy link
Contributor

liyuqian commented Apr 6, 2020

I cannot locally reproduce this regression by running the test on 85664be (#53890) and its predecessor a8b3d1b. The worst frame time seems to be similar on those 2 commits.

Also considering that a later commit 29c8808 has a fast worst frame, I think this regression is caused by infra issues.

CC @godofredoc

@liyuqian liyuqian removed their assignment Apr 6, 2020
@chinmaygarde chinmaygarde removed the engine flutter/engine repository. See also e: labels. label Apr 6, 2020
@tvolkert
Copy link
Contributor

tvolkert commented Apr 6, 2020

@godofredoc can you see if there's some issue with the device running this benchmark?

@keyonghan
Copy link
Contributor

I am looking at this issue.

@tvolkert tvolkert assigned keyonghan and unassigned godofredoc Apr 6, 2020
@keyonghan
Copy link
Contributor

This task has been run across four agents (mac 17, 19, 20, and 21). There are no issue found on any of them. Since we are not able to capture android screen from a mac at this moment(#51802), we are not sure if any window pop up on these devices. For now, I issued a reboot for all devices and hosts.

@tvolkert
Copy link
Contributor

tvolkert commented Apr 6, 2020

Thanks @keyonghan!

@liyuqian if the values don't return to normal now that the hosts and devices have been rebooted, more investigation will be needed, since we've ruled out infra issues.

@tvolkert tvolkert assigned liyuqian and unassigned keyonghan Apr 6, 2020
@jason-simmons
Copy link
Member

I can reproduce this on a Moto G4 (comparing 85664be and a8b3d1b)

Trying to bisect now.

@jason-simmons
Copy link
Member

This started with flutter/engine@abc72933e

@iskakaushik

@tvolkert
Copy link
Contributor

tvolkert commented Apr 6, 2020

Can we revert and then re-apply with a fix?

@tvolkert tvolkert assigned iskakaushik and jason-simmons and unassigned liyuqian Apr 6, 2020
@liyuqian
Copy link
Contributor

liyuqian commented Apr 7, 2020

Thanks @jason-simmons ! With your discovery, I now figured out why I couldn't reproduce the regression early: --trace-startup is needed to see the regression. It seems that @iskakaushik 's commit flutter/engine@abc72933e broke some tracing contracts when --trace-startup is given.

Specifically, with --trace-startup (flutter drive --profile -t test_driver/color_filter_and_fade_perf.dart --trace-startup), we see frame times like

    609,
    372340,

which seems to be adding the time of 2 frames into 1.

Without --trace-startup (flutter drive --profile -t test_driver/color_filter_and_fade_perf.dart), it would be more like

    207123,
    188726,

See color_filter_with_or_without_trace_startup.zip for more details.

Hence I'm removing the TODAY label as this is not an actual performance regression, but more like a tracing issue.

@liyuqian liyuqian removed the ⚠ TODAY label Apr 7, 2020
@jason-simmons
Copy link
Member

There is definitely something wrong with the tracing format when the SceneDisplayLag event is added.

In the attached timeline, look at the events between timestamps 7267951896 and 7268116859. There are two GPURasterizer::Draw end events in a row, with a SceneDisplayLag event in between.

The two GPURasterizer::Draw end events are probably causing bad results in the benchmark script when it tries to find begin/end pairs in the timeline.

bad_timeline_1.json.txt

@iskakaushik
Copy link
Contributor

I think this is a problem with the way we are parsing the timeline events in the test rather than the timeline events itself. I'm working on fixing this along with taking a look at #54205

iskakaushik pushed a commit to iskakaushik/flutter that referenced this issue Apr 7, 2020
Parser would earlier alternate after finding the first
begin event, not it looks for pairs.

Fixes: flutter#54095
@jason-simmons
Copy link
Member

jason-simmons commented Apr 7, 2020

I'm still concerned that something is wrong with how the timeline events are being recorded.

I tried adding logging to runtime/vm/timeline.cc in Dart and found that some calls to Dart_TimelineEvent are being serialized out of order in the timeline JSON file downloaded by flutter_driver.

The reordered events happen when the SceneDisplayLag event is logged with a begin timestamp in the past. Prior to that event the rasterizer had called Dart_TimelineEvent to log a GPURasterizer::Draw begin event with a later timestamp.
But that GPURasterizer::Draw begin record shows up after the matching GPURasterizer::Draw end record in the JSON. That causes errors in the flutter_driver code that tries to match begin/end pairs.

iskakaushik pushed a commit to iskakaushik/engine that referenced this issue Apr 7, 2020
Working theory is that adding async events with begin times
in the past causes the timeline to skip some events.

See: flutter/flutter#54095
iskakaushik pushed a commit to iskakaushik/engine that referenced this issue Apr 7, 2020
Working theory is that adding async events with begin times
in the past causes the timeline to skip some events.

See: flutter/flutter#54095
@iskakaushik
Copy link
Contributor

iskakaushik commented Apr 7, 2020

I reached out to @rmacnak-google , It seems like the underlying issue is that as a client we need to sort the events on the timestamps before consuming them. Pasting my conversation here for posterity. Based on this I'm going to sort the events by their timestamp before exporting the json.

me: When we create timeline event with Dart_Timeline_Event_Async_Begin as the event type, is the event allowed to specify the timestamp that is in the past? In particular, does the caller need to do anything to ensure that the Dart_TimelineEvent implementation will buffer events that are logged out of order and then write them to JSON in sorted order?

ryan: The answer somewhat depends on which underlying recorder is being used. If the recorder is a VM buffer or Fuchsia tracing, then it should be fine if an event is from the past because the client will have to sort things anyway because of buffering. If the recorder is Android systrace (or iOS signpost, I think) then when we forward the event the underlying record does not take a timestamp as an argument and uses the current time instead.

me: Upon further investigating the attached trace file in #54095 (comment), it seems like the missing event between the two Es occurs after them

{
   "name":"GPURasterizer::Draw",
   "cat":"Embedder",
   "tid":13450,
   "pid":13434,
   "ts":7267952095,
   "tts":2246953,
   "ph":"B",
   "args":{}
}

iskakaushik pushed a commit to iskakaushik/engine that referenced this issue Apr 7, 2020
This event goes from now -> current vsync target time
to avoid the limitations as seen in flutter/flutter#54095 (comment)

This event also tags additional metadata to capture
`vsync_transitions_missed` considering the refresh rate of the display.
iskakaushik added a commit to flutter/engine that referenced this issue Apr 7, 2020
Working theory is that adding async events with begin times
in the past causes the timeline to skip some events.

See: flutter/flutter#54095
iskakaushik pushed a commit to iskakaushik/engine that referenced this issue Apr 7, 2020
This event goes from now -> current vsync target time
to avoid the limitations as seen in flutter/flutter#54095 (comment)

This event also tags additional metadata to capture
`vsync_transitions_missed` considering the refresh rate of the display.
iskakaushik added a commit to flutter/engine that referenced this issue Apr 8, 2020
This event goes from now -> current vsync target time
to avoid the limitations as seen in flutter/flutter#54095 (comment)

This event also tags additional metadata to capture
`vsync_transitions_missed` considering the refresh rate of the display.
iskakaushik added a commit that referenced this issue Apr 10, 2020
Parser would earlier alternate after finding the first
begin event, not it looks for pairs.

Fixes: #54095
@jmagman jmagman added the team Infra upgrades, team productivity, code health, technical debt. See also team: labels. label Apr 14, 2020
@liyuqian liyuqian added the c: regression It was better in the past than it is now label Apr 14, 2020
goderbauer pushed a commit to goderbauer/engine that referenced this issue Apr 16, 2020
Working theory is that adding async events with begin times
in the past causes the timeline to skip some events.

See: flutter/flutter#54095
goderbauer pushed a commit to goderbauer/engine that referenced this issue Apr 16, 2020
This event goes from now -> current vsync target time
to avoid the limitations as seen in flutter/flutter#54095 (comment)

This event also tags additional metadata to capture
`vsync_transitions_missed` considering the refresh rate of the display.
@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
c: performance Relates to speed or footprint issues (see "perf:" labels) c: regression It was better in the past than it is now perf: speed Performance issues related to (mostly rendering) speed team: benchmark Performance issues found by inspecting benchmarks team Infra upgrades, team productivity, code health, technical debt. See also team: labels.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants