Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

25% regression in web_benchmarks_skwasm:bench_material3_scroll_semantics.skwasm.drawFrameDuration.average #144254

Closed
gaaclarke opened this issue Feb 27, 2024 · 7 comments
Assignees
Labels
c: regression It was better in the past than it is now dependency: dart Dart team may need to help us P0 Critical issues such as a build break or regression team-web Owned by Web platform team triaged-web Triaged by Web platform team

Comments

@gaaclarke
Copy link
Member

Introduced in #144208

skia perf

cc @eyebrowsoffire

@gaaclarke gaaclarke added c: regression It was better in the past than it is now P0 Critical issues such as a build break or regression team-web Owned by Web platform team labels Feb 27, 2024
@gaaclarke gaaclarke changed the title regression in web_benchmarks_skwasm:bench_material3_scroll_semantics.skwasm.drawFrameDuration.average 25% regression in web_benchmarks_skwasm:bench_material3_scroll_semantics.skwasm.drawFrameDuration.average Feb 27, 2024
@eyebrowsoffire
Copy link
Contributor

The PR that triggered this change was a engine roll with a number of changes. I ran benchmarks against a number of the intermediate commits in order to further isolate the issue, and it appears to have happened with this Dart SDK roll specifically: flutter/engine#50979

There are a few dart2wasm-specific changes in that roll, but I haven't yet isolated it down to a specific dart commit.

@eyebrowsoffire eyebrowsoffire added the dependency: dart Dart team may need to help us label Feb 27, 2024
@yjbanov yjbanov added the triaged-web Triaged by Web platform team label Feb 27, 2024
@mkustermann
Copy link
Member

mkustermann commented Feb 28, 2024

The "root cause" of the regression is according to my analysis - dart-lang/sdk@c625797 - which implemented proper dart:developer.

  • After that CL landed, the APIs in dart:developer actually do something as supposed to before where they were stubbed out.
  • The fact that this shows as 25% regression means that what this benchmark is measuring is to a large extend measurement logic itself (e.g. timline events etc) itself.
  • This can be seen in the profiles of before/after quite clearly: Timeline.startSync / Timeline.finishSync are high on the profile (e.g. called by PipelineOwner.flushSemantics)
  • When running the benchmark manually in the browser window, it jitters around, but doesn't actually sroll down and up
  • Given the above, I assume the flutter benchmarking system is benchmarking --profile mode and not --release mode?
  • Together this somewhat puts into question the usefulness of this benchmark.

In any case, I'll close this as working as intended.

@gaaclarke
Copy link
Member Author

The fact that this shows as 25% regression means that what this benchmark is measuring is to a large extend measurement logic itself (e.g. timline events etc) itself.
Given the above, I assume the flutter benchmarking system is benchmarking --profile mode and not --release mode?

I wouldn't dismiss a benchmark because it is measuring a large portion of profiling overhead. That is the point of --profile mode, to be profiled =) If that mode creates so much overhead that the profiles are no longer useful, that's a problem. Would it be prudent to revisit the performance of dart:developer on the web?

@eyebrowsoffire
Copy link
Contributor

It's not that the profiles are not useful, the baselines have just changed. I think that's perfectly reasonable. I think it's a good idea to just rebaseline this and triage the skia perf notification and close this bug. I just wanted to make sure we weren't actually regressing rendering performance before closing it.

@mkustermann
Copy link
Member

mkustermann commented Feb 29, 2024

I wouldn't dismiss a benchmark because it is measuring a large portion of profiling overhead.

If the program I want to measure behaves - when measuring it - 25% differently from real world, then that's a problem. The numbers just won't be representative of the real app (and we only care about the real app). When profiling & optimizing real world apps one needs to get metrics for the real app.

If that mode creates so much overhead that the profiles are no longer useful, that's a problem. Would it be prudent to revisit the performance of dart:developer on the web?

The problem may not be the dart:developer implementation per se, it just forwards calls to the browser APIs. It may be more due to the framework doing significantly more work in --profile mode than in --release mode. Could it be that we issue too many timeline events (given the work that happens in-between)?

Two more things

  • the actual benchmark in the web page doesn't seem to scroll properly (try running it) it just moves few pixels down & up, instead of scrolling new widgets in. Is this intentional?
  • the fact that the profile ran just fine before the CL, means that those timeline events aren't needed for the benchmark to run. Can those benchmarks run in --release mode instead?

@gaaclarke
Copy link
Member Author

Could it be that we issue too many timeline events (given the work that happens in-between)?

Maybe that's something we could help with? Elide timeline events that happen at too high of a frequency? I imagine what is an acceptable frequency will depend on the target platform which will be difficult of users to get right in all cases.

Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
c: regression It was better in the past than it is now dependency: dart Dart team may need to help us P0 Critical issues such as a build break or regression team-web Owned by Web platform team triaged-web Triaged by Web platform team
Projects
None yet
Development

No branches or pull requests

4 participants