Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Impeller] Improve performance of CupertinoPicker with opacity peephole #40101

Merged
merged 13 commits into from Mar 15, 2023

Conversation

jonahwilliams
Copy link
Member

@jonahwilliams jonahwilliams commented Mar 7, 2023

Implements an EntityPassDelegate that applies an opacity peephole optimization. This has been tuned to be extremely conservative, and does not successfully remove all opacity layers from the cupertino example linked. Skia on the other hand does manage to, so perhaps it is using more precise coverage information from the glyphs.

Impeller Before

Impeller After

image

Skia

image

flutter/flutter#121476

@jonahwilliams
Copy link
Member Author

TBD: whether we should do it like this, testing, et cetera

// command wrapped in save layer. This would indicate something like an
// Opacity or FadeTransition wrapping a very simple widget, like in the
// CupertinoPicker.
if (entity_pass->GetEntityCount() > 3) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have with working on this optimization now: subpasses are extra expensive right now on the metal backend due to the whole waitForScheduled, and also potentially due to lack of multithreaded encoding.

We might tune these thresholds too high and end up trading a bunch of raster time for decreasing returns of GPU utilization.

Copy link
Member

@bdero bdero Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! We're terribly underutilizing the GPU here. Sibling EntityPasses are fundamentally independent and can be evaluated in parallel on the CPU (not a huge impact in this case) and executed in parallel on the GPU (absolutely enormous impact in this case). Once we have a proper synchronization solution, we can remove all of the host<->device syncs that are blocking the CPU and saturate the GPU with parallelizable work.

Copy link
Member

@bdero bdero Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I think we added texture atlas sharing between EntityPasses at some point. That (and any other resource sharing between EntityPasses) needs to be reverted, of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get around that by making glyph atlas construction happen eagerly? Otherwise, glyph atlas construction and upload takes a substantial amount of time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or, more specifically, can we push construction of the glyph atlas including all text frames into a different thread and then use whatever resource tracking mechanism we create to ensure the subsequent commands that depend on this are ordered correctly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get around that by making glyph atlas construction happen eagerly

Yeah I think that's probably the best option -- just render the whole thing at once and upload everything just before the EntityPass run. Then the uploaded atlases become read only for the frame and don't restrict parallelism.

@@ -52,6 +52,13 @@ class EntityPass {

void IterateAllEntities(const std::function<bool(Entity&)>& iterator);

/// @brief Iterates all entities on this pass, returning if there is a
/// subpass.
bool IterateAllFlatEntities(const std::function<bool(Entity&)>& iterator);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of an unnecessary restriction, but I want to avoid traversing the entire tree. Ideally if all children accepted opacity and there was a subpass, we could check if the subpass accepts opacity and just push it one layer down.

This would help cases where we've nested FadeTransitions/Opacity

/// a way that makes accepting opacity impossible. It is always safe
/// to return false, especially if computing overlap would be
/// computationally expensive.
virtual bool CanAcceptOpacity(const Entity& entity) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be a single call. Say "virtual bool InheritOpacity(...)". The return value can indicate if the operation succeeded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to make sure the entities in a pass aren't overlapping too, so we can't combine them. basically there would need to be two levels of checks:

  1. For each entity in the pass, it needs to be able to accept opacity and not overlap other entities.
  2. Within an entities contents, its geometry needs to be non-overlapping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@jonahwilliams
Copy link
Member Author

From investigation, its not that Skia has better coverage info, its that the display list heuristics don't consider overlap: flutter/flutter#122266

@jonahwilliams jonahwilliams marked this pull request as ready for review March 10, 2023 18:05
@jonahwilliams
Copy link
Member Author

I think this is ready for review. It works for the cupertino picker case, and we can continue to invest in it should we find additional cases where we create way too many subpasses

@jonahwilliams
Copy link
Member Author

well, rebasing from github is hard!

new_layer_pass.SetDelegate(
std::make_unique<PaintPassDelegate>(paint, bounds));

// Only apply opacity peephole on default blending.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this deserve a TODO/bug to extend this to other blend modes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can in general.

}
bool all_can_accept = true;
std::vector<Rect> all_coverages;
auto had_subpass = entity_pass->IterateAllFlatEntities([&](Entity& entity) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: avoid over-capturing (don't use [&]). Instead explicitly capture what you need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto maybe_coverage = contents->GetCoverage(entity);
if (maybe_coverage.has_value()) {
auto coverage = maybe_coverage.value();
for (const auto& cv : all_coverages) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this even simpler by just bailing out if we don't get the structure in the comments above (i.e. clip, draw, restore)?

A small change above could lead to a very expensive change down here without realizing it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a good way to check what kind of entity something is.

@@ -4,6 +4,7 @@

#include "impeller/entity/contents/text_contents.h"

#include <iostream>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <iostream>

Not sure if github is showing that right, but what I'm trying to articulate is that this line can be deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it!

return true;
}
auto glyph_positions = runs_[0].GetGlyphPositions();
if (glyph_positions.size() > 10) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this?

What happens if there are 100 TextFrame objects each with 10 glyphs? This seems like the wrong level for optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is more than 1 run or more than 10 glyphs, we assume that its not worth computing the overlap - instead just assume the text frame has overlapping contents and don't apply the optimization. Seems reasonable to me?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why would we bail on 1 run that's 11 glyphs long, but chug along merrily on 1000 runs that are 10 glyphs long?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L51, we don't

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can't get multiple TextFrames in a single pass?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not sure what you mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline, we need to do these checks because text rendering behaves differently than solid color fills. The optimization has a limitation in the number of entities we check per frame which is now documented on the PaintPassDelegate

@jonahwilliams
Copy link
Member Author

This also helps two places in the gallery:

  • Crane: Each image fades in individually. With this optimization that takes no sub passes
  • Rally: several of the icons have opacity applied when not focused. Small subpasses, but still good to get.

Copy link
Member

@bdero bdero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -52,6 +52,13 @@ class EntityPass {

void IterateAllEntities(const std::function<bool(Entity&)>& iterator);

/// @brief Iterates all entities on this pass, returning if there is a
/// subpass.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "Iterate entities in this pass up until the first subpass is found. This is useful for limiting look-ahead optimizations."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -52,6 +52,13 @@ class EntityPass {

void IterateAllEntities(const std::function<bool(Entity&)>& iterator);

/// @brief Iterates all entities on this pass, returning if there is a
/// subpass.
bool IterateAllFlatEntities(const std::function<bool(Entity&)>& iterator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, but maybe rename to "IterateUntilSubpass"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jonahwilliams jonahwilliams added the autosubmit Merge PR when tree becomes green via auto submit App label Mar 15, 2023
@auto-submit auto-submit bot merged commit 405c851 into flutter:main Mar 15, 2023
36 checks passed
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2023
@dnfield
Copy link
Contributor

dnfield commented Mar 15, 2023

(My concerns have been addressed)

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 15, 2023
zanderso pushed a commit to flutter/flutter that referenced this pull request Mar 15, 2023
…122746)

* 3c3cbc738 Manual roll Dart SDK from c766fffb626e to 3b109a04f783 (9 revisions) (flutter/engine#40296)

* dee0ae590 [Impeller] Make matrix image filters work as expected with nested saving layers (flutter/engine#40299)

* 835759f4c Roll Skia from 4d90ba479527 to 3b9131c65c01 (5 revisions) (flutter/engine#40304)

* bbde3a77b Roll Fuchsia Linux SDK from BRE9jdqYpdkbU0j7H... to YaWqKKuj-fAqfpKCm... (flutter/engine#40306)

* 1a7e7c468 Roll Dart SDK from 3b109a04f783 to 5c210933cdfe (2 revisions) (flutter/engine#40307)

* 66a3324cf Analyze more shaders (flutter/engine#40285)

* 405c8513e [Impeller] Improve performance of CupertinoPicker with opacity peephole (flutter/engine#40101)

* 9fc3246bf Reland "Make FlutterTest the default test font" (#40188) (flutter/engine#40245)

* 3ed9f1236 Reland: "Added wide-gamut color support for `ui.Image.toByteData` and `ui.Image.colorSpace`" (flutter/engine#40312)

* e143b309c Bump lower Dart SDK constraints to 3.0 (flutter/engine#40178)

* 086b1a022 [impeller] implement GetPositionUVBuffer (flutter/engine#40248)

* ec151bf2c Revert Dart SDK to c766fffb626e (flutter/engine#40315)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autosubmit Merge PR when tree becomes green via auto submit App e: impeller
Projects
No open projects
Archived in project
4 participants