Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track damage rect and only draw inside it #33939

Closed
liyuqian opened this issue Jun 5, 2019 · 85 comments
Closed

Track damage rect and only draw inside it #33939

liyuqian opened this issue Jun 5, 2019 · 85 comments
Labels
c: new feature Nothing broken; request for a new capability c: performance Relates to speed or footprint issues (see "perf:" labels) customer: dream (g3) engine flutter/engine repository. See also e: labels. framework flutter/packages/flutter repository. See also f: labels. P2 Important issues not at the top of the work list

Comments

@liyuqian
Copy link
Contributor

liyuqian commented Jun 5, 2019

Currently, Flutter redraws every pixel even if there's a very tiny part of the screen that's animating (e.g., CircularProgressIndicator, caret in TextField).

We should compute the damage rect (which shouldn't be hard due to our repaint mechanism), reuse the buffer from previous frames, and draw the new frame with the clip set to the damage rect.

Hopefully this could have a significant impact on CPU/GPU usages for issues like #31865

@liyuqian liyuqian added c: new feature Nothing broken; request for a new capability framework flutter/packages/flutter repository. See also f: labels. engine flutter/engine repository. See also e: labels. c: performance Relates to speed or footprint issues (see "perf:" labels) labels Jun 5, 2019
@liyuqian
Copy link
Contributor Author

liyuqian commented Jun 5, 2019

Just talked with @chinmaygarde on this issue. There might be blockers in OpenGL backends such as KHR_swap_buffers_with_damage is not available on some devices, and Android's public API (e.g., ANativeWindow) may not expose all things we need. So this is another incentive for us to push for Vulkan and Metal.

I'll talk to @drWulf for more details on the Android side.

@liyuqian
Copy link
Contributor Author

liyuqian commented Jun 7, 2019

Just talked with Derek (@drWulf) and got the following points:

  1. EGL_EXT_buffer_age will be the most useful thing. Using it, we can clip to the union of last buffer_age damage rects, and Skia will discard all draws outside of the clip. It's how Android's HWUI achieves its high performance.

  2. EGL_EXT_buffer_age is available on all recent Android devices. Overall, it's probably available for 80% of Android devices.

  3. It's unclear whether iOS has EGL_EXT_buffer_age, but Metal sure has its equivalence.

  4. EGL_KHR_partial_update and EGL_KHR_swap_buffers_with_damage may also be helpful, but much less than EGL_EXT_buffer_age.

  5. If the damage rect is too large (say 90% of the fullscreen), we shouldn't clip to it since the clip overhead will outweigh the saving.

It seems that we can get some significant performance gain for 80%+ Android devices (with EGL_EXT_buffer_age), and we'll bring it to all Metal iOS devices in the future.

cc @chinmaygarde

@drWulf
Copy link

drWulf commented Jun 7, 2019

I shouldn't be quoted on the 80% number. I'm aware of a variety of devices picking up EGL_EXT_buffer_age because of its benefits when running HWUI. I'm still trying to find accurate numbers for how many devices support that extension, but in any case on the newer devices that do the performance gains will be substantial.

@gaaclarke
Copy link
Member

@liyuqian Can you provide some profiling that shows the problem?

Here is a frame draw from the Metal backend:
Screen Shot 2019-06-11 at 12 27 34 PM It doesn't seem to be drawing more than it needs to.

@liyuqian
Copy link
Contributor Author

Due to #31865 (comment), this is unlikely to help #31865 much.

Instead, we should come up with a test case with a complex page that has only a blinking cursor. We'll aim to improve the performance of that page significantly using damage rects.

@liyuqian
Copy link
Contributor Author

liyuqian commented Jun 20, 2019

Alternatively, @flar may have a nice test case with BackdropFilter (#27677) where damage rects could be helpful.

@caijw
Copy link

caijw commented Aug 31, 2020

Any progress about tracking damage rect and drawing inside it?
I notice that Rasterizer already has LastLayerTree. Is it possible to diff LastLayerTree and CurrentLayerTree to track damage rect?

@knopp
Copy link
Member

knopp commented Sep 2, 2020

I don't think there is a metal API for partial updates, but partial updates can be approximated to a certain degree for both metal and opengl using IOSurface and multiple CALayers. The current situation where single spinner redraws entire 4K window is problematic when it comes to power consumption.

As far as I can tell EGL_EXT_buffer_age is not going to help figure out which part of layer tree has changed from last frame and need to be re-rendered, that still needs to be addressed, right?

@fredlee12345678
Copy link

@liyuqian do you have plan to this feature?

@flar
Copy link
Contributor

flar commented Sep 3, 2020

Last year I wrote up an analysis of the problems to solve to get to the point of tracking damage areas in the Flutter ecosystem and I'm looking at cleaning it up for publication in the next few weeks. To answer some quick questions:

  • While we do have a last layer tree, it isn't enough to determine the changes. Many of the components of the layer tree are opaque without a bit more work (i.e. Picture objects), and some layers may change identity between scenes even though the widget tree from which they came was stable so we'd have to do a deep content-based comparison rather than simply detecting which layers are new.

  • We can't assume that our screen buffer is stable between frames, so we'd have to potentially keep multiple previous layers and do a multi-frame diff. The platform can tell us how many buffers are in rotation and we need to be flexible to that.

  • Picture objects, which contain rendering commands like drawLine, etc. are unfortunately the least stable item in the tree as they are shared across whole swaths of the tree and any part of that tree that changes can cause all of them to repaint into a new Picture even if the resulting rendering command stream is identical.

Getting that document fleshed out and published is next on my plate after I finish a couple of other smaller tasks.

@knopp
Copy link
Member

knopp commented Sep 5, 2020

@flar, this has been bugging me for a while, and I had a bit of time to spend on it, so I pushed a proof of concept implementation in my fork. Right now it's limited to macOS, but it shows great potential with pathological cases such as the Progress Indicator example:

MacBook Pro i7, iGPU
Energy consumption idle: PKG ~3.7W
Progress Indicator example on 4K Stock Flutter: PKG ~10.5W
Progress Indicator example on 4K Damage Rects branch: PKG ~4.5W (!)

When implementing it I had 3 main usecases in mind:

  • Scrolling content in repaint boundary must only repaint scrolled area, not the area around
  • Things like progress indicators must not repaint the whole window (cupertino example needs repaint boundary btw)
  • Blinking text cursor must not repaint the whole window

Which pretty much means contents that is completely outside of repaint boundary is not touched at all (in most cases not even having to be clipped by Skia).

Couple of implementation remarks:

  • This implementation doesn't address cases where back buffer gets damaged during flip (in my case macOS simply blits from IOSurface, so it's not an issue). It should however be fairly easy to extend the implementation to calculate damage area from any arbitrary frame in past if needed.
  • There are two main layer types that actually contribute to contents on screen:
    • picture_layer: Flutter does pretty good job at caching Pictures in layers at Flutter side, if the render objects are not marked dirty. So for picture_layer, just comparing SkPicture for identity will get us quite far.
    • physical_shape_layer: These need to be actually compared for content, which is fairly easy
    • backdrop_filter_layer and image_filter_layer: Here the filters need to be compared; Right now the filters are serialized and the binary data is compared. Not sure if there's a better way.
  • This implementation considers layer between frames identical (which means they don't contribute to damage area) if all of these are true:
    • layer type matches
    • layer screen space coordinates match
    • layer content matches (see previous point)
    • layer mutator stack matches (so that we can detect change in clipping or opacity)
  • It also needs to be taken into account that:
    • backdrop_filter_layer affects screen space coordinates of layers below it (blurring it)
    • image_filter_layer affects screen space coordinates of child layers.
    • This is addressed in the implementation but needs to be tested more.

I can provide design doc if this seems useful.

Example screenshots showing actual screen updates using Quartz debug
Screenshot-DirtyRect-1
Screenshot-DirtyRect-2
Screenshot-DirtyRect-3

@flar
Copy link
Contributor

flar commented Sep 5, 2020

@knopp that sounds great so far. You might want to create a WIP PR with the changes so we can start a discussion about the implementation and what more might be needed. I'd love to see that design document as well.

How are you tracking changes (insertions, deletions, reorderings) to the children of a ContainerLayer?

@Noyon01631
Copy link

Good

@knopp
Copy link
Member

knopp commented Sep 5, 2020

@flar, the tracking is quite straightforward. It does some additional work during preroll in exchange for not having to diff a tree, just a set:

  • During preroll, every layer that draws any contents will report to DamageContext, which will build a set of LayerEntries for the frame.
  • each LayerEntry represents layer that draws content, has bounds in screen coordinates and equals to another LayerEntry iff:
    • it has equal bounds and
    • it has equal contents and
    • has same mutator stack

So we end up with set<LayerEntry> for every frame. To determine damage between two frames, we can simply do symmetric set difference of these two sets. This removes all layers that we know haven't changed between frames. The LayerEntries that we end up with are for layers that have either been added, removed, moved, or changed in some other way (i.e. mutators like alpha or different clipping shape). Each of these will contribute to damage area.

This has a nice benefit of easily being able to do diff between any frame in past for which we have set<LayerEntry> and current frame. Which may be handy for platforms that have multiple buffers.

Sidenote: If we knew that a layer (or at least a subrect) is fully opaque, during the first step we could easily determine occluded layers, which might be worth it for complex scenes.

@liyuqian
Copy link
Contributor Author

Thanks @knopp for implementing it! The overall direction looks right. Here are two suggestions that might be helpful:

1. Damage rect of two adjacent frames can derive the damage rect of multiple frames

Suppose there are frames F_0, F_1, F_2, ..., F_n. We only need to compute the damage rect D_1, D_2, ..., D_n where D_i is the damage rect between F_i and F_{i-1}. If one needs to figure out the overall damage rect among multiple frames F_a, F_{a+1}, ..., F_b, we can simply use the union of D_{a+1}, D_{a+2}, ..., D_b.

That's why we consider EGL_EXT_buffer_age useful. When F_n is rendered, we query the buffer age. Suppose the age is x. Then we need to compute the damage rect among F_{n-x}, F_{n-x+1}, ..., F_n and clip to it. We don't have to actually save all those frames F_i or their layers. We just store their damage rects, and return the union of D_{n-x+1}, D_{n-x+2}, ..., D_n.

Hopefully this

  • clarifies my original discussion with @drWulf :

    EGL_EXT_buffer_age will be the most useful thing. Using it, we can clip to the union of last buffer_age damage rects, and Skia
    will discard all draws outside of the clip. It's how Android's HWUI achieves its high performance.

  • and clears the confusion in

    As far as I can tell EGL_EXT_buffer_age is not going to help figure out which part of layer tree has changed from last frame and need to be re-rendered, that still needs to be addressed, right?

    (buffer age is not used to figure out the direct layer tree diff; it's used to calculate the union of damage rects.)

  • and answers @flar's question

    We can't assume that our screen buffer is stable between frames, so we'd have to potentially keep multiple previous layers and do a multi-frame diff. The platform can tell us how many buffers are in rotation and we need to be flexible to that.

    (we don't need to keep multiple previous layers; we just keep multiple previous damage rects)

2. Consider the two frame damage rect as a longest sub-sequence matching problem

As @flar pointed out in flutter/engine#21009, order matters so set comparison is insufficient. Hence we need to consider the sequence.

Consider the following classic algorithmic problem: given two strings (sequences of characters), what's the longest sub-sequence they could match. For example, "hello_world" and "hi_word" both matches a sub-sequence "h_wor". The unmatched is kind of like the damage "characters". Replace "characters" with "layers" would result in our problem.

In our case, we can flatten a layer tree to a layer sequence by pre-order traversals. Each layer can be abstracted as a rect and an id (all content, mutator stacks, ..., must match to have the same id). Then we want to compute the best layer sub-sequence match to minimize the damage rect which is the union of the rects of all unmatched layers.

An O(n^2) dynamic programming algorithm can be used to compute the optimal solution, or maybe we can use some O(n) greedy algorithms to compute an approximately good solution. I believe Flutter framework sometimes uses the following: match as many from the beginning, match as many from the end, and give up all the middle part. It would result in the linear reconciliation as described here. (CC @Hixie for confirmation.)

I think we can start with that simple linear greedy algorithm, and test the more sophisticated algorithm later once we have set up some integration tests (macrobenchmarks) as well as unit test benchmarks (microbenchmarks).

@knopp
Copy link
Member

knopp commented Sep 10, 2020

@liyuqian, thank you for the feedback. Regarding 2)

Any unmatched layer contributes to damage rect. For calculating of damage rect it doesn't really matter what order the unmatched layers were painted though, so here set comparison is enough.

The paint order is however relevant for layers that match between frames. If two layers intersect and match between frames, they need both need to contribute to damage area if they are painted in different order. I addressed that here after @flar's comment.

Can you give me any example of layer tree change between frames where this approach is not sufficient to calculate damage area?

Regarding 1) I agree that knowing the age of back buffer age is important, but I'm not sure that simply keeping the damage rect of frames in past is ideal. Suppose the following situation: You have 2 buffers, you add layer, show it for one frame and then remove it:

1 Empty Buffers

  BB        FB
+----+    +----+
|    |    |    |
+----|    +----|

2 Add layer and paint it

BB        FB
+----+    +----+
| XX |    |    |
+----|    +----|

3 Flip

BB        FB
+----+    +----+
|    |    | XX |
+----|    +----|

4 Remove layer, and paint it

BB        FB
+----+    +----+
|    |    | XX |
+----|    +----|

At step 4, if you do diff with previous frame + union with frame before that, you will need to paint the area below the removed layer. Whereas if you keep the FrameDescription for Frame n-2 you can diff it with current frame (n) and find out, that they are in fact identical and no painting needs to be done at all.

(this assumes situation where you flip entire framebuffers, with partial damage flip the situation is different)

@knopp
Copy link
Member

knopp commented Sep 10, 2020

Btw, I forgot to mention, in this branch , which is rebased on the wip pr, the macOS embedder is modified to only repaint damaged region and highlight the change (so it can be seen in practice, i.e. with flutter gallery)

@flar
Copy link
Contributor

flar commented Sep 10, 2020

It is true that simply accumulating the per-frame damage areas is enough for correctness, but it isn't optimal. However, I don't believe there are many practical cases where it is sub-optimal. The "layer comes and then goes" example defines the problem, but in practice layers don't appear and disappear in adjacent frames. Most layers that appear animate in and out so the appearance and disappearance are spread over a few frames that are going to be longer than the history of back buffers. I am not 100% certain on whether small elements like text cursors fade in or out, but even if they did no fading, their area is so small that the accumulation of damage and repainting just that tiny area would be fine.

Another technique that can be used is to render lightweight annotations like a text cursor on another layer and then we simply animate the appearance and disappearance of that additional item in the annotation layer without having to redraw the tree. Unfortunately, this constrains the app design as this technique doesn't work as well if the text cursor is on a layer that has other layers overlapping it.

@knopp
Copy link
Member

knopp commented Sep 10, 2020

Storing the frame description from past frames shouldn't bring much overhead, it does retain layers but judging from gallery most of them are either retained anyway or physicalshapelayers, which don't have lot of state. Diffing the tree is more or less linear operation (detecting reordered frames would be O(n^2) in worst case where all matching frames are painted in reverse order, which is unlikely to happen).

But I'm not feeling too strongly about this. The major usa cases are things like progress indicator in otherwise static page, scrolling small listview in large window or blinking text cursor, and I agree that here it won't make any meaningful difference.

@flar
Copy link
Contributor

flar commented Sep 10, 2020

One can treat re-ordered elements as "missing and new" flexibly in a single pass that only matches ordered elements. You have to use a "diff" algorithm because the first elements that you encounter that don't match could be due to a missing element in either tree and you can't determine that without some further traversal.

For a prototype that I wrote I basically just did a "match from the beginning + match from the end" on each container and treated all children between those points as damage. This is precisely what you need for an added child (or a few children added together) and precisely what you need for a deleted child (or a few children deleted together). I'm not sure that shuffling children is a common enough app design that we need to do better than that, especially for a first pass.

Here is a diff of the (massively incomplete) prototype I was working on last November (note that I believe that those changes to make ImageFilter comparable and therefore stable between frames are already in the tree pushed separately in preparation for this planned work)...
flutter/engine@master...flar:backdrop-dirty-region-tracking

@knopp
Copy link
Member

knopp commented Sep 10, 2020

@flar, in my implementation I first first detect which layer were added, removed or otherwise changed (i.e. moved). Those all contribute to dirty area. Those are layers for which we can't find matching LayerEntry between frames.

After that I'm left with two sets of layers, which have same elements, just the (paint) order is different. Determining which are reordered is simple, at that point there are no added / removed layers. knopp/engine@b50df38#diff-32e5e35893c1cb045caca1c8fcbcff8eR139

As for image filter / backdrop layer, I serialize the filter and compare serialized data to detect if layer is changed. I also adjust paint bounds of child layers (image filter) and layers below (backdrop filter) to account for additional area (i.e. blur)

@knopp
Copy link
Member

knopp commented Mar 21, 2021

@flar, I've added initial PR for wiring required for partial update on iOS and Android. There are couple if issue to resolve:

  1. How would we go about optionally enabling this? What would the configuration look like?
  2. I have an old android (6.0) device that claims to support all relevant GLES extension, but it glitches; so perhaps this should be limited to newer versions (i.e. API level 26+)
  3. EmbeddedExternalView unconditionally clears the surface without any clip. This interferes with partial redraw.

@flar
Copy link
Contributor

flar commented Mar 22, 2021

@knopp I would ask those questions in the PR. I've added some reviewers that I think might have some good answers for the various issues.

What glitches do you see on 6.0 devices?

@knopp
Copy link
Member

knopp commented Mar 22, 2021

@knopp I would ask those questions in the PR. I've added some reviewers that I think might have some good answers for the various issues.

What glitches do you see on 6.0 devices?

I only have one 6.0 device. The glitches I'm seeing is that the content outside of clip area (outside area specified by EGL_KHR_partial_update) is not preserved and turns into garbage after few frames. It doesn't happen in any of my other devices. I'll see if I can get another phone with Android 6 to test this.

@Hixie Hixie moved this from Reduce painting to Ongoing projects in Early-onset jank Mar 29, 2021
@Hixie Hixie moved this from Ongoing projects to On deck for Engine in Early-onset jank Mar 29, 2021
@kf6gpe
Copy link
Contributor

kf6gpe commented Jun 17, 2021

Escalating priority again so that the engine team can discuss when it meets next week per discussion with Google team interested in this work.

@kf6gpe kf6gpe added P2 and removed P3 Issues that are less important to the Flutter project labels Jun 17, 2021
@knopp
Copy link
Member

knopp commented Jun 17, 2021

Just an update on things DiffContext landed flutter/engine#21824 and the PR to wire is flutter/engine#25158, though I haven't had much time to test it after rebasing, and there are still unresolved issues, hence the WIP.

@kf6gpe
Copy link
Contributor

kf6gpe commented Jun 18, 2021

Excellent news! Thanks!

@chinmaygarde
Copy link
Member

Thanks for the update. Ray will work on gathering additional traces from the customer as well.

@chinmaygarde chinmaygarde added P2 Important issues not at the top of the work list and removed P2 labels Jun 21, 2021
@zanderso zanderso moved this from On deck for Engine to Ongoing projects in Early-onset jank Aug 29, 2021
@aytunch
Copy link

aytunch commented Dec 5, 2021

@knopp can we see this optimization with the next Flutter stable release?

Also do you have any statistics for fps, jank, cpu usage and battery consumption of the Flutter sample app before and after this PR?

I wonder if calculating the diff takes too much time or not.

Anyhow thank you very much for trying to increase performance on Flutter. We desperately need it because we are a video feed app and our battery usage is unacceptable:/

@linxuebin1990
Copy link
Member

@knopp @flar Why Disable partial repaint for devices older than Android 9 ?
https://github.com/flutter/engine/blob/main/shell/platform/android/android_context_gl.cc

   if (GetAPILevel() < 28) {
      // Disable partial repaint for devices older than Android 9. There
      // are old devices that have extensions below available but the
      // implementation causes glitches (i.e. Xperia Z3 with Android 6).
      partial_redraw_supported_ = false;
      return;
    }

We can see in the Android source code that Android 6.0 started to use eglSwapBuffersWithDamageKHR for swapbuffer. eglSetDamageRegionKHR was also introduced in Android 7.0.

eglSwapBuffersWithDamageKHR: https://cs.android.com/android/platform/superproject/+/android-6.0.0_r1:frameworks/base/libs/hwui/renderthread/EglManager.cpp;l=282
eglSetDamageRegionKHR: https://cs.android.com/android/platform/superproject/+/android-7.0.0_r1:frameworks/base/libs/hwui/renderthread/EglManager.cpp;l=319

@knopp
Copy link
Member

knopp commented Mar 24, 2022

@knopp @flar Why Disable partial repaint for devices older than Android 9 ?

I've had glitches with some devices on old android versions, but those seems to have been resolved with adding support for quadruple buffering and specifying larger clip alignment. Still, given the heterogeneity of android devices we decided to only enable it for newer API levels and possibly revisit the threshold later.

@linxuebin1990
Copy link
Member

linxuebin1990 commented Mar 24, 2022

@knopp @flar Why Disable partial repaint for devices older than Android 9 ?

I've had glitches with some devices on old android versions, but those seems to have been resolved with adding support for quadruple buffering and specifying larger clip alignment. Still, given the heterogeneity of android devices we decided to only enable it for newer API levels and possibly revisit the threshold later.

@knopp Thanks Reply.
This feature is very exciting, is it possible to add a parameter to enable or not set by the developer of flutter ?
For example, a certain flutter app can be tested on several models of the top and found that there is no problem, then this function can be turned on.

@Hixie
Copy link
Contributor

Hixie commented May 16, 2022

(As I understand it, the current status here is that this is implemented for Android and iOS but not yet the other platforms.)

@zanderso
Copy link
Member

From new feature request scan: The missing pieces on desktop platforms are around wiring up through the embedder API. Since we have issues filed for those elsewhere, I will close this issue.

@aytunch
Copy link

aytunch commented Nov 11, 2022

@zanderso Can you please link those issues so we can follow? Thanks

@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
c: new feature Nothing broken; request for a new capability c: performance Relates to speed or footprint issues (see "perf:" labels) customer: dream (g3) engine flutter/engine repository. See also e: labels. framework flutter/packages/flutter repository. See also f: labels. P2 Important issues not at the top of the work list
Projects
Early-onset jank
Ongoing projects
Development

No branches or pull requests