Dynamically determine whether to use offscreen surface based on need #13976

Merged
merged 11 commits into flutter:master Nov 27, 2019

Conversation

@flar (Contributor) commented Nov 22, 2019

Keep track of layers that perform operations that read back from the surface they are rendering onto (primarily BackdropFilter) and provide that information to the platform in AcquireSurface so that the appropriate (offscreen vs screen) layer type can be chosen.
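
For illustration, a minimal C++ sketch of the kind of bookkeeping this describes, assuming a boolean flag aggregated up the layer tree as the scene is built; the names here (reads_surface and friends) are hypothetical stand-ins, not the engine's actual API:

    // Hypothetical sketch: each layer reports whether painting it reads back
    // pixels already rendered to the target surface; containers aggregate
    // that flag from their children as the scene is built.
    #include <memory>
    #include <vector>

    class Layer {
     public:
      virtual ~Layer() = default;
      // True if painting this layer samples the surface it renders onto
      // (e.g. a backdrop filter blurring what is behind it).
      virtual bool reads_surface() const { return false; }
    };

    class BackdropFilterLayer : public Layer {
     public:
      bool reads_surface() const override { return true; }
    };

    class ContainerLayer : public Layer {
     public:
      void Add(std::shared_ptr<Layer> layer) {
        child_reads_surface_ |= layer->reads_surface();
        layers_.push_back(std::move(layer));
      }
      bool reads_surface() const override { return child_reads_surface_; }

     private:
      std::vector<std::shared_ptr<Layer>> layers_;
      bool child_reads_surface_ = false;
    };

    // At frame time, the platform can then choose the drawable type before
    // any painting happens:
    //   bool needs_offscreen = root_layer->reads_surface();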

This change should have a dramatic effect on the energy usage of simple apps on iOS when no BackdropFilter is present. There are some caveats though:

  • Obviously, if the app uses a BackdropFilter anywhere in the scene, it will not benefit from this change.
  • Cupertino widgets use BackdropFilters by default in some cases, so many Cupertino apps will not benefit.
  • The best place to make the decision would be after a PreRoll pass, since some conditions that could help us eliminate the offscreen cannot be determined until PreRoll figures out which layers can be cached. Unfortunately, the choice of surface is made before PreRoll is called, and moving it would require quite a bit of surgery to the basic frame pumping (see the sketch after this list).
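
To make the third caveat concrete, a simplified, illustrative sketch of the ordering problem; the types here are stand-ins, and only the acquire-before-PreRoll ordering reflects the engine:

    #include <memory>

    // Illustrative stand-ins for the engine's Surface / LayerTree types.
    struct SurfaceFrame { bool offscreen = false; };

    struct Surface {
      // The surface type must be committed here, at acquire time.
      std::unique_ptr<SurfaceFrame> AcquireFrame(bool needs_readback) {
        auto frame = std::make_unique<SurfaceFrame>();
        frame->offscreen = needs_readback;
        return frame;
      }
    };

    struct LayerTree {
      // PreRoll could discover (e.g. via layer caching) that no readback is
      // actually needed -- but by then the frame is already acquired.
      void Preroll(SurfaceFrame&) {}
      void Paint(SurfaceFrame&) {}
    };

    void PumpFrame(Surface& surface, LayerTree& tree, bool scene_reads_back) {
      // This PR's approach: decide from a flag computed at scene-build time.
      auto frame = surface.AcquireFrame(scene_reads_back);
      tree.Preroll(*frame);  // too late to change the surface type here
      tree.Paint(*frame);
    }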

There is still some testing to be done, but the basic testing that the offscreen layer is only used when there is a BackdropFilter is complete (for one or two simple examples so far), and the energy usage measured by the Xcode tools shows an improvement on, for instance, a simple LinearProgressIndicator app.

This change should provide some needed relief for flutter/flutter#31865

@flar (Contributor, Author) commented Nov 22, 2019

Notes to self - to be tested:

  • Verify correct behavior for apps that use SceneBuilder.addRetained()
  • Verify no performance regression for BackdropFilters (Done - see comment below)
  • Double check energy savings with battery run-down tests (Done - see data in comment below)
  • Gather numbers for energy usage from a variety of apps (LinearProgressIndicator app so far)
  • Write tests for the new status flags (Done - in latest commit)

@chinmaygarde (Member) left a comment

Exciting and a very necessary optimization!

I was wondering why determining whether the tree reads from the surface could not be done in the preroll. I suppose that's because the preroll happens after the surface frame has been acquired (please correct me if I am wrong). But there should be a reason the preroll needs a surface frame. If we moved it to be before the frame was acquired, could this be moved into the preroll?

BTW, testing this may be a whole lot easier with the patch under review (#13949) that greatly fleshes out the Flow test harness.

@flar (Contributor, Author) commented Nov 22, 2019

> I was wondering why determining whether the tree reads from the surface could not be done in the preroll. I suppose that's because the preroll happens after the surface frame has been acquired (please correct me if I am wrong). But there should be a reason the preroll needs a surface frame. If we moved it to be before the frame was acquired, could this be moved into the preroll?

That is correct, and I alluded to it in the main description. Determining how to reorder those basic frame-pumping methods (Surface::AcquireFrame vs. Layer::PreRoll) seemed outside the scope of this fix. In fact, worrying about that issue had kept me from realizing and trying this slimmer version until now, as I wasn't yet familiar enough with the engine to perform that kind of surgery. This approach, which happens entirely during the scene-building phase, gets us 75-90% of the way there (depending on how much the caveats water down its effectiveness) without having to reorder the way frames are pumped. I also think I've engineered it in a way that can naturally flow into a more dynamic form, letting us determine the surface type after PreRoll once someone figures out how to move AcquireSurface to after PreRoll.

@flar (Contributor, Author) commented Nov 22, 2019

Testing the Gallery example app in the Flutter framework, I can see the offscreen come and go depending on whether I am viewing a Cupertino example. Most of the app is written in Material, but there is a Cupertino section. The list of Cupertino widgets is also shown in a Material list, but if you click on, say, the first item in the list (Alerts), you are taken to a page implemented in a Cupertino Page widget, and you can see the offscreen show up in the Observatory until you leave the Cupertino page. This is due to some of the Cupertino widgets installing a BackdropFilter widget automatically.

You can tell whether the offscreen is there by zooming in on the Observatory timeline. Look for two or three boxes under the PipelineConsume box in the Pipeline Item section. You should always see a "LayerTree::Paint" event box followed by an "SkCanvas::Flush" event. If we are painting to an offscreen surface, you should also see a "CopyTextureOnscreen" event between them. If that event doesn't appear between the two, then we are rendering directly to the screen.

@flar flar requested a review from Hixie November 22, 2019 20:48
@flar (Contributor, Author) commented Nov 22, 2019

Some data from the simple LinearProgressIndicator app.

                         before fix      after fix
                         ----------      ---------
Xcode energy rating:     Very High          Low
(within/for)              < 40sec         > 10min

GPU gauge run 1:           14.2%           8.4%
GPU gauge run 2:           16.2%           8.4%
GPU gauge run 3:           15.6%           8.6%
GPU gauge run 4:           15.6%           8.6%
GPU gauge average:         15.4%           8.5%    (45% reduction)

CPU gauge run 1:           20.2%           16.5%
CPU gauge run 2:           19.7%           16.3%
CPU gauge run 3:           19.8%           17.2%
CPU gauge run 4:           20.0%           17.0%
CPU gauge average:         19.9%           16.8%   (16% reduction)

Combined GPU/CPU                                   (53% reduction)

Real-life battery drain:    40%             34%    (15% reduction)
Compared to blank screen:   16%             10%    (37.5% reduction)

Source for the app:

    import 'package:flutter/material.dart';

    // 'DemoApp' is an arbitrary wrapper name; the measured UI is the
    // MaterialApp below, with imports added to make the snippet runnable.
    void main() => runApp(DemoApp());

    class DemoApp extends StatelessWidget {
      @override
      Widget build(BuildContext context) {
        return MaterialApp(
          title: 'Demo',
          home: Scaffold(
            body: Center(
              child: LinearProgressIndicator(),
            ),
          ),
        );
      }
    }

@flar (Contributor, Author) commented Nov 25, 2019

Here are the summaries from two runs of the backdrop_filter_perf benchmark.

The difference in average frame build time was about 3µs, or about 0.5%.
The difference in average frame rasterizer time was about 0.2ms, or about 0.14%.
Both differences were well within run-to-run variance.

baseline_summary:
  "average_frame_build_time_millis": 0.6772058823529411,
  "90th_percentile_frame_build_time_millis": 0.892,
  "99th_percentile_frame_build_time_millis": 0.993,
  "worst_frame_build_time_millis": 0.993,
  "missed_frame_build_budget_count": 0,
  "average_frame_rasterizer_time_millis": 145.6070909090909,
  "90th_percentile_frame_rasterizer_time_millis": 146.794,
  "99th_percentile_frame_rasterizer_time_millis": 147.628,
  "worst_frame_rasterizer_time_millis": 147.628,
  "missed_frame_rasterizer_budget_count": 33,
  "frame_count": 34,
layer-readback-flag_summary:
  "average_frame_build_time_millis": 0.680705882352941,
  "90th_percentile_frame_build_time_millis": 0.907,
  "99th_percentile_frame_build_time_millis": 1.112,
  "worst_frame_build_time_millis": 1.112,
  "missed_frame_build_budget_count": 0,
  "average_frame_rasterizer_time_millis": 145.40757575757576,
  "90th_percentile_frame_rasterizer_time_millis": 147.95,
  "99th_percentile_frame_rasterizer_time_millis": 148.248,
  "worst_frame_rasterizer_time_millis": 148.248,
  "missed_frame_rasterizer_budget_count": 33,
  "frame_count": 34,

@flar flar force-pushed the offscreen-based-on-layer-readback branch from 155eb4b to 2fd78cc November 26, 2019 01:46
@Hixie (Contributor) commented Nov 27, 2019

Would be good to have some correctness tests (tests that verify we're outputting the right pixels) in dynamic situations (where we're dynamically changing the scene from one which can benefit from this optimization to one that can't and back again).

In general, this seems fine, with the caveat that I am not very familiar with this code.

@flar (Contributor, Author) commented Nov 27, 2019

> Would be good to have some correctness tests (tests that verify we're outputting the right pixels) in dynamic situations (where we're dynamically changing the scene from one which can benefit from this optimization to one that can't and back again).
>
> In general, this seems fine, with the caveat that I am not very familiar with this code.

Theoretically true, but the interactions don't change what we render, just the type of drawable chosen. None of our code makes any different choices as a result, and we have plenty of cases where we use an offscreen to perform rendering and stake on Skia's reputation that it will be identical to the screen. Isn't this essentially covered by the many golden tests we already have in the system, which were all based on iOS rendering to an offscreen previously, and now those same golden image tests will be rendering directly to the screen? If there is a golden image failure, then we would essentially have our answer.

Or am I misunderstanding the suggestion?

@liyuqian (Contributor) commented

> > Would be good to have some correctness tests (tests that verify we're outputting the right pixels) in dynamic situations (where we're dynamically changing the scene from one which can benefit from this optimization to one that can't and back again).
> >
> > In general, this seems fine, with the caveat that I am not very familiar with this code.
>
> Theoretically true, but the interactions don't change what we render, just the type of drawable chosen. None of our code makes any different choices as a result, and we have plenty of cases where we use an offscreen to perform rendering and stake on Skia's reputation that it will be identical to the screen. Isn't this essentially covered by the many golden tests we already have in the system, which were all based on iOS rendering to an offscreen previously, and now those same golden image tests will be rendering directly to the screen? If there is a golden image failure, then we would essentially have our answer.
>
> Or am I misunderstanding the suggestion?

Here's a quick comment on golden tests and iOS (I didn't look into the details of this discussion): all our current golden tests use the software backend and run on Linux instead of iOS. Hence, golden tests probably aren't guarding the changes in this PR.

@flar (Contributor, Author) commented Nov 27, 2019

> Here's a quick comment on golden tests and iOS (I didn't look into the details of this discussion): all our current golden tests use the software backend and run on Linux instead of iOS. Hence, golden tests probably aren't guarding the changes in this PR.

Wouldn't that also mean that they can't be used to test the changes in the PR in the first place, as was the suggestion?

@liyuqian (Contributor) left a comment

LGTM. Thanks for the great work! Now let's wait for merging and rolling, and watch the CPU/GPU usage metrics drop!

Also, I don't think we currently have a device-lab benchmark that covers the BackdropFilter-after-saveLayer optimization. It might be nice to add a test to verify it and make sure that no one accidentally breaks this optimization in the future (created issue flutter/flutter#45731).

@liyuqian (Contributor) commented

> > Here's a quick comment on golden tests and iOS (I didn't look into the details of this discussion): all our current golden tests use the software backend and run on Linux instead of iOS. Hence, golden tests probably aren't guarding the changes in this PR.
>
> Wouldn't that also mean that they can't be used to test the changes in the PR in the first place, as was the suggestion?

Correct; I don't think we have any correctness tests yet that exercise real-device GPU rendering. That was very difficult before @Piinks's Skia Gold work. Now, with Skia Gold, I think it's possible, although I'm not sure what the priority of having that done would be.

@liyuqian (Contributor) commented

@flar: BTW, I think Ian's comment might be to design a performance test that verifies you can disable the optimization after introducing a BackdropFilter, and bring it back when that BackdropFilter is gone. I believe the best place for that test is our device lab rather than the engine repo, because we don't have iOS devices in the engine repo.

@flar (Contributor, Author) commented Nov 27, 2019

OK, I'm happy to address any testing of these changes under flutter/flutter#45731

@flar flar merged commit a86ef94 into flutter:master Nov 27, 2019
@Hixie (Contributor) commented Nov 27, 2019

I just don't want there to be a bug where a particular sequence of events causes us to paint the wrong thing because we're optimising it away or some such. I highly doubt that we have any tests that try to explicitly exercise this optimisation.

@liyuqian (Contributor) commented

On second thought, I think there might be a correctness test with some compromises:

  1. Write some kind of ShellTest that mocks the iOS setup to trigger UseOffscreenSurface. You can already see some tests in that cc file that construct a customized layer tree.
  2. The ShellTest will use SwiftShader to emulate GPU (OpenGL or Vulkan) draw calls on the CPU, and we can take a surface snapshot for correctness comparison. This won't test the exact iOS behaviors, but at least it can guarantee that we don't have some very basic OpenGL issues (see the sketch after this list).
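
For step 2, a rough sketch of what the snapshot comparison could look like, with a Skia raster surface standing in for the SwiftShader-backed GPU surface; the function and the scene painting are illustrative, not the actual harness code:

    // Hypothetical sketch of the snapshot comparison: render the same scene
    // through the onscreen and offscreen paths and require identical pixels.
    #include "third_party/skia/include/core/SkBitmap.h"
    #include "third_party/skia/include/core/SkCanvas.h"
    #include "third_party/skia/include/core/SkSurface.h"

    static SkBitmap Snapshot(bool use_offscreen_path) {
      // A raster surface stands in for the SwiftShader-backed GPU surface.
      sk_sp<SkSurface> surface = SkSurface::MakeRasterN32Premul(100, 100);
      SkCanvas* canvas = surface->getCanvas();
      canvas->clear(SK_ColorWHITE);
      (void)use_offscreen_path;  // would select the rendering path here
      // ... paint the layer tree under test into |canvas| ...
      SkBitmap bitmap;
      bitmap.allocN32Pixels(100, 100);
      surface->readPixels(bitmap, 0, 0);
      return bitmap;
    }

    // The correctness assertion would then be roughly:
    //   SkBitmap onscreen = Snapshot(false);
    //   SkBitmap offscreen = Snapshot(true);
    //   ASSERT_EQ(0, memcmp(onscreen.getPixels(), offscreen.getPixels(),
    //                       onscreen.computeByteSize()));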

CC @chinmaygarde for further comments as he's currently refactoring ShellTest.

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Nov 28, 2019
@chinmaygarde (Member) commented

Guidance on writing a test for this: After #14023, you should be able to create a shell by explicitly specifying kClientRenderingAPIOpenGL. This will create a ShellTestSurfaceGL, which vends instances of GPUSurfaceGL. In the unit-test harness, IOSSurfaceGL and ShellTestSurfaceGL are analogous. The UseOffscreenSurface call is on the former, and any test that checks that this optimization is in effect will need to check the value of that accessor (among other things). Since both SurfaceGL variants inherit from GPUSurfaceGLDelegate, you can move the UseOffscreenSurface call one level up.

After this bit of setup, you can update the ShellTest fixture to give the test a chance to introspect the delegate before each frame (via a callback is fine). That's where your test assertions will go. Next, since the shell_unittests harness is already capable of running Dart code, write two methods that build scenes with and without the need for a render-buffer readback. In your assertions, make sure UseOffscreenSurface is what you expect for each scene. You should be able to do this before #14023 lands as well, but that patch will make the setup steps a whole lot easier.

Since this optimization should not affect the pixels rendered (just performance on certain platforms), I am ambivalent about the value of comparing pixels. But that should be very straightforward in the ShellTest harness as well. We do extensive pixel tests in the embedder_unittests harness, and we can move the GTEST pixel-assertion predicates to //flutter/testing to make that less verbose in shell_unittests too.
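
A rough GTest sketch of the shape such a test could take. Everything below is hypothetical scaffolding standing in for the real ShellTest fixture and delegate wiring; only the idea of asserting on a UseOffscreenSurface-style accessor comes from the guidance above:

    // Hypothetical sketch: assert that the offscreen surface is requested
    // only when the scene contains a readback (e.g. a BackdropFilter), and
    // that the decision flips back when the filter is removed.
    #include "gtest/gtest.h"

    // Stand-in for the scene built by the Dart-side fixture methods.
    struct FakeScene {
      bool has_backdrop_filter = false;
      bool ReadsSurface() const { return has_backdrop_filter; }
    };

    // Stand-in for the GPUSurfaceGLDelegate-level accessor described above.
    class FakeSurfaceDelegate {
     public:
      void PrepareFrame(const FakeScene& scene) {
        use_offscreen_ = scene.ReadsSurface();
      }
      bool UseOffscreenSurface() const { return use_offscreen_; }

     private:
      bool use_offscreen_ = false;
    };

    TEST(OffscreenSurfaceTest, OffscreenTracksSceneReadback) {
      FakeSurfaceDelegate delegate;

      FakeScene plain_scene;  // no readback needed
      delegate.PrepareFrame(plain_scene);
      EXPECT_FALSE(delegate.UseOffscreenSurface());

      FakeScene filtered_scene;  // scene containing a BackdropFilter
      filtered_scene.has_backdrop_filter = true;
      delegate.PrepareFrame(filtered_scene);
      EXPECT_TRUE(delegate.UseOffscreenSurface());

      // The dynamic case: the filter goes away again.
      delegate.PrepareFrame(plain_scene);
      EXPECT_FALSE(delegate.UseOffscreenSurface());
    }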

I highly recommend testing this, however. This patch introduces no tests that would fail before the optimization but pass now (and would fail again in case of regression). Also, this is a fairly intricate implementation detail, and higher-level tests will have a hard time reliably exercising the case being optimized. The optimization is also platform-specific. We can still have performance tests, but having another test that reliably verifies the optimization is in effect would surely make me feel a lot better.

skia-flutter-autoroll pushed a commit to flutter/flutter that referenced this pull request Nov 29, 2019
* a86ef94 Dynamically determine whether to use offscreen surface based on need (flutter/engine#13976)

* 0fc7867 Roll src/third_party/dart 96e7a4ff30..73fdf19b56 (3 commits) (flutter/engine#14063)

* 6c605f8 Fix fml_unittests (flutter/engine#14062)
flar added a commit that referenced this pull request Dec 2, 2019
liyuqian added a commit to liyuqian/engine that referenced this pull request Dec 3, 2019
liyuqian added a commit that referenced this pull request Dec 3, 2019
* Revert "Add flow test fixtures and tests (#13986)"

This reverts commit 620f528.

* Revert "Dynamically determine whether to use offscreen surface based on need (#13976)"

This reverts commit a86ef94.
filmil pushed a commit to filmil/engine that referenced this pull request Mar 13, 2020
* Revert "Add flow test fixtures and tests (flutter#13986)"

This reverts commit 620f528.

* Revert "Dynamically determine whether to use offscreen surface based on need (flutter#13976)"

This reverts commit a86ef94.