[Impeller] Encode render passes concurrently on iOS. #42028

Merged
merged 23 commits into flutter:main on May 23, 2023

Conversation

jonahwilliams (Member)

Allows pushing encoding of command buffers to a worker thread, relying on the fact that these buffers are always scheduled in the order that they are enqueued. This follows the guidelines from https://developer.apple.com/documentation/metal/mtlcommandbuffer?language=objc
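A minimal sketch of the pattern; command_queue and worker_task_runner are illustrative names, not the PR's actual members. The buffer is enqueued on the calling thread to fix its position in the queue, and encoding plus commit then happen on a worker:

    // Sketch only. Metal executes enqueued buffers in enqueue order,
    // regardless of which thread later commits them.
    id<MTLCommandBuffer> buffer = [command_queue commandBuffer];
    [buffer enqueue];  // Reserve this buffer's slot, on the calling thread.

    worker_task_runner->PostTask([buffer]() {
      // Encoding work for this buffer happens here, off the raster thread,
      // before the commit.
      [buffer commit];
    });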

@@ -63,16 +63,18 @@
 PlaygroundImplMTL::PlaygroundImplMTL(PlaygroundSwitches switches)
     : PlaygroundImpl(switches),
       handle_(nullptr, &DestroyWindowHandle),
-      data_(std::make_unique<Data>()) {
+      data_(std::make_unique<Data>()),
+      concurrent_loop_(fml::ConcurrentMessageLoop::Create()) {
Member Author

This is more or less duplicating the problem that the VKContext has, wherein we should only be creating a single concurrent message loop per engine.

Member

FWIW, we reworked ContextVK so that it supports sharing a concurrent message loop. There's no known problem; this came from Chinmay's design goal of wanting to share these. I'm not sure what that means for your PR; I'm just clarifying that point.

Member Author

I'll look at ContextVK then; when I started this PR, ContextVK was still creating its own message loop.

Member Author

ContextVK is still creating its own concurrent message loop?

workers_(fml::ConcurrentMessageLoop::Create()) {

Member

ContextVK takes in a shared_ptr to a task queue which can be shared across multiple engines, but currently is not:

std::shared_ptr<fml::ConcurrentTaskRunner> worker_task_runner;

I had a PR out that made ContextVK own a single concurrent message loop, but we decided we didn't want to shut the door on sharing concurrent message loops between engines. I haven't read this PR through; I'm just responding to the comment "the problem that the VKContext has, wherein we should only be creating a single concurrent message loop per engine". There is no problem that I know of with Vulkan, and we made sure of that with our recent work. Let me know if you want me to review this PR if you think it would be helpful.

   if (!context) {
     return false;
   }
+  [buffer_ enqueue];
Member Author

Enqueuing on the main thread ensures the order.

@@ -179,6 +179,7 @@
   }
 }

+#if ((FML_OS_MACOSX && !FML_OS_IOS) || FML_OS_IOS_SIMULATOR)
Member Author

I observed using Wonderous that the waitUntilScheduled call was only necessary on the simulator and (speculatively) on macOS.

Contributor

Is that with this patch or without?

If you've observed this without the changes in this patch, can we move this to a separate patch?

Member Author

Yes, I can split this out, though we need both of these changes to see much of an improvement; otherwise the final scheduling just takes longer.

Member Author

removed

Member Author

moved here: #42160

@@ -171,6 +176,53 @@ static bool LogMTLCommandBufferErrorIfPresent(id<MTLCommandBuffer> buffer) {
   return true;
 }

+bool CommandBufferMTL::SubmitCommandsAsync(
+    std::shared_ptr<RenderPass> render_pass) {
Member Author

Currently this only works for the render pass, but it could optionally take the blit and compute passes too. I moved this functionality to the command buffer to make ownership of the render pass data more obvious: the closure keeps the shared_ptr for the render pass alive, while the command buffer itself only requires the buffer.
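
A minimal sketch of that ownership pattern; worker_task_runner_ is an illustrative member name and the encoding call itself is elided:

    bool CommandBufferMTL::SubmitCommandsAsync(
        std::shared_ptr<RenderPass> render_pass) {
      // The closure owns the shared_ptr, keeping the pass (and the data it
      // references) alive until encoding completes; the command buffer only
      // contributes the Metal buffer handle.
      auto task = fml::MakeCopyable([render_pass, buffer = buffer_]() {
        // Encode render_pass into buffer here, then commit.
        [buffer commit];
      });
      worker_task_runner_->PostTask(task);
      return true;
    }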

+  // Render command encoder creation has been observed to exceed the stack size
+  // limit for worker threads, and therefore is intentionally constructed on the
+  // raster thread.
+  auto render_command_encoder =
Member Author

This was a frustrating discovery because it prevents us from doing the obvious thing and just calling RenderPass::Encode in the closure.

@chinmaygarde (Member)

Exciting stuff. Looking 👀

+  if (!IsValid() || !render_pass->IsValid()) {
+    return false;
+  }
+  auto context = context_.lock();
Contributor

I think this is more or less what ended up being the fix for problems around ownership in the Vulkan backend.


+  // Render command encoder creation has been observed to exceed the stack size
+  // limit for worker threads, and therefore is intentionally constructed on the
+  // raster thread.
Contributor

Can we not bump the stack size limit for worker threads?

Member Author

How can I try that?

Contributor

It's platform-specific code; we'd need a pthreads and a Win32 implementation at the least.
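
For reference, a minimal sketch of the pthreads half; the 4 MiB size and WorkerEntry function are illustrative assumptions, not an existing fml API:

    #include <pthread.h>

    static void* WorkerEntry(void* arg) {
      // The worker's task loop would run here.
      return nullptr;
    }

    void SpawnWorkerWithLargeStack() {
      pthread_attr_t attr;
      pthread_attr_init(&attr);
      // Request a larger stack than the platform default before spawning.
      pthread_attr_setstacksize(&attr, 4 * 1024 * 1024);
      pthread_t thread;
      pthread_create(&thread, &attr, WorkerEntry, nullptr);
      pthread_attr_destroy(&attr);
    }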

+  }
+
+  auto task = fml::MakeCopyable(
+      [render_pass, buffer, render_command_encoder, context]() {
Contributor

Should we instead capture the weak_ptr to context and make sure it's still lockable in the callback here?

The advantage being that if the context has otherwise gone away, we avoid doing work.

Otherwise, I think we need to consult the GPU sync switch in here to make sure we're not doing encoding work when the GPU is unavailable.

Member Author

Yeah, capturing the context seems reasonable. I'm not sure what the buffer itself will do; I guess we should probably assume it won't handle anything gracefully.

Member Author

Done

@@ -42,7 +45,8 @@ @implementation FlutterDarwinContextMetalImpeller
 - (instancetype)init {
   self = [super init];
   if (self != nil) {
     _context = CreateImpellerContext();
+    _workers = fml::ConcurrentMessageLoop::Create();
Contributor

We shouldn't create a message loop here. Instead, we should be getting the concurrent loop the engine creates already and passing it in as a parameter here.

That means we should mark the default initializer as unavailable and make a new one like initWithMessageLoop, or alternatively just initWithTaskRunner since that's all we actually need here.
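
A sketch of what the suggested interface might look like; initWithTaskRunner: is the hypothetical name proposed above, not an existing API:

    // Hypothetical replacement initializer: the default init is disallowed so
    // callers must supply the engine's existing task runner.
    - (instancetype)init NS_UNAVAILABLE;
    - (instancetype)initWithTaskRunner:
        (std::shared_ptr<fml::ConcurrentTaskRunner>)taskRunner
        NS_DESIGNATED_INITIALIZER;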

Member Author

I'm not sure exactly how to do this. It looks like ImpellerContext is set up without any references to the current engine. Is this something we can/should change, or should I look at providing the message loop some other way?

Contributor

I think it'd flow through PlatformViewIOS (ctor) -> IOSContext::Create -> IOSContextMetalImpeller (ctor)

Looks like there's something left to figure out there, though, and we're not doing this on Android right now (we create a special concurrent loop for the context to use there in android_context_vulkan_impeller.cc).

Maybe for now we can file a bug for this. We shouldn't really be creating multiple concurrent message loops. But there might need to be some refactoring in the shell/platformview/engine setup to make sure that ownership and sharing of the loop is handled properly.

Contributor

Filed flutter/flutter#127160. I'm ambivalent about whether resolving that blocks this or not, but if it turns out not to be too hard to resolve, we should do it before creating another violation of it here.

Member Author

I think we should figure out the concurrent loop stuff first.

@dnfield (Contributor) left a comment

Things that seem dangerous to me:

  • Doing encoding work on another thread without checking whether the GPU is available. I think if we don't keep the context alive, the context will do that job for us, but I might be wrong.
  • Creating another message loop. We should avoid that and just use the existing one.

@jonahwilliams (Member Author)

@dnfield PTAL, I followed your example and updated this to use the existing concurrent message loop.

+      [render_pass, buffer, render_command_encoder, weak_context = context_]() {
+        auto context = weak_context.lock();
+        if (!context) {
+          return;
Contributor

I'm realizing this will not protect us from potentially encoding commands in the background.

We need the GPU sync switch in here so we can fizzle if GPU access is disabled. That's available via Shell::GetIsGpuDisabledSyncSwitch. I think it would make sense to expose that on ContextMTL.
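
A sketch of how the closure might fizzle, assuming the switch is exposed on ContextMTL as suggested (GetIsGpuDisabledSyncSwitch on ContextMTL is the proposal here, not an existing method):

    auto task = fml::MakeCopyable(
        [render_pass, buffer, weak_context = context_]() {
          auto context = weak_context.lock();
          if (!context) {
            return;
          }
          // Only encode and commit while GPU access is allowed; if the app
          // is backgrounded, the task fizzles without touching the buffer.
          context->GetIsGpuDisabledSyncSwitch()->Execute(
              fml::SyncSwitch::Handlers().SetIfFalse([&] {
                // Encoding of render_pass into buffer would happen here.
                [buffer commit];
              }));
        });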

Member Author

Which operations are forbidden in the background? Is it just submitting the command buffer? I'm curious what stops us from doing that now on the single raster thread.

Member Author

(that is, do we need to use the sync switch elsewhere too?)

Contributor

Committing encoding work is forbidden: https://developer.apple.com/documentation/metal/gpu_devices_and_work_submission/preparing_your_metal_app_to_run_in_the_background?language=objc

Before this change, the work was all guarded by the rasterizer. But now that we're spawning new threads, it's harder to predict how scheduling will occur: the rasterizer may reach the end of the function it's calling while the task spawned to encode new work runs afterward.

Contributor

Specifically, [buffer commit] below is definitely illegal in the background, but I think EncodeCommands may make other calls that are probably not allowed.

At any rate, in this closure we can't trust that we're on the raster thread and that the rasterizer has already checked whether GPU access is allowed.

@jonahwilliams jonahwilliams changed the title [Impeller] Encode render passes concurrently on iOS, only block final submission on macOS and simulator. [Impeller] Encode render passes concurrently on iOS. May 22, 2023
@dnfield (Contributor) left a comment

LGTM once CI is happy

@jonahwilliams jonahwilliams added the autosubmit Merge PR when tree becomes green via auto submit App label May 23, 2023
@auto-submit auto-submit bot merged commit fc6df95 into flutter:main May 23, 2023
28 checks passed
@jonahwilliams jonahwilliams deleted the encode_and_enqueue branch May 23, 2023 20:20
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 23, 2023
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request May 23, 2023
…127440)

flutter/engine@3535e0d...fc6df95

2023-05-23 jonahwilliams@google.com [Impeller] Encode render passes concurrently on iOS. (flutter/engine#42028)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC bdero@google.com,rmistry@google.com,zra@google.com on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://bugs.chromium.org/p/skia/issues/entry?template=Autoroller+Bug

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
CaseyHillers pushed a commit to CaseyHillers/flutter that referenced this pull request May 24, 2023
@zanderso (Member)

SkiaPerf link for next set of release notes: https://flutter-flutter-perf.skia.org/e/?begin=1684868908&end=1684953846&keys=Xa06e17cd029c62b99ebc7c4da0a2c039&num_commits=50&request_type=1&xbaroffset=34979

auto-submit bot pushed a commit that referenced this pull request Nov 15, 2023
This is a requirement for the "Remove Drawable Acquisition Latency" Change, but is otherwise a harmless improvement.

---

Submitting a command buffer causes the backend-specific encoding logic to run. Metal is unique in that it is fairly easy to move this work onto a background thread, allowing the engine to move on to creating the next command buffer. This improves the throughput of the engine, at the cost of needing two slightly different APIs. Currently the GLES and Vulkan versions of this method still submit synchronously; for now that is out of scope, as doing background work with those APIs has proved more challenging.

See also:
   * #42028
   * flutter/flutter#131698
   
Separately, as a requirement for the design in "Remove Drawable Acquisition Latency", we need to be able to defer drawable acquisition to this background thread. While this almost already works for render passes, it does not work for blit passes today. If the engine renders a backdrop filter, then the final command buffer submitted will be a blit pass that copies an offscreen onto the drawable. Therefore we need to add an async version of blit submission, so that we have a hook to move drawable acquisition onto a background thread for Metal.

This hadn't been done until now because most blit command buffers have one or two commands on them, so the benefit of moving them to a background thread is minimal.

Part of flutter/flutter#138490
Labels: autosubmit (Merge PR when tree becomes green via auto submit App), e: impeller, platform-ios