Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Impeller] have Hostbuffer write directly to block allocated device buffers. #49505

Merged
merged 24 commits into from
Jan 10, 2024

Conversation

jonahwilliams
Copy link
Member

@jonahwilliams jonahwilliams commented Jan 3, 2024

part of flutter/flutter#140804

We can't use the existing host buffer abstraction as that requires us to collect all allocations up front. By itself, this isn't sufficient for #140804 , because we'll need a way to mark ranges as dirty and/or flush if we don't have host coherent memory. But by itself this change should be beneficial as we'll create fewer device buffers and should do less allocation in general.

The size of the device buffers is 1024 Kb, somewhat arbitrarily chosen.

//----------------------------------------------------------------------------
/// @brief Accessor for a pool of HostBuffers.
Pool<HostBuffer>& GetHostBufferPool() const { return host_buffer_pool_; }
virtual const std::shared_ptr<HostBuffer> GetTransientsBuffer() const = 0;
Copy link
Member

@bdero bdero Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be problematic to track the transients buffer in the ContentContext and trigger per-frame lifecycle stuff from EntityPass::Render() (called once per frame on the root EntityPass, even when fancy Picture stuff is afoot)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ContentContext would be a reasonable place to put this.

@jonahwilliams
Copy link
Member Author

The problem with putting it on ContentContext is that we actually have a ton of test code that wants to use the transients buffer that only references context/render pass. So if I do that I have to refactor a bunch of test code :(

@jonahwilliams
Copy link
Member Author

This is nearly working. i think we could land this without worrying about flushing until we switch to directly encoding to device buffers.

@jonahwilliams
Copy link
Member Author

I mean command buffers.

@jonahwilliams
Copy link
Member Author

Doesn't quite work for GLES yet, digging into that.

@jonahwilliams
Copy link
Member Author

Okay, sort of fixed for GLES by making them initially dirty.

@jonahwilliams jonahwilliams marked this pull request as ready for review January 9, 2024 00:55
@jonahwilliams
Copy link
Member Author

I still need to verify that this works with a Vulkan device that does not support host coherent memory. I have one on my desk and I'll check tomorrow.

@jonahwilliams
Copy link
Member Author

This change also sort of breaks the flutter:gpu functions that bind to the host buffer.

@jonahwilliams
Copy link
Member Author

basically you can no longer assume that the singular host buffer + offset is sufficient to reconstruct the correct buffer view. I think you'd either need to track some sort of token + the range, or else use a different abstraction that is separate from what impeller is using.

I could literally split them into two concepts, for flutter gpu you can have a host buffer that behaves like the old host buffer. And the rest of impeller can use a "transientArena" or something like that.

@jonahwilliams
Copy link
Member Author

FYI @bdero

@bdero
Copy link
Member

bdero commented Jan 9, 2024

I pushed a solution to keep HostBuffer working in Flutter GPU here: #49618

@@ -940,6 +947,7 @@ class ContentContext {
std::shared_ptr<scene::SceneContext> scene_context_;
#endif // IMPELLER_ENABLE_3D
std::shared_ptr<RenderTargetAllocator> render_target_cache_;
std::shared_ptr<HostBuffer> host_buffer_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -45,6 +45,8 @@ class DeviceBuffer : public Buffer,

virtual uint8_t* OnGetContents() const = 0;

virtual void Flush(std::optional<Range> range) const {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place definition in device_buffer.cc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

void HostBuffer::HostBufferState::MaybeCreateNewBuffer(size_t required_size) {
if (current_buffer + 1 >= device_buffers.size()) {
if (required_size > kAllocatorBlockSize) {
FML_LOG(ERROR) << "Created oversized buffer: " << required_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be addressed later when we start doing DeviceBuffer reuse, but it would probably be a good to track oversized DeviceBuffers separately from the block list. Otherwise one oversized draw could end up causing a few oversized blocked to get allocated while scrolling around, for example.

In the doc for flutter/flutter#138161 I recommended just throwing away oversized buffers and not tracking when starting out... But perhaps later we could form a heap for tracking oversize buffers sorted by size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -198,6 +198,7 @@ class ContextVK final : public Context,
std::unique_ptr<fml::Thread> queue_submit_thread_;
std::shared_ptr<GPUTracerVK> gpu_tracer_;
std::shared_ptr<DescriptorPoolRecyclerVK> descriptor_pool_recycler_;
std::shared_ptr<HostBuffer> host_buffer_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::shared_ptr<HostBuffer> host_buffer_;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#include "flutter/fml/mapping.h"
#include "flutter/fml/unique_fd.h"
#include "fml/thread.h"
#include "impeller/base/backend_cast.h"
#include "impeller/core/formats.h"
#include "impeller/core/host_buffer.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "impeller/core/host_buffer.h"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -52,6 +53,7 @@ class ContextGLES final : public Context,
std::shared_ptr<SamplerLibraryGLES> sampler_library_;
std::shared_ptr<AllocatorGLES> resource_allocator_;
std::shared_ptr<GPUTracerGLES> gpu_tracer_;
std::shared_ptr<HostBuffer> host_buffer_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::shared_ptr<HostBuffer> host_buffer_;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -9,6 +9,7 @@
#include <unordered_map>
#include "flutter/fml/macros.h"
#include "impeller/base/backend_cast.h"
#include "impeller/core/host_buffer.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "impeller/core/host_buffer.h"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

renderer.GetRenderTargetCache()->End();
renderer.GetTransientsBuffer().Reset();
#if IMPELLER_ENABLE_3D
renderer.GetSceneContext()->GetTransientsBuffer().Reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best way to do this for Scene would be a scoped cleanup in Scene::Render instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -129,6 +129,7 @@ class ContextMTL final : public Context,
#endif // IMPELLER_DEBUG
std::deque<std::function<void()>> tasks_awaiting_gpu_;
std::unique_ptr<SyncSwitchObserver> sync_switch_observer_;
std::shared_ptr<HostBuffer> host_buffer_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::shared_ptr<HostBuffer> host_buffer_;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

FML_CHECK(did_truncate);
offset = 0u;
current_buffer = 0u;
device_buffers.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any plans for reusing the DeviceBuffers across frames? One easy solution would be to rotate through MaxFramesInFlight vectors IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that seems simple enough. Added this logic plus some trimming of buffers if they go unused.

@@ -13,7 +13,8 @@ namespace gpu {

IMPLEMENT_WRAPPERTYPEINFO(flutter_gpu, HostBuffer);

HostBuffer::HostBuffer() : host_buffer_(impeller::HostBuffer::Create()) {}
HostBuffer::HostBuffer()
: host_buffer_(impeller::HostBuffer::Create(nullptr)) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix this in a sensible way I need to change the interface on the Dart side to be spawned from a a gpuContext. So feel free to land this break and I'll fix it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a shot at fixing that, seems to compile OK!

Copy link
Member

@bdero bdero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

zijiehe-google-com pushed a commit to zijiehe-google-com/engine that referenced this pull request Jan 9, 2024
Make the wrapped HostBuffer wrapper track/look up emplacements using a
fake byte offset.

This is a trick to keep Flutter GPU working after
flutter#49505 lands. I'll likely swing
around and change how `BufferView` works later on. We can simplify a lot
by making Flutter GPU `BufferView`s just take `DeviceBuffer` handles.
@jonahwilliams
Copy link
Member Author

Incrementing the arena for host buffer allocations in EntityPass happened too frequently, in the event that there was any usage of toImage we could easy step on our own buffers, which happened in a few flutter tester tests. I've pushed a new design that lets content context reigster a callback for context to call every time Renderer::Render is called, and the reset is done there instead.

@jonahwilliams
Copy link
Member Author

I do wonder if that will cause problems for the flutter tester style of rendering but I'm not certain what a reasonable fix would be for cases where we only ever call toImage.

if (!context_ || !context_->IsValid()) {
return;
}
// Register frame end callback to reset transient host buffer state.
context_->AddPerFrameCompleteTask(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷‍♂️ WDUT? @bdero @dnfield

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed real quick in person: We could trigger the reset call from AiksContext::Render since that'll happen per-frame

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jonah pointed out Picture::ToImage also calls AiksContext::Render, so going with bool to turn the resetting behavior off

@jonahwilliams
Copy link
Member Author

Going to give this a shot now.

@jonahwilliams jonahwilliams added the autosubmit Merge PR when tree becomes green via auto submit App label Jan 10, 2024
@auto-submit auto-submit bot merged commit 52aedc6 into flutter:main Jan 10, 2024
28 checks passed
@jonahwilliams jonahwilliams deleted the write_to_device_buffer branch January 10, 2024 17:49
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jan 10, 2024
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request Jan 10, 2024
…141291)

flutter/engine@d1a2007...52aedc6

2024-01-10 jonahwilliams@google.com [Impeller] have Hostbuffer write directly to block allocated device buffers. (flutter/engine#49505)
2024-01-10 skia-flutter-autoroll@skia.org Roll Skia from 9271dcdade42 to 334160c0eede (1 revision) (flutter/engine#49675)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC bdero@google.com,rmistry@google.com,zra@google.com on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
@jonahwilliams jonahwilliams added the revert Label used to revert changes in a closed and merged pull request. label Jan 10, 2024
auto-submit bot pushed a commit that referenced this pull request Jan 10, 2024
@auto-submit auto-submit bot removed the revert Label used to revert changes in a closed and merged pull request. label Jan 10, 2024
auto-submit bot added a commit that referenced this pull request Jan 10, 2024
… device buffers." (#49688)

Reverts #49505
Initiated by: jonahwilliams
This change reverts the following previous change:
Original Description:
part of flutter/flutter#140804

We can't use the existing host buffer abstraction as that requires us to collect all allocations up front. By itself, this isn't sufficient for #140804 , because we'll need a way to mark ranges as dirty and/or flush if we don't have host coherent memory. But by itself this change should be beneficial as we'll create fewer device buffers and should do less allocation in general.

The size of the device buffers is 1024 Kb, somewhat arbitrarily chosen.
auto-submit bot pushed a commit that referenced this pull request Jan 10, 2024
Reland of #49505

---

part of flutter/flutter#140804

We can't use the existing host buffer abstraction as that requires us to collect all allocations up front. By itself, this isn't sufficient for #140804 , because we'll need a way to mark ranges as dirty and/or flush if we don't have host coherent memory. But by itself this change should be beneficial as we'll create fewer device buffers and should do less allocation in general.

The size of the device buffers is 1024 Kb, somewhat arbitrarily chosen.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autosubmit Merge PR when tree becomes green via auto submit App e: impeller
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

2 participants