-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Impeller] have Hostbuffer write directly to block allocated device buffers. #49505
[Impeller] have Hostbuffer write directly to block allocated device buffers. #49505
Conversation
impeller/renderer/context.h
Outdated
//---------------------------------------------------------------------------- | ||
/// @brief Accessor for a pool of HostBuffers. | ||
Pool<HostBuffer>& GetHostBufferPool() const { return host_buffer_pool_; } | ||
virtual const std::shared_ptr<HostBuffer> GetTransientsBuffer() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be problematic to track the transients buffer in the ContentContext
and trigger per-frame lifecycle stuff from EntityPass::Render()
(called once per frame on the root EntityPass
, even when fancy Picture
stuff is afoot)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ContentContext would be a reasonable place to put this.
The problem with putting it on ContentContext is that we actually have a ton of test code that wants to use the transients buffer that only references context/render pass. So if I do that I have to refactor a bunch of test code :( |
This is nearly working. i think we could land this without worrying about flushing until we switch to directly encoding to device buffers. |
I mean command buffers. |
Doesn't quite work for GLES yet, digging into that. |
Okay, sort of fixed for GLES by making them initially dirty. |
I still need to verify that this works with a Vulkan device that does not support host coherent memory. I have one on my desk and I'll check tomorrow. |
This change also sort of breaks the flutter:gpu functions that bind to the host buffer. |
basically you can no longer assume that the singular host buffer + offset is sufficient to reconstruct the correct buffer view. I think you'd either need to track some sort of token + the range, or else use a different abstraction that is separate from what impeller is using. I could literally split them into two concepts, for flutter gpu you can have a host buffer that behaves like the old host buffer. And the rest of impeller can use a "transientArena" or something like that. |
FYI @bdero |
I pushed a solution to keep HostBuffer working in Flutter GPU here: #49618 |
@@ -940,6 +947,7 @@ class ContentContext { | |||
std::shared_ptr<scene::SceneContext> scene_context_; | |||
#endif // IMPELLER_ENABLE_3D | |||
std::shared_ptr<RenderTargetAllocator> render_target_cache_; | |||
std::shared_ptr<HostBuffer> host_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impeller/core/device_buffer.h
Outdated
@@ -45,6 +45,8 @@ class DeviceBuffer : public Buffer, | |||
|
|||
virtual uint8_t* OnGetContents() const = 0; | |||
|
|||
virtual void Flush(std::optional<Range> range) const {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Place definition in device_buffer.cc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
impeller/core/host_buffer.cc
Outdated
void HostBuffer::HostBufferState::MaybeCreateNewBuffer(size_t required_size) { | ||
if (current_buffer + 1 >= device_buffers.size()) { | ||
if (required_size > kAllocatorBlockSize) { | ||
FML_LOG(ERROR) << "Created oversized buffer: " << required_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be addressed later when we start doing DeviceBuffer
reuse, but it would probably be a good to track oversized DeviceBuffer
s separately from the block list. Otherwise one oversized draw could end up causing a few oversized blocked to get allocated while scrolling around, for example.
In the doc for flutter/flutter#138161 I recommended just throwing away oversized buffers and not tracking when starting out... But perhaps later we could form a heap for tracking oversize buffers sorted by size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -198,6 +198,7 @@ class ContextVK final : public Context, | |||
std::unique_ptr<fml::Thread> queue_submit_thread_; | |||
std::shared_ptr<GPUTracerVK> gpu_tracer_; | |||
std::shared_ptr<DescriptorPoolRecyclerVK> descriptor_pool_recycler_; | |||
std::shared_ptr<HostBuffer> host_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::shared_ptr<HostBuffer> host_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#include "flutter/fml/mapping.h" | ||
#include "flutter/fml/unique_fd.h" | ||
#include "fml/thread.h" | ||
#include "impeller/base/backend_cast.h" | ||
#include "impeller/core/formats.h" | ||
#include "impeller/core/host_buffer.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "impeller/core/host_buffer.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -52,6 +53,7 @@ class ContextGLES final : public Context, | |||
std::shared_ptr<SamplerLibraryGLES> sampler_library_; | |||
std::shared_ptr<AllocatorGLES> resource_allocator_; | |||
std::shared_ptr<GPUTracerGLES> gpu_tracer_; | |||
std::shared_ptr<HostBuffer> host_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::shared_ptr<HostBuffer> host_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -9,6 +9,7 @@ | |||
#include <unordered_map> | |||
#include "flutter/fml/macros.h" | |||
#include "impeller/base/backend_cast.h" | |||
#include "impeller/core/host_buffer.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "impeller/core/host_buffer.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
impeller/entity/entity_pass.cc
Outdated
renderer.GetRenderTargetCache()->End(); | ||
renderer.GetTransientsBuffer().Reset(); | ||
#if IMPELLER_ENABLE_3D | ||
renderer.GetSceneContext()->GetTransientsBuffer().Reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best way to do this for Scene would be a scoped cleanup in Scene::Render
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -129,6 +129,7 @@ class ContextMTL final : public Context, | |||
#endif // IMPELLER_DEBUG | |||
std::deque<std::function<void()>> tasks_awaiting_gpu_; | |||
std::unique_ptr<SyncSwitchObserver> sync_switch_observer_; | |||
std::shared_ptr<HostBuffer> host_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::shared_ptr<HostBuffer> host_buffer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
impeller/core/host_buffer.cc
Outdated
FML_CHECK(did_truncate); | ||
offset = 0u; | ||
current_buffer = 0u; | ||
device_buffers.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any plans for reusing the DeviceBuffer
s across frames? One easy solution would be to rotate through MaxFramesInFlight
vectors IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that seems simple enough. Added this logic plus some trimming of buffers if they go unused.
lib/gpu/host_buffer.cc
Outdated
@@ -13,7 +13,8 @@ namespace gpu { | |||
|
|||
IMPLEMENT_WRAPPERTYPEINFO(flutter_gpu, HostBuffer); | |||
|
|||
HostBuffer::HostBuffer() : host_buffer_(impeller::HostBuffer::Create()) {} | |||
HostBuffer::HostBuffer() | |||
: host_buffer_(impeller::HostBuffer::Create(nullptr)) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix this in a sensible way I need to change the interface on the Dart side to be spawned from a a gpuContext. So feel free to land this break and I'll fix it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a shot at fixing that, seems to compile OK!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Make the wrapped HostBuffer wrapper track/look up emplacements using a fake byte offset. This is a trick to keep Flutter GPU working after flutter#49505 lands. I'll likely swing around and change how `BufferView` works later on. We can simplify a lot by making Flutter GPU `BufferView`s just take `DeviceBuffer` handles.
Incrementing the arena for host buffer allocations in EntityPass happened too frequently, in the event that there was any usage of toImage we could easy step on our own buffers, which happened in a few flutter tester tests. I've pushed a new design that lets content context reigster a callback for context to call every time Renderer::Render is called, and the reset is done there instead. |
I do wonder if that will cause problems for the flutter tester style of rendering but I'm not certain what a reasonable fix would be for cases where we only ever call toImage. |
if (!context_ || !context_->IsValid()) { | ||
return; | ||
} | ||
// Register frame end callback to reset transient host buffer state. | ||
context_->AddPerFrameCompleteTask( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed real quick in person: We could trigger the reset call from AiksContext::Render since that'll happen per-frame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jonah pointed out Picture::ToImage also calls AiksContext::Render, so going with bool to turn the resetting behavior off
Going to give this a shot now. |
…141291) flutter/engine@d1a2007...52aedc6 2024-01-10 jonahwilliams@google.com [Impeller] have Hostbuffer write directly to block allocated device buffers. (flutter/engine#49505) 2024-01-10 skia-flutter-autoroll@skia.org Roll Skia from 9271dcdade42 to 334160c0eede (1 revision) (flutter/engine#49675) If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC bdero@google.com,rmistry@google.com,zra@google.com on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://issues.skia.org/issues/new?component=1389291&template=1850622 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
… device buffers." (#49688) Reverts #49505 Initiated by: jonahwilliams This change reverts the following previous change: Original Description: part of flutter/flutter#140804 We can't use the existing host buffer abstraction as that requires us to collect all allocations up front. By itself, this isn't sufficient for #140804 , because we'll need a way to mark ranges as dirty and/or flush if we don't have host coherent memory. But by itself this change should be beneficial as we'll create fewer device buffers and should do less allocation in general. The size of the device buffers is 1024 Kb, somewhat arbitrarily chosen.
Reland of #49505 --- part of flutter/flutter#140804 We can't use the existing host buffer abstraction as that requires us to collect all allocations up front. By itself, this isn't sufficient for #140804 , because we'll need a way to mark ranges as dirty and/or flush if we don't have host coherent memory. But by itself this change should be beneficial as we'll create fewer device buffers and should do less allocation in general. The size of the device buffers is 1024 Kb, somewhat arbitrarily chosen.
part of flutter/flutter#140804
We can't use the existing host buffer abstraction as that requires us to collect all allocations up front. By itself, this isn't sufficient for #140804 , because we'll need a way to mark ranges as dirty and/or flush if we don't have host coherent memory. But by itself this change should be beneficial as we'll create fewer device buffers and should do less allocation in general.
The size of the device buffers is 1024 Kb, somewhat arbitrarily chosen.