-
Notifications
You must be signed in to change notification settings - Fork 6k
[Impeller] Use separate pool for staging buffers. #43325
Conversation
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!). If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix? Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing. |
auto image_upload = desc.usage_hint == UsageHint::kImageUpload; | ||
alloc_nfo.flags = ToVmaAllocationCreateFlags( | ||
desc.storage_mode, /*is_texture=*/true, | ||
image_upload ? 0u : desc.GetByteSizeOfBaseMipLevel()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of hacky, but if we want to allocate in the specific pool ... we can't ask for dedicated memory. Makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a bit more clear if you collapse it to one branch instead of 2 identical branches next to each other.
This is missing some checks that the usage is correct and currently crashes on an S10. Locally it might be marginally faster, and I still see long stalls on that device. That said, the image decompression competing with raster frames is improved on my Pixel 6. |
constexpr size_t kImageSizeThresholdForDedicatedMemoryAllocation = | ||
4 * 1024 * 1024; | ||
|
||
static bool CreateBufferPool(VmaAllocator allocator, VmaPool* pool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we should use fml::UniqueObject for pools, allocators, textures, and buffers from VMA. I took a closer look at the early returns and it seems to be fine but its easy to mess up and leak resources especially when it comes to error conditions. And we already have an RAII wrapper.
In a later patch is fine of course. Of file a followup and we can pick it up later so we don't forget.
impeller/core/formats.h
Outdated
|
||
std::string TextureUsageMaskToString(TextureUsageMask mask); | ||
|
||
enum class UsageHint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResourceUsageHint
perhaps? This is a bit ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rather than a hint it should be more explict as "ResourceUsage", because I think we should also adjust the usage flags to see if that helps. For example ResourceUsage.stagingBuffer should only be a transferSrc, and doesn't need random access, et cetera.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can have a "general" category that retains the current set of features
impeller/core/formats.h
Outdated
|
||
enum class UsageHint { | ||
/// @brief Texture or buffer is being used during the raster workload. | ||
kRasterWorkload, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about kTransient
and kPermanent
? You expect to allocate the former in or near a frame workload and the latter has a longer lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My sense is that the current names betray this being used for a very Flutter as it works today use case.
|
||
// Maximum size to use VMA image suballocation. Any allocation greater than or | ||
// equal to this value will use a dedicated VkDeviceMemory. | ||
constexpr size_t kImageSizeThresholdForDedicatedMemoryAllocation = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in #43313 (comment), let's put the tunables in a separate file so everything is available at a glance?
|
||
VmaPoolCreateInfo poolCreateInfo = {}; | ||
poolCreateInfo.memoryTypeIndex = memTypeIndex; | ||
poolCreateInfo.blockSize = 128ull * 1024 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in #43313 (comment), let's put the tunables in a separate file so everything is available at a glance?
// Create a pool that can have at most 2 blocks, 128 MiB each. | ||
VmaPoolCreateInfo poolCreateInfo = {}; | ||
poolCreateInfo.memoryTypeIndex = memTypeIndex; | ||
poolCreateInfo.blockSize = 128ull * 1024 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in #43313 (comment), let's put the tunables in a separate file so everything is available at a glance?
poolCreateInfo.memoryTypeIndex = memTypeIndex; | ||
poolCreateInfo.blockSize = 128ull * 1024 * 1024; | ||
|
||
auto result = vk::Result{vmaCreatePool(allocator, &poolCreateInfo, pool)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These pools are sparse to start with right? I suppose we'll find out :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think so. I'm not sure what blockSize = 0
is doing. Does it realloc if it gets too big?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy, this definitely complicates everything but I think this is the way forward, nice work. At the very least a pool should be eliminating the overhead of selecting the memory to use. I think we might be able to squeeze out more performance by using a linear algorithm.
FML_UNREACHABLE(); | ||
} | ||
|
||
vk::Flags<vk::BufferUsageFlagBits> VmaBufferUsageFlags(UsageHint usage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, from my understanding this tidying up can make a difference.
VkResult res = vmaFindMemoryTypeIndexForBufferInfo( | ||
allocator, &buffer_info_native, &allocation_info, &memTypeIndex); | ||
|
||
VmaPoolCreateInfo poolCreateInfo = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try out VMA_POOL_CREATE_LINEAR_ALGORITHM_BIT
? The API is a bit cheeky in that it doesn't tell you what happens down below but considering these are just living for one frame and getting thrown away it may work.
VkResult res = vmaFindMemoryTypeIndexForBufferInfo( | ||
allocator, &buffer_info_native, &allocation_info, &memTypeIndex); | ||
|
||
VmaPoolCreateInfo poolCreateInfo = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to supply a blockSize
?
poolCreateInfo.memoryTypeIndex = memTypeIndex; | ||
poolCreateInfo.blockSize = 128ull * 1024 * 1024; | ||
|
||
auto result = vk::Result{vmaCreatePool(allocator, &poolCreateInfo, pool)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think so. I'm not sure what blockSize = 0
is doing. Does it realloc if it gets too big?
FML_UNREACHABLE(); | ||
} | ||
|
||
static bool CreateBufferPool(VmaAllocator allocator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We probably should be using fml::status instead of bool for returning errors since we have it.
auto image_upload = desc.usage_hint == UsageHint::kImageUpload; | ||
alloc_nfo.flags = ToVmaAllocationCreateFlags( | ||
desc.storage_mode, /*is_texture=*/true, | ||
image_upload ? 0u : desc.GetByteSizeOfBaseMipLevel()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a bit more clear if you collapse it to one branch instead of 2 identical branches next to each other.
|
||
fml::RefPtr<vulkan::VulkanProcTable> vk_; | ||
VmaAllocator allocator_ = {}; | ||
VmaPool raster_buffer_pool_ = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we go forward with this we should have a docstring for the different pools. This is complicated enough that making it clear is helpful.
Sorry for the delay, I just lgtmed the dependent PR. |
Updated. I've backed out the change for texture allocation because I can't get a clear signal on whether or not that helps. At least moving the buffer allocation to one pool allows us to ignore granularity which can help make allocations faster. But its all sort of nebulous imo. I'm putting VMA down for a while after this. |
Based off of https://github.com/flutter/engine/pull/43313/files