Texture views cause unnecessary memory bandwidth use #744
The documentation of
Sorry, there's an option for it in the descriptor, but the behavior isn't specified. It shouldn't be allowed by default (and I'm pretty sure Dawn doesn't allow it). I recall that it has come up in the past that we would need a flag on GPUTexture creation to allow reinterpretation of the format.
Thanks @litherum for the analysis! It provides a good rationale for why we don't want to allow free reinterpretation of formats. It is highly related to #168.

I'm not sure I agree with the conclusion, though. How about adding a flag at texture creation, or a list of formats the application would like to be able to reinterpret to? Having the list of formats would allow for better optimization when some reinterpret_casting is possible without breaking the texture compression.
Similar thoughts here. Great investigation by @litherum (thanks!), just not the conclusion I expected. I recall we already discussed the requirements for backends to reinterpret the formats back in the day; I think it's more than just Metal that would benefit from an explicit flag like @Kangz described.
I see. If there's a requirement that the view's pixel format has to match the texture's pixel format, and there are architectural reasons why this must be so (as described above), then I suppose the best solution would be to delete the field. We can add it back if we want to drop this requirement in the future.
Metal only requires such a flag if the view format has a "different component layout". We should still be able to support reinterpretations with the same component layout.
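To illustrate the "same component layout" idea from the comment above, here is a rough JavaScript sketch. The simplification that sRGB variants share a component layout with their non-sRGB base format follows from the discussion above; the helper names are hypothetical and the rule is deliberately approximate:

```javascript
// Sketch: approximate a format's "component layout" as its name with
// any "-srgb" suffix removed. This is a simplification; the helper
// names are hypothetical, not part of any API.
function componentLayout(format) {
  return format.replace(/-srgb$/, "");
}

// If two formats share a component layout, reinterpreting between
// them may not need the heavyweight view-format permission at all.
function sameComponentLayout(a, b) {
  return componentLayout(a) === componentLayout(b);
}

console.log(sameComponentLayout("rgba8unorm", "rgba8unorm-srgb")); // true
console.log(sameComponentLayout("rgba8unorm", "bgra8unorm"));      // false
```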
I guess this is "Needs Proposal" since a proposal for a "valid view formats list" was requested.
Yep. I'm in the middle of coming up with a proposal now.
As discussed in the meeting, let's investigate the exact mechanisms of texture format reinterpretation. It's kind of a duplicate of #168 but worth doing again imho.

Vulkan

Vulkan has two texture creation flags that control whether formats can be reinterpreted:

Vulkan format compatibility classes (expressed in WebGPU formats) are the following:

8-bit formats:
"r8unorm",
"r8snorm",
"r8uint",
"r8sint",

16-bit formats:
"r16uint",
"r16sint",
"r16float",
"rg8unorm",
"rg8snorm",
"rg8uint",
"rg8sint",

32-bit formats:
"r32uint",
"r32sint",
"r32float",
"rg16uint",
"rg16sint",
"rg16float",
"rgba8unorm",
"rgba8unorm-srgb",
"rgba8snorm",
"rgba8uint",
"rgba8sint",
"bgra8unorm",
"bgra8unorm-srgb",
// Packed 32-bit formats
"rgb10a2unorm",
"rg11b10float",

64-bit formats:
"rg32uint",
"rg32sint",
"rg32float",
"rgba16uint",
"rgba16sint",
"rgba16float",

128-bit formats:
"rgba32uint",
"rgba32sint",
"rgba32float",

Depth and stencil formats cannot be reinterpreted:
// Each is its own compatibility class.
"depth32float",
"depth24plus",
"depth24plus-stencil8"

@RafaelCintron @litherum please help produce similar investigations for D3D12 and Metal.
Direct3D 12

The D3D documentation refers to this sort of reinterpretation as casting. Most of the details can be found in the D3D11 Functional Spec and are still relevant to D3D12. Of particular interest here is the discussion around typeless and typed memory. An explicit table showing the various formats is provided in D3D11_3_Formats_FL11_1.xls. To summarize the rules:

CastingFullyTypedFormatSupported is required for any WDDM 2.2 driver; this corresponds to any driver that supports Windows 10 version 1703 or higher. Here's the list of D3D12 format compatibility classes in terms of WebGPU formats:
These would apply to SRV and RTV/DSV only.
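As a rough illustration of the D3D12 casting model described above, here is a partial JavaScript sketch. The family groupings are my own reading of the DXGI `*_TYPELESS` parent formats, and the list is deliberately incomplete, not a reproduction of the spreadsheet:

```javascript
// Partial, illustrative sketch of D3D12 casting families, keyed by the
// DXGI *_TYPELESS parent format. Not exhaustive; an assumption-based
// summary rather than the full table from the functional spec.
const typelessFamilies = {
  "DXGI_FORMAT_R8G8B8A8_TYPELESS": [
    "DXGI_FORMAT_R8G8B8A8_UNORM",
    "DXGI_FORMAT_R8G8B8A8_UNORM_SRGB",
    "DXGI_FORMAT_R8G8B8A8_SNORM",
    "DXGI_FORMAT_R8G8B8A8_UINT",
    "DXGI_FORMAT_R8G8B8A8_SINT",
  ],
  "DXGI_FORMAT_B8G8R8A8_TYPELESS": [
    "DXGI_FORMAT_B8G8R8A8_UNORM",
    "DXGI_FORMAT_B8G8R8A8_UNORM_SRGB",
  ],
  "DXGI_FORMAT_R32_TYPELESS": [
    "DXGI_FORMAT_R32_FLOAT",
    "DXGI_FORMAT_R32_UINT",
    "DXGI_FORMAT_R32_SINT",
    "DXGI_FORMAT_D32_FLOAT",
  ],
};

// In D3D12, a resource that needs multiple typed views is created with
// the typeless parent format; each view then picks a typed format from
// the same family.
function d3d12Castable(typeless, typed) {
  return (typelessFamilies[typeless] || []).includes(typed);
}
```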
I tried to compile all this info into a spreadsheet, and I also tried to summarize the (mostly fairly simple) rules from Metal, but the result is quite complicated and I'm not totally sure it's right.

https://docs.google.com/spreadsheets/d/1PRiOja_AVse0QuB6rDH_YzidGw5fucQIRz0md9Sg4m8/edit?usp=sharing

Please request edit access if you would like to make any changes.
@kainino0x, thank you for putting together this spreadsheet. One aspect of @damyanp's reply that I am not 100% on is the following:

It would be unfortunate if depth24plus-stencil8 was always implemented with D32_FLOAT_S8X24_UINT instead of D24_UNORM_S8_UINT. Hence, I think the "depth24plus*" textures should be grouped together from a compatibility perspective, and "depth32float" should be grouped with the "r32*" formats. Of course, if web developers only test on platforms where "plus" textures give them more bits of precision, we may be forced to provide the same number of bits everywhere to keep content portable.
Updated the spreadsheet to reflect that. I don't think it ultimately matters, because I don't think Vulkan will allow us to have any conversions between these formats. And on top of that, it seems potentially dodgy to make views that view the "X" parts of formats with unused bits.
Also updated the D3D12 and Metal spreadsheets to reflect that views shouldn't view X bits.
Good catch, Rafael. I'm not sure how I ended up with the depth/stencil in that state, as it doesn't match the methodology I was using to generate the rest of the table at all.
I don't have access to modify that spreadsheet, but it appears the Metal page is incorrect. I wrote a sample program to determine the correct data, and here are the results: Metal Pixel Formats.zip
Does that mean that, without this bit, all views must have the same format as their underlying texture? Does setting this bit have any performance implications? (If the answer to both questions is "yes", then Vulkan's spreadsheet truth table is identical to Metal's.)
Also: does setting this bit require any extensions to be enabled in Vulkan?
Assuming the answer is "yes" (and no Vulkan extensions are required), then I think the proposal would be: add a flag at texture creation. The flag makes sense because of the performance-vs-flexibility tradeoff. And, if the flag is supplied, we need to use the intersection of the greenish boxes, because we can't expand that set while also maintaining implementability on all 3 native APIs.
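A sketch of what such an opt-in could look like at the API level. This is a hypothetical illustration of the proposal above, not the spec: the `viewFormats` field name and the validation rule are assumptions for the sake of the example:

```javascript
// Hypothetical validation sketch: the texture opts in to specific
// reinterpretation formats at creation time, and view creation checks
// the requested format against that opt-in list. Field names are
// illustrative only, not taken from the spec.
function validateViewFormat(textureDescriptor, viewFormat) {
  if (viewFormat === textureDescriptor.format) {
    return true; // same-format views are always allowed
  }
  // Reinterpretation requires the creation-time opt-in list.
  const allowed = textureDescriptor.viewFormats || [];
  return allowed.includes(viewFormat);
}

const desc = {
  format: "rgba8unorm",
  viewFormats: ["rgba8unorm-srgb"], // creation-time opt-in
};
console.log(validateViewFormat(desc, "rgba8unorm-srgb")); // true
console.log(validateViewFormat(desc, "bgra8unorm"));      // false
```

Listing formats explicitly (rather than a bare boolean flag) matches the earlier suggestion in this thread: it lets the implementation pay the compression cost only for the reinterpretations actually requested.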
Whoops, I'm not sure I have access either, but I put the table in Sheets so it's easy to view.
Updating from macOS Catalina 19F to 19G caused the macOS spreadsheet to change slightly (the set of allowed conversions between formats changed). Given that Vulkan hasn't changed at all, I don't think this affects the decision we make in WebGPU. I just wanted to mention it because there was some confusion during the call about how the Metal documentation seemed to disagree with the Metal validation layers. In 19G, they don't disagree.
Updated the spreadsheet with rgb9e5 (#975) and the info from Catalina 19G+ (see above).
Given the resolution from the Vulkan investigation, it seems there are no formats that can be freely reinterpreted without a "reinterpretable" flag. Adding a "reinterpretable" flag is #168. So the only specification action that comes out of this issue is something that says "texture view format must equal texture format" until #168 is done.
ok
Creating a texture view in Metal requires the use of the MTLTextureUsagePixelFormatView usage on the original texture. Specifying this usage disables hardware lossless compression on modern iPhones, because the texture view can have a different pixel format than the original and the lossless compression is specific to a particular pixel format. The result of specifying this flag when it isn't really needed is that memory bandwidth use is unnecessarily increased.

The user-visible symptom of increased memory bandwidth use is:

To characterize this, you can run the "traditional deferred lighting" mode of this sample code with and without the MTLTextureUsagePixelFormatView flag. Run unmodified, an iPhone XS Max reads 40.32 MiB and writes 56.8 MiB of memory per frame. With the flag added to the texture creation functions, this increases to 55.55 MiB read and 71.81 MiB written per frame. Therefore, removing the flag yields a 23.7% reduction in bandwidth use on this content.

This is a simple case. In our experience, it's not uncommon to see memory bandwidth use shrink by 50% in real content.
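For reference, the 23.7% figure follows directly from the per-frame totals quoted above:

```javascript
// Per-frame memory traffic from the measurements above (MiB).
const withoutFlag = 40.32 + 56.8;   // reads + writes, no PixelFormatView usage
const withFlag    = 55.55 + 71.81;  // reads + writes, with PixelFormatView usage

// Fraction of bandwidth saved by dropping the flag.
const reduction = (withFlag - withoutFlag) / withFlag;
console.log((reduction * 100).toFixed(1) + "%"); // "23.7%"
```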
This is a problem in WebGPU because texture views are used in bind groups. In WebGPU, it is impossible to sample from a texture without a view.
One way to solve this would be to let GPUBindingResource be a GPUTexture. However, that alone probably wouldn't be sufficient, because there are some types of GPUTextureViews which can't be represented by just GPUTextures, like cubemaps. Therefore, we should unify GPUTextureViewDimension and GPUTextureDimension, and add an arrayLayerCount into GPUTextureDescriptor (and possibly more).