
Texture views cause unnecessary memory bandwidth use #744

Closed · litherum opened this issue May 2, 2020 · 26 comments

@litherum (Contributor) commented May 2, 2020

Creating a texture view in Metal requires the use of the MTLTextureUsagePixelFormatView usage on the original texture. Specifying this usage disables hardware lossless compression on modern iPhones, because a texture view can have a different pixel format than the original, and the lossless compression is specific to a particular pixel format. Specifying this flag when it isn’t actually needed therefore increases memory bandwidth use unnecessarily.

The user-visible symptom of increased memory bandwidth use is:

  1. Worse performance, if the content is already memory bandwidth limited
  2. Worse battery life, as memory bandwidth use is a good indicator of power use by the GPU on iPhones
  3. A hotter device, which users don't like

To characterize this, you can run the “traditional deferred lighting” mode of this sample code with and without the MTLTextureUsagePixelFormatView flag. Run unmodified (without the flag), an iPhone XS Max reads 40.32 MiB and writes 56.8 MiB of memory per frame. Adding the flag to the texture creation functions increases this to 55.55 MiB read and 71.81 MiB written per frame. In other words, omitting the flag yields a 23.7% reduction in bandwidth use on this content.

[Screenshot: Screen Shot 2020-05-01 at 5.14.54 PM]

This is a simple case. In our experience, it's not uncommon to see memory bandwidth use shrink by 50% in real content.

This is a problem in WebGPU because texture views are used in bind groups. In WebGPU, it is impossible to sample from a texture without a view.
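
In API terms, a minimal sketch of the constraint (using current WebGPU spellings, which postdate this thread; `device` and `bindGroupLayout` are assumed to exist):

```ts
// Even to sample a whole texture in its own format, a GPUTextureView has
// to be created and bound; a GPUTexture is not a valid GPUBindingResource.
declare const device: GPUDevice;
declare const bindGroupLayout: GPUBindGroupLayout;

const texture = device.createTexture({
  size: [256, 256],
  format: "rgba8unorm",
  usage: GPUTextureUsage.TEXTURE_BINDING,
});

const view = texture.createView(); // full-texture view, same format

const bindGroup = device.createBindGroup({
  layout: bindGroupLayout,
  entries: [{ binding: 0, resource: view }],
});
```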

One way to solve this would be to let GPUBindingResource be a GPUTexture. However, that alone probably wouldn’t be sufficient because there are some types of GPUTextureViews which can’t be represented by just GPUTextures, like cubemaps. Therefore, we should unify GPUTextureViewDimension and GPUTextureDimension, and add an arrayLayerCount into GPUTextureDescriptor (and possibly more).

@kainino0x (Contributor) commented May 2, 2020

The documentation of MTLTextureUsagePixelFormatView says:

This option allows you to call any of these methods of the given texture to create a texture view with a different component layout. (...)

The pixel layout is considered different if the number of components differs, or if their size or order is different from the components in the original pixel format.

GPUTextureViews don't have the capability to do that right now (the format can't be changed), so if I'm understanding correctly, MTLTextureUsagePixelFormatView should never be set. (Unfortunately, it looks like we do this incorrectly in Dawn right now.)

@kainino0x (Contributor) commented May 2, 2020

don't have the capability to do that right now

Sorry, there's an option for it in the descriptor, but the behavior isn't specified. It shouldn't be allowed by default (and I'm pretty sure Dawn doesn't allow it). I recall that it has come up in the past that we would need a flag on GPUTexture creation to allow reinterpretation of the format.
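
For reference, the descriptor option in question is the `format` member of GPUTextureViewDescriptor; a sketch of the (then-unspecified) reinterpreting case, with `texture` assumed to be an "rgba8unorm" texture:

```ts
declare const texture: GPUTexture;

// A view whose format differs from the texture's own format. At this
// point in the discussion the behavior was unspecified, and Dawn
// reportedly disallowed it.
const srgbView = texture.createView({
  format: "rgba8unorm-srgb",
  dimension: "2d",
});
```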

@Kangz (Contributor) commented May 4, 2020

Thanks @litherum for the analysis! It provides a good rationale for why we don't want to allow free reinterpretation of formats. It is highly related to #168.

One way to solve this would be to let GPUBindingResource be a GPUTexture. However, that alone probably wouldn’t be sufficient because there are some types of GPUTextureViews which can’t be represented by just GPUTextures, like cubemaps. Therefore, we should unify GPUTextureViewDimension and GPUTextureDimension, and add an arrayLayerCount into GPUTextureDescriptor (and possibly more).

I'm not sure I agree with the conclusion, because the issue here is MTLTextureUsagePixelFormatView, which is just for reinterpreting the format of texture views, but doesn't matter for selecting sub-ranges of mipmaps / array layers or for the dimensionality.

How about adding a flag at texture creation, or a list of formats the application would like to be able to reinterpret as? Having the list of formats would allow for better optimization when some reinterpret_casting is possible without breaking the texture compression.
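
A sketch of what the list-of-formats option might look like at the API level (the `viewFormats` member shown here was only a proposal at this point in the thread):

```ts
declare const device: GPUDevice;

// Hypothetical: declare up front which formats this texture may be
// viewed as, so the implementation can keep compression enabled when the
// listed reinterpretations don't break it.
const texture = device.createTexture({
  size: [256, 256],
  format: "rgba8unorm",
  usage: GPUTextureUsage.TEXTURE_BINDING,
  viewFormats: ["rgba8unorm-srgb"], // proposed list of allowed view formats
});
```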

@kvark (Contributor) commented May 4, 2020

Similar thoughts here. Great investigation by @litherum (thanks!), just not the conclusion I expected. I recall we already discussed the requirements for backends to reinterpret the formats back in the day, I think it's more than just Metal that would benefit from an explicit flag like @Kangz described.

@litherum (Contributor, Author) commented May 4, 2020

I see. If there's a requirement that the view's pixel format has to match the texture's pixel format, and there are architectural reasons why this must be so (as described above), then I suppose the best solution would be to delete the GPUTextureFormat field from the GPUTextureViewDescriptor.

We can add the field back if we want to drop this requirement in the future.

@kainino0x (Contributor) commented:

Metal only requires such a flag if the view format has a "different component layout". We should still be able to support reinterpretations with the same component layout.

The pixel layout is considered different if the number of components differs, or if their size or order is different from the components in the original pixel format.

Don't set this option if the texture view just needs to read the component values in a different order. Instead, create a texture view using a swizzle pattern to specify the new order. Similarly, don’t set this option if your texture view only converts between linear space and sRGB; for example, if your texture uses the MTLPixelFormatRGBA8Unorm pixel format and your texture view uses MTLPixelFormatBGRA8Unorm_sRGB.

@kainino0x kainino0x added this to Needs Discussion in Main May 18, 2020
@kainino0x (Contributor) commented:

I guess this is "Needs Proposal" since a proposal for "valid view formats list" was requested.

@kainino0x kainino0x moved this from Needs Discussion to Needs Investigation/Proposal or Revision in Main May 18, 2020
@litherum (Contributor, Author) commented:

Yep. I'm in the middle of coming up with a proposal now.

@Kangz (Contributor) commented Jun 3, 2020

As discussed in the meeting, let's investigate the exact mechanisms of texture format reinterpretation. It's kind of a duplicate of #168, but worth doing again imho.

Vulkan

Vulkan has two texture creation flags that control whether formats can be reinterpreted:

  • VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT, which says some reinterpretation will be possible inside format compatibility classes (see below), and also allows creating single-plane views of planar formats with a format compatible with that plane's format.
  • VK_IMAGE_CREATE_BLOCK_TEXEL_VIEW_COMPATIBLE_BIT, which requires that VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT is set and allows creating texture views of block-compressed formats with a size-compatible uncompressed format (i.e. the texel is the same size in bytes as the texel block of the compressed format).

Vulkan format compatibility classes (expressed in WebGPU formats) are the following:

8-bit formats:

    "r8unorm",
    "r8snorm",
    "r8uint",
    "r8sint",

16-bit formats:

    "r16uint",
    "r16sint",
    "r16float",
    "rg8unorm",
    "rg8snorm",
    "rg8uint",
    "rg8sint",

32-bit formats:

    "r32uint",
    "r32sint",
    "r32float",
    "rg16uint",
    "rg16sint",
    "rg16float",
    "rgba8unorm",
    "rgba8unorm-srgb",
    "rgba8snorm",
    "rgba8uint",
    "rgba8sint",
    "bgra8unorm",
    "bgra8unorm-srgb",
    // Packed 32-bit formats
    "rgb10a2unorm",
    "rg11b10float",

64-bit formats:

    "rg32uint",
    "rg32sint",
    "rg32float",
    "rgba16uint",
    "rgba16sint",
    "rgba16float",

128-bit formats:

    "rgba32uint",
    "rgba32sint",
    "rgba32float",

Depth and stencil formats cannot be reinterpreted:

    // Each is its own compatibility class.
    "depth32float",
    "depth24plus",
    "depth24plus-stencil8"

@RafaelCintron @litherum please help produce similar investigations for D3D12 and Metal.

@damyanp commented Jun 5, 2020

Direct3D 12

The D3D documentation refers to this sort of reinterpretation as casting. Most of the details can be found in the D3D11 Functional Spec and are still relevant to D3D12. Of particular interest here is the discussion around typeless and typed memory. An explicit table showing the various formats is provided in D3D11_3_Formats_FL11_1.xls.

To summarize the rules:

  • Resources created with _TYPELESS can be accessed with compatible resource views using a compatible typed format, as long as the number of components and the number of bits per component are identical.
  • Depending on the value of CastingFullyTypedFormatSupported:
    • CastingFullyTypedFormatSupported == true: resources created with typed formats can be cast to any compatible type.
    • CastingFullyTypedFormatSupported == false: resources created with typed formats can only be accessed with exactly matching resource view formats (so no casting is supported).
  • _TYPELESS is potentially less performant if the resource is only accessed using a single typed format. (Edit: clarified that CastingFullyTypedFormatSupported does not give any promises about performance.)

CastingFullyTypedFormatSupported is required for any WDDM2.2 driver; this corresponds to any driver that supports Windows 10 version 1703 or higher.

Here's the list of D3D12 format compatibility classes in terms of WebGPU formats:

// 8-bit formats (DXGI_FORMAT_R8_TYPELESS)
"r8unorm",
"r8snorm",
"r8uint",
"r8sint",

// 16-bit formats (DXGI_FORMAT_R16_TYPELESS)
"r16uint",
"r16sint",
"r16float",

// 16-bit formats (DXGI_FORMAT_R8G8_TYPELESS)
"rg8unorm",
"rg8snorm",
"rg8uint",
"rg8sint",

// 32-bit formats (DXGI_FORMAT_R32_TYPELESS)
"r32uint",
"r32sint",
"r32float",

// 32-bit formats (DXGI_FORMAT_R16G16_TYPELESS)
"rg16uint",
"rg16sint",
"rg16float",
// note: d3d also supports unorm & int variants here

// 32-bit formats (DXGI_FORMAT_R8G8B8A8_TYPELESS)
"rgba8unorm",
"rgba8unorm-srgb",
"rgba8snorm",
"rgba8uint",
"rgba8sint",

// 32-bit formats (DXGI_FORMAT_B8G8R8A8_TYPELESS)
"bgra8unorm",
"bgra8unorm-srgb",

// 64-bit formats (DXGI_FORMAT_R32G32_TYPELESS)
"rg32uint",
"rg32sint",
"rg32float",

// 64-bit formats (DXGI_FORMAT_R16G16B16A16_TYPELESS)
"rgba16uint",
"rgba16sint",
"rgba16float",
// note: d3d also supports snorm & unorm variants

// 128-bit formats (DXGI_FORMAT_R32G32B32A32_TYPELESS)
"rgba32uint",
"rgba32sint",
"rgba32float",

// Depth and stencil formats (DXGI_FORMAT_R32G8X24_TYPELESS)
"depth32float", // DXGI_FORMAT_D32_FLOAT
"depth24plus",  // DXGI_FORMAT_D32_FLOAT
"depth24plus-stencil8" // DXGI_FORMAT_D32_FLOAT_S8X24_UINT

@kvark (Contributor) commented Jun 15, 2020

@damyanp

Resources created with _TYPELESS can be accessed with compatible resource views using a compatible typed format, as long as the number of components and the number of bits per component are identical.
Here's the list of D3D12 format compatibility classes in terms of WebGPU formats:

These would apply to SRV and RTV/DSV only.
For UAV, it appears to be a bit more relaxed: we can view any 32-bit format as R32, according to this.

@kainino0x (Contributor) commented Jun 16, 2020

I tried to compile all this info into a spreadsheet, and I also tried to summarize the (mostly fairly simple) rules from Metal, but the result is quite complicated and I'm not totally sure it's right.

https://docs.google.com/spreadsheets/d/1PRiOja_AVse0QuB6rDH_YzidGw5fucQIRz0md9Sg4m8/edit?usp=sharing

Please request edit access if you would like to make any changes.

@RafaelCintron (Contributor) commented:

@kainino0x, thank you for putting together this spreadsheet.

One aspect of @damyanp's reply that I am not 100% sure about is the following:

// Depth and stencil formats (DXGI_FORMAT_R32G8X24_TYPELESS)
"depth32float", // DXGI_FORMAT_D32_FLOAT
"depth24plus", // DXGI_FORMAT_D32_FLOAT
"depth24plus-stencil8" // DXGI_FORMAT_D32_FLOAT_S8X24_UINT

It would be unfortunate if depth24plus-stencil8 was always implemented with D32_FLOAT_S8X24_UINT instead of D24_UNORM_S8_UINT. Hence, I think the "depth24plus*" textures should be grouped together from a compatibility perspective, and "depth32float" should be grouped with the "r32*" formats.

Of course, if web developers only test on platforms where "plus" textures give them more bits of precision, we may be forced to provide the same number of bits everywhere to keep content portable.

@kainino0x (Contributor) commented:

Updated the spreadsheet to reflect that.

I don't think it ultimately matters, because I don't think Vulkan will allow us to have any conversions between these formats. And on top of that, it seems potentially dodgy to make views that view the "X" parts of formats with unused bits.

@kainino0x (Contributor) commented:

Also updated the D3D12 and Metal spreadsheets to reflect that views shouldn't view X bits.

@damyanp commented Jun 20, 2020

Good catch, Rafael. I'm not sure how I ended up with the depth/stencil in that state, as it doesn't match the methodology I was using to generate the rest of the table at all.

@litherum (Contributor, Author) commented:

I don't have access to modify that spreadsheet, but it appears the Metal page is incorrect. I wrote a sample program to determine the correct data, and here are the results: Metal Pixel Formats.zip

@litherum (Contributor, Author) commented:

@Kangz

VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT that says some reinterpretation will be possible inside format compatibility classes

Does that mean that, without this bit, all views must have the same format as their underlying texture?

Does setting this bit have any performance implications?

(If the answer to both these questions is "yes" then that means Vulkan's spreadsheet truth table thing is identical to Metal's.)

@litherum (Contributor, Author) commented:

Also: Does setting this bit require any extensions to be enabled in Vulkan?

@litherum (Contributor, Author) commented:

(If the answer to both these questions is "yes" then that means ...)

Assuming the answer is "yes" (and no Vulkan extensions are required), then I think the proposal would be:

Add a flag to GPUTextureUsage called allowsSomeViewFormatReinterpretation (name up for bikeshedding). When this flag is not supplied by the author, texture views must have the same format as their owning texture. When the flag is supplied, some other texture formats are allowed. The set of allowed texture formats is the intersection of the green(ish) boxes in the spreadsheet (meaning reinterpretation between two formats is allowed iff Metal, Vulkan, and D3D12 can all perform the reinterpretation).

The flag makes sense because of the performance vs flexibility tradeoff. And, if the flag is supplied, we need to use the intersection of the greenish boxes, because we can't expand that set while also maintaining implementability on all 3 native APIs.
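
In application code, the proposal might read as follows (the flag name is the one proposed above, explicitly up for bikeshedding; the bit value is invented here for illustration):

```ts
declare const device: GPUDevice;

// Hypothetical usage bit per the proposal; name and value illustrative.
const ALLOWS_SOME_VIEW_FORMAT_REINTERPRETATION = 0x8000;

const texture = device.createTexture({
  size: [256, 256],
  format: "rgba8unorm",
  usage: GPUTextureUsage.TEXTURE_BINDING |
    ALLOWS_SOME_VIEW_FORMAT_REINTERPRETATION,
});

// Would validate only if rgba8unorm -> rgba8unorm-srgb is in the
// intersection of the Metal, Vulkan, and D3D12 tables.
const srgbView = texture.createView({ format: "rgba8unorm-srgb" });
```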

@Kangz (Contributor) commented Jun 22, 2020

I don't have access to modify that spreadsheet, but it appears the Metal page is incorrect. I wrote a sample program to determine the correct data, and here is the results: Metal Pixel Formats.zip

Whoops, I'm not sure I have access either, but I put the table in sheets so it's easy to view.

@litherum (Contributor, Author) commented Jun 22, 2020

Discussed today.

Resolution: Accept the proposal. @kvark to come up with a proposal for how we can use the yellow box in the D3D spreadsheet. (Until then, don't use the yellow box.)

@litherum (Contributor, Author) commented:

Updating from macOS Catalina 19F to 19G caused the macOS spreadsheet to change slightly. Now, converting between RGBA8Unorm and RGBA8Unorm_sRGB is free in either direction, and converting between BGRA8Unorm and BGRA8Unorm_sRGB is free in either direction.

Given that Vulkan hasn't changed at all, I don't think this affects the decision we make in WebGPU. I just wanted to mention it because there was some confusion during the call about how the Metal documentation seemed to disagree with the Metal validation layers. In 19G, they don't disagree.

[Screenshot: Screen Shot 2020-07-20 at 10.52.27 PM]

@kainino0x (Contributor) commented Aug 18, 2020

Updated spreadsheet with rgb9e5 (#975) and the info from Catalina 19G+ (^)

@kainino0x kainino0x moved this from Needs Investigation/Proposal or Revision to Needs Specification in Main Aug 18, 2020
@kainino0x (Contributor) commented:

Given the resolution from the Vulkan investigation, it seems like there are no formats that can be freely reinterpreted without a "reinterpretable" flag. Adding a "reinterpretable" flag is #168. So, the only specification action that comes out of this issue is something that says "texture view format must equal texture format" until #168 is done.
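
The interim rule as a validation sketch (names illustrative):

```ts
// Until #168: a view's format, defaulting to the texture's own format,
// must equal the texture's format.
function validateViewFormat(
  textureFormat: string,
  requestedViewFormat: string | undefined,
): void {
  const viewFormat = requestedViewFormat ?? textureFormat;
  if (viewFormat !== textureFormat) {
    throw new TypeError(
      "texture view format must equal texture format (pending #168)",
    );
  }
}
```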

@litherum (Contributor, Author) commented:

ok
