Investigation: Variable Rasterization Rate #450

litherum · 2019-10-01T03:48:43Z

Traditionally, the density of fragment shader invocations for a given triangle is constant across the framebuffer. Variable Rasterization Rate is a feature that can run more fragment shader invocations in areas of high importance, and fewer fragment shader invocations in areas of low importance.

Motivation

The main motivation for variable rasterization rate is to increase performance by decreasing the total number of fragment shader invocations. This means that using this feature is “lossy” in that it negatively affects the resulting rendering. However, the density of fragment shader invocations is finely tunable, which means that its effect can be tweaked to find a good balance between performance and quality.

Apple released a Metal sample recently which uses variable rate rasterization on the Amazon Lumberyard Bistro scene. It includes a flag which can be used to trivially turn variable rasterization rate on and off. The sample is open source, so it’s easy to add tracking information to gather frame times. Here are the results:

Using variable rasterization rate for this scene causes a 31% reduction in frame time (which corresponds to a 45% increase in fps). Here are the images of variable rasterization rate on and off, so you can judge the quality:

With RRM:

Without RRM:

To me, the visual quality looks quite comparable.
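As a quick check on the two figures quoted above: fps is the reciprocal of frame time, so a fractional frame-time reduction r corresponds to an fps increase of 1 / (1 - r) - 1. A minimal sketch (the function name is only for illustration):

```python
def fps_increase(frame_time_reduction: float) -> float:
    """Fractional fps gain for a fractional frame-time reduction."""
    if not 0.0 <= frame_time_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # fps is 1 / frame_time, so new_fps / old_fps = 1 / (1 - reduction).
    return 1.0 / (1.0 - frame_time_reduction) - 1.0


print(f"{fps_increase(0.31):.1%}")  # prints 44.9%
```

That rounds to the 45% quoted above.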

Difficulty

This feature has lots of effects in various parts of the API. It affects things like sample shading, multisampling, and shader derivative functions. Fully specifying this and finding the parts that are interoperable will take time.

Vulkan

VK_NV_shading_rate_image and SPV_NV_shading_rate: 4% Windows, 2% Linux, 0% Android. The shading rate is represented as a uint8 texture, where each texel represents a 16x16 block of pixels in the framebuffer. The texture itself isn’t put into bind groups like other textures; instead, it’s just bound with a special-purpose command enqueued into the command buffer.

The meaning of each texel is an index into a palette. The palette is defined at pipeline creation time, and if the device supports it, can be modified by a command enqueued into the command buffer. The palette itself is a 1d array of enums, where each enum defines a particular rate (e.g. VK_SHADING_RATE_PALETTE_ENTRY_2_INVOCATIONS_PER_PIXEL_NV).
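To make the two-level lookup concrete, here is a minimal CPU-side sketch of how a framebuffer pixel resolves to a rate under this scheme. The function name, the list-of-rows image layout, and the palette contents are assumptions for illustration; the strings are modeled on the VK_SHADING_RATE_PALETTE_ENTRY_*_NV enum names, and none of this is real Vulkan API usage.

```python
TILE_SIZE = 16  # each texel covers a 16x16 block of framebuffer pixels

# Hypothetical palette: texel value -> rate. Real code would store
# VkShadingRatePaletteEntryNV enum values here instead of strings.
PALETTE = [
    "1_INVOCATION_PER_PIXEL",
    "1_INVOCATION_PER_2X2_PIXELS",
    "1_INVOCATION_PER_4X4_PIXELS",
]


def shading_rate_at(rate_image, palette, x, y):
    """Resolve the rate for framebuffer pixel (x, y).

    rate_image is a list of rows of uint8 palette indices.
    """
    texel = rate_image[y // TILE_SIZE][x // TILE_SIZE]
    return palette[texel]


# A 64x32 framebuffer needs a 4x2 shading rate image.
rate_image = [
    [2, 1, 1, 2],
    [2, 0, 0, 2],
]
print(shading_rate_at(rate_image, PALETTE, 20, 20))  # 1_INVOCATION_PER_PIXEL
```

Because the image only stores indices, swapping the palette changes the meaning of every texel without touching the image itself.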

Optionally, you can also configure the sample locations for each of these enum values.

D3D12

There are two tiers of support. The first tier only lets you set a single scalar (via a command list command) which affects the rasterization rate for any successive draw commands.

The second tier lets you bind a texture as a rate map via a command list command. Each texel in the texture represents a tile in the framebuffer, and you can query the tile size. There’s no “palette” like in Vulkan. The values in the rate map represent the density enum values themselves.

Interestingly, you can’t render to the texture being used as the rate map. The only ways you can populate it are to write into it via a UAV, or copy into it.

Also, the second tier adds support for SV_ShadingRate, a semantic emitted by the vertex / geometry shader which controls the rasterization rate at a per-triangle granularity. It’s present in Shader Model 6.4.

Because there are 3 different ways of specifying rasterization rates (a per-draw-call scalar, a rate map in a texture, and a per-triangle semantic), D3D includes a system of “combiners” which combine these 3 signals into a final rasterization rate. These combiners are configurable.
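A rough model of how such combiners could merge the three signals is sketched below. It is a simplification under stated assumptions: rates are modeled as (width, height) coarse-pixel sizes, min/max are applied per axis, and the two-stage ordering (per-draw with per-triangle first, then the rate map) reflects my understanding of D3D12 rather than anything stated above. The mode names mirror D3D12's combiner modes, but the code is not the D3D12 API.

```python
from enum import Enum


class Combiner(Enum):
    PASSTHROUGH = "passthrough"  # keep the earlier signal, ignore the later one
    OVERRIDE = "override"        # take the later signal, ignore the earlier one
    MIN = "min"                  # per-axis minimum of the two coarse-pixel sizes
    MAX = "max"                  # per-axis maximum of the two coarse-pixel sizes


def combine(mode, a, b):
    """Combine two rates, each a (width, height) coarse-pixel size."""
    if mode is Combiner.PASSTHROUGH:
        return a
    if mode is Combiner.OVERRIDE:
        return b
    if mode is Combiner.MIN:
        return (min(a[0], b[0]), min(a[1], b[1]))
    if mode is Combiner.MAX:
        return (max(a[0], b[0]), max(a[1], b[1]))
    raise ValueError(f"unknown combiner: {mode}")


def final_rate(per_draw, per_triangle, rate_map, combiners):
    """Apply two combiners in sequence (assumed ordering, see lead-in)."""
    after_triangle = combine(combiners[0], per_draw, per_triangle)
    return combine(combiners[1], after_triangle, rate_map)


# Let the per-triangle rate replace the per-draw rate, then take the
# coarser of that and the rate map.
print(final_rate((1, 1), (2, 2), (4, 4), (Combiner.OVERRIDE, Combiner.MAX)))  # (4, 4)
```

With both combiners set to PASSTHROUGH, only the per-draw scalar has any effect, which matches the behavior of the first tier.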

Metal

Support is only present in iOS, not macOS. You can create a MTLRasterizationRateMap, but this doesn’t hold a texture. Instead, you give it two 1-D arrays of scalar values. These scalar values can vary between 1.0, which represents full resolution shading, and 0.0, which represents minimum resolution shading. The arrays get mapped (stretched) across the width and height of the framebuffer, respectively. Therefore, for a particular region of the framebuffer, that region’s rate is proportional to one of the scalars in the first array and one of the scalars in the second array.

In order to use the rate map, you have to do a two-pass algorithm. In the first pass, you render your scene into a small texture using the rate map. Then, for the second pass, you render your scene into the normal-sized framebuffer, and sample from the output texture of the first pass.

You ask the rate map for the size of that intermediate texture, and use the rate map by specifying it in the render pass descriptor. If you look at this texture directly, straight lines in the image will look curved because the rasterization rate changes throughout the image. In the second pass, you have to know where in the small texture to sample. You do this by telling the rate map to copy itself into a MTLBuffer, and binding the buffer as a resource in the shader. The shading language includes a new opaque type, rasterization_rate_map_data, which exposes an API to let you convert between arbitrary points on the real destination and the input texture.
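The coordinate conversion can be illustrated with a toy one-dimensional model. It assumes each array entry covers an equal share of the screen and is used directly as a scale factor for that share; Metal's real mapping (including how 0.0 maps to a minimum rate) is not specified above, so this only conveys the idea. Both function names are hypothetical, and rates must be greater than 0 in this model.

```python
def physical_width(rates, screen_width):
    """Width of the intermediate texture along one axis (toy model)."""
    cell = screen_width / len(rates)
    return sum(cell * r for r in rates)


def screen_to_physical(rates, screen_width, x):
    """Map a screen-space coordinate to the intermediate texture."""
    cell = screen_width / len(rates)
    # Which rate cell the screen coordinate falls in (clamped at the right edge).
    index = min(int(x // cell), len(rates) - 1)
    # Physical width of all cells fully to the left of this one.
    before = sum(cell * r for r in rates[:index])
    # Offset inside the current cell, scaled by that cell's rate.
    return before + (x - index * cell) * rates[index]


rates = [0.5, 1.0, 0.5]  # full rate in the middle, half rate at the edges
print(physical_width(rates, 300))           # 200.0
print(screen_to_physical(rates, 300, 150))  # 100.0
```

The mapping is piecewise linear with a different slope in each cell, which is why straight lines appear bent when the intermediate texture is viewed directly.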

Recommendation

The palette only exists in Vulkan. The per-draw-call rasterization scalar can’t be represented in Metal without interrupting the render pass. The per-triangle HLSL semantic also can’t be represented in Metal. The texture-based rasterization map also can’t be represented in Metal, because Metal only lets you specify two 1-D arrays. However, the Metal approach can be represented in both Vulkan and D3D by expanding those two 1D arrays into a full texture.
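A minimal sketch of that expansion, under loudly stated assumptions: the quantization thresholds, the nearest-cell resampling, and all function names are hypothetical, and a real backend would additionally have to snap each (width, height) pair to a rate the target API actually supports.

```python
def quantize(scalar):
    """Map a [0, 1] rate to a coarse-pixel size on one axis (hypothetical thresholds)."""
    if scalar >= 0.75:
        return 1
    if scalar >= 0.375:
        return 2
    return 4


def resample(rates, count):
    """Stretch a 1-D rate array across `count` tiles by nearest-cell lookup."""
    n = len(rates)
    return [rates[min(i * n // count, n - 1)] for i in range(count)]


def expand_to_texture(horizontal, vertical, tiles_x, tiles_y):
    """Build a tiles_y x tiles_x grid of (width, height) coarse-pixel sizes."""
    cols = [quantize(s) for s in resample(horizontal, tiles_x)]
    rows = [quantize(s) for s in resample(vertical, tiles_y)]
    return [[(w, h) for w in cols] for h in rows]


texture = expand_to_texture([0.25, 1.0, 0.25], [0.5, 1.0], tiles_x=6, tiles_y=2)
for row in texture:
    print(row)
```

Each texel depends only on its column's horizontal scalar and its row's vertical scalar, so the resulting texture is always separable. That separability is why the reverse direction (an arbitrary texture into two 1-D arrays) does not work in general.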

In the interest of interoperability, having a single WebGPU extension that works the same way on all platforms / native APIs is valuable. A good place to start would be a single extension that represents the Metal feature set because it can provide benefit on all platforms / native APIs. The two passes would probably be handled similarly to how we handle multisample resolves today, except the runtime would allocate the intermediate texture. If there is demand for the non-portable features of the other APIs (e.g. the palette, or the per-triangle semantic), then we can consider those for additional extensions.


Kangz · 2023-09-08T12:13:16Z

This issue tracker is for WebGPU only; another forum should be used to ask questions about Metal demos / samples. Try Apple support maybe?

litherum added the investigation label Oct 1, 2019

Kangz added this to the post-V1 milestone Sep 2, 2021

ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022

add todos in createView tests (gpuweb#450)

5d8fdc2

akioCL mentioned this issue Oct 17, 2022

Proposed RFC Feature: Variable Rate Shading o3de/sig-graphics-audio#87

Closed

kainino0x added the api WebGPU API label Apr 30, 2024
