Bump allocation for Uniform Buffers on WebGPU #5438

mvaligursky · 2023-06-26T11:32:32Z

Before this PR, each UniformBuffer allocated its own internal GPUBuffer storage and, every frame, copied its CPU storage content to it using writeBuffer. This required many writeBuffer calls, which is expensive in both CPU and GPU time.

This PR implements a more performant approach. Under the hood, one or more large (1MB) GPU buffers are allocated, along with a pool of staging buffers of the same size. Individual uniform buffers allocate their storage from the staging buffers using a bump allocator. Then, just before the command buffers are submitted, a command buffer is added to execute first, which copies the used staging buffers to the GPU buffers.
Here's an example of used buffers for many example. Note that the number of staging buffers grows each time command buffers are submitted, as the already existing staging buffers can no longer be used.
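To make the allocation scheme concrete, here is a minimal sketch of a bump allocator over fixed-size buffers. It is not the code from this PR: the class name, its fields, and the returned object shape are hypothetical, plain `ArrayBuffer`s stand in for the staging buffers, and the 256-byte alignment is an assumption based on WebGPU's default `minUniformBufferOffsetAlignment`.

```javascript
// Hypothetical sketch; names and shapes are illustrative, not the engine's API.
const BUFFER_SIZE = 1024 * 1024; // 1MB, matching the size mentioned above
const ALIGNMENT = 256;           // assumed: WebGPU default minUniformBufferOffsetAlignment

class BumpAllocator {
    constructor(bufferSize = BUFFER_SIZE, alignment = ALIGNMENT) {
        this.bufferSize = bufferSize;
        this.alignment = alignment;
        // each entry: { storage: ArrayBuffer, used: number of bytes handed out }
        this.buffers = [];
    }

    // Returns which buffer the allocation lives in and its byte offset.
    alloc(size) {
        // round the request up to the required offset alignment
        const aligned = Math.ceil(size / this.alignment) * this.alignment;
        if (aligned > this.bufferSize) {
            throw new Error(`allocation of ${size} bytes does not fit in one buffer`);
        }

        // start a new buffer when there is none, or the active one is too full
        let active = this.buffers[this.buffers.length - 1];
        if (!active || active.used + aligned > this.bufferSize) {
            active = { storage: new ArrayBuffer(this.bufferSize), used: 0 };
            this.buffers.push(active);
        }

        // "bump" the offset: allocation is just an addition, nothing is ever freed
        const offset = active.used;
        active.used += aligned;
        return { bufferIndex: this.buffers.length - 1, offset, size: aligned };
    }
}
```

Because nothing is freed individually, an allocation costs an addition and a comparison. The `used` value of each buffer is what a later copy step would need in order to transfer only the occupied range.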

This PR also cleans up some temporary solutions introduced in #5423 to limit the number of expensive submit commands per frame. Before, the command buffer of each render pass was submitted separately, while now they are batched into a very small number of submits.

As an example, the shadow cascades example uses a single submit: first the staging buffers are copied to the GPU buffers, followed by a single command buffer rendering all shadow cascade render passes, followed by the forward pass of the scene:
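The ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `submitFrame` and the shape of the `copies` array are hypothetical, while `createCommandEncoder`, `copyBufferToBuffer`, `finish`, `unmap` and `queue.submit` are standard WebGPU calls.

```javascript
// Hypothetical helper; `copies` and `renderCommandBuffers` are assumed inputs.
// copies: [{ staging: GPUBuffer, gpu: GPUBuffer, size: number }]
//   size is the number of used bytes (WebGPU requires a multiple of 4)
// renderCommandBuffers: GPUCommandBuffer[] recorded earlier in the frame
function submitFrame(device, copies, renderCommandBuffers) {
    const encoder = device.createCommandEncoder();

    for (const { staging, gpu, size } of copies) {
        // a buffer must be unmapped before it can be used by a submitted command
        staging.unmap();
        // copy only the used range of the staging buffer to the GPU buffer
        encoder.copyBufferToBuffer(staging, 0, gpu, 0, size);
    }

    const copyCommands = encoder.finish();

    // one submit: the copy commands execute first, then all render passes
    device.queue.submit([copyCommands, ...renderCommandBuffers]);
}
```

Command buffers passed to a single `submit` call execute in array order, which is why placing the copy command buffer first guarantees the uniform data is in place before any render pass reads it.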

The multi view example similarly renders the whole scene using a single submit for all command buffers:

If texture uploads are done in a frame (typically in a very small number of places), for example, in this case, the bone texture used by skinning and the clustered lights updated on the CPU, we end up with two submits:

All rendering issued from the update functions of scripts is submitted separately for now (it could be a single submit as well). For example, the reflection-cubemap example renders the scene using a single submit, and does multiple texture reprojections using drawQuadWithShader within the scripts:

Performance

CPU frame time for the hierarchy example with 5000 or so meshes:

WebGPU before: 57ms
WebGPU now: 48ms (15% improvement)
For comparison, WebGL time: 14ms

GPU times (these are based only on the GPU duration reported by the Chrome Profiler; I am not sure what else they capture, and I do not think they are reliable at all):

WebGPU before: 12.8ms
WebGPU now: 11.8ms
WebGL time: hard to estimate in browser, too many displayed bars.

src/platform/graphics/dynamic-buffers.js

Bump allocation for Uniform Buffers on WebGPU

4d134bc

mvaligursky self-assigned this Jun 26, 2023

mvaligursky marked this pull request as draft June 26, 2023 11:32

mvaligursky added the feature request and area: graphics labels Jun 26, 2023

mvaligursky mentioned this pull request Jun 26, 2023

WebGPU Support #3986

Open

lint

09d2987

mvaligursky requested review from willeastcott, slimbuck and GSterbrant June 26, 2023 13:11

mvaligursky marked this pull request as ready for review June 26, 2023 13:11

willeastcott reviewed Jun 27, 2023

willeastcott approved these changes Jun 27, 2023

types

c432a21

mvaligursky merged commit c30f8d8 into main Jun 27, 2023
7 checks passed

mvaligursky deleted the mv-dynamic-buffers branch June 27, 2023 08:42
