Allow configuring whether workgroup memory is zero initialised #5508

DJMcNab · 2024-04-08T14:02:15Z

Connections

This is related to #5319 - a key reason initialisation takes so long is the 5x increase in pipeline creation time caused by zero initialisation of workgroup memory.
See also #4592, which this does not fix.
I'm planning on racing this PR with a fix for #4592, in the hope that I can get one of them in 0.20
(Both PRs should be useful, however.)

Description
Allows skipping zero initialisation of pipeline workgroup variables.

Backends:

Vulkan
DirectX
GLES
Metal

The underlying pipeline creation time is a significant obstacle to adoption of Vello and Xilem, as it makes our projects take an extremely long time to start up.
This decreases pipeline creation time by a factor of (nearly) 5 on Android, which makes this a worthwhile tradeoff for us.

Renderer creation time with different modes (all values in milliseconds). None corresponds to the new ZeroInitializeWorkgroupMemory::never(). This is on a Pixel 6.

| Mode | Single thread | Multi thread |
| --- | --- | --- |
| Native | 4228 | 1829 |
| Polyfill | 6350 | 2772 |
| None | 790 | 370 |
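
As a quick sanity check on the "factor of (nearly) 5" claim, the ratios can be computed directly from the measurements listed above. This is only a minimal sketch: the function name `speedup` is illustrative and not part of wgpu or Vello.

```rust
/// Ratio of a baseline renderer creation time to the time measured
/// with zero initialisation disabled (both in milliseconds).
fn speedup(baseline_ms: f64, none_ms: f64) -> f64 {
    baseline_ms / none_ms
}

fn main() {
    // Values copied from the measurements above (Pixel 6, milliseconds).
    let rows = [("Native", 4228.0, 1829.0), ("Polyfill", 6350.0, 2772.0)];
    let (none_single, none_multi) = (790.0, 370.0);

    for (mode, single, multi) in rows {
        println!(
            "{}: single thread {:.2}x, multi thread {:.2}x",
            mode,
            speedup(single, none_single),
            speedup(multi, none_multi)
        );
    }
}
```

The multi-threaded Native row is the one that lands just under 5x; the other combinations come out higher.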

This is a breaking change, but this breaking change is shared with #5500

Testing

I have tested the improperly wired up version of this in Vello, to get the above numbers.

I have also tested this PR in Vello, which showed the expected decrease in startup time

Checklist

Run cargo fmt.
Run cargo clippy. If applicable, add:
- --target wasm32-unknown-unknown
- --target wasm32-unknown-emscripten
Run cargo xtask test to run tests - same story as Pipeline cache API and implementation for Vulkan #5319 - no new failures
Add change to CHANGELOG.md. See simple instructions inside file.

jimblandy

It's not great that we're adding this field that needs to be set in every pipeline descriptor anyone will ever write. But I understand that Vello wants to be a good neighbor when sharing Devices with other code, so we can't make it a device-wide flag.

Could we see an alternative version of this PR where the flag is on wgpu_core::pipeline::ShaderModuleDescriptor, rather than ComputePipelineDescriptor?

DJMcNab · 2024-04-08T15:51:13Z

> Could we see an alternative version of this PR where the flag is on wgpu_core::pipeline::ShaderModuleDescriptor, rather than ComputePipelineDescriptor?

Do you mean as a separate PR, or reusing this same one?

teoxoy · 2024-04-08T15:55:53Z

It's worth noting we don't have the max_compute_workgroup_storage_size limit fully implemented (it's not doing anything).
Nor the 16k "maximum byte-size of an array type instantiated in the workgroup address space" from the WGSL spec.

Are the shaders in question surpassing those limits?

teoxoy · 2024-04-08T16:00:16Z

Using a loop to do array initialization might also side-step the issue (if it doesn't get unrolled that is).
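
To make the loop idea concrete, here is a CPU-side simulation of a workgroup clearing a shared array cooperatively, with each invocation handling a strided subset of elements. It is a sketch under stated assumptions, not naga's polyfill: the function names, the workgroup size of 256 and the array length of 4096 words are all hypothetical.

```rust
/// Simulates one invocation clearing its share of the array: elements
/// `local_index`, `local_index + workgroup_size`, and so on.
fn zero_strided(memory: &mut [u32], local_index: usize, workgroup_size: usize) {
    assert!(workgroup_size > 0, "workgroup size must be non-zero");
    let mut i = local_index;
    while i < memory.len() {
        memory[i] = 0;
        i += workgroup_size;
    }
}

/// Runs every invocation of the (simulated) workgroup in turn.
fn zero_workgroup(memory: &mut [u32], workgroup_size: usize) {
    for local_index in 0..workgroup_size {
        zero_strided(memory, local_index, workgroup_size);
    }
}

fn main() {
    // 4096 words of 4 bytes each, filled with a non-zero pattern.
    let mut memory = vec![0xDEAD_BEEF_u32; 4096];
    zero_workgroup(&mut memory, 256);
    assert!(memory.iter().all(|&word| word == 0));
    println!("all {} words cleared", memory.len());
}
```

The point of the loop form is that the generated code size stays constant as the array grows, provided the compiler does not unroll it.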

DJMcNab · 2024-04-08T16:01:32Z

As it happens, we are exactly on the limit for maxComputeWorkgroupStorageSize, based on a quick check (taking the values in https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/DirectX12.20extremely.20slow.20startup.20time/near/431594219 as my source of truth for that claim, we have 16 lots of 256 lots of 4 bytes). So maybe, if the 16384 is an exclusive maximum.
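
The arithmetic in this comment can be written out as a small check. This is a minimal sketch: the helper `workgroup_bytes` is hypothetical, and the 16384 figure is simply the value quoted above.

```rust
/// Total bytes of workgroup storage for `arrays` arrays of `len`
/// elements, each `elem_size` bytes wide.
fn workgroup_bytes(arrays: u32, len: u32, elem_size: u32) -> u32 {
    arrays * len * elem_size
}

/// The limit value quoted in the comment above.
const LIMIT: u32 = 16384;

fn main() {
    // "16 lots of 256 lots of 4 bytes"
    let used = workgroup_bytes(16, 256, 4);
    println!("used = {} bytes", used);
    println!("within an inclusive limit: {}", used <= LIMIT);
    println!("within an exclusive limit: {}", used < LIMIT);
}
```

The usage equals the limit exactly, so whether it counts as exceeding the limit depends only on whether the maximum is treated as inclusive or exclusive.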

DJMcNab · 2024-04-08T16:03:09Z

Yes, as I say, fixing #4592 would also likely resolve this issue, as it would cause us to output probably ~24 instructions rather than 16384

But this is much easier to implement, and the actual zero initialisation is superfluous work anyway

Besides which, wgpu would still use the native zero initialisation method on Vulkan Android, which would mean we would still need to disable it. The native method is at least faster than the polyfill. But it would be slower if #4592 reaches the same result

DJMcNab · 2024-04-09T08:39:39Z

Moving into draft whilst I reconfigure as requested

CHANGELOG.md

DJMcNab · 2024-04-10T08:38:52Z

This now shares the breaking change with #5500, using the mechanism discussed on Matrix. So it is a breaking change, but the "same" breaking change was already needed by that PR. CC @teoxoy, as you brought in the constants field which has been moved.

I'm now quite happy with how this API looks - it also removes the need for everyone to set constants, which was a bit weird as &Default::default
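
A rough illustration of the shape being described: a single compilation-options struct that carries both the constants and the zero-initialisation flag, with a `Default` implementation so callers only name the fields they change. The types below are self-contained stand-ins written for this example, not wgpu's actual definitions; the struct name, the owned `HashMap`, and the default of `true` are assumptions.

```rust
use std::collections::HashMap;

/// Illustrative stand-in for a pipeline compilation options struct.
/// NOT wgpu's real type: field types and defaults are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct CompilationOptions {
    /// Pipeline-overridable constants, keyed by name.
    constants: HashMap<String, f64>,
    /// Whether workgroup memory is zeroed before the shader runs.
    zero_initialize_workgroup_memory: bool,
}

impl Default for CompilationOptions {
    fn default() -> Self {
        Self {
            constants: HashMap::new(),
            // Assumed default: keep zero initialisation on unless opted out.
            zero_initialize_workgroup_memory: true,
        }
    }
}

fn main() {
    // Callers that do not care about either field take the defaults.
    let defaults = CompilationOptions::default();

    // Callers that want faster pipeline creation opt out explicitly.
    let fast = CompilationOptions {
        zero_initialize_workgroup_memory: false,
        ..Default::default()
    };

    println!("{:?}", defaults);
    println!("{:?}", fast);
}
```

Grouping the fields this way means a caller who only wants default behaviour does not have to spell out an empty constants map.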

cwfitzgerald

Seems pretty uncontroversial!

DJMcNab added 3 commits April 8, 2024 14:44

Wire up zero initialising workgroup memory

94bbb95

Implement for Vulkan

ed5bca6

Add safety comment

30884dd

DJMcNab requested a review from a team as a code owner April 8, 2024 14:02

DJMcNab added 6 commits April 8, 2024 15:19

Implement for GLES

9ae7c84

Fix intra-doc link

f888a6f

Add a changelog entry

3d56579

📎

bb45fd0

Fix the player tests

eb88e3b

Implement for DX12

23a22ab

jimblandy previously requested changes Apr 8, 2024

Move into a compilation options struct

859ba8b

DJMcNab marked this pull request as draft April 9, 2024 08:39

DJMcNab added 4 commits April 9, 2024 10:15

Checkpoint some docs

c3b4900

Move to a PipelineCompilationOptions model

1b76c29

Add a comment explaining the use of HashMap

7fb649a

Fix tests

feee2a4

DJMcNab marked this pull request as ready for review April 10, 2024 08:02

DJMcNab commented Apr 10, 2024

DJMcNab added 3 commits April 10, 2024 09:25

Fix changelog entry

9c9834f

Fix handling in DirectX12

5707f1f

Implement for the metal backend

8d0c5c7

DJMcNab requested a review from jimblandy April 10, 2024 08:36

cwfitzgerald self-requested a review April 10, 2024 23:52

Merge branch 'trunk' into zero-initialise-workgroup

0fc5c4f

DJMcNab mentioned this pull request Apr 11, 2024

Improve the polyfill for workgroup variable zero initialization #5521

Draft

6 tasks

Merge branch 'trunk' into zero-initialise-workgroup

f6b4266

cwfitzgerald approved these changes Apr 17, 2024

cwfitzgerald merged commit 965b00c into gfx-rs:trunk Apr 17, 2024
25 checks passed

DJMcNab deleted the zero-initialise-workgroup branch April 17, 2024 20:17

This was referenced Apr 17, 2024

Subgroup Operations #5301

Merged

Fix Merge Issues Between #5301 and #5508 #5549

Merged

cwfitzgerald added a commit that referenced this pull request Apr 17, 2024

Fix Merge Issues Between #5301 and #5508 (#5549)

c1291bd

DJMcNab mentioned this pull request Apr 22, 2024

[backdrop_dyn] Handle upstream pipeline failure linebender/vello#553

Merged

DJMcNab mentioned this pull request May 3, 2024

Winit example taking very long to startup on android linebender/vello#289

Closed

waywardmonkeys added a commit to waywardmonkeys/vello that referenced this pull request May 13, 2024

Disable zeroing workgroup memory for compute shaders.

ed8af50

This is a performance improvement for shader compilation. See gfx-rs/wgpu#5508

waywardmonkeys mentioned this pull request May 13, 2024

Disable zeroing workgroup memory for compute shaders. linebender/vello#575

Merged
