Barriers proposal #1374

alan-baker · 2021-01-26T15:12:13Z

Workgroup Barrier

Add a single builtin function for barrier:

workgroupBarrier<storage_class_list>() -> void

workgroupBarrier is templated on a list of affected storage classes. storage_class_list is a comma-separated list of storage classes. The only valid storage classes for MVP are storage and workgroup.

workgroupBarrier is a control barrier with acquire_release memory ordering. That is, all memory, atomic, and barrier operations are ordered in program order relative to the barrier. Additionally, the affected memory and atomic operations program ordered before the barrier must be visible to all other threads in the workgroup before any affected memory or atomic operation program ordered after the barrier is executed by a member of the workgroup.

workgroupBarrier must only be used in compute shaders and must only be called from workgroup uniform control flow.
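
As a loose CPU-side analogy (this is Python, not WGSL, and `threading.Barrier` only models the control-barrier part of the proposal), the sketch below shows the usage pattern the barrier is meant to enable: every invocation writes its own slot of shared memory, all invocations wait at the barrier, and only then does each one read a slot written by a neighbor. The names `run_workgroup`, `shared`, and `local_id` are hypothetical and chosen for illustration only.

```python
import threading


def run_workgroup(size):
    """Simulate one workgroup of `size` invocations on CPU threads."""
    shared = [0] * size   # stands in for workgroup memory
    out = [0] * size      # per-invocation result
    barrier = threading.Barrier(size)

    def invocation(local_id):
        # Phase 1: each invocation writes only its own slot.
        shared[local_id] = local_id * local_id
        # Every invocation must reach this point before any continues,
        # which mirrors the "uniform control flow" requirement.
        barrier.wait()
        # Phase 2: reading a neighbor's slot is safe only after the barrier.
        out[local_id] = shared[(local_id + 1) % size]

    threads = [threading.Thread(target=invocation, args=(i,)) for i in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

If one thread skipped `barrier.wait()`, the others would block indefinitely, which is the CPU counterpart of calling a barrier from non-uniform control flow.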

Translations

MSL

workgroupBarrier<storage>();
// Translates to:
// threadgroup_barrier(mem_flags::mem_device);

workgroupBarrier<workgroup>();
// Translates to:
// threadgroup_barrier(mem_flags::mem_threadgroup);

workgroupBarrier<storage,workgroup>();
// Translates to (assuming mem_flags is not a bit mask):
// threadgroup_barrier(mem_flags::mem_device);
// threadgroup_barrier(mem_flags::mem_threadgroup);
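
The MSL mapping above can be sketched as a small lookup. This is an illustrative Python helper, not part of any real compiler: the function name `msl_barrier_calls` is hypothetical, and it follows the proposal's conservative assumption that mem_flags cannot be combined, so it emits one call per storage class. Rejecting an empty list is also an assumption, based on the proposal not including a control-only barrier.

```python
# Storage class -> MSL mem_flags value, as listed in the proposal.
MSL_MEM_FLAGS = {
    "storage": "mem_flags::mem_device",
    "workgroup": "mem_flags::mem_threadgroup",
}


def msl_barrier_calls(storage_classes):
    """Return the MSL statements for workgroupBarrier<storage_classes>()."""
    # Drop duplicates while keeping the order the classes were written in.
    classes = list(dict.fromkeys(storage_classes))
    if not classes:
        # Assumption: an empty list is invalid (no control-only barrier).
        raise ValueError("at least one storage class is required")
    unknown = [c for c in classes if c not in MSL_MEM_FLAGS]
    if unknown:
        raise ValueError(f"unsupported storage classes: {unknown}")
    # One barrier per class, since mem_flags may not be a bit mask.
    return [f"threadgroup_barrier({MSL_MEM_FLAGS[c]});" for c in classes]
```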

HLSL

workgroupBarrier<storage>();
// Translates to:
// DeviceMemoryBarrierWithGroupSync();

workgroupBarrier<workgroup>();
// Translates to:
// GroupMemoryBarrierWithGroupSync();

workgroupBarrier<storage,workgroup>();
// Translates to (slight over-approximation, includes textures):
// AllMemoryBarrierWithGroupSync();
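
The HLSL case differs from MSL in that the combined list maps to a single intrinsic rather than two calls. A minimal Python sketch of that table follows; the helper name `hlsl_barrier` is hypothetical, and the mapping is taken directly from the three cases above.

```python
# Set of storage classes -> HLSL intrinsic, as listed in the proposal.
HLSL_BARRIERS = {
    frozenset({"storage"}): "DeviceMemoryBarrierWithGroupSync();",
    frozenset({"workgroup"}): "GroupMemoryBarrierWithGroupSync();",
    # Slight over-approximation: this intrinsic also covers textures.
    frozenset({"storage", "workgroup"}): "AllMemoryBarrierWithGroupSync();",
}


def hlsl_barrier(storage_classes):
    """Return the HLSL statement for workgroupBarrier<storage_classes>()."""
    key = frozenset(storage_classes)
    if key not in HLSL_BARRIERS:
        raise ValueError(f"unsupported storage class list: {sorted(key)}")
    return HLSL_BARRIERS[key]
```

Using a frozenset as the key makes the lookup independent of the order in which the storage classes are written.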

SPIR-V

workgroupBarrier<storage>();
// Translates to:
// Execution scope is workgroup = %uint_2
// Memory scope is device = %uint_1
// Memory semantics are AcquireRelease | UniformMemory (0x8 | 0x40) = %uint_72
// OpControlBarrier %uint_2 %uint_1 %uint_72

workgroupBarrier<workgroup>();
// Translates to:
// Execution and memory scope are workgroup = %uint_2
// Memory semantics are AcquireRelease | WorkgroupMemory (0x8 | 0x100) = %uint_264
// OpControlBarrier %uint_2 %uint_2 %uint_264

workgroupBarrier<storage,workgroup>();
// Translates to:
// Execution scope is workgroup = %uint_2
// Memory scope is device = %uint_1
// Memory semantics are AcquireRelease | UniformMemory | WorkgroupMemory 
//   (0x8 | 0x40 | 0x100) = %uint_328
// OpControlBarrier %uint_2 %uint_1 %uint_328
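
The operand arithmetic in the SPIR-V comments can be checked with a short Python sketch. The constants are the scope and memory semantics values quoted above; the function name `spirv_control_barrier` and the `%uint_N` naming of constant ids are illustrative assumptions that simply mirror the notation used in this proposal.

```python
# Scope values used in the proposal.
SCOPE_DEVICE = 1
SCOPE_WORKGROUP = 2

# Memory semantics bits used in the proposal.
ACQUIRE_RELEASE = 0x8
UNIFORM_MEMORY = 0x40      # storage
WORKGROUP_MEMORY = 0x100   # workgroup


def spirv_control_barrier(storage_classes):
    """Return the OpControlBarrier line for workgroupBarrier<storage_classes>()."""
    classes = set(storage_classes)
    if not classes or not classes <= {"storage", "workgroup"}:
        raise ValueError(f"unsupported storage class list: {sorted(classes)}")

    semantics = ACQUIRE_RELEASE
    if "storage" in classes:
        semantics |= UNIFORM_MEMORY
    if "workgroup" in classes:
        semantics |= WORKGROUP_MEMORY

    # Memory scope widens to device as soon as storage is involved.
    memory_scope = SCOPE_DEVICE if "storage" in classes else SCOPE_WORKGROUP
    return f"OpControlBarrier %uint_{SCOPE_WORKGROUP} %uint_{memory_scope} %uint_{semantics}"
```

The three cases reproduce the literal operands listed above: 72, 264, and 328.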

Discussion

This proposal represents the intersection of functionality across the underlying implementations. Memory ordering is not exposed in the MVP since all barriers use acquire_release ordering. There is no separate memory barrier because MSL does not expose one. Translating a memory barrier to a control barrier is not ideal because it would impose a uniform control flow requirement on the memory barrier. A control-only (no memory) barrier was not included because there is no good translation into HLSL and its value is dubious.

Post-MVP, if read_write textures are supported, the storage class list would need some way to include textures.

Subgroup barriers would be an interesting extension: MSL uses uniform subgroup barriers, Vulkan uses non-uniform subgroup barriers, and subgroup barriers do not appear to be present in HLSL.

Survey

MSL

Barriers in MSL can only be used in kernel (compute) shaders. MSL provides two barrier functions: threadgroup_barrier and simdgroup_barrier. These are equivalent to OpControlBarrier with a Workgroup and Subgroup execution scope respectively. The synchronization is controlled by the mem_flags parameter on the barrier. mem_flags can have the following values:

mem_none - no memory operations are ordered
mem_device - orders device (storage, uniform) memory operations
mem_threadgroup - orders threadgroup (workgroup) memory operations
mem_simdgroup - orders simdgroup (subgroup) memory operations
mem_texture - orders texture memory operations
- Note: this only becomes available on macOS in Metal 1.2 and on iOS in Metal 2.0
- Not a problem to omit since WebGPU does not support reading and writing storage images in a single shader
mem_threadgroup_imageblock - orders threadgroup imageblock memory operations

All barriers must be executed in dynamically uniform control flow with respect to the threadgroup (for threadgroup_barrier) or the simdgroup (for simdgroup_barrier). MSL does not explicitly state that mem_flags values can be OR'd together like a bit mask (its examples only use a single value). Since barriers order all affected memory operations, they should have acquire_release memory order.

MSL does not appear to provide memory barriers, just control barriers.

Curiously, the documentation seems to indicate that simdgroup_barrier could be used in fragment shaders, but this is contradicted earlier in the specification.

HLSL

HLSL provides both memory barriers and control barriers.

Barriers:

AllMemoryBarrier
AllMemoryBarrierWithGroupSync
- Syncs all memory (workgroup, storage, uniform and texture)
DeviceMemoryBarrier
DeviceMemoryBarrierWithGroupSync
- Syncs device memory (storage, uniform and texture)
GroupMemoryBarrier
GroupMemoryBarrierWithGroupSync
- Syncs group memory (workgroup)

The WithGroupSync variants also act as control barriers with a Group (workgroup) execution scope. As such these variants are required to be executed in dynamically uniform control flow (workgroup).

All variants are only permitted in compute shaders except DeviceMemoryBarrier which is also permitted in Pixel (fragment) shaders. Since barriers order all affected memory operations they should have acquire_release memory order.

SPIR-V

Vulkan provides both memory (OpMemoryBarrier) and control (OpControlBarrier) barriers that are configurable in terms of affected execution scopes, storage classes and memory ordering.

Unlike other APIs, OpControlBarrier with a Subgroup execution scope is not a uniform barrier. That is, only active threads in the subgroup synchronize at the barrier. Because active threads are not well defined, it is difficult to specify exactly which threads will be involved in the synchronization.

Both memory and control barriers support the following memory orderings: None, Acquire, Release, and AcquireRelease. Acquire (and AcquireRelease) orders all loads to occur in program order relative to the barrier. Release (and AcquireRelease) orders all stores to occur in program order relative to the barrier.

Both memory and control barriers support synchronizing the following storage classes: UniformMemory (storage), SubgroupMemory, WorkgroupMemory (workgroup), ImageMemory (texture) and OutputMemory (output).

The Vulkan Memory Model additionally enables finer grained control of availability and visibility of memory operations. Other APIs assume availability and visibility are automatic.

alan-baker · 2021-01-26T15:13:44Z

I'd be interested to hear from Apple about whether Metal allows composable mem_flags.

kvark · 2021-01-26T15:27:44Z

Wonderful investigation and proposal, thank you for the contribution!
I'm only concerned about the "variable generics" part. It seems too heavy-handed to drag in an advanced language feature here, not previously used in WGSL, just for this purpose. We should find an alternative syntax using more basic primitives.

alan-baker · 2021-01-26T15:34:30Z

> Wonderful investigation and proposal, thank you for the contribution!
> I'm only concerned about the "variable generics" part. It seems too heavy-handed to drag in an advanced language feature here, not previously used in WGSL, just for this purpose. We should find an alternative syntax using more basic primitives.

We debated that aspect internally. The major alternative considered was using enums and constexprs, both of which are currently unspecified by WGSL. I think, long term, those would be good to add if a fully configurable barrier is ever added to WGSL (SPIR-V with the Vulkan memory model provides highly configurable barriers).

I generally prefer a single configurable function call, but obviously different languages take different routes.
Were you thinking something more akin to HLSL where storage classes are part of the function name?

RobinMorisset · 2021-02-16T18:48:37Z

See also:
#232
#27

kdashg · 2021-03-05T20:14:01Z

WGSL meeting minutes 2021-02-16

(initial socialization)
JG: Sounds useful if you have atomics, but maybe useful otherwise
AG: It’s useful for workgroup storage: do preprocessing, then other invocations can read the result
MM: Scatter gather without atomics
RM: Issues about this topic a few months ago. How does this proposal match the thinking at the time?
AB: DN pointed at the issues and tried to account, but failed to link the right issues into the pull request.
MM: Angle brackets.
JG: Schedule for next week? Is it germane to the issue here?
MM: I think it can be adjusted as a post-process.
JG: Inclined to want to see spec for this as well.
MM: Think this is a good proposal. Exactly matches the barriers in WHLSL.
JG: Do folks want more time to think, or needs spec?
DS: Needs spec sounds good
JG: We’ll do that

* Add workgroupBarrier to WGSL

Fixes #1374
* Adds workgroupBarrier as a control barrier templated on affected storage classes
* Modifies func_call_statement grammar to allow limited templates
* Fix link to program order
* Fix typo
* Implement outcome of VF2F 2020-02-23
* Remove templated function and replace with two control barriers
* workgroupBarrier - affects memory and atomics in workgroup
* storageBarrier - affects memory and atomics in storage
* Both barriers synchronize with each other
* Remove stale sentence

alan-baker added the wgsl WebGPU Shading Language Issues label Jan 26, 2021

kvark added the investigation label Jan 26, 2021

kdashg added this to Needs Discussion in WGSL Feb 16, 2021

kdashg moved this from Needs Discussion to Resolved: Needs Specification Work in WGSL Feb 16, 2021

alan-baker added a commit to alan-baker/gpuweb that referenced this issue Feb 18, 2021

Add workgroupBarrier to WGSL

ee23a81

Fixes gpuweb#1374 * Adds workgroupBarrier as a control barrier templated on affected storage classes

alan-baker added a commit to alan-baker/gpuweb that referenced this issue Feb 18, 2021

Add workgroupBarrier to WGSL

cf9d260

Fixes gpuweb#1374 * Adds workgroupBarrier as a control barrier templated on affected storage classes * Modifies func_call_statement grammar to allow limited templates

alan-baker mentioned this issue Feb 18, 2021

Add workgroupBarrier to WGSL #1449

Merged

kvark mentioned this issue Mar 16, 2021

Control barriers gfx-rs/naga#583

Closed

dneto0 closed this as completed in #1449 Mar 16, 2021

WGSL automation moved this from Resolved: Needs Specification Work to Done Mar 16, 2021

ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022

[wgsl] Add missing @const decorations. (gpuweb#1374)

dae9dbb

This PR syncs the CTS text to the spec text for the new @const decorations. Issue: gpuweb#2787
