Barriers proposal #1374

Closed
alan-baker opened this issue Jan 26, 2021 · 5 comments · Fixed by #1449
Labels
investigation wgsl WebGPU Shading Language Issues

@alan-baker
Contributor

Workgroup Barrier

Add a single builtin function for barrier:

workgroupBarrier<storage_class_list>() -> void

workgroupBarrier is templated on a list of affected storage classes. storage_class_list is a comma-separated list of storage classes. The only valid storage classes for MVP are storage and workgroup.
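A minimal sketch of how a compiler might validate the template argument list, assuming it receives the raw text between the angle brackets. The helper name parse_storage_class_list is hypothetical; only the rule that the list is comma-separated and limited to storage and workgroup comes from the proposal. Rejecting empty entries and collapsing duplicates are assumptions.

```python
# Storage classes accepted by workgroupBarrier in the MVP, per the proposal.
MVP_STORAGE_CLASSES = frozenset({"storage", "workgroup"})


def parse_storage_class_list(text: str) -> frozenset:
    """Parse the text between the angle brackets of workgroupBarrier<...>.

    Hypothetical helper: rejects empty entries and any storage class outside
    the MVP set. Duplicate entries are collapsed (an assumption, not a rule
    stated in the proposal).
    """
    entries = [entry.strip() for entry in text.split(",")]
    if any(entry == "" for entry in entries):
        raise ValueError(f"malformed storage class list: {text!r}")
    unknown = [entry for entry in entries if entry not in MVP_STORAGE_CLASSES]
    if unknown:
        raise ValueError(f"storage class not allowed in workgroupBarrier: {unknown}")
    return frozenset(entries)
```

For example, parse_storage_class_list("storage,workgroup") yields both classes, while a list naming uniform is rejected.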

workgroupBarrier is a control barrier with acquire_release memory ordering. That is, all memory, atomic, and barrier operations are ordered in program order relative to the barrier. Additionally, the affected memory and atomic operations that precede the barrier in program order must be visible to all other threads in the workgroup before any member of the workgroup executes an affected memory or atomic operation that follows the barrier in program order.

workgroupBarrier must only be used in compute shaders and must only be called from workgroup uniform control flow.

Translations

MSL

workgroupBarrier<storage>();
// Translates to:
// threadgroup_barrier(mem_flags::mem_device);

workgroupBarrier<workgroup>();
// Translates to:
// threadgroup_barrier(mem_flags::mem_threadgroup);

workgroupBarrier<storage,workgroup>();
// Translates to (assuming mem_flags is not a bit mask):
// threadgroup_barrier(mem_flags::mem_device);
// threadgroup_barrier(mem_flags::mem_threadgroup);
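The MSL translations above can be sketched as a small lookup that emits one threadgroup_barrier per requested storage class. This follows the proposal's conservative assumption that mem_flags values cannot be OR'd together; translate_to_msl is a hypothetical name, and the emission order (device first) simply mirrors the example.

```python
# WGSL storage class -> MSL mem_flags value used in the proposal's translations.
MSL_MEM_FLAGS = {
    "storage": "mem_flags::mem_device",
    "workgroup": "mem_flags::mem_threadgroup",
}
# Emission order mirrors the proposal's example (device memory first).
MSL_ORDER = ("storage", "workgroup")


def translate_to_msl(storage_classes: set) -> list:
    """Emit MSL statements for workgroupBarrier<...>.

    Assumes mem_flags is not a bit mask, so a barrier covering two storage
    classes becomes two consecutive threadgroup barriers.
    """
    return [
        f"threadgroup_barrier({MSL_MEM_FLAGS[sc]});"
        for sc in MSL_ORDER
        if sc in storage_classes
    ]
```

If Metal does allow composable mem_flags, the two-class case could instead collapse into a single call.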

HLSL

workgroupBarrier<storage>();
// Translates to:
// DeviceMemoryBarrierWithGroupSync();

workgroupBarrier<workgroup>();
// Translates to:
// GroupMemoryBarrierWithGroupSync();

workgroupBarrier<storage,workgroup>();
// Translates to (slight over-approximation, includes textures):
// AllMemoryBarrierWithGroupSync();

SPIR-V

workgroupBarrier<storage>();
// Translates to:
// Execution scope is workgroup = %uint_2
// Memory scope is device = %uint_1
// Memory semantics are AcquireRelease | UniformMemory (0x8 | 0x40) = %uint_72
// OpControlBarrier %uint_2 %uint_1 %uint_72

workgroupBarrier<workgroup>();
// Translates to:
// Execution and memory scope are workgroup = %uint_2
// Memory semantics are AcquireRelease | WorkgroupMemory (0x8 | 0x100) = %uint_264
// OpControlBarrier %uint_2 %uint_2 %uint_264

workgroupBarrier<storage,workgroup>();
// Translates to:
// Execution scope is workgroup = %uint_2
// Memory scope is device = %uint_1
// Memory semantics are AcquireRelease | UniformMemory | WorkgroupMemory 
//   (0x8 | 0x40 | 0x100) = %uint_328
// OpControlBarrier %uint_2 %uint_1 %uint_328
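The SPIR-V operand values above can be derived mechanically. The sketch below uses the Scope and MemorySemantics enumerant values from the SPIR-V specification; the function name spirv_control_barrier_operands is hypothetical. It reproduces the constants 72, 264, and 328 and widens the memory scope to Device whenever storage is affected.

```python
# SPIR-V Scope enumerant values.
SCOPE_DEVICE = 1
SCOPE_WORKGROUP = 2

# SPIR-V MemorySemantics bits used by the proposal.
ACQUIRE_RELEASE = 0x8
UNIFORM_MEMORY = 0x40
WORKGROUP_MEMORY = 0x100

SEMANTICS_FOR_CLASS = {"storage": UNIFORM_MEMORY, "workgroup": WORKGROUP_MEMORY}


def spirv_control_barrier_operands(storage_classes: set) -> tuple:
    """Return (execution scope, memory scope, memory semantics) for OpControlBarrier."""
    semantics = ACQUIRE_RELEASE
    for sc in storage_classes:
        semantics |= SEMANTICS_FOR_CLASS[sc]
    # Storage buffers are shared across workgroups, so affecting them widens
    # the memory scope to Device; workgroup-only barriers stay at Workgroup.
    memory_scope = SCOPE_DEVICE if "storage" in storage_classes else SCOPE_WORKGROUP
    return SCOPE_WORKGROUP, memory_scope, semantics
```

For instance, spirv_control_barrier_operands({"storage", "workgroup"}) gives (2, 1, 328), matching the last translation.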

Discussion

This proposal represents the intersection of functionality across the underlying implementations. Memory ordering is not exposed in the MVP since all barriers use acquire_release ordering. There is no separate memory barrier because MSL does not expose one, and translating a memory barrier to a control barrier is not ideal because the control barrier would require uniform control flow. A control-only (no memory) barrier was not included because there is no good translation into HLSL and its value is dubious.

Post-MVP, if read_write textures are supported, the storage class list would need some way to include textures.

Subgroup barriers will be an interesting extension: MSL uses uniform subgroup barriers, Vulkan uses non-uniform subgroup barriers, and HLSL does not appear to provide subgroup barriers at all.

Survey

MSL

Barriers in MSL can only be used in kernel (compute) shaders. MSL provides two barrier functions: threadgroup_barrier and simdgroup_barrier. These are equivalent to OpControlBarrier with a Workgroup and Subgroup execution scope respectively. The synchronization is controlled by the mem_flags parameter on the barrier. mem_flags can have the following values:

  • mem_none - no memory operations are ordered
  • mem_device - orders device (storage, uniform) memory operations
  • mem_threadgroup - orders threadgroup (workgroup) memory operations
  • mem_simdgroup - orders simdgroup (subgroup) memory operations
  • mem_texture - orders texture memory operations
    • Note: this only becomes available on macOS in Metal 1.2 and on iOS in Metal 2.0
    • Not a problem to omit since WebGPU does not support reading and writing storage images in a single shader
  • mem_threadgroup_imageblock - orders threadgroup imageblock memory operations

All barriers must be executed in control flow that is dynamically uniform for the threadgroup or simdgroup, respectively. MSL does not explicitly state that mem_flags values can be OR’d together like a bit mask (its examples only use a single value). Since barriers order all affected memory operations, they should have acquire_release memory order.

MSL does not appear to provide memory barriers, just control barriers.

Curiously, the documentation seems to indicate that simdgroup_barrier could be used in fragment shaders, but this is contradicted earlier in the specification.

HLSL

HLSL provides both memory barriers and control barriers.

Barriers:

  • AllMemoryBarrier
  • AllMemoryBarrierWithGroupSync
    • Syncs all memory (workgroup, storage, uniform and texture)
  • DeviceMemoryBarrier
  • DeviceMemoryBarrierWithGroupSync
    • Syncs device memory (storage, uniform and texture)
  • GroupMemoryBarrier
  • GroupMemoryBarrierWithGroupSync
    • Syncs group memory (workgroup)
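The HLSL column of the translation table earlier in the proposal follows from this list: pick the WithGroupSync intrinsic that covers the requested storage classes while synchronizing the fewest extra kinds of memory. The sketch below encodes the coverage described above; pick_hlsl_barrier is a hypothetical name, and "fewest extra kinds" is an assumed selection rule.

```python
# Memory each HLSL *WithGroupSync intrinsic synchronizes, per the survey above.
HLSL_GROUP_SYNC_COVERAGE = {
    "GroupMemoryBarrierWithGroupSync": frozenset({"workgroup"}),
    "DeviceMemoryBarrierWithGroupSync": frozenset({"storage", "uniform", "texture"}),
    "AllMemoryBarrierWithGroupSync": frozenset({"workgroup", "storage", "uniform", "texture"}),
}


def pick_hlsl_barrier(storage_classes: set) -> str:
    """Choose the intrinsic that covers the requested classes with the
    smallest over-approximation (fewest extra memory kinds synchronized)."""
    candidates = [
        (len(covered - storage_classes), name)
        for name, covered in HLSL_GROUP_SYNC_COVERAGE.items()
        if storage_classes <= covered
    ]
    if not candidates:
        raise ValueError(f"no HLSL barrier covers {sorted(storage_classes)}")
    return min(candidates)[1]
```

Requesting both storage and workgroup leaves only AllMemoryBarrierWithGroupSync, which also syncs uniform and texture memory; this is the over-approximation noted in the translation table.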

The WithGroupSync variants also act as control barriers with a Group (workgroup) execution scope. As such, these variants must be executed in control flow that is dynamically uniform across the workgroup.

All variants are only permitted in compute shaders, except DeviceMemoryBarrier, which is also permitted in Pixel (fragment) shaders. Since barriers order all affected memory operations, they should have acquire_release memory order.

SPIR-V

Vulkan provides both memory (OpMemoryBarrier) and control (OpControlBarrier) barriers that are configurable in terms of affected execution scopes, storage classes and memory ordering.

Unlike other APIs, OpControlBarrier with a Subgroup execution scope is not a uniform barrier. That is, only active threads in the subgroup synchronize at the barrier. Because active threads are not well defined, it is difficult to specify exactly which threads will be involved in the synchronization.

Both memory and control barriers support the following memory orderings: None, Acquire, Release, and AcquireRelease. Acquire (and AcquireRelease) orders all loads to occur in program order relative to the barrier. Release (and AcquireRelease) orders all stores to occur in program order relative to the barrier.
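The informal rule above (Acquire orders loads, Release orders stores, AcquireRelease orders both) can be written as a small lookup over the SPIR-V MemorySemantics ordering bits. This is a sketch of the survey's description only, not of the full Vulkan memory model; the class and function names are hypothetical.

```python
from enum import IntFlag


class MemoryOrdering(IntFlag):
    """Ordering bits of SPIR-V MemorySemantics (values from the SPIR-V spec)."""
    NONE = 0x0
    ACQUIRE = 0x2
    RELEASE = 0x4
    ACQUIRE_RELEASE = 0x8


def ordered_accesses(ordering: MemoryOrdering) -> set:
    """Which accesses a barrier keeps in program order, per the survey's summary."""
    kinds = set()
    if ordering & (MemoryOrdering.ACQUIRE | MemoryOrdering.ACQUIRE_RELEASE):
        kinds.add("loads")
    if ordering & (MemoryOrdering.RELEASE | MemoryOrdering.ACQUIRE_RELEASE):
        kinds.add("stores")
    return kinds
```

Because the proposal fixes every barrier at AcquireRelease, a WGSL barrier orders both loads and stores.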

Both memory and control barriers support synchronizing the following storage classes: UniformMemory (storage), SubgroupMemory, WorkgroupMemory (workgroup), ImageMemory (texture) and OutputMemory (output).

The Vulkan Memory Model additionally enables finer grained control of availability and visibility of memory operations. Other APIs assume availability and visibility are automatic.

@alan-baker alan-baker added the wgsl WebGPU Shading Language Issues label Jan 26, 2021
@alan-baker
Contributor Author

I'd be interested to hear from Apple about whether Metal allows composable mem_flags.

@kvark
Contributor

kvark commented Jan 26, 2021

Wonderful investigation and the proposal, thank you for contribution!
I'm only concerned about the "variable generics" part. It seems too heavy handed to drag in the advanced language feature here, not previously used in WGSL, just for this purpose. We should find an alternative syntax using more basic primitives.

@alan-baker
Contributor Author

> Wonderful investigation and the proposal, thank you for contribution!
> I'm only concerned about the "variable generics" part. It seems too heavy handed to drag in the advanced language feature here, not previously used in WGSL, just for this purpose. We should find an alternative syntax using more basic primitives.

We debated that aspect internally. The major alternative considered was using enums and constexprs, both of which are currently unspecified in WGSL. Long term, I think that would be good to add if a fully configurable barrier is ever added to WGSL (SPIR-V with the Vulkan memory model provides highly configurable barriers).

I generally prefer a single configurable function call, but obviously different languages take different routes.
Were you thinking something more akin to HLSL where storage classes are part of the function name?

@kdashg kdashg added this to Needs Discussion in WGSL Feb 16, 2021
@RobinMorisset
Contributor

See also:
#232
#27

@kdashg kdashg moved this from Needs Discussion to Resolved: Needs Specification Work in WGSL Feb 16, 2021
alan-baker added a commit to alan-baker/gpuweb that referenced this issue Feb 18, 2021
Fixes gpuweb#1374

* Adds workgroupBarrier as a control barrier templated on affected
  storage classes
alan-baker added a commit to alan-baker/gpuweb that referenced this issue Feb 18, 2021
Fixes gpuweb#1374

* Adds workgroupBarrier as a control barrier templated on affected
  storage classes
* Modifies func_call_statement grammar to allow limited templates
@kdashg
Contributor

kdashg commented Mar 5, 2021

WGSL meeting minutes 2021-02-16
  • (initial socialization)
  • JG: Sounds useful if you have atomics, but maybe useful otherwise
  • AG: For workgroup storage it’s useful: do preprocessing, and then other invocations can read the results
  • MM: Scatter gather without atomics
  • RM: Issues about this topic a few months ago. How does this proposal match the thinking at the time?
  • AB: DN pointed at the issues and tried to account for them, but failed to link the right issues into the pull request.
  • MM: Angle brackets.
  • JG: Schedule for next week? Is it germane to the issue here?
  • MM: I think it can be adjusted as a post-process.
  • JG: Inclined to want to see spec for this as well.
  • MM: Think this is a good proposal. Exactly matches the barriers in WHLSL.
  • JG: Do folks want more time to think, or needs spec?
  • DS: Needs spec sounds good
  • JG: We’ll do that

WGSL automation moved this from Resolved: Needs Specification Work to Done Mar 16, 2021
dneto0 pushed a commit that referenced this issue Mar 16, 2021
* Add workgroupBarrier to WGSL

Fixes #1374

* Adds workgroupBarrier as a control barrier templated on affected
  storage classes
* Modifies func_call_statement grammar to allow limited templates

* Fix link to program order

* fix typo

* Implement outcome of VF2F 2020-02-23

* Remove templated function and replace with two control barriers
  * workgroupBarrier - affects memory and atomics in workgroup
  * storageBarrier - affects memory and atomics in storage
  * both barriers synchronize with each other

* remove stale sentence
ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this issue Sep 6, 2022
This PR syncs the CTS text to the spec text for the new @const decorations.

Issue: gpuweb#2787
Projects
WGSL
Done

4 participants