From 060fec1e3bd822153805574b63659ed80eadba01 Mon Sep 17 00:00:00 2001 From: ArielG-NV Date: Tue, 15 Jul 2025 00:33:13 -0700 Subject: [PATCH 1/8] Amend proposal as per new guidelines of design provided --- ...-coherent-pointer-operations-and-access.md | 314 ++++++++++++++++++ proposals/031-coherent-pointers.md | 221 ------------ 2 files changed, 314 insertions(+), 221 deletions(-) create mode 100644 proposals/031-coherent-pointer-operations-and-access.md delete mode 100644 proposals/031-coherent-pointers.md diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md new file mode 100644 index 0000000..e28410e --- /dev/null +++ b/proposals/031-coherent-pointer-operations-and-access.md @@ -0,0 +1,314 @@ +# SP\#031: Coherent Pointer Operations & Pointer Access + +## Status + +Status: Design Review +Implementation: +Author: Ariel Glasroth +Reviewer: + +## Background + +### Introduction + +GPUs have a concept known as coherent operations. Coherent operations flush caches on writes and avoid stale cached data on reads, so that when **thread A** modifies memory, **thread B** can read that memory and observe the changes made by **thread A**. Not every cache level needs to be flushed: if a user wants coherence at `Workgroup` scope, only the cache levels up to the one shared by the workgroup need to be flushed. + +Additionally, pointers have a property called 'access': whether a pointer is read-only or read-write. A read-only pointer cannot be used to modify the data it points to; a read-write pointer can be used to both read and write that data. + +### Prior Implementations Of Coherence + +* HLSL – `globallycoherent` keyword can be added to declarations (`globallycoherent RWStructuredBuffer buffer`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory.
`groupshared` objects are likely coherent, specification does not specify. +* GLSL – `coherent` keyword can be added to declarations (`coherent uniform image2D img`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is unspecified. Objects tagged with `shared` are [implicitly](https://www.khronos.org/opengl/wiki/Compute_Shader) `coherent`. +* Metal – `memory_coherence::memory_coherence_device` is a generic argument to buffers. This argument ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory. +* WGSL – [All operations](https://www.w3.org/TR/WGSL/#private-vs-non-private) are coherent. + +### SPIR-V Support For Coherence + +Originally, SPIR-V supported coherent objects through the `coherent` type `Decoration`. Modern SPIR-V (`VulkanMemoryModel`) exposes this functionality differently to control coherence as a **per operation** functionality. Coherence is now done per operation through adding the memory operands `MakePointerAvailable`, `MakePointerVisible`, `MakeTexelAvailable`, and `MakeTexelVisible` to load and store operations. Users must additionally specify the memory scope to which an operation is coherent. + +`MakePointerAvailable` is for memory stores of non textures, `OpStore`, `OpCooperativeMatrixStoreKHR` and `OpCooperativeVectorStoreNV`.. + +`MakePointerVisible` is for memory loads of non textures, `OpLoad`, `OpCooperativeMatrixLoadKHR` and `OpCooperativeVectorLoadNV`. + +`MakeTexelAvailableKHR` is for memory stores of textures, `OpImageWrite`. + +`MakeTexelVisibleKHR` is for memory loads of textures, `OpImageRead` and `OpImageSparseRead`. + +Additionally, `MakePointer{Visible,Available}` may be used with `OpCopyMemory` and `OpCopyMemorySized`. + +### Example + +The simple use-case of this feature can be modeled with the following example: (1) We have **thread1** and **thread2** both reading/writing to the same `RWStructuredBuffer`.
(2) **thread1** `OpStore`’s non-coherently into the buffer. (3) if **thread2** uses an `OpLoad` on the texture they may not see the change **thread1** made for 2 reasons: + +1) **thread1** does not promise its writes are visible to other threads. Cached writes may not immediately flush to device memory. +2) **thread2** may load from a cache, not device memory. This means we will not see the new value because the new value was written to device memory, not the intermediate cache. + +If we specify `MakePointerAvailable/MakePointerVisible` with `OpStore`/`OpLoad` to the memory scope `QueueFamily` we will solve this problem since we are flushing changes to a memory scope shared by the two threads. This additionally may be faster than the alternative of flushing to `Device` memory since a `QueueFamily` is a tighter scope than `Device`. + +### Prior Implementations Of Access + +* C/C++ – `const int* ptr` or `int const*` both mean the underlying data pointed to is constant + * This will be equivlent to `Access::Read` + +### Compiler Support For Coherence + +This is currently planned to be a high-level concept which does not map to anything in SPIR-V. + +## Proposed Solution + +### Frontend For Pointer Access + +Pointer access will be implemented through a new generic argument, `Access access`, on our `Ptr` type. +We will also expose the pointer `AddressSpace` enum. + +```c# +enum Access : uint64_t +{ + ReadWrite = 0, + Read = 1 + //...
+} + +enum AddressSpace : uint64_t +{ + Generic = 0x7fffffff, + // Corresponds to SPIR-V's SpvStorageClassPrivate + ThreadLocal = 1, + Global, + // Corresponds to SPIR-V's SpvStorageClassWorkgroup + GroupShared, + // Corresponds to SPIR-V's SpvStorageClassUniform + Uniform, + // specific address space for payload data in metal + MetalObjectData, + // Corresponds to SPIR-V's SpvStorageClassInput + Input, + // Same as `Input`, but used for builtin input variables + BuiltinInput, + // Corresponds to SPIR-V's SpvStorageClassOutput + Output, + // Same as `Output`, but used for builtin output variables + BuiltinOutput, + // Corresponds to SPIR-V's SpvStorageClassTaskPayloadWorkgroupEXT + TaskPayloadWorkgroup, + // Corresponds to SPIR-V's SpvStorageClassFunction + Function, + // Corresponds to SPIR-V's SpvStorageClassStorageBuffer + StorageBuffer, + // Corresponds to SPIR-V's SpvStorageClassPushConstant + PushConstant, + // Corresponds to SPIR-V's SpvStorageClassRayPayloadKHR + RayPayloadKHR, + // Corresponds to SPIR-V's SpvStorageClassIncomingRayPayloadKHR + IncomingRayPayload, + // Corresponds to SPIR-V's SpvStorageClassCallableDataKHR + CallableDataKHR, + // Corresponds to SPIR-V's SpvStorageClassIncomingCallableDataKHR + IncomingCallableData, + // Corresponds to SPIR-V's SpvStorageClassHitObjectAttributeNV + HitObjectAttribute, + // Corresponds to SPIR-V's SpvStorageClassHitAttributeKHR + HitAttribute, + // Corresponds to SPIR-V's SpvStorageClassShaderRecordBufferKHR + ShaderRecordBuffer, + // Corresponds to SPIR-V's SpvStorageClassUniformConstant + UniformConstant, + // Corresponds to SPIR-V's SpvStorageClassImage + Image, + // Represents a SPIR-V specialization constant + SpecializationConstant, + // Corresponds to SPIR-V's SpvStorageClassNodePayloadAMDX + NodePayloadAMDX, + // Default address space for a user-defined pointer + UserPointer = 0x100000001ULL, +}; + +__generic +struct Ptr +{ + //... 
+} +``` + +If a pointer is `Access::Read`, a user program may only read from the given pointer. If a pointer is `Access::ReadWrite`, a user program may read from a given pointer or write to it. + +### Frontend For Coherent Pointer Operations + +We propose to implement coherence on a per-operation level for SPIR-V targets. This will be accomplished through new intrinsic methods to handle coherent load/store on a per-operation basis. + +```c# +public enum MemoryScope : int32_t +{ + CrossDevice = 0, + Device, + Workgroup, + Subgroup, + Invocation, + QueueFamily, + ShaderCall, + //... +} + +// `ptr` is the value to be loaded. +// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to. +// The `int alignment` parameter controls the alignment to load from a pointer with. +[ForceInline] +[require(SPV_KHR_vulkan_memory_model)] +__generic +T loadCoherent(Ptr ptr, int alignment, constexpr MemoryScope scope = MemoryScope::Device); + +// Return a `CoopVec`, loaded from `ptr`. +[ForceInline] +[require(SPV_KHR_vulkan_memory_model, cooperative_vector)] +__generic +CoopVec coopVecLoadCoherent(Ptr ptr, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device); + +// Return a `CoopMat`, loaded from `ptr`. +[ForceInline] +[require(SPV_KHR_vulkan_memory_model, cooperative_matrix)] +__generic< + T : __BuiltinArithmeticType, + let S : MemoryScope, + let M : int, + let N : int, + let R : CoopMatMatrixUse, + let matrixLayout : CoopMatMatrixLayout, + Access access, + AddressSpace addrSpace> +CoopMat coopMatLoadCoherent(Ptr ptr, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device); + +// `ptr` is the dst for the store. +// `val` is the value to store into `ptr`. +// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to. +// The `int alignment` parameter controls the alignment to load from a pointer with. 
+[ForceInline] +[require(SPV_KHR_vulkan_memory_model)] +__generic +void storeCoherent(Ptr ptr, T val, int alignment, constexpr MemoryScope scope = MemoryScope::Device); + +// Store into `ptr` given `val`. +[ForceInline] +[require(SPV_KHR_vulkan_memory_model, cooperative_vector)] +__generic +void coopVecStoreCoherent(Ptr ptr, CoopVec val, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device); + +// Store into `ptr` given `val`. +[ForceInline] +[require(SPV_KHR_vulkan_memory_model, cooperative_matrix)] +__generic< + T : __BuiltinArithmeticType, + let S : MemoryScope, + let M : int, + let N : int, + let R : CoopMatMatrixUse, + let matrixLayout : CoopMatMatrixLayout, + AddressSpace addrSpace> +void coopMatLoadCoherent(Ptr ptr, CoopMat val, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device); +``` + +### Support For Coherent Workgroup Memory + +Any access through a coherent-pointer to a `groupshared` object is coherent; Since Slang does not currently support pointers to `groupshared` memory, this proposal will extend the existing `AddressSpace::GroupShared` implementation for pointers as needed. + +### Casting Pointers + +All pointers can be casted to each other. Casting must be explicit. + +### Banned keywords + +HLSL style `globallycoherent T*` and GLSL style `coherent T*` will be disallowed. + +`const T*` and `T* const` will be disallowed. + +### Explicitly allowed keywords + +`const Ptr` is permitted. This means that a `Ptr` is constant, the address the pointer is pointing at will not change. + +### Order of Implementation + +* Frontend for pointer changes +* Logic for pointer access +* Support casting explicitly between pointers +* Disallow `globallycoherent T*` and `coherent T*` +* Disallowed `const T*` and `T* const` +* Support for coherent buffers and textures +* Support for workgroup memory pointers. 
+* Support for coherent workgroup memory +* Support for coherent cooperative matrix & cooperative vector + +## Alternative Designs Considered + +1. Using special methods (part of the `Ptr` type) to access coherent-operation functionality + +```c# +T* ptr1 = bufferPtr1; +T* ptr2 = bufferPtr2; +var loadedData = coherentLoad(ptr1, scope = MemoryScope::Device); +coherentStore(ptr2, loadedData, scope = MemoryScope::Device); +``` + +2. Tagging types as coherent through a modifier + +```c# +// Not allowed: +globallycoherent RWStructuredBuffer bufferPtr1 : register(u0); + +cbuffer PtrBuffer +{ + // We only allow coherent on pointers + globallycoherent int* bufferPtr1; +} +``` + +3. ‘OOP’ approach: obtain a `CoherentPtr` from a regular `Ptr`. Every operation on a `CoherentPtr` uses the coherent variant of a load/store. + +```c# +[require(SPV_KHR_vulkan_memory_model)] +void computeMain() +{ + int* ptr = gmemBuffer; + CoherentPtr ptr_workgroup = CoherentPtr(gmemBuffer); + CoherentPtr ptr_device = CoherentPtr(gmemBuffer); + + + ptr_workgroup[0] = output[1]; + ptr_workgroup = ptr_workgroup + 1; + output[2] = ptr_workgroup[0]; + ptr_workgroup = ptr_workgroup - 1; + output[3] = ptr_workgroup[3]; + + + ptr[10] = 10; + + ptr_device = ptr_device + 3; + gmemBuffer[0] = 10; + ptr_device[3] = output[3]; +} +``` + +4. Modifier with parameter to specify memory-scope + +```c# +cbuffer PtrBuffer +{ + int* bufferPtr1; +} +int main() +{ + coherent int* bufferPtrWorkgroup = bufferPtr1; + coherent int* bufferPtrDevice = bufferPtr1; +} +``` + +5.
Coherence as a generic argument + +```c# +typedef Ptr DeviceCoherentPtrInt; +int main() +{ + DeviceCoherentPtrInt ptr = DeviceCoherentPtrInt(&processMemory[id.x]); + output[id] = ptr[id]; +} +``` + +## diff --git a/proposals/031-coherent-pointers.md b/proposals/031-coherent-pointers.md deleted file mode 100644 index e603d17..0000000 --- a/proposals/031-coherent-pointers.md +++ /dev/null @@ -1,221 +0,0 @@ -# SP\#031: Coherent Pointers & Pointer Access - -## Status - -Status: Design Review -Implementation: -Author: Ariel Glasroth -Reviewer: - -## Background - -### Introduction - -GPUs have a concept known as coherent operations. Coherent operations flush cache for reads/writes so that when **thread A** modifies memory, **thread B** may read that memory, seeing all changes to memory done by a different thread. When flushing cache it is important to note that not all caches will be flushed. If a user wants coherence to `WorkGroup` memory, only the levels of cache up to `WorkGroup` memory will need to be flushed. - -Additionally, pointers will be permitted to be marked as read-only or read/write. A read-only pointer will be immutable (unable to modify the data pointed to). - -### Prior Implementations - -* HLSL – `globallycoherent` keyword can be added to declarations (`globallycoherent RWStructuredBuffer buffer`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory. `groupshared` objects are likely coherent, specification does not specify. -* GLSL– `coherent` keyword can be added to declarations (`coherent uniform image2D img`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is unspecified. Objects tagged with `shared` are [implicitly](https://www.khronos.org/opengl/wiki/Compute_Shader) `coherent`. -* Metal – `memory_coherence::memory_coherence_device` is a generic argument to buffers. This argument ensures coherence with all operations to a tagged object. 
Memory scope of coherence is device memory. -* WGSL – [All operations](https://www.w3.org/TR/WGSL/#private-vs-non-private) are coherent. - -### SPIR-V Support For Coherence - -Originally, SPIR-V supported coherent objects through the `coherent` type `Decoration`. Modern SPIR-V (`VulkanMemoryModel`) exposes this functionality differently to control coherence as a **per operation** functionality. Coherence is now done per operation through adding the memory operands `MakePointerAvailable`, `MakePointerVisible`, `MakeTexelAvailable`, and `MakeTexelVisible` to load and store operations. Users must additionally specify the memory scope to which an operation is coherent. - -`MakePointerAvailable` is for memory stores of non textures, `OpStore`, `OpCooperativeMatrixStoreKHR` and `OpCooperativeVectorStoreNV`.. - -`MakePointerVisible` is for memory loads of non textures, `OpLoad`, `OpCooperativeMatrixLoadKHR` and `OpCooperativeVectorLoadNV`. - - `MakeTexelAvailableKHR` is for memory stores of textures, `OpImageWrite`, `OpImageSparseLoad`. - - `MakeTexelVisibleKHR` is for memory loads of textures, `OpImageRead`, `OpImageSparseRead`. - -Additionally, `MakePointer{Visible,Available}` support usage in `OpCopyMemory` and `OpCopyMemorySized`. - -### Example - -The simple use-case of this feature can be modeled with the following example: (1) We have **thread1** and **thread2** both reading/writing to the same `RWStructuredBuffer`. (2) **thread1** `OpStore`’s non-coherently into the buffer. (3) if **thread2** uses an `OpLoad` on the texture they may not see the change **thread1** made for 2 reasons: - -1) **thread1** does not promise its writes are visible to other threads. Cached writes may not immediately flush to device memory. -2) **thread2** may load from a cache, not device memory. This means we will not see the new value because the new value was written to device memory, not the intermediate cache. 
- -If we specify `MakePointerAvailable/MakePointerVisible` with `OpStore`/`OpLoad` to the memory scope `QueueFamily` we will solve this problem since we are flushing changes to a memory scope shared by the two threads. This additionally may be faster than the alternative of flushing to `Device` memory since a `QueueFamily` is a tighter scope than `Device`. - -## Proposed Solution - -### Frontend For Coherent Pointers & Pointer Access - -We propose to implement coherence on a per-operation level for only SPIR-V targets. This will be accomplished through modifying `Ptr` to include the new generic argument `CoherentScope coherentScope`. - -We also propose the new generic argument `Access access` to specify if a pointer is read-only or not. - -```c# -public enum CoherentScope -{ - NotCoherent = 0xFF, - CrossDevice = MemoryScope::CrossDevice, - Device = MemoryScope::Device, - Workgroup = MemoryScope::Workgroup, - Subgroup = MemoryScope::Subgroup, - Invocation = MemoryScope::Invocation, - QueueFamily = MemoryScope::QueueFamily, - ShaderCall = MemoryScope::ShaderCallKHR, - //... -} - -public enum Access -{ - ReadWrite = 0, - Read = 1 -} - -__generic -struct Ptr -{ - ... -} -``` - -If `coherentScope` is not `CoherentScope::NotCoherent`, all accesses to memory through this pointer will be considered coherent to the specified memory scope (example: `CoherentScope::Device` is coherent to the memory scope of `Device`). - -If `access` is `Access::ReadWrite` a pointer can read/write to the data pointed to. -If `access` is `Access::Read`, a pointer will only be allowed to read from the data pointed to. - -We will also provide a a type alias for user-convenience. - -```c# -__generic -typealias CoherentPtr = Ptr; -``` - -### Support For Coherent Buffers and Textures - -Any access through a coherent-pointer to a buffer/texture is coherent. - -```c# -RWStructuredBuffer val; // Texture works as well. 
-CoherentPtr p = &val[0]; -*p = 10; // coherent store -p = p+10; -int b = *p; //coherent load -int c = val[10] + *p; //allowed to use coherent and non-coherent simultaneously -``` - -### Support For Coherent Workgroup Memory - -Any access through a coherent-pointer to a `groupshared` object is coherent; Since Slang does not currently support pointers to `groupshared` memory, this proposal will extend the existing `AddressSpace::GroupShared` implementation for pointers as needed. - -### Support For Coherent Cooperative Matrix & Cooperative Vector - -`CoopVec` and `CoopMat` load data into their respective data-structures from other objects using `CoopVec::Load`, `CoopVec::Store`, `CoopMat::Load`, and `CoopMat::Store`. Due to this design, we will add coherent operations to `CoopVec` and `CoopMat` by modifying `CoopVec::Load`, `CoopVec::Store`, `CoopMat::Load`, and `CoopMat::Store` to complete coherent operations if given a `CoherentPtr` as a parameter. Syntax required to use the method(s) will not change. - -### Casting Pointers - -All pointers can be casted to each other. Casting must be explicit. - -### Banned keywords - -HLSL style `globallycoherent T*` and GLSL style `coherent T*` will be disallowed. - -### Order of Implementation - -* Frontend for coherent pointers & pointer access -* Logic for pointer access -* Support casting explicitly between pointers -* Disallow `globallycoherent T*` and `coherent T*` -* Support for coherent buffers and textures -* Support for workgroup memory pointers. -* Support for coherent workgroup memory -* Support for coherent cooperative matrix & cooperative vector - -## Future Work - -### Supporting Aligned Loads - -Users may choose to load coherently given a specific alignment. This will be supported through the `[Align(ALIGNMENT)]` decoration. - -```c# -[Align(ALIGNMENT)] -struct MyType {...} - -MyType* p = ...; -... 
-let a = *p; // should be aligned load; -let b = p.member; // should be aligned load, with alignment derived from both `MyType` and `member`'s type. -``` - -When loading data from a pointer `p` Slang will honor the alignment and emit an `OpLoad` with the SPIR-V `Aligned` memory operand, providing the argument `ALIGNMENT`. This will function alongside `coherent` pointers. - -### Additional Pointer Arguments - -`Volatile` and `Const` are planned features for `Ptr`. - -## Alternative Designs Considered - -1. Using special methods (part of the `Ptr` type) to access coherent-operation functionality - -```c# -T* ptr1 = bufferPtr1; -T* ptr2 = bufferPtr2; -var loadedData = coherentLoad(ptr1, scope = MemoryScope::Device); -coherentStore(ptr2, loadedData, scope = MemoryScope::Device); -``` - -2. Tagging types as coherent through a modifier - -```c# -// Not allowed: -globallycoherent RWStructuredBuffer bufferPtr1 : register(u0); - -cbuffer PtrBuffer -{ - // We only allow coherent on pointers - globallycoherent int* bufferPtr1; -} -``` - -3. ‘OOP’ approach, get a `CoherentPtr` from a regular `Ptr`. Any operation on a `CoherentPtr` will use the Coherent variant of a store/load. - -```c# -[require(SPV_KHR_vulkan_memory_model)] -void computeMain() -{ - int* ptr = gmemBuffer; - CoherentPtr ptr_workgroup = CoherentPtr(gmemBuffer); - CoherentPtr ptr_device = CoherentPtr(gmemBuffer); - - - ptr_workgroup[0] = output[1]; - ptr_workgroup = ptr_workgroup + 1; - output[2] = ptr_workgroup[0]; - ptr_workgroup = ptr_workgroup - 1; - output[3] = ptr_workgroup[3]; - - - ptr[10] = 10; - - ptr_device = ptr_device + 3; - gmemBuffer[0] = 10; - ptr_device[3] = output[3]; -} -``` - -4. 
Modifier with parameter to specify memory-scope - -```c# -cbuffer PtrBuffer -{ - int* bufferPtr1; -} -int main() -{ - coherent int* bufferPtrWorkgroup = bufferPtr1; - coherent int* bufferPtrDevice = bufferPtr1; -} -``` - -## From 49a86a5bf3c54cf173221961f94f42c5a73e605e Mon Sep 17 00:00:00 2001 From: ArielG-NV Date: Tue, 15 Jul 2025 00:35:07 -0700 Subject: [PATCH 2/8] cleanup --- ...-coherent-pointer-operations-and-access.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md index e28410e..d21803f 100644 --- a/proposals/031-coherent-pointer-operations-and-access.md +++ b/proposals/031-coherent-pointer-operations-and-access.md @@ -2,9 +2,9 @@ ## Status -Status: Design Review -Implementation: -Author: Ariel Glasroth +Status: Design Review +Implementation: +Author: Ariel Glasroth Reviewer: ## Background @@ -17,14 +17,14 @@ Additionally, pointers have a topic called 'access', the ability to mark a point ### Prior Implementations Of Coherence -* HLSL – `globallycoherent` keyword can be added to declarations (`globallycoherent RWStructuredBuffer buffer`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory. `groupshared` objects are likely coherent, specification does not specify. -* GLSL – `coherent` keyword can be added to declarations (`coherent uniform image2D img`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is unspecified. Objects tagged with `shared` are [implicitly](https://www.khronos.org/opengl/wiki/Compute_Shader) `coherent`. -* Metal – `memory_coherence::memory_coherence_device` is a generic argument to buffers. This argument ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory. 
+* HLSL – `globallycoherent` keyword can be added to declarations (`globallycoherent RWStructuredBuffer buffer`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory. `groupshared` objects are likely coherent, specification does not specify. +* GLSL – `coherent` keyword can be added to declarations (`coherent uniform image2D img`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is unspecified. Objects tagged with `shared` are [implicitly](https://www.khronos.org/opengl/wiki/Compute_Shader) `coherent`. +* Metal – `memory_coherence::memory_coherence_device` is a generic argument to buffers. This argument ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory. * WGSL – [All operations](https://www.w3.org/TR/WGSL/#private-vs-non-private) are coherent. ### SPIR-V Support For Coherence -Originally, SPIR-V supported coherent objects through the `coherent` type `Decoration`. Modern SPIR-V (`VulkanMemoryModel`) exposes this functionality differently to control coherence as a **per operation** functionality. Coherence is now done per operation through adding the memory operands `MakePointerAvailable`, `MakePointerVisible`, `MakeTexelAvailable`, and `MakeTexelVisible` to load and store operations. Users must additionally specify the memory scope to which an operation is coherent. +Originally, SPIR-V supported coherent objects through the `coherent` type `Decoration`. Modern SPIR-V (`VulkanMemoryModel`) exposes this functionality differently to control coherence as a **per operation** functionality. Coherence is now done per operation through adding the memory operands `MakePointerAvailable`, `MakePointerVisible`, `MakeTexelAvailable`, and `MakeTexelVisible` to load and store operations. Users must additionally specify the memory scope to which an operation is coherent. 
`MakePointerAvailable` is for memory stores of non textures, `OpStore`, `OpCooperativeMatrixStoreKHR` and `OpCooperativeVectorStoreNV`.. @@ -40,10 +40,10 @@ Additionally, `MakePointer{Visible,Available}` support usage in `OpCopyMemory` The simple use-case of this feature can be modeled with the following example: (1) We have **thread1** and **thread2** both reading/writing to the same `RWStructuredBuffer`. (2) **thread1** `OpStore`’s non-coherently into the buffer. (3) if **thread2** uses an `OpLoad` on the texture they may not see the change **thread1** made for 2 reasons: -1) **thread1** does not promise its writes are visible to other threads. Cached writes may not immediately flush to device memory. +1) **thread1** does not promise its writes are visible to other threads. Cached writes may not immediately flush to device memory. 2) **thread2** may load from a cache, not device memory. This means we will not see the new value because the new value was written to device memory, not the intermediate cache. -If we specify `MakePointerAvailable/MakePointerVisible` with `OpStore`/`OpLoad` to the memory scope `QueueFamily` we will solve this problem since we are flushing changes to a memory scope shared by the two threads. This additionally may be faster than the alternative of flushing to `Device` memory since a `QueueFamily` is a tighter scope than `Device`. +If we specify `MakePointerAvailable/MakePointerVisible` with `OpStore`/`OpLoad` to the memory scope `QueueFamily` we will solve this problem since we are flushing changes to a memory scope shared by the two threads. This additionally may be faster than the alternative of flushing to `Device` memory since a `QueueFamily` is a tighter scope than `Device`. ### Prior Implementations Of Access @@ -233,8 +233,8 @@ HLSL style `globallycoherent T*` and GLSL style `coherent T*` will be disallowed * Disallowed `const T*` and `T* const` * Support for coherent buffers and textures * Support for workgroup memory pointers. 
-* Support for coherent workgroup memory -* Support for coherent cooperative matrix & cooperative vector +* Support for coherent workgroup memory +* Support for coherent cooperative matrix & cooperative vector ## Alternative Designs Considered From a44bbb6ad955f8f7bc4de3e61d8b2b3fbb8cf42d Mon Sep 17 00:00:00 2001 From: ArielG-NV Date: Tue, 15 Jul 2025 00:38:23 -0700 Subject: [PATCH 3/8] fix some grammer --- proposals/031-coherent-pointer-operations-and-access.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md index d21803f..c9c9a7a 100644 --- a/proposals/031-coherent-pointer-operations-and-access.md +++ b/proposals/031-coherent-pointer-operations-and-access.md @@ -134,7 +134,7 @@ If a pointer is `Access::Read`, a user program may only read from the given poin ### Frontend For Coherent Pointer Operations -We propose to implement coherence on a per-operation level for SPIR-V targets. This will be accomplished through new intrinsic methods to handle coherent load/store on a per-operation basis. +We propose to implement coherence on a per-operation level for SPIR-V targets. This will be accomplished through new intrinsic methods to handle coherent load/store. ```c# public enum MemoryScope : int32_t @@ -150,8 +150,8 @@ public enum MemoryScope : int32_t } // `ptr` is the value to be loaded. -// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to. // The `int alignment` parameter controls the alignment to load from a pointer with. +// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to. [ForceInline] [require(SPV_KHR_vulkan_memory_model)] __generic @@ -179,8 +179,8 @@ CoopMat coopMatLoadCoherent(Ptr ptr, uint e // `ptr` is the dst for the store. // `val` is the value to store into `ptr`. 
-// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to. // The `int alignment` parameter controls the alignment to load from a pointer with. +// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to. [ForceInline] [require(SPV_KHR_vulkan_memory_model)] __generic From a24502da4b99f19131c27e1dd76d752005213cb2 Mon Sep 17 00:00:00 2001 From: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> Date: Tue, 15 Jul 2025 13:25:27 -0700 Subject: [PATCH 4/8] disallow additional syntax --- .../031-coherent-pointer-operations-and-access.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md index c9c9a7a..919be3c 100644 --- a/proposals/031-coherent-pointer-operations-and-access.md +++ b/proposals/031-coherent-pointer-operations-and-access.md @@ -214,11 +214,15 @@ Any access through a coherent-pointer to a `groupshared` object is coherent; Sin All pointers can be casted to each other. Casting must be explicit. -### Banned keywords +### Banned keyword usage + +The following keyword use is disallowed: +* `globallycoherent T*` +* `coherent T*`. +* `const T*` and `T* const` +* `Ptr`, `Ptr`, and `Ptr` -HLSL style `globallycoherent T*` and GLSL style `coherent T*` will be disallowed. -`const T*` and `T* const` will be disallowed. 
 ### Explicitly allowed keywords
From dee47083c2749a3edd2f8637d6146f7f8434bfe4 Mon Sep 17 00:00:00 2001
From: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
Date: Tue, 15 Jul 2025 13:25:54 -0700
Subject: [PATCH 5/8] Update proposals/031-coherent-pointer-operations-and-access.md

---
 proposals/031-coherent-pointer-operations-and-access.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md
index 919be3c..0e25b65 100644
--- a/proposals/031-coherent-pointer-operations-and-access.md
+++ b/proposals/031-coherent-pointer-operations-and-access.md
@@ -222,8 +222,6 @@ The following keyword use is disallowed:
 * `const T*` and `T* const`
 * `Ptr`, `Ptr`, and `Ptr`
 
-
-
 ### Explicitly allowed keywords
 
 `const Ptr` is permitted. This means that a `Ptr` is constant, the address the pointer is pointing at will not change.
From 7e27aeeed7bfa48d5c9029473058d5d620ed70b8 Mon Sep 17 00:00:00 2001
From: ArielG-NV
Date: Wed, 16 Jul 2025 10:10:15 -0700
Subject: [PATCH 6/8] Address review of Kai

---
 ...031-coherent-pointer-operations-and-access.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md
index c9c9a7a..a6ef8c3 100644
--- a/proposals/031-coherent-pointer-operations-and-access.md
+++ b/proposals/031-coherent-pointer-operations-and-access.md
@@ -48,7 +48,9 @@ If we specify `MakePointerAvailable/MakePointerVisible` with `OpStore`/`OpLoad`
 ### Prior Implementations Of Access
 
 * C/C++ – `const int* ptr` or `int const*` both mean the underlying data pointed to is constant
-    * This will be equivlent to `Access::Read`
+    * This will be equivalent to `Access::Read`
+* C/C++ – `int* const` means the value of the pointer is constant
+    * This will be equivalent to `const Ptr ptr`
 
 ### Compiler Support For Coherence
 
@@ -173,8 +175,8 @@ __generic<
     let N : int,
     let R : CoopMatMatrixUse,
     let matrixLayout : CoopMatMatrixLayout,
-    Access access,
-    AddressSpace addrSpace>
+    let access : Access,
+    let addrSpace : AddressSpace>
 CoopMat coopMatLoadCoherent(Ptr ptr, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
 
 // `ptr` is the dst for the store.
@@ -202,13 +204,13 @@ __generic<
     let N : int,
     let R : CoopMatMatrixUse,
     let matrixLayout : CoopMatMatrixLayout,
-    AddressSpace addrSpace>
+    let addrSpace : AddressSpace>
 void coopMatLoadCoherent(Ptr ptr, CoopMat val, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
 ```
 
 ### Support For Coherent Workgroup Memory
 
-Any access through a coherent-pointer to a `groupshared` object is coherent; Since Slang does not currently support pointers to `groupshared` memory, this proposal will extend the existing `AddressSpace::GroupShared` implementation for pointers as needed.
+Any access through a coherent-pointer to a `groupshared` object is coherent; since Slang does not currently support pointers to `groupshared` memory, this proposal will extend the existing `AddressSpace::GroupShared` implementation for pointers as needed.
 
 ### Casting Pointers
 
@@ -218,7 +220,7 @@ All pointers can be casted to each other. Casting must be explicit.
 
 HLSL style `globallycoherent T*` and GLSL style `coherent T*` will be disallowed.
 
-`const T*` and `T* const` will be disallowed.
+`const T*`, `T* const`, and `T const*` will be disallowed.
 
 ### Explicitly allowed keywords
 
@@ -230,7 +232,7 @@ HLSL style `globallycoherent T*` and GLSL style `coherent T*` will be disallowed
 * Logic for pointer access
 * Support casting explicitly between pointers
 * Disallow `globallycoherent T*` and `coherent T*`
-* Disallowed `const T*` and `T* const`
+* Disallow `const T*`, `T const*`, and `T* const`
 * Support for coherent buffers and textures
 * Support for workgroup memory pointers.
 * Support for coherent workgroup memory
From c2c31147ea887711a9657f03799952c7e0e573aa Mon Sep 17 00:00:00 2001
From: ArielG-NV
Date: Fri, 18 Jul 2025 10:27:21 -0700
Subject: [PATCH 7/8] make more neat; keep consistent with how `CoopVec`/`CoopMat` is defined

---
 ...-coherent-pointer-operations-and-access.md | 65 ++++++++++++-------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md
index 7eb31eb..cb51140 100644
--- a/proposals/031-coherent-pointer-operations-and-access.md
+++ b/proposals/031-coherent-pointer-operations-and-access.md
@@ -151,6 +151,8 @@ public enum MemoryScope : int32_t
     //...
 }
 
+//// Ptr<>
+
 // `ptr` is the value to be loaded.
 // The `int alignment` parameter controls the alignment to load from a pointer with.
 // The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to.
@@ -159,26 +161,6 @@ public enum MemoryScope : int32_t
 __generic
 T loadCoherent(Ptr ptr, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
 
-// Return a `CoopVec`, loaded from `ptr`.
-[ForceInline]
-[require(SPV_KHR_vulkan_memory_model, cooperative_vector)]
-__generic
-CoopVec coopVecLoadCoherent(Ptr ptr, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
-
-// Return a `CoopMat`, loaded from `ptr`.
-[ForceInline]
-[require(SPV_KHR_vulkan_memory_model, cooperative_matrix)]
-__generic<
-    T : __BuiltinArithmeticType,
-    let S : MemoryScope,
-    let M : int,
-    let N : int,
-    let R : CoopMatMatrixUse,
-    let matrixLayout : CoopMatMatrixLayout,
-    let access : Access,
-    let addrSpace : AddressSpace>
-CoopMat coopMatLoadCoherent(Ptr ptr, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
-
 // `ptr` is the dst for the store.
 // `val` is the value to store into `ptr`.
 // The `int alignment` parameter controls the alignment to load from a pointer with.
@@ -188,13 +170,29 @@ CoopMat coopMatLoadCoherent(Ptr ptr, uint e
 __generic
 void storeCoherent(Ptr ptr, T val, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
 
-// Store into `ptr` given `val`.
+//// CoopVec<>
+
+// Return a `CoopVec`, loaded from `ptr`.
 [ForceInline]
 [require(SPV_KHR_vulkan_memory_model, cooperative_vector)]
-__generic
-void coopVecStoreCoherent(Ptr ptr, CoopVec val, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
+__generic
+CoopVec coopVecLoadCoherent(Ptr ptr, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
+
+// As a method to `CoopVec`, keep consistent with how `CoopVec` is defined
+struct CoopVec
+{
+...
+    // Store into `ptr` given `val`.
+    [ForceInline]
+    [require(SPV_KHR_vulkan_memory_model, cooperative_vector)]
+    __generic
+    void storeCoherent(Ptr ptr, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
+...
+}
 
-// Store into `ptr` given `val`.
+//// CoopMat<>
+
+// Return a `CoopMat`, loaded from `ptr`.
 [ForceInline]
 [require(SPV_KHR_vulkan_memory_model, cooperative_matrix)]
 __generic<
@@ -204,8 +202,21 @@ __generic<
     let N : int,
     let R : CoopMatMatrixUse,
     let matrixLayout : CoopMatMatrixLayout,
+    let access : Access,
     let addrSpace : AddressSpace>
-void coopMatLoadCoherent(Ptr ptr, CoopMat val, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
+CoopMat coopMatLoadCoherent(Ptr ptr, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
+
+// As a method to `CoopMat`, keep consistent with how `CoopMat` is defined
+struct CoopMat
+{
+...
+    // Store into `ptr` given `val`.
+    [ForceInline]
+    [require(SPV_KHR_vulkan_memory_model, cooperative_matrix)]
+    __generic
+    void storeCoherent(Ptr ptr, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
+...
+}
 ```
 
 ### Support For Coherent Workgroup Memory
@@ -240,6 +251,10 @@ The following keyword use is disallowed:
 * Support for coherent workgroup memory
 * Support for coherent cooperative matrix & cooperative vector
 
+### Potential Next Steps
+
+`coopVecLoadCoherent` and `coopMatLoadCoherent` should have a member-function version inside `CoopVec`/`CoopMat` (`loadCoherent`).
+
 ## Alternative Designs Considered
 
 1. Using special methods (part of the `Ptr` type) to access coherent-operation functionality
From 71fd320d1adbc2673cfbea79c54216bc9cc6517c Mon Sep 17 00:00:00 2001
From: ArielG-NV
Date: Fri, 18 Jul 2025 13:59:46 -0700
Subject: [PATCH 8/8] minor grammer

---
 proposals/031-coherent-pointer-operations-and-access.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/031-coherent-pointer-operations-and-access.md b/proposals/031-coherent-pointer-operations-and-access.md
index cb51140..bb052f7 100644
--- a/proposals/031-coherent-pointer-operations-and-access.md
+++ b/proposals/031-coherent-pointer-operations-and-access.md
@@ -30,9 +30,9 @@ Originally, SPIR-V supported coherent objects through the `coherent` type `Decor
 
 `MakePointerVisible` is for memory loads of non textures, `OpLoad`, `OpCooperativeMatrixLoadKHR` and `OpCooperativeVectorLoadNV`.
 
- `MakeTexelAvailableKHR` is for memory stores of textures, `OpImageWrite`, `OpImageSparseLoad`.
+`MakeTexelAvailableKHR` is for memory stores of textures, `OpImageWrite`, `OpImageSparseLoad`.
 
- `MakeTexelVisibleKHR` is for memory loads of textures, `OpImageRead`, `OpImageSparseRead`.
+`MakeTexelVisibleKHR` is for memory loads of textures, `OpImageRead`, `OpImageSparseRead`.
 
 Additionally, `MakePointer{Visible,Available}` support usage in `OpCopyMemory` and `OpCopyMemorySized`.