AMDGPU: Preliminary documentation for named barriers #165502
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1179,6 +1179,55 @@ is conservatively correct for OpenCL. | |
| other operations within the same address space. | ||
| ======================= =================================================== | ||
|
|
||
| Target Types | ||
| ------------ | ||
|
|
||
| The AMDGPU backend implements some target extension types. | ||
|
|
||
| .. _amdgpu-types-named-barriers: | ||
|
|
||
| Named Barriers | ||
| ~~~~~~~~~~~~~~ | ||
|
|
||
| Named barriers are fixed-function hardware barrier objects that are available | ||
| on gfx12.5+ in addition to the traditional default barriers. | ||
|
|
||
| In LLVM IR, named barriers are represented by global variables of type | ||
| ``target("amdgcn.named.barrier", 0)`` in the LDS address space. Named barrier | ||
| global variables do not occupy actual LDS memory, but their lifetime and | ||
| allocation scope match those of global variables in LDS. Programs in LLVM IR | ||
| refer to named barriers using pointers. | ||
|
|
||
| The following named barrier types are supported in global variables, defined | ||
| recursively: | ||
|
|
||
| * a single, standalone ``target("amdgcn.named.barrier", 0)`` | ||
| * an array of supported types | ||
| * a struct containing a single element of supported type | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| @bar = addrspace(3) global target("amdgcn.named.barrier", 0) undef | ||
| @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef | ||
| @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef | ||
|
|
||
| ... | ||
|
|
||
| %foo.i = getelementptr [2 x target("amdgcn.named.barrier", 0)], ptr addrspace(3) @foo, i32 0, i32 %i | ||
| call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %foo.i, i32 0) | ||
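The recursive definition of supported global variable types can be sketched as a small checker. This is a minimal illustration in Python, not part of the backend: the tuple encoding of types (`"barrier"`, `("array", count, elem)`, `("struct", [fields])`) and the name `is_supported` are hypothetical and chosen only for this example.

```python
# Hypothetical encoding of IR types for illustration only:
#   "barrier"                 -> target("amdgcn.named.barrier", 0)
#   ("array", count, elem)    -> [count x elem]
#   ("struct", [field, ...])  -> { field, ... }
BARRIER = "barrier"


def is_supported(ty):
    """Return True if ty matches the recursive definition of supported types."""
    if ty == BARRIER:
        return True  # a single, standalone named barrier
    kind = ty[0] if isinstance(ty, tuple) else None
    if kind == "array":
        _, _count, elem = ty
        return is_supported(elem)  # an array of a supported type
    if kind == "struct":
        _, fields = ty
        # a struct containing a single element of a supported type
        return len(fields) == 1 and is_supported(fields[0])
    return False


# Mirrors @bar, @foo, and @baz from the example above.
examples = [BARRIER, ("array", 2, BARRIER), ("struct", [BARRIER])]
all_ok = all(is_supported(t) for t in examples)
```

Under this reading, a struct with two barrier fields or any type that bottoms out in something other than a named barrier is rejected.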
|
|
||
| Named barrier types may not be used in ``alloca``. | ||
|
|
||
| Named barriers do not have an underlying byte representation. | ||
| It is undefined behavior to use a pointer to any part of a named barrier object | ||
| as the pointer operand of a regular memory access instruction or intrinsic. | ||
|
Review comment: Maybe we need to add something to say the pointer has to be used directly, so cannot be offset (gep) or undergo any kind of manipulation such as inttoptr/ptrtoint?
||
| Pointers to named barrier objects are intended to be used with dedicated | ||
| intrinsics. | ||
|
|
||
| We expand on the semantics of named barriers in | ||
| :ref:`the memory model section <amdgpu-memory-model-named-barriers>`. | ||
|
|
||
|
|
||
| LLVM IR Intrinsics | ||
| ------------------ | ||
|
|
||
|
|
@@ -6621,6 +6670,137 @@ Multiple tags can be used at the same time to synchronize with more than one add | |
| better code optimization, at the cost of synchronizing additional address | ||
| spaces. | ||
|
|
||
| .. _amdgpu-memory-model-barriers: | ||
|
Review comment: I'm hoping to land the formal model soon(-ish), so maybe this section won't be necessary.

Review comment: Do you mean the entire section, including the stuff that is specific to named barriers?

Review comment: No, not the named barrier parts yet.

Review comment: I thought about this a bit more and I think it'd be better to not land this section. I'm looking at the barrier execution & memory model right now and it's just going to be confusing if this lands only to be removed short term. Curious to hear what other reviewers think too; if others prefer that this lands even if it's reverted later, then it's also fine for me.
||
|
|
||
| Hardware Barriers | ||
| +++++++++++++++++ | ||
|
|
||
| .. note:: | ||
|
|
||
| This section is preliminary. The semantics described here are intended to be | ||
| formalized properly in the future. | ||
|
|
||
| Hardware barriers synchronize execution between concurrently running waves using | ||
| fixed function hardware. Intuitively, a set of waves are "members" of a barrier. | ||
| Waves *signal* the barrier and later *wait* for it. Execution only proceeds past | ||
| the *wait* once all member waves have *signaled* the barrier. | ||
|
|
||
| Formally, barriers affect semantics in exactly two ways. First, they affect | ||
| forward progress. Waiting on a barrier that never completes (is not signaled | ||
| sufficiently) prevents forward progress and therefore, given the assumption of | ||
| forward progress, is undefined behavior. Second, barrier operations can pair | ||
| with fences to contribute *synchronizes-with* relations in the memory model. | ||
|
|
||
| Roughly speaking: | ||
|
|
||
| - Release fences pair with barrier signal operations that are later in program | ||
| order | ||
| - Barrier wait operations pair with acquire fences that are later in program | ||
| order | ||
| - If a barrier signal operation contributes to allowing a wait operation to | ||
| complete, then the corresponding paired fences can synchronize-with each | ||
| other (given compatible sync scopes and memory model relaxation annotations) | ||
|
|
||
| Default Barriers | ||
| ################ | ||
|
|
||
| There is a default workgroup barrier and a default cluster barrier. All waves | ||
| of a workgroup are members of the same default workgroup barrier, and all | ||
| waves of a cluster are members of the same default cluster barrier. | ||
|
|
||
| .. _amdgpu-memory-model-named-barriers: | ||
|
|
||
| Named Barriers | ||
| ############## | ||
|
|
||
| All named barrier operations must occur in wave-uniform control flow. All | ||
| arguments of named barrier intrinsics must be wave-uniform. | ||
|
|
||
| Named barriers are allocated as global variables of | ||
| :ref:`a target extension type <amdgpu-types-named-barriers>`. | ||
|
|
||
| Named barriers may be signaled by the intrinsics: | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count) | ||
|
|
||
| If ``member_count`` is non-zero, the operation is an *initializing* signal, | ||
| else it is *non*-initializing. | ||
|
|
||
| Named barriers may be initialized explicitly using: | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare void @llvm.amdgcn.s.barrier.init(ptr addrspace(3) %barrier_ptr, i32 %member_count) | ||
|
|
||
| It is possible to "leave" a named barrier. This decrements the named barrier's | ||
| member count and completes the barrier if all other members have signaled it: | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare void @llvm.amdgcn.s.barrier.leave(i32 %barrier_type) | ||
|
|
||
| ``barrier_type`` must be set to ``1``. | ||
|
|
||
| Note that leaving a named barrier is not exactly the opposite of joining a | ||
| barrier (for example, joining a barrier does not change its member count). | ||
|
|
||
| Leaving implicitly *joins* (see below) a null named barrier. | ||
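The interaction between member counts, signals, and leave operations can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of the hardware: the class `ToyNamedBarrier` and its fields are hypothetical, it assumes that explicit initialization resets the signal count, and it assumes that a phase completes as soon as the number of signals reaches the current member count.

```python
class ToyNamedBarrier:
    """Toy model of one named barrier's member count and signal count."""

    def __init__(self):
        self.member_count = 0  # set by init() or by an initializing signal
        self.signaled = 0      # signals received in the current phase
        self.completed = 0     # number of phases that have completed

    def _maybe_complete(self):
        # Assumption: a phase completes once all remaining members signaled.
        if self.member_count > 0 and self.signaled >= self.member_count:
            self.signaled = 0
            self.completed += 1

    def init(self, member_count):
        # Models s.barrier.init (assumed to reset the signal count).
        self.member_count = member_count
        self.signaled = 0

    def signal(self, member_count=0):
        # Models s.barrier.signal.var: a non-zero count is an initializing signal.
        if member_count != 0:
            self.member_count = member_count
        self.signaled += 1
        self._maybe_complete()

    def leave(self):
        # Models s.barrier.leave: one member fewer, which may complete the phase.
        self.member_count -= 1
        self._maybe_complete()


# Three members: two signal, the third leaves instead of signaling.
b = ToyNamedBarrier()
b.init(3)
b.signal()
b.signal()
phases_before_leave = b.completed
b.leave()
phases_after_leave = b.completed
```

In this model the leave completes the phase because the two remaining members have already signaled, which matches the sentence "completes the barrier if all other members have signaled it".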
|
|
||
| Signal, leave, and initializing operations on the same named barrier must obey | ||
| certain ordering constraints: | ||
|
|
||
| * Non-initializing signals must be ordered after some initializing signal or an | ||
| explicit initializing operation. | ||
| * Explicit initializing operations must not race with signal or leave operations. | ||
| * Initializing signal operations must not race with leave operations. | ||
| * Initializing signal operations with contradicting member counts must not race | ||
| with each other. | ||
|
|
||
| The details of how these orders can be established and races prevented are TBD. | ||
| Using a default workgroup or cluster barrier in the natural way is guaranteed to | ||
| be sufficient. | ||
|
|
||
| In order to wait for a named barrier, a wave must first *join* the named barrier | ||
| using: | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) %barrier_ptr) | ||
|
|
||
| The named barrier may then be waited for using: | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare void @llvm.amdgcn.s.barrier.wait(i32 %barrier_type) | ||
|
|
||
| ... with ``barrier_type`` set to ``1``. | ||
|
|
||
| Signal, leave, join, and wait operations must obey certain ordering constraints. | ||
| The details are TBD. Satisfying the following rules is guaranteed to be | ||
| sufficient: | ||
|
|
||
| * Signal or wait for a named barrier only if it is the most recent to have been | ||
| joined in program order. | ||
| * Signal or leave a named barrier only if the number of prior signaling | ||
| operations on that named barrier since the most recent join in program order | ||
| is equal to the number of prior wait operations on that named barrier since | ||
| the most recent join in program order. | ||
| * Wait for a named barrier only if the number of prior signaling operations on | ||
| that named barrier since the most recent join in program order is one larger | ||
| than the number of prior wait operations on that named barrier since the most | ||
| recent join in program order. | ||
| * Do not signal a named barrier or wait for it in program order after leaving it. | ||
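The four sufficient rules above can be read as a per-wave check over a program-order trace of barrier operations. The following Python sketch is one conservative interpretation, not a normative definition: the trace encoding, the names `check_trace` and `WaveState`, and the decision to reject a leave when no barrier is joined are all assumptions made for illustration. Wait and leave carry no barrier operand here, mirroring the intrinsics, and apply to the most recently joined barrier.

```python
class WaveState:
    """Per-wave bookkeeping needed by the sufficient rules."""

    def __init__(self):
        self.joined = None  # most recently joined barrier, or None (null barrier)
        self.signals = 0    # signals on it since the most recent join
        self.waits = 0      # waits on it since the most recent join
        self.left = set()   # barriers this wave has left


def check_trace(trace):
    """Return the index of the first operation that breaks a rule, else None.

    trace is a list of (op, barrier) pairs in program order, where op is one of
    "join", "signal", "wait", "leave" and barrier is None for wait and leave.
    """
    s = WaveState()
    for i, (op, bar) in enumerate(trace):
        if op == "join":
            s.joined, s.signals, s.waits = bar, 0, 0
        elif op == "signal":
            # Rules 1, 4, and 2: most recently joined, not left, counts equal.
            if bar != s.joined or bar in s.left or s.signals != s.waits:
                return i
            s.signals += 1
        elif op == "wait":
            # Rules 1, 4, and 3: exactly one signal is still outstanding.
            if s.joined is None or s.joined in s.left:
                return i
            if s.signals != s.waits + 1:
                return i
            s.waits += 1
        elif op == "leave":
            # Rule 2; leaving with nothing joined is rejected (assumption).
            if s.joined is None or s.signals != s.waits:
                return i
            s.left.add(s.joined)
            s.joined = None  # leaving implicitly joins the null barrier
        else:
            raise ValueError("unknown operation: " + op)
    return None


good = [("join", "A"), ("signal", "A"), ("wait", None),
        ("signal", "A"), ("wait", None), ("leave", None)]
first_violation = check_trace(good)
```

Because the rules are only sufficient, a trace rejected by this check is not necessarily invalid; the check only confirms that a trace stays within the documented safe subset.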
|
|
||
| Additionally, use signal, leave, and wait operations on a named barrier from a | ||
| consistent associated set of waves that is determined at initialization time and | ||
| whose initial size is the member count used at initialization. The set of waves | ||
| may shrink with leave operations. Operations on a named barrier object with | ||
| conflicting sets of waves must not race. The details of this rule and how an | ||
| ordering can be established to prevent a race are TBD. Using a default workgroup | ||
| or cluster barrier in the natural way is guaranteed to be sufficient. | ||
|
|
||
| .. _amdgpu-amdhsa-memory-model-gfx6-gfx9: | ||
|
|
||
| Memory Model GFX6-GFX9 | ||
|
|
||