180 changes: 180 additions & 0 deletions llvm/docs/AMDGPUUsage.rst
@@ -1179,6 +1179,55 @@ is conservatively correct for OpenCL.
other operations within the same address space.
======================= ===================================================

Target Types
------------

The AMDGPU backend implements some target extension types.

.. _amdgpu-types-named-barriers:

Named Barriers
~~~~~~~~~~~~~~

Named barriers are fixed-function hardware barrier objects that are available
in gfx12.5+ in addition to the traditional default barriers.

In LLVM IR, named barriers are represented by global variables of type
``target("amdgcn.named.barrier", 0)`` in the LDS address space. Named barrier
global variables do not occupy actual LDS memory, but their lifetime and
allocation scope matches that of global variables in LDS. Programs in LLVM IR
refer to named barriers using pointers.

The following named barrier types are supported in global variables, defined
recursively:

* a single, standalone ``target("amdgcn.named.barrier", 0)``
* an array of supported types
* a struct containing a single element of supported type

.. code-block:: llvm

  @bar = addrspace(3) global target("amdgcn.named.barrier", 0) undef
  @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
  @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef

  ...

  %foo.i = getelementptr [2 x target("amdgcn.named.barrier", 0)], ptr addrspace(3) @foo, i32 0, i32 %i
  call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %foo.i, i32 0)

Named barrier types may not be used in ``alloca``.

Named barriers do not have an underlying byte representation.
It is undefined behavior to use a pointer to any part of a named barrier object
as the pointer operand of a regular memory access instruction or intrinsic.

Contributor (review comment): Maybe we need to add something to say the pointer
has to be used directly, so cannot be offset (gep) or undergo any kind of
manipulation such as inttoptr/ptrtoint?

Pointers to named barrier objects are intended to be used with dedicated
intrinsics.

We expand on the semantics of named barriers in
:ref:`the memory model section <amdgpu-memory-model-named-barriers>`.


LLVM IR Intrinsics
------------------

@@ -6621,6 +6670,137 @@ Multiple tags can be used at the same time to synchronize with more than one add
better code optimization, at the cost of synchronizing additional address
spaces.

.. _amdgpu-memory-model-barriers:

Contributor (review comment): I'm hoping to land the formal model soon(-ish),
so maybe this section won't be necessary.

Collaborator (author): Do you mean the entire section, including the stuff that
is specific to named barriers?

Contributor: No, not the named barrier parts yet.

Contributor: I thought about this a bit more and I think it'd be better to not
land this section. I'm looking at the barrier execution & memory model right now
and it's just going to be confusing if this lands only to be removed short term.
What do you think? I think this patch is fine if it just documents the LDS GV.
It doesn't need to extend into a barrier memory model.

Curious to hear what other reviewers think too; if others prefer that this lands
even if it's reverted later, then that's also fine for me.


Hardware Barriers
+++++++++++++++++

.. note::

  This section is preliminary. The semantics described here are intended to be
  formalized properly in the future.

Hardware barriers synchronize execution between concurrently running waves using
fixed-function hardware. Intuitively, a set of waves are "members" of a barrier.
Waves *signal* the barrier and later *wait* for it. Execution only proceeds past
the *wait* once all member waves have *signaled* the barrier.

Formally, barriers affect semantics in exactly two ways. First, they affect
forward progress: waiting on a barrier that never completes (i.e. is never
signaled sufficiently) prevents forward progress and is therefore, under the
assumption of forward progress, undefined behavior. Second, barrier operations
can pair with fences to contribute *synchronizes-with* relations in the memory
model.

Roughly speaking:

- Release fences pair with barrier signal operations that are later in program
order
- Barrier wait operations pair with acquire fences that are later in program
order
- If a barrier signal operation contributes to allowing a wait operation to
complete, then the corresponding paired fences can synchronize-with each
other (given compatible sync scopes and memory model relaxation annotations)
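
The following is a non-normative sketch of this pairing. It uses the
traditional ``llvm.amdgcn.s.barrier`` intrinsic, which conceptually signals and
then waits on the default workgroup barrier; the surrounding code is
illustrative only.

.. code-block:: llvm

  ; ... stores that should become visible to the rest of the workgroup ...
  fence syncscope("workgroup") release   ; pairs with the signal below
  call void @llvm.amdgcn.s.barrier()     ; signal + wait on the default workgroup barrier
  fence syncscope("workgroup") acquire   ; pairs with the completed wait
  ; ... loads that can observe the other waves' stores ...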

Default Barriers
################

There is a default workgroup barrier and a default cluster barrier. All waves
of a workgroup are members of that workgroup's default barrier, and all waves
of a cluster are members of that cluster's default barrier.

.. _amdgpu-memory-model-named-barriers:

Named Barriers
##############

All named barrier operations must occur in wave-uniform control flow. All
arguments of named barrier intrinsics must be wave-uniform.

Named barriers are allocated as global variables of
:ref:`a target extension type <amdgpu-types-named-barriers>`.

Named barriers may be signaled using the following intrinsic:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count)

If ``member_count`` is non-zero, the operation is an *initializing* signal;
otherwise it is a *non-initializing* signal.
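
For example (an illustrative sketch; the member count of 4 and the reuse of
``@bar`` from the example above are assumptions made for illustration):

.. code-block:: llvm

  ; The first signal is initializing: it sets up @bar for 4 member waves.
  call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 4)

  ...

  ; Later signals pass 0 and are non-initializing.
  call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 0)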

Named barriers may be initialized explicitly using:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.init(ptr addrspace(3) %barrier_ptr, i32 %member_count)

It is possible to "leave" a named barrier. This decrements the named barrier's
member count and completes the barrier if all other members have signaled it:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.leave(i32 %barrier_type)

``barrier_type`` must be set to ``1``.

Note that leaving a named barrier is not exactly the opposite of joining a
barrier (for example, joining a barrier does not change its member count).

Leaving implicitly *joins* (see below) a null named barrier.

Signal, leave, and initializing operations on the same named barrier must obey
certain ordering constraints:

* Non-initializing signals must be ordered after some initializing signal or an
explicit initializing operation.
* Explicit initializing operations must not race signal or leave operations.
* Initializing signal operations must not race leave operations.
* Initializing signal operations with contradicting member counts must not race
each other.

The details of how these orderings can be established and races prevented are
yet to be determined. Using a default workgroup or cluster barrier in the
natural way is guaranteed to be sufficient.
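
As an illustrative sketch of the last point (it assumes ``@bar`` from the
example above, a member count of 2, and that a single designated wave performs
the explicit initialization):

.. code-block:: llvm

  ; Executed by one designated wave only (selection logic omitted):
  call void @llvm.amdgcn.s.barrier.init(ptr addrspace(3) @bar, i32 2)

  ; Executed by all waves: the default workgroup barrier orders the
  ; initialization before any subsequent named barrier operations.
  call void @llvm.amdgcn.s.barrier()

  ; Member waves can now issue non-initializing signals without racing
  ; the initialization.
  call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 0)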

In order to wait for a named barrier, a wave must first *join* the named barrier
using:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) %barrier_ptr)

The named barrier may then be waited for using:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.wait(i32 %barrier_type)

... with ``barrier_type`` set to ``1``.

Signal, leave, join, and wait operations must obey certain ordering constraints.
The details are yet to be determined. Satisfying the following rules is
guaranteed to be sufficient:

* Signal or wait for a named barrier only if it is the most recent to have been
joined in program order.
* Signal or leave a named barrier only if the number of prior signaling
operations on that named barrier since the most recent join in program order
is equal to the number of prior wait operations on that named barrier since
the most recent join in program order.
* Wait for a named barrier only if the number of prior signaling operations on
that named barrier since the most recent join in program order is one larger
than the number of prior wait operations on that named barrier since the most
recent join in program order.
* Do not signal a named barrier or wait for it in program order after leaving it.
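
The following sketch shows a sequence, executed by each member wave, that
satisfies these rules (it assumes ``@bar`` has already been initialized as
described above):

.. code-block:: llvm

  call void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) @bar)

  ; Phase 1: at the wait, signals since the join exceed waits by exactly one.
  call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 0)
  call void @llvm.amdgcn.s.barrier.wait(i32 1)

  ; Phase 2: counts are balanced again, so another signal/wait pair is allowed.
  call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 0)
  call void @llvm.amdgcn.s.barrier.wait(i32 1)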

Additionally, signal, leave, and wait operations on a named barrier must be used
from a consistent associated set of waves. This set is determined at
initialization time, its initial size is the member count used at
initialization, and it may shrink through leave operations. Operations on a
named barrier object with conflicting sets of waves must not race. The details
of this rule, and of how an ordering can be established to prevent such races,
are yet to be determined. Using a default workgroup or cluster barrier in the
natural way is guaranteed to be sufficient.

.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:

Memory Model GFX6-GFX9