Conversation

@nhaehnle
Collaborator

No description provided.

@llvmbot
Member

llvmbot commented Oct 29, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Nicolai Hähnle (nhaehnle)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/165502.diff

1 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+179)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7780c0a6dca0a..9a4c644a63f6e 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1179,6 +1179,53 @@ is conservatively correct for OpenCL.
                              other operations within the same address space.
      ======================= ===================================================
 
+Target Types
+------------
+
+The AMDGPU backend implements some target extension types.
+
+.. _amdgpu-types-named-barriers:
+
+Named Barriers
+~~~~~~~~~~~~~~
+
+Named barriers are represented as memory objects of type
+``target("amdgcn.named.barrier", 0)``. They are allocated as global variables
+in the LDS address space. They do not occupy regular LDS memory, but their
+lifetime and allocation granularity matches that of global variables in LDS.
+
+The following types built from named barriers are supported in global variables,
+defined recursively:
+
+* a standalone ``target("amdgcn.named.barrier", 0)``
+* an array of supported types
+* a struct containing a single element of supported type
+
+.. code-block:: llvm
+
+      @bar = addrspace(3) global target("amdgcn.named.barrier", 0) undef
+      @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
+      @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef
+
+Barrier types may not be used in ``alloca``.
+
+The integral representation of a pointer to a valid named barrier is in the
+range ``0x0080'0010`` to ``0x0080'0100`` (inclusive). The representation is
+formed by the expression ``0x0080'0000 | (id << 4)``, where ``id`` is the
+hardware barrier ID. The integral representation of the null named barrier is
+``0x0080'0000``.
+
+It is not legal to attempt to form a pointer to any non-named barrier objects.
+
+It is undefined behavior to use a pointer to any part of a named barrier object
+as the pointer operand of a regular memory access instruction or intrinsic.
+Pointers to named barrier objects are intended to be used with dedicated
+intrinsics.
+
+We expand on the semantics of named barriers in
+:ref:`the memory model section <amdgpu-memory-model-named-barriers>`.
+
+
 LLVM IR Intrinsics
 ------------------
 
@@ -6621,6 +6668,138 @@ Multiple tags can be used at the same time to synchronize with more than one add
   better code optimization, at the cost of synchronizing additional address
   spaces.
 
+.. _amdgpu-memory-model-barriers:
+
+Hardware Barriers
++++++++++++++++++
+
+.. note::
+
+  This section is preliminary. The semantics described here are intended to be
+  formalized properly in the future.
+
+Hardware barriers synchronize execution between concurrently running waves using
+fixed function hardware. Intuitively, a set of waves are "members" of a barrier.
+Waves *signal* the barrier and later *wait* for it. Execution only proceeds past
+the *wait* once all member waves have *signaled* the barrier.
+
+Formally, barriers affect semantics in exactly two ways. First, they affect
+forward progress. Waiting on a barrier that never completes (is not signaled
+sufficiently) prevents forward progress and therefore, given the assumption of
+forward progress, is undefined behavior. Second, barrier operations can pair
+with fences to contribute *synchronizes-with* relations in the memory model.
+
+Roughly speaking:
+
+- Release fences pair with barrier signal operations that are later in program
+  order
+- Barrier wait operations pair with acquire fences that are later in program
+  order
+- If a barrier signal operation contributes to allowing a wait operation to
+  complete, then the corresponding paired fences can synchronize-with each
+  other (given compatible sync scopes and memory model relaxation annotations)
+
+Default Barriers
+################
+
+There is a default workgroup barrier and a default cluster barrier. All waves
+of a workgroup and cluster are members of the same default workgroup and
+cluster barriers, respectively.
+
+.. _amdgpu-memory-model-named-barriers:
+
+Named Barriers
+##############
+
+All named barrier operations must occur in wave-uniform control flow. All
+arguments of named barrier intrinsics must be wave-uniform.
+
+Named barriers are allocated as global variables of
+:ref:`a target extension type <amdgpu-types-named-barriers>`.
+
+Named barriers may be signaled by the intrinsics:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.signal(i32 %barrier_hw_id)
+  declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count)
+
+If the second form is used and ``member_count`` is non-zero, the operation is
+an *initializing* signal, else it is *non*-initializing.
+
+Named barriers may be initialized explicitly using:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.init(ptr addrspace(3) %barrier_ptr, i32 %member_count)
+
+It is possible to "leave" a named barrier. This decrements the named barrier's
+member count and completes the barrier if all other members have signaled it:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.leave(i32 %barrier_type)
+
+``barrier_type`` must be set to ``1``.
+
+Note that leaving a named barrier is not exactly the opposite of joining a
+barrier (for example, joining a barrier does not change its member count).
+
+Leaving implicitly *joins* (see below) a null named barrier.
+
+Signal, leave, and initializing operations on the same named barrier must obey
+certain ordering constraints:
+
+* Non-initializing signals must be ordered after some initializing signal or an
+  explicit initializing operation.
+* Explicit initializing operations must not race signal or leave operations.
+* Initializing signal operations must not race leave operations.
+* Initializing signal operations with contradicting member counts must not race
+  each other.
+
+The details of how these orders can be established and races prevented are tbd.
+Using a default workgroup or cluster barrier in the natural way is guaranteed to
+be sufficient.
+
+In order to wait for a named barrier, a wave must first *join* the named barrier
+using:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) %barrier_ptr)
+
+The named barrier may then be waited for using:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.wait(i32 %barrier_type)
+
+... with ``barrier_type`` set to ``1``.
+
+Signal, leave, join, and wait operations must obey certain ordering constraints.
+The details are tbd. Satisfying the following rules is guaranteed to be
+sufficient:
+
+* Signal or wait for a named barrier only if it is the most recent to have been
+  joined in program order.
+* Signal or leave a named barrier only if the number of prior signaling
+  operations on that named barrier since the most recent join in program order
+  is equal to the number of prior wait operations on that named barrier since
+  the most recent join in program order.
+* Wait for a named barrier only if the number of prior signaling operations on
+  that named barrier since the most recent join in program order is one larger
+  than the number of prior wait operations on that named barrier since the most
+  recent join in program order.
+* Do not signal a named barrier or wait for it in program order after leaving it.
+
+Additionally, use signal, leave, and wait operations on a named barrier from a
+consistent associated set of waves that is determined at initialization time and
+whose initial size is the member count used at initialization. The set of waves
+may shrink with leave operations. Operations on a named barrier object with
+conflicting sets of waves must not race. The details of this rule and how an
+ordering can be established to prevent a race is tbd. Using a default workgroup
+or cluster barrier in the natural way is guaranteed to be sufficient.
+
 .. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
 
 Memory Model GFX6-GFX9
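
The first revision of the patch above states a pointer encoding for named barriers, ``0x0080'0000 | (id << 4)``; the review discussion below objects to documenting it, and the author later removed it from the PR. As a rough illustration of the arithmetic only, here is a minimal Python sketch. The helper names are hypothetical, and the ID range 1..16 is inferred from the stated pointer range ``0x0080'0010`` to ``0x0080'0100`` rather than stated directly in the patch.

```python
# Base value; the patch text gives this as the representation of the null named barrier.
NAMED_BARRIER_BASE = 0x0080_0000


def named_barrier_ptr_bits(hw_id: int) -> int:
    """Hypothetical helper: integral pointer representation for a hardware barrier ID."""
    # Assumption: valid IDs are 1..16, inferred from the range 0x0080'0010..0x0080'0100.
    if not 1 <= hw_id <= 16:
        raise ValueError(f"hardware barrier ID out of range: {hw_id}")
    return NAMED_BARRIER_BASE | (hw_id << 4)


def hw_id_from_ptr_bits(bits: int) -> int:
    """Hypothetical inverse: recover the ID; yields 0 for the null named barrier."""
    return (bits ^ NAMED_BARRIER_BASE) >> 4
```

Under these assumptions, ID 1 maps to the low end of the stated range and ID 16 to the high end, which matches the "inclusive" wording in the patch.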


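The per-wave rules that the patch calls "guaranteed to be sufficient" for signal, leave, join, and wait can be read as a small state machine. The following is a minimal Python sketch of a checker for a single wave's program order. The function name and the ``(kind, barrier)`` op encoding are hypothetical, passing a barrier name to ``wait`` and ``leave`` is a modeling convenience (the intrinsics in the patch take only a barrier type), and the sketch says nothing about initialization or cross-wave races, which the patch leaves tbd.

```python
def check_wave_ops(ops):
    """Check one wave's ops, in program order, against the patch's sufficient rules.

    ops: list of (kind, barrier), kind in {"join", "signal", "wait", "leave"}.
    Returns a list of (index, reason) for each op that breaks a rule.
    """
    joined = None        # most recently joined barrier; None models the null barrier
    signals = waits = 0  # operations counted since the most recent join
    left = set()         # barriers this wave has already left
    violations = []
    for i, (kind, barrier) in enumerate(ops):
        if kind == "join":
            joined, signals, waits = barrier, 0, 0
        elif kind in ("signal", "wait") and barrier in left:
            violations.append((i, "signal/wait after leave"))
        elif barrier != joined:
            violations.append((i, "not the most recently joined barrier"))
        elif kind == "signal":
            if signals == waits:
                signals += 1
            else:
                violations.append((i, "signal while an earlier signal is unwaited"))
        elif kind == "wait":
            if signals == waits + 1:
                waits += 1
            else:
                violations.append((i, "wait without exactly one pending signal"))
        elif kind == "leave":
            if signals == waits:
                left.add(barrier)
                joined = None  # leaving implicitly joins the null named barrier
            else:
                violations.append((i, "leave with a pending signal"))
        else:
            raise ValueError(f"unknown op kind: {kind}")
    return violations
```

For example, ``join A, signal A, wait A, leave A`` passes, while ``join A, wait A`` is flagged because no signal precedes the wait.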
.. _amdgpu-memory-model-barriers:

Hardware Barriers
Contributor

I'm hoping to land the formal model soon(-ish), so maybe this section won't be necessary

Collaborator Author

Do you mean the entire section, including the stuff that is specific to named barriers?

Contributor

No, not the named barrier parts yet.

Contributor

I thought about this a bit more and I think it'd be better to not land this section. I'm looking at the barrier execution & memory model right now and it's just going to be confusing if this lands only to be removed short term.
What do you think? I think this patch is fine if it just documents the LDS GV. It doesn't need to extend into a barrier memory model.

I'm curious to hear what other reviewers think too. If others prefer that this lands even if it's reverted later, that's also fine with me.


Barrier types may not be used in ``alloca``.

The integral representation of a pointer to a valid named barrier is in the
Contributor

This paragraph doesn't feel right. I think we should say absolutely nothing about the pointer value and not commit to anything in that regard. We can just say it is only valid to form a pointer to a value of type amdgcn.named.barrier by taking the address of a global value of that type, and modifying those pointers in any way (pointer arithmetic for example) is undefined behavior.

Contributor

Ah, I see why you wrote that. The lowering as implemented right now encodes the barrier ID as a pointer value through absolute_address. IMO that's terrible.
I strongly recommend we do not enshrine that hack anywhere in the documentation and keep it as an implementation detail of how the codegen passes communicate.

We should just say the address is not relevant because it is eventually replaced by some "implementation-dependent unique identifier" used to access the barrier state. Keep it as vague as possible so we have freedom of implementation: the id can be a pointer, an integer, a random dynamically assigned ID, etc.

Collaborator Author

I'd like to hear some other opinions on that, to be honest. Two things to consider:

  • It does actually matter to users if we cop to allowing the use of s.barrier.signal (non-var version) for named barriers. (But maybe the answer is that we just shouldn't do that and instead make sure that s.barrier.signal.var is always good enough?)
  • We do document a whole bunch of fairly internal stuff in this document, like the function call ABI

Contributor

> It does actually matter to users if we cop to allowing the use of s.barrier.signal (non-var version) for named barriers. (But maybe the answer is that we just shouldn't do that and instead make sure that s.barrier.signal.var is always good enough?)

Yes please, I would like to keep the GV-backed side of things isolated so the address can be an implementation detail, and we can rediscuss it at a future date.

Collaborator Author

It occurred to me that the bit representation of the pointers matters if a user wants to interop between compiled code and handwritten assembly using function calls, which is also part of the justification for documenting the function call ABI in this document.

But I already removed it from this PR and I'm not going to insist that we add it back.

~~~~~~~~~~~~~~

Named barriers are represented as memory objects of type
``target("amdgcn.named.barrier", 0)``. They are allocated as global variables
Contributor

I'd call a spade a spade and say these are token objects with no actual storage in LDS, and that they're only used as lowering hints for named barrier IDs.

Collaborator Author

I think we need to be consistent about whether internal implementation details are mentioned here or not.

Collaborator Author

I also find the framing as "lowering hints" weird. If we did that, then logically speaking we'd have to say that global objects are lowering hints for addresses, which... sure, that's true in a sense, but why would you phrase it that way?

Contributor

It's a lowering hint in the sense this is just a convenience for frontends. It does not have any actually addressable backing memory (at least not in LDS). It's a fake GV for all intents and purposes.

I'd really like to not mention any implementation details and keep this about how the GV and the target type are used, what is its impact on the memory model (none, in theory), and that's it.

It's fine to not use the term token type/lowering hint, but I think it should at least say the intent behind this GV: it's to assist in selecting barrier IDs. It's not (it should not be) intended to represent the memory behind the barrier state, as that is completely opaque to the software and we know nothing about how it works.

Collaborator Author

We still have a philosophical disagreement. There is addressable backing memory in hardware, it's just not the kind of memory that consists of the usual bits and bytes. But it doesn't really matter for what's written in this PR.

Contributor

Yes, agree to disagree for now. I prefer to not qualify this memory as addressable until we resolve the disagreement around using the LDS AS to represent it.

hardware barrier ID. The integral representation of the null named barrier is
``0x0080'0000``.

It is not legal to attempt to form a pointer to any non-named barrier objects.
Contributor

For me this just raised more questions than it answered. What is a non-named barrier object and how would I even attempt to form a pointer to one?

Collaborator Author

Fair. The intention was to make it clear that you're not supposed to try to form a pointer to the default workgroup barrier, but it probably does confuse more than clarify. I'm going to remove it.


.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.signal(i32 %barrier_hw_id)
Contributor

Should this be i32 %barrier_type for consistency with other intrinsics below? (TBH I'm a bit confused about the meaning of the i32 argument, for the intrinsics that don't take a ptr argument.)

Collaborator Author

@nhaehnle, Oct 29, 2025

They really are different things. s.barrier.wait does not care about the named barrier ID; it only cares about distinguishing between named barriers and default barriers.

@nhaehnle
Collaborator Author

Here's a new version that skips over implementation details and only says to use s.barrier.signal.var with named barriers.


Named barriers do not have an underlying byte representation.
It is undefined behavior to use a pointer to any part of a named barrier object
as the pointer operand of a regular memory access instruction or intrinsic.
Contributor

Maybe we need to add something to say the pointer has to be used directly, so cannot be offset (gep) or undergo any kind of manipulation such as inttoptr/ptrtoint?




4 participants