[LangRef] adjust IR atomics specification following C++20 model tweaks. (#77263)

C++20 accepted two papers, [P0668](https://wg21.link/P0668) and
[P0982](https://wg21.link/P0982), which changed the atomics memory model
slightly in order to reflect the realities of the existing
implementations.

The rationale for these changes applies as well to the LLVM IR atomics
model. No code changes are expected to be required from this change: it
is primarily a matter of more-correctly-documenting the existing state
of the world.

There are three changes: two of them weaken guarantees, and one
strengthens them:

1. The memory ordering guaranteed by some backends/CPUs when seq_cst
operations are mixed with acquire/release operations on the same
location was weaker than the spec guaranteed. Therefore, the
specification is changed to remove the requirement that seq_cst ordering
is consistent with happens-before, and to replace it with a slightly
weaker requirement of consistency with a new relation named
strongly-happens-before.

2. The rules for a "release sequence" were weakened. Previously, an
acquire synchronized with a release even if it observed a later
monotonic store from the same thread as the release store. That has now
been removed: now, only read-modify-write operations can extend a
release sequence.

3. The model for a seq_cst fence is strengthened, such that placing a
seq_cst fence between monotonic accesses now _is_ sufficient to guarantee
sequential consistency in the model (as it always has been on existing
implementations).
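For change 2, a minimal C++20 sketch of the case that still works: a relaxed read-modify-write performed by another thread extends the release sequence headed by a release store, so an acquire load that reads the RMW's value still synchronizes with that release store. The function name `rmw_extends_release_sequence` and the values 42/1/2 are hypothetical; this illustrates the C++ rule the IR model now follows and is not LLVM code.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration of change 2: a relaxed read-modify-write
// continues the release sequence headed by a release store.
int rmw_extends_release_sequence() {
    int data = 0;                 // plain (non-atomic) payload
    std::atomic<int> flag{0};

    std::thread producer([&] {
        data = 42;
        flag.store(1, std::memory_order_release);  // heads the release sequence
    });

    std::thread bumper([&] {
        // Relaxed CAS that turns 1 into 2. Because it is an RMW that reads 1,
        // it directly follows the release store in flag's modification order
        // and is part of the release sequence headed by that store.
        int expected = 1;
        while (!flag.compare_exchange_weak(expected, 2,
                                           std::memory_order_relaxed)) {
            expected = 1;  // CAS overwrote it with the current value; retry
        }
    });

    // Acquire load that only stops once it reads the value written by the RMW.
    while (flag.load(std::memory_order_acquire) != 2) {
    }
    // It still synchronizes with the release store, so this read is race-free.
    int result = data;

    producer.join();
    bumper.join();
    return result;
}
```

By contrast, if `producer` itself followed its release store with `flag.store(2, std::memory_order_relaxed)`, the C++11 rules counted that store as part of the release sequence; under the revised rules an acquire load that reads 2 is no longer guaranteed to synchronize with the release store.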

Note that I've directly referenced the C++ standard's atomics.order
section for the precise semantics of seq_cst, instead of fully
describing them. They are quite complex, and a lot of work has gone into
refining the words in the standard. I'm afraid if I attempt to reiterate
them, I would only introduce errors.
jyknight committed Jan 23, 2024
1 parent 3942027 commit 8a45cec
Showing 3 changed files with 63 additions and 59 deletions.
llvm/docs/Atomics.rst (40 changes: 24 additions & 16 deletions)
@@ -14,9 +14,16 @@ asynchronous signals.
 The atomic instructions are designed specifically to provide readable IR and
 optimized code generation for the following:

-* The C++11 ``<atomic>`` header. (`C++11 draft available here
-<http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
-<http://www.open-std.org/jtc1/sc22/wg14/>`_.)
+* The C++ ``<atomic>`` header and C ``<stdatomic.h>`` headers. These
+were originally added in C++11 and C11. The memory model has been
+subsequently adjusted to correct errors in the initial
+specification, so LLVM currently intends to implement the version
+specified by C++20. (See the `C++20 draft standard
+<https://isocpp.org/files/papers/N4860.pdf>`_ or the unofficial
+`latest C++ draft <https://eel.is/c++draft/>`_. A `C2x draft
+<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf>`_ is
+also available, though the text has not yet been updated with the
+errata corrected by C++20.)

 * Proper semantics for Java-style memory, for both ``volatile`` and regular
 shared variables. (`Java Specification
@@ -110,13 +117,14 @@ where threads and signals are involved.
 atomic store (where the store is conditional for ``cmpxchg``), but no other
 memory operation can happen on any thread between the load and store.

-A ``fence`` provides Acquire and/or Release ordering which is not part of
-another operation; it is normally used along with Monotonic memory operations.
-A Monotonic load followed by an Acquire fence is roughly equivalent to an
-Acquire load, and a Monotonic store following a Release fence is roughly
-equivalent to a Release store. SequentiallyConsistent fences behave as both
-an Acquire and a Release fence, and offer some additional complicated
-guarantees, see the C++11 standard for details.
+A ``fence`` provides Acquire and/or Release ordering which is not part
+of another operation; it is normally used along with Monotonic memory
+operations. A Monotonic load followed by an Acquire fence is roughly
+equivalent to an Acquire load, and a Monotonic store following a
+Release fence is roughly equivalent to a Release
+store. SequentiallyConsistent fences behave as both an Acquire and a
+Release fence, and additionally provide a total ordering with some
+complicated guarantees, see the C++ standard for details.

 Frontends generating atomic instructions generally need to be aware of the
 target to some degree; atomic instructions are guaranteed to be lock-free, and
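The fence paragraph's "roughly equivalent" pairing can be written with C++'s `std::atomic_thread_fence`, taking Monotonic as `memory_order_relaxed`. In this minimal sketch (function name and values are hypothetical), the Release fence before the relaxed store and the Acquire fence after the relaxed load together order the plain write of `payload` before its read, as a Release store / Acquire load pair would.

```cpp
#include <atomic>
#include <thread>

// Fence-based message passing: release fence + relaxed store on one side,
// relaxed load + acquire fence on the other.
int fence_message_passing() {
    int payload = 0;                    // plain (non-atomic) data
    std::atomic<bool> ready{false};

    std::thread writer([&] {
        payload = 7;
        std::atomic_thread_fence(std::memory_order_release);  // "Release fence"
        ready.store(true, std::memory_order_relaxed);         // "Monotonic store"
    });

    while (!ready.load(std::memory_order_relaxed)) {          // "Monotonic load"
    }
    std::atomic_thread_fence(std::memory_order_acquire);      // "Acquire fence"
    int r = payload;  // the writer's store to payload happens-before this read

    writer.join();
    return r;
}
```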
@@ -222,7 +230,7 @@ essentially guarantees that if you take all the operations affecting a specific
 address, a consistent ordering exists.

 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
+This corresponds to the C++/C ``memory_order_relaxed``; see those
 standards for the exact definition.

 Notes for frontends
@@ -252,8 +260,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access
 other memory with normal loads and stores.

 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
-used for C++11/C11 ``memory_order_consume``.
+This corresponds to the C++/C ``memory_order_acquire``. It should also be
+used for C++/C ``memory_order_consume``.

 Notes for frontends
 If you are writing a frontend which uses this directly, use with caution.
@@ -282,7 +290,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to
 release a lock.

 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_release``.
+This corresponds to the C++/C ``memory_order_release``.

 Notes for frontends
 If you are writing a frontend which uses this directly, use with caution.
@@ -308,7 +316,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
 barrier (for fences and operations which both read and write memory).

 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_acq_rel``.
+This corresponds to the C++/C ``memory_order_acq_rel``.

 Notes for frontends
 If you are writing a frontend which uses this directly, use with caution.
@@ -331,7 +339,7 @@ and Release semantics for stores. Additionally, it guarantees that a total
 ordering exists between all SequentiallyConsistent operations.

 Relevant standard
-This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
+This corresponds to the C++/C ``memory_order_seq_cst``, Java volatile, and
 the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.

 Notes for frontends
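The "Relevant standard" entries touched in this file pair each LLVM ordering with a C/C++ `memory_order`, including the instruction to use Acquire for `memory_order_consume`. Collected into one lookup as a sketch: the helper `llvm_ordering_for` is hypothetical, and the returned strings are the IR ordering keywords (`monotonic`, `acquire`, `release`, `acq_rel`, `seq_cst`).

```cpp
#include <atomic>
#include <string>

// Hypothetical helper: the LLVM IR ordering keyword that the atomics guide
// pairs with each C++ std::memory_order value.
std::string llvm_ordering_for(std::memory_order order) {
    switch (order) {
    case std::memory_order_relaxed: return "monotonic";
    case std::memory_order_consume: return "acquire";  // consume is treated as acquire
    case std::memory_order_acquire: return "acquire";
    case std::memory_order_release: return "release";
    case std::memory_order_acq_rel: return "acq_rel";
    case std::memory_order_seq_cst: return "seq_cst";
    }
    return "";  // not reached for valid enumerators
}
```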
llvm/docs/LangRef.rst (54 changes: 32 additions & 22 deletions)
@@ -3312,15 +3312,15 @@ Memory Model for Concurrent Operations
 The LLVM IR does not define any way to start parallel threads of
 execution or to register signal handlers. Nonetheless, there are
 platform-specific ways to create them, and we define LLVM IR's behavior
-in their presence. This model is inspired by the C++0x memory model.
+in their presence. This model is inspired by the C++ memory model.

 For a more informal introduction to this model, see the :doc:`Atomics`.

 We define a *happens-before* partial order as the least partial order
 that

 - Is a superset of single-thread program order, and
-- When a *synchronizes-with* ``b``, includes an edge from ``a`` to
+- When ``a`` *synchronizes-with* ``b``, includes an edge from ``a`` to
 ``b``. *Synchronizes-with* pairs are introduced by platform-specific
 techniques, like pthread locks, thread creation, thread joining,
 etc., and by atomic instructions. (See also :ref:`Atomic Memory Ordering
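The happens-before definition lists thread creation and thread joining among the platform-specific sources of *synchronizes-with* edges. A hypothetical C++ illustration using `std::thread`, whose constructor and `join()` provide these edges: plain, non-atomic accesses on either side are ordered without any atomic instruction.

```cpp
#include <thread>

// Thread creation and join act as synchronizes-with edges, so these plain
// accesses are ordered by happens-before and do not race.
int join_publishes_result() {
    int value = 1;   // written before the thread is created
    int result = 0;
    std::thread worker([&] {
        // Thread creation: the write of value happens-before this read.
        result = value * 10;
    });
    worker.join();
    // Joining: the worker's write of result happens-before this read.
    return result;
}
```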
@@ -3384,13 +3384,12 @@ Atomic instructions (:ref:`cmpxchg <i_cmpxchg>`,
 :ref:`atomicrmw <i_atomicrmw>`, :ref:`fence <i_fence>`,
 :ref:`atomic load <i_load>`, and :ref:`atomic store <i_store>`) take
 ordering parameters that determine which other atomic instructions on
-the same address they *synchronize with*. These semantics are borrowed
-from Java and C++0x, but are somewhat more colloquial. If these
-descriptions aren't precise enough, check those specs (see spec
-references in the :doc:`atomics guide <Atomics>`).
-:ref:`fence <i_fence>` instructions treat these orderings somewhat
-differently since they don't take an address. See that instruction's
-documentation for details.
+the same address they *synchronize with*. These semantics implement
+the Java or C++ memory models; if these descriptions aren't precise
+enough, check those specs (see spec references in the
+:doc:`atomics guide <Atomics>`). :ref:`fence <i_fence>` instructions
+treat these orderings somewhat differently since they don't take an
+address. See that instruction's documentation for details.

 For a simpler introduction to the ordering constraints, see the
 :doc:`Atomics`.
@@ -3418,32 +3417,37 @@ For a simpler introduction to the ordering constraints, see the
 stronger) operations on the same address. If an address is written
 ``monotonic``-ally by one thread, and other threads ``monotonic``-ally
 read that address repeatedly, the other threads must eventually see
-the write. This corresponds to the C++0x/C1x
-``memory_order_relaxed``.
+the write. This corresponds to the C/C++ ``memory_order_relaxed``.
 ``acquire``
 In addition to the guarantees of ``monotonic``, a
 *synchronizes-with* edge may be formed with a ``release`` operation.
-This is intended to model C++'s ``memory_order_acquire``.
+This is intended to model C/C++'s ``memory_order_acquire``.
 ``release``
 In addition to the guarantees of ``monotonic``, if this operation
 writes a value which is subsequently read by an ``acquire``
-operation, it *synchronizes-with* that operation. (This isn't a
-complete description; see the C++0x definition of a release
-sequence.) This corresponds to the C++0x/C1x
+operation, it *synchronizes-with* that operation. Furthermore,
+this occurs even if the value written by a ``release`` operation
+has been modified by a read-modify-write operation before being
+read. (Such a set of operations comprises a *release
+sequence*). This corresponds to the C/C++
 ``memory_order_release``.
 ``acq_rel`` (acquire+release)
 Acts as both an ``acquire`` and ``release`` operation on its
-address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
+address. This corresponds to the C/C++ ``memory_order_acq_rel``.
 ``seq_cst`` (sequentially consistent)
 In addition to the guarantees of ``acq_rel`` (``acquire`` for an
 operation that only reads, ``release`` for an operation that only
 writes), there is a global total order on all
-sequentially-consistent operations on all addresses, which is
-consistent with the *happens-before* partial order and with the
-modification orders of all the affected addresses. Each
+sequentially-consistent operations on all addresses. Each
 sequentially-consistent read sees the last preceding write to the
-same address in this global order. This corresponds to the C++0x/C1x
-``memory_order_seq_cst`` and Java volatile.
+same address in this global order. This corresponds to the C/C++
+``memory_order_seq_cst`` and Java ``volatile``.
+
+Note: this global total order is *not* guaranteed to be fully
+consistent with the *happens-before* partial order if
+non-``seq_cst`` accesses are involved. See the C++ standard
+`[atomics.order] <https://wg21.link/atomics.order>`_ section
+for more details on the exact guarantees.

 .. _syncscope:
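The single global total order in the `seq_cst` entry is what rules out the classic store-buffering outcome: one thread's load must come after the other thread's store in that order, so `r1 == 0 && r2 == 0` should never be observed on a conforming implementation. A minimal C++ harness (function name and trial count are hypothetical); a run that never sees the outcome is consistent with the guarantee but cannot prove it.

```cpp
#include <atomic>
#include <thread>

// Store buffering with seq_cst accesses. Returns true if the outcome that
// the seq_cst total order forbids (both loads reading 0) was observed.
bool store_buffering_both_zero(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t0([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t1([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t0.join();
        t1.join();
        if (r1 == 0 && r2 == 0) {
            return true;  // would contradict the global seq_cst order
        }
    }
    return false;
}
```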

@@ -10762,7 +10766,13 @@ still *synchronize-with* the explicit ``fence`` and establish the

 A ``fence`` which has ``seq_cst`` ordering, in addition to having both
 ``acquire`` and ``release`` semantics specified above, participates in
-the global program order of other ``seq_cst`` operations and/or fences.
+the global program order of other ``seq_cst`` operations and/or
+fences. Furthermore, the global ordering created by a ``seq_cst``
+fence must be compatible with the individual total orders of
+``monotonic`` (or stronger) memory accesses occurring before and after
+such a fence. The exact semantics of this interaction are somewhat
+complicated, see the C++ standard's `[atomics.order]
+<https://wg21.link/atomics.order>`_ section for more details.

 A ``fence`` instruction can also take an optional
 ":ref:`syncscope <syncscope>`" argument.
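The strengthened `seq_cst` fence text (change 3 in the commit message) covers the "independent reads of independent writes" (IRIW) shape that the removed `TargetLowering.h` FIXME used as a counterexample. Below, the stores are relaxed and each reader has a `seq_cst` fence between its two relaxed loads. Under the C++20 `[atomics.order]` rules, the two readers' fences must be ordered one way or the other in the single total order, and either choice contradicts one reader's observations, so `r1 == 1, r2 == 0, r3 == 1, r4 == 0` is forbidden; under the C++11 fence rules it was not. The harness and its names are hypothetical, and a clean run only fails to observe the outcome.

```cpp
#include <atomic>
#include <thread>

// IRIW with seq_cst fences between relaxed loads. Returns true if the
// readers were ever seen disagreeing on the order of the two writes.
bool iriw_fenced_forbidden_seen(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0, r3 = 0, r4 = 0;
        std::thread w0([&] { x.store(1, std::memory_order_relaxed); });
        std::thread w1([&] { y.store(1, std::memory_order_relaxed); });
        std::thread rd0([&] {
            r1 = x.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_relaxed);
        });
        std::thread rd1([&] {
            r3 = y.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r4 = x.load(std::memory_order_relaxed);
        });
        w0.join();
        w1.join();
        rd0.join();
        rd1.join();
        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
            return true;  // rd0 saw x before y, rd1 saw y before x
        }
    }
    return false;
}
```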
llvm/include/llvm/CodeGen/TargetLowering.h (28 changes: 7 additions & 21 deletions)
@@ -2166,27 +2166,13 @@ class TargetLoweringBase {
 /// This function should either return a nullptr, or a pointer to an IR-level
 /// Instruction*. Even complex fence sequences can be represented by a
 /// single Instruction* through an intrinsic to be lowered later.
-/// Backends should override this method to produce target-specific intrinsic
-/// for their fences.
-/// FIXME: Please note that the default implementation here in terms of
-/// IR-level fences exists for historical/compatibility reasons and is
-/// *unsound* ! Fences cannot, in general, be used to restore sequential
-/// consistency. For example, consider the following example:
-/// atomic<int> x = y = 0;
-/// int r1, r2, r3, r4;
-/// Thread 0:
-/// x.store(1);
-/// Thread 1:
-/// y.store(1);
-/// Thread 2:
-/// r1 = x.load();
-/// r2 = y.load();
-/// Thread 3:
-/// r3 = y.load();
-/// r4 = x.load();
-/// r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all
-/// seq_cst. But if they are lowered to monotonic accesses, no amount of
-/// IR-level fences can prevent it.
+///
+/// The default implementation emits an IR fence before any release (or
+/// stronger) operation that stores, and after any acquire (or stronger)
+/// operation. This is generally a correct implementation, but backends may
+/// override if they wish to use alternative schemes (e.g. the PowerPC
+/// standard ABI uses a fence before a seq_cst load instead of after a
+/// seq_cst store).
 /// @{
 virtual Instruction *emitLeadingFence(IRBuilderBase &Builder,
 Instruction *Inst,
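The new comment describes the default placement used by hooks such as `emitLeadingFence`: a fence before any release-or-stronger operation that stores, and a fence after any acquire-or-stronger operation, around what is otherwise a monotonic access. The C++ sketch below only mimics that placement with `std::atomic_thread_fence`; it is an analogy, not LLVM code. The helper names are hypothetical, and giving each fence the same ordering as the operation it surrounds is an assumption (the comment does not say which ordering the fence carries). Because `seq_cst` is stronger than acquire, a `seq_cst` store gets both a leading and a trailing fence, so a `seq_cst` store followed by a `seq_cst` load has a `seq_cst` fence between them and store buffering stays forbidden.

```cpp
#include <atomic>
#include <thread>

// Hypothetical C++ analogue of the default fence placement; the accesses
// themselves are relaxed ("monotonic").
// Assumption: each fence uses the same ordering as the original operation.
static bool release_or_stronger(std::memory_order o) {
    return o == std::memory_order_release || o == std::memory_order_acq_rel ||
           o == std::memory_order_seq_cst;
}
static bool acquire_or_stronger(std::memory_order o) {
    return o == std::memory_order_acquire || o == std::memory_order_acq_rel ||
           o == std::memory_order_seq_cst;
}

void fenced_store(std::atomic<int>& a, int v, std::memory_order o) {
    if (release_or_stronger(o)) std::atomic_thread_fence(o);  // leading fence
    a.store(v, std::memory_order_relaxed);
    if (acquire_or_stronger(o)) std::atomic_thread_fence(o);  // trailing fence
}

int fenced_load(std::atomic<int>& a, std::memory_order o) {
    int v = a.load(std::memory_order_relaxed);
    if (acquire_or_stronger(o)) std::atomic_thread_fence(o);  // trailing fence
    return v;
}

// Store buffering with "seq_cst" accesses lowered as above. Returns true if
// the forbidden outcome (both loads reading 0) was observed.
bool fenced_sb_both_zero(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t0([&] {
            fenced_store(x, 1, std::memory_order_seq_cst);
            r1 = fenced_load(y, std::memory_order_seq_cst);
        });
        std::thread t1([&] {
            fenced_store(y, 1, std::memory_order_seq_cst);
            r2 = fenced_load(x, std::memory_order_seq_cst);
        });
        t0.join();
        t1.join();
        if (r1 == 0 && r2 == 0) return true;
    }
    return false;
}
```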
