diff --git a/llvm/docs/Atomics.rst b/llvm/docs/Atomics.rst
index 6ad6e1812cb0a..4dee3e6bd9f4f 100644
--- a/llvm/docs/Atomics.rst
+++ b/llvm/docs/Atomics.rst
@@ -14,9 +14,16 @@ asynchronous signals.
 The atomic instructions are designed specifically to provide readable IR and
 optimized code generation for the following:
 
-* The C++11 ``<atomic>`` header. (`C++11 draft available here
-  `_.) (`C11 draft available here
-  `_.)
+* The C++ ``<atomic>`` header and C ``<stdatomic.h>`` headers. These
+  were originally added in C++11 and C11. The memory model has been
+  subsequently adjusted to correct errors in the initial
+  specification, so LLVM currently intends to implement the version
+  specified by C++20. (See the `C++20 draft standard
+  `_ or the unofficial
+  `latest C++ draft `_. A `C2x draft
+  `_ is
+  also available, though the text has not yet been updated with the
+  errata corrected by C++20.)
 
 * Proper semantics for Java-style memory, for both ``volatile`` and regular
   shared variables. (`Java Specification
@@ -110,13 +117,14 @@ where threads and signals are involved.
   atomic store (where the store is conditional for ``cmpxchg``), but no other
   memory operation can happen on any thread between the load and store.
 
-A ``fence`` provides Acquire and/or Release ordering which is not part of
-another operation; it is normally used along with Monotonic memory operations.
-A Monotonic load followed by an Acquire fence is roughly equivalent to an
-Acquire load, and a Monotonic store following a Release fence is roughly
-equivalent to a Release store. SequentiallyConsistent fences behave as both
-an Acquire and a Release fence, and offer some additional complicated
-guarantees, see the C++11 standard for details.
+A ``fence`` provides Acquire and/or Release ordering which is not part
+of another operation; it is normally used along with Monotonic memory
+operations. A Monotonic load followed by an Acquire fence is roughly
+equivalent to an Acquire load, and a Monotonic store following a
+Release fence is roughly equivalent to a Release
+store. SequentiallyConsistent fences behave as both an Acquire and a
+Release fence, and additionally provide a total ordering with some
+complicated guarantees, see the C++ standard for details.
 
 Frontends generating atomic instructions generally need to be aware of the
 target to some degree; atomic instructions are guaranteed to be lock-free, and
@@ -222,7 +230,7 @@ essentially guarantees that if you take all the operations affecting a specific
 address, a consistent ordering exists.
 
 Relevant standard
-  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
+  This corresponds to the C++/C ``memory_order_relaxed``; see those
   standards for the exact definition.
 
 Notes for frontends
@@ -252,8 +260,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access
 other memory with normal loads and stores.
 
 Relevant standard
-  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
-  used for C++11/C11 ``memory_order_consume``.
+  This corresponds to the C++/C ``memory_order_acquire``. It should also be
+  used for C++/C ``memory_order_consume``.
 
 Notes for frontends
   If you are writing a frontend which uses this directly, use with caution.
@@ -282,7 +290,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to release
 a lock.
 
 Relevant standard
-  This corresponds to the C++11/C11 ``memory_order_release``.
+  This corresponds to the C++/C ``memory_order_release``.
 
 Notes for frontends
   If you are writing a frontend which uses this directly, use with caution.
@@ -308,7 +316,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release barrier
 (for fences and operations which both read and write memory).
 
 Relevant standard
-  This corresponds to the C++11/C11 ``memory_order_acq_rel``.
+  This corresponds to the C++/C ``memory_order_acq_rel``.
 
 Notes for frontends
   If you are writing a frontend which uses this directly, use with caution.
@@ -331,7 +339,7 @@ and Release semantics for stores. Additionally, it guarantees that a total
 ordering exists between all SequentiallyConsistent operations.
 
 Relevant standard
-  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
+  This corresponds to the C++/C ``memory_order_seq_cst``, Java volatile, and
   the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
 
 Notes for frontends
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 178029aca98a9..7a7ddc59ba985 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3312,7 +3312,7 @@ Memory Model for Concurrent Operations
 The LLVM IR does not define any way to start parallel threads of
 execution or to register signal handlers. Nonetheless, there are
 platform-specific ways to create them, and we define LLVM IR's behavior
-in their presence. This model is inspired by the C++0x memory model.
+in their presence. This model is inspired by the C++ memory model.
 
 For a more informal introduction to this model, see the :doc:`Atomics`.
 
@@ -3320,7 +3320,7 @@ We define a *happens-before* partial order as the least partial order
 that
 
 - Is a superset of single-thread program order, and
-- When a *synchronizes-with* ``b``, includes an edge from ``a`` to
+- When ``a`` *synchronizes-with* ``b``, includes an edge from ``a`` to
   ``b``. *Synchronizes-with* pairs are introduced by platform-specific
   techniques, like pthread locks, thread creation, thread joining, etc.,
   and by atomic instructions. (See also :ref:`Atomic Memory Ordering
@@ -3384,13 +3384,12 @@ Atomic instructions (:ref:`cmpxchg `,
 :ref:`atomicrmw `, :ref:`fence `,
 :ref:`atomic load `, and :ref:`atomic store `) take
 ordering parameters that determine which other atomic instructions on
-the same address they *synchronize with*. These semantics are borrowed
-from Java and C++0x, but are somewhat more colloquial. If these
-descriptions aren't precise enough, check those specs (see spec
-references in the :doc:`atomics guide `).
-:ref:`fence ` instructions treat these orderings somewhat
-differently since they don't take an address. See that instruction's
-documentation for details.
+the same address they *synchronize with*. These semantics implement
+the Java or C++ memory models; if these descriptions aren't precise
+enough, check those specs (see spec references in the
+:doc:`atomics guide `). :ref:`fence ` instructions
+treat these orderings somewhat differently since they don't take an
+address. See that instruction's documentation for details.
 
 For a simpler introduction to the ordering constraints, see the
 :doc:`Atomics`.
@@ -3418,32 +3417,37 @@ For a simpler introduction to the ordering constraints, see the
     stronger) operations on the same address. If an address is written
     ``monotonic``-ally by one thread, and other threads ``monotonic``-ally
     read that address repeatedly, the other threads must eventually see
-    the write. This corresponds to the C++0x/C1x
-    ``memory_order_relaxed``.
+    the write. This corresponds to the C/C++ ``memory_order_relaxed``.
 ``acquire``
     In addition to the guarantees of ``monotonic``, a
     *synchronizes-with* edge may be formed with a ``release`` operation.
-    This is intended to model C++'s ``memory_order_acquire``.
+    This is intended to model C/C++'s ``memory_order_acquire``.
 ``release``
     In addition to the guarantees of ``monotonic``, if this operation
     writes a value which is subsequently read by an ``acquire``
-    operation, it *synchronizes-with* that operation. (This isn't a
-    complete description; see the C++0x definition of a release
-    sequence.) This corresponds to the C++0x/C1x
+    operation, it *synchronizes-with* that operation. Furthermore,
+    this occurs even if the value written by a ``release`` operation
+    has been modified by a read-modify-write operation before being
+    read. (Such a set of operations comprises a *release
+    sequence*). This corresponds to the C/C++
     ``memory_order_release``.
 ``acq_rel`` (acquire+release)
     Acts as both an ``acquire`` and ``release`` operation on its
-    address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
+    address. This corresponds to the C/C++ ``memory_order_acq_rel``.
 ``seq_cst`` (sequentially consistent)
     In addition to the guarantees of ``acq_rel`` (``acquire`` for an
     operation that only reads, ``release`` for an operation that only
     writes), there is a global total order on all
-    sequentially-consistent operations on all addresses, which is
-    consistent with the *happens-before* partial order and with the
-    modification orders of all the affected addresses. Each
+    sequentially-consistent operations on all addresses. Each
     sequentially-consistent read sees the last preceding write to the
-    same address in this global order. This corresponds to the C++0x/C1x
-    ``memory_order_seq_cst`` and Java volatile.
+    same address in this global order. This corresponds to the C/C++
+    ``memory_order_seq_cst`` and Java ``volatile``.
+
+    Note: this global total order is *not* guaranteed to be fully
+    consistent with the *happens-before* partial order if
+    non-``seq_cst`` accesses are involved. See the C++ standard
+    `[atomics.order] `_ section
+    for more details on the exact guarantees.
 
 .. _syncscope:
 
@@ -10762,7 +10766,13 @@ still *synchronize-with* the explicit ``fence`` and establish the
 
 A ``fence`` which has ``seq_cst`` ordering, in addition to having both
 ``acquire`` and ``release`` semantics specified above, participates in
-the global program order of other ``seq_cst`` operations and/or fences.
+the global program order of other ``seq_cst`` operations and/or
+fences. Furthermore, the global ordering created by a ``seq_cst``
+fence must be compatible with the individual total orders of
+``monotonic`` (or stronger) memory accesses occurring before and after
+such a fence. The exact semantics of this interaction are somewhat
+complicated, see the C++ standard's `[atomics.order]
+`_ section for more details.
 
 A ``fence`` instruction can also take an optional
 ":ref:`syncscope `" argument.
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index ed2b513be9608..c9492b4cf778b 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2166,27 +2166,13 @@ class TargetLoweringBase {
   /// This function should either return a nullptr, or a pointer to an IR-level
   ///   Instruction*. Even complex fence sequences can be represented by a
   ///   single Instruction* through an intrinsic to be lowered later.
-  /// Backends should override this method to produce target-specific intrinsic
-  /// for their fences.
-  /// FIXME: Please note that the default implementation here in terms of
-  /// IR-level fences exists for historical/compatibility reasons and is
-  /// *unsound* ! Fences cannot, in general, be used to restore sequential
-  /// consistency. For example, consider the following example:
-  /// atomic x = y = 0;
-  /// int r1, r2, r3, r4;
-  /// Thread 0:
-  ///   x.store(1);
-  /// Thread 1:
-  ///   y.store(1);
-  /// Thread 2:
-  ///   r1 = x.load();
-  ///   r2 = y.load();
-  /// Thread 3:
-  ///   r3 = y.load();
-  ///   r4 = x.load();
-  /// r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all
-  /// seq_cst. But if they are lowered to monotonic accesses, no amount of
-  /// IR-level fences can prevent it.
+  ///
+  /// The default implementation emits an IR fence before any release (or
+  /// stronger) operation that stores, and after any acquire (or stronger)
+  /// operation. This is generally a correct implementation, but backends may
+  /// override if they wish to use alternative schemes (e.g. the PowerPC
+  /// standard ABI uses a fence before a seq_cst load instead of after a
+  /// seq_cst store).
   /// @{
   virtual Instruction *emitLeadingFence(IRBuilderBase &Builder,
                                         Instruction *Inst,