-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LangRef] adjust IR atomics specification following C++20 model tweaks. #77263
Conversation
[P0668](https://wg21.link/P0668) and [P0982](https://wg21.link/P0982), both accepted into C++20, changed the atomics memory model slightly, in order to reflect the realities of the existing implementations. The rationale for these changes applies as well to the LLVM IR atomics model. No code changes are expected to be required from this change: it is primarily a matter of more-correctly-documenting the existing state of the world. There's three changes: two of them weaken guarantees, and one strengthens them: 1. The memory ordering guaranteed by some backends/CPUs when seq_cst operations are mixed with acquire/release operations on the same location was weaker than the spec guaranteed. Therefore, the specification is changed to remove the requirement that seq_cst ordering is consistent with happens-before, and replaces it with a slightly weaker requirement of consistency with a new relation named strongly-happens-before. 2. The rules for a "release sequence" were weakened. Previously, an acquire synchronizes with an release even if it observes a later monotonic store from the same thread as the release store. That has now been removed: now, only read-modify-write operations can extend a release sequence. 3. The model for a a seq_cst fence is strengthened, such that placing a seq_cst between monotonic accesses now _is_ sufficient to guarantee sequential consistency in the model (as it always has been on existing implementations.) Note that I've directly referenced the C++ standard's atomics.order section for the precise semantics of seq_cst, instead of fully describing them. They are quite complex, and a lot of work has gone into refining the words in the standard. I'm afraid if I attempt to reiterate them, I would only introduce errors.
@llvm/pr-subscribers-llvm-ir Author: James Y Knight (jyknight) ChangesC++20 accepted two papers, P0668 and P0982, which changed the atomics memory model slightly in order to reflect the realities of the existing implementations. The rationale for these changes applies as well to the LLVM IR atomics model. No code changes are expected to be required from this change: it is primarily a matter of more-correctly-documenting the existing state of the world. There's three changes: two of them weaken guarantees, and one strengthens them:
Note that I've directly referenced the C++ standard's atomics.order section for the precise semantics of seq_cst, instead of fully describing them. They are quite complex, and a lot of work has gone into refining the words in the standard. I'm afraid if I attempt to reiterate them, I would only introduce errors. Full diff: https://github.com/llvm/llvm-project/pull/77263.diff 3 Files Affected:
diff --git a/llvm/docs/Atomics.rst b/llvm/docs/Atomics.rst
index 6ad6e1812cb0a6..09b623f226b4b9 100644
--- a/llvm/docs/Atomics.rst
+++ b/llvm/docs/Atomics.rst
@@ -14,9 +14,16 @@ asynchronous signals.
The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:
-* The C++11 ``<atomic>`` header. (`C++11 draft available here
- <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
- <http://www.open-std.org/jtc1/sc22/wg14/>`_.)
+* The C++ ``<atomic>`` header and C <stdatomic.h> headers. These were
+ originally added in C++11 and C11. The memory model has been
+ subsequently adjusted to correct errors in the initial
+ specification, so LLVM currently intends to implement the version
+ specified by C++20. (See the `C++20 draft standard
+ <https://isocpp.org/files/papers/N4860.pdf>`_ or the `latest C++
+ draft <https://eel.is/c++draft/>`_. The latest `C2x draft
+ <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf>`_ is
+ also available, though the text has not yet been updated with the
+ errata corrected by C++20.)
* Proper semantics for Java-style memory, for both ``volatile`` and regular
shared variables. (`Java Specification
@@ -110,13 +117,14 @@ where threads and signals are involved.
atomic store (where the store is conditional for ``cmpxchg``), but no other
memory operation can happen on any thread between the load and store.
-A ``fence`` provides Acquire and/or Release ordering which is not part of
-another operation; it is normally used along with Monotonic memory operations.
-A Monotonic load followed by an Acquire fence is roughly equivalent to an
-Acquire load, and a Monotonic store following a Release fence is roughly
-equivalent to a Release store. SequentiallyConsistent fences behave as both
-an Acquire and a Release fence, and offer some additional complicated
-guarantees, see the C++11 standard for details.
+A ``fence`` provides Acquire and/or Release ordering which is not part
+of another operation; it is normally used along with Monotonic memory
+operations. A Monotonic load followed by an Acquire fence is roughly
+equivalent to an Acquire load, and a Monotonic store following a
+Release fence is roughly equivalent to a Release
+store. SequentiallyConsistent fences behave as both an Acquire and a
+Release fence, and additionally provide a total ordering with some
+complicated guarantees, see the C++ standard for details.
Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
@@ -222,7 +230,7 @@ essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.
Relevant standard
- This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
+ This corresponds to the C++/C ``memory_order_relaxed``; see those
standards for the exact definition.
Notes for frontends
@@ -252,8 +260,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.
Relevant standard
- This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
- used for C++11/C11 ``memory_order_consume``.
+ This corresponds to the C++/C ``memory_order_acquire``. It should also be
+ used for C++/C ``memory_order_consume``.
Notes for frontends
If you are writing a frontend which uses this directly, use with caution.
@@ -282,7 +290,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.
Relevant standard
- This corresponds to the C++11/C11 ``memory_order_release``.
+ This corresponds to the C++/C ``memory_order_release``.
Notes for frontends
If you are writing a frontend which uses this directly, use with caution.
@@ -308,7 +316,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).
Relevant standard
- This corresponds to the C++11/C11 ``memory_order_acq_rel``.
+ This corresponds to the C++/C ``memory_order_acq_rel``.
Notes for frontends
If you are writing a frontend which uses this directly, use with caution.
@@ -331,7 +339,7 @@ and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.
Relevant standard
- This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
+ This corresponds to the C++/C ``memory_order_seq_cst``, Java volatile, and
the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
Notes for frontends
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b5918e3063d868..4f31456dfa1f43 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3307,7 +3307,7 @@ Memory Model for Concurrent Operations
The LLVM IR does not define any way to start parallel threads of
execution or to register signal handlers. Nonetheless, there are
platform-specific ways to create them, and we define LLVM IR's behavior
-in their presence. This model is inspired by the C++0x memory model.
+in their presence. This model is inspired by the C++ memory model.
For a more informal introduction to this model, see the :doc:`Atomics`.
@@ -3315,7 +3315,7 @@ We define a *happens-before* partial order as the least partial order
that
- Is a superset of single-thread program order, and
-- When a *synchronizes-with* ``b``, includes an edge from ``a`` to
+- When ``a`` *synchronizes-with* ``b``, includes an edge from ``a`` to
``b``. *Synchronizes-with* pairs are introduced by platform-specific
techniques, like pthread locks, thread creation, thread joining,
etc., and by atomic instructions. (See also :ref:`Atomic Memory Ordering
@@ -3379,13 +3379,12 @@ Atomic instructions (:ref:`cmpxchg <i_cmpxchg>`,
:ref:`atomicrmw <i_atomicrmw>`, :ref:`fence <i_fence>`,
:ref:`atomic load <i_load>`, and :ref:`atomic store <i_store>`) take
ordering parameters that determine which other atomic instructions on
-the same address they *synchronize with*. These semantics are borrowed
-from Java and C++0x, but are somewhat more colloquial. If these
-descriptions aren't precise enough, check those specs (see spec
-references in the :doc:`atomics guide <Atomics>`).
-:ref:`fence <i_fence>` instructions treat these orderings somewhat
-differently since they don't take an address. See that instruction's
-documentation for details.
+the same address they *synchronize with*. These semantics implement
+the Java or C++ memory models; if these descriptions aren't precise
+enough, check those specs (see spec references in the
+:doc:`atomics guide <Atomics>`). :ref:`fence <i_fence>` instructions
+treat these orderings somewhat differently since they don't take an
+address. See that instruction's documentation for details.
For a simpler introduction to the ordering constraints, see the
:doc:`Atomics`.
@@ -3413,8 +3412,7 @@ For a simpler introduction to the ordering constraints, see the
stronger) operations on the same address. If an address is written
``monotonic``-ally by one thread, and other threads ``monotonic``-ally
read that address repeatedly, the other threads must eventually see
- the write. This corresponds to the C++0x/C1x
- ``memory_order_relaxed``.
+ the write. This corresponds to the C++/C ``memory_order_relaxed``.
``acquire``
In addition to the guarantees of ``monotonic``, a
*synchronizes-with* edge may be formed with a ``release`` operation.
@@ -3422,24 +3420,30 @@ For a simpler introduction to the ordering constraints, see the
``release``
In addition to the guarantees of ``monotonic``, if this operation
writes a value which is subsequently read by an ``acquire``
- operation, it *synchronizes-with* that operation. (This isn't a
- complete description; see the C++0x definition of a release
- sequence.) This corresponds to the C++0x/C1x
+ operation, it *synchronizes-with* that operation. Furthermore,
+ this occurs even if the value written by a ``release`` operation
+ has been modified by a read-modify-write operation before being
+ read. (Such a set of operations comprises a *release
+ sequence*). This corresponds to the C++/C
``memory_order_release``.
``acq_rel`` (acquire+release)
Acts as both an ``acquire`` and ``release`` operation on its
- address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
+ address. This corresponds to the C++/C ``memory_order_acq_rel``.
``seq_cst`` (sequentially consistent)
In addition to the guarantees of ``acq_rel`` (``acquire`` for an
operation that only reads, ``release`` for an operation that only
writes), there is a global total order on all
- sequentially-consistent operations on all addresses, which is
- consistent with the *happens-before* partial order and with the
- modification orders of all the affected addresses. Each
+ sequentially-consistent operations on all addresses. Each
sequentially-consistent read sees the last preceding write to the
- same address in this global order. This corresponds to the C++0x/C1x
+ same address in this global order. This corresponds to the C++/C
``memory_order_seq_cst`` and Java volatile.
+ Note: this global total order is *not* guaranteed to be fully
+ consistent with the *happens-before* partial order if
+ non-``seq_cst`` accesses are involved. See the C++ standard
+ `[atomics.order] <https://eel.is/c++draft/atomics.order>`_ section
+ for more details on the exact guarantees.
+
.. _syncscope:
If an atomic operation is marked ``syncscope("singlethread")``, it only
@@ -10754,7 +10758,13 @@ still *synchronize-with* the explicit ``fence`` and establish the
A ``fence`` which has ``seq_cst`` ordering, in addition to having both
``acquire`` and ``release`` semantics specified above, participates in
-the global program order of other ``seq_cst`` operations and/or fences.
+the global program order of other ``seq_cst`` operations and/or
+fences. Furthermore, the global ordering created by a ``seq_cst``
+fence must be compatible with the individual total orders of
+``monotonic`` (or stronger) memory accesses occurring before and after
+such a fence. The exact semantics of this interaction are somewhat
+complicated, see the C++ standard's `[atomics.order]
+<https://eel.is/c++draft/atomics.order>`_ section for more details.
A ``fence`` instruction can also take an optional
":ref:`syncscope <syncscope>`" argument.
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index ed2b513be96086..c9492b4cf778b6 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2166,27 +2166,13 @@ class TargetLoweringBase {
/// This function should either return a nullptr, or a pointer to an IR-level
/// Instruction*. Even complex fence sequences can be represented by a
/// single Instruction* through an intrinsic to be lowered later.
- /// Backends should override this method to produce target-specific intrinsic
- /// for their fences.
- /// FIXME: Please note that the default implementation here in terms of
- /// IR-level fences exists for historical/compatibility reasons and is
- /// *unsound* ! Fences cannot, in general, be used to restore sequential
- /// consistency. For example, consider the following example:
- /// atomic<int> x = y = 0;
- /// int r1, r2, r3, r4;
- /// Thread 0:
- /// x.store(1);
- /// Thread 1:
- /// y.store(1);
- /// Thread 2:
- /// r1 = x.load();
- /// r2 = y.load();
- /// Thread 3:
- /// r3 = y.load();
- /// r4 = x.load();
- /// r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all
- /// seq_cst. But if they are lowered to monotonic accesses, no amount of
- /// IR-level fences can prevent it.
+ ///
+ /// The default implementation emits an IR fence before any release (or
+ /// stronger) operation that stores, and after any acquire (or stronger)
+ /// operation. This is generally a correct implementation, but backends may
+ /// override if they wish to use alternative schemes (e.g. the PowerPC
+ /// standard ABI uses a fence before a seq_cst load instead of after a
+ /// seq_cst store).
/// @{
virtual Instruction *emitLeadingFence(IRBuilderBase &Builder,
Instruction *Inst,
|
llvm/docs/Atomics.rst
Outdated
* The C++11 ``<atomic>`` header. (`C++11 draft available here | ||
<http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here | ||
<http://www.open-std.org/jtc1/sc22/wg14/>`_.) | ||
* The C++ ``<atomic>`` header and C <stdatomic.h> headers. These were |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<stdatomic.h>
needs backquotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The content of the patch seems fine. Can we come up with links that are more likely to be stable? In particular, eel.is is just someone's personal server. |
I don't think there are "official" HTML links; eel.is is a widely-referenced url for this. However, I swapped the link to https://eel.is/c++draft/atomics.order for the marginally-more-official-perhaps https://wg21.link/atomics.order which is currently a redirect to the former. I didn't see a redirect to the toplevel of the spec (which I referenced in Atomics.rst), so I just left that "unofficial latest C++ draft". It's right next to a link to isocpp.org for the final C++20 anyhow, so if it goes away there's still a usable link. |
llvm/docs/LangRef.rst
Outdated
@@ -3413,33 +3412,38 @@ For a simpler introduction to the ordering constraints, see the | |||
stronger) operations on the same address. If an address is written | |||
``monotonic``-ally by one thread, and other threads ``monotonic``-ally | |||
read that address repeatedly, the other threads must eventually see | |||
the write. This corresponds to the C++0x/C1x | |||
``memory_order_relaxed``. | |||
the write. This corresponds to the C++/C ``memory_order_relaxed``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: C/C++ is the usual formulation you see here, not C++/C
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was puzzled by C++/C
formulation as well but was thinking whether jyknight wants to emphasize that C atomics were based on C++'s :)
llvm/docs/LangRef.rst
Outdated
@@ -3413,33 +3412,38 @@ For a simpler introduction to the ordering constraints, see the | |||
stronger) operations on the same address. If an address is written | |||
``monotonic``-ally by one thread, and other threads ``monotonic``-ally | |||
read that address repeatedly, the other threads must eventually see | |||
the write. This corresponds to the C++0x/C1x | |||
``memory_order_relaxed``. | |||
the write. This corresponds to the C++/C ``memory_order_relaxed``. | |||
``acquire`` | |||
In addition to the guarantees of ``monotonic``, a | |||
*synchronizes-with* edge may be formed with a ``release`` operation. | |||
This is intended to model C++'s ``memory_order_acquire``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: here's a bare C++ that could do with C/C++ :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
llvm/docs/LangRef.rst
Outdated
sequentially-consistent read sees the last preceding write to the | ||
same address in this global order. This corresponds to the C++0x/C1x | ||
same address in this global order. This corresponds to the C++/C | ||
``memory_order_seq_cst`` and Java volatile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: volatile should also get double-backticks here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Any more comments here or is this good to go? |
C++20 accepted two papers, P0668 and P0982, which changed the atomics memory model slightly in order to reflect the realities of the existing implementations.
The rationale for these changes applies as well to the LLVM IR atomics model. No code changes are expected to be required from this change: it is primarily a matter of more-correctly-documenting the existing state of the world.
There's three changes: two of them weaken guarantees, and one strengthens them:
The memory ordering guaranteed by some backends/CPUs when seq_cst operations are mixed with acquire/release operations on the same location was weaker than the spec guaranteed. Therefore, the specification is changed to remove the requirement that seq_cst ordering is consistent with happens-before, and replaces it with a slightly weaker requirement of consistency with a new relation named strongly-happens-before.
The rules for a "release sequence" were weakened. Previously, an acquire synchronizes with an release even if it observes a later monotonic store from the same thread as the release store. That has now been removed: now, only read-modify-write operations can extend a release sequence.
The model for a a seq_cst fence is strengthened, such that placing a seq_cst between monotonic accesses now is sufficient to guarantee sequential consistency in the model (as it always has been on existing implementations.)
Note that I've directly referenced the C++ standard's atomics.order section for the precise semantics of seq_cst, instead of fully describing them. They are quite complex, and a lot of work has gone into refining the words in the standard. I'm afraid if I attempt to reiterate them, I would only introduce errors.