Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomics fixes #992

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
68 changes: 42 additions & 26 deletions src/a.tex
Original file line number Diff line number Diff line change
Expand Up @@ -229,19 +229,27 @@ \section{Load-Reserved/Store-Conditional Instructions}

An SC instruction can never be observed by another RISC-V hart
before the LR instruction that established the reservation.

\begin{commentary}
The LR/SC
sequence can be given acquire semantics by setting the {\em aq} bit on
the LR instruction. The LR/SC sequence can be given release semantics
by setting the {\em rl} bit on the SC instruction. Setting the {\em
aq} bit on the LR instruction, and setting both the {\em aq} and the {\em
rl} bit on the SC instruction makes the LR/SC sequence sequentially
consistent, meaning that it cannot be reordered with earlier or
later memory operations from the same hart.

If neither bit is set on both LR and SC, the LR/SC sequence can be
by setting the {\em rl} bit on the SC instruction. Assuming
suitable mappings for other atomic operations, setting the
{\em aq} bit on the LR instruction, and setting the
{\em rl} bit on the SC instruction makes the LR/SC
sequence sequentially consistent in the C++ {\em memory\_order\_seq\_cst}
sense. Such a sequence does not act as a fence for ordering ordinary
load and store instructions before and after the sequence. Specific
Comment on lines +242 to +243
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true, since the PPO rules order the acquire with all younger instructions and order the release with all older instructions.

In any case, this is why I'd like to constrain the textual changes to the mapping section in the appendix, rather than modifying the normative text in the A chapter that defines the instructions. @daniellustig can you spare some time to work with Hans to organize this PR in a way we'd all find agreeable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misunderstood your request. I did enclose this in a commentary block, which presumably is non-normative?

Regarding the technical issue: If I have

s; lr.aq; sc.rl; l

where the lr and sc refer to the same location, but the s and l refer to different locations, I don't believe there is anything to prevent s and the l from being reordered in between the lr and sc, and then past each other. In that sense, the lr;sc does not act as a fence. You get the same phenomenon with programming-language-level locks and relaxed atomic operations (in spite of some amount of code the incorrectly assumes that critical sections are fences)

If you know of a better way to state this, I would be happy to do so. I think this is worth stating in commentary because many people find it surprising, and I think the Linux kernel has tried to strengthen some atomics primitives to avoid it, something I would argue against.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, @hboehm, I misread the TeX: you did, indeed, enclose this note within the commentary, thus making it non-normative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As to the technical matter, I agree on further reflection that I can't find a PPO rule that suggests s and l are ordered.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the proposed change. It did clarify more important things.

As a further improvement, I suggest we state the following fact explicitly:

Making the LR/SC sequence sequentially consistent in the hardware sense requires setting both the aq bit and the rl bit on both of the LR and the SC instruction.

instruction mappings for other C++ atomic operations,
or stronger notions of ``sequential consistency'', may require both
aswaterman marked this conversation as resolved.
Show resolved Hide resolved
bits to be set on either or both of the LR or SC instruction.

If neither bit is set on either LR or SC, the LR/SC sequence can be
observed to occur before or after surrounding memory operations from
the same RISC-V hart. This can be appropriate when the LR/SC
sequence is used to implement a parallel reduction operation.
\end{commentary}

Software should not set the {\em rl} bit on an LR instruction unless the {\em
aq} bit is also set, nor should software set the {\em aq} bit on an SC
Expand Down Expand Up @@ -478,20 +486,6 @@ \section{Atomic Memory Operations}
it cannot be reordered with earlier or later memory operations from the same
hart.

\begin{commentary}
The AMOs were designed to implement the C11 and C++11 memory models
efficiently. Although the FENCE R, RW instruction suffices to
implement the {\em acquire} operation and FENCE RW, W suffices to
implement {\em release}, both imply additional unnecessary ordering as
compared to AMOs with the corresponding {\em aq} or {\em rl} bit set.
\end{commentary}

An example code sequence for a critical section guarded by a
test-and-test-and-set spinlock is shown in Figure~\ref{critical}. Note the
first AMO is marked {\em aq} to order the lock acquisition before the
critical section, and the second AMO is marked {\em rl} to order
the critical section before the lock relinquishment.

\begin{figure}[h!]
\begin{center}
\begin{verbatim}
Expand All @@ -511,14 +505,36 @@ \section{Atomic Memory Operations}
\label{critical}
\end{figure}

\begin{commentary}
The AMOs were designed to implement the C11 and C++11 memory models
hboehm marked this conversation as resolved.
Show resolved Hide resolved
efficiently. Although the FENCE R, RW instruction suffices to
implement the {\em acquire} operation and FENCE RW, W suffices to
implement {\em release}, both imply additional unnecessary ordering as
compared to AMOs with the corresponding {\em aq} or {\em rl} bit set.
\end{commentary}

An example code sequence for a critical section guarded by a
test-and-test-and-set spinlock is shown in Figure~\ref{critical}. Note the
first AMO is marked {\em aq} to order the lock acquisition before the
critical section, and the second AMO is marked {\em rl} to order
the critical section before the lock relinquishment.

\begin{commentary}
We recommend the use of the AMO Swap idiom shown above for both lock
acquire and release to simplify the implementation of speculative lock
elision~\cite{Rajwar:2001:SLE}.
\end{commentary}

The instructions in the ``A'' extension can also be used to provide
sequentially consistent loads and stores. A sequentially consistent load can
be implemented as an LR with both {\em aq} and {\em rl} set. A sequentially
consistent store can be implemented as an AMOSWAP that writes the old value to
x0 and has both {\em aq} and {\em rl} set.
\begin{commentary}
The instructions in the ``A'' extension can be used to provide sequentially
consistent loads and stores, but this constrains hardware
reordering of memory accesses more than necessary.
A C++ sequentially consistent load can be implemented as
an LR with {\em aq} set. However, the LR/SC eventual
success guarantee may slow down concurrent loads from the same effective
address. A sequentially consistent store can be implemented as an AMOSWAP
that writes the old value to x0 and has {\em rl} set. However the superfluous
load may impose ordering constraints that are unnecessary for this use case.
Specific compilation conventions may require both the {\em aq} and {\em rl}
aswaterman marked this conversation as resolved.
Show resolved Hide resolved
bits to be set in either or both the LR and AMOSWAP instructions.
\end{commentary}
1 change: 1 addition & 0 deletions src/memory.tex
Original file line number Diff line number Diff line change
Expand Up @@ -1225,6 +1225,7 @@ \section{Code Porting and Mapping Guidelines}
Table~\ref{tab:c11mappings} provides a mapping of C11/C++11 atomic operations onto RISC-V memory instructions.
If load and store opcodes with {\em aq} and {\em rl} modifiers are introduced, then the mappings in Table~\ref{tab:c11mappings_hypothetical} will suffice.
Note however that the two mappings only interoperate correctly if {\tt atomic\_<op>(memory\_order\_seq\_cst)} is mapped using an LR that has both {\em aq} and {\em rl} set.
Even more importantly, a Table~\ref{tab:c11mappings} sequentially consistent store, followed by a Table~\ref{tab:c11mappings_hypothetical} sequentially consistent load can be reordered unless the Table~\ref{tab:c11mappings} mapping of stores is strengthened by either adding a second fence or mapping the store to {\tt amoswap.rl} instead.

Any AMO can be emulated by an LR/SC pair, but care must be taken to ensure that any PPO orderings that originate from the LR are also made to originate from the SC, and that any PPO orderings that terminate at the SC are also made to terminate at the LR.
For example, the LR must also be made to respect any data dependencies that the AMO has, given that load operations do not otherwise have any notion of a data dependency.
Expand Down