Document that atomic::fetch_add with an unused result must generate ldadd with a non-zero output register on Arm with LSE #69503
@llvm/issue-subscribers-backend-aarch64 Author: None (gonzalobg)
The following code:

#include <atomic>
#include <cstdint>

void inc(std::atomic<uint32_t>* p) {
    p->fetch_add(1, std::memory_order_relaxed);
}

discards the result of the atomic RMW add operation. On Arm, it generates an ldadd instruction, but I'd expect it to generate an stadd instead.
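For comparison, here is a minimal self-contained sketch with the reporter's `inc` next to a hypothetical `inc_and_get_old` (not from the report) that does use the returned value. Both make exactly the same change to memory; the only difference is whether the old value is needed afterwards, which is what makes STADD look like an attractive lowering for `inc`. The comments about LDADD/STADD restate what this report describes; the code itself does not check instruction selection.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// The reporter's function: the old value returned by fetch_add is discarded.
// Per this report, with LSE enabled this is lowered to LDADD rather than STADD.
void inc(std::atomic<uint32_t>* p) {
    p->fetch_add(1, std::memory_order_relaxed);
}

// Hypothetical counterpart that uses the old value, so an LD<op> form that
// writes the old value to a real register is required regardless.
uint32_t inc_and_get_old(std::atomic<uint32_t>* p) {
    return p->fetch_add(1, std::memory_order_relaxed);
}
```

Calling `inc` on a counter holding 41 leaves 42 in memory; calling `inc_and_get_old` afterwards returns 42 and leaves 43.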
@EugeneZelenko @ostannard @gonzalobg Hi, can I work on this issue?
There is a litmus test from Will here (https://gcc.gnu.org/pipermail/gcc-patches/2018-October/509632.html) showing that, for this optimization to be sound, the implementation of the acquire fence would need to be strengthened so that it also orders prior stores.

EDIT: reproducing Will's test for completeness:

P0 (atomic_int* y, atomic_int* x) {
  atomic_store_explicit(x, 1, memory_order_relaxed);
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(y, 1, memory_order_relaxed);
}

P1 (atomic_int* y, atomic_int* x) {
  atomic_fetch_add_explicit(y, 1, memory_order_relaxed); // STADD
  atomic_thread_fence(memory_order_acquire);
  int r0 = atomic_load_explicit(x, memory_order_relaxed);
}

P2 (atomic_int* y) {
  int r1 = atomic_load_explicit(y, memory_order_relaxed);
}
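The same test can be run as an ordinary C++ program. The sketch below is a hypothetical harness (the names `litmus_run_forbidden` and `count_forbidden` are illustrative, not from the thread) that runs Will's three threads repeatedly and counts runs ending in the outcome the C++ model forbids: P1's `fetch_add` read P0's store (so the final value of `y` is 2) but P1 still loaded `x == 0`. The release/acquire fence pair rules that outcome out on a conforming implementation. Assuming the usual AArch64 mapping of an acquire fence to the load-only barrier `dmb ishld`, the concern in this issue is that lowering P1's RMW to STADD could make it observable. A count of zero proves nothing, since such reorderings are rare and hardware-dependent; the herd model is the authoritative check.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of Will's litmus test (threads P0, P1, P2).
// Returns true if the run ended in the outcome C++ forbids:
// P1's fetch_add read P0's store (final y == 2) yet P1 loaded x == 0.
bool litmus_run_forbidden() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r0 = -1;
    int r1 = -1;

    std::thread p0([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);
        y.store(1, std::memory_order_relaxed);
    });
    std::thread p1([&] {
        // Result discarded: the STADD candidate discussed in this issue.
        y.fetch_add(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        r0 = x.load(std::memory_order_relaxed);
    });
    std::thread p2([&] {
        r1 = y.load(std::memory_order_relaxed);
    });

    p0.join();
    p1.join();
    p2.join();

    // join() synchronizes with each thread, so r0 and r1 are safe to read here.
    // P2's value mirrors the original test but is not needed for this check.
    (void)r1;
    return y.load(std::memory_order_relaxed) == 2 && r0 == 0;
}

// Hypothetical driver: runs the test `iterations` times, counts forbidden outcomes.
int count_forbidden(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (litmus_run_forbidden()) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

A final `y` of 2 is used as the witness because it can only arise if P1's RMW read the 1 written by P0; if the RMW read the initial 0 instead, both writes store 1.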
Re-opening this issue to document somewhere in the source code why this optimization is unsound for Arm LSE atomics (it is not unsound for other hardware architectures with similar ldadd vs. stadd vs. ldadd wzr variants). @ktkachov-arm, would it be possible for someone from Arm to prepare a PR documenting this, and to review it? I believe the following herd tests show this.
The relevant part of the Arm Architecture Reference Manual is the Atomic instructions section (C3.2.12 in the copy I have at hand), which states this precise exception:
The herd example for Will's litmus test is the following (test it, e.g., here: https://developer.arm.com/herd7):
which returns 5 states:
Changing the
which means that observing

I believe that if, hypothetically speaking, one wanted to allow these optimizations, then either:
Compiler Explorer reproducer for the original report (ldadd is emitted even though the result is not used): https://gcc.godbolt.org/z/7K5YYEjT9