LLVM lowers atomic builtin to a function call when lowering sync builtin to a single instruction with LOCK prefix #38194

llvmbot · 2018-09-05T16:15:37Z


Bugzilla Link	38846
Version	trunk
OS	Linux
Reporter	LLVM Bugzilla Contributor
CC	@zygoloid

Extended Description

=== Test program ===

#include

template
struct attribute((packed)) Foo {
T a;
void incAtomic() {
__atomic_add_fetch(&a, 1, __ATOMIC_RELEASE);
}
void incSync() {

    __sync_add_and_fetch(&a, 1);
}

};

int main(int argc, char** argv) {
Foo<uint16_t> f;
f.a = argc;
f.incAtomic();
f.incSync();
return f.a;
}

=== Disassembled instructions ===
(compiled with clang++ --std=c++17 -O3 -Wall -DNDEBUG=1)

main:
push rax
mov word ptr [rsp], di
mov rdi, rsp
mov esi, 1
mov edx, 3
call __atomic_fetch_add_2
lock add word ptr [rsp], 1 # __sync_add_and_fetch
movzx eax, word ptr [rsp]
pop rcx
ret

GCC's documentation for builtins (https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/_005f_005fatomic-Builtins.html) says that __atomic builtsins are intended to replace __sync builtins, so there should be no semantic difference between corresponding ones.

I believe the difference comes from the alignment checking logic (https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGAtomic.cpp#L768) only included in handling __atomic builtins. However, considering that LOCK prefix works for unaligned accesses as well, I'm not sure if that alignment check is necessary. LOCK for unaligned memory access is expensive, but not sure if it is more expensive than library calls.

The text was updated successfully, but these errors were encountered:

zygoloid · 2018-09-05T19:14:47Z

I think your example has undefined behavior in both the __sync and __atomic forms. You're calling the versions of these builtins that take an unsigned short*, and passing a pointer that is not suitably aligned. This is UB, just as it would be if you called a function that performed a non-atomic load on such a pointer.

However, I think that's largely irrelevant for the purpose of this bug, and we usually do try to guarantee that cases like this work, when the alignment can be determined from the form of the expression. (I think the cause of the difference between the _sync* builtins and the _atomic* builtins is that the former make no attempt to make such cases work, whereas the latter do try to make this work.)

The same behavior can be seen for cases whose behavior is supposed to be defined, such as:

struct A {
char c[4];
} *p, *q;
void f() { __atomic_load(p, q, 5); } // calls __atomic_load_4

struct B {
int n;
} *r, *s;
void g() { __atomic_load(r, s, 5); } // emitted as a load

... so it looks like we use a libcall for unaligned accesses for _atomic, and do not try to support unaligned accesses for _sync. (Note that the _sync* functions only take built-in integral and pointer types, for which alignment is implied, whereas the _atomic* functions take more arbitrary types.)

Now, it's wrong to call __atomic_load_4 in the above example: the "optimized" library functions are only valid to call on properly-aligned pointers. We should be calling the generic __atomic_load library function instead.

Also, it's worth noting that choosing to use inlined code rather than a libcall is an ABI decision, not merely a performance choice: if some part of the program uses inlined code, the libcall implementation must not perform nonatomic accesses under a lock. It's always correct to conservatively use a libcall, but it's not correct to inline code if the libatomic implementation would use a lock for that object.

I would note that GCC generates a call to __atomic_load_16 for a 1-byte-aligned 16 byte load -- https://godbolt.org/z/ScqQB9 -- so we're not the only ones to get this wrong. Also, GCC's libatomic implementation of __atomic_load will use a lock to load an object of type 'struct A' above if it straddles a 16-byte-alignment boundary, so GCC + GNU libatomic gets the f() case wrong too by emitting a 'mov' rather than a libcall.

One final problem is that the compiler-rt implementation of __atomic_load does not even depend on the alignment of the pointer: when given a size for which a naturally-aligned load would be atomic, it just calls __c11_atomic_load, which will in general not be atomic when given an unaligned pointer. (GCC's libatomic gets this right, as far as I can see.)

I would conclude:

in principle and in practice, _sync* atomics do not support unaligned pointers
in principle, _atomic* atomics appear to be intended to support unaligned pointers; in practice, such support does not appear to work

(and the above seems to hold across both Clang and GCC).

The best advice I can give you is to avoid misaligned atomic operations.

llvmbot · 2018-09-05T19:46:29Z

Thank you Richard for your detailed reply! Is unaligned atomic operation UB only for these builtins, or for std::atomic as well? Here https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and seems that the generated code is just same as the one from __sync (in therms of atomic operation).

zygoloid · 2018-09-05T21:11:27Z

Thank you Richard for your detailed reply! Is unaligned atomic operation UB
only for these builtins, or for std::atomic as well? Here
https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and
seems that the generated code is just same as the one from __sync (in therms
of atomic operation).

Generally, when you pass a T* to a function, that function is free to assume that the pointer value is suitably aligned for type T. As a result, you must be extremely careful when using pointers that point to members of a struct declared with attribute((packed)).

If you're using atomic operations on a packed struct as a whole, that should work fine. Both _Atomic(T) and std::atomic will increase the alignment of the atomic type as necessary to make natural atomic operations work on the object, though, so atomic<packed_struct> will typically have an alignment greater than 1.

There's implementation divergence in your example: GCC ignores the attribute((packed)) on the member of type atomic<uint16_t> in Foo2 (and gives you a working program that performs a correct, naturally-aligned atomic access). Clang respects the request, which means you're making a member function call with a misaligned this pointer, which is UB.

You might also be interested in this greatly reduced example of what happens if you try to use __atomic ops on pointers that are not naturally aligned: https://gcc.godbolt.org/z/LqXKrt

Note that GCC generates a plain 'mov' for this 1-byte-aligned 4 byte load. That's not atomic in general. Clang generates a call to __atomic_load_4, which in both compiler-rt and libatomic will result in a plain 'mov'.

llvmbot · 2018-09-06T00:51:19Z

I see. Thank you for the explanation! Could you please point me where in the specification the alignment requirement for atomics is described? I tried to find it by myself but couldn't find one.

zygoloid · 2018-09-06T22:47:38Z

I see. Thank you for the explanation! Could you please point me where in the
specification the alignment requirement for atomics is described?

For which form of atomics? (_sync* or _atomic* or _Atomic or std::atomic?)

llvmbot · 2018-09-07T17:55:45Z

I meant std::atomic. (I guess there's no "standard" for builtins?)

zygoloid · 2018-09-07T20:28:36Z

I meant std::atomic. (I guess there's no "standard" for builtins?)

There is no alignment requirement in any specification for std::atomic; rather, the C++ standard requires it to work for an arbitrary trivially-copyable type, and implementations make that efficient by increasing the alignment where necessary.

In libc++ this is done by implementing it in terms of _Atomic (where available); Clang's implementation of _Atomic increases the alignment as necessary. (The alignment of _Atomic(T) should be specified by the psABI for the target, but in practice it is not.) However, when compiled with GCC, it just uses _atomic* and doesn't increase the alignment, presumably resulting in miscompiles in the cases where GCC's _atomic* functions fail to actually be atomic.

In libstdc++, the GCC bug gcc.gnu.org/#87237 is worked around by increasing the alignment on std::atomic when sizeof(T) is a power of 2 <= 16 and alignof(T) is less than sizeof(T):

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/atomic#L189

llvmbot · 2018-09-07T20:33:27Z

Got it. Thank you for the detailed explanation!

zygoloid · 2018-09-07T20:33:51Z

llvm/llvm-bugzilla-archive#38871 filed for the bug that libc++'s std::atomic is not always atomic when compiling with GCC. (It's a GCC bug but libc++ needs to work around it.)

zygoloid · 2018-09-08T00:01:52Z

As of r341734, clang no longer calls the _builtin_foo lib functions for misaligned pointers.

Kojoley · 2021-07-28T13:43:57Z

*** Bug #25709 has been marked as a duplicate of this bug. ***

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

efriedma-quic mentioned this issue Apr 18, 2022

llvm doesn't generate inline sequence for load/store of unaligned atomic variable on x86_64 #33695

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM lowers atomic builtin to a function call when lowering sync builtin to a single instruction with LOCK prefix #38194

LLVM lowers atomic builtin to a function call when lowering sync builtin to a single instruction with LOCK prefix #38194

llvmbot commented Sep 5, 2018

zygoloid mannequin commented Sep 5, 2018

llvmbot commented Sep 5, 2018

zygoloid mannequin commented Sep 5, 2018

llvmbot commented Sep 6, 2018

zygoloid mannequin commented Sep 6, 2018

llvmbot commented Sep 7, 2018

zygoloid mannequin commented Sep 7, 2018

llvmbot commented Sep 7, 2018

zygoloid mannequin commented Sep 7, 2018

zygoloid mannequin commented Sep 8, 2018

Kojoley commented Jul 28, 2021

LLVM lowers __atomic builtin to a function call when lowering __sync builtin to a single instruction with LOCK prefix #38194

LLVM lowers __atomic builtin to a function call when lowering __sync builtin to a single instruction with LOCK prefix #38194

Comments

llvmbot commented Sep 5, 2018

Extended Description

zygoloid mannequin commented Sep 5, 2018

llvmbot commented Sep 5, 2018

zygoloid mannequin commented Sep 5, 2018

llvmbot commented Sep 6, 2018

zygoloid mannequin commented Sep 6, 2018

llvmbot commented Sep 7, 2018

zygoloid mannequin commented Sep 7, 2018

llvmbot commented Sep 7, 2018

zygoloid mannequin commented Sep 7, 2018

zygoloid mannequin commented Sep 8, 2018

Kojoley commented Jul 28, 2021

LLVM lowers atomic builtin to a function call when lowering sync builtin to a single instruction with LOCK prefix #38194

LLVM lowers atomic builtin to a function call when lowering sync builtin to a single instruction with LOCK prefix #38194