Description
I was trying to understand this Rust issue. Here is a simplified version in LLVM IR:
target datalayout = "e-p6:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
@COUNTER = internal global [4 x i8] zeroinitializer, align 4
define noundef i32 @atomic_fetch_add() unnamed_addr #1 {
start:
%0 = atomicrmw add ptr @COUNTER, i32 1 acquire, align 4
ret i32 %0
}
attributes #0 = { nofree norecurse noreturn nosync nounwind memory(none) "target-cpu"="sm_120" "target-features"="+ptx87" }

Compiling this with llc --mcpu=sm_120 --mattr=+ptx87 produces:
.version 8.7
.target sm_120
.address_size 64
// .globl atomic_fetch_add // -- Begin function atomic_fetch_add
.global .align 4 .b8 COUNTER[4];
// @atomic_fetch_add
.visible .func (.param .b32 func_retval0) atomic_fetch_add()
{
.reg .b32 %r<2>;
// %bb.0: // %start
atom.global.add.u32 %r1, [COUNTER], 1;
st.param.b32 [func_retval0], %r1;
ret;
// -- End function
}
[godbolt]
The acquire ordering is silently discarded: the emitted instruction is a plain atom.global.add.u32 with no .acquire qualifier, even though both the PTX ISA version (8.7) and the target (sm_120) support it.
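For context, here is a minimal sketch of the kind of Rust source that lowers to an atomicrmw add with acquire ordering like the one in the IR above. It is an illustration, not the code from the linked Rust issue: the static COUNTER and the function name only mirror the IR, and the main function exists solely so the snippet runs on a host target.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Illustrative stand-in for the @COUNTER global in the IR (assumption:
// the real code in the Rust issue may look different).
static COUNTER: AtomicU32 = AtomicU32::new(0);

// Source-level counterpart of `atomicrmw add ptr @COUNTER, i32 1 acquire`:
// atomically adds 1 and returns the previous value, requesting acquire ordering.
pub fn atomic_fetch_add() -> u32 {
    COUNTER.fetch_add(1, Ordering::Acquire)
}

fn main() {
    // Host-side smoke test: each call returns the value before the increment.
    let first = atomic_fetch_add();
    let second = atomic_fetch_add();
    println!("first = {first}, second = {second}");
}
```

The point is that the ordering is requested explicitly at the source level, so a reader of the Rust code has no indication that the NVPTX backend drops it.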
I assume this is partly a historical artifact, since older GPUs and PTX versions (prior to sm_70 and ptx60) did not support memory orderings. For Volta and newer, proper support for atomic load and atomic store was added only recently.
I was able to follow this through the NVPTX source code. Unlike ISD::ATOMIC_LOAD and ISD::ATOMIC_STORE, the ISD::ATOMIC_LOAD_xxx nodes (for example ISD::ATOMIC_LOAD_ADD) get no special handling and are lowered without any ordering.
However, shouldn't this be an error, or at least a warning?
I also could not find this behavior documented anywhere, and I think it should be (this issue can at least serve as a record).