[NVPTX] orderings of atomicrmw instructions are not lowered properly #173993

@kulst

I was trying to understand this Rust issue. A simplified version in LLVM IR

target datalayout = "e-p6:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

@COUNTER = internal global [4 x i8] zeroinitializer, align 4

define noundef i32 @atomic_fetch_add() unnamed_addr #1 {
start:
  %0 = atomicrmw add ptr @COUNTER, i32 1 acquire, align 4
  ret i32 %0
}

attributes #0 = { nofree norecurse noreturn nosync nounwind memory(none) "target-cpu"="sm_120" "target-features"="+ptx87" }

compiled with llc --mcpu=sm_120 --mattr=+ptx87 produces

.version 8.7
.target sm_120
.address_size 64

	// .globl	atomic_fetch_add        // -- Begin function atomic_fetch_add
.global .align 4 .b8 COUNTER[4];
                                        // @atomic_fetch_add
.visible .func  (.param .b32 func_retval0) atomic_fetch_add()
{
	.reg .b32 	%r<2>;

// %bb.0:                               // %start
	atom.global.add.u32 	%r1, [COUNTER], 1;
	st.param.b32 	[func_retval0], %r1;
	ret;
                                        // -- End function
}

[godbolt]
The acquire ordering is silently discarded, even though both the PTX ISA version and the target CPU support it.

I assume this is a leftover from older GPUs and PTX versions (prior to sm_70 and PTX 6.0), which did not support memory orderings. Proper support for orderings on atomic load and atomic store for Volta+ was added not long ago.
I was able to follow the NVPTX source code. Unlike ISD::ATOMIC_LOAD and ISD::ATOMIC_STORE, the ISD::ATOMIC_LOAD_xxx nodes are not handled specially and are lowered without any ordering.

However, shouldn't this be an error, or at least a warning?
I was also unable to find this behavior documented anywhere, which I think it should be (at least by this issue).
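
For context, a Rust function along the following lines would presumably lower to the `atomicrmw add ... acquire` shown in the IR above. This is an assumed reconstruction, not the code from the linked Rust issue: the names `COUNTER` and `atomic_fetch_add` are taken from the IR, and `std` is used so the sketch runs on a host (an nvptx64 build would more likely use `core::sync::atomic` in a `no_std` crate).

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Assumed counterpart of `@COUNTER` in the IR: a 4-byte, 4-byte-aligned static.
static COUNTER: AtomicU32 = AtomicU32::new(0);

// Atomically adds 1 and returns the previous value.
// `Ordering::Acquire` is the ordering that the NVPTX backend drops in the PTX above.
pub fn atomic_fetch_add() -> u32 {
    COUNTER.fetch_add(1, Ordering::Acquire)
}

fn main() {
    let before = atomic_fetch_add();
    let after = COUNTER.load(Ordering::Relaxed);
    println!("before = {before}, after = {after}");
}
```

On a host target the requested ordering is honored; the point of this issue is that the same source, compiled for nvptx64, ends up as a plain `atom.global.add.u32` without any ordering qualifier.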
