[AArch64] Set MaxAtomicSizeInBitsSupported. #74385

Merged: 2 commits into llvm:main from atomic-max-aarch64 (Dec 11, 2023)

Conversation

jyknight (Member) commented Dec 4, 2023

This will result in larger atomic operations getting expanded to __atomic_* libcalls via AtomicExpandPass, which matches what Clang already does in the frontend.

Additionally, adjust some comments, and remove partial code dealing with larger-than-128-bit atomics, as it's now unreachable.

AArch64 always supports 128-bit atomics, so no conditionals are needed here. (Though we really ought to require that a 128-bit load is available, not just a cmpxchg, which would mean conditioning on LSE2. But that's future work.)

The arm64-irtranslator.ll test was adjusted: it was using an i256 type as a hack to avoid IR atomic lowering while testing GlobalISel behavior. Instead, pass -mattr=+lse and use i32 to accomplish that goal in a way that continues to work.
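
As a rough illustration (not code from the patch itself; the function and value names here are made up), an oversized atomic now gets rewritten by AtomicExpandPass before it ever reaches the AArch64 backend:

; A 256-bit atomic RMW, wider than the new 128-bit limit.
define i256 @rmw_add_i256(ptr %p) {
  ; AtomicExpandPass now expands this into a loop around the generic
  ; __atomic_compare_exchange libcall (the fixed-size __atomic_* routines
  ; only cover 1/2/4/8/16-byte operands), matching what Clang already
  ; emits for oversized _Atomic types.
  %old = atomicrmw add ptr %p, i256 1 monotonic
  ret i256 %old
}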

llvmbot (Collaborator) commented Dec 4, 2023

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: James Y Knight (jyknight)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/74385.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+11-4)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll (+61-104)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index b6a16217dfae3..b82230c9b8d98 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1651,6 +1651,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();
 
   IsStrictFPEnabled = true;
+  setMaxAtomicSizeInBitsSupported(128);
 }
 
 void AArch64TargetLowering::addTypeForNEON(MVT VT) {
@@ -24888,15 +24889,21 @@ AArch64TargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
                              : AtomicExpansionKind::LLSC;
 }
 
-// For the real atomic operations, we have ldxr/stxr up to 128 bits,
+// The "default" for integer RMW operations is to expand to an LL/SC loop.
+// However, with the LSE instructions (or outline-atomics mode, which provides
+// library routines in place of the LSE-instructions), we can directly emit many
+// operations instead.
+//
+// Floating-point operations are always emitted to a cmpxchg loop, because they
+// may trigger a trap which aborts an LLSC sequence.
 TargetLowering::AtomicExpansionKind
 AArch64TargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
+  unsigned Size = AI->getType()->getPrimitiveSizeInBits();
+  assert(Size <= 128 && "AtomicExpandPass should've handled larger sizes.");
+
   if (AI->isFloatingPointOperation())
     return AtomicExpansionKind::CmpXChg;
 
-  unsigned Size = AI->getType()->getPrimitiveSizeInBits();
-  if (Size > 128) return AtomicExpansionKind::None;
-
   bool CanUseLSE128 = Subtarget->hasLSE128() && Size == 128 &&
                       (AI->getOperation() == AtomicRMWInst::Xchg ||
                        AI->getOperation() == AtomicRMWInst::Or ||
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
index 575cd6b874e35..92ddc6309546f 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
@@ -1,5 +1,5 @@
-; RUN: llc -O0 -aarch64-enable-atomic-cfg-tidy=0 -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 | FileCheck %s
-; RUN: llc -O3 -aarch64-enable-atomic-cfg-tidy=0 -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefix=O3
+; RUN: llc -O0 -aarch64-enable-atomic-cfg-tidy=0 -mattr=+lse -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 | FileCheck %s
+; RUN: llc -O3 -aarch64-enable-atomic-cfg-tidy=0 -mattr=+lse -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefix=O3
 
 ; This file checks that the translation from llvm IR to generic MachineInstr
 ; is correct.
@@ -2077,190 +2077,147 @@ done:
 }
 
 ; Try a monotonic atomicrmw xchg
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_xchg(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_xchg
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_XCHG [[ADDR]](p0), [[VAL]] :: (load store monotonic (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw xchg ptr %addr, i256 1 monotonic
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_XCHG [[ADDR]](p0), [[VAL]] :: (load store monotonic (s32) on %ir.addr)
+  %oldval = atomicrmw xchg ptr %addr, i32 1 monotonic
+  ret i32 %oldval
 }
 
 ; Try an acquire atomicrmw add
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_add(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_add
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_ADD [[ADDR]](p0), [[VAL]] :: (load store acquire (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw add ptr %addr, i256 1 acquire
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_ADD [[ADDR]](p0), [[VAL]] :: (load store acquire (s32) on %ir.addr)
+  %oldval = atomicrmw add ptr %addr, i32 1 acquire
+  ret i32 %oldval
 }
 
 ; Try a release atomicrmw sub
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_sub(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_sub
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_SUB [[ADDR]](p0), [[VAL]] :: (load store release (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw sub ptr %addr, i256 1 release
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_SUB [[ADDR]](p0), [[VAL]] :: (load store release (s32) on %ir.addr)
+  %oldval = atomicrmw sub ptr %addr, i32 1 release
+  ret i32 %oldval
 }
 
 ; Try an acq_rel atomicrmw and
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_and(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_and
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_AND [[ADDR]](p0), [[VAL]] :: (load store acq_rel (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw and ptr %addr, i256 1 acq_rel
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
-}
-
-; Try an seq_cst atomicrmw nand
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_AND [[ADDR]](p0), [[VAL]] :: (load store acq_rel (s32) on %ir.addr)
+  %oldval = atomicrmw and ptr %addr, i32 1 acq_rel
+  ret i32 %oldval
+}
+
+; Try an seq_cst atomicrmw nand. NAND isn't supported by LSE, so it
+; expands to G_ATOMIC_CMPXCHG_WITH_SUCCESS.
 define i32 @test_atomicrmw_nand(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_nand
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
+; CHECK-NEXT:  successors: %bb.2(0x80000000)
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_NAND [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw nand ptr %addr, i256 1 seq_cst
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[NEG1:%[0-9]+]]:_(s32) = G_CONSTANT i32 -1
+; CHECK-NEXT:    [[OLDVALSTART:%[0-9]+]]:_(s32) = G_LOAD [[ADDR]](p0) :: (load (s32) from %ir.addr)
+; CHECK:       bb.2.atomicrmw.start:
+; CHECK-NEXT:    successors: %bb.3({{[^)]+}}), %bb.2({{[^)]+}})
+; CHECK:         [[OLDVAL:%[0-9]+]]:_(s32) = G_PHI [[OLDVALSTART]](s32), %bb.1, [[OLDVALRES:%[0-9]+]](s32), %bb.2
+; CHECK-NEXT:    [[AND:%[0-9]+]]:_(s32) = G_AND [[OLDVAL]], [[VAL]]
+; CHECK-NEXT:    [[NEWVAL:%[0-9]+]]:_(s32) = G_XOR [[AND]], [[NEG1]]
+; CHECK:         [[OLDVALRES]]:_(s32), [[SUCCESS:%[0-9]+]]:_(s1) = G_ATOMIC_CMPXCHG_WITH_SUCCESS [[ADDR]](p0), [[OLDVAL]], [[NEWVAL]] :: (load store seq_cst seq_cst (s32) on %ir.addr)
+; CHECK-NEXT:    G_BRCOND [[SUCCESS]](s1), %bb.3
+; CHECK-NEXT:    G_BR %bb.2
+; CHECK:       bb.3.atomicrmw.end:
+  %oldval = atomicrmw nand ptr %addr, i32 1 seq_cst
+  ret i32 %oldval
 }
 
 ; Try an seq_cst atomicrmw or
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_or(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_or
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_OR [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw or ptr %addr, i256 1 seq_cst
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_OR [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s32) on %ir.addr)
+  %oldval = atomicrmw or ptr %addr, i32 1 seq_cst
+  ret i32 %oldval
 }
 
 ; Try an seq_cst atomicrmw xor
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_xor(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_xor
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_XOR [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw xor ptr %addr, i256 1 seq_cst
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_XOR [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s32) on %ir.addr)
+  %oldval = atomicrmw xor ptr %addr, i32 1 seq_cst
+  ret i32 %oldval
 }
 
 ; Try an seq_cst atomicrmw min
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_min(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_min
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_MIN [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw min ptr %addr, i256 1 seq_cst
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_MIN [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s32) on %ir.addr)
+  %oldval = atomicrmw min ptr %addr, i32 1 seq_cst
+  ret i32 %oldval
 }
 
 ; Try an seq_cst atomicrmw max
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_max(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_max
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_MAX [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw max ptr %addr, i256 1 seq_cst
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_MAX [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s32) on %ir.addr)
+  %oldval = atomicrmw max ptr %addr, i32 1 seq_cst
+  ret i32 %oldval
 }
 
 ; Try an seq_cst atomicrmw unsigned min
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_umin(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_umin
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_UMIN [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw umin ptr %addr, i256 1 seq_cst
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_UMIN [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s32) on %ir.addr)
+  %oldval = atomicrmw umin ptr %addr, i32 1 seq_cst
+  ret i32 %oldval
 }
 
 ; Try an seq_cst atomicrmw unsigned max
-; AArch64 will expand some atomicrmw's at the LLVM-IR level so we use a wide type to avoid this.
 define i32 @test_atomicrmw_umax(ptr %addr) {
 ; CHECK-LABEL: name: test_atomicrmw_umax
 ; CHECK:       bb.1 (%ir-block.{{[0-9]+}}):
 ; CHECK-NEXT:  liveins: $x0
 ; CHECK:         [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
-; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
-; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s256) = G_ATOMICRMW_UMAX [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s256) on %ir.addr)
-; CHECK-NEXT:    [[RES:%[0-9]+]]:_(s32) = G_TRUNC [[OLDVALRES]]
-  %oldval = atomicrmw umax ptr %addr, i256 1 seq_cst
-  ; FIXME: We currently can't lower 'ret i256' and it's not the purpose of this
-  ;        test so work around it by truncating to i32 for now.
-  %oldval.trunc = trunc i256 %oldval to i32
-  ret i32 %oldval.trunc
+; CHECK-NEXT:    [[VAL:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+; CHECK-NEXT:    [[OLDVALRES:%[0-9]+]]:_(s32) = G_ATOMICRMW_UMAX [[ADDR]](p0), [[VAL]] :: (load store seq_cst (s32) on %ir.addr)
+  %oldval = atomicrmw umax ptr %addr, i32 1 seq_cst
+  ret i32 %oldval
 }
 
 @addr = global ptr null

tschuett (Member) commented Dec 5, 2023

IIRC, I can use a 128-bit cmpxchg to implement atomic 128-bit loads.
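
(For reference, the trick looks roughly like this in IR -- a sketch, not code from this PR. A compare-exchange whose expected and new values are equal returns the current contents of memory whether it succeeds or fails:)

define i128 @load_via_cas(ptr %p) {
  ; Succeeds, and stores 0, only if the location already held 0; in every
  ; case the first result element is the value that was in memory.
  %res = cmpxchg ptr %p, i128 0, i128 0 monotonic monotonic
  %val = extractvalue { i128, i1 } %res, 0
  ret i128 %val
}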

jyknight (Member, Author) commented Dec 5, 2023

Yes, that's what we already do today (and this PR doesn't change it), but that is:

  • Invalid, because the expansion doesn't work on read-only memory. (An atomic load from read-only memory isn't particularly useful, but it is permitted at the C/C++ level and can arise in generic usages -- we miscompile this code today.)
  • Unexpectedly poorly performing, because loads cause contention with each other. (This upsets users, who quite reasonably believe that lock-free atomics should not contend on loads.)

It's unfortunate that we've been emitting it so far, and on both X86 and AArch64 I think we should be falling back to libatomic unless a native load is available. (But, again: future work, not this PR.)
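
A minimal sketch of the first problem (illustrative names, not a test from this PR): the IR only reads, but the cmpxchg-based lowering may store, so it can fault on a read-only page.

@ro = constant i128 7   ; placed in read-only memory

define i128 @load_ro() {
  ; Without a native 128-bit load (i.e. without LSE2), this is currently
  ; lowered via a 128-bit compare-exchange, whose store side faults on
  ; write-protected memory.
  %v = load atomic i128, ptr @ro seq_cst, align 16
  ret i128 %v
}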

jyknight (Member, Author):

Ping; anyone feel like LGTM'ing this easy change? :)

davemgreen (Collaborator):

I see you've removed i256 atomics from some of the tests (makes sense). Should we be adding i256 tests somewhere now, to test what this patch is aiming to fix?

jyknight (Member, Author):

Should we be adding i256 tests somewhere now, to test what this patch is aiming to fix?

Yes! Done.
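
A sketch of what such a test checks (hypothetical function and CHECK lines; the actual tests added to the PR are not shown in this thread): oversized atomics should now lower to __atomic_* libcalls.

define void @store_i256(ptr %p, i256 %v) {
; CHECK-LABEL: store_i256:
; CHECK: bl __atomic_store
  store atomic i256 %v, ptr %p seq_cst, align 32
  ret void
}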

@jyknight jyknight merged commit 876816f into llvm:main Dec 11, 2023
3 of 4 checks passed
@jyknight jyknight deleted the atomic-max-aarch64 branch December 11, 2023 22:55
nico (Contributor) commented Dec 11, 2023

Looks like this breaks tests on Mac: http://45.33.8.238/macm1/74710/step_11.txt

Please take a look.

MaskRay (Member) commented Dec 11, 2023

Looks like this breaks tests on Mac: http://45.33.8.238/macm1/74710/step_11.txt

Please take a look.

Fixed by 072cea6, which uses llc -mtriple= instead of llc -march=.
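
For illustration, the shape of that fix (hypothetical RUN lines, not the actual diff of 072cea6): -march= only overrides the architecture and inherits the rest of the triple from the host, so CHECK lines written against ELF output break on the macOS bots, while -mtriple= pins the whole target.

; Before: behavior depends on the host OS part of the default triple.
; RUN: llc -march=aarch64 %s -o - | FileCheck %s
; After: fully pinned target triple.
; RUN: llc -mtriple=aarch64 %s -o - | FileCheck %s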
