Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Use Zacas for AtomicRMWInst::Nand i32 and XLen. #80119

Merged
merged 2 commits into from
Jan 31, 2024

Conversation

topperc
Copy link
Collaborator

@topperc topperc commented Jan 31, 2024

We don't have an AMO instruction for Nand, so with the A extension we use an LR/SC loop. If we have Zacas we can use a CAS loop instead.

According to the Zacas spec, a CAS loop scales to highly parallel systems better than LR/SC.

Open to opinions on whether this is the right thing to do.

We don't have an AMO instruction for Nand, so with the A extension
we use an LR/SC loop. If we have Zacas we can use a CAS loop instead.

According to the Zacas spec, a CAS loop scales to highly parallel
systems better than LR/SC.

Open to opinions on whether this is the right thing to do.
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 31, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

We don't have an AMO instruction for Nand, so with the A extension we use an LR/SC loop. If we have Zacas we can use a CAS loop instead.

According to the Zacas spec, a CAS loop scales to highly parallel systems better than LR/SC.

Open to opinions on whether this is the right thing to do.


Patch is 43.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80119.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+5)
  • (modified) llvm/test/CodeGen/RISCV/atomic-rmw.ll (+664-277)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 2f84dff6bc626..45a66a1e4d2f6 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -19517,6 +19517,11 @@ RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
   unsigned Size = AI->getType()->getPrimitiveSizeInBits();
   if (Size == 8 || Size == 16)
     return AtomicExpansionKind::MaskedIntrinsic;
+
+  if (Subtarget.hasStdExtZacas() && AI->getOperation() == AtomicRMWInst::Nand &&
+      (Size == Subtarget.getXLen() || Size == 32))
+    return AtomicExpansionKind::CmpXChg;
+
   return AtomicExpansionKind::None;
 }
 
diff --git a/llvm/test/CodeGen/RISCV/atomic-rmw.ll b/llvm/test/CodeGen/RISCV/atomic-rmw.ll
index d4c067b7b8a40..62206a6a2ece5 100644
--- a/llvm/test/CodeGen/RISCV/atomic-rmw.ll
+++ b/llvm/test/CodeGen/RISCV/atomic-rmw.ll
@@ -2,15 +2,24 @@
 ; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=RV32I %s
 ; RUN: llc -mtriple=riscv32 -mattr=+a -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefixes=RV32IA,RV32IA-WMO %s
+; RUN:   | FileCheck -check-prefixes=RV32IA,RV32IA-NOZACAS,RV32IA-WMO,RV32IA-WMO-NOZACAS %s
 ; RUN: llc -mtriple=riscv32 -mattr=+a,+experimental-ztso -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefixes=RV32IA,RV32IA-TSO %s
+; RUN:   | FileCheck -check-prefixes=RV32IA,RV32IA-NOZACAS,RV32IA-TSO,RV32IA-TSO-NOZACAS %s
 ; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=RV64I %s
 ; RUN: llc -mtriple=riscv64 -mattr=+a -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefixes=RV64IA,RV64IA-WMO %s
+; RUN:   | FileCheck -check-prefixes=RV64IA,RV64IA-NOZACAS,RV64IA-WMO,RV64IA-WMO-NOZACAS %s
 ; RUN: llc -mtriple=riscv64 -mattr=+a,+experimental-ztso -verify-machineinstrs < %s \
-; RUN:   | FileCheck -check-prefixes=RV64IA,RV64IA-TSO %s
+; RUN:   | FileCheck -check-prefixes=RV64IA,RV64IA-NOZACAS,RV64IA-TSO,RV64IA-TSO-NOZACAS %s
+
+; RUN: llc -mtriple=riscv32 -mattr=+a,+experimental-zacas -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32IA,RV32IA-ZACAS,RV32IA-WMO,RV32IA-WMO-ZACAS %s
+; RUN: llc -mtriple=riscv32 -mattr=+a,+experimental-ztso,+experimental-zacas -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32IA,RV32IA-ZACAS,RV32IA-TSO,RV32IA-TSO-ZACAS %s
+; RUN: llc -mtriple=riscv64 -mattr=+a,+experimental-zacas -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64IA,RV64IA-ZACAS,RV64IA-WMO,RV64IA-WMO-ZACAS %s
+; RUN: llc -mtriple=riscv64 -mattr=+a,+experimental-ztso,+experimental-zacas -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64IA,RV64IA-ZACAS,RV64IA-TSO,RV64IA-TSO-ZACAS %s
 
 define i8 @atomicrmw_xchg_i8_monotonic(ptr %a, i8 %b) nounwind {
 ; RV32I-LABEL: atomicrmw_xchg_i8_monotonic:
@@ -14831,17 +14840,17 @@ define i32 @atomicrmw_nand_i32_monotonic(ptr %a, i32 %b) nounwind {
 ; RV32I-NEXT:    addi sp, sp, 16
 ; RV32I-NEXT:    ret
 ;
-; RV32IA-LABEL: atomicrmw_nand_i32_monotonic:
-; RV32IA:       # %bb.0:
-; RV32IA-NEXT:  .LBB150_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-NEXT:    lr.w a2, (a0)
-; RV32IA-NEXT:    and a3, a2, a1
-; RV32IA-NEXT:    not a3, a3
-; RV32IA-NEXT:    sc.w a3, a3, (a0)
-; RV32IA-NEXT:    bnez a3, .LBB150_1
-; RV32IA-NEXT:  # %bb.2:
-; RV32IA-NEXT:    mv a0, a2
-; RV32IA-NEXT:    ret
+; RV32IA-NOZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV32IA-NOZACAS:       # %bb.0:
+; RV32IA-NOZACAS-NEXT:  .LBB150_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV32IA-NOZACAS-NEXT:    and a3, a2, a1
+; RV32IA-NOZACAS-NEXT:    not a3, a3
+; RV32IA-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV32IA-NOZACAS-NEXT:    bnez a3, .LBB150_1
+; RV32IA-NOZACAS-NEXT:  # %bb.2:
+; RV32IA-NOZACAS-NEXT:    mv a0, a2
+; RV32IA-NOZACAS-NEXT:    ret
 ;
 ; RV64I-LABEL: atomicrmw_nand_i32_monotonic:
 ; RV64I:       # %bb.0:
@@ -14853,17 +14862,45 @@ define i32 @atomicrmw_nand_i32_monotonic(ptr %a, i32 %b) nounwind {
 ; RV64I-NEXT:    addi sp, sp, 16
 ; RV64I-NEXT:    ret
 ;
-; RV64IA-LABEL: atomicrmw_nand_i32_monotonic:
-; RV64IA:       # %bb.0:
-; RV64IA-NEXT:  .LBB150_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-NEXT:    lr.w a2, (a0)
-; RV64IA-NEXT:    and a3, a2, a1
-; RV64IA-NEXT:    not a3, a3
-; RV64IA-NEXT:    sc.w a3, a3, (a0)
-; RV64IA-NEXT:    bnez a3, .LBB150_1
-; RV64IA-NEXT:  # %bb.2:
-; RV64IA-NEXT:    mv a0, a2
-; RV64IA-NEXT:    ret
+; RV64IA-NOZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV64IA-NOZACAS:       # %bb.0:
+; RV64IA-NOZACAS-NEXT:  .LBB150_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV64IA-NOZACAS-NEXT:    and a3, a2, a1
+; RV64IA-NOZACAS-NEXT:    not a3, a3
+; RV64IA-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV64IA-NOZACAS-NEXT:    bnez a3, .LBB150_1
+; RV64IA-NOZACAS-NEXT:  # %bb.2:
+; RV64IA-NOZACAS-NEXT:    mv a0, a2
+; RV64IA-NOZACAS-NEXT:    ret
+;
+; RV32IA-ZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV32IA-ZACAS:       # %bb.0:
+; RV32IA-ZACAS-NEXT:    mv a2, a0
+; RV32IA-ZACAS-NEXT:    lw a0, 0(a0)
+; RV32IA-ZACAS-NEXT:  .LBB150_1: # %atomicrmw.start
+; RV32IA-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32IA-ZACAS-NEXT:    mv a3, a0
+; RV32IA-ZACAS-NEXT:    and a4, a0, a1
+; RV32IA-ZACAS-NEXT:    not a4, a4
+; RV32IA-ZACAS-NEXT:    amocas.w a0, a4, (a2)
+; RV32IA-ZACAS-NEXT:    bne a0, a3, .LBB150_1
+; RV32IA-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV32IA-ZACAS-NEXT:    ret
+;
+; RV64IA-ZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV64IA-ZACAS:       # %bb.0:
+; RV64IA-ZACAS-NEXT:    mv a2, a0
+; RV64IA-ZACAS-NEXT:    lw a0, 0(a0)
+; RV64IA-ZACAS-NEXT:  .LBB150_1: # %atomicrmw.start
+; RV64IA-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64IA-ZACAS-NEXT:    mv a3, a0
+; RV64IA-ZACAS-NEXT:    and a4, a0, a1
+; RV64IA-ZACAS-NEXT:    not a4, a4
+; RV64IA-ZACAS-NEXT:    amocas.w a0, a4, (a2)
+; RV64IA-ZACAS-NEXT:    bne a0, a3, .LBB150_1
+; RV64IA-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV64IA-ZACAS-NEXT:    ret
   %1 = atomicrmw nand ptr %a, i32 %b monotonic
   ret i32 %1
 }
@@ -14879,29 +14916,29 @@ define i32 @atomicrmw_nand_i32_acquire(ptr %a, i32 %b) nounwind {
 ; RV32I-NEXT:    addi sp, sp, 16
 ; RV32I-NEXT:    ret
 ;
-; RV32IA-WMO-LABEL: atomicrmw_nand_i32_acquire:
-; RV32IA-WMO:       # %bb.0:
-; RV32IA-WMO-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-WMO-NEXT:    lr.w.aq a2, (a0)
-; RV32IA-WMO-NEXT:    and a3, a2, a1
-; RV32IA-WMO-NEXT:    not a3, a3
-; RV32IA-WMO-NEXT:    sc.w a3, a3, (a0)
-; RV32IA-WMO-NEXT:    bnez a3, .LBB151_1
-; RV32IA-WMO-NEXT:  # %bb.2:
-; RV32IA-WMO-NEXT:    mv a0, a2
-; RV32IA-WMO-NEXT:    ret
-;
-; RV32IA-TSO-LABEL: atomicrmw_nand_i32_acquire:
-; RV32IA-TSO:       # %bb.0:
-; RV32IA-TSO-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-TSO-NEXT:    lr.w a2, (a0)
-; RV32IA-TSO-NEXT:    and a3, a2, a1
-; RV32IA-TSO-NEXT:    not a3, a3
-; RV32IA-TSO-NEXT:    sc.w a3, a3, (a0)
-; RV32IA-TSO-NEXT:    bnez a3, .LBB151_1
-; RV32IA-TSO-NEXT:  # %bb.2:
-; RV32IA-TSO-NEXT:    mv a0, a2
-; RV32IA-TSO-NEXT:    ret
+; RV32IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-WMO-NOZACAS:       # %bb.0:
+; RV32IA-WMO-NOZACAS-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-NOZACAS-NEXT:    lr.w.aq a2, (a0)
+; RV32IA-WMO-NOZACAS-NEXT:    and a3, a2, a1
+; RV32IA-WMO-NOZACAS-NEXT:    not a3, a3
+; RV32IA-WMO-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV32IA-WMO-NOZACAS-NEXT:    bnez a3, .LBB151_1
+; RV32IA-WMO-NOZACAS-NEXT:  # %bb.2:
+; RV32IA-WMO-NOZACAS-NEXT:    mv a0, a2
+; RV32IA-WMO-NOZACAS-NEXT:    ret
+;
+; RV32IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-TSO-NOZACAS:       # %bb.0:
+; RV32IA-TSO-NOZACAS-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV32IA-TSO-NOZACAS-NEXT:    and a3, a2, a1
+; RV32IA-TSO-NOZACAS-NEXT:    not a3, a3
+; RV32IA-TSO-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV32IA-TSO-NOZACAS-NEXT:    bnez a3, .LBB151_1
+; RV32IA-TSO-NOZACAS-NEXT:  # %bb.2:
+; RV32IA-TSO-NOZACAS-NEXT:    mv a0, a2
+; RV32IA-TSO-NOZACAS-NEXT:    ret
 ;
 ; RV64I-LABEL: atomicrmw_nand_i32_acquire:
 ; RV64I:       # %bb.0:
@@ -14913,29 +14950,85 @@ define i32 @atomicrmw_nand_i32_acquire(ptr %a, i32 %b) nounwind {
 ; RV64I-NEXT:    addi sp, sp, 16
 ; RV64I-NEXT:    ret
 ;
-; RV64IA-WMO-LABEL: atomicrmw_nand_i32_acquire:
-; RV64IA-WMO:       # %bb.0:
-; RV64IA-WMO-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-WMO-NEXT:    lr.w.aq a2, (a0)
-; RV64IA-WMO-NEXT:    and a3, a2, a1
-; RV64IA-WMO-NEXT:    not a3, a3
-; RV64IA-WMO-NEXT:    sc.w a3, a3, (a0)
-; RV64IA-WMO-NEXT:    bnez a3, .LBB151_1
-; RV64IA-WMO-NEXT:  # %bb.2:
-; RV64IA-WMO-NEXT:    mv a0, a2
-; RV64IA-WMO-NEXT:    ret
-;
-; RV64IA-TSO-LABEL: atomicrmw_nand_i32_acquire:
-; RV64IA-TSO:       # %bb.0:
-; RV64IA-TSO-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-TSO-NEXT:    lr.w a2, (a0)
-; RV64IA-TSO-NEXT:    and a3, a2, a1
-; RV64IA-TSO-NEXT:    not a3, a3
-; RV64IA-TSO-NEXT:    sc.w a3, a3, (a0)
-; RV64IA-TSO-NEXT:    bnez a3, .LBB151_1
-; RV64IA-TSO-NEXT:  # %bb.2:
-; RV64IA-TSO-NEXT:    mv a0, a2
-; RV64IA-TSO-NEXT:    ret
+; RV64IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-WMO-NOZACAS:       # %bb.0:
+; RV64IA-WMO-NOZACAS-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-NOZACAS-NEXT:    lr.w.aq a2, (a0)
+; RV64IA-WMO-NOZACAS-NEXT:    and a3, a2, a1
+; RV64IA-WMO-NOZACAS-NEXT:    not a3, a3
+; RV64IA-WMO-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV64IA-WMO-NOZACAS-NEXT:    bnez a3, .LBB151_1
+; RV64IA-WMO-NOZACAS-NEXT:  # %bb.2:
+; RV64IA-WMO-NOZACAS-NEXT:    mv a0, a2
+; RV64IA-WMO-NOZACAS-NEXT:    ret
+;
+; RV64IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-TSO-NOZACAS:       # %bb.0:
+; RV64IA-TSO-NOZACAS-NEXT:  .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV64IA-TSO-NOZACAS-NEXT:    and a3, a2, a1
+; RV64IA-TSO-NOZACAS-NEXT:    not a3, a3
+; RV64IA-TSO-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV64IA-TSO-NOZACAS-NEXT:    bnez a3, .LBB151_1
+; RV64IA-TSO-NOZACAS-NEXT:  # %bb.2:
+; RV64IA-TSO-NOZACAS-NEXT:    mv a0, a2
+; RV64IA-TSO-NOZACAS-NEXT:    ret
+;
+; RV32IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-WMO-ZACAS:       # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT:    mv a2, a0
+; RV32IA-WMO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV32IA-WMO-ZACAS-NEXT:  .LBB151_1: # %atomicrmw.start
+; RV32IA-WMO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-ZACAS-NEXT:    mv a3, a0
+; RV32IA-WMO-ZACAS-NEXT:    and a4, a0, a1
+; RV32IA-WMO-ZACAS-NEXT:    not a4, a4
+; RV32IA-WMO-ZACAS-NEXT:    amocas.w.aq a0, a4, (a2)
+; RV32IA-WMO-ZACAS-NEXT:    bne a0, a3, .LBB151_1
+; RV32IA-WMO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV32IA-WMO-ZACAS-NEXT:    ret
+;
+; RV32IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-TSO-ZACAS:       # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT:    mv a2, a0
+; RV32IA-TSO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV32IA-TSO-ZACAS-NEXT:  .LBB151_1: # %atomicrmw.start
+; RV32IA-TSO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-ZACAS-NEXT:    mv a3, a0
+; RV32IA-TSO-ZACAS-NEXT:    and a4, a0, a1
+; RV32IA-TSO-ZACAS-NEXT:    not a4, a4
+; RV32IA-TSO-ZACAS-NEXT:    amocas.w a0, a4, (a2)
+; RV32IA-TSO-ZACAS-NEXT:    bne a0, a3, .LBB151_1
+; RV32IA-TSO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV32IA-TSO-ZACAS-NEXT:    ret
+;
+; RV64IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-WMO-ZACAS:       # %bb.0:
+; RV64IA-WMO-ZACAS-NEXT:    mv a2, a0
+; RV64IA-WMO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV64IA-WMO-ZACAS-NEXT:  .LBB151_1: # %atomicrmw.start
+; RV64IA-WMO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-ZACAS-NEXT:    mv a3, a0
+; RV64IA-WMO-ZACAS-NEXT:    and a4, a0, a1
+; RV64IA-WMO-ZACAS-NEXT:    not a4, a4
+; RV64IA-WMO-ZACAS-NEXT:    amocas.w.aq a0, a4, (a2)
+; RV64IA-WMO-ZACAS-NEXT:    bne a0, a3, .LBB151_1
+; RV64IA-WMO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV64IA-WMO-ZACAS-NEXT:    ret
+;
+; RV64IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-TSO-ZACAS:       # %bb.0:
+; RV64IA-TSO-ZACAS-NEXT:    mv a2, a0
+; RV64IA-TSO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV64IA-TSO-ZACAS-NEXT:  .LBB151_1: # %atomicrmw.start
+; RV64IA-TSO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-ZACAS-NEXT:    mv a3, a0
+; RV64IA-TSO-ZACAS-NEXT:    and a4, a0, a1
+; RV64IA-TSO-ZACAS-NEXT:    not a4, a4
+; RV64IA-TSO-ZACAS-NEXT:    amocas.w a0, a4, (a2)
+; RV64IA-TSO-ZACAS-NEXT:    bne a0, a3, .LBB151_1
+; RV64IA-TSO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV64IA-TSO-ZACAS-NEXT:    ret
   %1 = atomicrmw nand ptr %a, i32 %b acquire
   ret i32 %1
 }
@@ -14951,29 +15044,29 @@ define i32 @atomicrmw_nand_i32_release(ptr %a, i32 %b) nounwind {
 ; RV32I-NEXT:    addi sp, sp, 16
 ; RV32I-NEXT:    ret
 ;
-; RV32IA-WMO-LABEL: atomicrmw_nand_i32_release:
-; RV32IA-WMO:       # %bb.0:
-; RV32IA-WMO-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-WMO-NEXT:    lr.w a2, (a0)
-; RV32IA-WMO-NEXT:    and a3, a2, a1
-; RV32IA-WMO-NEXT:    not a3, a3
-; RV32IA-WMO-NEXT:    sc.w.rl a3, a3, (a0)
-; RV32IA-WMO-NEXT:    bnez a3, .LBB152_1
-; RV32IA-WMO-NEXT:  # %bb.2:
-; RV32IA-WMO-NEXT:    mv a0, a2
-; RV32IA-WMO-NEXT:    ret
-;
-; RV32IA-TSO-LABEL: atomicrmw_nand_i32_release:
-; RV32IA-TSO:       # %bb.0:
-; RV32IA-TSO-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-TSO-NEXT:    lr.w a2, (a0)
-; RV32IA-TSO-NEXT:    and a3, a2, a1
-; RV32IA-TSO-NEXT:    not a3, a3
-; RV32IA-TSO-NEXT:    sc.w a3, a3, (a0)
-; RV32IA-TSO-NEXT:    bnez a3, .LBB152_1
-; RV32IA-TSO-NEXT:  # %bb.2:
-; RV32IA-TSO-NEXT:    mv a0, a2
-; RV32IA-TSO-NEXT:    ret
+; RV32IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-WMO-NOZACAS:       # %bb.0:
+; RV32IA-WMO-NOZACAS-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV32IA-WMO-NOZACAS-NEXT:    and a3, a2, a1
+; RV32IA-WMO-NOZACAS-NEXT:    not a3, a3
+; RV32IA-WMO-NOZACAS-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IA-WMO-NOZACAS-NEXT:    bnez a3, .LBB152_1
+; RV32IA-WMO-NOZACAS-NEXT:  # %bb.2:
+; RV32IA-WMO-NOZACAS-NEXT:    mv a0, a2
+; RV32IA-WMO-NOZACAS-NEXT:    ret
+;
+; RV32IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-TSO-NOZACAS:       # %bb.0:
+; RV32IA-TSO-NOZACAS-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV32IA-TSO-NOZACAS-NEXT:    and a3, a2, a1
+; RV32IA-TSO-NOZACAS-NEXT:    not a3, a3
+; RV32IA-TSO-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV32IA-TSO-NOZACAS-NEXT:    bnez a3, .LBB152_1
+; RV32IA-TSO-NOZACAS-NEXT:  # %bb.2:
+; RV32IA-TSO-NOZACAS-NEXT:    mv a0, a2
+; RV32IA-TSO-NOZACAS-NEXT:    ret
 ;
 ; RV64I-LABEL: atomicrmw_nand_i32_release:
 ; RV64I:       # %bb.0:
@@ -14985,29 +15078,85 @@ define i32 @atomicrmw_nand_i32_release(ptr %a, i32 %b) nounwind {
 ; RV64I-NEXT:    addi sp, sp, 16
 ; RV64I-NEXT:    ret
 ;
-; RV64IA-WMO-LABEL: atomicrmw_nand_i32_release:
-; RV64IA-WMO:       # %bb.0:
-; RV64IA-WMO-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-WMO-NEXT:    lr.w a2, (a0)
-; RV64IA-WMO-NEXT:    and a3, a2, a1
-; RV64IA-WMO-NEXT:    not a3, a3
-; RV64IA-WMO-NEXT:    sc.w.rl a3, a3, (a0)
-; RV64IA-WMO-NEXT:    bnez a3, .LBB152_1
-; RV64IA-WMO-NEXT:  # %bb.2:
-; RV64IA-WMO-NEXT:    mv a0, a2
-; RV64IA-WMO-NEXT:    ret
-;
-; RV64IA-TSO-LABEL: atomicrmw_nand_i32_release:
-; RV64IA-TSO:       # %bb.0:
-; RV64IA-TSO-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-TSO-NEXT:    lr.w a2, (a0)
-; RV64IA-TSO-NEXT:    and a3, a2, a1
-; RV64IA-TSO-NEXT:    not a3, a3
-; RV64IA-TSO-NEXT:    sc.w a3, a3, (a0)
-; RV64IA-TSO-NEXT:    bnez a3, .LBB152_1
-; RV64IA-TSO-NEXT:  # %bb.2:
-; RV64IA-TSO-NEXT:    mv a0, a2
-; RV64IA-TSO-NEXT:    ret
+; RV64IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-WMO-NOZACAS:       # %bb.0:
+; RV64IA-WMO-NOZACAS-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV64IA-WMO-NOZACAS-NEXT:    and a3, a2, a1
+; RV64IA-WMO-NOZACAS-NEXT:    not a3, a3
+; RV64IA-WMO-NOZACAS-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IA-WMO-NOZACAS-NEXT:    bnez a3, .LBB152_1
+; RV64IA-WMO-NOZACAS-NEXT:  # %bb.2:
+; RV64IA-WMO-NOZACAS-NEXT:    mv a0, a2
+; RV64IA-WMO-NOZACAS-NEXT:    ret
+;
+; RV64IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-TSO-NOZACAS:       # %bb.0:
+; RV64IA-TSO-NOZACAS-NEXT:  .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-NOZACAS-NEXT:    lr.w a2, (a0)
+; RV64IA-TSO-NOZACAS-NEXT:    and a3, a2, a1
+; RV64IA-TSO-NOZACAS-NEXT:    not a3, a3
+; RV64IA-TSO-NOZACAS-NEXT:    sc.w a3, a3, (a0)
+; RV64IA-TSO-NOZACAS-NEXT:    bnez a3, .LBB152_1
+; RV64IA-TSO-NOZACAS-NEXT:  # %bb.2:
+; RV64IA-TSO-NOZACAS-NEXT:    mv a0, a2
+; RV64IA-TSO-NOZACAS-NEXT:    ret
+;
+; RV32IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-WMO-ZACAS:       # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT:    mv a2, a0
+; RV32IA-WMO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV32IA-WMO-ZACAS-NEXT:  .LBB152_1: # %atomicrmw.start
+; RV32IA-WMO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-ZACAS-NEXT:    mv a3, a0
+; RV32IA-WMO-ZACAS-NEXT:    and a4, a0, a1
+; RV32IA-WMO-ZACAS-NEXT:    not a4, a4
+; RV32IA-WMO-ZACAS-NEXT:    amocas.w.rl a0, a4, (a2)
+; RV32IA-WMO-ZACAS-NEXT:    bne a0, a3, .LBB152_1
+; RV32IA-WMO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV32IA-WMO-ZACAS-NEXT:    ret
+;
+; RV32IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-TSO-ZACAS:       # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT:    mv a2, a0
+; RV32IA-TSO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV32IA-TSO-ZACAS-NEXT:  .LBB152_1: # %atomicrmw.start
+; RV32IA-TSO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-ZACAS-NEXT:    mv a3, a0
+; RV32IA-TSO-ZACAS-NEXT:    and a4, a0, a1
+; RV32IA-TSO-ZACAS-NEXT:    not a4, a4
+; RV32IA-TSO-ZACAS-NEXT:    amocas.w a0, a4, (a2)
+; RV32IA-TSO-ZACAS-NEXT:    bne a0, a3, .LBB152_1
+; RV32IA-TSO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV32IA-TSO-ZACAS-NEXT:    ret
+;
+; RV64IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-WMO-ZACAS:       # %bb.0:
+; RV64IA-WMO-ZACAS-NEXT:    mv a2, a0
+; RV64IA-WMO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV64IA-WMO-ZACAS-NEXT:  .LBB152_1: # %atomicrmw.start
+; RV64IA-WMO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-ZACAS-NEXT:    mv a3, a0
+; RV64IA-WMO-ZACAS-NEXT:    and a4, a0, a1
+; RV64IA-WMO-ZACAS-NEXT:    not a4, a4
+; RV64IA-WMO-ZACAS-NEXT:    amocas.w.rl a0, a4, (a2)
+; RV64IA-WMO-ZACAS-NEXT:    bne a0, a3, .LBB152_1
+; RV64IA-WMO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV64IA-WMO-ZACAS-NEXT:    ret
+;
+; RV64IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-TSO-ZACAS:       # %bb.0:
+; RV64IA-TSO-ZACAS-NEXT:    mv a2, a0
+; RV64IA-TSO-ZACAS-NEXT:    lw a0, 0(a0)
+; RV64IA-TSO-ZACAS-NEXT:  .LBB152_1: # %atomicrmw.start
+; RV64IA-TSO-ZACAS-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-ZACAS-NEXT:    mv a3, a0
+; RV64IA-TSO-ZACAS-NEXT:    and a4, a0, a1
+; RV64IA-TSO-ZACAS-NEXT:    not a4, a4
+; RV64IA-TSO-ZACAS-NEXT:    amocas.w a0, a4, (a2)
+; RV64IA-TSO-ZACAS-NEXT:    bne a0, a3, .LBB152_1
+; RV64IA-TSO-ZACAS-NEXT:  # %bb.2: # %atomicrmw.end
+; RV64IA-TSO-ZACAS-NEXT:    ret
   %1 = atomicrmw nand ptr %a, i32 %b release
   ret i32 %1
 }
@@ -15023,29 +15172,29 @@ define i32 @atomicrmw_nand_i32_acq_rel(ptr %a, i32 %b) nounwind {
 ; RV32I-NEXT:    addi sp, sp, 16
 ; RV32I-NEXT:    ret
 ;
-; RV32IA-WMO-LABEL: atomicrmw_nand_i32_acq_rel:
-; RV32IA-WMO:       # %bb.0:
-; RV32IA-WMO-NEXT:  .LBB153_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-WMO-NEXT:    lr.w.aq a2, (a0)
-; RV32IA-WMO-NEXT:    and a3, a2, a1
-; RV32IA-WMO-NEXT:    not a3, a3
-; RV32IA-WMO-NEXT:    sc.w.rl a3, a3, (a0)
-; RV32IA-WMO-NEXT:    bnez a3, .LBB153_1
-; RV32IA-WMO-NEXT:  # %bb.2:
-; RV32IA-WMO-NEXT:    mv a0, a2
-; RV32IA-WMO-NEXT:    ret
-;
-; RV32IA-TSO-LABEL: atomicrmw_nand_i32_acq_rel:
-; RV32IA-TSO:       # %bb.0:
-; RV32IA-TSO-NEXT:  .LBB153_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-TSO-NEXT:    lr.w a2, (a0)
-; RV32IA-TSO-NEXT:    and a3, a2, a1
-; RV32IA-TSO-NEXT:    not a3, a3
-; RV32IA-TSO-NEXT:    sc.w a3, a3, (a0)
-; RV32IA-T...
[truncated]

@@ -19517,6 +19517,11 @@ RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
unsigned Size = AI->getType()->getPrimitiveSizeInBits();
if (Size == 8 || Size == 16)
return AtomicExpansionKind::MaskedIntrinsic;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to use amocas for i8/i16 if both zacas and zabha are supported.

Copy link
Contributor

@asb asb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using amocas in preference to lr/sc when possible seems very sensible to me. LGTM.

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me as well.

@topperc topperc merged commit cf401f7 into llvm:main Jan 31, 2024
4 checks passed
@topperc topperc deleted the pr/zacas-nand branch January 31, 2024 23:37
carlosgalvezp pushed a commit to carlosgalvezp/llvm-project that referenced this pull request Feb 1, 2024
We don't have an AMO instruction for Nand, so with the A extension we
use an LR/SC loop. If we have Zacas we can use a CAS loop instead.

According to the Zacas spec, a CAS loop scales to highly parallel
systems better than LR/SC.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants