[RISCV] Use Zacas for AtomicRMWInst::Nand i32 and XLen. #80119
We don't have an AMO instruction for Nand, so with the A extension we use an LR/SC loop. If we have Zacas we can use a CAS loop instead. According to the Zacas spec, a CAS loop scales to highly parallel systems better than LR/SC. Open to opinions on whether this is the right thing to do.
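For readers less familiar with the pattern, here is a minimal source-level sketch of what a CAS-loop NAND looks like. It uses standard C++ `std::atomic` rather than anything from the patch; the function name `fetch_nand` is made up for illustration, and whether a given compiler lowers the compare-exchange to `amocas.w` depends on the target features enabled.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative helper (not part of the patch): atomically replaces *obj with
// ~(*obj & mask) and returns the previous value, using a compare-exchange loop.
inline std::uint32_t fetch_nand(std::atomic<std::uint32_t> &obj,
                                std::uint32_t mask,
                                std::memory_order order =
                                    std::memory_order_seq_cst) {
  // Plain load of the current value before entering the loop.
  std::uint32_t expected = obj.load(std::memory_order_relaxed);
  std::uint32_t desired;
  do {
    // Recompute the NAND from the most recently observed value.
    desired = ~(expected & mask);
    // On failure, compare_exchange_weak writes the value it observed into
    // `expected`, so the next iteration starts from fresh data.
  } while (!obj.compare_exchange_weak(expected, desired, order,
                                      std::memory_order_relaxed));
  return expected;
}
```

The shape mirrors the Zacas check lines in the test diff: an initial load, then a loop that computes `and`/`not`, attempts the swap, and retries if the observed value differs from the expected one.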
@llvm/pr-subscribers-backend-risc-v
Author: Craig Topper (topperc)
Patch is 43.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80119.diff
2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 2f84dff6bc626..45a66a1e4d2f6 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -19517,6 +19517,11 @@ RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
   unsigned Size = AI->getType()->getPrimitiveSizeInBits();
   if (Size == 8 || Size == 16)
     return AtomicExpansionKind::MaskedIntrinsic;
+
+  if (Subtarget.hasStdExtZacas() && AI->getOperation() == AtomicRMWInst::Nand &&
+      (Size == Subtarget.getXLen() || Size == 32))
+    return AtomicExpansionKind::CmpXChg;
+
   return AtomicExpansionKind::None;
 }
diff --git a/llvm/test/CodeGen/RISCV/atomic-rmw.ll b/llvm/test/CodeGen/RISCV/atomic-rmw.ll
index d4c067b7b8a40..62206a6a2ece5 100644
--- a/llvm/test/CodeGen/RISCV/atomic-rmw.ll
+++ b/llvm/test/CodeGen/RISCV/atomic-rmw.ll
@@ -2,15 +2,24 @@
; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefix=RV32I %s
; RUN: llc -mtriple=riscv32 -mattr=+a -verify-machineinstrs < %s \
-; RUN: | FileCheck -check-prefixes=RV32IA,RV32IA-WMO %s
+; RUN: | FileCheck -check-prefixes=RV32IA,RV32IA-NOZACAS,RV32IA-WMO,RV32IA-WMO-NOZACAS %s
; RUN: llc -mtriple=riscv32 -mattr=+a,+experimental-ztso -verify-machineinstrs < %s \
-; RUN: | FileCheck -check-prefixes=RV32IA,RV32IA-TSO %s
+; RUN: | FileCheck -check-prefixes=RV32IA,RV32IA-NOZACAS,RV32IA-TSO,RV32IA-TSO-NOZACAS %s
; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefix=RV64I %s
; RUN: llc -mtriple=riscv64 -mattr=+a -verify-machineinstrs < %s \
-; RUN: | FileCheck -check-prefixes=RV64IA,RV64IA-WMO %s
+; RUN: | FileCheck -check-prefixes=RV64IA,RV64IA-NOZACAS,RV64IA-WMO,RV64IA-WMO-NOZACAS %s
; RUN: llc -mtriple=riscv64 -mattr=+a,+experimental-ztso -verify-machineinstrs < %s \
-; RUN: | FileCheck -check-prefixes=RV64IA,RV64IA-TSO %s
+; RUN: | FileCheck -check-prefixes=RV64IA,RV64IA-NOZACAS,RV64IA-TSO,RV64IA-TSO-NOZACAS %s
+
+; RUN: llc -mtriple=riscv32 -mattr=+a,+experimental-zacas -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=RV32IA,RV32IA-ZACAS,RV32IA-WMO,RV32IA-WMO-ZACAS %s
+; RUN: llc -mtriple=riscv32 -mattr=+a,+experimental-ztso,+experimental-zacas -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=RV32IA,RV32IA-ZACAS,RV32IA-TSO,RV32IA-TSO-ZACAS %s
+; RUN: llc -mtriple=riscv64 -mattr=+a,+experimental-zacas -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=RV64IA,RV64IA-ZACAS,RV64IA-WMO,RV64IA-WMO-ZACAS %s
+; RUN: llc -mtriple=riscv64 -mattr=+a,+experimental-ztso,+experimental-zacas -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=RV64IA,RV64IA-ZACAS,RV64IA-TSO,RV64IA-TSO-ZACAS %s
define i8 @atomicrmw_xchg_i8_monotonic(ptr %a, i8 %b) nounwind {
; RV32I-LABEL: atomicrmw_xchg_i8_monotonic:
@@ -14831,17 +14840,17 @@ define i32 @atomicrmw_nand_i32_monotonic(ptr %a, i32 %b) nounwind {
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-LABEL: atomicrmw_nand_i32_monotonic:
-; RV32IA: # %bb.0:
-; RV32IA-NEXT: .LBB150_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-NEXT: lr.w a2, (a0)
-; RV32IA-NEXT: and a3, a2, a1
-; RV32IA-NEXT: not a3, a3
-; RV32IA-NEXT: sc.w a3, a3, (a0)
-; RV32IA-NEXT: bnez a3, .LBB150_1
-; RV32IA-NEXT: # %bb.2:
-; RV32IA-NEXT: mv a0, a2
-; RV32IA-NEXT: ret
+; RV32IA-NOZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV32IA-NOZACAS: # %bb.0:
+; RV32IA-NOZACAS-NEXT: .LBB150_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-NOZACAS-NEXT: lr.w a2, (a0)
+; RV32IA-NOZACAS-NEXT: and a3, a2, a1
+; RV32IA-NOZACAS-NEXT: not a3, a3
+; RV32IA-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV32IA-NOZACAS-NEXT: bnez a3, .LBB150_1
+; RV32IA-NOZACAS-NEXT: # %bb.2:
+; RV32IA-NOZACAS-NEXT: mv a0, a2
+; RV32IA-NOZACAS-NEXT: ret
;
; RV64I-LABEL: atomicrmw_nand_i32_monotonic:
; RV64I: # %bb.0:
@@ -14853,17 +14862,45 @@ define i32 @atomicrmw_nand_i32_monotonic(ptr %a, i32 %b) nounwind {
; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret
;
-; RV64IA-LABEL: atomicrmw_nand_i32_monotonic:
-; RV64IA: # %bb.0:
-; RV64IA-NEXT: .LBB150_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-NEXT: lr.w a2, (a0)
-; RV64IA-NEXT: and a3, a2, a1
-; RV64IA-NEXT: not a3, a3
-; RV64IA-NEXT: sc.w a3, a3, (a0)
-; RV64IA-NEXT: bnez a3, .LBB150_1
-; RV64IA-NEXT: # %bb.2:
-; RV64IA-NEXT: mv a0, a2
-; RV64IA-NEXT: ret
+; RV64IA-NOZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV64IA-NOZACAS: # %bb.0:
+; RV64IA-NOZACAS-NEXT: .LBB150_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-NOZACAS-NEXT: lr.w a2, (a0)
+; RV64IA-NOZACAS-NEXT: and a3, a2, a1
+; RV64IA-NOZACAS-NEXT: not a3, a3
+; RV64IA-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV64IA-NOZACAS-NEXT: bnez a3, .LBB150_1
+; RV64IA-NOZACAS-NEXT: # %bb.2:
+; RV64IA-NOZACAS-NEXT: mv a0, a2
+; RV64IA-NOZACAS-NEXT: ret
+;
+; RV32IA-ZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV32IA-ZACAS: # %bb.0:
+; RV32IA-ZACAS-NEXT: mv a2, a0
+; RV32IA-ZACAS-NEXT: lw a0, 0(a0)
+; RV32IA-ZACAS-NEXT: .LBB150_1: # %atomicrmw.start
+; RV32IA-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32IA-ZACAS-NEXT: mv a3, a0
+; RV32IA-ZACAS-NEXT: and a4, a0, a1
+; RV32IA-ZACAS-NEXT: not a4, a4
+; RV32IA-ZACAS-NEXT: amocas.w a0, a4, (a2)
+; RV32IA-ZACAS-NEXT: bne a0, a3, .LBB150_1
+; RV32IA-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV32IA-ZACAS-NEXT: ret
+;
+; RV64IA-ZACAS-LABEL: atomicrmw_nand_i32_monotonic:
+; RV64IA-ZACAS: # %bb.0:
+; RV64IA-ZACAS-NEXT: mv a2, a0
+; RV64IA-ZACAS-NEXT: lw a0, 0(a0)
+; RV64IA-ZACAS-NEXT: .LBB150_1: # %atomicrmw.start
+; RV64IA-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV64IA-ZACAS-NEXT: mv a3, a0
+; RV64IA-ZACAS-NEXT: and a4, a0, a1
+; RV64IA-ZACAS-NEXT: not a4, a4
+; RV64IA-ZACAS-NEXT: amocas.w a0, a4, (a2)
+; RV64IA-ZACAS-NEXT: bne a0, a3, .LBB150_1
+; RV64IA-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV64IA-ZACAS-NEXT: ret
%1 = atomicrmw nand ptr %a, i32 %b monotonic
ret i32 %1
}
@@ -14879,29 +14916,29 @@ define i32 @atomicrmw_nand_i32_acquire(ptr %a, i32 %b) nounwind {
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-WMO-LABEL: atomicrmw_nand_i32_acquire:
-; RV32IA-WMO: # %bb.0:
-; RV32IA-WMO-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-WMO-NEXT: lr.w.aq a2, (a0)
-; RV32IA-WMO-NEXT: and a3, a2, a1
-; RV32IA-WMO-NEXT: not a3, a3
-; RV32IA-WMO-NEXT: sc.w a3, a3, (a0)
-; RV32IA-WMO-NEXT: bnez a3, .LBB151_1
-; RV32IA-WMO-NEXT: # %bb.2:
-; RV32IA-WMO-NEXT: mv a0, a2
-; RV32IA-WMO-NEXT: ret
-;
-; RV32IA-TSO-LABEL: atomicrmw_nand_i32_acquire:
-; RV32IA-TSO: # %bb.0:
-; RV32IA-TSO-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-TSO-NEXT: lr.w a2, (a0)
-; RV32IA-TSO-NEXT: and a3, a2, a1
-; RV32IA-TSO-NEXT: not a3, a3
-; RV32IA-TSO-NEXT: sc.w a3, a3, (a0)
-; RV32IA-TSO-NEXT: bnez a3, .LBB151_1
-; RV32IA-TSO-NEXT: # %bb.2:
-; RV32IA-TSO-NEXT: mv a0, a2
-; RV32IA-TSO-NEXT: ret
+; RV32IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-WMO-NOZACAS: # %bb.0:
+; RV32IA-WMO-NOZACAS-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-NOZACAS-NEXT: lr.w.aq a2, (a0)
+; RV32IA-WMO-NOZACAS-NEXT: and a3, a2, a1
+; RV32IA-WMO-NOZACAS-NEXT: not a3, a3
+; RV32IA-WMO-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV32IA-WMO-NOZACAS-NEXT: bnez a3, .LBB151_1
+; RV32IA-WMO-NOZACAS-NEXT: # %bb.2:
+; RV32IA-WMO-NOZACAS-NEXT: mv a0, a2
+; RV32IA-WMO-NOZACAS-NEXT: ret
+;
+; RV32IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-TSO-NOZACAS: # %bb.0:
+; RV32IA-TSO-NOZACAS-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-NOZACAS-NEXT: lr.w a2, (a0)
+; RV32IA-TSO-NOZACAS-NEXT: and a3, a2, a1
+; RV32IA-TSO-NOZACAS-NEXT: not a3, a3
+; RV32IA-TSO-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV32IA-TSO-NOZACAS-NEXT: bnez a3, .LBB151_1
+; RV32IA-TSO-NOZACAS-NEXT: # %bb.2:
+; RV32IA-TSO-NOZACAS-NEXT: mv a0, a2
+; RV32IA-TSO-NOZACAS-NEXT: ret
;
; RV64I-LABEL: atomicrmw_nand_i32_acquire:
; RV64I: # %bb.0:
@@ -14913,29 +14950,85 @@ define i32 @atomicrmw_nand_i32_acquire(ptr %a, i32 %b) nounwind {
; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret
;
-; RV64IA-WMO-LABEL: atomicrmw_nand_i32_acquire:
-; RV64IA-WMO: # %bb.0:
-; RV64IA-WMO-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-WMO-NEXT: lr.w.aq a2, (a0)
-; RV64IA-WMO-NEXT: and a3, a2, a1
-; RV64IA-WMO-NEXT: not a3, a3
-; RV64IA-WMO-NEXT: sc.w a3, a3, (a0)
-; RV64IA-WMO-NEXT: bnez a3, .LBB151_1
-; RV64IA-WMO-NEXT: # %bb.2:
-; RV64IA-WMO-NEXT: mv a0, a2
-; RV64IA-WMO-NEXT: ret
-;
-; RV64IA-TSO-LABEL: atomicrmw_nand_i32_acquire:
-; RV64IA-TSO: # %bb.0:
-; RV64IA-TSO-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-TSO-NEXT: lr.w a2, (a0)
-; RV64IA-TSO-NEXT: and a3, a2, a1
-; RV64IA-TSO-NEXT: not a3, a3
-; RV64IA-TSO-NEXT: sc.w a3, a3, (a0)
-; RV64IA-TSO-NEXT: bnez a3, .LBB151_1
-; RV64IA-TSO-NEXT: # %bb.2:
-; RV64IA-TSO-NEXT: mv a0, a2
-; RV64IA-TSO-NEXT: ret
+; RV64IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-WMO-NOZACAS: # %bb.0:
+; RV64IA-WMO-NOZACAS-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-NOZACAS-NEXT: lr.w.aq a2, (a0)
+; RV64IA-WMO-NOZACAS-NEXT: and a3, a2, a1
+; RV64IA-WMO-NOZACAS-NEXT: not a3, a3
+; RV64IA-WMO-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV64IA-WMO-NOZACAS-NEXT: bnez a3, .LBB151_1
+; RV64IA-WMO-NOZACAS-NEXT: # %bb.2:
+; RV64IA-WMO-NOZACAS-NEXT: mv a0, a2
+; RV64IA-WMO-NOZACAS-NEXT: ret
+;
+; RV64IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-TSO-NOZACAS: # %bb.0:
+; RV64IA-TSO-NOZACAS-NEXT: .LBB151_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-NOZACAS-NEXT: lr.w a2, (a0)
+; RV64IA-TSO-NOZACAS-NEXT: and a3, a2, a1
+; RV64IA-TSO-NOZACAS-NEXT: not a3, a3
+; RV64IA-TSO-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV64IA-TSO-NOZACAS-NEXT: bnez a3, .LBB151_1
+; RV64IA-TSO-NOZACAS-NEXT: # %bb.2:
+; RV64IA-TSO-NOZACAS-NEXT: mv a0, a2
+; RV64IA-TSO-NOZACAS-NEXT: ret
+;
+; RV32IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-WMO-ZACAS: # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT: mv a2, a0
+; RV32IA-WMO-ZACAS-NEXT: lw a0, 0(a0)
+; RV32IA-WMO-ZACAS-NEXT: .LBB151_1: # %atomicrmw.start
+; RV32IA-WMO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-ZACAS-NEXT: mv a3, a0
+; RV32IA-WMO-ZACAS-NEXT: and a4, a0, a1
+; RV32IA-WMO-ZACAS-NEXT: not a4, a4
+; RV32IA-WMO-ZACAS-NEXT: amocas.w.aq a0, a4, (a2)
+; RV32IA-WMO-ZACAS-NEXT: bne a0, a3, .LBB151_1
+; RV32IA-WMO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV32IA-WMO-ZACAS-NEXT: ret
+;
+; RV32IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV32IA-TSO-ZACAS: # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT: mv a2, a0
+; RV32IA-TSO-ZACAS-NEXT: lw a0, 0(a0)
+; RV32IA-TSO-ZACAS-NEXT: .LBB151_1: # %atomicrmw.start
+; RV32IA-TSO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-ZACAS-NEXT: mv a3, a0
+; RV32IA-TSO-ZACAS-NEXT: and a4, a0, a1
+; RV32IA-TSO-ZACAS-NEXT: not a4, a4
+; RV32IA-TSO-ZACAS-NEXT: amocas.w a0, a4, (a2)
+; RV32IA-TSO-ZACAS-NEXT: bne a0, a3, .LBB151_1
+; RV32IA-TSO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV32IA-TSO-ZACAS-NEXT: ret
+;
+; RV64IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-WMO-ZACAS: # %bb.0:
+; RV64IA-WMO-ZACAS-NEXT: mv a2, a0
+; RV64IA-WMO-ZACAS-NEXT: lw a0, 0(a0)
+; RV64IA-WMO-ZACAS-NEXT: .LBB151_1: # %atomicrmw.start
+; RV64IA-WMO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-ZACAS-NEXT: mv a3, a0
+; RV64IA-WMO-ZACAS-NEXT: and a4, a0, a1
+; RV64IA-WMO-ZACAS-NEXT: not a4, a4
+; RV64IA-WMO-ZACAS-NEXT: amocas.w.aq a0, a4, (a2)
+; RV64IA-WMO-ZACAS-NEXT: bne a0, a3, .LBB151_1
+; RV64IA-WMO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV64IA-WMO-ZACAS-NEXT: ret
+;
+; RV64IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_acquire:
+; RV64IA-TSO-ZACAS: # %bb.0:
+; RV64IA-TSO-ZACAS-NEXT: mv a2, a0
+; RV64IA-TSO-ZACAS-NEXT: lw a0, 0(a0)
+; RV64IA-TSO-ZACAS-NEXT: .LBB151_1: # %atomicrmw.start
+; RV64IA-TSO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-ZACAS-NEXT: mv a3, a0
+; RV64IA-TSO-ZACAS-NEXT: and a4, a0, a1
+; RV64IA-TSO-ZACAS-NEXT: not a4, a4
+; RV64IA-TSO-ZACAS-NEXT: amocas.w a0, a4, (a2)
+; RV64IA-TSO-ZACAS-NEXT: bne a0, a3, .LBB151_1
+; RV64IA-TSO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV64IA-TSO-ZACAS-NEXT: ret
%1 = atomicrmw nand ptr %a, i32 %b acquire
ret i32 %1
}
@@ -14951,29 +15044,29 @@ define i32 @atomicrmw_nand_i32_release(ptr %a, i32 %b) nounwind {
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-WMO-LABEL: atomicrmw_nand_i32_release:
-; RV32IA-WMO: # %bb.0:
-; RV32IA-WMO-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-WMO-NEXT: lr.w a2, (a0)
-; RV32IA-WMO-NEXT: and a3, a2, a1
-; RV32IA-WMO-NEXT: not a3, a3
-; RV32IA-WMO-NEXT: sc.w.rl a3, a3, (a0)
-; RV32IA-WMO-NEXT: bnez a3, .LBB152_1
-; RV32IA-WMO-NEXT: # %bb.2:
-; RV32IA-WMO-NEXT: mv a0, a2
-; RV32IA-WMO-NEXT: ret
-;
-; RV32IA-TSO-LABEL: atomicrmw_nand_i32_release:
-; RV32IA-TSO: # %bb.0:
-; RV32IA-TSO-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-TSO-NEXT: lr.w a2, (a0)
-; RV32IA-TSO-NEXT: and a3, a2, a1
-; RV32IA-TSO-NEXT: not a3, a3
-; RV32IA-TSO-NEXT: sc.w a3, a3, (a0)
-; RV32IA-TSO-NEXT: bnez a3, .LBB152_1
-; RV32IA-TSO-NEXT: # %bb.2:
-; RV32IA-TSO-NEXT: mv a0, a2
-; RV32IA-TSO-NEXT: ret
+; RV32IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-WMO-NOZACAS: # %bb.0:
+; RV32IA-WMO-NOZACAS-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-NOZACAS-NEXT: lr.w a2, (a0)
+; RV32IA-WMO-NOZACAS-NEXT: and a3, a2, a1
+; RV32IA-WMO-NOZACAS-NEXT: not a3, a3
+; RV32IA-WMO-NOZACAS-NEXT: sc.w.rl a3, a3, (a0)
+; RV32IA-WMO-NOZACAS-NEXT: bnez a3, .LBB152_1
+; RV32IA-WMO-NOZACAS-NEXT: # %bb.2:
+; RV32IA-WMO-NOZACAS-NEXT: mv a0, a2
+; RV32IA-WMO-NOZACAS-NEXT: ret
+;
+; RV32IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-TSO-NOZACAS: # %bb.0:
+; RV32IA-TSO-NOZACAS-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-NOZACAS-NEXT: lr.w a2, (a0)
+; RV32IA-TSO-NOZACAS-NEXT: and a3, a2, a1
+; RV32IA-TSO-NOZACAS-NEXT: not a3, a3
+; RV32IA-TSO-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV32IA-TSO-NOZACAS-NEXT: bnez a3, .LBB152_1
+; RV32IA-TSO-NOZACAS-NEXT: # %bb.2:
+; RV32IA-TSO-NOZACAS-NEXT: mv a0, a2
+; RV32IA-TSO-NOZACAS-NEXT: ret
;
; RV64I-LABEL: atomicrmw_nand_i32_release:
; RV64I: # %bb.0:
@@ -14985,29 +15078,85 @@ define i32 @atomicrmw_nand_i32_release(ptr %a, i32 %b) nounwind {
; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret
;
-; RV64IA-WMO-LABEL: atomicrmw_nand_i32_release:
-; RV64IA-WMO: # %bb.0:
-; RV64IA-WMO-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-WMO-NEXT: lr.w a2, (a0)
-; RV64IA-WMO-NEXT: and a3, a2, a1
-; RV64IA-WMO-NEXT: not a3, a3
-; RV64IA-WMO-NEXT: sc.w.rl a3, a3, (a0)
-; RV64IA-WMO-NEXT: bnez a3, .LBB152_1
-; RV64IA-WMO-NEXT: # %bb.2:
-; RV64IA-WMO-NEXT: mv a0, a2
-; RV64IA-WMO-NEXT: ret
-;
-; RV64IA-TSO-LABEL: atomicrmw_nand_i32_release:
-; RV64IA-TSO: # %bb.0:
-; RV64IA-TSO-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
-; RV64IA-TSO-NEXT: lr.w a2, (a0)
-; RV64IA-TSO-NEXT: and a3, a2, a1
-; RV64IA-TSO-NEXT: not a3, a3
-; RV64IA-TSO-NEXT: sc.w a3, a3, (a0)
-; RV64IA-TSO-NEXT: bnez a3, .LBB152_1
-; RV64IA-TSO-NEXT: # %bb.2:
-; RV64IA-TSO-NEXT: mv a0, a2
-; RV64IA-TSO-NEXT: ret
+; RV64IA-WMO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-WMO-NOZACAS: # %bb.0:
+; RV64IA-WMO-NOZACAS-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-NOZACAS-NEXT: lr.w a2, (a0)
+; RV64IA-WMO-NOZACAS-NEXT: and a3, a2, a1
+; RV64IA-WMO-NOZACAS-NEXT: not a3, a3
+; RV64IA-WMO-NOZACAS-NEXT: sc.w.rl a3, a3, (a0)
+; RV64IA-WMO-NOZACAS-NEXT: bnez a3, .LBB152_1
+; RV64IA-WMO-NOZACAS-NEXT: # %bb.2:
+; RV64IA-WMO-NOZACAS-NEXT: mv a0, a2
+; RV64IA-WMO-NOZACAS-NEXT: ret
+;
+; RV64IA-TSO-NOZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-TSO-NOZACAS: # %bb.0:
+; RV64IA-TSO-NOZACAS-NEXT: .LBB152_1: # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-NOZACAS-NEXT: lr.w a2, (a0)
+; RV64IA-TSO-NOZACAS-NEXT: and a3, a2, a1
+; RV64IA-TSO-NOZACAS-NEXT: not a3, a3
+; RV64IA-TSO-NOZACAS-NEXT: sc.w a3, a3, (a0)
+; RV64IA-TSO-NOZACAS-NEXT: bnez a3, .LBB152_1
+; RV64IA-TSO-NOZACAS-NEXT: # %bb.2:
+; RV64IA-TSO-NOZACAS-NEXT: mv a0, a2
+; RV64IA-TSO-NOZACAS-NEXT: ret
+;
+; RV32IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-WMO-ZACAS: # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT: mv a2, a0
+; RV32IA-WMO-ZACAS-NEXT: lw a0, 0(a0)
+; RV32IA-WMO-ZACAS-NEXT: .LBB152_1: # %atomicrmw.start
+; RV32IA-WMO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32IA-WMO-ZACAS-NEXT: mv a3, a0
+; RV32IA-WMO-ZACAS-NEXT: and a4, a0, a1
+; RV32IA-WMO-ZACAS-NEXT: not a4, a4
+; RV32IA-WMO-ZACAS-NEXT: amocas.w.rl a0, a4, (a2)
+; RV32IA-WMO-ZACAS-NEXT: bne a0, a3, .LBB152_1
+; RV32IA-WMO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV32IA-WMO-ZACAS-NEXT: ret
+;
+; RV32IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV32IA-TSO-ZACAS: # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT: mv a2, a0
+; RV32IA-TSO-ZACAS-NEXT: lw a0, 0(a0)
+; RV32IA-TSO-ZACAS-NEXT: .LBB152_1: # %atomicrmw.start
+; RV32IA-TSO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32IA-TSO-ZACAS-NEXT: mv a3, a0
+; RV32IA-TSO-ZACAS-NEXT: and a4, a0, a1
+; RV32IA-TSO-ZACAS-NEXT: not a4, a4
+; RV32IA-TSO-ZACAS-NEXT: amocas.w a0, a4, (a2)
+; RV32IA-TSO-ZACAS-NEXT: bne a0, a3, .LBB152_1
+; RV32IA-TSO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV32IA-TSO-ZACAS-NEXT: ret
+;
+; RV64IA-WMO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-WMO-ZACAS: # %bb.0:
+; RV64IA-WMO-ZACAS-NEXT: mv a2, a0
+; RV64IA-WMO-ZACAS-NEXT: lw a0, 0(a0)
+; RV64IA-WMO-ZACAS-NEXT: .LBB152_1: # %atomicrmw.start
+; RV64IA-WMO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV64IA-WMO-ZACAS-NEXT: mv a3, a0
+; RV64IA-WMO-ZACAS-NEXT: and a4, a0, a1
+; RV64IA-WMO-ZACAS-NEXT: not a4, a4
+; RV64IA-WMO-ZACAS-NEXT: amocas.w.rl a0, a4, (a2)
+; RV64IA-WMO-ZACAS-NEXT: bne a0, a3, .LBB152_1
+; RV64IA-WMO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV64IA-WMO-ZACAS-NEXT: ret
+;
+; RV64IA-TSO-ZACAS-LABEL: atomicrmw_nand_i32_release:
+; RV64IA-TSO-ZACAS: # %bb.0:
+; RV64IA-TSO-ZACAS-NEXT: mv a2, a0
+; RV64IA-TSO-ZACAS-NEXT: lw a0, 0(a0)
+; RV64IA-TSO-ZACAS-NEXT: .LBB152_1: # %atomicrmw.start
+; RV64IA-TSO-ZACAS-NEXT: # =>This Inner Loop Header: Depth=1
+; RV64IA-TSO-ZACAS-NEXT: mv a3, a0
+; RV64IA-TSO-ZACAS-NEXT: and a4, a0, a1
+; RV64IA-TSO-ZACAS-NEXT: not a4, a4
+; RV64IA-TSO-ZACAS-NEXT: amocas.w a0, a4, (a2)
+; RV64IA-TSO-ZACAS-NEXT: bne a0, a3, .LBB152_1
+; RV64IA-TSO-ZACAS-NEXT: # %bb.2: # %atomicrmw.end
+; RV64IA-TSO-ZACAS-NEXT: ret
%1 = atomicrmw nand ptr %a, i32 %b release
ret i32 %1
}
@@ -15023,29 +15172,29 @@ define i32 @atomicrmw_nand_i32_acq_rel(ptr %a, i32 %b) nounwind {
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-WMO-LABEL: atomicrmw_nand_i32_acq_rel:
-; RV32IA-WMO: # %bb.0:
-; RV32IA-WMO-NEXT: .LBB153_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-WMO-NEXT: lr.w.aq a2, (a0)
-; RV32IA-WMO-NEXT: and a3, a2, a1
-; RV32IA-WMO-NEXT: not a3, a3
-; RV32IA-WMO-NEXT: sc.w.rl a3, a3, (a0)
-; RV32IA-WMO-NEXT: bnez a3, .LBB153_1
-; RV32IA-WMO-NEXT: # %bb.2:
-; RV32IA-WMO-NEXT: mv a0, a2
-; RV32IA-WMO-NEXT: ret
-;
-; RV32IA-TSO-LABEL: atomicrmw_nand_i32_acq_rel:
-; RV32IA-TSO: # %bb.0:
-; RV32IA-TSO-NEXT: .LBB153_1: # =>This Inner Loop Header: Depth=1
-; RV32IA-TSO-NEXT: lr.w a2, (a0)
-; RV32IA-TSO-NEXT: and a3, a2, a1
-; RV32IA-TSO-NEXT: not a3, a3
-; RV32IA-TSO-NEXT: sc.w a3, a3, (a0)
-; RV32IA-T...
[truncated]
@@ -19517,6 +19517,11 @@ RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
   unsigned Size = AI->getType()->getPrimitiveSizeInBits();
   if (Size == 8 || Size == 16)
     return AtomicExpansionKind::MaskedIntrinsic;
It would be good to use amocas for i8/i16 if both zacas and zabha are supported.
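To make the suggestion concrete, here is a standalone sketch of how the decision could look for Nand if the i8/i16 case were extended. This is a simplified model, not LLVM code: `ExpansionKind`, `Features`, and `chooseNandExpansion` are hypothetical stand-ins, and the 8/16-bit branch is an assumption based on the comment above rather than anything in this patch.

```cpp
// Stand-in for the expansion kinds named in the patch; not the LLVM enum.
enum class ExpansionKind { None, MaskedIntrinsic, CmpXChg };

// Hypothetical feature summary; the field names are illustrative only.
struct Features {
  unsigned xlen;
  bool hasZacas;
  bool hasZabha;
};

// Models the choice for an atomicrmw nand of the given bit width.
inline ExpansionKind chooseNandExpansion(unsigned sizeInBits,
                                         const Features &f) {
  if (sizeInBits == 8 || sizeInBits == 16) {
    // Assumed extension: with both Zacas and Zabha, use a narrow CAS loop.
    if (f.hasZacas && f.hasZabha)
      return ExpansionKind::CmpXChg;
    // Otherwise keep the existing masked expansion.
    return ExpansionKind::MaskedIntrinsic;
  }
  // Behaviour added by this patch: CAS loop for i32 and XLen with Zacas.
  if (f.hasZacas && (sizeInBits == f.xlen || sizeInBits == 32))
    return ExpansionKind::CmpXChg;
  return ExpansionKind::None;
}
```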
Using amocas in preference to lr/sc when possible seems very sensible to me. LGTM.
Makes sense to me as well.