[RISCV] Support isel for Zacas for 2*XLen types. #77814
Conversation
This is an alternative to #67918 now that the MC layer amocas.d/amocas.q instructions use the GPRPair register class. While working on this I noticed that the change to setMaxAtomicSizeInBitsSupported caused atomic load/store/rmw for 2*XLen types to be affected: AtomicExpandPass no longer converts them to libcalls immediately; instead they get converted to different library calls during type legalization. I didn't see any way to signal AtomicExpandPass to convert them to the same libcalls using any of the shouldExpandAtomic* hooks, so I've forced them to use the CmpXChg expansion. I've disabled the insertion of fences for atomic load/store when we use CmpXChg. I'm very unsure whether this is the right thing to do or whether we should make changes to AtomicExpand to get back the original libcalls.
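For orientation, here is a minimal sketch (not part of the patch; the struct and function names are made up) of the kind of source-level operation whose lowering is affected: a compare-and-swap on a 2*XLen-wide value, i.e. 128 bits on RV64 or 64 bits on RV32.

  #include <atomic>
  #include <cstdint>

  // 2*XLen bits on a 64-bit target; the RV32 analogue would use two 32-bit halves.
  struct Pair {
    std::uint64_t lo;
    std::uint64_t hi;
  };

  bool cas(std::atomic<Pair> &obj, Pair &expected, Pair desired) {
    // At the IR level this is a 2*XLen-wide cmpxchg (assuming the frontend
    // treats this width as inline-atomic).  Without Zacas it ends up as a
    // __atomic_compare_exchange* libcall; with the isel support proposed
    // here it could select amocas.q (or amocas.d on RV32).
    return obj.compare_exchange_strong(expected, desired);
  }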
@llvm/pr-subscribers-backend-risc-v
Author: Craig Topper (topperc)
Changes:
This is an alternative to #67918 now that the MC layer amocas.d/amocas.q instructions use the GPRPair register class. While working on this I noticed that the change to setMaxAtomicSizeInBitsSupported caused atomic load/store/rmw for 2*XLen types to be affected: AtomicExpandPass will not convert them to libcalls immediately; instead they would get passed through to SelectionDAG and then get converted to different libcalls during type legalization. I didn't see any way to signal AtomicExpandPass to convert them to the same libcalls using any of the shouldExpandAtomic* hooks, so I've forced them to use the CmpXChg expansion. I've disabled the insertion of fences for atomic load/store when we use CmpXChg. I'm very unsure whether this is the right thing to do or whether we should make changes to AtomicExpand to get back the original libcalls.
Patch is 958.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77814.diff
6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index cb9ffabc41236e..8ba4d65eacb760 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -629,7 +629,10 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
}
if (Subtarget.hasStdExtA()) {
- setMaxAtomicSizeInBitsSupported(Subtarget.getXLen());
+ unsigned MaxAtomicSize = Subtarget.getXLen();
+ if (Subtarget.hasStdExtZacas())
+ MaxAtomicSize = 2 * Subtarget.getXLen();
+ setMaxAtomicSizeInBitsSupported(MaxAtomicSize);
setMinCmpXchgSizeInBits(32);
} else if (Subtarget.hasForcedAtomics()) {
setMaxAtomicSizeInBitsSupported(Subtarget.getXLen());
@@ -1338,6 +1341,9 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::ATOMIC_LOAD_SUB, XLenVT, Expand);
if (RV64LegalI32 && Subtarget.is64Bit())
setOperationAction(ISD::ATOMIC_LOAD_SUB, MVT::i32, Expand);
+ if (Subtarget.hasStdExtZacas())
+ setOperationAction(ISD::ATOMIC_CMP_SWAP,
+ Subtarget.is64Bit() ? MVT::i128 : MVT::i64, Custom);
}
if (Subtarget.hasForcedAtomics()) {
@@ -11237,6 +11243,76 @@ static SDValue customLegalizeToWOpWithSExt(SDNode *N, SelectionDAG &DAG) {
return DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, NewRes);
}
+// Create an even/odd pair of X registers holding integer value V.
+static SDValue createGPRPairNode(SelectionDAG &DAG, SDValue V, MVT VT,
+ MVT SubRegVT) {
+ SDLoc DL(V.getNode());
+ auto [VLo, VHi] = DAG.SplitScalar(V, DL, SubRegVT, SubRegVT);
+ SDValue RegClass =
+ DAG.getTargetConstant(RISCV::GPRPairRegClassID, DL, MVT::i32);
+ SDValue SubReg0 = DAG.getTargetConstant(RISCV::sub_gpr_even, DL, MVT::i32);
+ SDValue SubReg1 = DAG.getTargetConstant(RISCV::sub_gpr_odd, DL, MVT::i32);
+ const SDValue Ops[] = {RegClass, VLo, SubReg0, VHi, SubReg1};
+ return SDValue(
+ DAG.getMachineNode(TargetOpcode::REG_SEQUENCE, DL, MVT::Untyped, Ops), 0);
+}
+
+static void ReplaceCMP_SWAP_2XLenResults(SDNode *N,
+ SmallVectorImpl<SDValue> &Results,
+ SelectionDAG &DAG,
+ const RISCVSubtarget &Subtarget) {
+ MVT VT = N->getSimpleValueType(0);
+ assert(N->getValueType(0) == (Subtarget.is64Bit() ? MVT::i128 : MVT::i64) &&
+ "AtomicCmpSwap on types less than 2*XLen should be legal");
+ assert(Subtarget.hasStdExtZacas());
+ MVT XLenVT = Subtarget.getXLenVT();
+
+ SDValue Ops[] = {
+ createGPRPairNode(DAG, N->getOperand(2), VT, XLenVT), // Compare value
+ N->getOperand(1), // Ptr
+ createGPRPairNode(DAG, N->getOperand(3), VT, XLenVT), // Store value
+ N->getOperand(0), // Chain in
+ };
+
+ MachineMemOperand *MemOp = cast<MemSDNode>(N)->getMemOperand();
+
+ bool Is64Bit = Subtarget.is64Bit();
+ unsigned Opcode;
+ if (Subtarget.hasStdExtZtso()) {
+ Opcode = Subtarget.is64Bit() ? RISCV::AMOCAS_Q : RISCV::AMOCAS_D_RV32;
+ } else {
+ switch (MemOp->getMergedOrdering()) {
+ default:
+ llvm_unreachable("Unexpected ordering!");
+ case AtomicOrdering::Monotonic:
+ Opcode = Is64Bit ? RISCV::AMOCAS_Q : RISCV::AMOCAS_D_RV32;
+ break;
+ case AtomicOrdering::Acquire:
+ Opcode = Is64Bit ? RISCV::AMOCAS_Q_AQ : RISCV::AMOCAS_D_RV32_AQ;
+ break;
+ case AtomicOrdering::Release:
+ Opcode = Is64Bit ? RISCV::AMOCAS_Q_RL : RISCV::AMOCAS_D_RV32_RL;
+ break;
+ case AtomicOrdering::AcquireRelease:
+ case AtomicOrdering::SequentiallyConsistent:
+ Opcode = Is64Bit ? RISCV::AMOCAS_Q_AQ_RL : RISCV::AMOCAS_D_RV32_AQ_RL;
+ break;
+ }
+ }
+
+ SDLoc DL(N);
+ MachineSDNode *CmpSwap = DAG.getMachineNode(
+ Opcode, DL, DAG.getVTList(MVT::Untyped, MVT::Other), Ops);
+ DAG.setNodeMemRefs(CmpSwap, {MemOp});
+
+ SDValue Lo = DAG.getTargetExtractSubreg(RISCV::sub_gpr_even, DL, XLenVT,
+ SDValue(CmpSwap, 0));
+ SDValue Hi = DAG.getTargetExtractSubreg(RISCV::sub_gpr_odd, DL, XLenVT,
+ SDValue(CmpSwap, 0));
+ Results.push_back(DAG.getNode(ISD::BUILD_PAIR, DL, VT, Lo, Hi));
+ Results.push_back(SDValue(CmpSwap, 1));
+}
+
void RISCVTargetLowering::ReplaceNodeResults(SDNode *N,
SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const {
@@ -11244,6 +11320,9 @@ void RISCVTargetLowering::ReplaceNodeResults(SDNode *N,
switch (N->getOpcode()) {
default:
llvm_unreachable("Don't know how to custom type legalize this operation!");
+ case ISD::ATOMIC_CMP_SWAP:
+ ReplaceCMP_SWAP_2XLenResults(N, Results, DAG, Subtarget);
+ break;
case ISD::STRICT_FP_TO_SINT:
case ISD::STRICT_FP_TO_UINT:
case ISD::FP_TO_SINT:
@@ -19003,6 +19082,20 @@ void RISCVTargetLowering::LowerAsmOperandForConstraint(
TargetLowering::LowerAsmOperandForConstraint(Op, Constraint, Ops, DAG);
}
+bool RISCVTargetLowering::shouldInsertFencesForAtomic(
+ const Instruction *I) const {
+ // We don't need a fence for 2*Xlen. We can use Zacas.
+ if (auto *LI = dyn_cast<LoadInst>(I))
+ return LI->getType()->getPrimitiveSizeInBits() != 2 * Subtarget.getXLen();
+
+ // We don't need a fence for 2*Xlen. We can use Zacas.
+ if (auto *SI = dyn_cast<StoreInst>(I))
+ return SI->getValueOperand()->getType()->getPrimitiveSizeInBits() !=
+ 2 * Subtarget.getXLen();
+
+ return false;
+}
+
Instruction *RISCVTargetLowering::emitLeadingFence(IRBuilderBase &Builder,
Instruction *Inst,
AtomicOrdering Ord) const {
@@ -19036,6 +19129,30 @@ Instruction *RISCVTargetLowering::emitTrailingFence(IRBuilderBase &Builder,
return nullptr;
}
+TargetLowering::AtomicExpansionKind
+RISCVTargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
+ unsigned Size = LI->getType()->getPrimitiveSizeInBits();
+
+ if (Size != 2 * Subtarget.getXLen())
+ return AtomicExpansionKind::None;
+
+ // With Zacas we can use amocas for 2*XLen types.
+ assert(Subtarget.hasStdExtZacas() && "Unexpected extension");
+ return AtomicExpansionKind::CmpXChg;
+}
+
+TargetLowering::AtomicExpansionKind
+RISCVTargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
+ unsigned Size = SI->getValueOperand()->getType()->getPrimitiveSizeInBits();
+
+ if (Size != 2 * Subtarget.getXLen())
+ return AtomicExpansionKind::None;
+
+ // With Zacas we can use amocas for 2*XLen types.
+ assert(Subtarget.hasStdExtZacas() && "Unexpected extension");
+ return AtomicExpansionKind::Expand;
+}
+
TargetLowering::AtomicExpansionKind
RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
// atomicrmw {fadd,fsub} must be expanded to use compare-exchange, as floating
@@ -19053,7 +19170,12 @@ RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
unsigned Size = AI->getType()->getPrimitiveSizeInBits();
if (Size == 8 || Size == 16)
return AtomicExpansionKind::MaskedIntrinsic;
- return AtomicExpansionKind::None;
+ if (Size != 2 * Subtarget.getXLen())
+ return AtomicExpansionKind::None;
+
+ // With Zacas we can use amocas for 2*XLen types.
+ assert(Subtarget.hasStdExtZacas() && "Unexpected extension");
+ return AtomicExpansionKind::CmpXChg;
}
static Intrinsic::ID
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index c65953e37b1710..0042c8c5f4e941 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -618,9 +618,7 @@ class RISCVTargetLowering : public TargetLowering {
bool preferZeroCompareBranch() const override { return true; }
- bool shouldInsertFencesForAtomic(const Instruction *I) const override {
- return isa<LoadInst>(I) || isa<StoreInst>(I);
- }
+ bool shouldInsertFencesForAtomic(const Instruction *I) const override;
Instruction *emitLeadingFence(IRBuilderBase &Builder, Instruction *Inst,
AtomicOrdering Ord) const override;
Instruction *emitTrailingFence(IRBuilderBase &Builder, Instruction *Inst,
@@ -699,6 +697,10 @@ class RISCVTargetLowering : public TargetLowering {
bool isMulAddWithConstProfitable(SDValue AddNode,
SDValue ConstNode) const override;
+ TargetLoweringBase::AtomicExpansionKind
+ shouldExpandAtomicLoadInIR(LoadInst *LI) const override;
+ TargetLoweringBase::AtomicExpansionKind
+ shouldExpandAtomicStoreInIR(StoreInst *SI) const override;
TargetLowering::AtomicExpansionKind
shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const override;
Value *emitMaskedAtomicRMWIntrinsic(IRBuilderBase &Builder, AtomicRMWInst *AI,
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index 24f8d600f1eafc..de28af67050b21 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -754,8 +754,9 @@ bool RISCVRegisterInfo::getRegAllocationHints(
bool NeedGPRC) -> void {
Register Reg = MO.getReg();
Register PhysReg = Reg.isPhysical() ? Reg : Register(VRM->getPhys(Reg));
- if (PhysReg && (!NeedGPRC || RISCV::GPRCRegClass.contains(PhysReg))) {
- assert(!MO.getSubReg() && !VRRegMO.getSubReg() && "Unexpected subreg!");
+ // TODO: Add hints when there are GPRPair subregs?
+ if (PhysReg && (!NeedGPRC || RISCV::GPRCRegClass.contains(PhysReg)) &&
+ !MO.getSubReg() && !VRRegMO.getSubReg()) {
if (!MRI->isReserved(PhysReg) && !is_contained(Hints, PhysReg))
TwoAddrHints.insert(PhysReg);
}
diff --git a/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll b/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll
index 5b3e5789e8d910..46e249d2327e49 100644
--- a/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll
+++ b/llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll
@@ -4219,21 +4219,46 @@ define void @cmpxchg_i64_monotonic_monotonic(ptr %ptr, i64 %cmp, i64 %val) nounw
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-LABEL: cmpxchg_i64_monotonic_monotonic:
-; RV32IA: # %bb.0:
-; RV32IA-NEXT: addi sp, sp, -16
-; RV32IA-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IA-NEXT: sw a2, 4(sp)
-; RV32IA-NEXT: sw a1, 0(sp)
-; RV32IA-NEXT: mv a1, sp
-; RV32IA-NEXT: mv a2, a3
-; RV32IA-NEXT: mv a3, a4
-; RV32IA-NEXT: li a4, 0
-; RV32IA-NEXT: li a5, 0
-; RV32IA-NEXT: call __atomic_compare_exchange_8
-; RV32IA-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IA-NEXT: addi sp, sp, 16
-; RV32IA-NEXT: ret
+; RV32IA-WMO-LABEL: cmpxchg_i64_monotonic_monotonic:
+; RV32IA-WMO: # %bb.0:
+; RV32IA-WMO-NEXT: addi sp, sp, -16
+; RV32IA-WMO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-WMO-NEXT: sw a2, 4(sp)
+; RV32IA-WMO-NEXT: sw a1, 0(sp)
+; RV32IA-WMO-NEXT: mv a1, sp
+; RV32IA-WMO-NEXT: mv a2, a3
+; RV32IA-WMO-NEXT: mv a3, a4
+; RV32IA-WMO-NEXT: li a4, 0
+; RV32IA-WMO-NEXT: li a5, 0
+; RV32IA-WMO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-WMO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-WMO-NEXT: addi sp, sp, 16
+; RV32IA-WMO-NEXT: ret
+;
+; RV32IA-ZACAS-LABEL: cmpxchg_i64_monotonic_monotonic:
+; RV32IA-ZACAS: # %bb.0:
+; RV32IA-ZACAS-NEXT: mv a5, a4
+; RV32IA-ZACAS-NEXT: mv a7, a2
+; RV32IA-ZACAS-NEXT: mv a4, a3
+; RV32IA-ZACAS-NEXT: mv a6, a1
+; RV32IA-ZACAS-NEXT: amocas.d a6, a4, (a0)
+; RV32IA-ZACAS-NEXT: ret
+;
+; RV32IA-TSO-LABEL: cmpxchg_i64_monotonic_monotonic:
+; RV32IA-TSO: # %bb.0:
+; RV32IA-TSO-NEXT: addi sp, sp, -16
+; RV32IA-TSO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-TSO-NEXT: sw a2, 4(sp)
+; RV32IA-TSO-NEXT: sw a1, 0(sp)
+; RV32IA-TSO-NEXT: mv a1, sp
+; RV32IA-TSO-NEXT: mv a2, a3
+; RV32IA-TSO-NEXT: mv a3, a4
+; RV32IA-TSO-NEXT: li a4, 0
+; RV32IA-TSO-NEXT: li a5, 0
+; RV32IA-TSO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-TSO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-TSO-NEXT: addi sp, sp, 16
+; RV32IA-TSO-NEXT: ret
;
; RV64I-LABEL: cmpxchg_i64_monotonic_monotonic:
; RV64I: # %bb.0:
@@ -4296,22 +4321,57 @@ define void @cmpxchg_i64_acquire_monotonic(ptr %ptr, i64 %cmp, i64 %val) nounwin
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-LABEL: cmpxchg_i64_acquire_monotonic:
-; RV32IA: # %bb.0:
-; RV32IA-NEXT: addi sp, sp, -16
-; RV32IA-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IA-NEXT: mv a5, a4
-; RV32IA-NEXT: sw a2, 4(sp)
-; RV32IA-NEXT: sw a1, 0(sp)
-; RV32IA-NEXT: mv a1, sp
-; RV32IA-NEXT: li a4, 2
-; RV32IA-NEXT: mv a2, a3
-; RV32IA-NEXT: mv a3, a5
-; RV32IA-NEXT: li a5, 0
-; RV32IA-NEXT: call __atomic_compare_exchange_8
-; RV32IA-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IA-NEXT: addi sp, sp, 16
-; RV32IA-NEXT: ret
+; RV32IA-WMO-LABEL: cmpxchg_i64_acquire_monotonic:
+; RV32IA-WMO: # %bb.0:
+; RV32IA-WMO-NEXT: addi sp, sp, -16
+; RV32IA-WMO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-WMO-NEXT: mv a5, a4
+; RV32IA-WMO-NEXT: sw a2, 4(sp)
+; RV32IA-WMO-NEXT: sw a1, 0(sp)
+; RV32IA-WMO-NEXT: mv a1, sp
+; RV32IA-WMO-NEXT: li a4, 2
+; RV32IA-WMO-NEXT: mv a2, a3
+; RV32IA-WMO-NEXT: mv a3, a5
+; RV32IA-WMO-NEXT: li a5, 0
+; RV32IA-WMO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-WMO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-WMO-NEXT: addi sp, sp, 16
+; RV32IA-WMO-NEXT: ret
+;
+; RV32IA-WMO-ZACAS-LABEL: cmpxchg_i64_acquire_monotonic:
+; RV32IA-WMO-ZACAS: # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT: mv a5, a4
+; RV32IA-WMO-ZACAS-NEXT: mv a7, a2
+; RV32IA-WMO-ZACAS-NEXT: mv a4, a3
+; RV32IA-WMO-ZACAS-NEXT: mv a6, a1
+; RV32IA-WMO-ZACAS-NEXT: amocas.d.aq a6, a4, (a0)
+; RV32IA-WMO-ZACAS-NEXT: ret
+;
+; RV32IA-TSO-LABEL: cmpxchg_i64_acquire_monotonic:
+; RV32IA-TSO: # %bb.0:
+; RV32IA-TSO-NEXT: addi sp, sp, -16
+; RV32IA-TSO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-TSO-NEXT: mv a5, a4
+; RV32IA-TSO-NEXT: sw a2, 4(sp)
+; RV32IA-TSO-NEXT: sw a1, 0(sp)
+; RV32IA-TSO-NEXT: mv a1, sp
+; RV32IA-TSO-NEXT: li a4, 2
+; RV32IA-TSO-NEXT: mv a2, a3
+; RV32IA-TSO-NEXT: mv a3, a5
+; RV32IA-TSO-NEXT: li a5, 0
+; RV32IA-TSO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-TSO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-TSO-NEXT: addi sp, sp, 16
+; RV32IA-TSO-NEXT: ret
+;
+; RV32IA-TSO-ZACAS-LABEL: cmpxchg_i64_acquire_monotonic:
+; RV32IA-TSO-ZACAS: # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT: mv a5, a4
+; RV32IA-TSO-ZACAS-NEXT: mv a7, a2
+; RV32IA-TSO-ZACAS-NEXT: mv a4, a3
+; RV32IA-TSO-ZACAS-NEXT: mv a6, a1
+; RV32IA-TSO-ZACAS-NEXT: amocas.d a6, a4, (a0)
+; RV32IA-TSO-ZACAS-NEXT: ret
;
; RV64I-LABEL: cmpxchg_i64_acquire_monotonic:
; RV64I: # %bb.0:
@@ -4379,22 +4439,57 @@ define void @cmpxchg_i64_acquire_acquire(ptr %ptr, i64 %cmp, i64 %val) nounwind
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-LABEL: cmpxchg_i64_acquire_acquire:
-; RV32IA: # %bb.0:
-; RV32IA-NEXT: addi sp, sp, -16
-; RV32IA-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IA-NEXT: mv a6, a4
-; RV32IA-NEXT: sw a2, 4(sp)
-; RV32IA-NEXT: sw a1, 0(sp)
-; RV32IA-NEXT: mv a1, sp
-; RV32IA-NEXT: li a4, 2
-; RV32IA-NEXT: li a5, 2
-; RV32IA-NEXT: mv a2, a3
-; RV32IA-NEXT: mv a3, a6
-; RV32IA-NEXT: call __atomic_compare_exchange_8
-; RV32IA-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IA-NEXT: addi sp, sp, 16
-; RV32IA-NEXT: ret
+; RV32IA-WMO-LABEL: cmpxchg_i64_acquire_acquire:
+; RV32IA-WMO: # %bb.0:
+; RV32IA-WMO-NEXT: addi sp, sp, -16
+; RV32IA-WMO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-WMO-NEXT: mv a6, a4
+; RV32IA-WMO-NEXT: sw a2, 4(sp)
+; RV32IA-WMO-NEXT: sw a1, 0(sp)
+; RV32IA-WMO-NEXT: mv a1, sp
+; RV32IA-WMO-NEXT: li a4, 2
+; RV32IA-WMO-NEXT: li a5, 2
+; RV32IA-WMO-NEXT: mv a2, a3
+; RV32IA-WMO-NEXT: mv a3, a6
+; RV32IA-WMO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-WMO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-WMO-NEXT: addi sp, sp, 16
+; RV32IA-WMO-NEXT: ret
+;
+; RV32IA-WMO-ZACAS-LABEL: cmpxchg_i64_acquire_acquire:
+; RV32IA-WMO-ZACAS: # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT: mv a5, a4
+; RV32IA-WMO-ZACAS-NEXT: mv a7, a2
+; RV32IA-WMO-ZACAS-NEXT: mv a4, a3
+; RV32IA-WMO-ZACAS-NEXT: mv a6, a1
+; RV32IA-WMO-ZACAS-NEXT: amocas.d.aq a6, a4, (a0)
+; RV32IA-WMO-ZACAS-NEXT: ret
+;
+; RV32IA-TSO-LABEL: cmpxchg_i64_acquire_acquire:
+; RV32IA-TSO: # %bb.0:
+; RV32IA-TSO-NEXT: addi sp, sp, -16
+; RV32IA-TSO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-TSO-NEXT: mv a6, a4
+; RV32IA-TSO-NEXT: sw a2, 4(sp)
+; RV32IA-TSO-NEXT: sw a1, 0(sp)
+; RV32IA-TSO-NEXT: mv a1, sp
+; RV32IA-TSO-NEXT: li a4, 2
+; RV32IA-TSO-NEXT: li a5, 2
+; RV32IA-TSO-NEXT: mv a2, a3
+; RV32IA-TSO-NEXT: mv a3, a6
+; RV32IA-TSO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-TSO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-TSO-NEXT: addi sp, sp, 16
+; RV32IA-TSO-NEXT: ret
+;
+; RV32IA-TSO-ZACAS-LABEL: cmpxchg_i64_acquire_acquire:
+; RV32IA-TSO-ZACAS: # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT: mv a5, a4
+; RV32IA-TSO-ZACAS-NEXT: mv a7, a2
+; RV32IA-TSO-ZACAS-NEXT: mv a4, a3
+; RV32IA-TSO-ZACAS-NEXT: mv a6, a1
+; RV32IA-TSO-ZACAS-NEXT: amocas.d a6, a4, (a0)
+; RV32IA-TSO-ZACAS-NEXT: ret
;
; RV64I-LABEL: cmpxchg_i64_acquire_acquire:
; RV64I: # %bb.0:
@@ -4462,22 +4557,57 @@ define void @cmpxchg_i64_release_monotonic(ptr %ptr, i64 %cmp, i64 %val) nounwin
; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret
;
-; RV32IA-LABEL: cmpxchg_i64_release_monotonic:
-; RV32IA: # %bb.0:
-; RV32IA-NEXT: addi sp, sp, -16
-; RV32IA-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IA-NEXT: mv a5, a4
-; RV32IA-NEXT: sw a2, 4(sp)
-; RV32IA-NEXT: sw a1, 0(sp)
-; RV32IA-NEXT: mv a1, sp
-; RV32IA-NEXT: li a4, 3
-; RV32IA-NEXT: mv a2, a3
-; RV32IA-NEXT: mv a3, a5
-; RV32IA-NEXT: li a5, 0
-; RV32IA-NEXT: call __atomic_compare_exchange_8
-; RV32IA-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IA-NEXT: addi sp, sp, 16
-; RV32IA-NEXT: ret
+; RV32IA-WMO-LABEL: cmpxchg_i64_release_monotonic:
+; RV32IA-WMO: # %bb.0:
+; RV32IA-WMO-NEXT: addi sp, sp, -16
+; RV32IA-WMO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-WMO-NEXT: mv a5, a4
+; RV32IA-WMO-NEXT: sw a2, 4(sp)
+; RV32IA-WMO-NEXT: sw a1, 0(sp)
+; RV32IA-WMO-NEXT: mv a1, sp
+; RV32IA-WMO-NEXT: li a4, 3
+; RV32IA-WMO-NEXT: mv a2, a3
+; RV32IA-WMO-NEXT: mv a3, a5
+; RV32IA-WMO-NEXT: li a5, 0
+; RV32IA-WMO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-WMO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-WMO-NEXT: addi sp, sp, 16
+; RV32IA-WMO-NEXT: ret
+;
+; RV32IA-WMO-ZACAS-LABEL: cmpxchg_i64_release_monotonic:
+; RV32IA-WMO-ZACAS: # %bb.0:
+; RV32IA-WMO-ZACAS-NEXT: mv a5, a4
+; RV32IA-WMO-ZACAS-NEXT: mv a7, a2
+; RV32IA-WMO-ZACAS-NEXT: mv a4, a3
+; RV32IA-WMO-ZACAS-NEXT: mv a6, a1
+; RV32IA-WMO-ZACAS-NEXT: amocas.d.rl a6, a4, (a0)
+; RV32IA-WMO-ZACAS-NEXT: ret
+;
+; RV32IA-TSO-LABEL: cmpxchg_i64_release_monotonic:
+; RV32IA-TSO: # %bb.0:
+; RV32IA-TSO-NEXT: addi sp, sp, -16
+; RV32IA-TSO-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32IA-TSO-NEXT: mv a5, a4
+; RV32IA-TSO-NEXT: sw a2, 4(sp)
+; RV32IA-TSO-NEXT: sw a1, 0(sp)
+; RV32IA-TSO-NEXT: mv a1, sp
+; RV32IA-TSO-NEXT: li a4, 3
+; RV32IA-TSO-NEXT: mv a2, a3
+; RV32IA-TSO-NEXT: mv a3, a5
+; RV32IA-TSO-NEXT: li a5, 0
+; RV32IA-TSO-NEXT: call __atomic_compare_exchange_8
+; RV32IA-TSO-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32IA-TSO-NEXT: addi sp, sp, 16
+; RV32IA-TSO-NEXT: ret
+;
+; RV32IA-TSO-ZACAS-LABEL: cmpxchg_i64_release_monotonic:
+; RV32IA-TSO-ZACAS: # %bb.0:
+; RV32IA-TSO-ZACAS-NEXT: mv a5, a4
+; R...
[truncated]
Review comment on the new ReplaceCMP_SWAP_2XLenResults code in RISCVISelLowering.cpp:

  bool Is64Bit = Subtarget.is64Bit();
  unsigned Opcode;
  if (Subtarget.hasStdExtZtso()) {
    Opcode = Subtarget.is64Bit() ? RISCV::AMOCAS_Q : RISCV::AMOCAS_D_RV32;

Suggested change:
-    Opcode = Subtarget.is64Bit() ? RISCV::AMOCAS_Q : RISCV::AMOCAS_D_RV32;
+    Opcode = Is64Bit ? RISCV::AMOCAS_Q : RISCV::AMOCAS_D_RV32;
Review comment on RISCVRegisterInfo.cpp:

-      assert(!MO.getSubReg() && !VRRegMO.getSubReg() && "Unexpected subreg!");
+      // TODO: Add hints when there are GPRPair subregs?
+      if (PhysReg && (!NeedGPRC || RISCV::GPRCRegClass.contains(PhysReg)) &&
+          !MO.getSubReg() && !VRRegMO.getSubReg()) {

I'd like to see test changes caused by this.
Review comment on the RV32IA-ZACAS check lines in one of the atomic test files:

  ; RV32IA-ZACAS-NEXT: li a2, 0
  ; RV32IA-ZACAS-NEXT: li a3, 0
  ; RV32IA-ZACAS-NEXT: amocas.d a2, a2, (a0)
  ; RV32IA-ZACAS-NEXT: mv a0, a2
  ; RV32IA-ZACAS-NEXT: mv a1, a3

Suggested change:

  ; RV32IA-ZACAS-NEXT: mv a2, a0
  ; RV32IA-ZACAS-NEXT: amocas.d a0, a0, (a2)
I don't know how to give some hints to RA for better codegen :(
I believe that the correct thing is to either consistently generate lock-free implementations for all operations of a given bitwidth, or to dispatch to the libcall. So using the Zacas instructions is correct. We may want to update MaxAtomicInlineWidth in clang/lib/Basic/Targets/RISCV.h too.

Looking at other targets that call setMaxAtomicSizeInBitsSupported differently based on a target feature, I'm a bit surprised there's no consideration for potential ABI compatibility. @jyknight, isn't it right that if atomics are used across an ABI boundary, mixing translation units compiled with different maximum atomic widths may produce undesirable results? E.g. one TU might perform operations on a memory location using lock-based libcalls while the other uses native lock-free instructions inline. Given that distros are looking at rolling out x86-64-v2 as a default, this might become more common. But there seems to be no strategy to mitigate this (e.g. always going through a helper function in libatomic). Perhaps this kind of use of double-width atomics across ABI boundaries just doesn't happen in practice?
The intent is that linking together code with differing max atomic widths should not cause ABI incompatibility, under the assumption that libatomic always knows how to use lock-free atomics for a given size if any code linked against it might do so. That effectively implies that it must know about all microarchitecture levels the compiler knows how to build for, and potentially dynamically dispatch to the right implementation. This ensures that one TU using a libcall and another using a native operation will work together. However, we don't yet implement a compliant libatomic in LLVM, though @Logikable is going to work on that. GCC's libatomic does do this.
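As a purely illustrative sketch of that requirement (the names libatomic_cas16, cpu_has_lockfree_cas16, and hw_cas16 are invented, and a real libatomic is considerably more involved), the size-specific helper has to choose the lock-free path at runtime whenever inlined code on the running CPU could be lock-free, and may only fall back to a lock otherwise:

  #include <cstdint>
  #include <cstring>
  #include <mutex>

  struct U128 { std::uint64_t lo, hi; };

  namespace {
  std::mutex fallback_lock;  // protects the lock-based path

  bool cpu_has_lockfree_cas16() {
    // Placeholder for runtime feature detection (cmpxchg16b, LSE, Zacas, ...).
    return false;
  }

  bool hw_cas16(U128 *ptr, U128 *expected, U128 desired) {
    // Placeholder for the same lock-free instruction sequence the compiler
    // may inline on this CPU.
    (void)ptr; (void)expected; (void)desired;
    return false;
  }
  }  // namespace

  // Hypothetical stand-in for a 16-byte __atomic_compare_exchange helper.
  bool libatomic_cas16(U128 *ptr, U128 *expected, U128 desired) {
    if (cpu_has_lockfree_cas16())
      return hw_cas16(ptr, expected, desired);  // must match inlined codegen
    std::lock_guard<std::mutex> g(fallback_lock);
    if (std::memcmp(ptr, expected, sizeof(U128)) == 0) {
      *ptr = desired;
      return true;
    }
    *expected = *ptr;
    return false;
  }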
This change isn't OK - we shouldn't add any new expansions of Load to CmpXChg. Unless there is an atomic 128-bit load operation also available in RISC-V, we cannot use amocas.q for implementing atomics. We made the mistake of expanding load to cmpxchg on X86 and AArch64, but it's wrong: an atomic load implemented via compare-exchange performs a store, which (among other problems) faults on read-only memory. Luckily, on newer X86 and AArch64 microarchitectures there is support for 128-bit atomic load/store, so we can do the right thing on newer microarchitectures. GCC fixed this partially on X86: the generated code always defers to libatomic unless the 128-bit atomic load/store is available, but libatomic still has the broken behavior, for compatibility with older generated code. I believe they avoided making the mistake on AArch64. (See #75081.)
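A small sketch of that hazard (illustrative only; it assumes the 128-bit atomic load really is expanded inline to a compare-exchange rather than deferred to a libcall): a compare-exchange stores to the object, so a "load" implemented that way faults on memory the process has mapped read-only, while a true atomic load would succeed.

  #include <atomic>
  #include <cstdint>
  #include <cstdio>
  #include <new>
  #include <sys/mman.h>

  struct Wide { std::uint64_t lo, hi; };

  int main() {
    void *page = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
      return 1;
    auto *obj = new (page) std::atomic<Wide>(Wide{1, 2});
    mprotect(page, 4096, PROT_READ);  // the page is now read-only

    // If this load is lowered to a compare-exchange (a store), it faults here
    // even though the program only asked to read the value.
    Wide v = obj->load(std::memory_order_relaxed);
    std::printf("%llu %llu\n",
                static_cast<unsigned long long>(v.lo),
                static_cast<unsigned long long>(v.hi));
    munmap(page, 4096);
    return 0;
  }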
There is no 128-bit atomic load currently defined for RISC-V.
Perhaps the simplest path forward to address that would be to define an extension which has no instructions, but simply specifies that a 128-bit vector load or store is a single memory operation.
That'd be analogous to how it's done on x86/aarch64.
RISC-V doesn't always have vector instructions, and the vector ISA does not require a 128-bit vector load to be atomic. It would depend on the microarchitecture.
Yes, that's why I suggested an ISA extension that a given CPU could advertise support for, which does provide that guarantee.
And the lowerings in this patch would require that extension be present?
Yeah. For RV64 128-bit atomics support: require Zacas, V, and that new extension which promises that 128-bit vector load/store are a single memory operation. Then emit vector instructions for 128-bit atomic load/store. For RV32 64-bit atomics support: require Zacas, D, and a new extension which promises that 64-bit FLD/FSD are a single memory operation. Emit those for 64-bit atomic load/store. As an alternative to requiring the V or D extensions, an extension implementing integer load/store instructions of double-register width (reusing the encodings of LQ/SQ in RV64 and LD/SD in RV32, which aren't currently supported) could be nice. I see the latter is actually already proposed in https://github.com/riscv/riscv-zilsd/blob/main/zilsd.adoc. But even with that, you'd still need the previously-described extension promising a single memory operation for double-word loads/stores.
I'm going to close this. The libatomic ABI is a bigger blocker to this than support for 128-bit atomic load.