[RISCV] Select atomic_{load/store} to pseudos and expand them later #67108

Open · wants to merge 2 commits into base: main

Conversation

@wangpc-pp (Contributor) commented Sep 22, 2023

In the previous implementation, we inserted leading/trailing fences
in AtomicExpandPass and changed the memory order to monotonic
before ISel; atomic load/store were then selected to normal
load/store instructions.

With this patch, we no longer insert fences for atomic load/store
before ISel; instead, we select them to pseudos and expand them, just
like the other atomic operations, in RISCVExpandAtomicPseudo (a sketch
of the resulting expansion is shown after the list below).

I do this for two reasons:

  1. All atomic operations except atomic load/store are expanded in
    RISCVExpandAtomicPseudo, which is inconsistent and a little
    confusing when digging into the implementation.

  2. For some hardware implementations, load+fence and fence+store
    can be fused into optimized macro instructions. This requires that
    the fence and the load/store be glued together. We could achieve
    this by defining DAGMutations in RISCVMacroFusion.cpp, but I think
    this expansion method is more straightforward, since we treat
    atomic load/store as whole instructions.
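
For illustration (not part of the patch), here is a minimal sketch of what this expansion produces for sequentially consistent accesses on RV64 without Ztso. The expected instruction sequences, derived from the fence mapping in the expansion code below, are shown as comments; the register choices are illustrative only.

#include <atomic>

std::atomic<long> X;

// Selected to PseudoAtomicLD and expanded in RISCVExpandAtomicPseudo to,
// roughly:
//   fence rw, rw    // leading fence for a seq_cst load
//   ld    a0, 0(a1)
//   fence r, rw     // trailing fence for an acquire-or-stronger load
long loadSeqCst() { return X.load(std::memory_order_seq_cst); }

// Selected to PseudoAtomicSD and expanded to, roughly:
//   fence rw, w     // leading fence for a release-or-stronger store
//   sd    a0, 0(a1)
//   fence rw, rw    // trailing fence, only with enableSeqCstTrailingFence()
void storeSeqCst(long V) { X.store(V, std::memory_order_seq_cst); }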

@llvmbot (Collaborator) commented Sep 22, 2023

@llvm/pr-subscribers-backend-risc-v

Full diff: https://github.com/llvm/llvm-project/pull/67108.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp (+105-1)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (-33)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (-8)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+35-3)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoA.td (+84-10)
diff --git a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
index bb772fc5da92244..ba71b18d77c7abb 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
@@ -48,6 +48,9 @@ class RISCVExpandAtomicPseudo : public MachineFunctionPass {
   bool expandMBB(MachineBasicBlock &MBB);
   bool expandMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
                 MachineBasicBlock::iterator &NextMBBI);
+  bool expandAtomicLoadStore(MachineBasicBlock &MBB,
+                             MachineBasicBlock::iterator MBBI,
+                             unsigned int Opcode, bool IsLoad);
   bool expandAtomicBinOp(MachineBasicBlock &MBB,
                          MachineBasicBlock::iterator MBBI, AtomicRMWInst::BinOp,
                          bool IsMasked, int Width,
@@ -111,6 +114,22 @@ bool RISCVExpandAtomicPseudo::expandMI(MachineBasicBlock &MBB,
   // expanded instructions for each pseudo is correct in the Size field of the
   // tablegen definition for the pseudo.
   switch (MBBI->getOpcode()) {
+  case RISCV::PseudoAtomicLB:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LB, true);
+  case RISCV::PseudoAtomicLH:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LH, true);
+  case RISCV::PseudoAtomicLW:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LW, true);
+  case RISCV::PseudoAtomicLD:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LD, true);
+  case RISCV::PseudoAtomicSB:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SB, false);
+  case RISCV::PseudoAtomicSH:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SH, false);
+  case RISCV::PseudoAtomicSW:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SW, false);
+  case RISCV::PseudoAtomicSD:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SD, false);
   case RISCV::PseudoAtomicLoadNand32:
     return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Nand, false, 32,
                              NextMBBI);
@@ -385,6 +404,91 @@ static void doMaskedAtomicBinOpExpansion(const RISCVInstrInfo *TII,
       .addMBB(LoopMBB);
 }
 
+static void insertFence(const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+                        MachineBasicBlock::iterator MBBI, DebugLoc DL,
+                        AtomicOrdering Ordering) {
+  // fence acq_rel -> fence.tso
+  if (Ordering == AtomicOrdering::AcquireRelease) {
+    BuildMI(MBB, MBBI, DL, TII->get(RISCV::FENCE_TSO));
+  } else {
+    int Pred, Succ;
+    switch (Ordering) {
+    default:
+      llvm_unreachable("Unsupported AtomicOrdering");
+    case AtomicOrdering::Acquire:
+      // fence acquire -> fence r, rw
+      Pred = 0b10;
+      Succ = 0b11;
+      break;
+    case AtomicOrdering::Release:
+      // fence release -> fence rw, w
+      Pred = 0b11;
+      Succ = 0b01;
+      break;
+    case AtomicOrdering::SequentiallyConsistent:
+      // fence seq_cst -> fence rw, rw
+      Pred = 0b11;
+      Succ = 0b11;
+      break;
+    }
+    BuildMI(MBB, MBBI, DL, TII->get(RISCV::FENCE)).addImm(Pred).addImm(Succ);
+  }
+}
+
+static void emitLeadingFence(const RISCVSubtarget *Subtarget,
+                             const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+                             MachineBasicBlock::iterator MBBI, DebugLoc DL,
+                             AtomicOrdering Ordering, bool IsLoad) {
+  if (Subtarget->hasStdExtZtso()) {
+    if (IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent)
+      insertFence(TII, MBB, MBBI, DL, Ordering);
+    return;
+  }
+
+  if (IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent) {
+    insertFence(TII, MBB, MBBI, DL, Ordering);
+    return;
+  }
+
+  if (!IsLoad && isReleaseOrStronger(Ordering))
+    insertFence(TII, MBB, MBBI, DL, AtomicOrdering::Release);
+}
+
+static void emitTrailingFence(const RISCVSubtarget *Subtarget,
+                              const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+                              MachineBasicBlock::iterator MBBI, DebugLoc DL,
+                              AtomicOrdering Ordering, bool IsLoad) {
+  if (Subtarget->hasStdExtZtso()) {
+    if (!IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent)
+      insertFence(TII, MBB, MBBI, DL, Ordering);
+    return;
+  }
+
+  if (IsLoad && isAcquireOrStronger(Ordering)) {
+    insertFence(TII, MBB, MBBI, DL, AtomicOrdering::Acquire);
+    return;
+  }
+
+  if (Subtarget->enableSeqCstTrailingFence() && !IsLoad &&
+      Ordering == AtomicOrdering::SequentiallyConsistent)
+    insertFence(TII, MBB, MBBI, DL, AtomicOrdering::SequentiallyConsistent);
+}
+
+bool RISCVExpandAtomicPseudo::expandAtomicLoadStore(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+    unsigned int Opcode, bool IsLoad) {
+  auto Ordering = static_cast<AtomicOrdering>(MBBI->getOperand(3).getImm());
+  DebugLoc DL = MBBI->getDebugLoc();
+  emitLeadingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
+  BuildMI(MBB, MBBI, DL, TII->get(Opcode))
+      .add(MBBI->getOperand(0))
+      .add(MBBI->getOperand(1))
+      .add(MBBI->getOperand(2));
+  emitTrailingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
+  MBBI->eraseFromParent();
+  return true;
+}
+
 bool RISCVExpandAtomicPseudo::expandAtomicBinOp(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
     AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
@@ -554,7 +658,7 @@ bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
   computeAndAddLiveIns(LiveRegs, *DoneMBB);
 
   return true;
-}
+  }
 
 // If a BNE on the cmpxchg comparison result immediately follows the cmpxchg
 // operation, it can be folded into the cmpxchg expansion by
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f1cea6c6756f4fc..552666e8abbd88a 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17770,39 +17770,6 @@ void RISCVTargetLowering::LowerAsmOperandForConstraint(
   TargetLowering::LowerAsmOperandForConstraint(Op, Constraint, Ops, DAG);
 }
 
-Instruction *RISCVTargetLowering::emitLeadingFence(IRBuilderBase &Builder,
-                                                   Instruction *Inst,
-                                                   AtomicOrdering Ord) const {
-  if (Subtarget.hasStdExtZtso()) {
-    if (isa<LoadInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
-      return Builder.CreateFence(Ord);
-    return nullptr;
-  }
-
-  if (isa<LoadInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
-    return Builder.CreateFence(Ord);
-  if (isa<StoreInst>(Inst) && isReleaseOrStronger(Ord))
-    return Builder.CreateFence(AtomicOrdering::Release);
-  return nullptr;
-}
-
-Instruction *RISCVTargetLowering::emitTrailingFence(IRBuilderBase &Builder,
-                                                    Instruction *Inst,
-                                                    AtomicOrdering Ord) const {
-  if (Subtarget.hasStdExtZtso()) {
-    if (isa<StoreInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
-      return Builder.CreateFence(Ord);
-    return nullptr;
-  }
-
-  if (isa<LoadInst>(Inst) && isAcquireOrStronger(Ord))
-    return Builder.CreateFence(AtomicOrdering::Acquire);
-  if (Subtarget.enableSeqCstTrailingFence() && isa<StoreInst>(Inst) &&
-      Ord == AtomicOrdering::SequentiallyConsistent)
-    return Builder.CreateFence(AtomicOrdering::SequentiallyConsistent);
-  return nullptr;
-}
-
 TargetLowering::AtomicExpansionKind
 RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
   // atomicrmw {fadd,fsub} must be expanded to use compare-exchange, as floating
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 815b9be47f56026..52ace789a702c07 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -613,14 +613,6 @@ class RISCVTargetLowering : public TargetLowering {
 
   bool preferZeroCompareBranch() const override { return true; }
 
-  bool shouldInsertFencesForAtomic(const Instruction *I) const override {
-    return isa<LoadInst>(I) || isa<StoreInst>(I);
-  }
-  Instruction *emitLeadingFence(IRBuilderBase &Builder, Instruction *Inst,
-                                AtomicOrdering Ord) const override;
-  Instruction *emitTrailingFence(IRBuilderBase &Builder, Instruction *Inst,
-                                 AtomicOrdering Ord) const override;
-
   bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
                                   EVT VT) const override;
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 816ceaf95607e71..e56a91026702e31 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1270,11 +1270,45 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
   if (MI.isMetaInstruction())
     return 0;
 
+  const MachineFunction &MF = *MI.getParent()->getParent();
+  const auto &ST = MF.getSubtarget<RISCVSubtarget>();
+
   unsigned Opcode = MI.getOpcode();
+  switch (Opcode) {
+  default:
+    break;
+  case RISCV::PseudoAtomicLB:
+  case RISCV::PseudoAtomicLH:
+  case RISCV::PseudoAtomicLW:
+  case RISCV::PseudoAtomicLD: {
+    auto Ordering = static_cast<AtomicOrdering>(MI.getOperand(3).getImm());
+    switch (Ordering) {
+    default:
+      return 4;
+    case AtomicOrdering::Acquire:
+      return 8;
+    case AtomicOrdering::SequentiallyConsistent:
+      return ST.hasStdExtZtso() ? 8 : 12;
+    }
+  }
+  case RISCV::PseudoAtomicSB:
+  case RISCV::PseudoAtomicSH:
+  case RISCV::PseudoAtomicSW:
+  case RISCV::PseudoAtomicSD: {
+    auto Ordering = static_cast<AtomicOrdering>(MI.getOperand(3).getImm());
+    switch (Ordering) {
+    default:
+      return 4;
+    case AtomicOrdering::Release:
+      return 8;
+    case AtomicOrdering::SequentiallyConsistent:
+      return ST.hasStdExtZtso() ? 8 : ST.enableSeqCstTrailingFence() ? 12 : 8;
+    }
+  }
+  }
 
   if (Opcode == TargetOpcode::INLINEASM ||
       Opcode == TargetOpcode::INLINEASM_BR) {
-    const MachineFunction &MF = *MI.getParent()->getParent();
     const auto &TM = static_cast<const RISCVTargetMachine &>(MF.getTarget());
     return getInlineAsmLength(MI.getOperand(0).getSymbolName(),
                               *TM.getMCAsmInfo());
@@ -1282,8 +1316,6 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
 
   if (!MI.memoperands_empty()) {
     MachineMemOperand *MMO = *(MI.memoperands_begin());
-    const MachineFunction &MF = *MI.getParent()->getParent();
-    const auto &ST = MF.getSubtarget<RISCVSubtarget>();
     if (ST.hasStdExtZihintntl() && MMO->isNonTemporal()) {
       if (ST.hasStdExtCOrZca() && ST.enableRVCHintInstrs()) {
         if (isCompressibleInst(MI, STI))
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
index c43af14bb7f7005..5d806de621f61be 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
@@ -110,21 +110,95 @@ defm AMOCAS_Q : AMO_rr_aq_rl<0b00101, 0b100, "amocas.q">;
 //===----------------------------------------------------------------------===//
 
 // Atomic load/store are available under both +a and +force-atomics.
-// Fences will be inserted for atomic load/stores according to the logic in
-// RISCVTargetLowering::{emitLeadingFence,emitTrailingFence}.
+
+// Pseudo atomic load instructions.
+class PseudoAtomicLoad
+    : Pseudo<(outs GPR:$rd), (ins GPRMem:$rs1, simm12:$imm12, ixlenimm:$ordering), []> {
+  let hasSideEffects = 1;
+  let mayLoad = 1;
+  let mayStore = 0;
+}
+
+// Pseudo atomic store instructions.
+class PseudoAtomicStore
+    : Pseudo<(outs), (ins GPR:$rs2, GPRMem:$rs1, simm12:$imm12, ixlenimm:$ordering), []> {
+  let hasSideEffects = 1;
+  let mayLoad = 0;
+  let mayStore = 1;
+}
+
+let IsAtomic = 1 in {
+class AtomicLoadPatFrag<PatFrags base> : PatFrag<(ops node:$ptr), (base node:$ptr)>;
+class AtomicStorePatFrag<PatFrags base> : PatFrag<(ops node:$val, node:$ptr),
+                                                 (base node:$val, node:$ptr)>;
+}
+
+// An atomic load operation that doesn't actually need to be atomic.
+let IsAtomicOrderingAcquireOrStronger = 0 in
+class relaxed_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic load operation that needs acquire semantics.
+let IsAtomicOrderingAcquire = 1 in
+class acquire_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic load operation that needs sequential consistency semantics.
+let IsAtomicOrderingSequentiallyConsistent = 1 in
+class seq_cst_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic store operation that doesn't actually need to be atomic.
+let IsAtomicOrderingReleaseOrStronger = 0 in
+class relaxed_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+// An atomic store operation that needs release semantics.
+let IsAtomicOrderingRelease = 1 in
+class release_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+// An atomic store operation that needs sequential consistency semantics.
+let IsAtomicOrderingSequentiallyConsistent = 1 in
+class seq_cst_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+multiclass AtomicLdPat<PatFrag LoadOp, RVInst Inst> {
+  def : Pat<(XLenVT (relaxed_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+            (Inst GPR:$rs1, simm12:$imm12, 2)>;
+  def : Pat<(XLenVT (acquire_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+            (Inst GPR:$rs1, simm12:$imm12, 4)>;
+  def : Pat<(XLenVT (seq_cst_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+            (Inst GPR:$rs1, simm12:$imm12, 7)>;
+}
+
+multiclass AtomicStPat<PatFrag StoreOp, RVInst Inst> {
+  def : Pat<(relaxed_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+            (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 2)>;
+  def : Pat<(release_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+            (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 5)>;
+  def : Pat<(seq_cst_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+            (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 7)>;
+}
+
 let Predicates = [HasAtomicLdSt] in {
-  def : LdPat<atomic_load_8,  LB>;
-  def : LdPat<atomic_load_16, LH>;
-  def : LdPat<atomic_load_32, LW>;
+  def PseudoAtomicLB : PseudoAtomicLoad;
+  def PseudoAtomicLH : PseudoAtomicLoad;
+  def PseudoAtomicLW : PseudoAtomicLoad;
+
+  def PseudoAtomicSB : PseudoAtomicStore;
+  def PseudoAtomicSH : PseudoAtomicStore;
+  def PseudoAtomicSW : PseudoAtomicStore;
 
-  def : StPat<atomic_store_8,  SB, GPR, XLenVT>;
-  def : StPat<atomic_store_16, SH, GPR, XLenVT>;
-  def : StPat<atomic_store_32, SW, GPR, XLenVT>;
+  defm : AtomicLdPat<atomic_load_8,  PseudoAtomicLB>;
+  defm : AtomicLdPat<atomic_load_16, PseudoAtomicLH>;
+  defm : AtomicLdPat<atomic_load_32, PseudoAtomicLW>;
+
+  defm : AtomicStPat<atomic_store_8,  PseudoAtomicSB>;
+  defm : AtomicStPat<atomic_store_16, PseudoAtomicSH>;
+  defm : AtomicStPat<atomic_store_32, PseudoAtomicSW>;
 }
 
 let Predicates = [HasAtomicLdSt, IsRV64] in {
-  def : LdPat<atomic_load_64, LD, i64>;
-  def : StPat<atomic_store_64, SD, GPR, i64>;
+  def PseudoAtomicLD : PseudoAtomicLoad;
+  def PseudoAtomicSD : PseudoAtomicStore;
+
+  defm : AtomicLdPat<atomic_load_64, PseudoAtomicLD>;
+  defm : AtomicStPat<atomic_store_64, PseudoAtomicSD>;
 }
 
 /// AMOs

github-actions bot commented Sep 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@nikic nikic removed their request for review September 25, 2023 19:11
@wangpc-pp (Contributor, Author):

If we want to make the fence and the load/store schedulable separately, we can just expand them before RA, in RISCVPreRAExpandPseudo.

@wangpc-pp (Contributor, Author):

Ping.
Any feedback is welcome!

@wangpc-pp (Contributor, Author):

Ping.

@asb (Contributor) commented Oct 20, 2023

First of all, sorry that this fell off the review queue.

Some thoughts:

  • The diff vs. what we have today isn't massive, so I don't have a strong aversion to this, though particularly on the TableGen side there is a bit more target-specific complexity, I think.
  • You're right that in general atomic operations are expanded later on, though I wouldn't say it necessarily follows that it's better to do them all the same way. Alternatively, you could view the current setup as the RISC-V backend using the target-independent hooks wherever possible for atomics codegen and falling back to late-stage expansion via a target-specific codepath only when necessary.
  • The docs on atomics in LLVM (https://llvm.org/docs/Atomics.html) are, I think, actually pretty good, and I tried to expand them to properly cover what we do on RISC-V, so that should hopefully help anyone diving into it. It's hard for me to judge how much the docs make sense for someone who has never messed with atomics lowering in LLVM, though!
  • If we go in this direction, please add a doc comment to record the motivation for not using the emitLeadingFence/emitTrailingFence hooks.
  • Your second reason does make sense to me, but it would be good to get a second opinion - e.g. @preames @topperc

@asb (Contributor) commented Oct 20, 2023

Oh, and was #69685 just an opportune moment to re-advertise this patch, or did you see that work interacting with this change in some way? If so, could you elaborate, please? Many thanks.

@jrtc27 (Collaborator) commented Oct 20, 2023

Do other architectures use this approach? Having custom pseudo-based lowering for something SDAG can represent directly just fine is pretty sad (and adds more work for us downstream, too); it's a bunch of code that doesn't really need to exist. I would much rather see the generic lowering adapted to support the optimisations you want (if they are in fact important) than make things clunkier like this. I won't block the patch, but I will heavily encourage this change not to be applied if at all possible.

@wangpc-pp (Contributor, Author) commented Oct 23, 2023

Oh, and was #69685 just an opportune moment to re-advertise this patch - or did you see that work interacting with this change in some way? if so, could you elaborate please? Many thanks.

The Zalasr extension provides instructions for atomic load/store (for example, load-acquire and store-release). If we are going to support CodeGen for this extension, then we shouldn't expand atomic load/store before ISel, or the patterns may not be matched, because we may insert some LLVM IR between the fence instruction and the load/store (and we may not be able to describe the patterns in TableGen). On AArch64, this kind of atomic load/store instruction is selected in the same way as in this PR (actually, this PR is inspired by the AArch64 target :-)), except that they are selected to actual instructions instead of pseudos.

Fused atomic load/store is just like the Zalasr extension, except for the addressing mode (Zalasr supports zero-offset addresses only, while fused atomic load/store can use the same addressing as normal load/store instructions). The fused version of atomic load/store is just a workaround, since RISC-V lacks such instructions, while Zalasr fills that hole.

Before this PR, the lowering path of atomic load/store is:

atomic load/store (LLVM IR) -> *Expand Atomic instructions Pass* -> monotonic load/store with leading/trailing fence (LLVM IR) -> SDAG -> *ISel* -> normal load/store instructions with leading/trailing fence

After this PR, it will be like:

atomic load/store (LLVM IR) -> SDAG -> *ISel* -> pseudo atomic load/store -> *RISC-V atomic pseudo instruction expansion pass* -> normal load/store instructions with leading/trailing fence

If Zalasr is enabled, then we can expand pseudo atomic load/store to instructions in Zalasr instead of normal load/store instructions with leading/trailing fence.
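
As a very rough sketch (not part of this PR), the expansion hook from this patch could branch on Zalasr roughly as follows. hasStdExtZalasr() and RISCV::LW_AQ are placeholder names, only the plain acquire-load case is shown, and the zero-offset check reflects Zalasr's addressing restriction:

// Hypothetical variant of RISCVExpandAtomicPseudo::expandAtomicLoadStore:
// with Zalasr, emit a single load-acquire instead of a load plus trailing
// fence. STI, TII and the operand layout are as in the patch above.
bool RISCVExpandAtomicPseudo::expandAtomicLoadStore(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
    unsigned int Opcode, bool IsLoad) {
  auto Ordering = static_cast<AtomicOrdering>(MBBI->getOperand(3).getImm());
  DebugLoc DL = MBBI->getDebugLoc();
  if (STI->hasStdExtZalasr() /*placeholder*/ && IsLoad &&
      Ordering == AtomicOrdering::Acquire && Opcode == RISCV::LW &&
      MBBI->getOperand(2).isImm() && MBBI->getOperand(2).getImm() == 0) {
    // Zalasr only supports zero-offset addressing, hence the checks above.
    BuildMI(MBB, MBBI, DL, TII->get(RISCV::LW_AQ /*placeholder*/))
        .add(MBBI->getOperand(0))
        .add(MBBI->getOperand(1));
    MBBI->eraseFromParent();
    return true;
  }
  // Otherwise, fall back to the fence-based expansion from this patch.
  emitLeadingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
  BuildMI(MBB, MBBI, DL, TII->get(Opcode))
      .add(MBBI->getOperand(0))
      .add(MBBI->getOperand(1))
      .add(MBBI->getOperand(2));
  emitTrailingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
  MBBI->eraseFromParent();
  return true;
}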

I don't know whether the above has made my intention for this PR clearer; let me know if there are any questions. :-)
CC @jrtc27

@topperc (Collaborator) commented Oct 23, 2023

If we have Zalasr, wouldn't we disable the fence emission rather than trying to pattern match it?

@jrtc27 (Collaborator) commented Oct 23, 2023

Yes, just don't expand them into memory access + fence... you don't need a pseudo for that

@wangpc-pp (Contributor, Author) commented Oct 23, 2023

If we have Zalasr, wouldn't we disable the fence emission rather than trying to pattern match it?
Yes, just don't expand them into memory access + fence... you don't need a pseudo for that

Yes, I agree. We can just do the instruction selection like AArch64 does, and the patterns will be like those in this PR, except that the selected instructions would be Zalasr instructions instead of pseudos.

The pseudos exist to unify the implementation of Zalasr and fused atomic load/store; can that be a strong enough reason? One more piece of information: the MacroFusion approach may not be perfect, because some fence+load/store pairs are still separated when the post-RA scheduler is enabled. After some rough investigation, this may be because there is no data dependency between the fence and the load/store, and the dependency analysis is not smart enough to keep them together.
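
For reference, here is a minimal sketch (not part of this PR) of what fence+load/store fusion predicates in the style of RISCVMacroFusion.cpp might look like; the opcode checks are simplified and any subtarget feature gating is omitted:

// Sketch of macro-fusion predicates that try to keep a FENCE adjacent to the
// memory access it guards. A real implementation would also check the fence's
// pred/succ operands and gate on a subtarget fusion feature.
static bool isFenceStorePair(const MachineInstr *FirstMI,
                             const MachineInstr &SecondMI) {
  // fence+store: SecondMI must be a store; FirstMI (if already chosen by the
  // scheduler) must be a FENCE. A null FirstMI means "try to form the pair".
  switch (SecondMI.getOpcode()) {
  default:
    return false;
  case RISCV::SB:
  case RISCV::SH:
  case RISCV::SW:
  case RISCV::SD:
    break;
  }
  return !FirstMI || FirstMI->getOpcode() == RISCV::FENCE;
}

static bool isLoadFencePair(const MachineInstr *FirstMI,
                            const MachineInstr &SecondMI) {
  // load+fence: SecondMI must be a FENCE; FirstMI (if any) must be a load.
  if (SecondMI.getOpcode() != RISCV::FENCE)
    return false;
  if (!FirstMI)
    return true;
  switch (FirstMI->getOpcode()) {
  default:
    return false;
  case RISCV::LB:
  case RISCV::LH:
  case RISCV::LW:
  case RISCV::LD:
    return true;
  }
}

// Standard signature for predicates used by the scheduler's macro-fusion
// DAG mutation.
static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
                                   const TargetSubtargetInfo &TSI,
                                   const MachineInstr *FirstMI,
                                   const MachineInstr &SecondMI) {
  return isLoadFencePair(FirstMI, SecondMI) ||
         isFenceStorePair(FirstMI, SecondMI);
}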
