[RISCV] Select atomic_{load/store} to pseudos and expand them later #67108

Open · wants to merge 2 commits into base: main

Conversation

@wangpc-pp (Contributor) commented Sep 22, 2023

In the previous implementation, we inserted leading/trailing fences
in AtomicExpandPass and changed the memory order to monotonic
before ISel; atomic load/store were then selected to normal
load/store instructions.

With this patch, we no longer insert fences for atomic load/store
before ISel; instead, we select them to pseudos and expand them, just
like the other atomic operations, in RISCVExpandAtomicPseudo (a sketch
of the resulting expansion is shown after the list below).

I do this for two reasons:

  1. All atomic operations except atomic load/store are expanded in
    RISCVExpandAtomicPseudo, which is inconsistent and a little
    confusing when digging into the implementation.

  2. For some hardware implementations, load+fence and fence+store
    can be fused into optimized macro instructions. This requires that
    the fence and the load/store be glued together. We could achieve
    this by defining DAGMutations in RISCVMacroFusion.cpp, but I think
    this expansion method is more straightforward, since we treat
    atomic load/store as whole instructions.
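
For illustration (not part of the patch), here is a minimal sketch of what this expansion produces for sequentially consistent accesses on RV64 without Ztso. The expected instruction sequences, derived from the fence mapping in the expansion code below, are shown as comments; the register choices are illustrative only.

#include <atomic>

std::atomic<long> X;

// Selected to PseudoAtomicLD and expanded in RISCVExpandAtomicPseudo to,
// roughly:
//   fence rw, rw    // leading fence for a seq_cst load
//   ld    a0, 0(a1)
//   fence r, rw     // trailing fence for an acquire-or-stronger load
long loadSeqCst() { return X.load(std::memory_order_seq_cst); }

// Selected to PseudoAtomicSD and expanded to, roughly:
//   fence rw, w     // leading fence for a release-or-stronger store
//   sd    a0, 0(a1)
//   fence rw, rw    // trailing fence, only with enableSeqCstTrailingFence()
void storeSeqCst(long V) { X.store(V, std::memory_order_seq_cst); }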

@llvmbot (Collaborator) commented Sep 22, 2023

@llvm/pr-subscribers-backend-risc-v

Full diff: https://github.com/llvm/llvm-project/pull/67108.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp (+105-1)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (-33)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (-8)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+35-3)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoA.td (+84-10)
diff --git a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
index bb772fc5da92244..ba71b18d77c7abb 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
@@ -48,6 +48,9 @@ class RISCVExpandAtomicPseudo : public MachineFunctionPass {
   bool expandMBB(MachineBasicBlock &MBB);
   bool expandMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
                 MachineBasicBlock::iterator &NextMBBI);
+  bool expandAtomicLoadStore(MachineBasicBlock &MBB,
+                             MachineBasicBlock::iterator MBBI,
+                             unsigned int Opcode, bool IsLoad);
   bool expandAtomicBinOp(MachineBasicBlock &MBB,
                          MachineBasicBlock::iterator MBBI, AtomicRMWInst::BinOp,
                          bool IsMasked, int Width,
@@ -111,6 +114,22 @@ bool RISCVExpandAtomicPseudo::expandMI(MachineBasicBlock &MBB,
   // expanded instructions for each pseudo is correct in the Size field of the
   // tablegen definition for the pseudo.
   switch (MBBI->getOpcode()) {
+  case RISCV::PseudoAtomicLB:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LB, true);
+  case RISCV::PseudoAtomicLH:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LH, true);
+  case RISCV::PseudoAtomicLW:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LW, true);
+  case RISCV::PseudoAtomicLD:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::LD, true);
+  case RISCV::PseudoAtomicSB:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SB, false);
+  case RISCV::PseudoAtomicSH:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SH, false);
+  case RISCV::PseudoAtomicSW:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SW, false);
+  case RISCV::PseudoAtomicSD:
+    return expandAtomicLoadStore(MBB, MBBI, RISCV::SD, false);
   case RISCV::PseudoAtomicLoadNand32:
     return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Nand, false, 32,
                              NextMBBI);
@@ -385,6 +404,91 @@ static void doMaskedAtomicBinOpExpansion(const RISCVInstrInfo *TII,
       .addMBB(LoopMBB);
 }
 
+static void insertFence(const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+                        MachineBasicBlock::iterator MBBI, DebugLoc DL,
+                        AtomicOrdering Ordering) {
+  // fence acq_rel -> fence.tso
+  if (Ordering == AtomicOrdering::AcquireRelease) {
+    BuildMI(MBB, MBBI, DL, TII->get(RISCV::FENCE_TSO));
+  } else {
+    int Pred, Succ;
+    switch (Ordering) {
+    default:
+      llvm_unreachable("Unsupported AtomicOrdering");
+    case AtomicOrdering::Acquire:
+      // fence acquire -> fence r, rw
+      Pred = 0b10;
+      Succ = 0b11;
+      break;
+    case AtomicOrdering::Release:
+      // fence release -> fence rw, w
+      Pred = 0b11;
+      Succ = 0b01;
+      break;
+    case AtomicOrdering::SequentiallyConsistent:
+      // fence seq_cst -> fence rw, rw
+      Pred = 0b11;
+      Succ = 0b11;
+      break;
+    }
+    BuildMI(MBB, MBBI, DL, TII->get(RISCV::FENCE)).addImm(Pred).addImm(Succ);
+  }
+}
+
+static void emitLeadingFence(const RISCVSubtarget *Subtarget,
+                             const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+                             MachineBasicBlock::iterator MBBI, DebugLoc DL,
+                             AtomicOrdering Ordering, bool IsLoad) {
+  if (Subtarget->hasStdExtZtso()) {
+    if (IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent)
+      insertFence(TII, MBB, MBBI, DL, Ordering);
+    return;
+  }
+
+  if (IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent) {
+    insertFence(TII, MBB, MBBI, DL, Ordering);
+    return;
+  }
+
+  if (!IsLoad && isReleaseOrStronger(Ordering))
+    insertFence(TII, MBB, MBBI, DL, AtomicOrdering::Release);
+}
+
+static void emitTrailingFence(const RISCVSubtarget *Subtarget,
+                              const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+                              MachineBasicBlock::iterator MBBI, DebugLoc DL,
+                              AtomicOrdering Ordering, bool IsLoad) {
+  if (Subtarget->hasStdExtZtso()) {
+    if (!IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent)
+      insertFence(TII, MBB, MBBI, DL, Ordering);
+    return;
+  }
+
+  if (IsLoad && isAcquireOrStronger(Ordering)) {
+    insertFence(TII, MBB, MBBI, DL, AtomicOrdering::Acquire);
+    return;
+  }
+
+  if (Subtarget->enableSeqCstTrailingFence() && !IsLoad &&
+      Ordering == AtomicOrdering::SequentiallyConsistent)
+    insertFence(TII, MBB, MBBI, DL, AtomicOrdering::SequentiallyConsistent);
+}
+
+bool RISCVExpandAtomicPseudo::expandAtomicLoadStore(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+    unsigned int Opcode, bool IsLoad) {
+  auto Ordering = static_cast<AtomicOrdering>(MBBI->getOperand(3).getImm());
+  DebugLoc DL = MBBI->getDebugLoc();
+  emitLeadingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
+  BuildMI(MBB, MBBI, DL, TII->get(Opcode))
+      .add(MBBI->getOperand(0))
+      .add(MBBI->getOperand(1))
+      .add(MBBI->getOperand(2));
+  emitTrailingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
+  MBBI->eraseFromParent();
+  return true;
+}
+
 bool RISCVExpandAtomicPseudo::expandAtomicBinOp(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
     AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
@@ -554,7 +658,7 @@ bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
   computeAndAddLiveIns(LiveRegs, *DoneMBB);
 
   return true;
-}
+  }
 
 // If a BNE on the cmpxchg comparison result immediately follows the cmpxchg
 // operation, it can be folded into the cmpxchg expansion by
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f1cea6c6756f4fc..552666e8abbd88a 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17770,39 +17770,6 @@ void RISCVTargetLowering::LowerAsmOperandForConstraint(
   TargetLowering::LowerAsmOperandForConstraint(Op, Constraint, Ops, DAG);
 }
 
-Instruction *RISCVTargetLowering::emitLeadingFence(IRBuilderBase &Builder,
-                                                   Instruction *Inst,
-                                                   AtomicOrdering Ord) const {
-  if (Subtarget.hasStdExtZtso()) {
-    if (isa<LoadInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
-      return Builder.CreateFence(Ord);
-    return nullptr;
-  }
-
-  if (isa<LoadInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
-    return Builder.CreateFence(Ord);
-  if (isa<StoreInst>(Inst) && isReleaseOrStronger(Ord))
-    return Builder.CreateFence(AtomicOrdering::Release);
-  return nullptr;
-}
-
-Instruction *RISCVTargetLowering::emitTrailingFence(IRBuilderBase &Builder,
-                                                    Instruction *Inst,
-                                                    AtomicOrdering Ord) const {
-  if (Subtarget.hasStdExtZtso()) {
-    if (isa<StoreInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
-      return Builder.CreateFence(Ord);
-    return nullptr;
-  }
-
-  if (isa<LoadInst>(Inst) && isAcquireOrStronger(Ord))
-    return Builder.CreateFence(AtomicOrdering::Acquire);
-  if (Subtarget.enableSeqCstTrailingFence() && isa<StoreInst>(Inst) &&
-      Ord == AtomicOrdering::SequentiallyConsistent)
-    return Builder.CreateFence(AtomicOrdering::SequentiallyConsistent);
-  return nullptr;
-}
-
 TargetLowering::AtomicExpansionKind
 RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
   // atomicrmw {fadd,fsub} must be expanded to use compare-exchange, as floating
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 815b9be47f56026..52ace789a702c07 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -613,14 +613,6 @@ class RISCVTargetLowering : public TargetLowering {
 
   bool preferZeroCompareBranch() const override { return true; }
 
-  bool shouldInsertFencesForAtomic(const Instruction *I) const override {
-    return isa<LoadInst>(I) || isa<StoreInst>(I);
-  }
-  Instruction *emitLeadingFence(IRBuilderBase &Builder, Instruction *Inst,
-                                AtomicOrdering Ord) const override;
-  Instruction *emitTrailingFence(IRBuilderBase &Builder, Instruction *Inst,
-                                 AtomicOrdering Ord) const override;
-
   bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
                                   EVT VT) const override;
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 816ceaf95607e71..e56a91026702e31 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1270,11 +1270,45 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
   if (MI.isMetaInstruction())
     return 0;
 
+  const MachineFunction &MF = *MI.getParent()->getParent();
+  const auto &ST = MF.getSubtarget<RISCVSubtarget>();
+
   unsigned Opcode = MI.getOpcode();
+  switch (Opcode) {
+  default:
+    break;
+  case RISCV::PseudoAtomicLB:
+  case RISCV::PseudoAtomicLH:
+  case RISCV::PseudoAtomicLW:
+  case RISCV::PseudoAtomicLD: {
+    auto Ordering = static_cast<AtomicOrdering>(MI.getOperand(3).getImm());
+    switch (Ordering) {
+    default:
+      return 4;
+    case AtomicOrdering::Acquire:
+      return 8;
+    case AtomicOrdering::SequentiallyConsistent:
+      return ST.hasStdExtZtso() ? 8 : 12;
+    }
+  }
+  case RISCV::PseudoAtomicSB:
+  case RISCV::PseudoAtomicSH:
+  case RISCV::PseudoAtomicSW:
+  case RISCV::PseudoAtomicSD: {
+    auto Ordering = static_cast<AtomicOrdering>(MI.getOperand(3).getImm());
+    switch (Ordering) {
+    default:
+      return 4;
+    case AtomicOrdering::Release:
+      return 8;
+    case AtomicOrdering::SequentiallyConsistent:
+      return ST.hasStdExtZtso() ? 8 : ST.enableSeqCstTrailingFence() ? 12 : 8;
+    }
+  }
+  }
 
   if (Opcode == TargetOpcode::INLINEASM ||
       Opcode == TargetOpcode::INLINEASM_BR) {
-    const MachineFunction &MF = *MI.getParent()->getParent();
     const auto &TM = static_cast<const RISCVTargetMachine &>(MF.getTarget());
     return getInlineAsmLength(MI.getOperand(0).getSymbolName(),
                               *TM.getMCAsmInfo());
@@ -1282,8 +1316,6 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
 
   if (!MI.memoperands_empty()) {
     MachineMemOperand *MMO = *(MI.memoperands_begin());
-    const MachineFunction &MF = *MI.getParent()->getParent();
-    const auto &ST = MF.getSubtarget<RISCVSubtarget>();
     if (ST.hasStdExtZihintntl() && MMO->isNonTemporal()) {
       if (ST.hasStdExtCOrZca() && ST.enableRVCHintInstrs()) {
         if (isCompressibleInst(MI, STI))
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
index c43af14bb7f7005..5d806de621f61be 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
@@ -110,21 +110,95 @@ defm AMOCAS_Q : AMO_rr_aq_rl<0b00101, 0b100, "amocas.q">;
 //===----------------------------------------------------------------------===//
 
 // Atomic load/store are available under both +a and +force-atomics.
-// Fences will be inserted for atomic load/stores according to the logic in
-// RISCVTargetLowering::{emitLeadingFence,emitTrailingFence}.
+
+// Pseudo atomic load instructions.
+class PseudoAtomicLoad
+    : Pseudo<(outs GPR:$rd), (ins GPRMem:$rs1, simm12:$imm12, ixlenimm:$ordering), []> {
+  let hasSideEffects = 1;
+  let mayLoad = 1;
+  let mayStore = 0;
+}
+
+// Pseudo atomic store instructions.
+class PseudoAtomicStore
+    : Pseudo<(outs), (ins GPR:$rs2, GPRMem:$rs1, simm12:$imm12, ixlenimm:$ordering), []> {
+  let hasSideEffects = 1;
+  let mayLoad = 0;
+  let mayStore = 1;
+}
+
+let IsAtomic = 1 in {
+class AtomicLoadPatFrag<PatFrags base> : PatFrag<(ops node:$ptr), (base node:$ptr)>;
+class AtomicStorePatFrag<PatFrags base> : PatFrag<(ops node:$val, node:$ptr),
+                                                 (base node:$val, node:$ptr)>;
+}
+
+// An atomic load operation that doesn't actually need to be atomic.
+let IsAtomicOrderingAcquireOrStronger = 0 in
+class relaxed_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic load operation that needs acquire semantics.
+let IsAtomicOrderingAcquire = 1 in
+class acquire_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic load operation that needs sequential consistency semantics.
+let IsAtomicOrderingSequentiallyConsistent = 1 in
+class seq_cst_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic store operation that doesn't actually need to be atomic.
+let IsAtomicOrderingReleaseOrStronger = 0 in
+class relaxed_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+// An atomic store operation that needs release semantics.
+let IsAtomicOrderingRelease = 1 in
+class release_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+// An atomic store operation that needs sequential consistency semantics.
+let IsAtomicOrderingSequentiallyConsistent = 1 in
+class seq_cst_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+multiclass AtomicLdPat<PatFrag LoadOp, RVInst Inst> {
+  def : Pat<(XLenVT (relaxed_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+            (Inst GPR:$rs1, simm12:$imm12, 2)>;
+  def : Pat<(XLenVT (acquire_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+            (Inst GPR:$rs1, simm12:$imm12, 4)>;
+  def : Pat<(XLenVT (seq_cst_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+            (Inst GPR:$rs1, simm12:$imm12, 7)>;
+}
+
+multiclass AtomicStPat<PatFrag StoreOp, RVInst Inst> {
+  def : Pat<(relaxed_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+            (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 2)>;
+  def : Pat<(release_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+            (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 5)>;
+  def : Pat<(seq_cst_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+            (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 7)>;
+}
+
 let Predicates = [HasAtomicLdSt] in {
-  def : LdPat<atomic_load_8,  LB>;
-  def : LdPat<atomic_load_16, LH>;
-  def : LdPat<atomic_load_32, LW>;
+  def PseudoAtomicLB : PseudoAtomicLoad;
+  def PseudoAtomicLH : PseudoAtomicLoad;
+  def PseudoAtomicLW : PseudoAtomicLoad;
+
+  def PseudoAtomicSB : PseudoAtomicStore;
+  def PseudoAtomicSH : PseudoAtomicStore;
+  def PseudoAtomicSW : PseudoAtomicStore;
 
-  def : StPat<atomic_store_8,  SB, GPR, XLenVT>;
-  def : StPat<atomic_store_16, SH, GPR, XLenVT>;
-  def : StPat<atomic_store_32, SW, GPR, XLenVT>;
+  defm : AtomicLdPat<atomic_load_8,  PseudoAtomicLB>;
+  defm : AtomicLdPat<atomic_load_16, PseudoAtomicLH>;
+  defm : AtomicLdPat<atomic_load_32, PseudoAtomicLW>;
+
+  defm : AtomicStPat<atomic_store_8,  PseudoAtomicSB>;
+  defm : AtomicStPat<atomic_store_16, PseudoAtomicSH>;
+  defm : AtomicStPat<atomic_store_32, PseudoAtomicSW>;
 }
 
 let Predicates = [HasAtomicLdSt, IsRV64] in {
-  def : LdPat<atomic_load_64, LD, i64>;
-  def : StPat<atomic_store_64, SD, GPR, i64>;
+  def PseudoAtomicLD : PseudoAtomicLoad;
+  def PseudoAtomicSD : PseudoAtomicStore;
+
+  defm : AtomicLdPat<atomic_load_64, PseudoAtomicLD>;
+  defm : AtomicStPat<atomic_store_64, PseudoAtomicSD>;
 }
 
 /// AMOs

github-actions bot commented Sep 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@nikic nikic removed their request for review September 25, 2023 19:11
@wangpc-pp (Contributor, Author):

If we want to make the fence and the load/store schedulable separately, we can just expand them before RA, in RISCVPreRAExpandPseudo.

@wangpc-pp (Contributor, Author):

Ping.
Any feedback is welcome!

@wangpc-pp (Contributor, Author):

Ping.

@asb (Contributor) commented Oct 20, 2023

First of all, sorry that this fell off the review queue.

Some thoughts:

  • The diff vs. what we have today isn't massive, so I don't have a strong aversion to this, though particularly on the TableGen side there is a bit more target-specific complexity, I think.
  • You're right that in general atomic operations are expanded later on, though I wouldn't say it necessarily follows that it's better to do them all the same way. Alternatively, you could view the current setup as the RISC-V backend using the target-independent hooks wherever possible for atomics codegen and falling back to late-stage expansion via a target-specific codepath only when necessary.
  • The docs on atomics in LLVM (https://llvm.org/docs/Atomics.html) are, I think, actually pretty good, and I tried to expand them to properly cover what we do on RISC-V, so that should hopefully help anyone diving into it. It's hard for me to judge how much the docs make sense for someone who has never messed with atomics lowering in LLVM, though!
  • If we go in this direction, please add a doc comment to record the motivation for not using the emitLeadingFence/emitTrailingFence hooks.
  • Your second reason does make sense to me, but it would be good to get a second opinion - e.g. @preames @topperc

@asb (Contributor) commented Oct 20, 2023

Oh, and was #69685 just an opportune moment to re-advertise this patch, or did you see that work interacting with this change in some way? If so, could you elaborate, please? Many thanks.

@jrtc27 (Collaborator) commented Oct 20, 2023

Do other architectures use this approach? Having custom pseudo-based lowering for something SDAG can represent directly just fine is pretty sad (and adds more work for us downstream, too); it's a bunch of code that doesn't really need to exist. I would much rather see the generic lowering adapted to support the optimisations you want (if they are in fact important) than make things clunkier like this. I won't block the patch, but I will heavily encourage this change not to be applied if at all possible.

@wangpc-pp (Contributor, Author) commented Oct 23, 2023

Oh, and was #69685 just an opportune moment to re-advertise this patch - or did you see that work interacting with this change in some way? if so, could you elaborate please? Many thanks.

The Zalasr extension provides instructions for atomic load/store (for example, load-acquire and store-release). If we are going to support CodeGen for this extension, then we shouldn't expand atomic load/store before ISel, or the patterns may not be matched, because we may insert some LLVM IR between the fence instruction and the load/store (and we may not be able to describe the patterns in TableGen). On AArch64, this kind of atomic load/store instruction is selected in the same way as in this PR (actually, this PR is inspired by the AArch64 target :-)), except that they are selected to actual instructions instead of pseudos.

Fused atomic load/store is just like the Zalasr extension, except for the addressing mode (Zalasr supports zero-offset addresses only, while fused atomic load/store can use the same addressing as normal load/store instructions). The fused version of atomic load/store is just a workaround, since RISC-V lacks such instructions, while Zalasr fills that hole.

Before this PR, the lowering path of atomic load/store is:

atomic load/store (LLVM IR) -> *Expand Atomic instructions Pass* -> monotonic load/store with leading/trailing fence (LLVM IR) -> SDAG -> *ISel* -> normal load/store instructions with leading/trailing fence

After this PR, it will be like:

atomic load/store (LLVM IR) -> SDAG -> *ISel* -> pseudo atomic load/store -> *RISC-V atomic pseudo instruction expansion pass* -> normal load/store instructions with leading/trailing fence

If Zalasr is enabled, then we can expand pseudo atomic load/store to instructions in Zalasr instead of normal load/store instructions with leading/trailing fence.
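
As a very rough sketch (not part of this PR), the expansion hook from this patch could branch on Zalasr roughly as follows. hasStdExtZalasr() and RISCV::LW_AQ are placeholder names, only the plain acquire-load case is shown, and the zero-offset check reflects Zalasr's addressing restriction:

// Hypothetical variant of RISCVExpandAtomicPseudo::expandAtomicLoadStore:
// with Zalasr, emit a single load-acquire instead of a load plus trailing
// fence. STI, TII and the operand layout are as in the patch above.
bool RISCVExpandAtomicPseudo::expandAtomicLoadStore(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
    unsigned int Opcode, bool IsLoad) {
  auto Ordering = static_cast<AtomicOrdering>(MBBI->getOperand(3).getImm());
  DebugLoc DL = MBBI->getDebugLoc();
  if (STI->hasStdExtZalasr() /*placeholder*/ && IsLoad &&
      Ordering == AtomicOrdering::Acquire && Opcode == RISCV::LW &&
      MBBI->getOperand(2).isImm() && MBBI->getOperand(2).getImm() == 0) {
    // Zalasr only supports zero-offset addressing, hence the checks above.
    BuildMI(MBB, MBBI, DL, TII->get(RISCV::LW_AQ /*placeholder*/))
        .add(MBBI->getOperand(0))
        .add(MBBI->getOperand(1));
    MBBI->eraseFromParent();
    return true;
  }
  // Otherwise, fall back to the fence-based expansion from this patch.
  emitLeadingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
  BuildMI(MBB, MBBI, DL, TII->get(Opcode))
      .add(MBBI->getOperand(0))
      .add(MBBI->getOperand(1))
      .add(MBBI->getOperand(2));
  emitTrailingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
  MBBI->eraseFromParent();
  return true;
}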

I don't know whether the above has made my intention for this PR clearer; let me know if there are any questions. :-)
CC @jrtc27

@topperc (Collaborator) commented Oct 23, 2023

If we have Zalasr, wouldn't we disable the fence emission rather than trying to pattern match it?

@jrtc27 (Collaborator) commented Oct 23, 2023

Yes, just don't expand them into memory access + fence... you don't need a pseudo for that

@wangpc-pp (Contributor, Author) commented Oct 23, 2023

If we have Zalasr, wouldn't we disable the fence emission rather than trying to pattern match it?
Yes, just don't expand them into memory access + fence... you don't need a pseudo for that

Yes, I agree. We can just do the instruction selection like AArch64 does, and the patterns will be like those in this PR, except that the selected instructions would be Zalasr instructions instead of pseudos.

The pseudos exist to unify the implementation of Zalasr and fused atomic load/store; can that be a strong enough reason? One more piece of information: the MacroFusion approach may not be perfect, because some fence+load/store pairs are still separated when the post-RA scheduler is enabled. After some rough investigation, this may be because there is no data dependency between the fence and the load/store, and the dependency analysis is not smart enough to keep them together.
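
For reference, here is a minimal sketch (not part of this PR) of what fence+load/store fusion predicates in the style of RISCVMacroFusion.cpp might look like; the opcode checks are simplified and any subtarget feature gating is omitted:

// Sketch of macro-fusion predicates that try to keep a FENCE adjacent to the
// memory access it guards. A real implementation would also check the fence's
// pred/succ operands and gate on a subtarget fusion feature.
static bool isFenceStorePair(const MachineInstr *FirstMI,
                             const MachineInstr &SecondMI) {
  // fence+store: SecondMI must be a store; FirstMI (if already chosen by the
  // scheduler) must be a FENCE. A null FirstMI means "try to form the pair".
  switch (SecondMI.getOpcode()) {
  default:
    return false;
  case RISCV::SB:
  case RISCV::SH:
  case RISCV::SW:
  case RISCV::SD:
    break;
  }
  return !FirstMI || FirstMI->getOpcode() == RISCV::FENCE;
}

static bool isLoadFencePair(const MachineInstr *FirstMI,
                            const MachineInstr &SecondMI) {
  // load+fence: SecondMI must be a FENCE; FirstMI (if any) must be a load.
  if (SecondMI.getOpcode() != RISCV::FENCE)
    return false;
  if (!FirstMI)
    return true;
  switch (FirstMI->getOpcode()) {
  default:
    return false;
  case RISCV::LB:
  case RISCV::LH:
  case RISCV::LW:
  case RISCV::LD:
    return true;
  }
}

// Standard signature for predicates used by the scheduler's macro-fusion
// DAG mutation.
static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
                                   const TargetSubtargetInfo &TSI,
                                   const MachineInstr *FirstMI,
                                   const MachineInstr &SecondMI) {
  return isLoadFencePair(FirstMI, SecondMI) ||
         isFenceStorePair(FirstMI, SecondMI);
}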
