[RISCV] 'Zalrsc' may permit non-base instructions #165042
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: None (slachowsky)

Changes: Provide shorter atomic LR/SC sequences with non-base instructions (e.g. 'B' extension instructions) when implementations opt in to FeaturePermissiveZalrsc. Currently this shortens `atomicrmw {min,max,umin,umax}` pseudo expansions. There is no functional change for machines when this target feature is not requested.

Patch is 46.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165042.diff

3 Files Affected:
 diff --git a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
index 98b636e8e0e55..9bd66a43717e7 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
@@ -373,6 +373,26 @@ static void doAtomicBinOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
         .addReg(ScratchReg)
         .addImm(-1);
     break;
+  case AtomicRMWInst::Max:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MAX), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
+  case AtomicRMWInst::Min:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MIN), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
+  case AtomicRMWInst::UMax:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MAXU), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
+  case AtomicRMWInst::UMin:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MINU), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
   }
   BuildMI(LoopMBB, DL, TII->get(getSCForRMW(Ordering, Width, STI)), ScratchReg)
       .addReg(ScratchReg)
@@ -682,6 +702,9 @@ bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
     AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
     MachineBasicBlock::iterator &NextMBBI) {
+  // Using MIN(U)/MAX(U) is preferable if permitted
+  if (STI->hasPermissiveZalrsc() && STI->hasStdExtZbb() && !IsMasked)
+    return expandAtomicBinOp(MBB, MBBI, BinOp, IsMasked, Width, NextMBBI);
 
   MachineInstr &MI = *MBBI;
   DebugLoc DL = MI.getDebugLoc();
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 2754d789b9899..3c1e9665d823e 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1906,6 +1906,22 @@ def FeatureForcedAtomics : SubtargetFeature<
 def HasAtomicLdSt
     : Predicate<"Subtarget->hasStdExtZalrsc() || Subtarget->hasForcedAtomics()">;
 
+// The RISCV Unprivileged Architecture defines _constrained_ LR/SC loops:
+//   The dynamic code executed between the LR and SC instructions can only
+//   contain instructions from the base ''I'' instruction set, excluding loads,
+//   stores, backward jumps, taken backward branches, JALR, FENCE, and SYSTEM
+//   instructions. Compressed forms of the aforementioned ''I'' instructions in
+//   the Zca and Zcb extensions are also permitted.
+// LR/SC loops that do not adhere to the above are _unconstrained_ LR/SC loops,
+// and success is implementation specific. For implementations which know that
+// non-base instructions (such as the ''B'' extension) will not violate any
+// forward progress guarantees, using these instructions to reduce the LR/SC
+// sequence length is desirable.
+def FeaturePermissiveZalrsc
+    : SubtargetFeature<
+          "permissive-zalrsc", "HasPermissiveZalrsc", "true",
+          "Implementation permits non-base instructions between LR/SC pairs">;
+
 def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
     "AllowTaggedGlobals",
     "true", "Use an instruction sequence for taking the address of a global "
diff --git a/llvm/test/CodeGen/RISCV/atomic-rmw-minmax.ll b/llvm/test/CodeGen/RISCV/atomic-rmw-minmax.ll
new file mode 100644
index 0000000000000..9ce987c1add50
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/atomic-rmw-minmax.ll
@@ -0,0 +1,1074 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 -mattr=+zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32I-ZALRSC %s
+; RUN: llc -mtriple=riscv32 -mattr=+b,+zalrsc,+permissive-zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32IB-ZALRSC %s
+; RUN: llc -mtriple=riscv32 -mattr=+a -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32IA %s
+;
+; RUN: llc -mtriple=riscv64 -mattr=+zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64I-ZALRSC %s
+; RUN: llc -mtriple=riscv64 -mattr=+b,+zalrsc,+permissive-zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64IB-ZALRSC %s
+; RUN: llc -mtriple=riscv64 -mattr=+a -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64IA %s
+
+define i32 @atomicrmw_max_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bge a3, a1, .LBB0_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB0_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB0_3: # in Loop: Header=BB0_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    max a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_max_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amomax.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bge a3, a2, .LBB0_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB0_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB0_3: # in Loop: Header=BB0_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    max a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_max_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amomax.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw max ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i32 @atomicrmw_min_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bge a1, a3, .LBB1_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB1_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB1_3: # in Loop: Header=BB1_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    min a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_min_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amomin.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bge a2, a3, .LBB1_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB1_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB1_3: # in Loop: Header=BB1_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    min a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_min_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amomin.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw min ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i32 @atomicrmw_umax_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bgeu a3, a1, .LBB2_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB2_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB2_3: # in Loop: Header=BB2_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    maxu a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amomaxu.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bgeu a3, a2, .LBB2_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB2_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB2_3: # in Loop: Header=BB2_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    maxu a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amomaxu.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw umax ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i32 @atomicrmw_umin_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bgeu a1, a3, .LBB3_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB3_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB3_3: # in Loop: Header=BB3_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    minu a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amominu.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bgeu a2, a3, .LBB3_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB3_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB3_3: # in Loop: Header=BB3_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    minu a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amominu.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw umin ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i64 @atomicrmw_max_i64_seq_cst(ptr %a, i64 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_max_i64_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:    addi sp, sp, -32
+; RV32I-ZALRSC-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    mv s0, a2
+; RV32I-ZALRSC-NEXT:    mv s1, a0
+; RV32I-ZALRSC-NEXT:    lw a4, 0(a0)
+; RV32I-ZALRSC-NEXT:    lw a5, 4(a0)
+; RV32I-ZALRSC-NEXT:    mv s2, a1
+; RV32I-ZALRSC-NEXT:    j .LBB4_2
+; RV32I-ZALRSC-NEXT:  .LBB4_1: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    sw a4, 8(sp)
+; RV32I-ZALRSC-NEXT:    sw a5, 12(sp)
+; RV32I-ZALRSC-NEXT:    addi a1, sp, 8
+; RV32I-ZALRSC-NEXT:    li a4, 5
+; RV32I-ZALRSC-NEXT:    li a5, 5
+; RV32I-ZALRSC-NEXT:    mv a0, s1
+; RV32I-ZALRSC-NEXT:    call __atomic_compare_exchange_8
+; RV32I-ZALRSC-NEXT:    lw a4, 8(sp)
+; RV32I-ZALRSC-NEXT:    lw a5, 12(sp)
+; RV32I-ZALRSC-NEXT:    bnez a0, .LBB4_7
+; RV32I-ZALRSC-NEXT:  .LBB4_2: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    beq a5, s0, .LBB4_4
+; RV32I-ZALRSC-NEXT:  # %bb.3: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    slt a0, s0, a5
+; RV32I-ZALRSC-NEXT:    j .LBB4_5
+; RV32I-ZALRSC-NEXT:  .LBB4_4: # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    sltu a0, s2, a4
+; RV32I-ZALRSC-NEXT:  .LBB4_5: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a2, a4
+; RV32I-ZALRSC-NEXT:    mv a3, a5
+; RV32I-ZALRSC-NEXT:    bnez a0, .LBB4_1
+; RV32I-ZALRSC-NEXT:  # %bb.6: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a2, s2
+; RV32I-ZALRSC-NEXT:    mv a3, s0
+; RV32I-ZALRSC-NEXT:    j .LBB4_1
+; RV32I-ZALRSC-NEXT:  .LBB4_7: # %atomicrmw.end
+; RV32I-ZALRSC-NEXT:    mv a0, a4
+; RV32I-ZALRSC-NEXT:    mv a1, a5
+; RV32I-ZALRSC-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    addi sp, sp, 32
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_max_i64_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:    addi sp, sp, -32
+; RV32IB-ZALRSC-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    mv s0, a2
+; RV32IB-ZALRSC-NEXT:    mv s1, a0
+; RV32IB-ZALRSC-NEXT:    lw a4, 0(a0)
+; RV32IB-ZALRSC-NEXT:    lw a5, 4(a0)
+; RV32IB-ZALRSC-NEXT:    mv s2, a1
+; RV32IB-ZALRSC-NEXT:    j .LBB4_2
+; RV32IB-ZALRSC-NEXT:  .LBB4_1: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    sw a4, 8(sp)
+; RV32IB-ZALRSC-NEXT:    sw a5, 12(sp)
+; RV32IB-ZALRSC-NEXT:    addi a1, sp, 8
+; RV32IB-ZALRSC-NEXT:    li a4, 5
+; RV32IB-ZALRSC-NEXT:    li a5, 5
+; RV32IB-ZALRSC-NEXT:    mv a0, s1
+; RV32IB-ZALRSC-NEXT:    call __atomic_compare_exchange_8
+; RV32IB-ZALRSC-NEXT:    lw a4, 8(sp)
+; RV32IB-ZALRSC-NEXT:    lw a5, 12(sp)
+; RV32IB-ZALRSC-NEXT:    bnez a0, .LBB4_7
+; RV32IB-ZALRSC-NEXT:  .LBB4_2: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    beq a5, s0, .LBB4_4
+; RV32IB-ZALRSC-NEXT:  # %bb.3: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    slt a0, s0, a5
+; RV32IB-ZALRSC-NEXT:    j .LBB4_5
+; RV32IB-ZALRSC-NEXT:  .LBB4_4: # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    sltu a0, s2, a4
+; RV32IB-ZALRSC-NEXT:  .LBB4_5: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    mv a2, a4
+; RV32IB-ZALRSC-NEXT:    mv a3, a5
+; RV32IB-ZALRSC-NEXT:    bnez a0, .LBB4_1
+; RV32IB-ZALRSC-NEXT:  # %bb.6: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    mv a2, s2
+; RV32IB-ZALRSC-NEXT:    mv a3, s0
+; RV32IB-ZALRSC-NEXT:    j .LBB4_1
+; RV32IB-ZALRSC-NEXT:  .LBB4_7: # %atomicrmw.end
+; RV32IB-ZALRSC-NEXT:    mv a0, a4
+; RV32IB-ZALRSC-NEXT:    mv a1, a5
+; RV32IB-ZALRSC-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    addi sp, sp, 32
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_max_i64_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    addi sp, sp, -32
+; RV32IA-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    mv s0, a2
+; RV32IA-NEXT:    mv s1, a0
+; RV32IA-NEXT:    lw a4, 0(a0)
+; RV32IA-NEXT:    lw a5, 4(a0)
+; RV32IA-NEXT:    mv s2, a1
+; RV32IA-NEXT:    j .LBB4_2
+; RV32IA-NEXT:  .LBB4_1: # %atomicrmw.start
+; RV32IA-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IA-NEXT:    sw a4, 8(sp)
+; RV32IA-NEXT:    sw a5, 12(sp)
+; RV32IA-NEXT:    addi a1, sp, 8
+; RV32IA-NEXT:    li a4, 5
+; RV32IA-NEXT:    li a5, ...
[truncated]
@topperc @lenary @fpetrogalli @arichardson @wangpc-pp
Provide shorter atomic LR/SC sequences with non-base instructions (e.g. 'B' extension instructions) when implementations opt in to FeaturePermissiveZalrsc. Currently this shortens `atomicrmw {min,max,umin,umax}` pseudo expansions.
There is no functional change for machines when this target feature is not requested.
Force-pushed from 7a1867b to 07af5e4 (Compare).
Please do not force push unless absolutely necessary. https://llvm.org/docs/GitHub.html#rebasing-pull-requests-and-force-pushes
def HasAtomicLdSt
    : Predicate<"Subtarget->hasStdExtZalrsc() || Subtarget->hasForcedAtomics()">;

// The RISCV Unprivileged Architecture defines _constrained_ LR/SC loops:
RISCV -> RISC-V. It is a trademark that we should respect whenever possible.
Done.
LGTM, thank you!
LGTM
def FeaturePermissiveZalrsc
    : SubtargetFeature<
          "permissive-zalrsc", "HasPermissiveZalrsc", "true",
          "Implementation permits non-base instructions between LR/SC pairs">;
Can I ask about the general trajectory here:
Do we expect all "permissive" cores to support the same set of instructions in zalrsc, or might different cores allow different sets of extensions in their loops?
If we expect different cores to allow different extensions, we should not be using such a generic name - I would prefer something like FeatureZalrscAllowsEXT for different EXTs (in this case, FeatureZalrscAllowsZbb), perhaps?
I would be slightly concerned if we're going to have a massive spread of these features (as for SFB).
Certainly a reasonable ask.
This feature is from the point of view of a minimal RISC-V core with LR/SC and a global monitor that is external to the core. In such a configuration the global monitor is aware only of the load/store transactions to the memory system, and is completely unaware of what instructions or control flow occurred on the CPU(s) (or non-CPU devices) to generate those transactions. Any instruction mix is permissible in this style of system (ignoring higher order concerns of guaranteed forward progress / eventual success), as long as the same memory transactions are presented to the monitor.
It is necessary to have some FeaturePermissiveZalrsc control to enable 'unconstrained' LR/SC loops, and the proposal here is that there are no constraints on what is permissible. The idea is to admit shorter sequences via checks on the availability of secondary extensions:
if (STI->hasPermissiveZalrsc() && STI->hasVendorExtABC())
  // build short vendor ABC instruction sequence
else if (STI->hasPermissiveZalrsc() && STI->hasStdExtXYZ())
  // build short standard XYZ instruction sequence
else
  // build original constrained sequence with only 'I' instructions
This avoids the explosion in features needed to cover the cross-products of permitted Zalrsc x {XYZ, ABC, etc}. If a core has no constraints on what is permitted, and it also has an instruction extension that gives a shorter sequence, go ahead and use it.
Realistically, though, there is a tiny vocabulary of atomicrmw <ops>, and the existing pseudo expansions for these are very tightly coded, so there is very limited opportunity for improvement here. Other than the Zbb MIN/MAX instructions in this patch, the only other instruction extension I can think of with utility here is some sort of bit field insertion / bit select instruction that could shorten the xor + and + xor sequence used in the masked atomics.
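For concreteness, this is the shape the dispatch takes in the patch itself: a condensed sketch of the expandAtomicMinMaxOp hunk above, with the pre-existing constrained expansion elided.

    // Condensed from the expandAtomicMinMaxOp change in this PR; the elided
    // tail is the original constrained expansion.
    bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
        MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
        AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
        MachineBasicBlock::iterator &NextMBBI) {
      // Unconstrained loops are permitted and Zbb provides MIN(U)/MAX(U):
      // reuse the generic binop expansion, which now emits a single
      // min/max/minu/maxu between the LR and SC.
      if (STI->hasPermissiveZalrsc() && STI->hasStdExtZbb() && !IsMasked)
        return expandAtomicBinOp(MBB, MBBI, BinOp, IsMasked, Width, NextMBBI);
      // ... otherwise fall through to the original constrained expansion,
      // which uses only base 'I' instructions inside the LR/SC loop ...
    }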
Great, thanks for explaining.
The comment does not mention the "16 instructions placed sequentially in memory" rule that also applies to constrained LR/SC sequences. Is it considered to be relaxed here?
LR/SC sequences succeed on Rocket if the SC comes after the LR within a fixed window measured in cycles and there are no intervening memory operations. Importantly, all of the instructions allowed in a constrained LR/SC loop are single-cycle on Rocket; min/max would extremely likely be fine, but div is problematic, mul might be, depending on configuration, and floating point may work if there is real hardware support, but definitely won't if it is emulated by privileged software (which is allowed even if F/D are in -march).
LGTM
Ready for anyone with approval to merge.