
Conversation

4vtomat (Member) commented Nov 22, 2025

Atomic loads/stores (except unordered ones) and volatile loads/stores are
considered ordered instructions. We only need to ensure that ordered
instructions are not reordered, rather than excluding them from
ZilsdOptimizer entirely.
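
For reference, a minimal IR illustration (not part of this patch) of the flavors referred to above; the volatile and monotonic loads count as ordered accesses, while the unordered atomic load does not:

  define i32 @ordered_examples(ptr %p) {
    %a = load volatile i32, ptr %p, align 4           ; volatile: ordered
    %b = load atomic i32, ptr %p monotonic, align 4   ; ordered atomic: ordered
    %c = load atomic i32, ptr %p unordered, align 4   ; unordered atomic: not ordered
    %s = add i32 %a, %b
    %r = add i32 %s, %c
    ret i32 %r
  }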
llvmbot (Member) commented Nov 22, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Brandon Wu (4vtomat)

Changes

Atomic loads/stores (except unordered ones) and volatile loads/stores are
considered ordered instructions. We only need to ensure that ordered
instructions are not reordered, rather than excluding them from
ZilsdOptimizer entirely.


Full diff: https://github.com/llvm/llvm-project/pull/169169.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVZilsdOptimizer.cpp (+15-6)
  • (modified) llvm/test/CodeGen/RISCV/zilsd-ldst-opt-prera.mir (+204-5)
diff --git a/llvm/lib/Target/RISCV/RISCVZilsdOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVZilsdOptimizer.cpp
index 99e83fbb05a73..cb0a66c23b90b 100644
--- a/llvm/lib/Target/RISCV/RISCVZilsdOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVZilsdOptimizer.cpp
@@ -275,6 +275,16 @@ bool RISCVPreAllocZilsdOpt::isSafeToMove(MachineInstr *MI, MachineInstr *Target,
       LLVM_DEBUG(dbgs() << "Memory operation interference detected\n");
       return false;
     }
+
+    // Don't move across instructions if they are guaranteed to be ordered, e.g.
+    // volatile and ordered atomic
+    if (MI->hasOrderedMemoryRef() && It->hasOrderedMemoryRef()) {
+      LLVM_DEBUG(
+          dbgs()
+          << "Cannot move across instruction that is guaranteed to be ordered: "
+          << *It);
+      return false;
+    }
   }
 
   return true;
@@ -334,6 +344,10 @@ bool RISCVPreAllocZilsdOpt::rescheduleOps(
         Distance > MaxRescheduleDistance)
       continue;
 
+    // If MI0 comes later, the pair cannot be folded if the memory order matters.
+    if (!MI1IsLater && MI0->hasOrderedMemoryRef() && MI1->hasOrderedMemoryRef())
+      continue;
+
     // Move the instruction to the target position
     MachineBasicBlock::iterator InsertPos = TargetInstr->getIterator();
     ++InsertPos;
@@ -400,15 +414,10 @@ bool RISCVPreAllocZilsdOpt::isMemoryOp(const MachineInstr &MI) {
     return false;
 
   // When no memory operands are present, conservatively assume unaligned,
-  // volatile, unfoldable.
+  // unfoldable.
   if (!MI.hasOneMemOperand())
     return false;
 
-  const MachineMemOperand *MMO = *MI.memoperands_begin();
-
-  if (MMO->isVolatile() || MMO->isAtomic())
-    return false;
-
   // sw <undef> could probably be eliminated entirely, but for now we just want
   // to avoid making a mess of it.
   if (MI.getOperand(0).isReg() && MI.getOperand(0).isUndef())
diff --git a/llvm/test/CodeGen/RISCV/zilsd-ldst-opt-prera.mir b/llvm/test/CodeGen/RISCV/zilsd-ldst-opt-prera.mir
index dab394d4bc8c4..95d0e322faa4d 100644
--- a/llvm/test/CodeGen/RISCV/zilsd-ldst-opt-prera.mir
+++ b/llvm/test/CodeGen/RISCV/zilsd-ldst-opt-prera.mir
@@ -182,6 +182,26 @@
     ret i32 %5
   }
 
+  define void @invalid_volatile_loads() {
+    ret void
+  }
+
+  define void @atomic_loads() {
+    ret void
+  }
+
+  define void @invalid_atomic_loads() {
+    ret void
+  }
+
+  define void @interleave_atomic_volatile_loads() {
+    ret void
+  }
+
+  define void @interleave_atomic_volatile_loads2() {
+    ret void
+  }
+
   define i32 @store_dependency(ptr %0, i32 %1) {
     %3 = load i32, ptr %0, align 4
     %4 = getelementptr inbounds i32, ptr %0, i32 1
@@ -895,7 +915,7 @@ body: |
 
 ...
 ---
-# Test with volatile loads - should not combine
+# Test with valid volatile loads
 name: volatile_loads
 alignment: 4
 tracksRegLiveness: true
@@ -918,17 +938,196 @@ body: |
     ; CHECK-4BYTE: liveins: $x10
     ; CHECK-4BYTE-NEXT: {{  $}}
     ; CHECK-4BYTE-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
-    ; CHECK-4BYTE-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (volatile load (s32))
-    ; CHECK-4BYTE-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (volatile load (s32))
-    ; CHECK-4BYTE-NEXT: [[ADD:%[0-9]+]]:gpr = ADD [[LW]], [[LW1]]
+    ; CHECK-4BYTE-NEXT: [[PseudoLD_RV32_OPT:%[0-9]+]]:gpr, [[PseudoLD_RV32_OPT1:%[0-9]+]]:gpr = PseudoLD_RV32_OPT [[COPY]], 0 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: [[ADD:%[0-9]+]]:gpr = ADD [[PseudoLD_RV32_OPT]], [[PseudoLD_RV32_OPT1]]
     ; CHECK-4BYTE-NEXT: PseudoRET
     %0:gpr = COPY $x10
-    ; Volatile loads should not be combined
     %1:gpr = LW %0, 0 :: (volatile load (s32))
     %2:gpr = LW %0, 4 :: (volatile load (s32))
     %3:gpr = ADD %1, %2
     PseudoRET
 
+...
+---
+# Test with invalid volatile loads
+name: invalid_volatile_loads
+alignment: 4
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x10', virtual-reg: '%0' }
+body: |
+  bb.0:
+    liveins: $x10
+
+    ; CHECK-LABEL: name: invalid_volatile_loads
+    ; CHECK: liveins: $x10
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (volatile load (s32))
+    ; CHECK-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 8 :: (volatile load (s32))
+    ; CHECK-NEXT: [[LW2:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (volatile load (s32))
+    ; CHECK-NEXT: [[LW3:%[0-9]+]]:gpr = LW [[COPY]], 12 :: (volatile load (s32))
+    ; CHECK-NEXT: PseudoRET
+    ;
+    ; CHECK-4BYTE-LABEL: name: invalid_volatile_loads
+    ; CHECK-4BYTE: liveins: $x10
+    ; CHECK-4BYTE-NEXT: {{  $}}
+    ; CHECK-4BYTE-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-4BYTE-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 8 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: [[LW2:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: [[LW3:%[0-9]+]]:gpr = LW [[COPY]], 12 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: PseudoRET
+    %0:gpr = COPY $x10
+    %1:gpr = LW %0, 0 :: (volatile load (s32))
+    %2:gpr = LW %0, 8 :: (volatile load (s32))
+    %3:gpr = LW %0, 4 :: (volatile load (s32))
+    %4:gpr = LW %0, 12 :: (volatile load (s32))
+    PseudoRET
+
+...
+---
+# Test with valid atomic loads
+name: atomic_loads
+alignment: 4
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x10', virtual-reg: '%0' }
+body: |
+  bb.0:
+    liveins: $x10
+
+    ; CHECK-LABEL: name: atomic_loads
+    ; CHECK: liveins: $x10
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (load monotonic (s32))
+    ; CHECK-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (load monotonic (s32))
+    ; CHECK-NEXT: PseudoRET
+    ;
+    ; CHECK-4BYTE-LABEL: name: atomic_loads
+    ; CHECK-4BYTE: liveins: $x10
+    ; CHECK-4BYTE-NEXT: {{  $}}
+    ; CHECK-4BYTE-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-4BYTE-NEXT: [[PseudoLD_RV32_OPT:%[0-9]+]]:gpr, [[PseudoLD_RV32_OPT1:%[0-9]+]]:gpr = PseudoLD_RV32_OPT [[COPY]], 0 :: (load monotonic (s32))
+    ; CHECK-4BYTE-NEXT: PseudoRET
+    %0:gpr = COPY $x10
+    %1:gpr = LW %0, 0 :: (load monotonic (s32))
+    %2:gpr = LW %0, 4 :: (load monotonic (s32))
+    PseudoRET
+
+...
+---
+# Test with invalid atomic loads
+name: invalid_atomic_loads
+alignment: 4
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x10', virtual-reg: '%0' }
+body: |
+  bb.0:
+    liveins: $x10
+
+    ; CHECK-LABEL: name: invalid_atomic_loads
+    ; CHECK: liveins: $x10
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (load monotonic (s32))
+    ; CHECK-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 8 :: (load monotonic (s32))
+    ; CHECK-NEXT: [[LW2:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (load monotonic (s32))
+    ; CHECK-NEXT: [[LW3:%[0-9]+]]:gpr = LW [[COPY]], 12 :: (load monotonic (s32))
+    ; CHECK-NEXT: PseudoRET
+    ;
+    ; CHECK-4BYTE-LABEL: name: invalid_atomic_loads
+    ; CHECK-4BYTE: liveins: $x10
+    ; CHECK-4BYTE-NEXT: {{  $}}
+    ; CHECK-4BYTE-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-4BYTE-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (load monotonic (s32))
+    ; CHECK-4BYTE-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 8 :: (load monotonic (s32))
+    ; CHECK-4BYTE-NEXT: [[LW2:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (load monotonic (s32))
+    ; CHECK-4BYTE-NEXT: [[LW3:%[0-9]+]]:gpr = LW [[COPY]], 12 :: (load monotonic (s32))
+    ; CHECK-4BYTE-NEXT: PseudoRET
+    %0:gpr = COPY $x10
+    %1:gpr = LW %0, 0 :: (load monotonic (s32))
+    %2:gpr = LW %0, 8 :: (load monotonic (s32))
+    %3:gpr = LW %0, 4 :: (load monotonic (s32))
+    %4:gpr = LW %0, 12 :: (load monotonic (s32))
+    PseudoRET
+
+...
+---
+# Test with interleaving atomic loads and volatile loads
+name: interleave_atomic_volatile_loads
+alignment: 4
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x10', virtual-reg: '%0' }
+body: |
+  bb.0:
+    liveins: $x10
+
+    ; CHECK-LABEL: name: interleave_atomic_volatile_loads
+    ; CHECK: liveins: $x10
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (load monotonic (s32))
+    ; CHECK-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 8 :: (volatile load (s32))
+    ; CHECK-NEXT: [[LW2:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (load monotonic (s32))
+    ; CHECK-NEXT: [[LW3:%[0-9]+]]:gpr = LW [[COPY]], 12 :: (volatile load (s32))
+    ; CHECK-NEXT: PseudoRET
+    ;
+    ; CHECK-4BYTE-LABEL: name: interleave_atomic_volatile_loads
+    ; CHECK-4BYTE: liveins: $x10
+    ; CHECK-4BYTE-NEXT: {{  $}}
+    ; CHECK-4BYTE-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-4BYTE-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (load monotonic (s32))
+    ; CHECK-4BYTE-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 8 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: [[LW2:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (load monotonic (s32))
+    ; CHECK-4BYTE-NEXT: [[LW3:%[0-9]+]]:gpr = LW [[COPY]], 12 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: PseudoRET
+    %0:gpr = COPY $x10
+    %1:gpr = LW %0, 0 :: (load monotonic (s32))
+    %2:gpr = LW %0, 8 :: (volatile load (s32))
+    %3:gpr = LW %0, 4 :: (load monotonic (s32))
+    %4:gpr = LW %0, 12 :: (volatile load (s32))
+    PseudoRET
+
+...
+---
+# Test with interleaving unordered atomic loads and volatile loads
+name: interleave_atomic_volatile_loads2
+alignment: 4
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x10', virtual-reg: '%0' }
+body: |
+  bb.0:
+    liveins: $x10
+
+    ; CHECK-LABEL: name: interleave_atomic_volatile_loads2
+    ; CHECK: liveins: $x10
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-NEXT: [[LW:%[0-9]+]]:gpr = LW [[COPY]], 0 :: (load unordered (s32))
+    ; CHECK-NEXT: [[LW1:%[0-9]+]]:gpr = LW [[COPY]], 8 :: (volatile load (s32))
+    ; CHECK-NEXT: [[LW2:%[0-9]+]]:gpr = LW [[COPY]], 4 :: (load unordered (s32))
+    ; CHECK-NEXT: [[LW3:%[0-9]+]]:gpr = LW [[COPY]], 12 :: (volatile load (s32))
+    ; CHECK-NEXT: PseudoRET
+    ;
+    ; CHECK-4BYTE-LABEL: name: interleave_atomic_volatile_loads2
+    ; CHECK-4BYTE: liveins: $x10
+    ; CHECK-4BYTE-NEXT: {{  $}}
+    ; CHECK-4BYTE-NEXT: [[COPY:%[0-9]+]]:gpr = COPY $x10
+    ; CHECK-4BYTE-NEXT: [[PseudoLD_RV32_OPT:%[0-9]+]]:gpr, [[PseudoLD_RV32_OPT1:%[0-9]+]]:gpr = PseudoLD_RV32_OPT [[COPY]], 0 :: (load unordered (s32))
+    ; CHECK-4BYTE-NEXT: [[PseudoLD_RV32_OPT2:%[0-9]+]]:gpr, [[PseudoLD_RV32_OPT3:%[0-9]+]]:gpr = PseudoLD_RV32_OPT [[COPY]], 8 :: (volatile load (s32))
+    ; CHECK-4BYTE-NEXT: PseudoRET
+    %0:gpr = COPY $x10
+    %1:gpr = LW %0, 0 :: (load unordered (s32))
+    %2:gpr = LW %0, 8 :: (volatile load (s32))
+    %3:gpr = LW %0, 4 :: (load unordered (s32))
+    %4:gpr = LW %0, 12 :: (volatile load (s32))
+    PseudoRET
+
 ...
 ---
 # Test store dependency - store modifies same location as load

4vtomat requested review from lenary and topperc on November 22, 2025 at 16:41
topperc (Collaborator) commented Nov 22, 2025

It's not legal to change the size of an atomic/volatile operation. I think I need to leave the SelectionDAG handling for them.

4vtomat (Member, Author) commented Nov 22, 2025

> It's not legal to change the size of an atomic/volatile operation. I think I need to leave the SelectionDAG handling for them.

Do you mean it's not gonna compile if we have i64 load volatile but not zilsd on rv32?

topperc (Collaborator) commented Nov 22, 2025

>> It's not legal to change the size of an atomic/volatile operation. I think I need to leave the SelectionDAG handling for them.
>
> Do you mean it's not gonna compile if we have i64 load volatile but not zilsd on rv32?

I think an i64 atomic load will be converted to a libcall before SelectionDAG. Volatile is weirder. Without Zilsd, i64 gets split because we have no other choice. With Zilsd, we should either always split to match the non-Zilsd behavior, in case someone depends on that, or we have to always use Zilsd ld/sd. If we sometimes use Zilsd and sometimes not, then users won't know what they can rely on.
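
To make the case concrete, here is a hypothetical illustration (not taken from this patch or its tests) of the accesses being discussed: a volatile i64 load, which rv32 without Zilsd has to split into two 32-bit loads, and an i64 atomic load, which is expected to become a libcall before instruction selection.

  ; Hypothetical IR, not from this patch, illustrating the discussion above.
  define i64 @volatile_load_i64(ptr %p) {
    ; Without Zilsd on rv32 this volatile access is split into two lw; with
    ; Zilsd the question is whether it should always become a single ld.
    %v = load volatile i64, ptr %p, align 8
    ret i64 %v
  }

  define i64 @atomic_load_i64(ptr %p) {
    ; Per the comment above, an i64 atomic load on rv32 is expected to be
    ; expanded to a libcall (e.g. __atomic_load_8) before SelectionDAG.
    %v = load atomic i64, ptr %p seq_cst, align 8
    ret i64 %v
  }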

4vtomat closed this Nov 22, 2025
