
[MachinePipeliner] Fix MachineMemOperand to be cleared when update fails #80841

Closed
ytmukai wants to merge 1 commit

Conversation

ytmukai (Contributor) commented Feb 6, 2024

When updating a MachineMemOperand fails, its offset was set to 0 and its size to unknown. However, some code does not expect such values, so the operand is now dropped instead.

llvmbot (Collaborator) commented Feb 6, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Yuta Mukai (ytmukai)

Changes

When updating a MachineMemOperand fails, its offset was set to 0 and its size to unknown. However, some code does not expect such values, so the operand is now dropped instead.


Full diff: https://github.com/llvm/llvm-project/pull/80841.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/ModuloSchedule.cpp (+2-2)
  • (added) llvm/test/CodeGen/AArch64/sms-fix-memoperand.mir (+126)
diff --git a/llvm/lib/CodeGen/ModuloSchedule.cpp b/llvm/lib/CodeGen/ModuloSchedule.cpp
index 0bef513342ff1..6c24f51157bf5 100644
--- a/llvm/lib/CodeGen/ModuloSchedule.cpp
+++ b/llvm/lib/CodeGen/ModuloSchedule.cpp
@@ -979,8 +979,8 @@ void ModuloScheduleExpander::updateMemOperands(MachineInstr &NewMI,
       NewMMOs.push_back(
           MF.getMachineMemOperand(MMO, AdjOffset, MMO->getSize()));
     } else {
-      NewMMOs.push_back(
-          MF.getMachineMemOperand(MMO, 0, MemoryLocation::UnknownSize));
+      NewMMOs.clear();
+      break;
     }
   }
   NewMI.setMemRefs(MF, NewMMOs);
diff --git a/llvm/test/CodeGen/AArch64/sms-fix-memoperand.mir b/llvm/test/CodeGen/AArch64/sms-fix-memoperand.mir
new file mode 100644
index 0000000000000..9ccf495286417
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sms-fix-memoperand.mir
@@ -0,0 +1,126 @@
+# RUN: llc --verify-machineinstrs -mtriple=aarch64 -o - %s -start-before=pipeliner -aarch64-enable-pipeliner
+# REQUIRES: asserts
+
+# Check that assertions do not fail
+
+--- |
+  define dso_local void @f(ptr noalias nocapture noundef writeonly %a, ptr nocapture noundef readonly %b, i32 noundef %n) local_unnamed_addr #0 {
+  entry:
+    %cmp6 = icmp sgt i32 %n, 0
+    br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup
+  
+  for.body.preheader:                               ; preds = %entry
+    %wide.trip.count = zext nneg i32 %n to i64
+    %0 = icmp eq i32 %n, 1
+    br i1 %0, label %for.cond.cleanup.loopexit.unr-lcssa, label %for.body.preheader.new
+  
+  for.body.preheader.new:                           ; preds = %for.body.preheader
+    %unroll_iter = and i64 %wide.trip.count, 2147483646
+    br label %for.body
+  
+  for.cond.cleanup.loopexit.unr-lcssa:              ; preds = %for.body, %for.body.preheader
+    %indvars.iv.unr = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next.1, %for.body ]
+    %1 = and i64 %wide.trip.count, 1
+    %lcmp.mod.not = icmp eq i64 %1, 0
+    br i1 %lcmp.mod.not, label %for.cond.cleanup, label %for.body.epil
+  
+  for.body.epil:                                    ; preds = %for.cond.cleanup.loopexit.unr-lcssa
+    %arrayidx.epil = getelementptr inbounds double, ptr %b, i64 %indvars.iv.unr
+    %2 = load double, ptr %arrayidx.epil, align 8
+    %mul.epil = fmul double %2, 2.000000e+00
+    %arrayidx2.epil = getelementptr inbounds double, ptr %a, i64 %indvars.iv.unr
+    store double %mul.epil, ptr %arrayidx2.epil, align 8
+    br label %for.cond.cleanup
+  
+  for.cond.cleanup:                                 ; preds = %for.body.epil, %for.cond.cleanup.loopexit.unr-lcssa, %entry
+    ret void
+  
+  for.body:                                         ; preds = %for.body, %for.body.preheader.new
+    %indvars.iv = phi i64 [ 0, %for.body.preheader.new ], [ %indvars.iv.next.1, %for.body ]
+    %3 = shl i64 %indvars.iv, 3
+    %scevgep13 = getelementptr i8, ptr %b, i64 %3
+    %4 = load double, ptr %scevgep13, align 8
+    %mul = fmul double %4, 2.000000e+00
+    %5 = shl i64 %indvars.iv, 3
+    %scevgep12 = getelementptr i8, ptr %a, i64 %5
+    store double %mul, ptr %scevgep12, align 8
+    %6 = shl i64 %indvars.iv, 3
+    %scevgep = getelementptr i8, ptr %b, i64 %6
+    %scevgep9 = getelementptr i8, ptr %scevgep, i64 8
+    %7 = load double, ptr %scevgep9, align 8
+    %mul.1 = fmul double %7, 2.000000e+00
+    %scevgep10 = getelementptr i8, ptr %a, i64 %6
+    %scevgep11 = getelementptr i8, ptr %scevgep10, i64 8
+    store double %mul.1, ptr %scevgep11, align 8
+    %indvars.iv.next.1 = add nuw i64 %indvars.iv, 2
+    %niter.ncmp.1 = icmp eq i64 %unroll_iter, %indvars.iv.next.1
+    br i1 %niter.ncmp.1, label %for.cond.cleanup.loopexit.unr-lcssa, label %for.body
+  }
+
+...
+---
+name:            f
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x0', virtual-reg: '%5' }
+  - { reg: '$x1', virtual-reg: '%6' }
+  - { reg: '$w2', virtual-reg: '%7' }
+body:             |
+  bb.0.entry:
+    liveins: $x0, $x1, $w2
+  
+    %7:gpr32common = COPY $w2
+    %6:gpr64common = COPY $x1
+    %5:gpr64common = COPY $x0
+    dead $wzr = SUBSWri %7, 1, 0, implicit-def $nzcv
+    Bcc 11, %bb.5, implicit $nzcv
+    B %bb.1
+  
+  bb.1.for.body.preheader:
+    %10:gpr32 = ORRWrs $wzr, %7, 0
+    %0:gpr64 = SUBREG_TO_REG 0, killed %10, %subreg.sub_32
+    dead $wzr = SUBSWri %7, 1, 0, implicit-def $nzcv
+    Bcc 1, %bb.2, implicit $nzcv
+  
+  bb.7:
+    %11:gpr64all = COPY $xzr
+    %9:gpr64all = COPY %11
+    B %bb.3
+  
+  bb.2.for.body.preheader.new:
+    %14:gpr64common = ANDXri %0, 8157
+    %15:gpr64all = COPY $xzr
+    %13:gpr64all = COPY %15
+    B %bb.6
+  
+  bb.3.for.cond.cleanup.loopexit.unr-lcssa:
+    %2:gpr64 = PHI %9, %bb.7, %4, %bb.6
+    %25:gpr32 = COPY %0.sub_32
+    TBZW killed %25, 0, %bb.5
+    B %bb.4
+  
+  bb.4.for.body.epil:
+    %26:gpr64 = UBFMXri %2, 61, 60
+    %27:fpr64 = LDRDroX %6, %26, 0, 0 :: (load (s64) from %ir.arrayidx.epil)
+    %28:fpr64 = nofpexcept FADDDrr %27, %27, implicit $fpcr
+    STRDroX killed %28, %5, %26, 0, 0 :: (store (s64) into %ir.arrayidx2.epil)
+  
+  bb.5.for.cond.cleanup:
+    RET_ReallyLR
+  
+  bb.6.for.body:
+    %3:gpr64common = PHI %13, %bb.2, %4, %bb.6
+    %16:gpr64 = UBFMXri %3, 61, 60
+    %17:gpr64common = ADDXrr %6, %16
+    %18:fpr64 = LDRDui %17, 0 :: (load (s64) from %ir.scevgep13)
+    %19:fpr64 = nofpexcept FADDDrr %18, %18, implicit $fpcr
+    %20:gpr64common = ADDXrr %5, %16
+    %21:fpr64 = LDRDui %17, 1 :: (load (s64) from %ir.scevgep9)
+    %22:fpr64 = nofpexcept FADDDrr %21, %21, implicit $fpcr
+    %23:gpr64common = nuw ADDXri %3, 2, 0
+    %4:gpr64all = COPY %23
+    dead $xzr = SUBSXrr %14, %23, implicit-def $nzcv
+    Bcc 0, %bb.3, implicit $nzcv
+    B %bb.6
+
+...
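To make the semantics of the ModuloSchedule.cpp hunk explicit, here is a simplified sketch of the fixed loop in ModuloScheduleExpander::updateMemOperands. The canAdjustOffset helper is hypothetical, standing in for the real per-stage offset check; the getMachineMemOperand and setMemRefs calls are the ones visible in the diff. The key property relied on is that an instruction with an empty memoperand list must be treated conservatively by later passes (it may access any memory location), whereas a memoperand with offset 0 and unknown size is taken at face value and can mislead them.

// Simplified sketch of updateMemOperands after the fix (hypothetical
// canAdjustOffset helper; not the verbatim LLVM code).
SmallVector<MachineMemOperand *, 2> NewMMOs;
for (MachineMemOperand *MMO : NewMI.memoperands()) {
  int64_t AdjOffset;
  if (canAdjustOffset(MMO, Stage, AdjOffset)) {
    // The access provably moves by a fixed amount per stage, so the
    // memoperand can be rebased accurately.
    NewMMOs.push_back(
        MF.getMachineMemOperand(MMO, AdjOffset, MMO->getSize()));
  } else {
    // Previously a memoperand with offset 0 and UnknownSize was pushed
    // here; dropping all memoperands is the always-correct fallback.
    NewMMOs.clear();
    break;
  }
}
NewMI.setMemRefs(MF, NewMMOs); // empty list => may access anything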

ytmukai (Contributor, Author) commented Feb 6, 2024

A MachineMemOperand with UnknownSize causes the following assertion failure in the AArch64 backend.
https://godbolt.org/z/j7qs6x4Pv

The unknown size value is used to initialize an Align, which is the immediate cause of the problem:

uint64_t TypeAlignment = MemOp ? Align(MemOp->getSize()).value() : -1;

assert(llvm::isPowerOf2_64(Value) && "Alignment is not a power of 2");

However, it might be better to avoid this modification, because dropping the MachineMemOperand also loses alias information.
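For concreteness, a minimal standalone sketch of the failure mode, using stand-ins for the LLVM definitions involved (MemoryLocation::UnknownSize is the all-ones sentinel ~0ULL, and Align's constructor performs the power-of-two assertion quoted above):

#include <cassert>
#include <cstdint>

constexpr uint64_t UnknownSize = ~UINT64_C(0); // MemoryLocation::UnknownSize

static bool isPowerOf2_64(uint64_t Value) {
  return Value && (Value & (Value - 1)) == 0;
}

struct Align {
  explicit Align(uint64_t Value) {
    // The check that fires in the report above.
    assert(isPowerOf2_64(Value) && "Alignment is not a power of 2");
  }
};

int main() {
  // The load/store optimizer effectively does Align(MemOp->getSize());
  // with the pre-fix pipeliner output, getSize() was UnknownSize.
  Align A(UnknownSize); // asserts: ~0 is not a power of two
}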

davemgreen (Collaborator) commented

I'm not sure if the pipeliner is correct or not, but would it be possible to fix this in the load/store optimizer? It looks like it should be checking that the size is not unknown (or not converting to an Align and back?)

ytmukai (Contributor, Author) commented Feb 7, 2024

@davemgreen Thanks for your comments; on reflection, I agree the load/store optimizer code should be modified.
It uses the MachineMemOperand to guarantee that the address of an ldp/stp is aligned to twice the element size when that is required. However, it assumes the alignment is guaranteed when the MachineMemOperand is missing or has an invalid value, which appears to be the problem.
I will propose a modification that makes this determination more conservatively.

// Fetch the memoperand of the load/store that is a candidate for
// combination.
MachineMemOperand *MemOp =
    MI.memoperands_empty() ? nullptr : MI.memoperands().front();
// Get the needed alignments to check them if
// ldp-aligned-only/stp-aligned-only features are opted.
uint64_t MemAlignment = MemOp ? MemOp->getAlign().value() : -1;
uint64_t TypeAlignment = MemOp ? Align(MemOp->getSize()).value() : -1;
// If a load arrives and ldp-aligned-only feature is opted, check that the
// alignment of the source pointer is at least double the alignment of the
// type.
if (MI.mayLoad() && Subtarget->hasLdpAlignedOnly() && MemOp &&
    MemAlignment < 2 * TypeAlignment)
  return false;
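As a sketch of that more conservative direction (hypothetical code, not the actual change that later landed as PR #83948): treat a missing memoperand, or one whose size is unknown, as "alignment unproven" and refuse to form the pair. A symmetric check would apply to stores under stp-aligned-only.

if (MI.mayLoad() && Subtarget->hasLdpAlignedOnly()) {
  // Hypothetical: without a usable MachineMemOperand we cannot prove the
  // alignment requirement, so do not combine into an ldp.
  if (!MemOp || MemOp->getSize() == MemoryLocation::UnknownSize)
    return false;
  // Require the pointer to be aligned to twice the element size.
  if (MemOp->getAlign().value() < 2 * MemOp->getSize())
    return false;
}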

ytmukai (Contributor, Author) commented Mar 5, 2024

> However, it assumes the alignment is guaranteed when the MachineMemOperand is missing or has an invalid value, which appears to be the problem.
> I will propose a modification that makes this determination more conservatively.

I have now created that patch and posted it as a new PR: #83948.

ytmukai closed this on Mar 5, 2024