
[MSSA] Don't check non-AA memory effects when using template access #76142

Closed
wants to merge 1 commit

Conversation

nikic
Contributor

@nikic nikic commented Dec 21, 2023

Sometimes, we create a MemoryAccess for an instruction, which is later simplified (e.g. via devirtualization) such that the new instruction has no memory effects anymore.

If we later clone the instruction (e.g. during unswitching), then MSSA will not create a MemoryAccess for the new instruction, breaking MSSAU assumptions.

There are principally two ways to solve this. The first is to remove the assumption that a MemoryAccess will be created from MSSAU. I initially pursued this approach, but it is not as simple as it sounds. The core problem is that if there is another later access in the block, we'll have to find a new defining access for it. This is easy if there is another MemoryDef before the MemoryDef we're no longer creating, but becomes complicated if it was the first access in the block. In that case we might have to insert a MemoryPhi and perform the necessary rewrites, which sounds like all too much hassle for this edge-case.

The alternative implemented here is to make sure that the template access is always used if it is specified -- even if we have obtained better information in the meantime. This was already the case wrt AA queries, but the separate non-AA check was performed unconditionally beforehand. Move this check into the non-template branch only.

@llvmbot
Collaborator

llvmbot commented Dec 21, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Nikita Popov (nikic)


Full diff: https://github.com/llvm/llvm-project/pull/76142.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/MemorySSA.cpp (+6-6)
  • (added) llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll (+117)
diff --git a/llvm/lib/Analysis/MemorySSA.cpp b/llvm/lib/Analysis/MemorySSA.cpp
index 2cf92ceba01032..ce4942ea1477af 100644
--- a/llvm/lib/Analysis/MemorySSA.cpp
+++ b/llvm/lib/Analysis/MemorySSA.cpp
@@ -1732,12 +1732,6 @@ MemoryUseOrDef *MemorySSA::createNewAccess(Instruction *I,
     }
   }
 
-  // Using a nonstandard AA pipelines might leave us with unexpected modref
-  // results for I, so add a check to not model instructions that may not read
-  // from or write to memory. This is necessary for correctness.
-  if (!I->mayReadFromMemory() && !I->mayWriteToMemory())
-    return nullptr;
-
   bool Def, Use;
   if (Template) {
     Def = isa<MemoryDef>(Template);
@@ -1757,6 +1751,12 @@ MemoryUseOrDef *MemorySSA::createNewAccess(Instruction *I,
     }
 #endif
   } else {
+    // Using a nonstandard AA pipelines might leave us with unexpected modref
+    // results for I, so add a check to not model instructions that may not read
+    // from or write to memory. This is necessary for correctness.
+    if (!I->mayReadFromMemory() && !I->mayWriteToMemory())
+      return nullptr;
+
     // Find out what affect this instruction has on memory.
     ModRefInfo ModRef = AAP->getModRefInfo(I, std::nullopt);
     // The isOrdered check is used to ensure that volatiles end up as defs
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll
new file mode 100644
index 00000000000000..2aaf777683e116
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll
@@ -0,0 +1,117 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes="loop-mssa(loop-instsimplify,simple-loop-unswitch<nontrivial>)" < %s | FileCheck %s
+
+@vtable = constant ptr @foo
+
+declare void @foo() memory(none)
+declare void @bar()
+
+; The call becomes known readnone after simplification, but still has a
+; MemoryAccess. Make sure this does not lead to an assertion failure.
+define void @test(i1 %c) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: i1 [[C:%.*]]) {
+; CHECK-NEXT:    [[C_FR:%.*]] = freeze i1 [[C]]
+; CHECK-NEXT:    br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]]
+; CHECK:       .split.us:
+; CHECK-NEXT:    br label [[LOOP_US:%.*]]
+; CHECK:       loop.us:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    br label [[EXIT_SPLIT_US:%.*]]
+; CHECK:       exit.split.us:
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       .split:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    br label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+  br label %loop
+
+loop:
+  %fn = load ptr, ptr @vtable, align 8
+  call void %fn()
+  br i1 %c, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; Variant with another access after the call.
+define void @test2(i1 %c, ptr %p) {
+; CHECK-LABEL: define void @test2(
+; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[C_FR:%.*]] = freeze i1 [[C]]
+; CHECK-NEXT:    br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]]
+; CHECK:       .split.us:
+; CHECK-NEXT:    br label [[LOOP_US:%.*]]
+; CHECK:       loop.us:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @bar()
+; CHECK-NEXT:    br label [[EXIT_SPLIT_US:%.*]]
+; CHECK:       exit.split.us:
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       .split:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @bar()
+; CHECK-NEXT:    br label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+  br label %loop
+
+loop:
+  %fn = load ptr, ptr @vtable, align 8
+  call void %fn()
+  call void @bar()
+  br i1 %c, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; Variant with another access after the call and no access before the call.
+define void @test3(i1 %c, ptr %p) {
+; CHECK-LABEL: define void @test3(
+; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT:    [[C_FR:%.*]] = freeze i1 [[C]]
+; CHECK-NEXT:    br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]]
+; CHECK:       .split.us:
+; CHECK-NEXT:    br label [[LOOP_US:%.*]]
+; CHECK:       loop.us:
+; CHECK-NEXT:    br label [[SPLIT_US:%.*]]
+; CHECK:       split.us:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @bar()
+; CHECK-NEXT:    br label [[EXIT_SPLIT_US:%.*]]
+; CHECK:       exit.split.us:
+; CHECK-NEXT:    br label [[EXIT:%.*]]
+; CHECK:       .split:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br label [[SPLIT:%.*]]
+; CHECK:       split:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @bar()
+; CHECK-NEXT:    br label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+  br label %loop
+
+loop:
+  %fn = load ptr, ptr @vtable, align 8
+  br label %split
+
+split:
+  call void %fn()
+  call void @bar()
+  br i1 %c, label %exit, label %loop
+
+exit:
+  ret void
+}

Contributor

@alinas alinas left a comment


I think this makes sense as the simple way to address this.
There's also the option to generalize what to do in the case of accesses that are optimized further.
I think that's possible but I may be missing edge cases not covered by current tests. Something like this resolves the testcases in this PR:

diff --git llvm/lib/Analysis/MemorySSAUpdater.cpp llvm/lib/Analysis/MemorySSAUpdater.cpp
index 9ad60f774e9f..0e533cb54f15 100644
--- llvm/lib/Analysis/MemorySSAUpdater.cpp
+++ llvm/lib/Analysis/MemorySSAUpdater.cpp
@@ -578,18 +578,9 @@ static MemoryAccess *getNewDefiningAccessForClone(MemoryAccess *MA,
       if (Instruction *NewDefMUDI =
               cast_or_null<Instruction>(VMap.lookup(DefMUDI))) {
         InsnDefining = MSSA->getMemoryAccess(NewDefMUDI);
-        if (!CloneWasSimplified)
-          assert(InsnDefining && "Defining instruction cannot be nullptr.");
-        else if (!InsnDefining || isa<MemoryUse>(InsnDefining)) {
+        if (!InsnDefining || isa<MemoryUse>(InsnDefining)) {
           // The clone was simplified, it's no longer a MemoryDef, look up.
-          auto DefIt = DefMUD->getDefsIterator();
-          // Since simplified clones only occur in single block cloning, a
-          // previous definition must exist, otherwise NewDefMUDI would not
-          // have been found in VMap.
-          assert(DefIt != MSSA->getBlockDefs(DefMUD->getBlock())->begin() &&
-                 "Previous def must exist");
-          InsnDefining = getNewDefiningAccessForClone(
-              &*(--DefIt), VMap, MPhiMap, CloneWasSimplified, MSSA);
+          InsnDefining = getNewDefiningAccessForClone(DefMUD->getDefiningAccess(), VMap, MPhiMap, CloneWasSimplified, MSSA);
         }
       }
     }
@@ -626,7 +617,7 @@ void MemorySSAUpdater::cloneUsesAndDefs(BasicBlock *BB, BasicBlock *NewBB,
             getNewDefiningAccessForClone(MUD->getDefiningAccess(), VMap,
                                          MPhiMap, CloneWasSimplified, MSSA),
             /*Template=*/CloneWasSimplified ? nullptr : MUD,
-            /*CreationMustSucceed=*/CloneWasSimplified ? false : true);
+            /*CreationMustSucceed=*/false);
         if (NewUseOrDef)
           MSSA->insertIntoListsForBlock(NewUseOrDef, NewBB, MemorySSA::End);
       }

But this then needs a cleanup of CloneWasSimplified, which would lose its purpose.

If you move forward with creating accesses for non-memory-affecting instructions, a follow-up optimization could be to remember the accesses that were still created using a template, then call removeMemoryAccess (leaving optimizePhis set to false) on them after cloning and new phi assignment are complete. This should redirect the defining accesses properly.

@nikic
Contributor Author

nikic commented Dec 22, 2023

I think that's possible but I may be missing edge cases not covered by current tests. Something like this resolves the testcases in this PR:

I think this may choose an incorrect defining access in some cases.

Let's say we have the following initial structure:

writes x
writes y
now_readnone (defining: writes x)
reads y (defining: now_readnone)

The important part is that the defining access of now_readnone skips an intermediate MemoryDef for whatever reason (say scoped alias metadata).

Then, I think your new code would set the defining access of clone(reads y) to clone(writes x), even though the correct one would be clone(writes y).

If you move forward with creating accesses for non-memory-affecting instructions, a follow-up optimization could be to remember the accesses that were still created using a template, then call removeMemoryAccess (leaving optimizePhis set to false) on them after cloning and new phi assignment are complete. This should redirect the defining accesses properly.

Something we could also do is clean up such accesses inside LoopInstSimplify itself. That is, if we see that a callee has been replaced by a constant, check whether the access can be removed. This has the benefit that we'll simplify the original access as well, not just the cloned access. (Though I don't think we should rely on this happening, as there are just too many ways in which AA results can change.)

@nikic
Contributor Author

nikic commented Jan 2, 2024

ping

@alinas
Contributor

alinas commented Jan 2, 2024

In your example now_readnone is a MemoryDef, which doesn't get its defining access updated. It has an optimized field, which will be set to writes x, but its defining access should still be writes y.
The mechanism for deleting accesses relies on this too: if a clone is created for now_readnone and later deleted (after checking that it doesn't access memory), then all of its users get its defining access as their new defining access.

I'm OK with moving forward with this solution to resolve the asserts. I haven't found a counterexample for the generalization yet, but I need more time to check that it's safe; it may still miss cases where it's not correct.

@nikic
Contributor Author

nikic commented Jan 2, 2024

In your example now_readnone is a MemoryDef, which doesn't get its defining access updated. It has an optimized field, which will be set to writes x, but its defining access should still be writes y. The mechanism for deleting accesses relies on this too: if a clone is created for now_readnone and later deleted (after checking that it doesn't access memory), then all of its users get its defining access as their new defining access.

Ah yes, good point.

I'm OK with moving forward with this solution to resolve the asserts. I haven't found a counterexample for the generalization yet, but I need more time to check that it's safe; it may still miss cases where it's not correct.

Hm, could it happen that the new defining access ends up in a different block, for which we haven't cloned accesses yet?

@alinas
Contributor

alinas commented Jan 2, 2024

I don't think that's possible. The cloning logic works in RPO, so the clones of all defining accesses have already been created.

@nikic
Contributor Author

nikic commented Jan 3, 2024

I went ahead and put up another PR with your variant: #76819. You convinced me that this approach is correct, and I think it's a bit cleaner overall.

nikic added a commit that referenced this pull request Jan 8, 2024
Sometimes, we create a MemoryAccess for an instruction, which is later
simplified (e.g. via devirtualization) such that the new instruction has
no memory effects anymore.

If we later clone the instruction (e.g. during unswitching), then MSSA
will not create a MemoryAccess for the new instruction, triggering an
assert.

Disable the assertion (by passing CreationMustSucceed=false) and adjust
getDefiningAccessForClone() to work correctly in that case.

This PR implements the alternative suggestion by alinas from
#76142.
@nikic
Contributor Author

nikic commented Jan 9, 2024

Closing in favor of #76819.

@nikic nikic closed this Jan 9, 2024
nikic added a commit to nikic/llvm-project that referenced this pull request Jan 9, 2024
(cherry picked from commit d02c793)
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
MingcongBai pushed a commit to AOSC-Tracking/llvm-project that referenced this pull request Mar 26, 2024
(cherry picked from commit d02c793)