Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DSE] Fixes bug caused by skipped read clobbers #83181

Closed
wants to merge 1 commit into from

Conversation

vporpo
Copy link
Contributor

@vporpo vporpo commented Feb 27, 2024

The bug seems to be caused by the use of BatchAA after the IR has been modified.

Considering the reproducer in read-clobber-skipped.ll, the bug is caused by isWriteAtEndOfFunction() returning true instead of false when trying to eliminate @llvm.memset.po0.
The problem is that when it visits %ret = load ptr, ptr %struct.byte.8, isReadClobber() should return true, but isRefSet(BatchAA.getModRefInfo()) actually returns false, so the read clobber is missed. As far as I understand the cause is that a mem user store of llvm.memset has already been erased from the IR, causing BatchAA to return the wrong data.

Replacing BatchAA with AA at this point fixes the issue.

The bug seems to be caused by the use of BatchAA after the IR has been modified.

Considering the reproducer in `read-clobber-skipped.ll`, the bug is caused by
`isWriteAtEndOfFunction()` returning true instead of false when trying to
eliminate `@llvm.memset.po0`.
The problem is that when it visits `%ret = load ptr, ptr %struct.byte.8`,
`isReadClobber()` should return true, but `isRefSet(BatchAA.getModRefInfo())`
actually returns false, so the read clobber is missed.
As far as I understand the cause is that a mem user store of `llvm.memset` has
already been erased from the IR, causing BatchAA to return the wrong data.

Replacing BatchAA with AA at this point fixes the issue.
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 27, 2024

@llvm/pr-subscribers-llvm-transforms

Author: vporpo (vporpo)

Changes

The bug seems to be caused by the use of BatchAA after the IR has been modified.

Considering the reproducer in read-clobber-skipped.ll, the bug is caused by isWriteAtEndOfFunction() returning true instead of false when trying to eliminate @<!-- -->llvm.memset.po0.
The problem is that when it visits %ret = load ptr, ptr %struct.byte.8, isReadClobber() should return true, but isRefSet(BatchAA.getModRefInfo()) actually returns false, so the read clobber is missed. As far as I understand the cause is that a mem user store of llvm.memset has already been erased from the IR, causing BatchAA to return the wrong data.

Replacing BatchAA with AA at this point fixes the issue.


Full diff: https://github.com/llvm/llvm-project/pull/83181.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp (+2-1)
  • (added) llvm/test/Transforms/DeadStoreElimination/read-clobber-skipped.ll (+41)
diff --git a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
index d30c68a2f08712..6f9fe67e66ca0a 100644
--- a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
@@ -1274,7 +1274,8 @@ struct DSEState {
       if (CB->onlyAccessesInaccessibleMemory())
         return false;
 
-    return isRefSet(BatchAA.getModRefInfo(UseInst, DefLoc));
+    AAQueryInfo TmpAAQI(AA, &EI);
+    return isRefSet(AA.getModRefInfo(UseInst, DefLoc, TmpAAQI));
   }
 
   /// Returns true if a dependency between \p Current and \p KillingDef is
diff --git a/llvm/test/Transforms/DeadStoreElimination/read-clobber-skipped.ll b/llvm/test/Transforms/DeadStoreElimination/read-clobber-skipped.ll
new file mode 100644
index 00000000000000..fd4c3bf1351df4
--- /dev/null
+++ b/llvm/test/Transforms/DeadStoreElimination/read-clobber-skipped.ll
@@ -0,0 +1,41 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=dse < %s | FileCheck %s
+;
+; DSE kills both `store i32 44, ptr %struct.byte.4, align 4` and
+; `call void @llvm.memset.p0.i64(...)` but the memset should not be killed
+; because it has a clobber read: `%ret = load ptr, ptr %struct.byte.8`
+
+%struct.type = type { ptr, ptr }
+
+define ptr @foo(ptr noundef %ptr) {
+; CHECK-LABEL: define ptr @foo(
+; CHECK-SAME: ptr noundef [[PTR:%.*]]) {
+; CHECK-NEXT:    [[STRUCT_ALLOCA:%.*]] = alloca [[STRUCT_TYPE:%.*]], align 8
+; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 56, ptr nonnull [[STRUCT_ALLOCA]]) #[[ATTR2:[0-9]+]]
+; CHECK-NEXT:    [[STRUCT_BYTE_8:%.*]] = getelementptr inbounds i8, ptr [[STRUCT_ALLOCA]], i64 8
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[STRUCT_BYTE_8]], i64 4
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 [[TMP1]], i8 42, i64 4, i1 false)
+; CHECK-NEXT:    store i32 43, ptr [[STRUCT_BYTE_8]], align 4
+; CHECK-NEXT:    [[RET:%.*]] = load ptr, ptr [[STRUCT_BYTE_8]], align 8
+; CHECK-NEXT:    call void @llvm.lifetime.end.p0(i64 56, ptr nonnull [[STRUCT_ALLOCA]]) #[[ATTR2]]
+; CHECK-NEXT:    ret ptr [[RET]]
+;
+  %struct.alloca = alloca %struct.type, align 8
+  call void @llvm.lifetime.start.p0(i64 56, ptr nonnull %struct.alloca) nounwind
+  %struct.byte.8 = getelementptr inbounds i8, ptr %struct.alloca, i64 8
+  ; Set %struct.alloca[8, 16) to 42.
+  call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 %struct.byte.8, i8 42, i64 8, i1 false)
+  ; Set %struct.alloca[8, 12) to 43.
+  store i32 43, ptr %struct.byte.8, align 4
+  ; Set %struct.alloca[4, 8) to 44.
+  %struct.byte.4 = getelementptr inbounds i8, ptr %struct.alloca, i64 4
+  store i32 44, ptr %struct.byte.4, align 4
+  ; Return %struct.alloca[8, 16).
+  %ret = load ptr, ptr %struct.byte.8
+  call void @llvm.lifetime.end.p0(i64 56, ptr nonnull %struct.alloca) nounwind
+  ret ptr %ret
+}
+
+declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #0
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #2
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #2

@vporpo
Copy link
Contributor Author

vporpo commented Feb 27, 2024

I had to revert the pre-commit test because BatchAA.getModRefInfo() had some non-deterministic behavior, and the test was failing on some targets.

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the removed dead store (at %struct.byte.4) that invalidates the batchAA right?

@@ -1274,7 +1274,8 @@ struct DSEState {
if (CB->onlyAccessesInaccessibleMemory())
return false;

return isRefSet(BatchAA.getModRefInfo(UseInst, DefLoc));
AAQueryInfo TmpAAQI(AA, &EI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. AA.getModRefInfo(UseInst, DefLoc) should work.

Copy link
Contributor Author

@vporpo vporpo Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that won't work because it won't use EarliestEscapeInfo and the result will be different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to call getModeRefInfo() and using EI for the query, without having to use the temporary AAQueryInfo ?

@vporpo
Copy link
Contributor Author

vporpo commented Feb 27, 2024

It is the removed dead store (at %struct.byte.4) that invalidates the batchAA right?

Yes.

fhahn added a commit to fhahn/llvm-project that referenced this pull request Feb 29, 2024
DSE uses BatchAA, which caches queries using pairs of MemoryLocations.
At the moment, DSE may remove instructions that are used as pointers in
cached MemoryLocations. If a new instruction used by a new MemoryLoation
and this instruction gets allocated at the same address as a previosuly
cached and then removed instruction, we may access an incorrect entry in
the cache.

To avoid this delay removing all instructions except MemoryDefs until
the end of DSE. This should avoid removing any values used in BatchAA's
cache.

Test case by @vporpo from
llvm#83181.
(Test not precommitted because the results are non-determinstic - memset
only sometimes gets removed)
@fhahn fhahn requested a review from nikic February 29, 2024 11:26
@fhahn
Copy link
Contributor

fhahn commented Feb 29, 2024

DSE

It is the removed dead store (at %struct.byte.4) that invalidates the batchAA right?

Yes.

This is an interesting case, thanks for sharing! AFAIK BatchAA caches pairs of memory locations; removing the store itself shouldn't invalidate anything in the cache.

I noticed that the memset doesn't get removed incorrectly deterministically; sometimes it gets removed, sometimes it does not, so I took a closer look at whats' going on.

When removing a store, we also remove any instructions that become dead after removal (reversibly), so we may remove an instruction that was used as memory location pointer in the cache. In most cases, this isn't a problem either, as the instruction has been removed and we are left with stale entries in the cache, which shouldn't be accessed though.

What seems to be happening in the case at hand is that we create a new GEP for the adjusted memset and in some cases it gets allocated at the same address as a previous GEP that we removed, but is still in the cache.

I experimented a bit to confirm this was indeed the issue and it looks like we should be able to keep using BatchAA, if we avoid deleting instructions that may be cached before the end of DSE, which I sketched here #83411

@vporpo
Copy link
Contributor Author

vporpo commented Feb 29, 2024

I experimented a bit to confirm this was indeed the issue and it looks like we should be able to keep using BatchAA, if we avoid deleting instructions that may be cached before the end of DSE, which I sketched here #83411

Thanks for looking into it @fhahn and for figuring out how to fix it without resorting to using AA instead of BatchAA.

@vporpo
Copy link
Contributor Author

vporpo commented Mar 1, 2024

Closing, the bug will be fixed by this patch: #83411

@vporpo vporpo closed this Mar 1, 2024
fhahn added a commit that referenced this pull request Mar 2, 2024
DSE uses BatchAA, which caches queries using pairs of MemoryLocations.
At the moment, DSE may remove instructions that are used as pointers in
cached MemoryLocations. If a new instruction used by a new MemoryLoation
and this instruction gets allocated at the same address as a previosuly
cached and then removed instruction, we may access an incorrect entry in
the cache.

To avoid this delay removing all instructions except MemoryDefs until
the end of DSE. This should avoid removing any values used in BatchAA's
cache.

Test case by @vporpo from
#83181.
(Test not precommitted because the results are non-determinstic - memset
only sometimes gets removed)

PR: #83411
fhahn added a commit to apple/llvm-project that referenced this pull request Mar 5, 2024
DSE uses BatchAA, which caches queries using pairs of MemoryLocations.
At the moment, DSE may remove instructions that are used as pointers in
cached MemoryLocations. If a new instruction used by a new MemoryLoation
and this instruction gets allocated at the same address as a previosuly
cached and then removed instruction, we may access an incorrect entry in
the cache.

To avoid this delay removing all instructions except MemoryDefs until
the end of DSE. This should avoid removing any values used in BatchAA's
cache.

Test case by @vporpo from
llvm#83181.
(Test not precommitted because the results are non-determinstic - memset
only sometimes gets removed)

PR: llvm#83411
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 6, 2024
DSE uses BatchAA, which caches queries using pairs of MemoryLocations.
At the moment, DSE may remove instructions that are used as pointers in
cached MemoryLocations. If a new instruction used by a new MemoryLoation
and this instruction gets allocated at the same address as a previosuly
cached and then removed instruction, we may access an incorrect entry in
the cache.

To avoid this delay removing all instructions except MemoryDefs until
the end of DSE. This should avoid removing any values used in BatchAA's
cache.

Test case by @vporpo from
llvm#83181.
(Test not precommitted because the results are non-determinstic - memset
only sometimes gets removed)

PR: llvm#83411
(cherry picked from commit 10f5e98)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 11, 2024
DSE uses BatchAA, which caches queries using pairs of MemoryLocations.
At the moment, DSE may remove instructions that are used as pointers in
cached MemoryLocations. If a new instruction used by a new MemoryLoation
and this instruction gets allocated at the same address as a previosuly
cached and then removed instruction, we may access an incorrect entry in
the cache.

To avoid this delay removing all instructions except MemoryDefs until
the end of DSE. This should avoid removing any values used in BatchAA's
cache.

Test case by @vporpo from
llvm#83181.
(Test not precommitted because the results are non-determinstic - memset
only sometimes gets removed)

PR: llvm#83411
(cherry picked from commit 10f5e98)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants