Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MemCpyOpt] Require writable object during call slot optimization #71542

Merged
merged 1 commit into from
Nov 9, 2023

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Nov 7, 2023

Call slot optimization may introduce writes to the destination object that occur earlier than in the original function. We currently already check that that the destination is dereferenceable and aligned, but we do not make sure that it is writable. As such, we might introduce a write to read-only memory, or introduce a data race.

Fix this by checking that the object is writable. For arguments, this is indicated by the new writable attribute. Tests using sret/dereferenceable are updated to use it.

Call slot optimization may introduce writes to the destination
object that occur earlier than in the original function. We
currently already check that that the destination is dereferenceable
and aligned, but we do not make sure that it is writable. As such,
we might introduce a write to read-only memory, or introduce a
data race.

Fix this by checking that the object is writable. For arguments,
this is indicated by the new writable attribute. Tests using
sret/dereferenceable are updated to use it.
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 7, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

Call slot optimization may introduce writes to the destination object that occur earlier than in the original function. We currently already check that that the destination is dereferenceable and aligned, but we do not make sure that it is writable. As such, we might introduce a write to read-only memory, or introduce a data race.

Fix this by checking that the object is writable. For arguments, this is indicated by the new writable attribute. Tests using sret/dereferenceable are updated to use it.


Full diff: https://github.com/llvm/llvm-project/pull/71542.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp (+6-4)
  • (modified) llvm/test/Transforms/MemCpyOpt/callslot.ll (+15-2)
  • (modified) llvm/test/Transforms/MemCpyOpt/callslot_deref.ll (+2-2)
  • (modified) llvm/test/Transforms/MemCpyOpt/callslot_noalias.ll (+2-2)
  • (modified) llvm/test/Transforms/MemCpyOpt/loadstore-sret.ll (+1-1)
  • (modified) llvm/test/Transforms/MemCpyOpt/memcpy.ll (+1-1)
  • (modified) llvm/test/Transforms/MemCpyOpt/sret.ll (+1-1)
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 39ad3d87779b526..7789458364caeed 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -925,10 +925,12 @@ bool MemCpyOptPass::performCallSlotOptzn(Instruction *cpyLoad,
       return false;
   }
 
-  // Check that accessing the first srcSize bytes of dest will not cause a
-  // trap.  Otherwise the transform is invalid since it might cause a trap
-  // to occur earlier than it otherwise would.
-  if (!isDereferenceableAndAlignedPointer(cpyDest, Align(1), APInt(64, cpySize),
+  // Check that storing to the first srcSize bytes of dest will not cause a
+  // trap or data race.
+  bool ExplicitlyDereferenceableOnly;
+  if (!isWritableObject(getUnderlyingObject(cpyDest),
+                        ExplicitlyDereferenceableOnly) ||
+      !isDereferenceableAndAlignedPointer(cpyDest, Align(1), APInt(64, cpySize),
                                           DL, C, AC, DT)) {
     LLVM_DEBUG(dbgs() << "Call Slot: Dest pointer not dereferenceable\n");
     return false;
diff --git a/llvm/test/Transforms/MemCpyOpt/callslot.ll b/llvm/test/Transforms/MemCpyOpt/callslot.ll
index 8c769319236d654..877b73316f9018e 100644
--- a/llvm/test/Transforms/MemCpyOpt/callslot.ll
+++ b/llvm/test/Transforms/MemCpyOpt/callslot.ll
@@ -69,7 +69,7 @@ define void @write_src_between_call_and_memcpy() {
   ret void
 }
 
-define void @throw_between_call_and_mempy(ptr dereferenceable(16) %dest.i8) {
+define void @throw_between_call_and_mempy(ptr writable dereferenceable(16) %dest.i8) {
 ; CHECK-LABEL: @throw_between_call_and_mempy(
 ; CHECK-NEXT:    [[SRC:%.*]] = alloca [16 x i8], align 1
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[SRC]], i8 0, i64 16, i1 false)
@@ -209,7 +209,7 @@ nocaptures:
   ret void
 }
 
-define void @source_alignment(ptr noalias dereferenceable(128) %dst) {
+define void @source_alignment(ptr noalias writable dereferenceable(128) %dst) {
 ; CHECK-LABEL: @source_alignment(
 ; CHECK-NEXT:    [[SRC:%.*]] = alloca [128 x i8], align 4
 ; CHECK-NEXT:    call void @accept_ptr(ptr nocapture [[DST:%.*]]) #[[ATTR3]]
@@ -221,6 +221,19 @@ define void @source_alignment(ptr noalias dereferenceable(128) %dst) {
   ret void
 }
 
+define void @dest_not_writable(ptr noalias dereferenceable(128) %dst) {
+; CHECK-LABEL: @dest_not_writable(
+; CHECK-NEXT:    [[SRC:%.*]] = alloca [128 x i8], align 4
+; CHECK-NEXT:    call void @accept_ptr(ptr nocapture [[SRC]]) #[[ATTR3]]
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[DST:%.*]], ptr [[SRC]], i64 128, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %src = alloca [128 x i8], align 4
+  call void @accept_ptr(ptr nocapture %src) nounwind
+  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dst, ptr %src, i64 128, i1 false)
+  ret void
+}
+
 declare void @may_throw()
 declare void @accept_ptr(ptr)
 declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
diff --git a/llvm/test/Transforms/MemCpyOpt/callslot_deref.ll b/llvm/test/Transforms/MemCpyOpt/callslot_deref.ll
index e66f702a54326a5..dcbede105a3b13c 100644
--- a/llvm/test/Transforms/MemCpyOpt/callslot_deref.ll
+++ b/llvm/test/Transforms/MemCpyOpt/callslot_deref.ll
@@ -6,7 +6,7 @@ declare void @llvm.memcpy.p0.p0.i64(ptr nocapture, ptr nocapture readonly, i64,
 declare void @llvm.memset.p0.i64(ptr nocapture, i8, i64, i1) nounwind
 
 ; all bytes of %dst that are touch by the memset are dereferenceable
-define void @must_remove_memcpy(ptr noalias nocapture dereferenceable(4096) %dst) nofree nosync {
+define void @must_remove_memcpy(ptr noalias nocapture writable dereferenceable(4096) %dst) nofree nosync {
 ; CHECK-LABEL: @must_remove_memcpy(
 ; CHECK-NEXT:    [[SRC:%.*]] = alloca [4096 x i8], align 1
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[DST:%.*]], i8 0, i64 4096, i1 false)
@@ -20,7 +20,7 @@ define void @must_remove_memcpy(ptr noalias nocapture dereferenceable(4096) %dst
 
 ; memset touch more bytes than those guaranteed to be dereferenceable
 ; We can't remove the memcpy, but we can turn it into an independent memset.
-define void @must_not_remove_memcpy(ptr noalias nocapture dereferenceable(1024) %dst) nofree nosync {
+define void @must_not_remove_memcpy(ptr noalias nocapture writable dereferenceable(1024) %dst) nofree nosync {
 ; CHECK-LABEL: @must_not_remove_memcpy(
 ; CHECK-NEXT:    [[SRC:%.*]] = alloca [4096 x i8], align 1
 ; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr [[SRC]], i8 0, i64 4096, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/callslot_noalias.ll b/llvm/test/Transforms/MemCpyOpt/callslot_noalias.ll
index 483ce2f6b6cf8af..9b51b6cec3b45fb 100644
--- a/llvm/test/Transforms/MemCpyOpt/callslot_noalias.ll
+++ b/llvm/test/Transforms/MemCpyOpt/callslot_noalias.ll
@@ -5,10 +5,10 @@ declare void @func(ptr %dst)
 
 ; The noalias metadata from the call, the load and the store should be merged,
 ; so that no metadata is left on the call.
-define i8 @test(ptr dereferenceable(1) noalias %dst) {
+define i8 @test(ptr writable dereferenceable(1) noalias %dst) {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT:    [[TMP:%.*]] = alloca i8, align 1
-; CHECK-NEXT:    call void @func(ptr nocapture [[DST:%.*]]) #[[ATTR0:[0-9]+]]{{$}}
+; CHECK-NEXT:    call void @func(ptr nocapture [[DST:%.*]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    [[V2:%.*]] = load i8, ptr [[DST]], align 1, !alias.scope !0
 ; CHECK-NEXT:    ret i8 [[V2]]
 ;
diff --git a/llvm/test/Transforms/MemCpyOpt/loadstore-sret.ll b/llvm/test/Transforms/MemCpyOpt/loadstore-sret.ll
index 461205e7bbcf479..20f570f2cb5311b 100644
--- a/llvm/test/Transforms/MemCpyOpt/loadstore-sret.ll
+++ b/llvm/test/Transforms/MemCpyOpt/loadstore-sret.ll
@@ -7,7 +7,7 @@ target triple = "x86_64-apple-darwin10.0.0"
 
 %"class.std::auto_ptr" = type { ptr }
 
-define void @_Z3foov(ptr noalias nocapture sret(%"class.std::auto_ptr") %agg.result) ssp {
+define void @_Z3foov(ptr noalias nocapture writable sret(%"class.std::auto_ptr") %agg.result) ssp {
 ; CHECK-LABEL: @_Z3foov(
 ; CHECK-NEXT:  _ZNSt8auto_ptrIiED1Ev.exit:
 ; CHECK-NEXT:    [[TEMP_LVALUE:%.*]] = alloca %"class.std::auto_ptr", align 8
diff --git a/llvm/test/Transforms/MemCpyOpt/memcpy.ll b/llvm/test/Transforms/MemCpyOpt/memcpy.ll
index 413d72a8e611558..cad2170b271e29d 100644
--- a/llvm/test/Transforms/MemCpyOpt/memcpy.ll
+++ b/llvm/test/Transforms/MemCpyOpt/memcpy.ll
@@ -142,7 +142,7 @@ define void @test6_memcpy(ptr %src, ptr %dest) nounwind {
 
 @x = external global %0
 
-define void @test3(ptr noalias sret(%0) %agg.result) nounwind  {
+define void @test3(ptr noalias writable sret(%0) %agg.result) nounwind  {
 ; CHECK-LABEL: @test3(
 ; CHECK-NEXT:    [[X_0:%.*]] = alloca [[TMP0:%.*]], align 16
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 16 [[AGG_RESULT:%.*]], ptr align 16 @x, i32 32, i1 false)
diff --git a/llvm/test/Transforms/MemCpyOpt/sret.ll b/llvm/test/Transforms/MemCpyOpt/sret.ll
index efb591242c2f135..1d0f0934ec2da2d 100644
--- a/llvm/test/Transforms/MemCpyOpt/sret.ll
+++ b/llvm/test/Transforms/MemCpyOpt/sret.ll
@@ -6,7 +6,7 @@ target triple = "i686-apple-darwin9"
 
 %0 = type { x86_fp80, x86_fp80 }
 
-define void @ccosl(ptr noalias sret(%0) %agg.result, ptr byval(%0) align 8 %z) nounwind {
+define void @ccosl(ptr noalias writable sret(%0) %agg.result, ptr byval(%0) align 8 %z) nounwind {
 ; CHECK-LABEL: @ccosl(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[IZ:%.*]] = alloca [[TMP0:%.*]], align 16

@aeubanks
Copy link
Contributor

aeubanks commented Nov 7, 2023

does alive model these sorts of issues?

@nikic
Copy link
Contributor Author

nikic commented Nov 7, 2023

does alive model these sorts of issues?

I'm not really sure. I believe it doesn't reason about data races and doesn't support noalias, but maybe it has read-only memory? I think this would be a variant of a test case that doesn't rely on noalias: https://alive2.llvm.org/ce/z/ZDcarT And it doesn't verify ... but I'm not sure whether it fails to verify for the right reasons. I never know how to interpret these "Function triggered UB" results. cc @nunoplopes

Copy link
Contributor

@khei4 khei4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me!

@nunoplopes
Copy link
Member

does alive model these sorts of issues?

I'm not really sure. I believe it doesn't reason about data races and doesn't support noalias, but maybe it has read-only memory? I think this would be a variant of a test case that doesn't rely on noalias: https://alive2.llvm.org/ce/z/ZDcarT And it doesn't verify ... but I'm not sure whether it fails to verify for the right reasons. I never know how to interpret these "Function triggered UB" results. cc @nunoplopes

Alive2 doesn't support writable yet; I've put that on the todo list.
Besides that, it doesn't model data races. It happily accepts introduction of stores if the memory is dereferenceable. noalias.

That test case should verify AFACIT, it's just that the refinement of function arguments is too strict. I'll open a bug report.

@nikic nikic merged commit 369c9b7 into llvm:main Nov 9, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants