
[msan] Unpoison indirect outputs for userspace using memset for large operands #79924

Conversation

@MaskRay (Member) commented Jan 30, 2024

Modify #77393 to clear shadow memory using llvm.memset.* when the size
is large, similar to shouldUseBZeroPlusStoresToInitialize in clang for
-ftrivial-auto-var-init=. The intrinsic, if lowered to libcall, will use the msan interceptor.

The instruction selector lowers a StoreInst to multiple stores, not
utilizing memset. When the size is large (e.g.
store { [100 x i32] } zeroinitializer, ptr %12, align 1), the
generated code will be long (and CodeGenPrepare::optimizeInst will
even crash for a huge size).

// Test stack size
#include <array>

template <class T>
void DoNotOptimize(const T& var) { // deprecated by https://github.com/google/benchmark/pull/1493
  asm volatile("" : "+m"(const_cast<T&>(var)));
}

int main() {
  using LargeArray = std::array<int, 1000000>;
  auto large_stack = []() { DoNotOptimize(LargeArray()); };
  // CodeGenPrepare::optimizeInst triggers an assertion failure when creating
  // an integer type with a bit width > 2**23
  large_stack();
}

Created using spr 1.3.4
@llvmbot (Collaborator) commented Jan 30, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

@llvm/pr-subscribers-llvm-transforms

Author: Fangrui Song (MaskRay)

Changes

Modify #77393 to clear shadow memory using llvm.memset.* instead of
StoreInst. poisonAllocaUserspace uses llvm.memset.* as well.

The instruction selector lowers the StoreInst to many stores, not
utilizing memset. When the size is large (e.g.
store { [100 x i32] } zeroinitializer, ptr %12, align 1), the
generated code will be long, and SelectionDAG will even crash if the
object is too large. llvm.memset.* does not have the problem.

// Test stack size
#include <array>

template <class T>
void DoNotOptimize(const T& var) { // deprecated by https://github.com/google/benchmark/pull/1493
  asm volatile("" : "+m"(const_cast<T&>(var)));
}

int main() {
  using LargeArray = std::array<int, 1000000>;
  auto large_stack = []() { DoNotOptimize(LargeArray()); };
  // SelectionDAG crash if StoreInst is used
  large_stack();
}

Full diff: https://github.com/llvm/llvm-project/pull/79924.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp (+4-1)
  • (modified) llvm/test/Instrumentation/MemorySanitizer/msan_asm_conservative.ll (+8-8)
diff --git a/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp b/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
index 2b697557d8a92c..0806d7a5b14527 100644
--- a/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
@@ -4559,9 +4559,12 @@ struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
     } else {
       // ElemTy, derived from elementtype(), does not encode the alignment of
       // the pointer. Conservatively assume that the shadow memory is unaligned.
+      // Avoid StoreInst as SizeVal may be large, expanding to many
+      // instructions.
       auto [ShadowPtr, _] =
           getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
-      IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
+      IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
+                       SizeVal, Align(1));
     }
   }
 
diff --git a/llvm/test/Instrumentation/MemorySanitizer/msan_asm_conservative.ll b/llvm/test/Instrumentation/MemorySanitizer/msan_asm_conservative.ll
index 894f76b9b8d32a..86ca697ed9a4c6 100644
--- a/llvm/test/Instrumentation/MemorySanitizer/msan_asm_conservative.ll
+++ b/llvm/test/Instrumentation/MemorySanitizer/msan_asm_conservative.ll
@@ -177,8 +177,8 @@ entry:
 }
 
 ; CHECK-LABEL: @f_2i_2o_mem
-; USER-CONS:  store i32 0, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @id1 to i64), i64 87960930222080) to ptr), align 1
-; USER-CONS:  store i32 0, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @id2 to i64), i64 87960930222080) to ptr), align 1
+; USER-CONS:  call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @id1 to i64), i64 87960930222080) to ptr), i8 0, i64 4, i1 false)
+; USER-CONS:  call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @id2 to i64), i64 87960930222080) to ptr), i8 0, i64 4, i1 false)
 ; CHECK-CONS: call void @__msan_instrument_asm_store({{.*}}@id1{{.*}}, i64 4)
 ; CHECK-CONS: call void @__msan_instrument_asm_store({{.*}}@id2{{.*}}, i64 4)
 ; CHECK: call void asm "", "=*m,=*m,*m,*m,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) @id1, ptr elementtype(i32) @id2, ptr elementtype(i32) @is1, ptr elementtype(i32) @is2)
@@ -196,7 +196,7 @@ entry:
 
 ; CHECK-LABEL: @f_1i_1o_memreg
 ; CHECK: [[IS1_F7:%.*]] = load i32, ptr @is1, align 4
-; USER-CONS:  store i32 0, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @id1 to i64), i64 87960930222080) to ptr), align 1
+; USER-CONS:  call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @id1 to i64), i64 87960930222080) to ptr), i8 0, i64 4, i1 false)
 ; CHECK-CONS: call void @__msan_instrument_asm_store({{.*}}@id1{{.*}}, i64 4)
 ; CHECK: call void @__msan_warning
 ; CHECK: call i32 asm "", "=r,=*m,r,*m,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) @id1, i32 [[IS1_F7]], ptr elementtype(i32) @is1)
@@ -215,7 +215,7 @@ entry:
 }
 
 ; CHECK-LABEL: @f_3o_reg_mem_reg
-; USER-CONS:  store i32 0, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @id2 to i64), i64 87960930222080) to ptr), align 1
+; USER-CONS:  call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @id2 to i64), i64 87960930222080) to ptr), i8 0, i64 4, i1 false)
 ; CHECK-CONS: call void @__msan_instrument_asm_store(ptr @id2, i64 4)
 ; CHECK: call { i32, i32 } asm "", "=r,=*m,=r,~{dirflag},~{fpsr},~{flags}"(ptr elementtype(i32) @id2)
 
@@ -240,7 +240,7 @@ entry:
 ; CHECK: [[PAIR1_F9:%.*]] = load {{.*}} @pair1
 ; CHECK: [[C1_F9:%.*]] = load {{.*}} @c1
 ; CHECK: [[MEMCPY_S1_F9:%.*]] = load {{.*}} @memcpy_s1
-; USER-CONS:  store { i32, i32 } zeroinitializer, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @pair2 to i64), i64 87960930222080) to ptr), align 1
+; USER-CONS:  call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @pair2 to i64), i64 87960930222080) to ptr), i8 0, i64 8, i1 false)
 ; CHECK-CONS: call void @__msan_instrument_asm_store({{.*}}@pair2{{.*}}, i64 8)
 ; CHECK: call void @__msan_warning
 ; KMSAN: call void @__msan_warning
@@ -257,9 +257,9 @@ entry:
 }
 
 ; CHECK-LABEL: @f_3i_3o_complex_mem
-; USER-CONS:       store { i32, i32 } zeroinitializer, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @pair2 to i64), i64 87960930222080) to ptr), align 1
-; USER-CONS-NEXT:  store i8 0, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @c2 to i64), i64 87960930222080) to ptr), align 1
-; USER-CONS-NEXT:  store i64 0, ptr inttoptr (i64 xor (i64 ptrtoint (ptr @memcpy_d1 to i64), i64 87960930222080) to ptr), align 1
+; USER-CONS:       call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @pair2 to i64), i64 87960930222080) to ptr), i8 0, i64 8, i1 false)
+; USER-CONS-NEXT:  call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @c2 to i64), i64 87960930222080) to ptr), i8 0, i64 1, i1 false)
+; USER-CONS-NEXT:  call void @llvm.memset.p0.i64(ptr align 1 inttoptr (i64 xor (i64 ptrtoint (ptr @memcpy_d1 to i64), i64 87960930222080) to ptr), i8 0, i64 8, i1 false)
 ; CHECK-CONS:      call void @__msan_instrument_asm_store({{.*}}@pair2{{.*}}, i64 8)
 ; CHECK-CONS:      call void @__msan_instrument_asm_store({{.*}}@c2{{.*}}, i64 1)
 ; CHECK-CONS:      call void @__msan_instrument_asm_store({{.*}}@memcpy_d1{{.*}}, i64 8)

@ramosian-glider (Contributor) commented:

LGTM in general.

But the downside here is that the compiler won't be able to optimize away repeated stores if they are transformed to memset calls.
Maybe introduce some threshold and only memset() sizes greater than, say, 64? (IIRC that's what -ftrivial-auto-var-init does)

@MaskRay (Member, Author) commented Jan 30, 2024

LGTM in general.

But the downside here is that the compiler won't be able to optimize away repeated stores if they are transformed to memset calls. Maybe introduce some threshold and only memset() sizes greater than, say, 64? (IIRC that's what -ftrivial-auto-var-init does)

TIL. Changed to keep using StoreInst when the size <= 32, similar to shouldUseBZeroPlusStoresToInitialize (for -ftrivial-auto-var-init).

Created using spr 1.3.4
      if (Size <= 32)
        IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
      else
        IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
                         SizeVal, Align(1));
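Outside of LLVM, the size-based dispatch above can be sketched as a plain C++ function. This is a hypothetical illustration: `kStoreThreshold` and `clearShadow` are made-up names, the loop merely stands in for the single wide store the real code emits, and the `std::memset` stands in for the emitted `llvm.memset.*` (which, under msan, lowers to the `__msan_memset` interceptor when it becomes a libcall):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the <= 32 cutoff chosen in the patch, itself modeled on
// shouldUseBZeroPlusStoresToInitialize for -ftrivial-auto-var-init.
constexpr std::size_t kStoreThreshold = 32;

void clearShadow(std::uint8_t *Shadow, std::size_t Size) {
  if (Size <= kStoreThreshold) {
    // Stands in for IRB.CreateAlignedStore of a clean (zero) shadow value;
    // small stores remain visible to later optimization passes.
    for (std::size_t I = 0; I != Size; ++I)
      Shadow[I] = 0;
  } else {
    // Stands in for IRB.CreateMemSet; avoids expanding a huge store into
    // many instructions during instruction selection.
    std::memset(Shadow, 0, Size);
  }
}
```

The point of the threshold is the trade-off discussed in this thread: direct stores can be merged or eliminated by later passes, while a memset call is opaque but scales to arbitrarily large operands.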
Collaborator:
it will hit interceptor

Collaborator:
Never mind, we do this all the time here.

@MaskRay MaskRay changed the title [msan] Unpoison indirect outputs for userspace using llvm.memset.* [msan] Unpoison indirect outputs for userspace using memset for large operands Jan 30, 2024
@MaskRay MaskRay merged commit 9b91c54 into main Jan 30, 2024
3 of 4 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/msan-unpoison-indirect-outputs-for-userspace-using-llvmmemset branch January 30, 2024 21:45
      auto [ShadowPtr, _] =
          getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
      IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
      if (Size <= 32)
Contributor:
Maybe some comment wouldn't hurt here to clarify this constant.

Member Author:
This is a bit arbitrary. Perhaps

// The size threshold matches shouldUseBZeroPlusStoresToInitialize for -ftrivial-auto-var-init=zero?

Collaborator:
Memset will go into the interceptor and do a lot of checks (see __msan_memset), so it's more expensive than a regular memset. I can't tell what the optimal constant here would be, but I would expect something larger. That said, I doubt it will make a meaningful difference.

@MaskRay (Member, Author) commented Feb 3, 2024:

Thanks for your previous comment about the interceptor. The committed patch does contain this description:
"The intrinsic, if lowered to libcall, will use the msan interceptor."

Inline asm isn't commonly used :) This patch is for =m in extended asm, which I believe is even less common, and when it is used, it's usually with small objects. I guess 32 vs 64 won't make a difference.
