Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-exec TSan with no ASLR if memory layout is incompatible on Linux #78351

Merged
merged 6 commits into from Jan 19, 2024

Conversation

thurstond
Copy link
Contributor

@thurstond thurstond commented Jan 16, 2024

TSan's shadow mappings only support 30-bits of ASLR entropy on x86 Linux, and
it is not practical to support the maximum of 32-bits (due to pointer
compression and the overhead of shadow mappings). Instead, this patch
changes TSan to re-exec without ASLR if it encounters an incompatible
memory layout, as suggested by Dmitry in google/sanitizers#1716.
If ASLR is already disabled but the memory layout is still incompatible, it will abort.

This patch involves a bit of refactoring, because the old code is:

  1. InitializePlatformEarly()
  2. InitializeAllocator()
  3. InitializePlatform(): CheckAndProtect()

but it may already segfault during InitializeAllocator() if the memory
layout is incompatible, before we get a chance to check in
CheckAndProtect().

This patch adds CheckAndProtect() during InitializePlatformEarly(), before
the allocator is initialized. Naturally, it is necessary to ensure that
CheckAndProtect() does not allow the heap regions to be occupied there,
hence we generalize CheckAndProtect() to optionally check the heap
regions. We keep the original behavior of CheckAndProtect() in InitializePlatform()
as a last line of defense.

We need to be careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec
for other reasons (e.g., unlimited stack size); we implement this by
moving all the re-exec logic into ReExecIfNeeded().

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 16, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Thurston Dang (thurstond)

Changes

TSan's shadow mappings only support 30-bits of ASLR entropy on x86, and
it is not practical to support the maximum of 32-bits (due to pointer
compression and the overhead of shadow mappings). Instead, this patch
changes TSan to re-exec without ASLR if it encounters an incompatible
memory layout, as suggested by Dmitry in google/sanitizers#1716.
If ASLR is already disabled, it will abort.

This patch involves a bit of refactoring, because the old code is:
InitializePlatformEarly()
InitializeAllocator()
InitializePlatform(): CheckAndProtect()
but it may already segfault during InitializeAllocator() if the memory
layout is incompatible, before we get a chance to check in
CheckAndProtect.

This patch adds CheckAndProtect during InitializePlatformEarly(), before
the allocator is initialized. Naturally, it is necessary to ensure that
CheckAndProtect does not allow the heap regions to be occupied there,
hence we generalize CheckAndProtect to optionally check the heap
regions. We keep the original behavior of CheckAndProtect() in InitializePlatform()
as a last line of defense.

We need to careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec
for other reasons (e.g., unlimited stack size); we implement this by
moving all the re-exec logic into ReExecIfNeeded().


Full diff: https://github.com/llvm/llvm-project/pull/78351.diff

3 Files Affected:

  • (modified) compiler-rt/lib/tsan/rtl/tsan_platform.h (+1-1)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp (+98-42)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp (+37-6)
diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform.h b/compiler-rt/lib/tsan/rtl/tsan_platform.h
index 84ff4bfade09ace..377f8aeb8d66e0d 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_platform.h
+++ b/compiler-rt/lib/tsan/rtl/tsan_platform.h
@@ -1024,7 +1024,7 @@ inline uptr RestoreAddr(uptr addr) {
 
 void InitializePlatform();
 void InitializePlatformEarly();
-void CheckAndProtect();
+bool CheckAndProtect(bool protect, bool ignore_heap, bool print_warnings);
 void InitializeShadowMemoryPlatform();
 void WriteMemoryProfile(char *buf, uptr buf_size, u64 uptime_ns);
 int ExtractResolvFDs(void *state, int *fds, int nfd);
diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp
index b45adea45b27adb..bc1be49dd41513f 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp
@@ -214,6 +214,88 @@ void InitializeShadowMemoryPlatform() {
 
 #endif  // #if !SANITIZER_GO
 
+#  if !SANITIZER_GO
+static void ReExecIfNeeded() {
+  // Go maps shadow memory lazily and works fine with limited address space.
+  // Unlimited stack is not a problem as well, because the executable
+  // is not compiled with -pie.
+  {
+    bool reexec = false;
+    // TSan doesn't play well with unlimited stack size (as stack
+    // overlaps with shadow memory). If we detect unlimited stack size,
+    // we re-exec the program with limited stack size as a best effort.
+    if (StackSizeIsUnlimited()) {
+      const uptr kMaxStackSize = 32 * 1024 * 1024;
+      VReport(1,
+              "Program is run with unlimited stack size, which wouldn't "
+              "work with ThreadSanitizer.\n"
+              "Re-execing with stack size limited to %zd bytes.\n",
+              kMaxStackSize);
+      SetStackSizeLimitInBytes(kMaxStackSize);
+      reexec = true;
+    }
+
+    if (!AddressSpaceIsUnlimited()) {
+      Report(
+          "WARNING: Program is run with limited virtual address space,"
+          " which wouldn't work with ThreadSanitizer.\n");
+      Report("Re-execing with unlimited virtual address space.\n");
+      SetAddressSpaceUnlimited();
+      reexec = true;
+    }
+
+    // ASLR personality check.
+    int old_personality = personality(0xffffffff);
+    bool aslr_on =
+        (old_personality != -1) && ((old_personality & ADDR_NO_RANDOMIZE) == 0);
+
+#    if SANITIZER_ANDROID && (defined(__aarch64__) || defined(__x86_64__))
+    // After patch "arm64: mm: support ARCH_MMAP_RND_BITS." is introduced in
+    // linux kernel, the random gap between stack and mapped area is increased
+    // from 128M to 36G on 39-bit aarch64. As it is almost impossible to cover
+    // this big range, we should disable randomized virtual space on aarch64.
+    if (aslr_on) {
+      VReport(1,
+              "WARNING: Program is run with randomized virtual address "
+              "space, which wouldn't work with ThreadSanitizer on Android.\n"
+              "Re-execing with fixed virtual address space.\n");
+      CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
+      reexec = true;
+    }
+#    endif
+
+    if (reexec) {
+      // Don't check the address space since we're going to re-exec anyway.
+    } else if (!CheckAndProtect(false, false, false)) {
+      if (aslr_on) {
+        // Disable ASLR if the memory layout was incompatible.
+        // Alternatively, we could just keep re-execing until we get lucky
+        // with a compatible randomized layout, but the risk is that if it's
+        // not an ASLR-related issue, we will be stuck in an infinite loop of
+        // re-execing (unless we change ReExec to pass a parameter of the
+        // number of retries allowed.)
+        VReport(1,
+                "WARNING: ThreadSanitizer: memory layout is incompatible, "
+                "possibly due to high-entropy ASLR.\n"
+                "Re-execing with fixed virtual address space.\n"
+                "N.B. reducing ASLR entropy is preferable.\n");
+        CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
+        reexec = true;
+      } else {
+        VReport(1,
+                "FATAL: ThreadSanitizer: memory layout is incompatible, "
+                "even though ASLR is disabled.\n"
+                "Please file a bug.\n");
+        Die();
+      }
+    }
+
+    if (reexec)
+      ReExec();
+  }
+}
+#  endif
+
 void InitializePlatformEarly() {
   vmaSize =
     (MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1);
@@ -284,6 +366,10 @@ void InitializePlatformEarly() {
   }
 #    endif
 #  endif
+
+#  if !SANITIZER_GO
+  ReExecIfNeeded();
+#  endif
 }
 
 void InitializePlatform() {
@@ -294,52 +380,22 @@ void InitializePlatform() {
   // is not compiled with -pie.
 #if !SANITIZER_GO
   {
-    bool reexec = false;
-    // TSan doesn't play well with unlimited stack size (as stack
-    // overlaps with shadow memory). If we detect unlimited stack size,
-    // we re-exec the program with limited stack size as a best effort.
-    if (StackSizeIsUnlimited()) {
-      const uptr kMaxStackSize = 32 * 1024 * 1024;
-      VReport(1, "Program is run with unlimited stack size, which wouldn't "
-                 "work with ThreadSanitizer.\n"
-                 "Re-execing with stack size limited to %zd bytes.\n",
-              kMaxStackSize);
-      SetStackSizeLimitInBytes(kMaxStackSize);
-      reexec = true;
-    }
-
-    if (!AddressSpaceIsUnlimited()) {
-      Report("WARNING: Program is run with limited virtual address space,"
-             " which wouldn't work with ThreadSanitizer.\n");
-      Report("Re-execing with unlimited virtual address space.\n");
-      SetAddressSpaceUnlimited();
-      reexec = true;
-    }
-#if SANITIZER_ANDROID && (defined(__aarch64__) || defined(__x86_64__))
-    // After patch "arm64: mm: support ARCH_MMAP_RND_BITS." is introduced in
-    // linux kernel, the random gap between stack and mapped area is increased
-    // from 128M to 36G on 39-bit aarch64. As it is almost impossible to cover
-    // this big range, we should disable randomized virtual space on aarch64.
-    // ASLR personality check.
-    int old_personality = personality(0xffffffff);
-    if (old_personality != -1 && (old_personality & ADDR_NO_RANDOMIZE) == 0) {
-      VReport(1, "WARNING: Program is run with randomized virtual address "
-              "space, which wouldn't work with ThreadSanitizer.\n"
-              "Re-execing with fixed virtual address space.\n");
-      CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
-      reexec = true;
-    }
-
-#endif
-#if SANITIZER_LINUX && (defined(__aarch64__) || defined(__loongarch_lp64))
+#    if SANITIZER_LINUX && (defined(__aarch64__) || defined(__loongarch_lp64))
     // Initialize the xor key used in {sig}{set,long}jump.
     InitializeLongjmpXorKey();
-#endif
-    if (reexec)
-      ReExec();
+#    endif
+  }
+
+  // Earlier initialization steps already re-exec'ed until we got a compatible
+  // memory layout, so we don't expect any more issues here.
+  if (!CheckAndProtect(true, true, true)) {
+    Printf(
+        "FATAL: ThreadSanitizer: unexpectedly found incompatible memory "
+        "layout.\n");
+    Printf("FATAL: Please file a bug.\n");
+    Die();
   }
 
-  CheckAndProtect();
   InitTlsSize();
 #endif  // !SANITIZER_GO
 }
diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp
index e7dcd664dc0d207..7d5a992dc665eff 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp
@@ -94,22 +94,51 @@ static void ProtectRange(uptr beg, uptr end) {
   }
 }
 
-void CheckAndProtect() {
+// CheckAndProtect will check if the memory layout is compatible with TSan.
+// Optionally (if 'protect' is true), it will set the memory regions between
+// app memory to be inaccessible.
+// 'ignore_heap' means it will not consider heap memory allocations to be a
+// conflict. Set this based on whether we are calling CheckAndProtect before
+// or after the allocator has initialized the heap.
+bool CheckAndProtect(bool protect, bool ignore_heap, bool print_warnings) {
   // Ensure that the binary is indeed compiled with -pie.
   MemoryMappingLayout proc_maps(true);
   MemoryMappedSegment segment;
   while (proc_maps.Next(&segment)) {
-    if (IsAppMem(segment.start)) continue;
+    if (segment.start >= HeapMemBeg() && segment.end <= HeapEnd()) {
+      if (ignore_heap) {
+        continue;
+      } else {
+        return false;
+      }
+    }
+
+    // Note: IsAppMem includes if it is heap memory, hence we must
+    // put this check after the heap bounds check.
+    if (IsAppMem(segment.start) && IsAppMem(segment.end - 1))
+      continue;
+
+    // Guard page after the heap end
     if (segment.start >= HeapMemEnd() && segment.start < HeapEnd()) continue;
+
     if (segment.protection == 0)  // Zero page or mprotected.
       continue;
+
     if (segment.start >= VdsoBeg())  // vdso
       break;
-    Printf("FATAL: ThreadSanitizer: unexpected memory mapping 0x%zx-0x%zx\n",
-           segment.start, segment.end);
-    Die();
+
+    // Debug output can break tests. Suppress this message in most cases.
+    if (print_warnings)
+      Printf(
+          "WARNING: ThreadSanitizer: unexpected memory mapping 0x%zx-0x%zx\n",
+          segment.start, segment.end);
+
+    return false;
   }
 
+  if (!protect)
+    return true;
+
 #    if SANITIZER_IOS && !SANITIZER_IOSSIM
   ProtectRange(HeapMemEnd(), ShadowBeg());
   ProtectRange(ShadowEnd(), MetaShadowBeg());
@@ -135,8 +164,10 @@ void CheckAndProtect() {
   // Older s390x kernels may not support 5-level page tables.
   TryProtectRange(user_addr_max_l4, user_addr_max_l5);
 #endif
+
+  return true;
 }
-#endif
+#  endif
 
 }  // namespace __tsan
 


// Earlier initialization steps already re-exec'ed until we got a compatible
// memory layout, so we don't expect any more issues here.
if (!CheckAndProtect(true, true, true)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a call to CheckAndProtect in platform_mac.cpp, probably also needs to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I've made the function call compatible on Mac, though I haven't added the re-exec functionality (I don't have a Mac to even test that it compiles, let alone works).

@thurstond thurstond changed the title Re-exec TSan with no ASLR if memory layout is incompatible Re-exec TSan with no ASLR if memory layout is incompatible on Linux Jan 17, 2024
// Go maps shadow memory lazily and works fine with limited address space.
// Unlimited stack is not a problem as well, because the executable
// is not compiled with -pie.
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of whole function {} ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's copy-pasted
it's noop here, please remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link

github-actions bot commented Jan 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

TSan's shadow mappings only support 30-bits of ASLR entropy on x86, and
it is not practical to support the maximum of 32-bits (due to pointer
compression and the overhead of shadow mappings). Instead, this patch
changes TSan to re-exec without ASLR if it encounters an incompatible
memory layout, as suggested by Dmitry in google/sanitizers#1716.
If ASLR is already disabled, it will abort.

This patch involves a bit of refactoring, because the old code is:
    InitializePlatformEarly()
    InitializeAllocator()
    InitializePlatform(): CheckAndProtect()
but it may already segfault during InitializeAllocator() if the memory
layout is incompatible, before we get a chance to check in
CheckAndProtect.

This patch adds CheckAndProtect during InitializePlatformEarly(), before
the allocator is initialized. Naturally, it is necessary to ensure that
CheckAndProtect does *not* allow the heap regions to be occupied there,
hence we generalize CheckAndProtect to optionally check the heap
regions. We keep the original behavior of CheckAndProtect() in InitializePlatform()
as a last line of defense.

We need to careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec
for other reasons (e.g., unlimited stack size); we implement this by
moving all the re-exec logic into ReExecIfNeeded().
The overall behavior is unchanged for Mac (i.e., there is no
re-exec added).
@thurstond thurstond merged commit 0784b1e into llvm:main Jan 19, 2024
4 checks passed
thurstond added a commit to thurstond/llvm-project that referenced this pull request Jan 24, 2024
My previous patch, "Re-exec TSan with no ASLR if memory layout is incompatible on Linux (llvm#78351)" (0784b1e) hoisted the 'personality' call, to
share the code between Android and non-Android Linux. Unfortunately, this eager call to
'personality' may trigger sandbox violations on non-Android Linux.

This patch fixes the issue by only calling 'personality' on non-Android Linux if the
memory mapping is incompatible. This may still cause a sandbox violation, but only if it
was going to abort anyway due to an incompatible memory mapping.

(The behavior on Android Linux is unchanged by this patch or the previous patch.)
thurstond added a commit to thurstond/llvm-project that referenced this pull request Jan 25, 2024
My previous patch, "Re-exec TSan with no ASLR if memory layout is incompatible on Linux (llvm#78351)" (0784b1e) hoisted the 'personality' call, to
share the code between Android and non-Android Linux. Unfortunately, this eager call to
'personality' may trigger sandbox violations on non-Android Linux.

This patch fixes the issue by only calling 'personality' on non-Android Linux if the
memory mapping is incompatible. This may still cause a sandbox violation, but only if it
was going to abort anyway due to an incompatible memory mapping.

(The behavior on Android Linux is unchanged by this patch or the previous patch.)
thurstond added a commit that referenced this pull request Jan 25, 2024
My previous patch, "Re-exec TSan with no ASLR if memory layout is incompatible on Linux (#78351)" (0784b1e) hoisted the 'personality' call, to share the code between Android and non-Android Linux. Unfortunately, this eager call to 'personality' may trigger sandbox violations on non-Android Linux.

This patch fixes the issue by only calling 'personality' on non-Android Linux if the memory mapping is incompatible. This may still cause a sandbox violation, but only if it was going to abort anyway due to an incompatible memory mapping.

(The behavior on Android Linux is unchanged by this patch or the previous patch.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants