Re-exec TSan with no ASLR if memory layout is incompatible on Linux #78351

thurstond · 2024-01-16T21:41:46Z

TSan's shadow mappings only support 30-bits of ASLR entropy on x86 Linux, and
it is not practical to support the maximum of 32-bits (due to pointer
compression and the overhead of shadow mappings). Instead, this patch
changes TSan to re-exec without ASLR if it encounters an incompatible
memory layout, as suggested by Dmitry in google/sanitizers#1716.
If ASLR is already disabled but the memory layout is still incompatible, it will abort.

This patch involves a bit of refactoring, because the old code is:

InitializePlatformEarly()
InitializeAllocator()
InitializePlatform(): CheckAndProtect()

but it may already segfault during InitializeAllocator() if the memory
layout is incompatible, before we get a chance to check in
CheckAndProtect().

This patch adds CheckAndProtect() during InitializePlatformEarly(), before
the allocator is initialized. Naturally, it is necessary to ensure that
CheckAndProtect() does not allow the heap regions to be occupied there,
hence we generalize CheckAndProtect() to optionally check the heap
regions. We keep the original behavior of CheckAndProtect() in InitializePlatform()
as a last line of defense.

We need to be careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec
for other reasons (e.g., unlimited stack size); we implement this by
moving all the re-exec logic into ReExecIfNeeded().

llvmbot · 2024-01-16T21:42:13Z

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Thurston Dang (thurstond)

Changes

TSan's shadow mappings only support 30-bits of ASLR entropy on x86, and
it is not practical to support the maximum of 32-bits (due to pointer
compression and the overhead of shadow mappings). Instead, this patch
changes TSan to re-exec without ASLR if it encounters an incompatible
memory layout, as suggested by Dmitry in google/sanitizers#1716.
If ASLR is already disabled, it will abort.

This patch involves a bit of refactoring, because the old code is:
InitializePlatformEarly()
InitializeAllocator()
InitializePlatform(): CheckAndProtect()
but it may already segfault during InitializeAllocator() if the memory
layout is incompatible, before we get a chance to check in
CheckAndProtect.

This patch adds CheckAndProtect during InitializePlatformEarly(), before
the allocator is initialized. Naturally, it is necessary to ensure that
CheckAndProtect does not allow the heap regions to be occupied there,
hence we generalize CheckAndProtect to optionally check the heap
regions. We keep the original behavior of CheckAndProtect() in InitializePlatform()
as a last line of defense.

We need to careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec
for other reasons (e.g., unlimited stack size); we implement this by
moving all the re-exec logic into ReExecIfNeeded().

Full diff: https://github.com/llvm/llvm-project/pull/78351.diff

3 Files Affected:

(modified) compiler-rt/lib/tsan/rtl/tsan_platform.h (+1-1)
(modified) compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp (+98-42)
(modified) compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp (+37-6)

diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform.h b/compiler-rt/lib/tsan/rtl/tsan_platform.h
index 84ff4bfade09ace..377f8aeb8d66e0d 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_platform.h
+++ b/compiler-rt/lib/tsan/rtl/tsan_platform.h
@@ -1024,7 +1024,7 @@ inline uptr RestoreAddr(uptr addr) {
 
 void InitializePlatform();
 void InitializePlatformEarly();
-void CheckAndProtect();
+bool CheckAndProtect(bool protect, bool ignore_heap, bool print_warnings);
 void InitializeShadowMemoryPlatform();
 void WriteMemoryProfile(char *buf, uptr buf_size, u64 uptime_ns);
 int ExtractResolvFDs(void *state, int *fds, int nfd);
diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp
index b45adea45b27adb..bc1be49dd41513f 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp
@@ -214,6 +214,88 @@ void InitializeShadowMemoryPlatform() {
 
 #endif  // #if !SANITIZER_GO
 
+#  if !SANITIZER_GO
+static void ReExecIfNeeded() {
+  // Go maps shadow memory lazily and works fine with limited address space.
+  // Unlimited stack is not a problem as well, because the executable
+  // is not compiled with -pie.
+  {
+    bool reexec = false;
+    // TSan doesn't play well with unlimited stack size (as stack
+    // overlaps with shadow memory). If we detect unlimited stack size,
+    // we re-exec the program with limited stack size as a best effort.
+    if (StackSizeIsUnlimited()) {
+      const uptr kMaxStackSize = 32 * 1024 * 1024;
+      VReport(1,
+              "Program is run with unlimited stack size, which wouldn't "
+              "work with ThreadSanitizer.\n"
+              "Re-execing with stack size limited to %zd bytes.\n",
+              kMaxStackSize);
+      SetStackSizeLimitInBytes(kMaxStackSize);
+      reexec = true;
+    }
+
+    if (!AddressSpaceIsUnlimited()) {
+      Report(
+          "WARNING: Program is run with limited virtual address space,"
+          " which wouldn't work with ThreadSanitizer.\n");
+      Report("Re-execing with unlimited virtual address space.\n");
+      SetAddressSpaceUnlimited();
+      reexec = true;
+    }
+
+    // ASLR personality check.
+    int old_personality = personality(0xffffffff);
+    bool aslr_on =
+        (old_personality != -1) && ((old_personality & ADDR_NO_RANDOMIZE) == 0);
+
+#    if SANITIZER_ANDROID && (defined(__aarch64__) || defined(__x86_64__))
+    // After patch "arm64: mm: support ARCH_MMAP_RND_BITS." is introduced in
+    // linux kernel, the random gap between stack and mapped area is increased
+    // from 128M to 36G on 39-bit aarch64. As it is almost impossible to cover
+    // this big range, we should disable randomized virtual space on aarch64.
+    if (aslr_on) {
+      VReport(1,
+              "WARNING: Program is run with randomized virtual address "
+              "space, which wouldn't work with ThreadSanitizer on Android.\n"
+              "Re-execing with fixed virtual address space.\n");
+      CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
+      reexec = true;
+    }
+#    endif
+
+    if (reexec) {
+      // Don't check the address space since we're going to re-exec anyway.
+    } else if (!CheckAndProtect(false, false, false)) {
+      if (aslr_on) {
+        // Disable ASLR if the memory layout was incompatible.
+        // Alternatively, we could just keep re-execing until we get lucky
+        // with a compatible randomized layout, but the risk is that if it's
+        // not an ASLR-related issue, we will be stuck in an infinite loop of
+        // re-execing (unless we change ReExec to pass a parameter of the
+        // number of retries allowed.)
+        VReport(1,
+                "WARNING: ThreadSanitizer: memory layout is incompatible, "
+                "possibly due to high-entropy ASLR.\n"
+                "Re-execing with fixed virtual address space.\n"
+                "N.B. reducing ASLR entropy is preferable.\n");
+        CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
+        reexec = true;
+      } else {
+        VReport(1,
+                "FATAL: ThreadSanitizer: memory layout is incompatible, "
+                "even though ASLR is disabled.\n"
+                "Please file a bug.\n");
+        Die();
+      }
+    }
+
+    if (reexec)
+      ReExec();
+  }
+}
+#  endif
+
 void InitializePlatformEarly() {
   vmaSize =
     (MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1);
@@ -284,6 +366,10 @@ void InitializePlatformEarly() {
   }
 #    endif
 #  endif
+
+#  if !SANITIZER_GO
+  ReExecIfNeeded();
+#  endif
 }
 
 void InitializePlatform() {
@@ -294,52 +380,22 @@ void InitializePlatform() {
   // is not compiled with -pie.
 #if !SANITIZER_GO
   {
-    bool reexec = false;
-    // TSan doesn't play well with unlimited stack size (as stack
-    // overlaps with shadow memory). If we detect unlimited stack size,
-    // we re-exec the program with limited stack size as a best effort.
-    if (StackSizeIsUnlimited()) {
-      const uptr kMaxStackSize = 32 * 1024 * 1024;
-      VReport(1, "Program is run with unlimited stack size, which wouldn't "
-                 "work with ThreadSanitizer.\n"
-                 "Re-execing with stack size limited to %zd bytes.\n",
-              kMaxStackSize);
-      SetStackSizeLimitInBytes(kMaxStackSize);
-      reexec = true;
-    }
-
-    if (!AddressSpaceIsUnlimited()) {
-      Report("WARNING: Program is run with limited virtual address space,"
-             " which wouldn't work with ThreadSanitizer.\n");
-      Report("Re-execing with unlimited virtual address space.\n");
-      SetAddressSpaceUnlimited();
-      reexec = true;
-    }
-#if SANITIZER_ANDROID && (defined(__aarch64__) || defined(__x86_64__))
-    // After patch "arm64: mm: support ARCH_MMAP_RND_BITS." is introduced in
-    // linux kernel, the random gap between stack and mapped area is increased
-    // from 128M to 36G on 39-bit aarch64. As it is almost impossible to cover
-    // this big range, we should disable randomized virtual space on aarch64.
-    // ASLR personality check.
-    int old_personality = personality(0xffffffff);
-    if (old_personality != -1 && (old_personality & ADDR_NO_RANDOMIZE) == 0) {
-      VReport(1, "WARNING: Program is run with randomized virtual address "
-              "space, which wouldn't work with ThreadSanitizer.\n"
-              "Re-execing with fixed virtual address space.\n");
-      CHECK_NE(personality(old_personality | ADDR_NO_RANDOMIZE), -1);
-      reexec = true;
-    }
-
-#endif
-#if SANITIZER_LINUX && (defined(__aarch64__) || defined(__loongarch_lp64))
+#    if SANITIZER_LINUX && (defined(__aarch64__) || defined(__loongarch_lp64))
     // Initialize the xor key used in {sig}{set,long}jump.
     InitializeLongjmpXorKey();
-#endif
-    if (reexec)
-      ReExec();
+#    endif
+  }
+
+  // Earlier initialization steps already re-exec'ed until we got a compatible
+  // memory layout, so we don't expect any more issues here.
+  if (!CheckAndProtect(true, true, true)) {
+    Printf(
+        "FATAL: ThreadSanitizer: unexpectedly found incompatible memory "
+        "layout.\n");
+    Printf("FATAL: Please file a bug.\n");
+    Die();
   }
 
-  CheckAndProtect();
   InitTlsSize();
 #endif  // !SANITIZER_GO
 }
diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp
index e7dcd664dc0d207..7d5a992dc665eff 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_platform_posix.cpp
@@ -94,22 +94,51 @@ static void ProtectRange(uptr beg, uptr end) {
   }
 }
 
-void CheckAndProtect() {
+// CheckAndProtect will check if the memory layout is compatible with TSan.
+// Optionally (if 'protect' is true), it will set the memory regions between
+// app memory to be inaccessible.
+// 'ignore_heap' means it will not consider heap memory allocations to be a
+// conflict. Set this based on whether we are calling CheckAndProtect before
+// or after the allocator has initialized the heap.
+bool CheckAndProtect(bool protect, bool ignore_heap, bool print_warnings) {
   // Ensure that the binary is indeed compiled with -pie.
   MemoryMappingLayout proc_maps(true);
   MemoryMappedSegment segment;
   while (proc_maps.Next(&segment)) {
-    if (IsAppMem(segment.start)) continue;
+    if (segment.start >= HeapMemBeg() && segment.end <= HeapEnd()) {
+      if (ignore_heap) {
+        continue;
+      } else {
+        return false;
+      }
+    }
+
+    // Note: IsAppMem includes if it is heap memory, hence we must
+    // put this check after the heap bounds check.
+    if (IsAppMem(segment.start) && IsAppMem(segment.end - 1))
+      continue;
+
+    // Guard page after the heap end
     if (segment.start >= HeapMemEnd() && segment.start < HeapEnd()) continue;
+
     if (segment.protection == 0)  // Zero page or mprotected.
       continue;
+
     if (segment.start >= VdsoBeg())  // vdso
       break;
-    Printf("FATAL: ThreadSanitizer: unexpected memory mapping 0x%zx-0x%zx\n",
-           segment.start, segment.end);
-    Die();
+
+    // Debug output can break tests. Suppress this message in most cases.
+    if (print_warnings)
+      Printf(
+          "WARNING: ThreadSanitizer: unexpected memory mapping 0x%zx-0x%zx\n",
+          segment.start, segment.end);
+
+    return false;
   }
 
+  if (!protect)
+    return true;
+
 #    if SANITIZER_IOS && !SANITIZER_IOSSIM
   ProtectRange(HeapMemEnd(), ShadowBeg());
   ProtectRange(ShadowEnd(), MetaShadowBeg());
@@ -135,8 +164,10 @@ void CheckAndProtect() {
   // Older s390x kernels may not support 5-level page tables.
   TryProtectRange(user_addr_max_l4, user_addr_max_l5);
 #endif
+
+  return true;
 }
-#endif
+#  endif
 
 }  // namespace __tsan

dvyukov · 2024-01-17T07:31:05Z

compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp

+
+  // Earlier initialization steps already re-exec'ed until we got a compatible
+  // memory layout, so we don't expect any more issues here.
+  if (!CheckAndProtect(true, true, true)) {


There is also a call to CheckAndProtect in platform_mac.cpp, probably also needs to be updated.

Good point! I've made the function call compatible on Mac, though I haven't added the re-exec functionality (I don't have a Mac to even test that it compiles, let alone works).

vitalybuka · 2024-01-18T00:57:11Z

compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp

+  // Go maps shadow memory lazily and works fine with limited address space.
+  // Unlimited stack is not a problem as well, because the executable
+  // is not compiled with -pie.
+  {


what is the purpose of whole function {} ?

I guess it's copy-pasted
it's noop here, please remove.

github-actions · 2024-01-18T22:31:58Z

✅ With the latest revision this PR passed the C/C++ code formatter.

TSan's shadow mappings only support 30-bits of ASLR entropy on x86, and it is not practical to support the maximum of 32-bits (due to pointer compression and the overhead of shadow mappings). Instead, this patch changes TSan to re-exec without ASLR if it encounters an incompatible memory layout, as suggested by Dmitry in google/sanitizers#1716. If ASLR is already disabled, it will abort. This patch involves a bit of refactoring, because the old code is: InitializePlatformEarly() InitializeAllocator() InitializePlatform(): CheckAndProtect() but it may already segfault during InitializeAllocator() if the memory layout is incompatible, before we get a chance to check in CheckAndProtect. This patch adds CheckAndProtect during InitializePlatformEarly(), before the allocator is initialized. Naturally, it is necessary to ensure that CheckAndProtect does *not* allow the heap regions to be occupied there, hence we generalize CheckAndProtect to optionally check the heap regions. We keep the original behavior of CheckAndProtect() in InitializePlatform() as a last line of defense. We need to careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec for other reasons (e.g., unlimited stack size); we implement this by moving all the re-exec logic into ReExecIfNeeded().

The overall behavior is unchanged for Mac (i.e., there is no re-exec added).

My previous patch, "Re-exec TSan with no ASLR if memory layout is incompatible on Linux (llvm#78351)" (0784b1e) hoisted the 'personality' call, to share the code between Android and non-Android Linux. Unfortunately, this eager call to 'personality' may trigger sandbox violations on non-Android Linux. This patch fixes the issue by only calling 'personality' on non-Android Linux if the memory mapping is incompatible. This may still cause a sandbox violation, but only if it was going to abort anyway due to an incompatible memory mapping. (The behavior on Android Linux is unchanged by this patch or the previous patch.)

My previous patch, "Re-exec TSan with no ASLR if memory layout is incompatible on Linux (#78351)" (0784b1e) hoisted the 'personality' call, to share the code between Android and non-Android Linux. Unfortunately, this eager call to 'personality' may trigger sandbox violations on non-Android Linux. This patch fixes the issue by only calling 'personality' on non-Android Linux if the memory mapping is incompatible. This may still cause a sandbox violation, but only if it was going to abort anyway due to an incompatible memory mapping. (The behavior on Android Linux is unchanged by this patch or the previous patch.)

llvmbot added compiler-rt compiler-rt:sanitizer labels Jan 16, 2024

thurstond requested review from dvyukov, vitalybuka and kstoimenov January 16, 2024 21:42

dvyukov approved these changes Jan 17, 2024

View reviewed changes

thurstond changed the title ~~Re-exec TSan with no ASLR if memory layout is incompatible~~ Re-exec TSan with no ASLR if memory layout is incompatible on Linux Jan 17, 2024

vitalybuka approved these changes Jan 18, 2024

View reviewed changes

thurstond force-pushed the main branch from ac0f084 to 42b4796 Compare January 19, 2024 00:50

thurstond added 6 commits January 19, 2024 00:58

Update tsan_platform_mac.cpp to use new CheckAndProtect semantics.

bf8d52f

The overall behavior is unchanged for Mac (i.e., there is no re-exec added).

Remove unnecessary braces

115eb0c

Whitespace

2dd6026

More changes to whitespace

efcef0c

Empty commit to try to trick BuildKite into re-running

ecf732c

thurstond force-pushed the main branch from 42b4796 to ecf732c Compare January 19, 2024 01:01

thurstond merged commit 0784b1e into llvm:main Jan 19, 2024
4 checks passed

thurstond mentioned this pull request Jan 24, 2024

[tsan] Lazily call 'personality' to minimize sandbox violations #79334

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-exec TSan with no ASLR if memory layout is incompatible on Linux #78351

Re-exec TSan with no ASLR if memory layout is incompatible on Linux #78351

thurstond commented Jan 16, 2024 •

edited

llvmbot commented Jan 16, 2024

dvyukov Jan 17, 2024

thurstond Jan 17, 2024

vitalybuka Jan 18, 2024

vitalybuka Jan 18, 2024

thurstond Jan 18, 2024

github-actions bot commented Jan 18, 2024 •

edited

Re-exec TSan with no ASLR if memory layout is incompatible on Linux #78351

Re-exec TSan with no ASLR if memory layout is incompatible on Linux #78351

Conversation

thurstond commented Jan 16, 2024 • edited

llvmbot commented Jan 16, 2024

dvyukov Jan 17, 2024

Choose a reason for hiding this comment

thurstond Jan 17, 2024

Choose a reason for hiding this comment

vitalybuka Jan 18, 2024

Choose a reason for hiding this comment

vitalybuka Jan 18, 2024

Choose a reason for hiding this comment

thurstond Jan 18, 2024

Choose a reason for hiding this comment

github-actions bot commented Jan 18, 2024 • edited

thurstond commented Jan 16, 2024 •

edited

github-actions bot commented Jan 18, 2024 •

edited