Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TySan] A Type Sanitizer (Runtime Library) #76261

Open
wants to merge 8 commits into
base: users/fhahn/tysan-a-type-sanitizer-clang
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Dec 22, 2023

This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

(This includes build fixes for Linux from Mingjie Xu)

Based on https://reviews.llvm.org/D32197.

Depends on #76260 (Clang support), #76259 (LLVM support)

@llvmbot llvmbot added clang Clang issues not falling into any other category compiler-rt compiler-rt:sanitizer labels Dec 22, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 22, 2023

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-compiler-rt-sanitizer

@llvm/pr-subscribers-clang

Author: Florian Hahn (fhahn)

Changes

This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

(This includes build fixes for Linux from Mingjie Xu)

Based on https://reviews.llvm.org/D32197.


Patch is 53.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76261.diff

28 Files Affected:

  • (modified) clang/runtime/CMakeLists.txt (+1-1)
  • (modified) compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake (+1)
  • (modified) compiler-rt/cmake/config-ix.cmake (+14-1)
  • (added) compiler-rt/lib/tysan/CMakeLists.txt (+64)
  • (added) compiler-rt/lib/tysan/lit.cfg (+35)
  • (added) compiler-rt/lib/tysan/lit.site.cfg.in (+12)
  • (added) compiler-rt/lib/tysan/tysan.cpp (+339)
  • (added) compiler-rt/lib/tysan/tysan.h (+79)
  • (added) compiler-rt/lib/tysan/tysan.syms.extra (+2)
  • (added) compiler-rt/lib/tysan/tysan_flags.inc (+17)
  • (added) compiler-rt/lib/tysan/tysan_interceptors.cpp (+250)
  • (added) compiler-rt/lib/tysan/tysan_platform.h (+93)
  • (added) compiler-rt/lib/tysan/weak_symbols.txt ()
  • (added) compiler-rt/test/tysan/CMakeLists.txt (+32)
  • (added) compiler-rt/test/tysan/anon-ns.cpp (+41)
  • (added) compiler-rt/test/tysan/anon-same-struct.c (+26)
  • (added) compiler-rt/test/tysan/anon-struct.c (+27)
  • (added) compiler-rt/test/tysan/basic.c (+65)
  • (added) compiler-rt/test/tysan/char-memcpy.c (+45)
  • (added) compiler-rt/test/tysan/global.c (+31)
  • (added) compiler-rt/test/tysan/int-long.c (+21)
  • (added) compiler-rt/test/tysan/lit.cfg.py (+139)
  • (added) compiler-rt/test/tysan/lit.site.cfg.py.in (+17)
  • (added) compiler-rt/test/tysan/ptr-float.c (+19)
  • (added) compiler-rt/test/tysan/struct-offset-multiple-compilation-units.cpp (+51)
  • (added) compiler-rt/test/tysan/struct-offset.c (+26)
  • (added) compiler-rt/test/tysan/struct.c (+39)
  • (added) compiler-rt/test/tysan/union-wr-wr.c (+18)
diff --git a/clang/runtime/CMakeLists.txt b/clang/runtime/CMakeLists.txt
index 65fcdc2868f031..ff2605b23d25b0 100644
--- a/clang/runtime/CMakeLists.txt
+++ b/clang/runtime/CMakeLists.txt
@@ -122,7 +122,7 @@ if(LLVM_BUILD_EXTERNAL_COMPILER_RT AND EXISTS ${COMPILER_RT_SRC_ROOT}/)
                            COMPONENT compiler-rt)
 
   # Add top-level targets that build specific compiler-rt runtimes.
-  set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan ubsan ubsan-minimal)
+  set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan tysan ubsan ubsan-minimal)
   foreach(runtime ${COMPILER_RT_RUNTIMES})
     get_ext_project_build_command(build_runtime_cmd ${runtime})
     add_custom_target(${runtime}
diff --git a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
index 416777171d2ca7..66ebee4392e1ff 100644
--- a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
+++ b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
@@ -67,6 +67,7 @@ set(ALL_PROFILE_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${PPC32} ${PPC
     ${RISCV32} ${RISCV64} ${LOONGARCH64})
 set(ALL_TSAN_SUPPORTED_ARCH ${X86_64} ${MIPS64} ${ARM64} ${PPC64} ${S390X}
     ${LOONGARCH64} ${RISCV64})
+set(ALL_TYSAN_SUPPORTED_ARCH ${X86_64} ${ARM64})
 set(ALL_UBSAN_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${RISCV64}
     ${MIPS32} ${MIPS64} ${PPC64} ${S390X} ${SPARC} ${SPARCV9} ${HEXAGON}
     ${LOONGARCH64})
diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake
index 2dccd4954b2537..e448e71e5422a1 100644
--- a/compiler-rt/cmake/config-ix.cmake
+++ b/compiler-rt/cmake/config-ix.cmake
@@ -448,6 +448,7 @@ if(APPLE)
   set(SANITIZER_COMMON_SUPPORTED_OS osx)
   set(PROFILE_SUPPORTED_OS osx)
   set(TSAN_SUPPORTED_OS osx)
+  set(TYSAN_SUPPORTED_OS osx)
   set(XRAY_SUPPORTED_OS osx)
   set(FUZZER_SUPPORTED_OS osx)
   set(ORC_SUPPORTED_OS)
@@ -581,6 +582,7 @@ if(APPLE)
           list(APPEND FUZZER_SUPPORTED_OS ${platform})
           list(APPEND ORC_SUPPORTED_OS ${platform})
           list(APPEND UBSAN_SUPPORTED_OS ${platform})
+          list(APPEND TYSAN_SUPPORTED_OS ${platform})
           list(APPEND LSAN_SUPPORTED_OS ${platform})
           list(APPEND STATS_SUPPORTED_OS ${platform})
         endif()
@@ -630,6 +632,9 @@ if(APPLE)
   list_intersect(PROFILE_SUPPORTED_ARCH
     ALL_PROFILE_SUPPORTED_ARCH
     SANITIZER_COMMON_SUPPORTED_ARCH)
+  list_intersect(TYSAN_SUPPORTED_ARCH
+    ALL_TYSAN_SUPPORTED_ARCH
+    SANITIZER_COMMON_SUPPORTED_ARCH)
   list_intersect(TSAN_SUPPORTED_ARCH
     ALL_TSAN_SUPPORTED_ARCH
     SANITIZER_COMMON_SUPPORTED_ARCH)
@@ -676,6 +681,7 @@ else()
   filter_available_targets(HWASAN_SUPPORTED_ARCH ${ALL_HWASAN_SUPPORTED_ARCH})
   filter_available_targets(MEMPROF_SUPPORTED_ARCH ${ALL_MEMPROF_SUPPORTED_ARCH})
   filter_available_targets(PROFILE_SUPPORTED_ARCH ${ALL_PROFILE_SUPPORTED_ARCH})
+  filter_available_targets(TYSAN_SUPPORTED_ARCH ${ALL_TYSAN_SUPPORTED_ARCH})
   filter_available_targets(TSAN_SUPPORTED_ARCH ${ALL_TSAN_SUPPORTED_ARCH})
   filter_available_targets(UBSAN_SUPPORTED_ARCH ${ALL_UBSAN_SUPPORTED_ARCH})
   filter_available_targets(SAFESTACK_SUPPORTED_ARCH
@@ -720,7 +726,7 @@ if(COMPILER_RT_SUPPORTED_ARCH)
 endif()
 message(STATUS "Compiler-RT supported architectures: ${COMPILER_RT_SUPPORTED_ARCH}")
 
-set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
+set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;tysan,safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
 set(COMPILER_RT_SANITIZERS_TO_BUILD all CACHE STRING
     "sanitizers to build if supported on the target (all;${ALL_SANITIZERS})")
 list_replace(COMPILER_RT_SANITIZERS_TO_BUILD all "${ALL_SANITIZERS}")
@@ -801,6 +807,13 @@ else()
   set(COMPILER_RT_HAS_PROFILE FALSE)
 endif()
 
+if (COMPILER_RT_HAS_SANITIZER_COMMON AND TYSAN_SUPPORTED_ARCH AND
+        OS_NAME MATCHES "Linux|Darwin")
+  set(COMPILER_RT_HAS_TYSAN TRUE)
+else()
+  set(COMPILER_RT_HAS_TYSAN FALSE)
+endif()
+
 if (COMPILER_RT_HAS_SANITIZER_COMMON AND TSAN_SUPPORTED_ARCH)
   if (OS_NAME MATCHES "Linux|Darwin|FreeBSD|NetBSD")
     set(COMPILER_RT_HAS_TSAN TRUE)
diff --git a/compiler-rt/lib/tysan/CMakeLists.txt b/compiler-rt/lib/tysan/CMakeLists.txt
new file mode 100644
index 00000000000000..859b67928f004a
--- /dev/null
+++ b/compiler-rt/lib/tysan/CMakeLists.txt
@@ -0,0 +1,64 @@
+include_directories(..)
+
+# Runtime library sources and build flags.
+set(TYSAN_SOURCES
+  tysan.cpp
+  tysan_interceptors.cpp)
+set(TYSAN_COMMON_CFLAGS ${SANITIZER_COMMON_CFLAGS})
+append_rtti_flag(OFF TYSAN_COMMON_CFLAGS)
+# Prevent clang from generating libc calls.
+append_list_if(COMPILER_RT_HAS_FFREESTANDING_FLAG -ffreestanding TYSAN_COMMON_CFLAGS)
+
+add_compiler_rt_object_libraries(RTTysan_dynamic
+  OS ${SANITIZER_COMMON_SUPPORTED_OS}
+  ARCHS ${TYSAN_SUPPORTED_ARCH}
+  SOURCES ${TYSAN_SOURCES}
+  ADDITIONAL_HEADERS ${TYSAN_HEADERS}
+  CFLAGS ${TYSAN_DYNAMIC_CFLAGS}
+  DEFS ${TYSAN_DYNAMIC_DEFINITIONS})
+
+
+# Static runtime library.
+add_compiler_rt_component(tysan)
+
+
+if(APPLE)
+  add_weak_symbols("sanitizer_common" WEAK_SYMBOL_LINK_FLAGS)
+
+  add_compiler_rt_runtime(clang_rt.tysan
+    SHARED
+    OS ${SANITIZER_COMMON_SUPPORTED_OS}
+    ARCHS ${TYSAN_SUPPORTED_ARCH}
+    OBJECT_LIBS RTTysan_dynamic
+                RTInterception
+                RTSanitizerCommon
+                RTSanitizerCommonLibc
+                RTSanitizerCommonSymbolizer
+    CFLAGS ${TYSAN_DYNAMIC_CFLAGS}
+    LINK_FLAGS ${WEAK_SYMBOL_LINK_FLAGS}
+    DEFS ${TYSAN_DYNAMIC_DEFINITIONS}
+    PARENT_TARGET tysan)
+
+  add_compiler_rt_runtime(clang_rt.tysan_static
+    STATIC
+    ARCHS ${TYSAN_SUPPORTED_ARCH}
+    OBJECT_LIBS RTTysan_static
+    CFLAGS ${TYSAN_CFLAGS}
+    DEFS ${TYSAN_COMMON_DEFINITIONS}
+    PARENT_TARGET tysan)
+else()
+  foreach(arch ${TYSAN_SUPPORTED_ARCH})
+    set(TYSAN_CFLAGS ${TYSAN_COMMON_CFLAGS})
+    append_list_if(COMPILER_RT_HAS_FPIE_FLAG -fPIE TYSAN_CFLAGS)
+    add_compiler_rt_runtime(clang_rt.tysan
+      STATIC
+      ARCHS ${arch}
+      SOURCES ${TYSAN_SOURCES}
+              $<TARGET_OBJECTS:RTInterception.${arch}>
+              $<TARGET_OBJECTS:RTSanitizerCommon.${arch}>
+              $<TARGET_OBJECTS:RTSanitizerCommonLibc.${arch}>
+              $<TARGET_OBJECTS:RTSanitizerCommonSymbolizer.${arch}>
+      CFLAGS ${TYSAN_CFLAGS}
+      PARENT_TARGET tysan)
+  endforeach()
+endif()
diff --git a/compiler-rt/lib/tysan/lit.cfg b/compiler-rt/lib/tysan/lit.cfg
new file mode 100644
index 00000000000000..bd2bbe855529a7
--- /dev/null
+++ b/compiler-rt/lib/tysan/lit.cfg
@@ -0,0 +1,35 @@
+# -*- Python -*-
+
+import os
+
+# Setup config name.
+config.name = 'TypeSanitizer' + getattr(config, 'name_suffix', 'default')
+
+# Setup source root.
+config.test_source_root = os.path.dirname(__file__)
+
+# Setup default compiler flags used with -fsanitize=type option.
+clang_tysan_cflags = (["-fsanitize=type",
+                      "-mno-omit-leaf-frame-pointer",
+                      "-fno-omit-frame-pointer",
+                      "-fno-optimize-sibling-calls"] +
+                      [config.target_cflags] +
+                      config.debug_info_flags)
+clang_tysan_cxxflags = config.cxx_mode_flags + clang_tysan_cflags
+
+def build_invocation(compile_flags):
+  return " " + " ".join([config.clang] + compile_flags) + " "
+
+config.substitutions.append( ("%clang_tysan ", build_invocation(clang_tysan_cflags)) )
+config.substitutions.append( ("%clangxx_tysan ", build_invocation(clang_tysan_cxxflags)) )
+
+# Default test suffixes.
+config.suffixes = ['.c', '.cc', '.cpp']
+
+# TypeSanitizer tests are currently supported on Linux only.
+if config.host_os not in ['Linux']:
+  config.unsupported = True
+
+if config.target_arch != 'aarch64':
+  config.available_features.add('stable-runtime')
+
diff --git a/compiler-rt/lib/tysan/lit.site.cfg.in b/compiler-rt/lib/tysan/lit.site.cfg.in
new file mode 100644
index 00000000000000..673d04e514379b
--- /dev/null
+++ b/compiler-rt/lib/tysan/lit.site.cfg.in
@@ -0,0 +1,12 @@
+@LIT_SITE_CFG_IN_HEADER@
+
+# Tool-specific config options.
+config.name_suffix = "@TYSAN_TEST_CONFIG_SUFFIX@"
+config.target_cflags = "@TYSAN_TEST_TARGET_CFLAGS@"
+config.target_arch = "@TYSAN_TEST_TARGET_ARCH@"
+
+# Load common config for all compiler-rt lit tests.
+lit_config.load_config(config, "@COMPILER_RT_BINARY_DIR@/test/lit.common.configured")
+
+# Load tool-specific config that would do the real work.
+lit_config.load_config(config, "@TYSAN_LIT_SOURCE_DIR@/lit.cfg")
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
new file mode 100644
index 00000000000000..f627851d049e6a
--- /dev/null
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -0,0 +1,339 @@
+//===-- tysan.cpp ---------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of TypeSanitizer.
+//
+// TypeSanitizer runtime.
+//===----------------------------------------------------------------------===//
+
+#include "sanitizer_common/sanitizer_atomic.h"
+#include "sanitizer_common/sanitizer_common.h"
+#include "sanitizer_common/sanitizer_flag_parser.h"
+#include "sanitizer_common/sanitizer_flags.h"
+#include "sanitizer_common/sanitizer_libc.h"
+#include "sanitizer_common/sanitizer_report_decorator.h"
+#include "sanitizer_common/sanitizer_stacktrace.h"
+#include "sanitizer_common/sanitizer_symbolizer.h"
+
+#include "tysan/tysan.h"
+
+using namespace __sanitizer;
+using namespace __tysan;
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+tysan_set_type_unknown(const void *addr, uptr size) {
+  if (tysan_inited)
+    internal_memset(shadow_for(addr), 0, size * sizeof(uptr));
+}
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+tysan_copy_types(const void *daddr, const void *saddr, uptr size) {
+  if (tysan_inited)
+    internal_memmove(shadow_for(daddr), shadow_for(saddr), size * sizeof(uptr));
+}
+
+static const char *getDisplayName(const char *Name) {
+  if (Name[0] == '\0')
+    return "<anonymous type>";
+
+  // Clang generates tags for C++ types that demangle as typeinfo. Remove the
+  // prefix from the generated string.
+  const char TIPrefix[] = "typeinfo name for ";
+
+  const char *DName = Symbolizer::GetOrInit()->Demangle(Name);
+  if (!internal_strncmp(DName, TIPrefix, sizeof(TIPrefix) - 1))
+    DName += sizeof(TIPrefix) - 1;
+
+  return DName;
+}
+
+static void printTDName(tysan_type_descriptor *td) {
+  if (((sptr)td) <= 0) {
+    Printf("<unknown type>");
+    return;
+  }
+
+  switch (td->Tag) {
+  default:
+    DCHECK(0);
+    break;
+  case TYSAN_MEMBER_TD:
+    printTDName(td->Member.Access);
+    if (td->Member.Access != td->Member.Base) {
+      Printf(" (in ");
+      printTDName(td->Member.Base);
+      Printf(" at offset %zu)", td->Member.Offset);
+    }
+    break;
+  case TYSAN_STRUCT_TD:
+    Printf("%s", getDisplayName(
+                     (char *)(td->Struct.Members + td->Struct.MemberCount)));
+    break;
+  }
+}
+
+static tysan_type_descriptor *getRootTD(tysan_type_descriptor *TD) {
+  tysan_type_descriptor *RootTD = TD;
+
+  do {
+    RootTD = TD;
+
+    if (TD->Tag == TYSAN_STRUCT_TD) {
+      if (TD->Struct.MemberCount > 0)
+        TD = TD->Struct.Members[0].Type;
+      else
+        TD = nullptr;
+    } else if (TD->Tag == TYSAN_MEMBER_TD) {
+      TD = TD->Member.Access;
+    } else {
+      DCHECK(0);
+      break;
+    }
+  } while (TD);
+
+  return RootTD;
+}
+
+static bool isAliasingLegalUp(tysan_type_descriptor *TDA,
+                              tysan_type_descriptor *TDB) {
+  // Walk up the tree starting with TDA to see if we reach TDB.
+  uptr OffsetA = 0, OffsetB = 0;
+  if (TDB->Tag == TYSAN_MEMBER_TD) {
+    OffsetB = TDB->Member.Offset;
+    TDB = TDB->Member.Base;
+  }
+
+  if (TDA->Tag == TYSAN_MEMBER_TD) {
+    OffsetA = TDA->Member.Offset;
+    TDA = TDA->Member.Base;
+  }
+
+  do {
+    if (TDA == TDB)
+      return OffsetA == OffsetB;
+
+    if (TDA->Tag == TYSAN_STRUCT_TD) {
+      // Reached root type descriptor.
+      if (!TDA->Struct.MemberCount)
+        break;
+
+      uptr Idx = 0;
+      for (; Idx < TDA->Struct.MemberCount - 1; ++Idx) {
+        if (TDA->Struct.Members[Idx].Offset >= OffsetA)
+          break;
+      }
+
+      OffsetA -= TDA->Struct.Members[Idx].Offset;
+      TDA = TDA->Struct.Members[Idx].Type;
+    } else {
+      DCHECK(0);
+      break;
+    }
+  } while (TDA);
+
+  return false;
+}
+
+static bool isAliasingLegal(tysan_type_descriptor *TDA,
+                            tysan_type_descriptor *TDB) {
+  if (TDA == TDB || !TDB || !TDA)
+    return true;
+
+  // Aliasing is legal is the two types have different root nodes.
+  if (getRootTD(TDA) != getRootTD(TDB))
+    return true;
+
+  return isAliasingLegalUp(TDA, TDB) || isAliasingLegalUp(TDB, TDA);
+}
+
+namespace __tysan {
+class Decorator : public __sanitizer::SanitizerCommonDecorator {
+public:
+  Decorator() : SanitizerCommonDecorator() {}
+  const char *Warning() { return Red(); }
+  const char *Name() { return Green(); }
+  const char *End() { return Default(); }
+};
+} // namespace __tysan
+
+ALWAYS_INLINE
+static void reportError(void *Addr, int Size, tysan_type_descriptor *TD,
+                        tysan_type_descriptor *OldTD, const char *AccessStr,
+                        const char *DescStr, int Offset, uptr pc, uptr bp,
+                        uptr sp) {
+  Decorator d;
+  Printf("%s", d.Warning());
+  Report("ERROR: TypeSanitizer: type-aliasing-violation on address %p"
+         " (pc %p bp %p sp %p tid %llu)\n",
+         Addr, (void *)pc, (void *)bp, (void *)sp, GetTid());
+  Printf("%s", d.End());
+  Printf("%s of size %d at %p with type ", AccessStr, Size, Addr);
+
+  Printf("%s", d.Name());
+  printTDName(TD);
+  Printf("%s", d.End());
+
+  Printf(" %s of type ", DescStr);
+
+  Printf("%s", d.Name());
+  printTDName(OldTD);
+  Printf("%s", d.End());
+
+  if (Offset != 0)
+    Printf(" that starts at offset %d\n", Offset);
+  else
+    Printf("\n");
+
+  if (pc) {
+
+    bool request_fast = StackTrace::WillUseFastUnwind(true);
+    BufferedStackTrace ST;
+    ST.Unwind(kStackTraceMax, pc, bp, 0, 0, 0, request_fast);
+    ST.Print();
+  } else {
+    Printf("\n");
+  }
+}
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+__tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) {
+  GET_CALLER_PC_BP_SP;
+
+  bool IsRead = flags & 1;
+  bool IsWrite = flags & 2;
+  const char *AccessStr;
+  if (IsRead && !IsWrite)
+    AccessStr = "READ";
+  else if (!IsRead && IsWrite)
+    AccessStr = "WRITE";
+  else
+    AccessStr = "ATOMIC UPDATE";
+
+  tysan_type_descriptor **OldTDPtr = shadow_for(addr);
+  tysan_type_descriptor *OldTD = *OldTDPtr;
+  if (((sptr)OldTD) < 0) {
+    int i = -((sptr)OldTD);
+    OldTDPtr -= i;
+    OldTD = *OldTDPtr;
+
+    if (!isAliasingLegal(td, OldTD))
+      reportError(addr, size, td, OldTD, AccessStr,
+                  "accesses part of an existing object", -i, pc, bp, sp);
+
+    return;
+  }
+
+  if (!isAliasingLegal(td, OldTD)) {
+    reportError(addr, size, td, OldTD, AccessStr, "accesses an existing object",
+                0, pc, bp, sp);
+    return;
+  }
+
+  // These types are allowed to alias (or the stored type is unknown), report
+  // an error if we find an interior type.
+
+  for (int i = 0; i < size; ++i) {
+    OldTDPtr = shadow_for((void *)(((uptr)addr) + i));
+    OldTD = *OldTDPtr;
+    if (((sptr)OldTD) >= 0 && !isAliasingLegal(td, OldTD))
+      reportError(addr, size, td, OldTD, AccessStr,
+                  "partially accesses an object", i, pc, bp, sp);
+  }
+}
+
+Flags __tysan::flags_data;
+
+SANITIZER_INTERFACE_ATTRIBUTE uptr __tysan_shadow_memory_address;
+SANITIZER_INTERFACE_ATTRIBUTE uptr __tysan_app_memory_mask;
+
+#ifdef TYSAN_RUNTIME_VMA
+// Runtime detected VMA size.
+int __tysan::vmaSize;
+#endif
+
+void Flags::SetDefaults() {
+#define TYSAN_FLAG(Type, Name, DefaultValue, Description) Name = DefaultValue;
+#include "tysan_flags.inc"
+#undef TYSAN_FLAG
+}
+
+static void RegisterTySanFlags(FlagParser *parser, Flags *f) {
+#define TYSAN_FLAG(Type, Name, DefaultValue, Description)                      \
+  RegisterFlag(parser, #Name, Description, &f->Name);
+#include "tysan_flags.inc"
+#undef TYSAN_FLAG
+}
+
+static void InitializeFlags() {
+  SetCommonFlagsDefaults();
+  {
+    CommonFlags cf;
+    cf.CopyFrom(*common_flags());
+    cf.external_symbolizer_path = GetEnv("TYSAN_SYMBOLIZER_PATH");
+    OverrideCommonFlags(cf);
+  }
+
+  flags().SetDefaults();
+
+  FlagParser parser;
+  RegisterCommonFlags(&parser);
+  RegisterTySanFlags(&parser, &flags());
+  parser.ParseString(GetEnv("TYSAN_OPTIONS"));
+  InitializeCommonFlags();
+  if (Verbosity())
+    ReportUnrecognizedFlags();
+  if (common_flags()->help)
+    parser.PrintFlagDescriptions();
+}
+
+static void TySanInitializePlatformEarly() {
+  AvoidCVE_2016_2143();
+#ifdef TYSAN_RUNTIME_VMA
+  vmaSize = (MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1);
+#if defined(__aarch64__) && !SANITIZER_APPLE
+  if (vmaSize != 39 && vmaSize != 42 && vmaSize != 48) {
+    Printf("FATAL: TypeSanitizer: unsupported VMA range\n");
+    Printf("FATAL: Found %d - Supported 39, 42 and 48\n", vmaSize);
+    Die();
+  }
+#endif
+#endif
+
+  __sanitizer::InitializePlatformEarly();
+
+  __tysan_shadow_memory_address = ShadowAddr();
+  __tysan_app_memory_mask = AppMask();
+}
+
+namespace __tysan {
+bool tysan_inited = false;
+bool tysan_init_is_running;
+} // namespace __tysan
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __tysan_init() {
+  CHECK(!tysan_init_is_running);
+  if (tysan_inited)
+    return;
+  tysan_init_is_running = true;
+
+  InitializeFlags();
+  TySanInitializePlatformEarly();
+
+  InitializeInterceptors();
+
+  if (!MmapFixedNoReserve(ShadowAddr(), AppAddr() - ShadowAddr()))
+    Die();
+
+  tysan_init_is_running = false;
+  tysan_inited = true;
+}
+
+#if SANITIZER_CAN_USE_PREINIT_ARRAY
+__attribute__((section(".preinit_array"),
+               used)) static void (*tysan_init_ptr)() = __tysan_init;
+#endif
diff --git a/compiler-rt/lib/tysan/tysan.h b/compiler-rt/lib/tysan/tysan.h
new file mode 100644
index 00000000000000..ec6f9587e9ce58
--- /dev/null
+++ b/compiler-rt/lib/tysan/tysan.h
@@ -0,0 +1,79 @@
+//===-- tysan.h -------------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of TypeSanitizer.
+//
+// Private TySan header.
+//===----------------------------------------------------------------------===//
+
+#ifndef TYSAN_H
+#define TYSAN_H
+
+#include "sanitizer_common/sanitizer_internal_defs.h"
+
+using __sanitizer::sptr;
+using __sanitizer::u16;
+using __sanitizer::uptr;
+
+#include "tysan_platform.h"
+
+extern "C" {
+void tysan_set_type_unknown(const void *addr, uptr size);
+void tysan_copy_types(const void *daddr, const void *saddr, uptr size);
+}
+
+namespace __tysan {
+extern bool tysan_inited;
+extern bool tysan_init_is_running;
+
+void InitializeInterceptors();
+
+enum { TYSAN_MEMBER_TD = 1, TYSAN_STRUCT_TD = 2 };
+
+struct tysan_member_type_descriptor {
+  struct tysan_type_descriptor *Base;
+  struct tysan_type_descriptor *Access;
+  uptr Offset;
+};
+
+struct tysan_struct_type_descriptor {
+  uptr MemberCount;
+  struct {
+    struct tysan_type_descriptor *Type;
+    uptr Offset;
+  } Members[1]; // Tail allocated.
+  // char Name[]; // Tail allocated.
+};
+
+struct tysan_type_descriptor {
+  uptr Tag;
+  union {
+    tysan_member_type_descriptor Member;
+    tysan_struct_type_descriptor Struct;
+  };
+};
+
+inline tysan_type_descriptor **shadow_for(const void *ptr) {
+  return (tysan_type_descriptor **)((((uptr)ptr) & AppMask()) * sizeof(ptr) +
+                                    ShadowAddr());
+}
+
+struct Flags {
+#define TYSAN_FLAG(Type, Name, Defaul...
[truncated]

Copy link

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r fbaff4033ba060d1478c924ba736be407b4edb80...00e6e657f700e84501cf46a786ccd9389e208ff2 compiler-rt/test/tysan/lit.cfg.py
View the diff from darker here.
--- lit.cfg.py	2023-12-22 19:18:34.000000 +0000
+++ lit.cfg.py	2023-12-22 19:20:58.101017 +0000
@@ -7,133 +7,155 @@
 import lit.formats
 
 # Get shlex.quote if available (added in 3.3), and fall back to pipes.quote if
 # it's not available.
 try:
-  import shlex
-  sh_quote = shlex.quote
+    import shlex
+
+    sh_quote = shlex.quote
 except:
-  import pipes
-  sh_quote = pipes.quote
+    import pipes
+
+    sh_quote = pipes.quote
+
 
 def get_required_attr(config, attr_name):
-  attr_value = getattr(config, attr_name, None)
-  if attr_value == None:
-    lit_config.fatal(
-      "No attribute %r in test configuration! You may need to run "
-      "tests from your build directory or add this attribute "
-      "to lit.site.cfg.py " % attr_name)
-  return attr_value
+    attr_value = getattr(config, attr_name, None)
+    if attr_value == None:
+        lit_config.fatal(
+            "No attribute %r in test configuration! You may need to run "
+            "tests from your build directory or add this attribute "
+            "to lit.site.cfg.py " % attr_name
+        )
+    return attr_value
+
 
 def push_dynamic_library_lookup_path(config, new_path):
-  if platform.system() == 'Windows':
-    dynamic_library_lookup_var = 'PATH'
-  elif platform.system() == 'Darwin':
-    dynamic_library_lookup_var = 'DYLD_LIBRARY_PATH'
-  else:
-    dynamic_library_lookup_var = 'LD_LIBRARY_PATH'
+    if platform.system() == "Windows":
+        dynamic_library_lookup_var = "PATH"
+    elif platform.system() == "Darwin":
+        dynamic_library_lookup_var = "DYLD_LIBRARY_PATH"
+    else:
+        dynamic_library_lookup_var = "LD_LIBRARY_PATH"
 
-  new_ld_library_path = os.path.pathsep.join(
-    (new_path, config.environment.get(dynamic_library_lookup_var, '')))
-  config.environment[dynamic_library_lookup_var] = new_ld_library_path
+    new_ld_library_path = os.path.pathsep.join(
+        (new_path, config.environment.get(dynamic_library_lookup_var, ""))
+    )
+    config.environment[dynamic_library_lookup_var] = new_ld_library_path
 
-  if platform.system() == 'FreeBSD':
-    dynamic_library_lookup_var = 'LD_32_LIBRARY_PATH'
-    new_ld_32_library_path = os.path.pathsep.join(
-      (new_path, config.environment.get(dynamic_library_lookup_var, '')))
-    config.environment[dynamic_library_lookup_var] = new_ld_32_library_path
+    if platform.system() == "FreeBSD":
+        dynamic_library_lookup_var = "LD_32_LIBRARY_PATH"
+        new_ld_32_library_path = os.path.pathsep.join(
+            (new_path, config.environment.get(dynamic_library_lookup_var, ""))
+        )
+        config.environment[dynamic_library_lookup_var] = new_ld_32_library_path
 
-  if platform.system() == 'SunOS':
-    dynamic_library_lookup_var = 'LD_LIBRARY_PATH_32'
-    new_ld_library_path_32 = os.path.pathsep.join(
-      (new_path, config.environment.get(dynamic_library_lookup_var, '')))
-    config.environment[dynamic_library_lookup_var] = new_ld_library_path_32
+    if platform.system() == "SunOS":
+        dynamic_library_lookup_var = "LD_LIBRARY_PATH_32"
+        new_ld_library_path_32 = os.path.pathsep.join(
+            (new_path, config.environment.get(dynamic_library_lookup_var, ""))
+        )
+        config.environment[dynamic_library_lookup_var] = new_ld_library_path_32
 
-    dynamic_library_lookup_var = 'LD_LIBRARY_PATH_64'
-    new_ld_library_path_64 = os.path.pathsep.join(
-      (new_path, config.environment.get(dynamic_library_lookup_var, '')))
-    config.environment[dynamic_library_lookup_var] = new_ld_library_path_64
+        dynamic_library_lookup_var = "LD_LIBRARY_PATH_64"
+        new_ld_library_path_64 = os.path.pathsep.join(
+            (new_path, config.environment.get(dynamic_library_lookup_var, ""))
+        )
+        config.environment[dynamic_library_lookup_var] = new_ld_library_path_64
+
 
 # Setup config name.
-config.name = 'TypeSanitizer' + config.name_suffix
+config.name = "TypeSanitizer" + config.name_suffix
 
 # Platform-specific default TYSAN_OPTIONS for lit tests.
 default_tysan_opts = list(config.default_sanitizer_opts)
 
 # On Darwin, leak checking is not enabled by default. Enable on macOS
 # tests to prevent regressions
-if config.host_os == 'Darwin' and config.apple_platform == 'osx':
-  default_tysan_opts += ['detect_leaks=1']
+if config.host_os == "Darwin" and config.apple_platform == "osx":
+    default_tysan_opts += ["detect_leaks=1"]
 
-default_tysan_opts_str = ':'.join(default_tysan_opts)
+default_tysan_opts_str = ":".join(default_tysan_opts)
 if default_tysan_opts_str:
-  config.environment['TYSAN_OPTIONS'] = default_tysan_opts_str
-  default_tysan_opts_str += ':'
-config.substitutions.append(('%env_tysan_opts=',
-                             'env TYSAN_OPTIONS=' + default_tysan_opts_str))
+    config.environment["TYSAN_OPTIONS"] = default_tysan_opts_str
+    default_tysan_opts_str += ":"
+config.substitutions.append(
+    ("%env_tysan_opts=", "env TYSAN_OPTIONS=" + default_tysan_opts_str)
+)
 
 # Setup source root.
 config.test_source_root = os.path.dirname(__file__)
 
-if config.host_os not in ['FreeBSD', 'NetBSD']:
-  libdl_flag = "-ldl"
+if config.host_os not in ["FreeBSD", "NetBSD"]:
+    libdl_flag = "-ldl"
 else:
-  libdl_flag = ""
+    libdl_flag = ""
 
 # GCC-ASan doesn't link in all the necessary libraries automatically, so
 # we have to do it ourselves.
-if config.compiler_id == 'GNU':
-  extra_link_flags = ["-pthread", "-lstdc++", libdl_flag]
+if config.compiler_id == "GNU":
+    extra_link_flags = ["-pthread", "-lstdc++", libdl_flag]
 else:
-  extra_link_flags = []
+    extra_link_flags = []
 
 # Setup default compiler flags used with -fsanitize=address option.
 # FIXME: Review the set of required flags and check if it can be reduced.
 target_cflags = [get_required_attr(config, "target_cflags")] + extra_link_flags
 target_cxxflags = config.cxx_mode_flags + target_cflags
-clang_tysan_static_cflags = (["-fsanitize=type",
-                            "-mno-omit-leaf-frame-pointer",
-                            "-fno-omit-frame-pointer",
-                            "-fno-optimize-sibling-calls"] +
-                            config.debug_info_flags + target_cflags)
-if config.target_arch == 's390x':
-  clang_tysan_static_cflags.append("-mbackchain")
+clang_tysan_static_cflags = (
+    [
+        "-fsanitize=type",
+        "-mno-omit-leaf-frame-pointer",
+        "-fno-omit-frame-pointer",
+        "-fno-optimize-sibling-calls",
+    ]
+    + config.debug_info_flags
+    + target_cflags
+)
+if config.target_arch == "s390x":
+    clang_tysan_static_cflags.append("-mbackchain")
 clang_tysan_static_cxxflags = config.cxx_mode_flags + clang_tysan_static_cflags
 
-clang_tysan_cflags = clang_tysan_static_cflags 
+clang_tysan_cflags = clang_tysan_static_cflags
 clang_tysan_cxxflags = clang_tysan_static_cxxflags
 
+
 def build_invocation(compile_flags):
-  return " " + " ".join([config.clang] + compile_flags) + " "
+    return " " + " ".join([config.clang] + compile_flags) + " "
 
-config.substitutions.append( ("%clang ", build_invocation(target_cflags)) )
-config.substitutions.append( ("%clangxx ", build_invocation(target_cxxflags)) )
-config.substitutions.append( ("%clang_tysan ", build_invocation(clang_tysan_cflags)) )
-config.substitutions.append( ("%clangxx_tysan ", build_invocation(clang_tysan_cxxflags)) )
+
+config.substitutions.append(("%clang ", build_invocation(target_cflags)))
+config.substitutions.append(("%clangxx ", build_invocation(target_cxxflags)))
+config.substitutions.append(("%clang_tysan ", build_invocation(clang_tysan_cflags)))
+config.substitutions.append(("%clangxx_tysan ", build_invocation(clang_tysan_cxxflags)))
 
 
 # FIXME: De-hardcode this path.
 tysan_source_dir = os.path.join(
-  get_required_attr(config, "compiler_rt_src_root"), "lib", "tysan")
+    get_required_attr(config, "compiler_rt_src_root"), "lib", "tysan"
+)
 python_exec = sh_quote(get_required_attr(config, "python_executable"))
 
 # Set LD_LIBRARY_PATH to pick dynamic runtime up properly.
 push_dynamic_library_lookup_path(config, config.compiler_rt_libdir)
 
 # Default test suffixes.
-config.suffixes = ['.c', '.cpp']
+config.suffixes = [".c", ".cpp"]
 
-if config.host_os == 'Darwin':
-  config.suffixes.append('.mm')
+if config.host_os == "Darwin":
+    config.suffixes.append(".mm")
 
-if config.host_os == 'Windows':
-  config.substitutions.append(('%fPIC', ''))
-  config.substitutions.append(('%fPIE', ''))
-  config.substitutions.append(('%pie', ''))
+if config.host_os == "Windows":
+    config.substitutions.append(("%fPIC", ""))
+    config.substitutions.append(("%fPIE", ""))
+    config.substitutions.append(("%pie", ""))
 else:
-  config.substitutions.append(('%fPIC', '-fPIC'))
-  config.substitutions.append(('%fPIE', '-fPIE'))
-  config.substitutions.append(('%pie', '-pie'))
+    config.substitutions.append(("%fPIC", "-fPIC"))
+    config.substitutions.append(("%fPIE", "-fPIE"))
+    config.substitutions.append(("%pie", "-pie"))
 
 # Only run the tests on supported OSs.
-if config.host_os not in ['Linux', 'Darwin',]:
-  config.unsupported = True
+if config.host_os not in [
+    "Linux",
+    "Darwin",
+]:
+    config.unsupported = True

fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 22, 2023
This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

(This includes build fixes for Linux from Mingjie Xu)

Based on https://reviews.llvm.org/D32197.

Pull Request: llvm#76261
@thesamesam thesamesam added the TBAA Type-Based Alias Analysis / Strict Aliasing label Dec 26, 2023
@@ -720,7 +726,7 @@ if(COMPILER_RT_SUPPORTED_ARCH)
endif()
message(STATUS "Compiler-RT supported architectures: ${COMPILER_RT_SUPPORTED_ARCH}")

set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;tysan,safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ tysan;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated!

…hn/tysan-a-type-sanitizer-clang

Conflicts:
	llvm/include/llvm/Bitcode/LLVMBitCodes.h
@fhahn fhahn requested a review from Endilll as a code owner April 19, 2024 11:27
@fhahn fhahn changed the base branch from users/fhahn/main.tysan-a-type-sanitizer-runtime-library to users/fhahn/tysan-a-type-sanitizer-clang April 19, 2024 11:29
@Endilll Endilll removed their request for review April 19, 2024 11:54
@llvmbot llvmbot added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen labels Apr 22, 2024
@fhahn
Copy link
Contributor Author

fhahn commented Apr 22, 2024

Added compiler-rt tests for various strict-aliasing violations from the bug tracker I found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category compiler-rt:sanitizer compiler-rt TBAA Type-Based Alias Analysis / Strict Aliasing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants