@BStott6 BStott6 commented Nov 4, 2025

This changes the TypeSanitizer instrumentation pass to parse TBAA's `pN T` pointer type names (e.g. `p2 int`) and rewrite them in the more familiar C-style `T*` notation (e.g. `int**`). The TySan docs no longer explain the TBAA-style pointer type names, and the TySan regression tests that check pointer type formatting are updated to match the new output. A new hidden command-line option in TypeSanitizer.cpp, `tysan-use-tbaa-type-names`, restores the old TBAA-style names.
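For readers who want to try the renaming rule outside of LLVM, here is a minimal standalone sketch of it. It is not the code from this patch: the function name `tbaaPointerNameToCStyle` is hypothetical, and it uses `std::string_view` and `std::from_chars` from the standard library instead of `llvm::StringRef`. It assumes the name has the shape `p<N> <type>` and reports anything else as "not a pointer name".

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical standalone analogue of the conversion described above.
// Returns the C-style name for a "p<N> <type>" input, or std::nullopt
// when the input does not have that shape.
std::optional<std::string> tbaaPointerNameToCStyle(std::string_view Name) {
  // The name must start with the 'p' marker.
  if (Name.empty() || Name.front() != 'p')
    return std::nullopt;
  Name.remove_prefix(1);

  // Parse the indirection count as an unsigned decimal number.
  unsigned Depth = 0;
  const char *Begin = Name.data();
  const char *End = Begin + Name.size();
  auto [Ptr, Ec] = std::from_chars(Begin, End, Depth);
  if (Ec != std::errc() || Ptr == Begin)
    return std::nullopt;
  Name.remove_prefix(static_cast<std::size_t>(Ptr - Begin));

  // A single space separates the count from the pointee type name.
  if (Name.empty() || Name.front() != ' ')
    return std::nullopt;
  Name.remove_prefix(1);

  // Emit the pointee name followed by one '*' per level of indirection.
  std::string Result(Name);
  Result.append(Depth, '*');
  return Result;
}

// Small self-check covering the cases mentioned in this PR.
void checkExamples() {
  assert(tbaaPointerNameToCStyle("p2 int").value() == "int**");
  assert(tbaaPointerNameToCStyle("p1 float").value() == "float*");
  assert(!tbaaPointerNameToCStyle("omnipotent char").has_value());
}
```

Names that merely begin with `p`, such as a struct called `pair`, fall through unchanged because no number follows the marker.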

@llvmbot added the clang, compiler-rt, compiler-rt:sanitizer and llvm:transforms labels on Nov 4, 2025

llvmbot commented Nov 4, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (BStott6)

Full diff: https://github.com/llvm/llvm-project/pull/166381.diff

4 Files Affected:

  • (modified) clang/docs/TypeSanitizer.rst (-2)
  • (modified) compiler-rt/test/tysan/print_stacktrace.c (+1-1)
  • (modified) compiler-rt/test/tysan/ptr-float.c (+1-1)
  • (modified) llvm/lib/Transforms/Instrumentation/TypeSanitizer.cpp (+39-1)
diff --git a/clang/docs/TypeSanitizer.rst b/clang/docs/TypeSanitizer.rst
index 3c683a6c24bb4..c2f628cb231db 100644
--- a/clang/docs/TypeSanitizer.rst
+++ b/clang/docs/TypeSanitizer.rst
@@ -119,8 +119,6 @@ brief dictionary of these terms.
 
 * ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++ 
   type ``char``.
-* ``type p[x]``: This signifies pointers to the type. ``x`` is the number of indirections to reach the final value.
-  As an example, a pointer to a pointer to an integer would be ``type p2 int``.
 
 TypeSanitizer is still experimental. User-facing error messages should be improved in the future to remove 
 references to LLVM IR specific terms.
diff --git a/compiler-rt/test/tysan/print_stacktrace.c b/compiler-rt/test/tysan/print_stacktrace.c
index 3ffb6063377d9..831be5e4afed9 100644
--- a/compiler-rt/test/tysan/print_stacktrace.c
+++ b/compiler-rt/test/tysan/print_stacktrace.c
@@ -10,7 +10,7 @@ void zero_array() {
   for (i = 0; i < 1; ++i)
     P[i] = 0.0f;
   // CHECK: ERROR: TypeSanitizer: type-aliasing-violation
-  // CHECK: WRITE of size 4 at {{.*}} with type float accesses an existing object of type p1 float
+  // CHECK: WRITE of size 4 at {{.*}} with type float accesses an existing object of type float*
   // CHECK: {{#0 0x.* in zero_array .*print_stacktrace.c:}}[[@LINE-3]]
   // CHECK-SHORT-NOT: {{#1 0x.* in main .*print_stacktrace.c}}
   // CHECK-LONG-NEXT: {{#1 0x.* in main .*print_stacktrace.c}}
diff --git a/compiler-rt/test/tysan/ptr-float.c b/compiler-rt/test/tysan/ptr-float.c
index aaa9895986988..145d5d8f289ea 100644
--- a/compiler-rt/test/tysan/ptr-float.c
+++ b/compiler-rt/test/tysan/ptr-float.c
@@ -7,7 +7,7 @@ void zero_array() {
   for (i = 0; i < 1; ++i)
     P[i] = 0.0f;
   // CHECK: ERROR: TypeSanitizer: type-aliasing-violation
-  // CHECK: WRITE of size 4 at {{.*}} with type float accesses an existing object of type p1 float
+  // CHECK: WRITE of size 4 at {{.*}} with type float accesses an existing object of type float*
   // CHECK: {{#0 0x.* in zero_array .*ptr-float.c:}}[[@LINE-3]]
 }
 
diff --git a/llvm/lib/Transforms/Instrumentation/TypeSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/TypeSanitizer.cpp
index 87eba5f2c5242..e5109c047584e 100644
--- a/llvm/lib/Transforms/Instrumentation/TypeSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/TypeSanitizer.cpp
@@ -70,6 +70,12 @@ static cl::opt<bool> ClVerifyOutlinedInstrumentation(
              "function calls. This verifies that they behave the same."),
     cl::Hidden, cl::init(false));
 
+static cl::opt<bool> ClUseTBAATypeNames(
+    "tysan-use-tbaa-type-names",
+    cl::desc("Print TBAA-style type names for pointers rather than C-style "
+             "names (e.g. 'p2 int' rather than 'int**')"),
+    cl::Hidden, cl::init(false));
+
 STATISTIC(NumInstrumentedAccesses, "Number of instrumented accesses");
 
 namespace {
@@ -260,6 +266,29 @@ static std::string encodeName(StringRef Name) {
   return Output;
 }
 
+/// Converts pointer type names from TBAA "p2 int" style to C style ("int**").
+/// Currently leaves "omnipotent char" unchanged - not sure of a user-friendly name for this type.
+/// If the type name was changed, returns true and stores the new type name in `Dest`.
+/// Otherwise, returns false (`Dest` is unchanged).
+static bool convertTBAAStyleTypeNamesToCStyle(StringRef TypeName, std::string &Dest) {
+  if (!TypeName.consume_front("p"))
+    return false;
+
+  unsigned Indirection;
+  if (TypeName.consumeInteger(10, Indirection))
+    return false;
+
+  if (!TypeName.consume_front(" "))
+    return false;
+
+  Dest.clear();
+  Dest.reserve(TypeName.size() + Indirection); // One * per indirection
+  Dest.append(TypeName);
+  Dest.append(Indirection, '*');
+
+  return true;
+}
+
 std::string
 TypeSanitizer::getAnonymousStructIdentifier(const MDNode *MD,
                                             TypeNameMapTy &TypeNames) {
@@ -355,7 +384,16 @@ bool TypeSanitizer::generateBaseTypeDescriptor(
   //   [2, member count, [type pointer, offset]..., name]
 
   LLVMContext &C = MD->getContext();
-  Constant *NameData = ConstantDataArray::getString(C, NameNode->getString());
+  StringRef TypeName = NameNode->getString();
+
+  // Convert LLVM-internal TBAA-style type names to C-style type names
+  // (more user-friendly)
+  std::string CStyleTypeName;
+  if (!ClUseTBAATypeNames)
+    if (convertTBAAStyleTypeNamesToCStyle(TypeName, CStyleTypeName))
+      TypeName = CStyleTypeName;
+
+  Constant *NameData = ConstantDataArray::getString(C, TypeName);
   SmallVector<Type *> TDSubTys;
   SmallVector<Constant *> TDSubData;
 

@gbMattN gbMattN requested review from fhahn and gbMattN November 4, 2025 15:13

* ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++
type ``char``.
* ``type p[x]``: This signifies pointers to the type. ``x`` is the number of indirections to reach the final value.

If the `tysan-use-tbaa-type-names` flag is used, this terminology still appears in the output, so maybe we want to keep it in the docs? Adding a note before or after it saying that this is no longer the default behaviour might be better.

I'm not sure we need the flag. Given that we print the C type name in the other cases, I think it would make sense to always print pointers in C style and remove the flag.

If we are always printing C style, maybe it would make sense to change "omnipotent char" to `char` as well? Either that, or change the docs to state that the name is permanent.
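To make that suggestion concrete, here is a rough standalone sketch of what always printing C-style names might look like if "omnipotent char" were also shown as `char`. Nothing here is part of the patch: the function name `userFacingTypeName` is hypothetical, the mapping of "omnipotent char" to `char` is only the idea floated in this thread, and treating a pointee such as `p1 omnipotent char` the same way is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: turns a TBAA type name into a user-facing C-style name.
// Names that are neither "p<N> <type>" nor "omnipotent char" are returned as-is.
std::string userFacingTypeName(std::string_view Name) {
  std::size_t Depth = 0;
  std::string_view Base = Name;

  // Recognise the "p<N> <type>" shape; N is limited to three digits here.
  if (Base.size() >= 3 && Base.front() == 'p') {
    std::size_t I = 1;
    std::size_t Parsed = 0;
    while (I < Base.size() && I <= 3 &&
           std::isdigit(static_cast<unsigned char>(Base[I]))) {
      Parsed = Parsed * 10 + static_cast<std::size_t>(Base[I] - '0');
      ++I;
    }
    // At least one digit must be followed by a single space.
    if (I > 1 && I < Base.size() && Base[I] == ' ') {
      Depth = Parsed;
      Base.remove_prefix(I + 1);
    }
  }

  // Assumption from this thread: show the special aliasing type as plain char.
  if (Base == "omnipotent char")
    Base = "char";

  std::string Result(Base);
  Result.append(Depth, '*');
  return Result;
}

// Small self-check for the cases discussed in the review.
void checkUserFacingNames() {
  assert(userFacingTypeName("omnipotent char") == "char");
  assert(userFacingTypeName("p2 int") == "int**");
  assert(userFacingTypeName("float") == "float");
}
```

One trade-off of this mapping is that a report could no longer distinguish the special aliasing type from an ordinary `char` in the message text, which is why keeping the name and documenting it is the alternative raised above.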

github-actions bot commented Nov 4, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.
