Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sanitizer_symbolizer] Add end to end test for symbolizer markup. #77702

Conversation

avillega
Copy link
Contributor

Add more tests for sanitizer symbolizer markup, that make sure
the output of the symbolizer markup can be symbolized by llvm-symbolizer.

Created using spr 1.3.5
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 10, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Andres Villegas (avillega)

Changes

Add more tests for sanitizer symbolizer markup, that make sure
the output of the symbolizer markup can be symbolized by llvm-symbolizer.


Full diff: https://github.com/llvm/llvm-project/pull/77702.diff

2 Files Affected:

  • (added) compiler-rt/test/asan/TestCases/use-after-free-symbolizer-markup.cpp (+34)
  • (added) compiler-rt/test/tsan/simple_stack_symbolizer_markup.cpp (+59)
diff --git a/compiler-rt/test/asan/TestCases/use-after-free-symbolizer-markup.cpp b/compiler-rt/test/asan/TestCases/use-after-free-symbolizer-markup.cpp
new file mode 100644
index 00000000000000..548ae57b5c3797
--- /dev/null
+++ b/compiler-rt/test/asan/TestCases/use-after-free-symbolizer-markup.cpp
@@ -0,0 +1,34 @@
+// COM: End to end test for the sanitizer symbolizer markup. Since it uses debug info
+// COM: to do offline symbolization we only check that the current module correctly is correctly symbolized  
+// REQUIRES: linux
+// RUN: %clangxx_asan %s -Wl,--build-id=0x12345678 -o %t.main
+// RUN: mkdir -p %t/.build-id/12
+// RUN: cp %t.main %t/.build-id/12/345678.debug
+// RUN: %env_asan_opts=enable_symbolizer_markup=1 not %run %t.main 2>%t/sanitizer.out
+// RUN: llvm-symbolizer --filter-markup --debug-file-directory=%t < %t/sanitizer.out | FileCheck %s
+
+#include <stdlib.h>
+
+[[gnu::noinline]]
+char *alloc() {
+  char *x = (char*)malloc(10 * sizeof(char));
+  return x;
+}
+int main() {
+  char *x = alloc();
+  free(x);
+  return x[5];
+}
+// CHECK: {{.*ERROR: AddressSanitizer: heap-use-after-free on address}}
+// CHECK: {{0x.* at pc 0x.* bp 0x.* sp 0x.*}}
+// CHECK: {{READ of size 1 at 0x.* thread T0}}
+// CHECK: {{    #0 0x.* main .*use-after-free-symbolizer-markup.cpp:}}[[@LINE-5]]
+// CHECK: {{0x.* is located 5 bytes inside of 10-byte region .0x.*,0x.*}}
+// CHECK: {{freed by thread T0 here:}}
+// CHECK: {{    #1 0x.* main .*use-after-free-symbolizer-markup.cpp:}}[[@LINE-9]]
+// CHECK: {{previously allocated by thread T0 here:}} 
+// CHECK: {{    #1 0x.* alloc\(\) .*use-after-free-symbolizer-markup.cpp:}}[[@LINE-16]]
+// CHECK: {{    #2 0x.* main .*use-after-free-symbolizer-markup.cpp:}}[[@LINE-13]]
+// CHECK: Shadow byte legend (one shadow byte represents {{[0-9]+}} application bytes):
+// CHECK: Global redzone:
+// CHECK: ASan internal:
diff --git a/compiler-rt/test/tsan/simple_stack_symbolizer_markup.cpp b/compiler-rt/test/tsan/simple_stack_symbolizer_markup.cpp
new file mode 100644
index 00000000000000..5798986d73839e
--- /dev/null
+++ b/compiler-rt/test/tsan/simple_stack_symbolizer_markup.cpp
@@ -0,0 +1,59 @@
+// REQUIRES: linux
+// RUN: %clangxx_tsan %s -Wl,--build-id=0x12345678 -O1 -o %t.main 
+// RUN: mkdir -p %t/.build-id/12
+// RUN: cp %t.main %t/.build-id/12/345678.debug
+// RUN: %env_tsan_opts=enable_symbolizer_markup=1 %deflake %run %t.main >%t/sanitizer.out
+// RUN: llvm-symbolizer --filter-markup --debug-file-directory=%t < %t/sanitizer.out | FileCheck %s --dump-input=always
+
+#include "test.h"
+
+int Global;
+
+void __attribute__((noinline)) foo1() {
+  Global = 42;
+}
+
+void __attribute__((noinline)) bar1() {
+  volatile int tmp = 42;
+  int tmp2 = tmp;
+  (void)tmp2;
+  foo1();
+}
+
+void __attribute__((noinline)) foo2() {
+  volatile int tmp = Global;
+  int tmp2 = tmp;
+  (void)tmp2;
+}
+
+void __attribute__((noinline)) bar2() {
+  volatile int tmp = 42;
+  int tmp2 = tmp;
+  (void)tmp2;
+  foo2();
+}
+
+void *Thread1(void *x) {
+  barrier_wait(&barrier);
+  bar1();
+  return NULL;
+}
+
+int main() {
+  barrier_init(&barrier, 2);
+  pthread_t t;
+  pthread_create(&t, NULL, Thread1, NULL);
+  bar2();
+  barrier_wait(&barrier);
+  pthread_join(t, NULL);
+}
+
+//      CHECK: WARNING: ThreadSanitizer: data race
+//      CHECK:   Write of size 4 at {{.*}} by thread T1:
+//      CHECK:     #0 {{0x.*}} foo1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[@LINE-40]]
+// CHECK-NEXT:     #1 {{0x.*}} bar1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[@LINE-34]]
+// CHECK-NEXT:     #2 {{0x.*}} Thread1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[@LINE-17]]
+//      CHECK:   Previous read of size 4 at {{.*}} by main thread:
+//      CHECK:     #0 {{0x.*}} foo2{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[@LINE-33]]
+// CHECK-NEXT:     #1 {{0x.*}} bar2{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[@LINE-25]]
+// CHECK-NEXT:     #2 {{0x.*}} main{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[@LINE-13]]

@avillega
Copy link
Contributor Author

I plan to add some ubsan and hwasan test aswell, but asan and tsan are just faster to add right now.

Copy link

github-actions bot commented Jan 10, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@fmayer
Copy link
Contributor

fmayer commented Jan 10, 2024

I plan to add some ubsan and hwasan test aswell, but asan and tsan are just faster to add right now.

FWIW it might be better to add them separately anyway, to make rolling back easier if a buildbot breaks.

Created using spr 1.3.5
Created using spr 1.3.5
return x[5];
}
// CHECK: {{.*ERROR: AddressSanitizer: heap-use-after-free on address}}
// CHECK: {{0x.* at pc 0x.* bp 0x.* sp 0x.*}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you expect these to be in order, CHECK-NEXT: or CHECK-DAG: are probably better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added CHECK-NEXT and CHECK-DAG in some places. because of how the symvolizer markup context elements currently work, it is possible to get a line containing module information in the middle of two lines that might be expected to be NEXT to each other. I think it is something we might want to review.

// CHECK: {{freed by thread T0 here:}}
// CHECK: {{ #1 0x.* main .*use-after-free-symbolizer-markup.cpp:}}[[@LINE-9]]
// CHECK: {{previously allocated by thread T0 here:}}
// CHECK: {{ #1 0x.* alloc\(\) .*use-after-free-symbolizer-markup.cpp:}}[[@LINE-16]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CHECK: {{ #1 0x.* alloc\(\) .*use-after-free-symbolizer-markup.cpp:}}[[@LINE-16]]
// CHECK: {{ #1 0x.* alloc\(\) .*use-after-free-symbolizer-markup.cpp:}}[[#@LINE-16]]

Note: [[@LINE]] is deprecated. Use [[#@LINE]].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

compiler-rt/test/tsan/simple_stack_symbolizer_markup.cpp Outdated Show resolved Hide resolved
compiler-rt/test/tsan/simple_stack_symbolizer_markup.cpp Outdated Show resolved Hide resolved
Created using spr 1.3.5
@avillega avillega requested a review from ilovepi January 11, 2024 20:27
Created using spr 1.3.5
Copy link
Contributor

@ilovepi ilovepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly good IMO, modulo a few small nits in the FileCheck directives that should be applied to both files as appropriate.


// CHECK: WARNING: ThreadSanitizer: data race
// CHECK: Write of size 4 at {{.*}} by thread T1:
// CHECK-DAG: #0 {{0x.*}} foo1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 39]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way CHECK-DAG is used here is a bit strange IMO. The ordering shouldn't be interleaved in anyway, right? The frames should print in a set order.

In that case CHECK: seems more appropriate, with CHECK-NEXT: for the following frames.

// CHECK-DAG: #0 {{0x.*}} foo1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 39]]
// CHECK-NEXT: #1 {{0x.*}} bar1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 34]]
// CHECK-NEXT: #2 {{0x.*}} Thread1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 17]]
// CHECK-DAG: Previous read of size 4 at {{.*}} by main thread:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this always print before the following frames? If so, then CHECK: is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// CHECK-NEXT: #1 {{0x.*}} bar1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 34]]
// CHECK-NEXT: #2 {{0x.*}} Thread1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 17]]
// CHECK-DAG: Previous read of size 4 at {{.*}} by main thread:
// CHECK-DAG: #0 {{0x.*}} foo2{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 33]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this always comes after the `Previous read of size...", then use CHECK-NEXT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. Contextual elements might get emitted any time a call to print a non contextual element happens. So it is possible that in between Previous read of size 4 at {{.*}} by main thread: and #0... some contextual elements get printed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While what I wrote is true for real workloads, it might be impossible to trigger unless I specifically add code to the test to do it, i.e. load a module in at runtime. I might add a tests for that in another PR though. For now will try to make the tests use CHECK and CHECK-NEXT instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more sophisticated users of FileCheck then me, but my rule of thumb is:

  • If the ordering is fixed, but other elements that you will not try to match may occur, use CHECK. (If its ambiguous, match some other property first)
  • If the order not fixed for elements I need to match, use CHECK-DAG.
  • If the order is completely fixed: use CHECK-NEXT after the initial match.

That's served me pretty well, but YMMV.

In this case, we can probably just use CHECK, if its possible other output will be interleaved, but the total order is fixed (which I think is this case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// CHECK: WARNING: ThreadSanitizer: data race
// CHECK: Write of size 4 at {{.*}} by thread T1:
// CHECK-DAG: #0 {{0x.*}} foo1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 39]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CHECK-DAG: #0 {{0x.*}} foo1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 39]]
// CHECK: #0 0x[[%.8X]] foo1{{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 39]]

or

Suggested change
// CHECK-DAG: #0 {{0x.*}} foo1{{.*}} {{.*}}simple_stack_symbolizer_markup.cpp:[[#@LINE - 39]]
// CHECK: #0 0x[[%.8X]] foo1
// CHECK-SAME: simple_stack_symbolizer_markup.cpp:[[#@LINE - 40]]

This version requires more changes to LINE directives, though.

If the {{.*}}foo1{{.*}} are required for name mangling, it's acceptable to either make this a C test or to use extern "C" on the function declarations to avoid mangling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{.*}}simple_stack_symbolizer_markup.cpp is not for name mangling or matching spaces. It is to avoid having to match the whole file path. The markup filter uses absolute paths when printing file names. foo1{{.*}} is necessary because depending on some flags it is possible that foo1 is printed slightly different, so I am trying to make the test less fragile. I considered 0x[[%.8X]] but was worried that not all targets might print 8 byte addresses and went with the less specific version and try to match anything after the 0x, Do you think this might be a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{.*}}simple_stack_symbolizer_markup.cpp is not for name mangling or matching spaces. It is to avoid having to match the whole file path. The markup filter uses absolute paths when printing file names.

You can probably omit the filename, since you're going to match the LINE. That is a pretty common approach, but I don't feel strongly one way or another on that. So its up to you.

foo1{{.*}} is necessary because depending on some flags it is possible that foo1 is printed slightly different, so I am trying to make the test less fragile.

If there is different output based on flags, its probably worth another test with and without those flags. Well, unless that's tested elsewhere. But even if that's the case, you should probably just set one of those flags to make it print the way you expect, and add a comment about why you only want to test the single config here.

I considered 0x[[%.8X]] but was worried that not all targets might print 8 byte addresses and went with the less specific version and try to match anything after the 0x, Do you think this might be a problem?

well, you can just match 0x[[#]] if its a problem, but this is only supported on Linux, right? Its probably OK to assume 64bit addresses in that case. I'm fine either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the redundant patterns, but kept the {{0x.}} match for addresses because I could find examples in the llvm-symbolizer docs of usages of different address sizes.

compiler-rt/test/tsan/simple_stack_symbolizer_markup.cpp Outdated Show resolved Hide resolved
Created using spr 1.3.5
@llvmbot llvmbot added compiler-rt:asan Address sanitizer compiler-rt:tsan Thread sanitizer labels Mar 11, 2024
@avillega avillega merged commit 41fa0ea into users/avillega/main.sanitizer_symbolizer-add-end-to-end-test-for-symbolizer-markup Mar 11, 2024
8 checks passed
@avillega avillega deleted the users/avillega/sanitizer_symbolizer-add-end-to-end-test-for-symbolizer-markup branch March 11, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants