AddressSanitizer support #694

Closed
grondo opened this Issue Jun 13, 2016 · 29 comments

grondo commented Jun 13, 2016

@dongahn worked long ago on a branch with Asan support here

https://github.com/dongahn/flux-core/commits/san

Its development was documented in #359; however, I did not find an official PR.
We should rebase Dong's work and merge it if possible.

dongahn commented Jun 27, 2016

Ack.

dongahn commented Jul 6, 2016

It seems clang 3.7.0 or higher rejects a libev function declaration:

cab668{dahn}113: use clang-3.7.0
Prepending: clang-3.7.0 (ok)

make[3]: Entering directory `/g/g0/dahn/workspace/Flux/flux-core/src/common/libev'
depbase=`echo ev.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
    /bin/sh ../../../libtool  --tag=CC   --mode=compile clang -DHAVE_CONFIG_H -I. -I../../../config  -w   -g -O2 -fsanitize=address -fno-omit-frame-pointer -MT ev.lo -MD -MP -MF $depbase.Tpo -c -o ev.lo ev.c &&\
    mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  clang -DHAVE_CONFIG_H -I. -I../../../config -w -g -O2 -fsanitize=address -fno-omit-frame-pointer -MT ev.lo -MD -MP -MF .deps/ev.Tpo -c ev.c  -fPIC -DPIC -o .libs/ev.o
ev.c:1029:42: error: '_Noreturn' keyword must precede function declarator
  ecb_inline void ecb_unreachable (void) ecb_noreturn;
                                         ^~~~~~~~~~~~
  _Noreturn 
ev.c:832:26: note: expanded from macro 'ecb_noreturn'
  #define ecb_noreturn   _Noreturn
                         ^
1 error generated.
make[3]: *** [ev.lo] Error 1
make[3]: Leaving directory `/g/g0/dahn/workspace/Flux/flux-core/src/common/libev'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/g/g0/dahn/workspace/Flux/flux-core/src/common'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/g/g0/dahn/workspace/Flux/flux-core/src'
make: *** [all-recursive] Error 1

For the time being, clang 3.5.0 doesn't fail on this, so I will use that version to test my Sanitizer support. But we should probably look at this issue more carefully. It may simply boil down to a small macro fix.

garlick commented Jul 6, 2016

This was pointed out on the libev list a while back.

We're running libev 4.19. Upstream is 4.22 now.

It does look like it might be fixed now. Our code:

#if ECB_GCC_VERSION(4,5)
  #define ecb_unreachable() __builtin_unreachable ()
#else
  /* this seems to work fine, but gcc always emits a warning for it :/ */
  ecb_inline void ecb_unreachable (void) ecb_noreturn;
  ecb_inline void ecb_unreachable (void) { }
#endif

Upstream code:

#if ECB_GCC_VERSION(4,5) || ECB_CLANG_BUILTIN(__builtin_unreachable)
  #define ecb_unreachable() __builtin_unreachable ()
#else
  /* this seems to work fine, but gcc always emits a warning for it :/ */
  ecb_inline ecb_noreturn void ecb_unreachable (void);
  ecb_inline ecb_noreturn void ecb_unreachable (void) { }
#endif

I'll open a separate issue to update libev.
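
For reference, a minimal sketch of the two points in the comparison above, assuming a C11 compiler: the `_Noreturn` specifier has to come before the declarator, and `__builtin_unreachable` can be used where GCC or clang provide it. The names `fatal`, `my_unreachable`, `sign_of`, and `checked_div` are made up for illustration and are not part of libev.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* C11 placement: the function specifier precedes the declarator.
 * The form rejected in the error log above was:
 *     void f (void) _Noreturn;
 */
static _Noreturn void fatal (const char *msg);

static _Noreturn void fatal (const char *msg)
{
    assert (msg != NULL);
    fprintf (stderr, "fatal: %s\n", msg);
    exit (1);
}

/* Prefer the builtin when the compiler has it, else fall back to fatal(). */
#if defined(__GNUC__) || defined(__clang__)
  #define my_unreachable() __builtin_unreachable ()
#else
  #define my_unreachable() fatal ("unreachable code reached")
#endif

/* Every path returns before the marker, so it is never executed. */
static int sign_of (int x)
{
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    if (x > 0)
        return 1;
    my_unreachable ();
}

/* fatal() never returns, so the division is only reached when b != 0. */
static int checked_div (int a, int b)
{
    if (b == 0)
        fatal ("division by zero");
    return a / b;
}
```

Calling `sign_of (-5)` yields -1 and `checked_div (6, 3)` yields 2; the unreachable marker only tells the compiler that control cannot fall off the end of `sign_of`.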

dongahn commented Jul 6, 2016

Thanks!

dongahn commented Jul 7, 2016

I am running make check with an AddressSanitizer-enabled build and found that it reported the following error and stopped the make check. I forget how to continue past an error with make check. Anyone?

I would like to submit a PR after validating that all of the tests can be run under various Sanitizer tools first. I think we can defer resolving any errors that are true positives, of course.

=================================================================
==36257==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffcbf0 at pc 0x000000440715 bp 0x7fffffffca50 sp 0x7fffffffc210
WRITE of size 56 at 0x7fffffffcbf0 thread T0
    #0 0x440714 in unpoison_tm /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:573
    #1 0x440714 in __interceptor_mktime.part.68 /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:684
    #2 0x4ba1da in parse_date_basic /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/approxidate.c:581:22
    #3 0x4ba76c in approxidate /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/approxidate.c:919:7
    #4 0x4b75d5 in approxidate_tm /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/cronodate.c:15:9
    #5 0x4b75d5 in cronodate_check_match /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/cronodate.c:26
    #6 0x4b75d5 in main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/cronodate.c:135
    #7 0x2aaaabeced1c in __libc_start_main /usr/src/debug/glibc-2.12-2-gc4ccff1/csu/libc-start.c:226
    #8 0x4b67bc in _start (/g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test_cronodate.t+0x4b67bc)

Address 0x7fffffffcbf0 is located in stack of thread T0 at offset 208 in frame
    #0 0x4b8a4f in parse_date_basic /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/approxidate.c:530

  This frame has 6 object(s):
    [32, 40) 'end.i147'
    [64, 72) 'end.i130'
    [96, 104) 'time.i'
    [128, 136) 'end.i'
    [160, 208) 'tm'
    [240, 244) 'dummy_offset' <== Memory access at offset 208 partially underflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:573 unpoison_tm
Shadow bytes around the buggy address:
  0x10007fff7920: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7930: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7960: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2
=>0x10007fff7970: 00 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00 00[f2]f2
  0x10007fff7980: f2 f2 04 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7990: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff79a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff79b0: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2
  0x10007fff79c0: 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  ASan internal:           fe
==36257==ABORTING
FAIL: test_cronodate.t
ok 1 - wallclock_get_zulu() works: 2016-07-06T23:57:49.013747Z

dongahn commented Jul 7, 2016

Hm, this could be a true positive? From time.h:

struct tm
{
  int tm_sec;           /* Seconds. [0-60] (1 leap second) */
  int tm_min;           /* Minutes. [0-59] */
  int tm_hour;          /* Hours.   [0-23] */
  int tm_mday;          /* Day.     [1-31] */
  int tm_mon;           /* Month.   [0-11] */
  int tm_year;          /* Year - 1900.  */
  int tm_wday;          /* Day of week. [0-6] */
  int tm_yday;          /* Days in year.[0-365] */
  int tm_isdst;         /* DST.     [-1/0/1]*/

#ifdef  __USE_BSD
  long int tm_gmtoff;       /* Seconds east of UTC.  */
  __const char *tm_zone;    /* Timezone abbreviation.  */
#else
  long int __tm_gmtoff;     /* Seconds east of UTC.  */
  __const char *__tm_zone;  /* Timezone abbreviation.  */
#endif
};

So, struct tm may have more fields than what's explained in the man page:

       Broken-down time is stored in the structure tm which is defined in <time.h> as follows:

           struct tm {
               int tm_sec;         /* seconds */
               int tm_min;         /* minutes */
               int tm_hour;        /* hours */
               int tm_mday;        /* day of the month */
               int tm_mon;         /* month */
               int tm_year;        /* year */
               int tm_wday;        /* day of the week */
               int tm_yday;        /* day in the year */
               int tm_isdst;       /* daylight saving time */
           };

Given the augmented tm structure here, can mktime actually overflow it?

I will add these extra two fields into the augmented structure and see what AddressSanitizer says.
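
To make the suspected mismatch concrete, here is a rough sketch that avoids the cast altogether: it copies the fields into a real `struct tm`, calls `mktime` on that, and copies the normalized values back. `struct atm_like`, `cast_would_overflow`, and `atm_like_mktime` are hypothetical names, not code from approxidate.c. On x86_64 glibc with the header quoted above, the nine-int-plus-long layout works out to 48 bytes and `struct tm` to 56, which would be consistent with the "WRITE of size 56" into the 48-byte 'tm' object in the report.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Stand-in for the nine-int-plus-usec struct under discussion. */
struct atm_like {
    int tm_sec;
    int tm_min;
    int tm_hour;
    int tm_mday;
    int tm_mon;
    int tm_year;
    int tm_wday;
    int tm_yday;
    int tm_isdst;
    long tm_usec;
};

/* Nonzero when casting atm_like* to struct tm* would let libc
 * touch bytes past the end of the smaller object. */
static int cast_would_overflow (void)
{
    return sizeof (struct atm_like) < sizeof (struct tm);
}

/* Call mktime() on a genuine struct tm, then copy the normalized
 * fields back; tm_usec is left untouched. */
static time_t atm_like_mktime (struct atm_like *a)
{
    struct tm tm;
    time_t t;

    assert (a != NULL);
    memset (&tm, 0, sizeof (tm));
    tm.tm_sec = a->tm_sec;
    tm.tm_min = a->tm_min;
    tm.tm_hour = a->tm_hour;
    tm.tm_mday = a->tm_mday;
    tm.tm_mon = a->tm_mon;
    tm.tm_year = a->tm_year;
    tm.tm_isdst = a->tm_isdst;

    t = mktime (&tm);

    a->tm_sec = tm.tm_sec;
    a->tm_min = tm.tm_min;
    a->tm_hour = tm.tm_hour;
    a->tm_mday = tm.tm_mday;
    a->tm_mon = tm.tm_mon;
    a->tm_year = tm.tm_year;
    a->tm_wday = tm.tm_wday;
    a->tm_yday = tm.tm_yday;
    a->tm_isdst = tm.tm_isdst;
    return t;
}
```

The copy costs a few assignments per call, but it keeps libc from ever seeing an object smaller than its own `struct tm`, whatever private fields the platform adds.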

dongahn commented Jul 7, 2016

It turned out that the same buffer overflow issue reported by AddressSanitizer failed two test cases:

test_approxidate.t and test_cronodate.t

Interestingly enough, if I pad the structure to:

struct atm {
    int tm_sec;
    int tm_min;
    int tm_hour;
    int tm_mday;
    int tm_mon;
    int tm_year;
    int tm_wday;
    int tm_yday;
    int tm_isdst;
    long int __tm_gmtoff;
    __const char *__tm_zone;
    long tm_usec;
};

AddressSanitizer no longer reports the overflow issue for these two cases. In fact, test_cronodate.t always succeeded. However, test_approxidate.t fails with a symptom similar to the one reported in issue #715.

not ok 34 - line 96: 1388620800 = 1388534400

#   Failed test 'line 96: 1388620800 = 1388534400
'
#   at test/approxidate.c line 96.
not ok 35 - line 99: 1388620800 = 1388534400

#   Failed test 'line 99: 1388620800 = 1388534400
'
#   at test/approxidate.c line 99.
not ok 36 - line 112: 1481328000 = 1481241600

#   Failed test 'line 112: 1481328000 = 1481241600
'
#   at test/approxidate.c line 112.
ok 37 - usec calculation for anonymous time is correct
1..37

dongahn commented Jul 7, 2016

test_nodeset.t also failed with leaks:

==46454==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x497a79 in __interceptor_malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:40
    #1 0x4b88f6 in nodeset_create_size /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/nodeset.c:104:20
    #2 0x4b9023 in nodeset_create /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/nodeset.c:125:12
    #3 0x4b9023 in nodeset_create_string /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/nodeset.c:171
    #4 0x4b76a5 in main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/nodeset.c:207:9
    #5 0x2aaaabeced1c in __libc_start_main /usr/src/debug/glibc-2.12-2-gc4ccff1/csu/libc-start.c:226

Indirect leak of 136 byte(s) in 1 object(s) allocated from:
    #0 0x497a79 in __interceptor_malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:40
    #1 0x4bd32a in vebnew /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/veb.c:231:8
    #2 0x4b8969 in nodeset_create_size /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/nodeset.c:109:12
    #3 0x4b9023 in nodeset_create /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/nodeset.c:125:12
    #4 0x4b9023 in nodeset_create_string /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/nodeset.c:171
    #5 0x4b76a5 in main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/nodeset.c:207:9
    #6 0x2aaaabeced1c in __libc_start_main /usr/src/debug/glibc-2.12-2-gc4ccff1/csu/libc-start.c:226

SUMMARY: AddressSanitizer: 184 byte(s) leaked in 2 allocation(s).
FAIL: test_nodeset.t

These seem to be minor leaks here, and AddressSanitizer doesn't complain once nodeset_destroy (n) is added.

When fixes are minor enough, I will make a patch and commit them as part of the upcoming Sanitizer PR.
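
As an illustration of why one missing destroy call shows up as both a direct and an indirect leak, here is a small sketch with a made-up container that owns a second allocation, loosely modeled on a nodeset owning its bitmap. `struct intset` and its functions are hypothetical and are not the flux-core nodeset API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container: the struct is one allocation, the member
 * array is a second one that only the struct points to. */
struct intset {
    size_t size;
    unsigned char *bits;    /* owned; reported as an indirect leak if the set leaks */
};

static struct intset *intset_create (size_t size)
{
    struct intset *s = malloc (sizeof (*s));
    if (!s)
        return NULL;
    s->size = size;
    s->bits = calloc (size ? size : 1, 1);
    if (!s->bits) {
        free (s);
        return NULL;
    }
    return s;
}

/* Free the owned array first, then the container itself. */
static void intset_destroy (struct intset *s)
{
    if (s) {
        free (s->bits);
        free (s);
    }
}

static int intset_add (struct intset *s, size_t i)
{
    assert (s != NULL);
    if (i >= s->size)
        return -1;
    s->bits[i] = 1;
    return 0;
}

static int intset_test (const struct intset *s, size_t i)
{
    assert (s != NULL);
    return i < s->size && s->bits[i];
}
```

A test that calls `intset_create` without a matching `intset_destroy` before exit would presumably be reported in the same two-part way as the nodeset case: the struct as the direct leak and the array as the indirect one.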

dongahn commented Jul 7, 2016

Finally,

ok 1 - coproc_create works
==47009==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
ok 2 - coproc args are valid
ok 3 - coproc_start works
ok 4 - coproc returned
ok 5 - rc is set to coproc return value
ok 6 - coproc args are valid
ok 7 - coproc_start works
ok 8 - coproc did not return (yielded)
ok 9 - coproc_resume works
ok 10 - coproc did not return (yielded)
ok 11 - coproc_resume works
ok 12 - coproc returned
ok 13 - rc is set to coproc return value
ok 14 - coproc_resume on returned coproc fails with EINVAL
ASAN:SIGSEGV
=================================================================
==47009==ERROR: AddressSanitizer: SEGV on unknown address 0x2aaab200b000 (pc 0x000000421ca8 bp 0x2aaab200b000 sp 0x7ffffffe9600 T0)
    #0 0x421ca7 in __asan::AsanChunk::UsedSize(bool) /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_allocator2.cc:157
    #1 0x421ca7 in QuarantineChunk /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_allocator2.cc:447
    #2 0x421ca7 in Deallocate /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_allocator2.cc:476
    #3 0x421ca7 in __asan::asan_free(void*, __sanitizer::StackTrace*, __asan::AllocType) /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_allocator2.cc:598
    #4 0x4977ce in free /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:31
    #5 0x4b7ee9 in coproc_destroy /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/coproc.c:74:13
    #6 0x4b6f99 in main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:138:5
    #7 0x2aaaabeced1c in __libc_start_main /usr/src/debug/glibc-2.12-2-gc4ccff1/csu/libc-start.c:226
    #8 0x4b662c in _start (/g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test_coproc.t+0x4b662c)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/asan/asan_allocator2.cc:157 __asan::AsanChunk::UsedSize(bool)
==47009==ABORTING
FAIL: test_coproc.t

In this case, I have to believe that this is a false positive caused by ASan's incomplete makecontext/swapcontext support, which coproc relies on. I will try to find out whether I can somehow blacklist such cases...
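
One possible direction, sketched under assumptions: detect an ASan build at compile time, then either exclude a single function from instrumentation or skip the affected tests. `BUILT_WITH_ASAN`, `NO_ASAN`, `scribble`, and `tests_to_run` are hypothetical names. The attribute only affects compiler-inserted checks in the annotated function, not runtime interceptors such as free(), so it is unclear whether it would help with the crash above; skipping the test under ASan may be the more realistic option.

```c
#include <assert.h>
#include <stddef.h>

/* clang exposes __has_feature(address_sanitizer);
 * GCC defines __SANITIZE_ADDRESS__ under -fsanitize=address. */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define BUILT_WITH_ASAN 1
#  endif
#endif
#if !defined(BUILT_WITH_ASAN) && defined(__SANITIZE_ADDRESS__)
#  define BUILT_WITH_ASAN 1
#endif
#ifndef BUILT_WITH_ASAN
#  define BUILT_WITH_ASAN 0
#endif

/* Ask the compiler not to instrument one function. */
#if BUILT_WITH_ASAN
#  define NO_ASAN __attribute__((no_sanitize_address))
#else
#  define NO_ASAN
#endif

/* Example of a function excluded from instrumentation. */
NO_ASAN static size_t scribble (unsigned char *buf, size_t len, unsigned char value)
{
    size_t i;

    assert (buf != NULL);
    for (i = 0; i < len; i++)
        buf[i] = value;
    return len;
}

/* Number of tests to plan: leave out the ones known to trip ASan. */
static int tests_to_run (int total, int asan_sensitive)
{
    return BUILT_WITH_ASAN ? total - asan_sensitive : total;
}
```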

grondo commented Jul 7, 2016

Nice work, thanks for tackling this @dongahn !

dongahn commented Jul 7, 2016

@grondo: Thanks. I am looking at the code for test_cronodate.t but I'm not sure exactly what's going on with test_cronodate.t after padding the structure. I can use your help if you are familiar with this code :-)

grondo commented Jul 7, 2016

I'm not completely familiar with AddressSanitizer output, but is it complaining because approxidate uses a struct atm that is cast to struct tm:

(Is this the struct you are padding? Sorry if I didn't completely follow...)

/**
 * Maintains compatibility with the default struct tm,
 * but adds a field for usec.
 */
struct atm {
    int tm_sec;
    int tm_min;
    int tm_hour;
    int tm_mday;
    int tm_mon;
    int tm_year;
    int tm_wday;
    int tm_yday;
    int tm_isdst;
    long tm_usec;
};

grondo commented Jul 7, 2016

Arggh! Now I see your comments above, sorry! Pondering...

dongahn commented Jul 7, 2016

Isn't it nice when the internal structure definition and man page definition don't agree? :-(

grondo commented Jul 7, 2016

Hm, I don't see the struct atm definition in upstream git/date.c from which approxidate.c is derived. I wonder if the approxidate struct atm was a kind of bad idea.

Currently, approxidate is only used for the cronodate tests. I wonder if it would be better to remove it for now and adapt test_cronodate.t rather than try to fix this issue.

dongahn commented Jul 7, 2016

I wouldn't be surprised if upstream removed struct atm because of memory issues like the one the sanitizer spotted...

> Currently, approxidate is only used for the cronodate tests. I wonder if it would be better to remove it for now and adapt test_cronodate.t rather than try to fix this issue.

I don't object to this idea.

grondo commented Jul 7, 2016

> I don't object to this idea.

I could work on this tomorrow. Sorry that ended up going in when it wasn't exactly needed. 👎

dongahn commented Jul 7, 2016

Thank you @grondo!

dongahn commented Jul 7, 2016

FYI -- I ran make -i check to at least see what errors are reported across all of our regression tests. It turned out that ASan reported many leaks and as a result failed many tests. Here are the leak reports: asan.zip. I have to think that the detected errors are a mixed bag of false and true positives. As such, it will take some time to go through each report and fix or blacklist it until all regression tests pass. I think that should be another PR.

OK. Moving on to other sanitizers.

dongahn commented Jul 7, 2016

I think ThreadSanitizer found one minor data race in a test case:

1..29
ok 1 - coproc_create works
ok 2 - coproc args are valid
ok 3 - coproc_start works
ok 4 - coproc returned
ok 5 - rc is set to coproc return value
ok 6 - coproc args are valid
ok 7 - coproc_start works
ok 8 - coproc did not return (yielded)
ok 9 - coproc_resume works
ok 10 - coproc did not return (yielded)
ok 11 - coproc_resume works
ok 12 - coproc returned
ok 13 - rc is set to coproc return value
ok 14 - coproc_resume on returned coproc fails with EINVAL
ok 15 - pthread_create OK
==================
WARNING: ThreadSanitizer: data race (pid=117579)
  Write of size 4 at 0x7ffff8f20770 by thread T1:
    #0 vok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:65:5 (test_coproc.t+0x00000009ad4a)
    #1 ok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:93:5 (test_coproc.t+0x00000009af48)
    #2 threadmain /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:85:5 (test_coproc.t+0x000000099647)

  Previous write of size 4 at 0x7ffff8f20770 by main thread:
    #0 vok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:65:5 (test_coproc.t+0x00000009ad4a)
    #1 ok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:93:5 (test_coproc.t+0x00000009af48)
    #2 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:141:5 (test_coproc.t+0x00000009999b)
    #3 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:117:5 (test_coproc.t+0x0000000997ea)
    #4 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:108:5 (test_coproc.t+0x000000099743)

  Location is global 'current_test' of size 4 at 0x7ffff8f20770 (test_coproc.t+0x000000fe4770)

  Thread T1 (tid=117581, running) created by main thread at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (test_coproc.t+0x00000004b9d3)
    #1 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:141:5 (test_coproc.t+0x00000009997a)
    #2 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:117:5 (test_coproc.t+0x0000000997ea)
    #3 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:108:5 (test_coproc.t+0x000000099743)

SUMMARY: ThreadSanitizer: data race /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:65 vok_at_loc
==================
ok 16 - coproc_create works in a pthread
ok 17 - coproc_start works in a pthread
ok 18 - coproc_start did not return (yielded)
ok 19 - pthread_join OK

What this is saying is that the increment of the global current_test (tap.c:65) can be executed concurrently by two different threads (the main thread and T1). An unsynchronized ++ often appears to work, but it is not guaranteed to be atomic, and what we have found is that such a seemingly benign race can become malignant when combined with compiler optimization. BTW, is tap thread safe?
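
For illustration only, one way such a counter could be made race-free is with C11 atomics. This is a sketch, not a patch for libtap: `test_counter`, `next_test_number`, `worker`, and `run_workers` are hypothetical names, and the code assumes a compiler with `<stdatomic.h>` and POSIX threads (build with -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a global test counter such as current_test. */
static atomic_int test_counter;

/* atomic_fetch_add returns the previous value, so +1 is this test's number. */
static int next_test_number (void)
{
    return atomic_fetch_add (&test_counter, 1) + 1;
}

struct worker_arg {
    int iterations;
};

static void *worker (void *arg)
{
    const struct worker_arg *w = arg;
    int i;

    assert (w != NULL);
    for (i = 0; i < w->iterations; i++)
        (void) next_test_number ();
    return NULL;
}

/* Start up to 16 workers, wait for them, and return the final count. */
static int run_workers (int nthreads, int iterations)
{
    pthread_t tids[16];
    struct worker_arg arg = { iterations };
    int i, started = 0;

    if (nthreads > 16)
        nthreads = 16;
    atomic_store (&test_counter, 0);
    for (i = 0; i < nthreads; i++) {
        if (pthread_create (&tids[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join (tids[i], NULL);
    return atomic_load (&test_counter);
}
```

With a plain `int` and `++`, the final count from several threads can come up short because increments get lost; with the atomic version no increment is lost. A mutex around the whole reporting function would be the alternative if the output lines also need to stay in order.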

dongahn commented Jul 7, 2016

It also complains about signal-unsafe calls made inside a signal handler in the coproc tests:

ok 26 - coproc_start works
ok 27 - coproc successfully scribbled on stack
==================
WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=117579)
    #0 malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:461 (test_coproc.t+0x000000041d8f)
    #1 vstrdupf /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:29:11 (test_coproc.t+0x00000009ab34)
    #2 vok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:61:18 (test_coproc.t+0x00000009ad20)
    #3 ok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:93:5 (test_coproc.t+0x00000009af48)
    #4 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:201:5 (test_coproc.t+0x000000099e67)
    #5 CallUserSignalHandler /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1700 (test_coproc.t+0x0000000245ab)
    #6 rtl_generic_sighandler /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1781 (test_coproc.t+0x0000000245ab)
    #7 rtl_sighandler(int) /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1803 (test_coproc.t+0x0000000245ab)
    #8 trampoline /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/coproc.c:85:13 (test_coproc.t+0x00000009a809)
    #9 <null> <null>:0 (libc.so.6+0x0000000438ef)
    #10 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:201:5 (test_coproc.t+0x000000099e46)

<CUT>

    #254 <null> <null>:0 (libc.so.6+0x0000000438ef)
    #255 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:150:13 (test_coproc.t+0x000000099a2a)
    #256 trampoline /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/coproc.c:85:13 (test_coproc.t+0x00000009a809)
    #257 <null> <null>:0 (libc.so.6+0x0000000438ef)

SUMMARY: ThreadSanitizer: signal-unsafe call inside of a signal /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:29 vstrdupf
==================


ok 28 - coproc_start works
==================
WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=117579)
    #0 free /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:508 (test_coproc.t+0x000000047cd6)
    #1 vok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:85:5 (test_coproc.t+0x00000009ae94)
    #2 ok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:93:5 (test_coproc.t+0x00000009af48)
    #3 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:201:5 (test_coproc.t+0x000000099e67)
    #4 CallUserSignalHandler /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1700 (test_coproc.t+0x0000000245ab)
    #5 rtl_generic_sighandler /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1781 (test_coproc.t+0x0000000245ab)
    #6 rtl_sighandler(int) /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1803 (test_coproc.t+0x0000000245ab)
    #7 trampoline /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/coproc.c:85:13 (test_coproc.t+0x00000009a809)


<CUT>

    #255 trampoline /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/coproc.c:85:13 (test_coproc.t+0x00000009a809)
    #256 <null> <null>:0 (libc.so.6+0x0000000438ef)
    #257 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:150:13 (test_coproc.t+0x000000099a2a)

SUMMARY: ThreadSanitizer: signal-unsafe call inside of a signal /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:85 vok_at_loc
==================

==================
WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=117579)
    #0 malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:461 (test_coproc.t+0x000000041d8f)
    #1 vstrdupf /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:29:11 (test_coproc.t+0x00000009ab34)
    #2 vok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:61:18 (test_coproc.t+0x00000009ad20)

<CUT>

    #254 <null> <null>:0 (libc.so.6+0x0000000438ef)
    #255 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:150:13 (test_coproc.t+0x000000099a2a)
    #256 trampoline /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/coproc.c:85:13 (test_coproc.t+0x00000009a809)
    #257 <null> <null>:0 (libc.so.6+0x0000000438ef)

SUMMARY: ThreadSanitizer: signal-unsafe call inside of a signal /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:29 vstrdupf

ok 29 - coproc scribbled on guard page and segfaulted
==================
WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=117579)
    #0 free /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:508 (test_coproc.t+0x000000047cd6)
    #1 vok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:85:5 (test_coproc.t+0x00000009ae94)
    #2 ok_at_loc /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:93:5 (test_coproc.t+0x00000009af48)
    #3 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:203:5 (test_coproc.t+0x000000099e98)
    #4 CallUserSignalHandler /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1700 (test_coproc.t+0x0000000245ab)
    #5 rtl_generic_sighandler /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_inte

<CUT>

    #257 main /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/test/coproc.c:150:13 (test_coproc.t+0x000000099a2a)

SUMMARY: ThreadSanitizer: signal-unsafe call inside of a signal /g/g0/dahn/workspace/Flux/flux-core/src/common/libtap/tap.c:85 vok_at_loc
==================
ThreadSanitizer: reported 5 warnings
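
The usual way to avoid this class of warning is to keep the handler down to async-signal-safe operations and do the reporting, which allocates, after the handler has returned. A minimal sketch, assuming POSIX `sigaction`; `got_signal`, `on_signal`, `install_handler`, and `signal_was_caught` are hypothetical names, and this is not how the coproc test is actually structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The only state the handler touches. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal (int signo)
{
    static const char msg[] = "signal caught\n";

    (void) signo;
    got_signal = 1;
    /* write(2) is async-signal-safe; malloc, free, and printf are not. */
    if (write (STDERR_FILENO, msg, sizeof (msg) - 1) < 0) {
        /* nothing safe to do about a failed write here */
    }
}

static int install_handler (int signo)
{
    struct sigaction sa;

    assert (signo > 0);
    memset (&sa, 0, sizeof (sa));
    sa.sa_handler = on_signal;
    sigemptyset (&sa.sa_mask);
    return sigaction (signo, &sa, NULL);
}

/* Checked from normal context, where TAP-style helpers that allocate are fine. */
static int signal_was_caught (void)
{
    return got_signal != 0;
}
```

A test would install the handler, trigger the signal, and only then call its `ok ()` style helper with `signal_was_caught ()` as the condition, so no allocation happens inside the handler.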

grondo commented Jul 7, 2016

Hm, just as an experiment I wonder if this would quiet AddressSanitizer:

diff --git a/src/common/libutil/approxidate.c b/src/common/libutil/approxidate.c
index e2b2bd6..9d0e0d2 100644
--- a/src/common/libutil/approxidate.c
+++ b/src/common/libutil/approxidate.c
@@ -20,15 +20,7 @@
  * but adds a field for usec.
  */
 struct atm {
-       int tm_sec;
-       int tm_min;
-       int tm_hour;
-       int tm_mday;
-       int tm_mon;
-       int tm_year;
-       int tm_wday;
-       int tm_yday;
-       int tm_isdst;
+       struct tm;
        long tm_usec;
 };

With the version of GCC I'm using, this requires CFLAGS=-fms-extensions. Anonymous structs are part of the ISO C11 standard, as stated in the GCC documentation, but as far as I can tell C11 only covers untagged ones, so a tagged member such as struct tm; may still need the extension.

It would be interesting to know if this resolves the issue for ASan.
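
A variant that needs no compiler extension would be to embed `struct tm` as a named first member, at the cost of touching every field access. This is only a sketch: `struct atm_embedded` and its helpers are hypothetical, not a proposed patch for approxidate.c.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* A real struct tm as a named member, so its size always matches
 * whatever the platform's libc expects. */
struct atm_embedded {
    struct tm tm;
    long tm_usec;
};

/* Fill in a calendar date at noon local time; year and mon are human-style. */
static void atm_embedded_init (struct atm_embedded *a, int year, int mon, int mday)
{
    assert (a != NULL);
    memset (a, 0, sizeof (*a));
    a->tm.tm_year = year - 1900;
    a->tm.tm_mon = mon - 1;
    a->tm.tm_mday = mday;
    a->tm.tm_hour = 12;
    a->tm.tm_isdst = -1;    /* let libc decide about DST */
}

static time_t atm_embedded_mktime (struct atm_embedded *a)
{
    assert (a != NULL);
    /* &a->tm is a genuine struct tm, so no cast is involved. */
    return mktime (&a->tm);
}
```

Callers would write `a.tm.tm_year` instead of `a.tm_year`, which is the price for not depending on -fms-extensions.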

dongahn commented Jul 7, 2016

I will try this. Like I said, "padding" the atm structure also quiets ASan, but it failed three approxidate tests deterministically. We will see what your suggestion does.

dongahn commented Jul 7, 2016

From ThreadSanitizer, I see many data race reports in czmq routines called by service callback routines. My initial guess is that these are false positives, in particular if czmq/zmq uses its own mechanism to synchronize memory accesses across threads...

This is another thing we need to follow up on as a separate issue.

One representative report looks like:

==================
WARNING: ThreadSanitizer: data race (pid=149830)
  Read of size 1 at 0x7d1c0000a400 by main thread:
    #0 strncpy /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:665 (lt-flux-broker+0x000000048efa)
    #1 stdlog_decode /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/stdlog.c:106:5 (lt-flux-broker+0x0000000bb05e)
    #2 logbuf_append /g/g0/dahn/workspace/Flux/flux-core/src/broker/log.c:493:9 (lt-flux-broker+0x0000000b0db0)
    #3 cmb_log_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2172:9 (lt-flux-broker+0x0000000a4230)
    #4 svc_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/service.c:141:10 (lt-flux-broker+0x0000000ad62a)
    #5 broker_request_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2837:14 (lt-flux-broker+0x0000000a29ee)
    #6 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2686:18 (lt-flux-broker+0x0000000a6512)
    #7 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:358:9 (lt-flux-broker+0x0000000a8e15)
    #8 zmq_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:395:9 (libflux-core.so.0+0x00000001152b)
    #9 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/ev_zmq.c:77:9 (libflux-core.so.0+0x00000003e40a)
    #10 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #11 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #12 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #13 main /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:672:9 (lt-flux-broker+0x0000000a1065)

  Previous write of size 1 at 0x7d1c0000a400 by thread T4:
    #0 memcpy /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:608 (lt-flux-broker+0x00000003f846)
    #1 <null> <null>:0 (libczmq.so.3+0x000000024ad5)
    #2 op_send /g/g0/dahn/workspace/Flux/flux-core/src/connectors/shmem/shmem.c:105:25 (shmem.so+0x000000008d11)
    #3 flux_send /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/handle.c:445:9 (libflux-core.so.0+0x00000000f630)
    #4 client_read_cb /g/g0/dahn/workspace/Flux/flux-core/src/modules/connector-local/local.c:559:21 (connector-local.so+0x00000000a5c5)
    #5 fd_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:338:9 (libflux-core.so.0+0x0000000111c5)
    #6 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #7 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #8 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #9 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/connector-local/local.c:773:9 (connector-local.so+0x00000000952a)
    #10 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Location is heap block of size 100 at 0x7d1c0000a3a0 allocated by thread T4:
    #0 malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:461 (lt-flux-broker+0x0000000475ef)
    #1 <null> <null>:0 (libzmq.so.5+0x000000022f58)
    #2 op_send /g/g0/dahn/workspace/Flux/flux-core/src/connectors/shmem/shmem.c:105:25 (shmem.so+0x000000008d11)
    #3 flux_send /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/handle.c:445:9 (libflux-core.so.0+0x00000000f630)
    #4 client_read_cb /g/g0/dahn/workspace/Flux/flux-core/src/modules/connector-local/local.c:559:21 (connector-local.so+0x00000000a5c5)
    #5 fd_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:338:9 (libflux-core.so.0+0x0000000111c5)
    #6 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #7 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #8 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #9 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/connector-local/local.c:773:9 (connector-local.so+0x00000000952a)
    #10 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T4 (tid=150139, running) created by main thread at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 module_start /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:368:19 (lt-flux-broker+0x0000000a9666)
    #2 module_start_all /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:655 (lt-flux-broker+0x0000000a9666)
    #3 load_modules /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:1427:5 (lt-flux-broker+0x0000000a0ec6)
    #4 main /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:645 (lt-flux-broker+0x0000000a0ec6)

SUMMARY: ThreadSanitizer: data race /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/stdlog.c:106 stdlog_decode

Essentially, the report says that strncpy, called by the main thread, and memcpy inside czmq, called by thread T4, access the same memory location without any synchronization that TSan knows about (e.g., a pthread mutex).

If these are indeed false positives, we need to annotate the czmq/zmq library code with TSan's annotation API.
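
One way such an annotation could look, as a minimal sketch: the TSan runtime exports `__tsan_release` and `__tsan_acquire`, which tell it that everything done before the release on an address happens before whatever follows the matching acquire. The `HB_RELEASE`/`HB_ACQUIRE` macros, `struct handoff_buf`, and the two functions below are hypothetical names for illustration only, not czmq/zmq or flux-core code. In a build without TSan the macros compile to no-ops.

```c
#include <assert.h>
#include <string.h>

/* Detect a TSan build: clang offers __has_feature, gcc defines
 * __SANITIZE_THREAD__. */
#if defined(__has_feature)
#  if __has_feature(thread_sanitizer)
#    define BUILT_WITH_TSAN 1
#  endif
#endif
#if defined(__SANITIZE_THREAD__) && !defined(BUILT_WITH_TSAN)
#  define BUILT_WITH_TSAN 1
#endif

#ifdef BUILT_WITH_TSAN
/* Entry points provided by the TSan runtime. */
void __tsan_acquire (void *addr);
void __tsan_release (void *addr);
#  define HB_RELEASE(addr) __tsan_release ((void *)(addr))
#  define HB_ACQUIRE(addr) __tsan_acquire ((void *)(addr))
#else
#  define HB_RELEASE(addr) ((void)(addr))
#  define HB_ACQUIRE(addr) ((void)(addr))
#endif

/* Hypothetical buffer handed from one thread to another. */
struct handoff_buf {
    char data[100];
    size_t len;
};

/* Sender side: fill the buffer, then mark the release point just before
 * the buffer would be passed to the uninstrumented transport. */
void handoff_publish (struct handoff_buf *b, const char *msg)
{
    size_t n = strlen (msg);
    if (n >= sizeof (b->data))
        n = sizeof (b->data) - 1;
    memcpy (b->data, msg, n);
    b->data[n] = '\0';
    b->len = n;
    HB_RELEASE (b);
}

/* Receiver side: mark the acquire point before touching the buffer.
 * Returns the number of bytes copied, not counting the terminator. */
size_t handoff_consume (struct handoff_buf *b, char *out, size_t outsz)
{
    HB_ACQUIRE (b);
    if (outsz == 0)
        return 0;
    size_t n = b->len < outsz - 1 ? b->len : outsz - 1;
    memcpy (out, b->data, n);
    out[n] = '\0';
    return n;
}
```

In the real case the release would sit just before the message is handed to the uninstrumented library and the acquire just after it is received on the other side. When the library cannot be modified, a TSan suppressions file is the alternative.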

@dongahn

Contributor

dongahn commented Jul 7, 2016

This heap-use-after-free report on the RPC reference-count code seems a bit suspicious to me:

WARNING: ThreadSanitizer: heap-use-after-free (pid=152005)
  Write of size 4 at 0x7d1c000037f8 by thread T5:
    #0 flux_rpc_usecount_decr /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/rpc.c:66:9 (libflux-core.so.0+0x00000001c56e)
    #1 flux_rpc_destroy /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/rpc.c:90:5 (libflux-core.so.0+0x00000001c520)
    #2 content_load_request_send /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:350:9 (kvs.so+0x00000000d1a7)
    #3 load /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:369 (kvs.so+0x00000000d1a7)
    #4 heartbeat_cb /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:779:11 (kvs.so+0x00000000a565)
    #5 dispatch_message /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/dispatch.c:472:17 (libflux-core.so.0+0x000000014f7d)
    #6 handle_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/dispatch.c:567 (libflux-core.so.0+0x000000014f7d)
    #7 handle_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:280:9 (libflux-core.so.0+0x000000010e0b)
    #8 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/ev_flux.c:72:9 (libflux-core.so.0+0x00000001fae3)
    #9 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #10 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #11 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #12 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:1766:9 (kvs.so+0x000000009851)
    #13 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Previous write of size 8 at 0x7d1c000037f8 by thread T5:
    #0 free /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:508 (lt-flux-broker+0x00000004d536)
    #1 flux_rpc_usecount_decr /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/rpc.c:83:9 (libflux-core.so.0+0x00000001c672)
    #2 flux_rpc_destroy /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/rpc.c:90:5 (libflux-core.so.0+0x00000001c520)
    #3 content_load_completion /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:333:5 (kvs.so+0x00000000d394)
    #4 content_load_request_send /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:349:9 (kvs.so+0x00000000d19f)
    #5 load /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:369 (kvs.so+0x00000000d19f)
    #6 heartbeat_cb /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:779:11 (kvs.so+0x00000000a565)
    #7 dispatch_message /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/dispatch.c:472:17 (libflux-core.so.0+0x000000014f7d)
    #8 handle_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/dispatch.c:567 (libflux-core.so.0+0x000000014f7d)
    #9 handle_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:280:9 (libflux-core.so.0+0x000000010e0b)
    #10 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/ev_flux.c:72:9 (libflux-core.so.0+0x00000001fae3)
    #11 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #12 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #13 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #14 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/kvs/kvs.c:1766:9 (kvs.so+0x000000009851)
    #15 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T5 (tid=152264, running) created by main thread at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 module_start /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:368:19 (lt-flux-broker+0x0000000a7c2c)
    #2 cmb_insmod_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2041:9 (lt-flux-broker+0x0000000a3ba0)
    #3 svc_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/service.c:141:10 (lt-flux-broker+0x0000000ad62a)
    #4 broker_request_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2851:14 (lt-flux-broker+0x0000000a29d1)
    #5 parent_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2653:18 (lt-flux-broker+0x0000000a19ba)
    #6 parent_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/overlay.c:545:9 (lt-flux-broker+0x0000000ac85b)
    #7 zmq_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:395:9 (libflux-core.so.0+0x00000001152b)
    #8 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/ev_zmq.c:77:9 (libflux-core.so.0+0x00000003e40a)
    #9 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #10 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #11 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #12 main /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:672:9 (lt-flux-broker+0x0000000a1065)

SUMMARY: ThreadSanitizer: heap-use-after-free /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/rpc.c:66 flux_rpc_usecount_decr
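
Reading the two stacks above: both are on thread T5. The `free` is reached through `content_load_completion`, called from `kvs.c:349`, and the later write comes from `flux_rpc_destroy`, called at `kvs.c:350`. One possible reading (not confirmed against the source here) is that a completion ran synchronously and dropped the last reference before the caller's own destroy. A minimal sketch of a reference-counting pattern that tolerates a synchronous callback follows; `struct obj` and every function name in it are hypothetical and are not flux-core's RPC code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct obj {
    int usecount;
    int *destroyed;   /* caller-owned flag, set to 1 when memory is freed */
};

struct obj *obj_create (int *destroyed_flag)
{
    struct obj *o = malloc (sizeof (*o));
    if (!o)
        return NULL;
    o->usecount = 1;              /* reference owned by the creator */
    o->destroyed = destroyed_flag;
    *destroyed_flag = 0;
    return o;
}

void obj_incref (struct obj *o)
{
    o->usecount++;
}

/* Drop one reference; free the object when the count reaches zero. */
void obj_decref (struct obj *o)
{
    if (o && --o->usecount == 0) {
        *o->destroyed = 1;
        free (o);
    }
}

typedef void (*completion_f) (struct obj *o);

/* A completion that drops a reference, as a continuation might. */
void completion_drops_ref (struct obj *o)
{
    obj_decref (o);
}

/* The caller takes an extra reference on behalf of the callback, so the
 * object is still alive when the caller drops its own reference, even
 * if the callback ran synchronously. */
void send_and_release (struct obj *o, completion_f cb)
{
    obj_incref (o);   /* reference consumed by the callback */
    cb (o);
    obj_decref (o);   /* caller's reference; the last one in this sketch */
}
```

Without the extra `obj_incref`, the callback would free the object and the caller's `obj_decref` would then write to freed memory, which is the shape of access TSan reports here.
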
@dongahn

Contributor

dongahn commented Jul 7, 2016

These reports complain about the Sophia code:

12-content-sqlite.t
# ./t0013-content-sophia.t: flux session size will be 17
ok 1 - load content-sophia module on rank 0
==================
WARNING: ThreadSanitizer: data race (pid=175572)
  Atomic write of size 1 at 0x7d780002e998 by thread T6 (mutexes: write M92):
    #0 __tsan_atomic8_exchange /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:576 (lt-flux-broker+0x00000006d315)
    #1 sx_vlsn /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/transaction/sx.c:98:10 (content-sophia.so+0x000000078f72)
    #2 se_schedule /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:618:19 (content-sophia.so+0x0000000ad0a6)
    #3 se_scheduler /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:809 (content-sophia.so+0x0000000ad0a6)
    #4 se_worker /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:370:8 (content-sophia.so+0x0000000ae613)

  Previous read of size 8 at 0x7d780002e998 by thread T5 (mutexes: write M89):
    #0 memcpy /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:608 (lt-flux-broker+0x00000003f846)
    #1 se_metart /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_meta.c:768:2 (content-sophia.so+0x0000000a2d7a)
    #2 se_metaquery /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_meta.c:800:2 (content-sophia.so+0x0000000a8294)
    #3 se_metaget_object /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_meta.c:849 (content-sophia.so+0x0000000a8294)
    #4 sp_getobject /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:206:12 (content-sophia.so+0x0000000b04d3)
    #5 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:131:32 (content-sophia.so+0x000000009723)
    #6 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x000000009723)
    #7 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  As if synchronized via sleep:
    #0 nanosleep /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:258 (lt-flux-broker+0x00000003f995)
    #1 ss_sleep /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_time.c:17:2 (content-sophia.so+0x0000000ae57f)
    #2 se_worker /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:374 (content-sophia.so+0x0000000ae57f)

  Location is heap block of size 3040 at 0x7d780002e800 allocated by thread T5:
    #0 malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:461 (lt-flux-broker+0x0000000475ef)
    #1 se_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:220:10 (content-sophia.so+0x00000009cde3)
    #2 sp_env /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:49:9 (content-sophia.so+0x0000000af713)
    #3 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:125:26 (content-sophia.so+0x00000000965f)
    #4 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x00000000965f)
    #5 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Mutex M92 (0x7d780002f1a0) created at:
    #0 pthread_mutex_init /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1032 (lt-flux-broker+0x00000003d6e5)
    #1 ss_mutexinit /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_mutex.h:20:2 (content-sophia.so+0x00000009d79a)
    #2 se_scheduler_init /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:257 (content-sophia.so+0x00000009d79a)
    #3 se_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:279 (content-sophia.so+0x00000009d79a)
    #4 sp_env /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:49:9 (content-sophia.so+0x0000000af713)
    #5 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:125:26 (content-sophia.so+0x00000000965f)
    #6 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x00000000965f)
    #7 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Mutex M89 (0x7d780002e838) created at:
    #0 pthread_mutex_init /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1032 (lt-flux-broker+0x00000003d6e5)
    #1 ss_mutexinit /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_mutex.h:20:2 (content-sophia.so+0x00000009d3b8)
    #2 se_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:264 (content-sophia.so+0x00000009d3b8)
    #3 sp_env /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:49:9 (content-sophia.so+0x0000000af713)
    #4 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:125:26 (content-sophia.so+0x00000000965f)
    #5 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x00000000965f)
    #6 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T6 (tid=176227, running) created by thread T5 at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 ss_threadnew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_thread.c:15:9 (content-sophia.so+0x0000000ae4dc)
    #2 se_workernew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:36 (content-sophia.so+0x0000000ae4dc)
    #3 se_workerpool_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:85 (content-sophia.so+0x0000000ae4dc)
    #4 se_scheduler_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:383:7 (content-sophia.so+0x0000000bb75b)
    #5 se_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:74 (content-sophia.so+0x0000000bb75b)
    #6 sp_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:75:11 (content-sophia.so+0x0000000af985)
    #7 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:130:20 (content-sophia.so+0x000000009702)
    #8 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x000000009702)
    #9 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T5 (tid=176226, running) created by main thread at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 module_start /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:368:19 (lt-flux-broker+0x0000000a7c2c)
    #2 cmb_insmod_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2041:9 (lt-flux-broker+0x0000000a3ba0)
    #3 svc_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/service.c:141:10 (lt-flux-broker+0x0000000ad62a)
    #4 broker_request_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2851:14 (lt-flux-broker+0x0000000a29d1)
    #5 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2686:18 (lt-flux-broker+0x0000000a6512)
    #6 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:358:9 (lt-flux-broker+0x0000000a8e15)
    #7 zmq_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:395:9 (libflux-core.so.0+0x00000001152b)
    #8 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/ev_zmq.c:77:9 (libflux-core.so.0+0x00000003e40a)
    #9 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #10 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #11 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #12 main /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:672:9 (lt-flux-broker+0x0000000a1065)

SUMMARY: ThreadSanitizer: data race /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/transaction/sx.c:98 sx_vlsn
==================
==================
WARNING: ThreadSanitizer: data race (pid=175572)
  Read of size 1 at 0x7d780002e834 by thread T8:
    #0 ss_spinlock /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_spinlock.h:60:8 (content-sophia.so+0x0000000ae5ac)
    #1 se_status /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_status.h:67 (content-sophia.so+0x0000000ae5ac)
    #2 se_statusactive /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_status.h:105 (content-sophia.so+0x0000000ae5ac)
    #3 se_active /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.h:63 (content-sophia.so+0x0000000ae5ac)
    #4 se_worker /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:367 (content-sophia.so+0x0000000ae5ac)

  Previous atomic write of size 1 at 0x7d780002e834 by thread T6:
    #0 __tsan_atomic8_store /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:549 (lt-flux-broker+0x00000006c583)
    #1 se_statusactive_is /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_status.h:90:2 (content-sophia.so+0x0000000ae5f9)
    #2 se_statusactive /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_status.h:105 (content-sophia.so+0x0000000ae5f9)
    #3 se_active /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.h:63 (content-sophia.so+0x0000000ae5f9)
    #4 se_worker /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:367 (content-sophia.so+0x0000000ae5f9)

  Location is heap block of size 3040 at 0x7d780002e800 allocated by thread T5:
    #0 malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:461 (lt-flux-broker+0x0000000475ef)
    #1 se_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:220:10 (content-sophia.so+0x00000009cde3)
    #2 sp_env /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:49:9 (content-sophia.so+0x0000000af713)
    #3 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:125:26 (content-sophia.so+0x00000000965f)
    #4 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x00000000965f)
    #5 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T8 (tid=176229, running) created by thread T5 at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 ss_threadnew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_thread.c:15:9 (content-sophia.so+0x0000000ae4dc)
    #2 se_workernew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:36 (content-sophia.so+0x0000000ae4dc)
    #3 se_workerpool_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:85 (content-sophia.so+0x0000000ae4dc)
    #4 se_scheduler_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:383:7 (content-sophia.so+0x0000000bb75b)
    #5 se_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:74 (content-sophia.so+0x0000000bb75b)
    #6 sp_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:75:11 (content-sophia.so+0x0000000af985)
    #7 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:130:20 (content-sophia.so+0x000000009702)
    #8 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x000000009702)
    #9 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T6 (tid=176227, running) created by thread T5 at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 ss_threadnew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_thread.c:15:9 (content-sophia.so+0x0000000ae4dc)
    #2 se_workernew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:36 (content-sophia.so+0x0000000ae4dc)
    #3 se_workerpool_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:85 (content-sophia.so+0x0000000ae4dc)
    #4 se_scheduler_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:383:7 (content-sophia.so+0x0000000bb75b)
    #5 se_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:74 (content-sophia.so+0x0000000bb75b)
    #6 sp_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:75:11 (content-sophia.so+0x0000000af985)
    #7 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:130:20 (content-sophia.so+0x000000009702)
    #8 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x000000009702)
    #9 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T5 (tid=176226, running) created by main thread at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 module_start /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:368:19 (lt-flux-broker+0x0000000a7c2c)
    #2 cmb_insmod_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2041:9 (lt-flux-broker+0x0000000a3ba0)
    #3 svc_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/service.c:141:10 (lt-flux-broker+0x0000000ad62a)
    #4 broker_request_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2851:14 (lt-flux-broker+0x0000000a29d1)
    #5 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2686:18 (lt-flux-broker+0x0000000a6512)
    #6 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:358:9 (lt-flux-broker+0x0000000a8e15)
    #7 zmq_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:395:9 (libflux-core.so.0+0x00000001152b)
    #8 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/ev_zmq.c:77:9 (libflux-core.so.0+0x00000003e40a)
    #9 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #10 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #11 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #12 main /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:672:9 (lt-flux-broker+0x0000000a1065)

SUMMARY: ThreadSanitizer: data race /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_spinlock.h:60 ss_spinlock
==================
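
The report above pairs a plain 1-byte read in `ss_spinlock` with an atomic store to the same byte from another thread. If the spin loop polls the lock byte with an ordinary load (an assumption; the Sophia source is not shown here), TSan would flag it whether or not it causes trouble in practice. Below is a sketch of a test-and-test-and-set spinlock in C11 in which every access to the lock byte is atomic. `spin_t` and the `spin_*` functions are hypothetical names, not Sophia's API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spinlock: one atomic byte, 0 = free, 1 = held. */
typedef struct {
    atomic_uchar locked;
} spin_t;

void spin_init (spin_t *s)
{
    atomic_init (&s->locked, 0);
}

/* Try once; returns 1 if the lock was taken, 0 if it was already held. */
int spin_trylock (spin_t *s)
{
    return atomic_exchange_explicit (&s->locked, 1,
                                     memory_order_acquire) == 0;
}

void spin_lock (spin_t *s)
{
    for (;;) {
        if (spin_trylock (s))
            return;
        /* Poll with an atomic relaxed load rather than a plain read, so
         * the wait loop is not an unsynchronized access. */
        while (atomic_load_explicit (&s->locked,
                                     memory_order_relaxed) != 0)
            ;
    }
}

void spin_unlock (spin_t *s)
{
    atomic_store_explicit (&s->locked, 0, memory_order_release);
}
```

The relaxed load in the wait loop costs about the same as a plain read on common hardware, but it is visible to TSan as an atomic access.
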
==================
WARNING: ThreadSanitizer: data race (pid=175572)
  Atomic write of size 1 at 0x7d780002f138 by thread T5 (mutexes: write M89):
    #0 __tsan_atomic8_store /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:549 (lt-flux-broker+0x00000006c583)
    #1 se_reqwrite /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_mutex.h:35:2 (content-sophia.so+0x0000000a25d4)
    #2 se_execute /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_execute.c:101:27 (content-sophia.so+0x0000000bad52)
    #3 se_dbwrite /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_db.c:469 (content-sophia.so+0x0000000bad52)
    #4 se_dbset /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_db.c:484:9 (content-sophia.so+0x0000000ba314)
    #5 sp_set /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:248:11 (content-sophia.so+0x0000000b08a3)
    #6 store_cb /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:233:9 (content-sophia.so+0x000000008f46)
    #7 dispatch_message /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/dispatch.c:472:17 (libflux-core.so.0+0x000000014f7d)
    #8 handle_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/dispatch.c:567 (libflux-core.so.0+0x000000014f7d)
    #9 handle_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:280:9 (libflux-core.so.0+0x000000010e0b)
    #10 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/ev_flux.c:72:9 (libflux-core.so.0+0x00000001fae3)
    #11 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #12 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #13 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #14 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:400:9 (content-sophia.so+0x0000000097eb)
    #15 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Previous read of size 1 at 0x7d780002f138 by thread T9:
    #0 ss_spinlock /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_spinlock.h:60:8 (content-sophia.so+0x00000007d1bb)
    #1 sl_poolrotate_ready /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/log/sl.c:210 (content-sophia.so+0x00000007d1bb)
    #2 se_rotate /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:734:11 (content-sophia.so+0x0000000ab79a)
    #3 se_scheduler /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:812 (content-sophia.so+0x0000000ab79a)
    #4 se_worker /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:370:8 (content-sophia.so+0x0000000ae613)

  Location is heap block of size 3040 at 0x7d780002e800 allocated by thread T5:
    #0 malloc /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:461 (lt-flux-broker+0x0000000475ef)
    #1 se_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:220:10 (content-sophia.so+0x00000009cde3)
    #2 sp_env /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:49:9 (content-sophia.so+0x0000000af713)
    #3 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:125:26 (content-sophia.so+0x00000000965f)
    #4 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x00000000965f)
    #5 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Mutex M89 (0x7d780002e838) created at:
    #0 pthread_mutex_init /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1032 (lt-flux-broker+0x00000003d6e5)
    #1 ss_mutexinit /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_mutex.h:20:2 (content-sophia.so+0x00000009d3b8)
    #2 se_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:264 (content-sophia.so+0x00000009d3b8)
    #3 sp_env /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:49:9 (content-sophia.so+0x0000000af713)
    #4 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:125:26 (content-sophia.so+0x00000000965f)
    #5 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x00000000965f)
    #6 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

  Thread T5 (tid=176226, running) created by main thread at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 module_start /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:368:19 (lt-flux-broker+0x0000000a7c2c)
    #2 cmb_insmod_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2041:9 (lt-flux-broker+0x0000000a3ba0)
    #3 svc_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/service.c:141:10 (lt-flux-broker+0x0000000ad62a)
    #4 broker_request_sendmsg /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2851:14 (lt-flux-broker+0x0000000a29d1)
    #5 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:2686:18 (lt-flux-broker+0x0000000a6512)
    #6 module_cb /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:358:9 (lt-flux-broker+0x0000000a8e15)
    #7 zmq_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:395:9 (libflux-core.so.0+0x00000001152b)
    #8 check_cb /g/g0/dahn/workspace/Flux/flux-core/src/common/libutil/ev_zmq.c:77:9 (libflux-core.so.0+0x00000003e40a)
    #9 ev_invoke_pending /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3088:11 (libflux-core.so.0+0x0000000509c7)
    #10 ev_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libev/ev.c:3488:7 (libflux-core.so.0+0x0000000520f6)
    #11 flux_reactor_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libflux/reactor.c:134:13 (libflux-core.so.0+0x0000000108d3)
    #12 main /g/g0/dahn/workspace/Flux/flux-core/src/broker/broker.c:672:9 (lt-flux-broker+0x0000000a1065)

  Thread T9 (tid=176230, running) created by thread T5 at:
    #0 pthread_create /collab/usr/global/tools/clang/chaos_5_x86_64_ib/clang-omp-3.5.0/src/llvm-3.5.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:848 (lt-flux-broker+0x000000051233)
    #1 ss_threadnew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_thread.c:15:9 (content-sophia.so+0x0000000ae4dc)
    #2 se_workernew /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:36 (content-sophia.so+0x0000000ae4dc)
    #3 se_workerpool_new /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_worker.c:85 (content-sophia.so+0x0000000ae4dc)
    #4 se_scheduler_run /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se_scheduler.c:383:7 (content-sophia.so+0x0000000bb75b)
    #5 se_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/environment/se.c:74 (content-sophia.so+0x0000000bb75b)
    #6 sp_open /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/sophia/sophia.c:75:11 (content-sophia.so+0x0000000af985)
    #7 getctx /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:130:20 (content-sophia.so+0x000000009702)
    #8 mod_main /g/g0/dahn/workspace/Flux/flux-core/src/modules/content-sophia/content-sophia.c:385 (content-sophia.so+0x000000009702)
    #9 module_thread /g/g0/dahn/workspace/Flux/flux-core/src/broker/module.c:145:9 (lt-flux-broker+0x0000000a7ec7)

SUMMARY: ThreadSanitizer: data race /g/g0/dahn/workspace/Flux/flux-core/src/common/libsophia/sophia/std/ss_mutex.h:35 se_reqwrite

@dongahn

Contributor

dongahn commented Jul 7, 2016

TSan has an issue with Lua:

not ok 3 - lalarm: is function
#     Failed test (./lua/t0007-alarm.t at line 6)
#     error loading module 'lalarm' from file '/g/g0/dahn/workspace/Flux/flux-core/src/bindings/lua/.libs/lalarm.so':
#   /g/g0/dahn/workspace/Flux/flux-core/src/bindings/lua/.libs/lalarm.so: undefined symbol: __tsan_init isn't a 'function' it's a 'string'
/usr/bin/lua: ./lua/t0007-alarm.t:9: attempt to call local 'alarm' (a string value)
stack traceback:
    ./lua/t0007-alarm.t:9: in main chunk
    [C]: ?
FAIL: lua/t0007-alarm.t

@dongahn

Contributor

dongahn commented Jul 7, 2016

OK. I looked through all of the reports, and it seems the basics of TSan work reasonably well. I will group the issues above into a few categories and create an issue for each so that we can follow up in a manageable way.
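
For anyone who wants to do a similar triage on their own logs, here is a minimal sketch of one way to group reports by category. It is not how the reports in this thread were actually sorted. It assumes each report ends with a `SUMMARY:` line shaped like the one in the log above (tool, kind, `path:line`, function); other compiler versions may print a different layout. The names `SUMMARY_RE` and `group_summaries` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, based on the SUMMARY line in the log above:
#   SUMMARY: ThreadSanitizer: data race /path/to/ss_mutex.h:35 se_reqwrite
# A trailing ":column" after the line number is tolerated.
SUMMARY_RE = re.compile(
    r"^SUMMARY: (?P<tool>\w+Sanitizer): (?P<kind>.+?) "
    r"(?P<path>/\S+?):(?P<line>\d+)(?::\d+)? (?P<func>\S+)\s*$"
)


def group_summaries(log_text):
    """Group sanitizer SUMMARY lines by (tool, kind).

    Returns a dict mapping (tool, kind) to a sorted list of unique
    (path, line, function) tuples. Lines that do not match the
    assumed layout are ignored.
    """
    groups = defaultdict(set)
    for raw in log_text.splitlines():
        match = SUMMARY_RE.match(raw.strip())
        if match is None:
            continue
        key = (match.group("tool"), match.group("kind"))
        location = (match.group("path"), int(match.group("line")), match.group("func"))
        groups[key].add(location)  # the set drops repeated reports of the same site
    return {key: sorted(sites) for key, sites in groups.items()}


if __name__ == "__main__":
    import sys

    # Usage: python group_tsan.py < test-suite.log
    for (tool, kind), sites in group_summaries(sys.stdin.read()).items():
        print(f"{tool}: {kind} ({len(sites)} unique site(s))")
        for path, line, func in sites:
            print(f"  {path}:{line} in {func}")
```

Keying on the kind and the reporting site collapses repeated reports of the same race into one entry, which makes it easier to see how many distinct follow-up issues there are.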

Because LeakSanitizer and MemorySanitizer aren't supported by older compiler versions, I will submit a PR with only AddressSanitizer and ThreadSanitizer. LeakSanitizer is effectively a subset of AddressSanitizer anyway.

@dongahn

Contributor

dongahn commented Jul 8, 2016

OK. I just submitted a PR (#726) and also created a set of issues to follow up.
