Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test hangs with TSAN #5117

Closed
takuro-sato opened this issue Mar 16, 2023 · 2 comments
Closed

Test hangs with TSAN #5117

takuro-sato opened this issue Mar 16, 2023 · 2 comments
Labels

Comments

@takuro-sato
Copy link
Contributor

takuro-sato commented Mar 16, 2023

part of #4994

It happens in Virtual MultiThreadTsan CI
https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=67000&view=logs&j=7944d106-e7da-5d45-12ec-6dfc8e3f5a03&t=564afb6e-c35d-5add-8e9b-52387eab437a

Reproduce

Also you can reproduce it locally.

$ CC=$(which clang-15) CXX=$(which clang++-15) cmake -GNinja -DCOMPILE_TARGET=virtual -DWORKER_THREADS=2 -DTSAN=ON -DLVI_MITIGATIONS=OFF -DCMAKE_BUILD_TYPE=Debug -L ..
$ ninja
$ sudo ./tests.sh --verbose

Note that the tests you see this bug are different depending of clang version.
If you use clang-12, you don't see this in ds_test, but still do in e2e_logging_cft etc.

In which test cases does it happen?

ds_test and many E2E tests.

Smallest test to reproduce it.

ds_test is the handy one to reproduce.
Looks like it happens in "Unique thread IDs" test case in thread_messaging.cpp. The ds_test passes without it.

test 7
      Start  7: ds_test

7: Test command: /__w/1/s/build/ds_test
7: Environment variables: 
7:  TSAN_OPTIONS=suppressions=/__w/1/s/tsan_env_suppressions
7: Test timeout computed to be: 10000000

In other terminal,

takurosato@takurosato-virtual2:~/CCF/build-clang15$ ps -ef | grep ds_test
root      114509  114508 11 17:15 pts/15   00:00:41 /home/takurosato/CCF/build-clang15/ds_test
takuros+  115531   14029  0 17:21 pts/17   00:00:00 grep --color=auto ds_test
takurosato@takurosato-virtual2:~/CCF/build-clang15$ sudo lldb
(lldb) process attach --pid 114509
Process 114509 stopped
* thread #1, name = 'ds_test', stop reason = signal SIGSTOP
    frame #0: 0x00007f7878db5cd7 libpthread.so.0`__pthread_clockjoin_ex + 599
libpthread.so.0`__pthread_clockjoin_ex:
->  0x7f7878db5cd7 <+599>: movl   %r8d, %edi
    0x7f7878db5cda <+602>: callq  0x7f7878dbf0a0            ; __pthread_disable_asynccancel
    0x7f7878db5cdf <+607>: movl   (%r13), %r12d
    0x7f7878db5ce3 <+611>: testl  %r12d, %r12d
  thread #2, name = 'ds_test', stop reason = signal SIGSTOP
    frame #0: 0x000055b7bd3dc57e ds_test`__sanitizer::internal_usleep(unsigned long long) + 62
ds_test`__sanitizer::internal_usleep:
->  0x55b7bd3dc57e <+62>: retq
    0x55b7bd3dc57f:       nop
ds_test`__sanitizer::internal_execve:
    0x55b7bd3dc580 <+0>:  movl   $0x3b, %eax
    0x55b7bd3dc585 <+5>:  syscall
  thread #3, name = 'ds_test', stop reason = signal SIGSTOP
    frame #0: 0x00007f7878dbb376 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 534
libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2:
->  0x7f7878dbb376 <+534>: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f7878dbb37c <+540>: ja     0x7f7878dbb4b8            ; <+856>
    0x7f7878dbb382 <+546>: movl   %r8d, %edi
    0x7f7878dbb385 <+549>: callq  0x7f7878dbf0a0            ; __pthread_disable_asynccancel

Executable module set to "/home/takurosato/CCF/build-clang15/ds_test".
Architecture set to: x86_64-pc-linux-gnu.
(lldb) thread backtrace all
* thread #1, name = 'ds_test', stop reason = signal SIGSTOP
  * frame #0: 0x00007f7878db5cd7 libpthread.so.0`__pthread_clockjoin_ex + 599
    frame #1: 0x000055b7bd3ff32d ds_test`__interceptor_pthread_join + 237
    frame #2: 0x00007f7878e6c173 libc++.so.1`std::__1::thread::join() + 19
    frame #3: 0x000055b7bd6de75b ds_test`DOCTEST_ANON_FUNC_6() + 843
    frame #4: 0x000055b7bd490ebf ds_test`doctest::Context::run(this=0x00007ffc730da6e0) + 8335
    frame #5: 0x000055b7bd492828 ds_test`main(argc=1, argv=0x00007ffc730da7e8) + 88
    frame #6: 0x00007f78786dc083 libc.so.6`__libc_start_main + 243
    frame #7: 0x000055b7bd3d183e ds_test`_start + 46
  thread #2, name = 'ds_test', stop reason = signal SIGSTOP
    frame #0: 0x000055b7bd3dc57e ds_test`__sanitizer::internal_usleep(unsigned long long) + 62
    frame #1: 0x000055b7bd46553f ds_test`__tsan::BackgroundThread(void*) + 159
    frame #2: 0x00007f7878db4609 libpthread.so.0`start_thread + 217
    frame #3: 0x00007f78787d7133 libc.so.6`__clone + 67
  thread #3, name = 'ds_test', stop reason = signal SIGSTOP
    frame #0: 0x00007f7878dbb376 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 534
    frame #1: 0x000055b7bd47d0b0 ds_test`__tsan::call_pthread_cancel_with_cleanup(int (*)(void*), void (*)(void*), void*) + 48
    frame #2: 0x000055b7bd3ffcec ds_test`__interceptor_pthread_cond_wait + 524
    frame #3: 0x00007f7878e5cdef libc++.so.1`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 15
    frame #4: 0x000055b7bd6e5733 ds_test`DOCTEST_ANON_FUNC_6(this=0x00007b0c00021608)::$_0::operator()() const + 227
    frame #5: 0x000055b7bd6e55e6 ds_test`decltype(__f=0x00007b0c00021608)::$_0>()()) std::__1::__invoke[abi:v15007]<DOCTEST_ANON_FUNC_6()::$_0>(DOCTEST_ANON_FUNC_6()::$_0&&) + 38
    frame #6: 0x000055b7bd6e559e ds_test`void std::__1::__thread_execute[abi:v15007]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DOCTEST_ANON_FUNC_6()::$_0>(__t=size=2, (null)=__tuple_indices<> @ 0x00007f786aff6518)::$_0>&, std::__1::__tuple_indices<>) + 46
    frame #7: 0x000055b7bd6e5210 ds_test`void* std::__1::__thread_proxy[abi:v15007]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DOCTEST_ANON_FUNC_6()::$_0> >(__vp=0x00007b0c00021600) + 128
    frame #8: 0x000055b7bd3fef77 ds_test`__tsan_thread_start_func + 167
    frame #9: 0x00007f7878db4609 libpthread.so.0`start_thread + 217
    frame #10: 0x00007f78787d7133 libc.so.6`__clone + 67
@takuro-sato
Copy link
Contributor Author

As far as I found, google/sanitizers#498 is the most closest problem in terms of that pthread_cond_wait is releated. But it's very old so I am not sure it's the same problem

@takuro-sato
Copy link
Contributor Author

takuro-sato commented Mar 17, 2023

It reproduces with a small c++ code outside CCF

takurosato@takurosato-virtual2:~/misc/tsan$ cat main.cpp 
#include <iostream>
#include <thread>
#include <algorithm>
#include <vector>
#include <mutex>
#include <memory>
#include <atomic>
#include <condition_variable>

namespace threading
{
  using ThreadID = uint16_t;
  static constexpr ThreadID MAIN_THREAD_ID = 0;
  static std::atomic<ThreadID> next_thread_id = MAIN_THREAD_ID;

  uint16_t get_current_thread_id()
  {
    thread_local ThreadID this_thread_id = next_thread_id.fetch_add(1);
    return this_thread_id;
  }

  void reset_thread_id_generator()
  {
    next_thread_id.store(MAIN_THREAD_ID);
  }
}

int main() {
  std::mutex assigned_ids_lock;
  std::vector<uint16_t> assigned_ids;

  const auto main_thread_id = threading::get_current_thread_id();
  // REQUIRE(main_thread_id == threading::MAIN_THREAD_ID);
  assigned_ids.push_back(main_thread_id);

  std::mutex all_done_lock;
  std::condition_variable all_done;

  auto fn = [&]() {
    {
      std::lock_guard<std::mutex> guard(assigned_ids_lock);
      const auto current_thread_id = threading::get_current_thread_id();
      assigned_ids.push_back(current_thread_id);
    }

    {
      std::unique_lock lock(all_done_lock);
      all_done.wait(lock);
    }
  };

  constexpr size_t num_threads = 20;
  constexpr size_t expected_ids = num_threads + 1; // Includes MAIN_THREAD_ID
  std::vector<std::thread> threads;
  for (auto i = 0; i < num_threads; ++i)
  {
    threads.emplace_back(fn);
  }

  size_t attempts = 0;
  constexpr size_t max_attempts = 5;
  while (true)
  {
    {
      std::lock_guard<std::mutex> guard(assigned_ids_lock);
      if (assigned_ids.size() == expected_ids)
      {
        all_done.notify_all();
        break;
      }
    }

    // REQUIRE(++attempts < max_attempts);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }

  // REQUIRE(assigned_ids.size() == expected_ids);

  for (auto& thread : threads)
  {
    thread.join();
  }

  const auto unique = std::unique(assigned_ids.begin(), assigned_ids.end());
  // REQUIRE_MESSAGE(
  //   unique == assigned_ids.end(),
  //   fmt::format(
  //     "Thread IDs are not unique: {}", fmt::join(assigned_ids, ", ")));
}
takurosato@takurosato-virtual2:~/misc/tsan$ cat run.sh 
#!/bin/bash
clang++-15 -g -pthread -stdlib=libc++ -fsanitize=thread -std=c++20 main.cpp && ./a.out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants