Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang Thread Safety Analysis warnings in time_controller_clock.cpp #795

Closed
sloretz opened this issue Jun 25, 2021 · 11 comments
Closed

Clang Thread Safety Analysis warnings in time_controller_clock.cpp #795

sloretz opened this issue Jun 25, 2021 · 11 comments
Assignees
Labels
bug Something isn't working

Comments

@sloretz
Copy link
Contributor

sloretz commented Jun 25, 2021

Description

The latest nightly_linux_clang_libcxx job shows new thread safety analysis warnings.

Expected Behavior

No thread safety analysis warnings

Actual Behavior

Clang reports thread safety analysis warnings

To Reproduce

  1. Go to https://ci.ros2.org/view/nightly/job/nightly_linux_clang_libcxx/829/clang-tidy/new/
  2. Scroll down to the 'Issues' at the bottom of the page
  3. Expand all the time_controller_clock.cpp warnings

System (please complete the following information)

  • OS: Ubuntu Focal
  • ROS 2 Distro: Latest development
  • Version: master / bb9c7ae

Additional context

All the offending code appears to come from this commit 6f7503e #779

[ 17%] Building CXX object CMakeFiles/rosbag2_cpp.dir/src/rosbag2_cpp/clocks/time_controller_clock.cpp.o
/home/jenkins-agent/workspace/nightly_linux_clang_libcxx/ws/src/ros2/rosbag2/rosbag2_cpp/src/rosbag2_cpp/clocks/time_controller_clock.cpp:143:5: warning: reading variable 'cv' requires holding mutex 'state_mutex' [-Wthread-safety-analysis]
    cv.notify_all();
    ^
/home/jenkins-agent/workspace/nightly_linux_clang_libcxx/ws/src/ros2/rosbag2/rosbag2_cpp/src/rosbag2_cpp/clocks/time_controller_clock.cpp:202:7: warning: calling function 'process_callback' requires holding mutex 'callback_list_mutex_' exclusively [-Wthread-safety-analysis]
      process_callback(handler, time_jump, true);
      ^
/home/jenkins-agent/workspace/nightly_linux_clang_libcxx/ws/src/ros2/rosbag2/rosbag2_cpp/src/rosbag2_cpp/clocks/time_controller_clock.cpp:210:7: warning: calling function 'process_callback' requires holding mutex 'callback_list_mutex_' exclusively [-Wthread-safety-analysis]
      process_callback(handler, time_jump, false);
@sloretz sloretz added the bug Something isn't working label Jun 25, 2021
@MichaelOrlov MichaelOrlov self-assigned this Jun 26, 2021
@MichaelOrlov
Copy link
Contributor

I don't see a bugs reported by this warnings in the code, rather this is a bug in our tooling environment.

@sloretz @emersonknapp How to know which version of the libcxx we are using in this nightly check?
It looks like we are using very outdated version which doesn't know about clang's thread safety annotations.

According to the https://stackoverflow.com/a/38442968 the thread-safety annotations became included in the libcxx since March 15, 2016. per libcxx Add clang thread safety annotations to mutex and lock_guard PR.

This adds clang thread safety annotations to std::mutex and
std::lock_guard so code using these types can use these types directly
instead of having to wrap the types to provide annotations. These checks
when enabled by -Wthread-safety provide simple but useful static
checking to detect potential race conditions.
See http://clang.llvm.org/docs/ThreadSafetyAnalysis.html for details.

It looks like our libcxx version doesn't have these thread safety annotations.

As regards to the warning from

/home/jenkinsagent/workspace/nightly_linux_clang_libcxx/ws/src/ros2/rosbag2/rosbag2_cpp/src/rosbag2_cpp/clocks/time_controller_clock.cpp:143:5: warning: reading variable 'cv' requires holding mutex 'state_mutex' [-Wthread-safety-analysis]
cv.notify_all();

This is really false positive in thread safety analysis. Because according to the https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all

The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

We shouldn't call notify_all under the mutex lock since we could ignore this notification in waiting thread by going to wait again.
And I don't know how to suppress this false positive. Any advice?

@clalancette
Copy link
Contributor

@sloretz @emersonknapp How to know which version of the libcxx we are using in this nightly check?
It looks like we are using very outdated version which doesn't know about clang's thread safety annotations.

According to the https://stackoverflow.com/a/38442968 the thread-safety annotations became included in the libcxx since March 15, 2016. per libcxx Add clang thread safety annotations to mutex and lock_guard PR.

The version we are using is what is shipped with Ubuntu 20.04; it gets installed in the Dockerfile here. Checking locally, it looks like it is probably libc++-10-dev.

I am fairly confident that version supports thread annotations; the entire reason we stood up that clang-libcxx job in the first place was to test out thread annotations. I think @emersonknapp set up that job, so he may have more details.

@emersonknapp
Copy link
Collaborator

@MichaelOrlov the standard ci_launcher and the GitHub Actions do not run Clang, they use GCC (with libstdc++). The Clang (with libcxx) check is a special separate nighly job on ci.ros2.org that is not run anywhere else. You can try this build locally using the clang-libcxx colcon mixin from https://github.com/colcon/colcon-mixin-repository

@MichaelOrlov
Copy link
Contributor

@emersonknapp This is not about reproduction of the issue. I am clearly understand what these issues about and that this job running nightly.

According to the my analysis our libcxx version doesn't know about thread safety annotations and this is why we have warnings from thread safety analysis when trying to use thread safety annotations with std::lock_guard and std::mutex.

Please refer to the https://stackoverflow.com/a/38442968 the thread-safety annotations became included in the libcxx since March 15, 2016. per libcxx Add clang thread safety annotations to mutex and lock_guard PR.

@clalancette
Copy link
Contributor

The question is why it isn't working with the version of libc++-10 in Ubuntu 20.04. That version was released in March 2020, so it is clearly new enough to have the PRs you referenced incorporated. So the possibilities seem to be:

  1. They are removed/compiled out of the Ubuntu version for some reason
  2. We are using them wrong
  3. There is something wrong with the code

I don't know which of those is the case, but that's what we need to figure out.

@emersonknapp
Copy link
Collaborator

emersonknapp commented Jun 28, 2021

The question is why it isn't working with the version of libc++-10 in Ubuntu 20.04.

Was this upgraded in the last 2 weeks? I'm sure it's what we've been running the entire time. And it supports the thread safety analysis. We were successfully using these annotations in Bionic for the Dashing release.

I'm confused what the confusion is. The buildfarm and default Focal installations support this feature, and are correctly pointing out that the code is wrong.

According to the my analysis our libcxx version doesn't know about thread safety annotations

The thread safety analysis checks have been working the entire time, as you will notice there are already annotations in the PlayerClock code - the newest warnings were introduced, as @clalancette said, in #779 - in https://ci.ros2.org/view/nightly/job/nightly_linux_clang_libcxx/818/ there were only warnings from rmw_fastrtps. After merging the PR, in https://ci.ros2.org/view/nightly/job/nightly_linux_clang_libcxx/819/ - the new warnings were introduced. It is a problem with the latest rosbag2 code - not with the infrastructure.

If you look at the warnings that the CI shows you https://ci.ros2.org/view/nightly/job/nightly_linux_clang_libcxx/819/clang/folder.535941802/source.5d4ff16c-783b-4241-a873-4a9e5dc3e2c2/#143 - it's clear that yes, you are accessing the variable cv without holding the mutex, because it is acquired in a limited scope.

@clalancette
Copy link
Contributor

Was this upgraded in the last 2 weeks?

It doesn't look like it; the last Ubuntu build of this package in 20.04 was in April: https://launchpad.net/ubuntu/+source/llvm-toolchain-10

@MichaelOrlov
Copy link
Contributor

@emersonknapp @clalancette I've tried to dig deeper as much I can on my side in this issue, and here is my observation:

  1. I don't see issues with code since it's pretty trivial. The warnings refer to the following two functions:process_callbacks_before_jump(const rcl_time_jump_t & time_jump) and
    process_callbacks_after_jump(const rcl_time_jump_t & time_jump)
  void process_callbacks_before_jump(const rcl_time_jump_t & time_jump)
  {
    std::lock_guard<std::mutex> lock(callback_list_mutex_);
    for (auto const & handler : callback_list_) {
      process_callback(handler, time_jump, true);
    }
  }

  void process_callbacks_after_jump(const rcl_time_jump_t & time_jump)
  {
    std::lock_guard<std::mutex> lock(callback_list_mutex_);
    for (auto const & handler : callback_list_) {
      process_callback(handler, time_jump, false);
    }
  }

As you can see, there obviously taking callback_list_mutex_ exclusively via std::lock_guard<std::mutex> before calling annotated process_callback(handler, time_jump, false); function. There literally 3 lines of code in both cases.

  1. Unfortunately I was not able to build rosbag2 code locally with clang and libcxx via colcon build --mixin clang-libcxx firstly build fails with linker errors. Can't find dependencies in rcutils. Of course because ROS2 rolling also should be build with --mixin clang-libcxx. I have tried to build ROS2 rolling with --mixin clang-libcxx but it fails with some compilation errors on packages with QT.

  2. Unfortunately I don't have Ubuntu 20.04 by the hand, my working environment is Ubuntu 18.04, although I have tried to install latest version of the clang and libcxx in ADE to see how source files looks like. Not sure what version uses Ubuntu 20.04 in my system I have relevant /usr/lib/llvm-10/include/c++/v1/__mutex_base file in llvm10. I anticipate that Ubuntu 20.04 uses the same version but would encourage you to double check it.
    I've verified and confirm that __mutex_base file have relevant delta form https://reviews.llvm.org/D14731

template <class _Mutex>
class _LIBCPP_TEMPLATE_VIS _LIBCPP_THREAD_SAFETY_ANNOTATION(scoped_lockable)
lock_guard
{
public:
    typedef _Mutex mutex_type;

private:
    mutex_type& __m_;
public:

    _LIBCPP_NODISCARD_EXT _LIBCPP_INLINE_VISIBILITY
    explicit lock_guard(mutex_type& __m) _LIBCPP_THREAD_SAFETY_ANNOTATION(acquire_capability(__m))
        : __m_(__m) {__m_.lock();}

    _LIBCPP_NODISCARD_EXT _LIBCPP_INLINE_VISIBILITY
    lock_guard(mutex_type& __m, adopt_lock_t) _LIBCPP_THREAD_SAFETY_ANNOTATION(requires_capability(__m))
        : __m_(__m) {}
    _LIBCPP_INLINE_VISIBILITY
    ~lock_guard() _LIBCPP_THREAD_SAFETY_ANNOTATION(release_capability()) {__m_.unlock();}

private:
    lock_guard(lock_guard const&) _LIBCPP_EQUAL_DELETE;
    lock_guard& operator=(lock_guard const&) _LIBCPP_EQUAL_DELETE;
};
  1. There are still open question why clang thread safety analysis throwing warning to the trivial usage of the lock_guard the way if it wouldn't be annotated. I had in mind some version that it could be that --mixin clang-libcxx running build with stdlibc instead of libcxx. But don't know how to verify this version.

@MichaelOrlov
Copy link
Contributor

MichaelOrlov commented Jun 29, 2021

@emersonknapp @clalancette it seems I was digging in wrong direction. Sorry about confusion.

I see that in the same file we also using RCPPUTILS_TSA_REQUIRES(mutex) annotation with another functions alongside with the same locking primitive lock_guard and it seems no errors for those cases.
It means that most probably problem not in the licxx library.

I see one significant difference in my case which might affect thread safety analysis void process_callback() declared as static function.

I can try to prepare PR with removing static from void process_callback() to see if it will eliminate issue. But I will need help with running build with clang+libcxx for verification.

What do you think?

@MichaelOrlov
Copy link
Contributor

@clalancette @emersonknapp
I've ran ci_linux_clang_libcxx on prepared PR 799 it seems warnings are gone in Rosbag2.
Please review 799 and verify that I ran ci_linux_clang_libcxx job correctly.

@MichaelOrlov
Copy link
Contributor

Closing this issue as resolved since 799 has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants