Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid using dds common public mutex directly #725

Merged

Conversation

iuhilnehc-ynos
Copy link
Contributor

to address one warning TSAN about lock-order-inversion mentioned at ros2/ros2#1488 (comment).

TSAN warning log

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=1926091)
  Cycle in lock order graph: M254870179752816080 (0x000000000000) => M277106926251347976 (0x000000000000) => M254870179752816080

  Mutex M277106926251347976 acquired here while holding mutex M254870179752816080 in main thread:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4240 (libtsan.so.0+0x53908)
    #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749 (librmw_fastrtps_cpp.so+0x2669e)
    #2 std::mutex::lock() /usr/include/c++/11/bits/std_mutex.h:100 (librmw_fastrtps_cpp.so+0x26726)
    #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/include/c++/11/bits/std_mutex.h:229 (librmw_fastrtps_cpp.so+0x283ec)
    #4 rmw_create_client /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_client.cpp:440 (librmw_fastrtps_cpp.so+0x46b7a)
    #5 rmw_create_client /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rmw_implementation/rmw_implementation/src/functions.cpp:484 (librmw_implementation.so+0x7e28)
    #6 rcl_client_init /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rcl/rcl/src/rcl/client.c:107 (librcl.so+0x1909d)
    #7 rcl_action_client_init /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rcl/rcl_action/src/rcl_action/action_client.c:218 (librcl_action.so+0x6f19)
    #8 TestActionGraphMultiNodeFixture_action_client_init_maybe_fail_Test::TestBody() /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rcl/rcl_action/test/rcl_action/test_graph.cpp:626 (test_graph__rmw_fastrtps_cpp+0x86fe6)
    #9 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2433 (test_graph__rmw_fastrtps_cpp+0xdbaa7)
    #10 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2469 (test_graph__rmw_fastrtps_cpp+0xd06c0)
    #11 testing::Test::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2508 (test_graph__rmw_fastrtps_cpp+0xa07eb)
    #12 testing::TestInfo::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2684 (test_graph__rmw_fastrtps_cpp+0xa1500)
    #13 testing::TestSuite::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2816 (test_graph__rmw_fastrtps_cpp+0xa1fd2)
    #14 testing::internal::UnitTestImpl::RunAllTests() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:5338 (test_graph__rmw_fastrtps_cpp+0xb0868)
    #15 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2433 (test_graph__rmw_fastrtps_cpp+0xdd935)
    #16 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2469 (test_graph__rmw_fastrtps_cpp+0xd22fe)
    #17 testing::UnitTest::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:4925 (test_graph__rmw_fastrtps_cpp+0xae93f)
    #18 RUN_ALL_TESTS() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/include/gtest/gtest.h:2473 (test_graph__rmw_fastrtps_cpp+0x95e1a)
    #19 main /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc:45 (test_graph__rmw_fastrtps_cpp+0x95d0e)

    Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message

  Mutex M254870179752816080 acquired here while holding mutex M277106926251347976 in main thread:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4240 (libtsan.so.0+0x53908)
    #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749 (librcl_logging_spdlog.so+0x10f5d)
    #2 std::mutex::lock() /usr/include/c++/11/bits/std_mutex.h:100 (librcl_logging_spdlog.so+0x11038)
    #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/include/c++/11/bits/std_mutex.h:229 (librcl_logging_spdlog.so+0x120d0)
    #4 rmw_fastrtps_shared_cpp::destroy_subscription(char const*, CustomParticipantInfo*, rmw_subscription_s*, bool) /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/subscription.cpp:44 (librmw_fastrtps_shared_cpp.so+0x155af0)
    #5 rmw_create_subscription /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_subscription.cpp:104 (librmw_fastrtps_cpp.so+0x6627b)
    #6 rmw_create_subscription /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rmw_implementation/rmw_implementation/src/functions.cpp:391 (librmw_implementation.so+0x708b)
    #7 rcl_subscription_init /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rcl/rcl/src/rcl/subscription.c:101 (librcl.so+0x351bb)
    #8 rcl_action_client_init /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rcl/rcl_action/src/rcl_action/action_client.c:222 (librcl_action.so+0x7563)
    #9 TestActionGraphMultiNodeFixture_action_client_init_maybe_fail_Test::TestBody() /home/chenlh/Projects/ROS2/ros2-humble/src/ros2/rcl/rcl_action/test/rcl_action/test_graph.cpp:626 (test_graph__rmw_fastrtps_cpp+0x86fe6)
    #10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2433 (test_graph__rmw_fastrtps_cpp+0xdbaa7)
    #11 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2469 (test_graph__rmw_fastrtps_cpp+0xd06c0)
    #12 testing::Test::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2508 (test_graph__rmw_fastrtps_cpp+0xa07eb)
    #13 testing::TestInfo::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2684 (test_graph__rmw_fastrtps_cpp+0xa1500)
    #14 testing::TestSuite::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2816 (test_graph__rmw_fastrtps_cpp+0xa1fd2)
    #15 testing::internal::UnitTestImpl::RunAllTests() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:5338 (test_graph__rmw_fastrtps_cpp+0xb0868)
    #16 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2433 (test_graph__rmw_fastrtps_cpp+0xdd935)
    #17 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:2469 (test_graph__rmw_fastrtps_cpp+0xd22fe)
    #18 testing::UnitTest::Run() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/./src/gtest.cc:4925 (test_graph__rmw_fastrtps_cpp+0xae93f)
    #19 RUN_ALL_TESTS() /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/include/gtest/gtest.h:2473 (test_graph__rmw_fastrtps_cpp+0x95e1a)
    #20 main /home/chenlh/Projects/ROS2/ros2-humble/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc:45 (test_graph__rmw_fastrtps_cpp+0x95d0e)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749 in __gthread_mutex_lock

Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with current fix, probably easier to backport.

I would like 2nd review on this. @clalancette can you check when you have time?

rmw_fastrtps_cpp/src/rmw_client.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of doing these locks and unlocks by hand. It seems brittle, and I don't think it actually fixes the underlying problem.

In particular, I think that the lock sequence in rmw_create_subscription is still going to have lock inversion problems with other pieces of code. Looking at it closer, I think what we may need to do here is to make it so that we always take the entity_creation_mutex (and hold it), then take the node_update_mutex. And do this in all codepaths.

Alternatively, we could consider moving the node_update_context locking into rmw_dds_common::Context, marking it private, and then removing the locking from here. That seems like a better idea anyway.

Thoughts?

@iuhilnehc-ynos
Copy link
Contributor Author

Alternatively, we could consider moving the node_update_context locking into rmw_dds_common::Context, marking it private, and then removing the locking from here. That seems like a better idea anyway.

Thoughts?

Thanks. I think it's a good idea. I'll use this method even though it refactors more code. These repositories (rmw_dds_common, rmw_fastrtps, rmw_cyclonedds, rmw_connextdds) need to be updated.

Chen Lihui added 2 commits October 17, 2023 12:57
This reverts commit cd368d9.

Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
@iuhilnehc-ynos iuhilnehc-ynos marked this pull request as draft October 17, 2023 05:40
@iuhilnehc-ynos
Copy link
Contributor Author

It depends on ros2/rmw_dds_common#73

Signed-off-by: Chen Lihui <lihui.chen@sony.com>
@iuhilnehc-ynos
Copy link
Contributor Author

@iuhilnehc-ynos iuhilnehc-ynos changed the title tighten the lock scope avoid usind dds common public mutex directly Oct 18, 2023
@iuhilnehc-ynos iuhilnehc-ynos marked this pull request as ready for review October 19, 2023 01:11
@iuhilnehc-ynos iuhilnehc-ynos changed the title avoid usind dds common public mutex directly avoid using dds common public mutex directly Oct 19, 2023
Chen Lihui added 2 commits October 19, 2023 17:32
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Copy link
Contributor

@MiguelCompany MiguelCompany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @iuhilnehc-ynos

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left one thing that might improve this. Otherwise, this looks like a great fix.

Comment on lines 116 to 117
rmw_ret = rmw_fastrtps_shared_cpp::destroy_publisher(
eprosima_fastrtps_identifier, participant_info, publisher);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost feel like we should make this an RCPPUTILS_SCOPE_EXIT right after we successfully create the publisher above. That way it will be destroyed in all error paths. Thoughts?

(the same goes for the subscription and the dynamic versions below)

Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with minor comments.

rmw_fastrtps_cpp/src/rmw_publisher.cpp Outdated Show resolved Hide resolved
rmw_fastrtps_dynamic_cpp/src/rmw_publisher.cpp Outdated Show resolved Hide resolved
rmw_fastrtps_shared_cpp/src/rmw_node.cpp Outdated Show resolved Hide resolved
Chen Lihui added 2 commits October 31, 2023 09:19
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me with green CI.

@clalancette clalancette merged commit d518e06 into ros2:rolling Nov 3, 2023
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants