Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

terminate called after throwing an instance of 'std::system_error' what(): Resource deadlock avoided #1519

Closed
LouisChen1905 opened this issue Jan 19, 2021 · 14 comments · Fixed by #1530
Labels
bug Something isn't working

Comments

@LouisChen1905
Copy link

LouisChen1905 commented Jan 19, 2021

Bug report

Required Info:

  • Operating System:
    • ubuntu 20.04-->
  • Installation type:
    • from source
  • Version or commit hash:
    • eloquent
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

I'm writing gtest scripts. My SetUp():

void SetUp()
{
        auto context = contexts::default_context::get_global_default_context();
        auto options = NodeOptions()
                .context(context)
            // BUG: if set true, test would crash: std::system_error Resource deadlock avoided!
                .use_intra_process_comms(true)
                .automatically_declare_parameters_from_overrides(true);
        mappingNode_ = std::make_shared<Node>("mapping", options);
        bool res = false;
        res = Mapping::getInstance()->initialize(mappingNode_);
        if(!res){
            RCLCPP_ERROR(mappingNode_->get_logger(), "Failed to initialize.");
        }

        executor = std::make_shared<executors::MultiThreadedExecutor>
                   (executor::ExecutorArgs(), 2);
        executor->add_node(mappingNode_);
}

In TEST_F(), if I used executor.spin_until_future_complete() to wait some messages, test program would crash when entering the second TEST_F(). If I use spin_some() instead, the program would run without crashing.
You can find the line above use_intra_process_comms(true). If it's set to true, 'Resource deadlock avoided' would occur in the second test case.

TEST_F()
{
    promise<void> promise;
    Subscription<Mode>::SharedPtr ModeSub;
    auto ModeCB = [&ModeSub, &promise](Mode::SharedPtr msg)->void{
      if(msg->mode == Mode::W &&
         msg->state == Mode::I){
        promise.set_value();
        ModeSub.reset();
      }
    };
    ModeSub =
        pubNode_->create_subscription<Mode>("aa",
                                                   QoS(QoSInitialization::from_rmw(rmw_qos_profile_default)),
                                                   ModeCB);
    shared_future<void> future(promise.get_future());
    ASSERT_EQ(executor->spin_until_future_complete(future, 2s), executor::FutureReturnCode::SUCCESS);
}
0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff6737859 in __GI_abort () at abort.c:79
#2  0x00007ffff6b0c951 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff6b1847c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff6b17459 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff6b17e11 in __gxx_personality_v0 () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff6914bdf in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007ffff6915271 in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00007ffff6b1878c in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff6b0f77f in std::__throw_system_error(int) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7af3a99 in std::__shared_mutex_pthread::lock (this=0x55555617af50) at /usr/include/c++/9/shared_mutex:188
#11 0x00007ffff7af3b34 in std::shared_timed_mutex::lock (this=0x55555617af50) at /usr/include/c++/9/shared_mutex:459
#12 0x00007ffff7af519b in std::unique_lock<std::shared_timed_mutex>::lock (this=0x7ffffffea490)
    at /usr/include/c++/9/bits/unique_lock.h:141
#13 0x00007ffff7af45dd in std::unique_lock<std::shared_timed_mutex>::unique_lock (this=0x7ffffffea490, __m=...)
    at /usr/include/c++/9/bits/unique_lock.h:71
#14 0x00007ffff7af2d3e in rclcpp::experimental::IntraProcessManager::remove_subscription (this=0x55555617aea0, 
    intra_process_subscription_id=2)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/intra_process_manager.cpp:86
#15 0x00007ffff7bd3b67 in rclcpp::SubscriptionBase::~SubscriptionBase (this=0x55555619ac90, __in_chrg=<optimized out>)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/subscription_base.cpp:96
--Type <RET> for more, q to quit, c to continue without paging--
pp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > >::~Subscription (this=0x55555619ac90, __in_chrg=<optimized out>)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/include/rclcpp/subscription.hpp:67
#17 0x00007ffff7c02087 in __gnu_cxx::new_allocator<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > > >::destroy<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > > > (this=0x55555619ac90, __p=0x55555619ac90)
    at /usr/include/c++/9/ext/new_allocator.h:153
#18 0x00007ffff7c00065 in std::allocator_traits<std::allocator<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > > > >::destroy<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > > > (__a=..., __p=0x55555619ac90)
    at /usr/include/c++/9/bits/alloc_traits.h:497

#19 0x00007ffff7bfd13d in std::_Sp_counted_ptr_inplace<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > >, std::allocator<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > > >, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x55555619ac80)
    at /usr/include/c++/9/bits/shared_ptr_base.h:557
#20 0x000055555563c380 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55555619ac80) at /usr/include/c++/9/bits/shared_ptr_base.h:155
#21 0x0000555555637ec3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7ffffffeb7d8, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#22 0x00007ffff7bd9ad8 in std::__shared_ptr<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7ffffffeb7d0, __in_chrg=<optimized out>)
    at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#23 0x00007ffff7bda7aa in std::__shared_ptr<rclcpp::Subscription<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<rcl_interfaces::msg::ParameterEvent_<std::allocator<void> >, std::allocator<void> > >, (__gnu_cxx::_Lock_policy)2>::reset (this=0x5555561a2b80) at /usr/include/c++/9/bits/shared_ptr_base.h:1287
#24 0x00007ffff7bd785e in rclcpp::TimeSource::detachNode (this=0x5555561a2ac8) at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/time_source.cpp:132
#25 0x00007ffff7bd7ed0 in rclcpp::TimeSource::~TimeSource (this=0x5555561a2ac8, __in_chrg=<optimized out>) at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/time_source.cpp:180
#26 0x00007ffff7b5ec0e in rclcpp::node_interfaces::NodeTimeSource::~NodeTimeSource (this=0x5555561a2a50, __in_chrg=<optimized out>)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/node_interfaces/node_time_source.cpp:49
#27 0x00007ffff7b5ecaa in rclcpp::node_interfaces::NodeTimeSource::~NodeTimeSource (this=0x5555561a2a50, __in_chrg=<optimized out>)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/node_interfaces/node_time_source.cpp:50
#28 0x00007ffff7b1b780 in std::_Sp_counted_ptr<rclcpp::node_interfaces::NodeTimeSource*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555556198760) at /usr/include/c++/9/bits/shared_ptr_base.h:377
#29 0x000055555563c380 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555556198760) at /usr/include/c++/9/bits/shared_ptr_base.h:155
#30 0x0000555555637ec3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555555fbb9c0, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#31 0x00007ffff7b1416c in std::__shared_ptr<rclcpp::node_interfaces::NodeTimeSourceInterface, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555555fbb9b8, __in_chrg=<optimized out>)
    at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#32 0x00007ffff7b1418c in std::shared_ptr<rclcpp::node_interfaces::NodeTimeSourceInterface>::~shared_ptr (this=0x555555fbb9b8, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#33 0x00007ffff7b11fb0 in rclcpp::Node::~Node (this=0x555555fbb920, __in_chrg=<optimized out>) at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/node.cpp:185
#34 0x000055555568cec5 in __gnu_cxx::new_allocator<rclcpp::Node>::destroy<rclcpp::Node> (this=0x555555fbb920, __p=0x555555fbb920) at /usr/include/c++/9/ext/new_allocator.h:153
#35 0x000055555568b483 in std::allocator_traits<std::allocator<rclcpp::Node> >::destroy<rclcpp::Node> (__a=..., __p=0x555555fbb920) at /usr/include/c++/9/bits/alloc_traits.h:497
#36 0x0000555555687d19 in std::_Sp_counted_ptr_inplace<rclcpp::Node, std::allocator<rclcpp::Node>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555555fbb910) at /usr/include/c++/9/bits/shared_ptr_base.h:557
#37 0x000055555563c380 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555555fbb910) at /usr/include/c++/9/bits/shared_ptr_base.h:155
#38 0x0000555555637ec3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555555fbef50, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#39 0x00005555556351de in std::__shared_ptr<rclcpp::Node, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555555fbef48, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#40 0x00005555556351fe in std::shared_ptr<rclcpp::Node>::~shared_ptr (this=0x555555fbef48, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103

Expected behavior

Run through each test case.

Actual behavior

The first test case passed OK. But test program crashed when entering the second test case.

@clalancette
Copy link
Contributor

Are you really using Eloquent on Ubuntu 20.04? That isn't a supported combination (and in point of fact, Eloquent is end-of-life now anyway). Can you try this on Foxy on Ubuntu 20.04 and see if you are still running into problems?

@ivanpauno ivanpauno added the bug Something isn't working label Jan 20, 2021
@LouisChen1905 LouisChen1905 reopened this Jan 21, 2021
@iuhilnehc-ynos
Copy link
Collaborator

@LouisChen1905

I am interested in this issue.
Could you provide a sample or test that can be built and run to reproduce this problem?

BTW:

#14 0x00007ffff7af2d3e in rclcpp::experimental::IntraProcessManager::remove_subscription (this=0x55555617aea0, 
    intra_process_subscription_id=2)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/intra_process_manager.cpp:86

Did you change the rclcpp source code? (https://github.com/ros2/rclcpp/blob/eloquent/rclcpp/src/rclcpp/intra_process_manager.cpp#L83-L87)

About std::system_error Resource deadlock avoided!, it seems there are two places to call std::unique_lock<std::shared_timed_mutex> lock(mutex_); in one thread.

@LouisChen1905
Copy link
Author

@iuhilnehc-ynos

I’ll write a sample program and upload soon.

@LouisChen1905
Copy link
Author

LouisChen1905 commented Jan 21, 2021

@iuhilnehc-ynos

Did you change the rclcpp source code?

I didn't change the source code.

Finally, I found the way to reproduce the error. If I implement the callback of subscription in the TestClass, the program runs ok. If I implement the callback in another class Module1 whose instance is a member of TestClass, the error is reproduced.

class TestClass
{
public:
  static std::shared_ptr<TestClass> getInstance(){
    static std::shared_ptr<TestClass> instance;
    if(!instance)
      instance.reset(new TestClass());
    return instance;
  }

  bool init(rclcpp::Node::SharedPtr node){
    node_ = node;

    rclcpp::Parameter param;
    node_->get_parameter("topic1", param);
    std::string topic = param.get_value<std::string>();
    RCLCPP_INFO_STREAM(node_->get_logger(), "read topic1: " <<topic);

    module1 = std::make_shared<Module1>(node_);
    //NOTE: uncomment this line to reproduce the bug
//    auto fcn = std::bind(&Module1::stringCallback, module1, std::placeholders::_1);
    // run ok
    auto fcn = std::bind(&TestClass::stringCallback, this, std::placeholders::_1);
    stringSub_ = node_->create_subscription<std_msgs::msg::String>(
          "/string",
          1,
          fcn);

    int16Pub_ = node_->create_publisher<std_msgs::msg::Int16>(
          "/int",1);
    auto pubCB = [this](){
      std_msgs::msg::Int16 msg;
      msg.set__data(1);
      int16Pub_->publish(msg);
    };

    pubTimer_ = node_->create_wall_timer(10ms, pubCB);

    return true;
  }

  void stringCallback(std_msgs::msg::String::SharedPtr msg){
//    RCLCPP_INFO(node_->get_logger(), "received message");
  }

protected:
  TestClass();
private:
  rclcpp::Node::SharedPtr node_;

  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr stringSub_;

  rclcpp::Publisher<std_msgs::msg::Int16>::SharedPtr int16Pub_;
  rclcpp::TimerBase::SharedPtr pubTimer_;

  std::shared_ptr<Module1> module1;
};

The project is upload in the attached zip file.
bug_test.zip

@LouisChen1905
Copy link
Author

The problem also has something to do with the singleton implementation of TestClass. When TestClass is written like:

class TestClass
{
public:
/*  static std::shared_ptr<TestClass> getInstance(){
    static std::shared_ptr<TestClass> instance;
    if(!instance)
      instance.reset(new TestClass());
    return instance;
  }*/

  bool init(rclcpp::Node::SharedPtr node){
    node_ = node;

    rclcpp::Parameter param;
    node_->get_parameter("topic1", param);
    std::string topic = param.get_value<std::string>();
    RCLCPP_INFO_STREAM(node_->get_logger(), "read topic1: " <<topic);

    module1 = std::make_shared<Module1>(node_);
    //NOTE: uncomment this line to reproduce the bug
    auto fcn = std::bind(&Module1::stringCallback, module1, std::placeholders::_1);
    // run ok
//    auto fcn = std::bind(&TestClass::stringCallback, this, std::placeholders::_1);
    stringSub_ = node_->create_subscription<std_msgs::msg::String>(
          "/string",
          1,
          fcn);

    int16Pub_ = node_->create_publisher<std_msgs::msg::Int16>(
          "/int",1);
    auto pubCB = [this](){
      std_msgs::msg::Int16 msg;
      msg.set__data(1);
      int16Pub_->publish(msg);
    };

    pubTimer_ = node_->create_wall_timer(10ms, pubCB);

    return true;
  }

  void stringCallback(std_msgs::msg::String::SharedPtr msg){
//    RCLCPP_INFO(node_->get_logger(), "received message");
  }

  public:
  TestClass();
private:
  rclcpp::Node::SharedPtr node_;

  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr stringSub_;

  rclcpp::Publisher<std_msgs::msg::Int16>::SharedPtr int16Pub_;
  rclcpp::TimerBase::SharedPtr pubTimer_;

  std::shared_ptr<Module1> module1;
};

And SetUp() is:

void SetUp(){
    auto context = contexts::default_context::get_global_default_context();
    auto options = NodeOptions()
            .context(context)
        // BUG: if set true, test would crash: std::system_error Resource deadlock avoided!
            .use_intra_process_comms(true)
            .automatically_declare_parameters_from_overrides(true);
    node_ = std::make_shared<Node>("test_bug", options);

    // NOTE:uncomment this line to reproduce bug
//    TestClass::getInstance()->init(node_);

    testClass_ = std::make_shared<TestClass>();
    testClass_->init(node_);

    pubNode_ = std::make_shared<Node>("pub_node", options);
    stringPub_ = pubNode_->create_publisher<std_msgs::msg::String>("/string", 1);
    auto pubTimerCB = [this](){
      std_msgs::msg::String msg;
      msg.set__data("test");
      stringPub_->publish(std::move(msg));
    };
    pubTimer_ = pubNode_->create_wall_timer(100ms, pubTimerCB);

//    auto intCallback = [this](std_msgs::msg::Int16::SharedPtr msg){
//      RCLCPP_INFO(pubNode_->get_logger(), "Receive int message");
//    };
//    intSub_ = pubNode_->create_subscription<Int16>("/int", 1, intCallback);

    executor_ = std::make_shared<executors::MultiThreadedExecutor>();
    executor_->add_node(node_);
    executor_->add_node(pubNode_);
  }

the test program runs OK.

@iuhilnehc-ynos
Copy link
Collaborator

I don't want to argue about whether the design of the sample is reasonable or not,
but I think rclcpp should not make users' application aborted no matter how user application to use this library.

We can find that

std::unique_lock<std::shared_timed_mutex> lock(mutex_);
subscriptions_.erase(intra_process_subscription_id);

  subscriptions_.erase(intra_process_subscription_id);
  ^^^^^^^^^^^^^^^^^^^^ // There exists a case to destroy
                       // `AnySubscriptionCallback any_callback_` of SubscriptionIntraProcess
                       //, which might keep the lifecycle of node that will call `IntraProcessManager::remove_subscription`.

How about adding a new recusive_mutex for subscriptions_? @ivanpauno , do you have any suggestions?

@fujitatomoya
Copy link
Collaborator

@iuhilnehc-ynos

How about adding a new recusive_mutex for subscriptions_?

could you elaborate why adding recursive_mutex is gonna address this issue?

the root cause of this is dead lock as following,

#9  0x00007ffff6b0f77f in std::__throw_system_error(int) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7af3a99 in std::__shared_mutex_pthread::lock (this=0x55555617af50) at /usr/include/c++/9/shared_mutex:188
#11 0x00007ffff7af3b34 in std::shared_timed_mutex::lock (this=0x55555617af50) at /usr/include/c++/9/shared_mutex:459
#12 0x00007ffff7af519b in std::unique_lock<std::shared_timed_mutex>::lock (this=0x7ffffffea490)
    at /usr/include/c++/9/bits/unique_lock.h:141
#13 0x00007ffff7af45dd in std::unique_lock<std::shared_timed_mutex>::unique_lock (this=0x7ffffffea490, __m=...)
    at /usr/include/c++/9/bits/unique_lock.h:71
#14 0x00007ffff7af2d3e in rclcpp::experimental::IntraProcessManager::remove_subscription (this=0x55555617aea0, 
    intra_process_subscription_id=2)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/intra_process_manager.cpp:86
#15 0x00007ffff7bd3b67 in rclcpp::SubscriptionBase::~SubscriptionBase (this=0x55555619ac90, __in_chrg=<optimized out>)
    at /home/chervon/ros2_eloquent_source/src/ros2/rclcpp/rclcpp/src/rclcpp/subscription_base.cpp:96

https://github.com/gcc-mirror/gcc/blob/e90bb02d5ba3853b4abb0c66f3e9a41bd03a4294/libstdc%2B%2B-v3/include/std/shared_mutex#L186-L188

trying to rwlock with writer access, it detects EDEADLK from glibc then raise exception.

@iuhilnehc-ynos
Copy link
Collaborator

iuhilnehc-ynos commented Jan 22, 2021

@fujitatomoya

trying to rwlock with writer access, it detects EDEADLK from glibc then raise exception.

About std::system_error Resource deadlock avoided!, it seems there are two places to call std::unique_lockstd::shared_timed_mutex lock(mutex_); in one thread. (As I mentioned at #1519 (comment))

a thread to run:
{
 std::unique_lock<std::shared_timed_mutex> lock(mutex_); 
 subscriptions_.erase(intra_process_subscription_id);   ==> if `erase` lead to run `std::unique_lock<std::shared_timed_mutex> lock(mutex_);` again, it throws exception(`EDEADLK`).
}

which kind of mutex do you think can fix this error? From what I know, recursive_mutext is a better choice.

@fujitatomoya
Copy link
Collaborator

if erase lead to run std::unique_lock<std::shared_timed_mutex> lock(mutex_); again, it throws exception(EDEADLK).

i think in that case, device_or_resource_busy (EBUSY) should be seen on stack trace because that is not a deadlock?

@iuhilnehc-ynos
Copy link
Collaborator

iuhilnehc-ynos commented Jan 22, 2021

@fujitatomoya

i think in that case, device_or_resource_busy (EBUSY) should be seen on stack trace because that is not a deadlock?

Thank you for your suggestion.

I think lock a mutex twice is also a simple type of deadlock,
so calling std::unique_lock<std::shared_timed_mutex> lock(mutex_); twice to throw EDEADLK is acceptable.

Based on your suggestion about throwing an exception in this case, so it's the users' responsibility to catch the exception.
IMO, users don't know what happened even if catching the exception. I'll wait to hear others' opinions.

@LouisChen1905
Copy link
Author

LouisChen1905 commented Jan 24, 2021

@iuhilnehc-ynos
Is it better for the output behaviour of rclcppfor users to be independent of the configuration input by NodeOptions. Since if .use_intra_process_comms(true) the code chrashes. If parameter is false, it runs without crashing. AFAK, the differencee between them is the different communicaton mechanisms leading to higher or lower efficiency. If one the options makes user's program crash, while the other does not, the user may get confused.

@iluetkeb
Copy link
Contributor

@LouisChen1905 I think you tagged me by mistake, right? I haven't been involved in this ticket before.

@iuhilnehc-ynos
Copy link
Collaborator

iuhilnehc-ynos commented Jan 28, 2021

$ git clone https://github.com/iuhilnehc-ynos/ros2_issues_test.git
$ cd ros2_issues_test
$ source /opt/ros/foxy/setup.bash      ## or use ros2 master workspace
$ colcon build --packages-select rclcpp_issue_1519       
$ source install/setup.bash
$ ./build/rclcpp_issue_1519/test1519
  • PR to fix this issue (for rclcpp master branch)

#1530

@iuhilnehc-ynos
Copy link
Collaborator

@LouisChen1905

Could you run your sample with master branch, at your convenience? Consider this PR breaking the ABI, I'll not provide a patch for eloquent or foxy.

If you have any questions, please reopen this. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants